Is there a way to merge two images into one and pass the result to an LLM's vision module, in a format it accepts (such as a file)?

Is there a way to combine two images into a single image and then submit it to an LLM’s visual recognition for analysis, in an input format it accepts (for example, as a file)?
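To make the question concrete, here is a minimal sketch of the kind of merge I have in mind, assuming Python with the Pillow library. The helper names (`merge_side_by_side`, `to_png_bytes`, `to_data_url`) are made up for illustration, and the side-by-side layout is just one possible way to combine the images. Whether a given LLM expects a file upload, raw bytes, or a base64 data URL depends on the provider, so the last two helpers are assumptions rather than any particular API's requirement.

```python
import base64
import io

from PIL import Image


def merge_side_by_side(left: Image.Image, right: Image.Image) -> Image.Image:
    """Paste two images next to each other on a single white canvas."""
    left = left.convert("RGB")
    right = right.convert("RGB")
    # The canvas is as wide as both images together and as tall as the taller one.
    width = left.width + right.width
    height = max(left.height, right.height)
    canvas = Image.new("RGB", (width, height), "white")
    canvas.paste(left, (0, 0))
    canvas.paste(right, (left.width, 0))
    return canvas


def to_png_bytes(image: Image.Image) -> bytes:
    """Encode the image as PNG in memory, e.g. for a file-style upload."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def to_data_url(png_bytes: bytes) -> str:
    """Wrap PNG bytes in a base64 data URL (assumed input format for some APIs)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


if __name__ == "__main__":
    # Placeholder images generated in memory; real use would call Image.open(path).
    first = Image.new("RGB", (40, 30), "red")
    second = Image.new("RGB", (20, 50), "blue")
    merged = merge_side_by_side(first, second)
    merged.save("merged.png")  # a single file that could then be uploaded
    print(merged.size)
```

Saving the result as `merged.png` gives one ordinary image file, while `to_png_bytes` and `to_data_url` cover the case where the input has to be sent inline instead of as a file.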