Exam Generative-AI-Leader Topic 2 Question 15 Discussion
Actual exam question for Google's Generative-AI-Leader exam
Question #: 15
Topic #: 2
A travel app asks users to take a photo of a famous landmark and then returns a written overview with historical notes and nearby attractions. The system's capability to interpret the picture and produce natural language output reflects what kind of model?
Suggested Answer: B
This scenario requires understanding the visual content of a photo and then generating a textual explanation. The system takes input in one modality (an image) and produces output in another (text). That cross-modal capability is what a multimodal model provides: it handles vision and language jointly, producing coherent natural language output grounded in the visual input.
by Morgan at Jun 23, 2026, 10:45 PM
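The reasoning above can be made concrete with a small sketch. This is only an illustration, not a formal definition or any vendor's API: the `ModelProfile` class, its fields, and the example names are hypothetical, and the rule used (more than one distinct modality across inputs and outputs) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of what a system consumes and produces."""
    name: str
    inputs: frozenset
    outputs: frozenset

    def is_multimodal(self) -> bool:
        # Assumption for this sketch: a system counts as multimodal when
        # more than one distinct modality appears across inputs and outputs.
        return len(self.inputs | self.outputs) > 1


# The travel app from the question: photo in, written overview out.
landmark_guide = ModelProfile(
    name="landmark_guide",
    inputs=frozenset({"image"}),
    outputs=frozenset({"text"}),
)

# A text-only chatbot for contrast: text in, text out.
text_chatbot = ModelProfile(
    name="text_chatbot",
    inputs=frozenset({"text"}),
    outputs=frozenset({"text"}),
)

print(landmark_guide.is_multimodal())  # True
print(text_chatbot.is_multimodal())    # False
```

Under this simplified rule, the landmark app qualifies because it spans both image and text, while the text-only chatbot does not.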