Exam Generative-AI-Leader Topic 2 Question 15 Discussion

Actual exam question for Google's Generative-AI-Leader exam
Question #: 15
Topic #: 2
A travel app asks users to take a photo of a famous landmark and then returns a written overview with historical notes and nearby attractions. The system's capability to interpret the picture and produce natural language output reflects what kind of model?

Suggested Answer: B

This scenario requires interpreting the visual content of a photo and then generating a textual explanation. The system takes one modality (an image) as input and produces another (text) as output. That cross-modal capability is what a multimodal model provides: it handles vision and language jointly, so it can produce coherent natural language output grounded in visual input.

by Morgan at Jun 23, 2026, 10:45 PM
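
To make the reasoning above concrete, here is a minimal Python sketch of the distinction it relies on: a system counts as multimodal when its inputs and outputs together span more than one modality. The `ModelProfile` class and the names `travel_app` and `chatbot` are hypothetical, introduced only for illustration. They are not part of the exam material or of any Google API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of what a system consumes and produces."""
    name: str
    inputs: frozenset
    outputs: frozenset

    def modalities(self) -> frozenset:
        # Every modality the system touches, on either side.
        return self.inputs | self.outputs

    def is_multimodal(self) -> bool:
        # More than one distinct modality across inputs and outputs.
        return len(self.modalities()) > 1


# The travel app from the question: photo in, written overview out.
travel_app = ModelProfile("landmark-guide", frozenset({"image"}), frozenset({"text"}))

# A text-only assistant for contrast (assumed example, not from the question).
chatbot = ModelProfile("text-chatbot", frozenset({"text"}), frozenset({"text"}))

for profile in (travel_app, chatbot):
    label = "multimodal" if profile.is_multimodal() else "unimodal"
    print(f"{profile.name}: {sorted(profile.modalities())} -> {label}")
```

The travel app spans both image and text, so it falls on the multimodal side, while the text-only assistant does not.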
