Exam NCA-GENM Topic 1 Question 375 Discussion
Actual exam question for NVIDIA's NCA-GENM exam
Question #: 375
Topic #: 1
You're training a multimodal model for image and text retrieval. Given an image, the model should retrieve the most relevant text description from a database, and vice versa. You're using a dual-encoder architecture, where one encoder processes images and the other processes text, projecting them into a shared embedding space. What is the most effective way to train the model to ensure that semantically similar images and texts have close embeddings, while dissimilar ones have distant embeddings?
Suggested Answer: B
Contrastive loss functions are specifically designed for learning embeddings where similarity is defined by distance. They directly pull matching image-text pairs together and push non-matching pairs apart. Training the two encoders independently doesn't enforce any relationship between the modalities. Reconstruction loss focuses on regenerating the input, not on similarity. Adversarial training aims to make the two distributions indistinguishable, which does not by itself give semantically meaningful embeddings. L1 loss between matched pairs is a basic distance objective, but it only pulls positives together and has no term that pushes mismatched pairs apart, so it is less effective than contrastive losses for learning semantic similarity.
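For anyone who wants to see the idea concretely, here is a minimal NumPy sketch of one common contrastive objective: a symmetric cross-entropy over the image-text similarity matrix, as used in CLIP-style training. The question does not say which contrastive loss is meant, so this form, the function names, and the temperature of 0.07 are assumptions for illustration only. The random arrays stand in for encoder outputs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Row i of image_emb and row i of text_emb are assumed to be a matching pair;
    # every other row in the batch acts as a negative.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Image-to-text: each image should rank its own text highest (softmax over rows).
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each text should rank its own image highest (softmax over columns).
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

# Stand-in embeddings (hypothetical data, not real encoder outputs).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))
aligned_images = text_emb + 0.01 * rng.normal(size=(8, 16))  # nearly matching pairs
random_images = rng.normal(size=(8, 16))                     # unrelated pairs

aligned_loss = symmetric_contrastive_loss(aligned_images, text_emb)
random_loss = symmetric_contrastive_loss(random_images, text_emb)
print("aligned pairs loss:", aligned_loss)
print("random pairs loss: ", random_loss)
```

The loss is low when each image is closest to its own text and far from the other texts in the batch, and high when the pairing carries no signal. That is the "pull together, push apart" behavior the explanation describes, and it covers both retrieval directions because the loss is averaged over rows and columns.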
by Eileen at Oct 13, 2025, 10:43 PM