Exam NCA-GENM Topic 1 Question 20 Discussion

Actual exam question for NVIDIA's NCA-GENM exam
Question #: 20
Topic #: 1

Assume you have trained a text-to-image diffusion model using a large dataset of landscape photographs. You now want to adapt this model to generate images of photorealistic portraits. Which of the following fine-tuning strategies is most likely to yield the best results with the least amount of training data and time?

A. Retrain the entire diffusion model from scratch using the portrait dataset.
B. Fine-tune only the CLIP model with portrait-related text descriptions and corresponding images.
C. Fine-tune the U-Net architecture of the diffusion model while keeping the CLIP model fixed.
D. Fine-tune both the CLIP model and the U-Net architecture with the portrait dataset, using a smaller learning rate than the initial training.
E. Fine-tune only the final layer of the U-Net model with the portrait dataset.

Suggested Answer: D

Fine-tuning both the CLIP model and the U-Net architecture is the most effective approach. The CLIP model needs to learn the semantic relationship between portrait-related text and images, and the U-Net needs to adapt to generating portraits instead of landscapes. Using a smaller learning rate than in the initial training keeps the weight updates small, which reduces the risk of overfitting to the portrait dataset and lets the model retain and reuse what it learned from the landscape dataset. Retraining from scratch discards that knowledge and costs the most data and time, and fine-tuning only one component may not be sufficient for good performance. Fine-tuning only the last layer of the U-Net changes too little of the model to shift its output from landscapes to portraits.
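To make the difference between the answer choices concrete, here is a minimal sketch in plain Python that maps options B through E to the layers they would update. It is not tied to any real diffusion library: the layer names, the `MODEL` dictionary, `PRETRAIN_LR`, `FINETUNE_SCALE`, and both helper functions are hypothetical, and the factor of 0.1 is an assumption, since the question only says the learning rate is "smaller".

```python
# Hypothetical stand-in: component and layer names are illustrative,
# not taken from any real diffusion library.
MODEL = {
    "clip": ["clip.embed", "clip.block0", "clip.proj"],
    "unet": ["unet.down0", "unet.mid", "unet.up0", "unet.out"],
}

PRETRAIN_LR = 1e-4    # assumed learning rate of the original landscape training
FINETUNE_SCALE = 0.1  # assumed reduction; the question only says "smaller"


def param_groups(option):
    """Return optimizer-style parameter groups for answer choices B-E.

    Option A (retrain from scratch) is omitted because it discards the
    pretrained weights instead of selecting which ones to update.
    """
    lr = PRETRAIN_LR * FINETUNE_SCALE
    if option == "B":    # CLIP only
        chosen = {"clip": MODEL["clip"]}
    elif option == "C":  # U-Net only, CLIP frozen
        chosen = {"unet": MODEL["unet"]}
    elif option == "D":  # both components, reduced learning rate
        chosen = dict(MODEL)
    elif option == "E":  # final U-Net layer only
        chosen = {"unet": MODEL["unet"][-1:]}
    else:
        raise ValueError(f"unsupported option: {option!r}")
    return [
        {"name": name, "params": layers, "lr": lr}
        for name, layers in chosen.items()
    ]


def frozen_layers(option):
    """List every layer that receives no updates under the given option."""
    trainable = {layer for g in param_groups(option) for layer in g["params"]}
    return [
        layer
        for layers in MODEL.values()
        for layer in layers
        if layer not in trainable
    ]
```

Calling `param_groups("D")` yields one group per component, each with the reduced learning rate, and `frozen_layers("D")` is empty, whereas `frozen_layers("E")` lists everything except the last U-Net layer. In a real training script these groups would be handed to an optimizer that accepts per-group learning rates.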

by Godfery at Nov 04, 2025, 06:09 PM
