Exam NCA-GENM Topic 1 Question 20 Discussion
Actual exam question for NVIDIA's NCA-GENM exam
Question #: 20
Topic #: 1
Assume you have trained a text-to-image diffusion model using a large dataset of landscape photographs. You now want to adapt this model to generate images of photorealistic portraits. Which of the following fine-tuning strategies is most likely to yield the best results with the least amount of training data and time?
Suggested Answer: D
Fine-tuning both the CLIP model and the U-Net with a smaller learning rate is the most effective approach. The CLIP model needs to learn the semantic relationship between portrait-related text and images, and the U-Net needs to adapt from generating landscapes to generating portraits. A smaller learning rate reduces the risk of overfitting the small portrait dataset and lets the model build on what it already learned from the landscape photographs instead of overwriting it. Retraining from scratch wastes both data and compute, and fine-tuning only one component may not be enough for good results. Fine-tuning only the last layer is unlikely to change the model's behavior enough to move it from landscapes to portraits.
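The learning-rate and "which components to train" points can be illustrated with a toy sketch. This is not a diffusion model: the two "components" named `text_encoder` and `unet`, their pretrained weights, and the "portrait" target values are all hypothetical stand-ins, and the loss is a simple quadratic chosen so the arithmetic is easy to follow. Fine-tuning starts from the pretrained weights and runs plain gradient descent. With the same step budget, a smaller learning rate keeps the weights closer to their pretrained values, at the cost of adapting more slowly. Freezing one component leaves its part of the loss untouched (by construction in this toy objective).

```python
"""Toy sketch: fine-tuning two 'components' from pretrained weights with plain
gradient descent. The component names, weights, and targets are hypothetical
stand-ins, not a real CLIP model or U-Net."""

import math

# Hypothetical pretrained ("landscape") weights for each component.
PRETRAINED = {
    "text_encoder": [0.8, -0.3, 0.5],
    "unet": [1.2, 0.4, -0.7],
}

# Hypothetical optimum of the new ("portrait") objective for each component.
TARGET = {
    "text_encoder": [0.6, -0.1, 0.9],
    "unet": [0.2, 0.9, -0.2],
}


def loss(weights, target):
    """Quadratic toy loss: 0.5 * squared distance to the target optimum."""
    return 0.5 * sum((w - t) ** 2 for w, t in zip(weights, target))


def total_loss(params):
    """Sum of the toy losses over all components."""
    return sum(loss(params[name], TARGET[name]) for name in PRETRAINED)


def finetune(lr, steps, trainable=("text_encoder", "unet")):
    """Gradient descent starting from PRETRAINED; frozen components are not updated."""
    params = {name: list(ws) for name, ws in PRETRAINED.items()}
    for _ in range(steps):
        for name in trainable:
            # The gradient of 0.5 * (w - t)^2 with respect to w is (w - t).
            params[name] = [w - lr * (w - t) for w, t in zip(params[name], TARGET[name])]
    return params


def drift(params):
    """Euclidean distance of all weights from their pretrained values."""
    return math.sqrt(sum(
        (w - w0) ** 2
        for name in PRETRAINED
        for w, w0 in zip(params[name], PRETRAINED[name])
    ))


if __name__ == "__main__":
    # Same step budget, two learning rates: compare adaptation vs. drift.
    for lr in (0.05, 0.5):
        p = finetune(lr=lr, steps=10)
        print(f"lr={lr}: loss={total_loss(p):.4f} drift={drift(p):.4f}")
    # Train only the U-Net; the text-encoder part of the loss stays at its pretrained value.
    unet_only = finetune(lr=0.05, steps=10, trainable=("unet",))
    print(f"U-Net only: loss={total_loss(unet_only):.4f}")
```

In this toy, the smaller learning rate ends with a higher loss but less drift. Real fine-tuning balances that trade-off against the risk of overfitting a small dataset, which the quadratic loss here does not model.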
by Godfery at Nov 04, 2025, 06:09 PM