Exam NCA-GENL Topic 2 Question 3 Discussion
Actual exam question for NVIDIA's NCA-GENL exam
Question #: 3
Topic #: 2
In transformer-based LLMs, how does the use of multi-head attention improve model performance compared to single-head attention, particularly for complex NLP tasks?
Suggested Answer: B
Multi-head attention, a core component of the transformer architecture, improves model performance by letting the model attend to several aspects of the input sequence in parallel. Each head applies its own learned projections to the queries, keys, and values. As a result, different heads can learn to focus on different relationships in the input (e.g., syntactic vs. semantic) and capture a wider range of contextual dependencies than a single attention distribution can. According to "Attention Is All You Need" (Vaswani et al., 2017) and NVIDIA's NeMo documentation, multi-head attention enhances the expressive power of transformers, which makes them highly effective for complex NLP tasks such as translation and question answering.

Option A is incorrect, as multi-head attention increases memory usage (each head computes and stores its own attention-weight matrix). Option C is false, as positional encodings are still required. Option D is wrong, as multi-head attention does not reduce the number of parameters: with the usual per-head size of d_model/h, its projection matrices hold the same number of parameters as a single head of full width d_model.
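To make the mechanism concrete, here is a minimal plain-Python sketch of multi-head self-attention. It assumes the standard formulation from Vaswani et al. (scaled dot-product attention, per-head size d_model // num_heads, an output projection W_o) and omits bias terms. It uses random weights instead of trained ones. The class and helper names (`MultiHeadSelfAttention`, `matmul`, `random_matrix`, etc.) are hypothetical illustrations, not NeMo's API or implementation. The sketch shows that each head attends over its own slice of the projected features and produces its own attention-weight matrix.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> tuple[Matrix, Matrix]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiHeadSelfAttention:
    """Toy self-attention with num_heads heads of size d_model // num_heads (no biases)."""

    def __init__(self, d_model: int, num_heads: int, seed: int = 0):
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        rng = random.Random(seed)
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Full-width Q/K/V projections; head i reads columns [i*d_head, (i+1)*d_head).
        self.w_q = random_matrix(d_model, d_model, rng)
        self.w_k = random_matrix(d_model, d_model, rng)
        self.w_v = random_matrix(d_model, d_model, rng)
        # Output projection that mixes the concatenated head outputs.
        self.w_o = random_matrix(d_model, d_model, rng)

    def num_params(self) -> int:
        return sum(len(w) * len(w[0]) for w in (self.w_q, self.w_k, self.w_v, self.w_o))

    def __call__(self, x: Matrix) -> tuple[Matrix, list[Matrix]]:
        q, k, v = matmul(x, self.w_q), matmul(x, self.w_k), matmul(x, self.w_v)
        head_outputs, head_weights = [], []
        for i in range(self.num_heads):
            cols = slice(i * self.d_head, (i + 1) * self.d_head)
            out, w = attention([r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v])
            head_outputs.append(out)
            head_weights.append(w)  # one (seq_len x seq_len) weight matrix per head
        # Concatenate the heads along the feature axis for each token.
        concat = [[val for out in head_outputs for val in out[t]] for t in range(len(x))]
        return matmul(concat, self.w_o), head_weights


rng = random.Random(42)
x = random_matrix(5, 8, rng)  # 5 tokens, d_model = 8
multi = MultiHeadSelfAttention(d_model=8, num_heads=4)
single = MultiHeadSelfAttention(d_model=8, num_heads=1)
y, weights = multi(x)
print(len(y), len(y[0]), len(weights))  # 5 8 4
print(multi.num_params(), single.num_params())  # 256 256
```

Because the weights here are random, the per-head attention patterns carry no meaning. In a trained model, giving each head its own projected subspace is what allows heads to specialize. The printed counts also show that, in this standard formulation, splitting d_model across four heads reuses the same four 8 x 8 projection matrices as a single 8-wide head. The model does, however, produce four attention-weight matrices instead of one.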
References:
Vaswani, A., et al. (2017). "Attention Is All You Need." Advances in Neural Information Processing Systems 30 (NIPS 2017).
NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/intro.html
by Ian at Oct 09, 2025, 05:47 PM