Exam NCA-GENL Topic 6 Question 3 Discussion

Actual exam question for NVIDIA's NCA-GENL exam
Question #: 3
Topic #: 6
Imagine you are training an LLM consisting of billions of parameters and your training dataset is significantly larger than the available RAM in your system. Which of the following would be an alternative?

Suggested Answer: B

When the training dataset is larger than the available RAM, a memory-mapped file is an effective alternative, as discussed in NVIDIA's Generative AI and LLMs course. The operating system maps the file into the process's virtual address space and pages in only the regions that are actually accessed, so training can read portions of the dataset directly from disk without loading the whole dataset into RAM.
Option A is incorrect: moving large datasets in and out of GPU memory over the PCIe bus is bandwidth-limited and is not a standard practice for dataset storage. Option C is wrong: discarding data reduces model quality and does not scale. Option D is inaccurate: eliminating semantically equivalent sentences is a specific preprocessing step and does not address the memory constraint.
The course states: "Memory-mapped files enable efficient training of LLMs on large datasets by accessing data from disk without loading it fully into RAM, overcoming memory limitations."
References: NVIDIA Building Transformer-Based Natural Language Processing Applications course; NVIDIA Introduction to Transformer-Based Natural Language Processing.
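
To make the memory-mapping idea concrete, here is a minimal sketch using only Python's standard-library mmap module. It is an illustration, not the course's code: the file layout (a flat run of little-endian uint32 token ids) and the names write_tokens and iter_batches are assumptions made up for this example. A real pipeline would more likely use numpy.memmap or a framework's data loader on top of the same mechanism.

```python
import mmap
import os
import struct
import tempfile

# Assumed on-disk layout: one little-endian uint32 per token id.
TOKEN_SIZE = struct.calcsize("<I")


def write_tokens(path, n_tokens):
    """Write a toy dataset of token ids 0..n_tokens-1 (hypothetical helper)."""
    with open(path, "wb") as f:
        for i in range(n_tokens):
            f.write(struct.pack("<I", i))


def iter_batches(path, seq_len, batch_size):
    """Yield batches of fixed-length sequences read through a read-only mmap."""
    seq_bytes = seq_len * TOKEN_SIZE
    with open(path, "rb") as f:
        # Length 0 maps the whole file; no data is read until it is touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            n_seqs = len(mm) // seq_bytes  # a trailing partial sequence is dropped
            for start in range(0, n_seqs, batch_size):
                batch = []
                for s in range(start, min(start + batch_size, n_seqs)):
                    offset = s * seq_bytes
                    # Slicing copies only this sequence's bytes; the OS pages
                    # the underlying file regions in on demand.
                    raw = mm[offset:offset + seq_bytes]
                    batch.append(list(struct.unpack(f"<{seq_len}I", raw)))
                yield batch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tokens.bin")
        write_tokens(path, 1000)
        for batch in iter_batches(path, seq_len=8, batch_size=4):
            pass  # a training step would consume `batch` here
```

Only one sequence's worth of bytes is copied into Python objects at a time, so the resident memory of the loop stays small regardless of the file size; the toy file here is tiny, but the access pattern would be the same for a file far larger than RAM.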

by Alice at Feb 06, 2026, 11:08 AM
