Exam NCP-AII Topic 1 Question 59 Discussion
Actual exam question for NVIDIA's NCP-AII exam
Question #: 59
Topic #: 1
A distributed training job using multiple nodes, each with eight NVIDIA GPUs, experiences significant performance degradation. You notice that the network bandwidth between nodes is consistently near its maximum capacity. However, 'nvidia-smi' shows low GPU utilization on some nodes. What is the MOST likely cause?
Suggested Answer: B
Network bandwidth near its maximum combined with low GPU utilization on only some nodes strongly suggests a data imbalance. The underutilized nodes are most likely idle while they wait for data from other nodes, and that traffic is what keeps the network saturated. This is a common problem in distributed training, and the fix is to revisit the data distribution strategy so that each node receives a comparable share of the work. Other factors (overheating, outdated drivers, faulty NICs, CPU load) could contribute, but they are less likely to be the primary cause given the observed symptoms.
by Otis at Oct 28, 2025, 02:33 AM
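As a rough illustration of what "addressing the data distribution strategy" can mean, the sketch below assigns sample indices to ranks so that every rank gets the same number of samples, and computes a simple imbalance score from per-rank sample counts. It is a minimal sketch, not part of the original answer: the function names `shard_indices` and `imbalance_ratio` are hypothetical, and the padding-by-wraparound approach is only similar in spirit to what framework samplers do.

```python
def shard_indices(num_samples, world_size, rank):
    """Return the sample indices for one rank (hypothetical helper).

    Indices are strided across ranks and wrapped around with modulo so
    that every rank receives the same number of samples.
    """
    if num_samples < 1 or world_size < 1:
        raise ValueError("num_samples and world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size             # padded dataset size
    return [i % num_samples for i in range(rank, total, world_size)]


def imbalance_ratio(samples_per_rank):
    """Largest per-rank load divided by the mean load; 1.0 means balanced."""
    mean = sum(samples_per_rank) / len(samples_per_rank)
    return max(samples_per_rank) / mean


if __name__ == "__main__":
    world_size = 4
    shards = [shard_indices(10, world_size, r) for r in range(world_size)]
    print(shards)
    print(imbalance_ratio([len(s) for s in shards]))
```

A ratio well above 1.0 for the real per-node sample counts would be consistent with the imbalance described above, although it does not rule out the other causes on its own.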