Exam NCP-AIO Topic 3 Question 60 Discussion
Actual exam question for NVIDIA's NCP-AIO exam
Question #: 60
Topic #: 3
You are troubleshooting a performance bottleneck in a distributed training job using NCCL. You suspect the network is the issue. Which Magnum IO component is MOST relevant to investigate first?
Suggested Answer: B
GPUDirect RDMA lets the network adapter read from and write to GPU memory directly, bypassing the CPU and host-memory staging copies, which reduces latency for GPU-to-GPU communication between nodes. NCCL depends on this path when a distributed training job spans multiple nodes, so it is the most relevant component to investigate first when the network is the suspected bottleneck. NVSHMEM targets shared-memory-style programming across GPUs rather than NCCL collectives. CUDA-Aware MPI handles inter-process communication, but GPUDirect RDMA directly affects the network path. GPU Affinity ensures processes run on the correct GPUs but does not directly address network performance. Storage Direct (GPUDirect Storage) bypasses the CPU for storage access, not inter-GPU communication.
by Una at Jan 14, 2026, 12:57 AM
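As a rough way to act on the explanation above, the sketch below enables NCCL's own logging and counts how many network channels report GPUDirect RDMA. `NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS` are real NCCL environment variables; the helper names `diagnostic_env` and `summarize_transports` are hypothetical, and the code assumes the NCCL version in use marks network channels with `via NET/` and tags GPUDirect RDMA transports with `GDRDMA` in its INFO output, which may vary between releases.

```python
import os

# Real NCCL environment variables; the values are an assumption
# about what is useful for a one-off diagnostic run.
DIAGNOSTIC_ENV = {
    "NCCL_DEBUG": "INFO",             # log transport selection at init
    "NCCL_DEBUG_SUBSYS": "INIT,NET",  # restrict output to init and network
}


def diagnostic_env(base=None):
    """Return a copy of the environment with NCCL logging enabled."""
    env = dict(os.environ if base is None else base)
    env.update(DIAGNOSTIC_ENV)
    return env


def summarize_transports(log_lines):
    """Count network-channel log lines with and without the GDRDMA tag.

    Assumption: network channels contain 'via NET/' and GPUDirect RDMA
    channels additionally contain 'GDRDMA'.
    """
    counts = {"gdrdma": 0, "net_without_gdr": 0}
    for line in log_lines:
        if "via NET/" not in line:
            continue  # skip intra-node channels and unrelated lines
        if "GDRDMA" in line:
            counts["gdrdma"] += 1
        else:
            counts["net_without_gdr"] += 1
    return counts
```

Passing `diagnostic_env()` as the `env` argument of `subprocess.run` when launching the training job, then feeding the captured output lines to `summarize_transports`, gives a quick first check: a non-zero `net_without_gdr` count would suggest that some inter-node traffic is being staged through host memory.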