For an NVIDIA Enterprise AI Factory with 256 GPUs, which storage solution characteristic is most critical to validate during scaling tests?
A systems administrator is preparing a new DGX server for deployment. What is the most secure approach to configuring the BMC port during initial setup?
An administrator is configuring node categories in BCM for a DGX BasePOD cluster. They need to group all NVIDIA DGX H200 nodes under a dedicated category for GPU-accelerated workloads. Which approach aligns with NVIDIA ' s recommended BCM practices?
An administrator installs NVIDIA GPU drivers on a DGX H100 system with UEFI Secure Boot enabled. After reboot, the drivers fail to load. What is the first action to resolve this issue?
After a firmware upgrade on a DGX H100, the administrator notices that one GPU is not detected by the system. Which troubleshooting step should be performed first to identify the root cause?
During HPL execution on a DGX cluster, the benchmark fails with " not enough memory " errors despite sufficient physical RAM. Which HPL.dat parameter adjustment is most effective?
An infrastructure engineer runs an NCCL burn-in on an eight-node GPU cluster. Over a 12-hour period, all GPUs are tested with repeated all-reduce collectives. Monitoring tools show the following observations:
Aggregate bandwidth remains within 5% of documented reference for the hardware on every run.
No errors or timeouts are reported in NCCL logs.
On three occasions, one GPU logged single-run bandwidth dips of 15–20% compared to its normal performance, but performance recovered on the next run and stayed stable afterward. System logs show no hardware or driver errors.
Two minor NCCL WARN-level messages about “unexpected latency spike” appear in system logs for separate nodes, but could not be reproduced.
Which conclusion is the best strategy before releasing the cluster to production?
A DGX H100 system shows intermittent “Link Down” errors on a 200G DAC cable. CVT reports “No Signal” despite physical connection. What is the first hardware check?
A system administrator wants to configure MIG for seven slices on an H100 GPU in an NVIDIA HGX system. Which command should be used?
A user wants to restrict a Docker container to use only GPUs 0 and 2. Which command achieves this?
TESTED 27 Jun 2026