NVIDIA reworks Linux NUMA distance handling to boost networking performance

NVIDIA engineers have been working on using NUMA distance metrics in the Linux kernel to replace the simple local/remote NUMA preference currently used by some drivers for NUMA-aware memory allocations. In their testing, this distance-aware handling has “significant performance implications” for throughput and CPU utilization.

This work by NVIDIA is not part of their graphics driver effort but instead comes from the Mellanox networking side of the house. Tariq Toukan summed it up in the latest revision of these kernel patches:

Implement and expose the CPU propagation API based on the scheduler’s sched_numa_find_closest(). Use it in mlx5 and enic device drivers. This replaces the binary NUMA preference (local/remote) with an improved preference that takes actual distances into account, so that short distance remote NUMAs are preferred over farther NUMAs.
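To make the difference between a binary local/remote preference and a distance-aware one concrete, here is a small user-space sketch. It is not the kernel code and does not call sched_numa_find_closest(); it only illustrates the ordering idea. The four-node distance table (using the ACPI SLIT convention where 10 means local), the CPU layout, and the helper names binary_preference, distance_preference and spread_queues are all hypothetical.

```python
from typing import Dict, List

# Hypothetical 4-node topology in ACPI SLIT style (10 = local node).
# Nodes 0/1 are close to each other, as are nodes 2/3. Illustrative only.
NODE_DISTANCE: List[List[int]] = [
    [10, 12, 32, 32],
    [12, 10, 32, 32],
    [32, 32, 10, 12],
    [32, 32, 12, 10],
]

# Hypothetical CPU layout: two CPUs per NUMA node.
NODE_CPUS: Dict[int, List[int]] = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}


def binary_preference(home: int) -> List[int]:
    """Local node's CPUs first, then every remote CPU in plain node-id order."""
    local = NODE_CPUS[home]
    remote = [cpu for node in sorted(NODE_CPUS) if node != home
              for cpu in NODE_CPUS[node]]
    return local + remote


def distance_preference(home: int) -> List[int]:
    """Nodes sorted by distance from the home node (ties broken by node id)."""
    nodes = sorted(NODE_CPUS, key=lambda n: (NODE_DISTANCE[home][n], n))
    return [cpu for node in nodes for cpu in NODE_CPUS[node]]


def spread_queues(home: int, nqueues: int) -> List[int]:
    """Pick one CPU per queue by walking the distance-ordered list, wrapping around."""
    order = distance_preference(home)
    return [order[i % len(order)] for i in range(nqueues)]


if __name__ == "__main__":
    print("binary:  ", binary_preference(2))
    print("distance:", distance_preference(2))
    print("4 queues:", spread_queues(2, 4))
```

For a device attached to node 2, the binary ordering places CPUs 0 and 1 (distance 32) right after the local CPUs, while the distance-aware ordering picks CPUs 6 and 7 on the nearby node 3 (distance 12) first. With four queues, that means the remote queues land on a short-distance neighbor instead of a far node, which is the behavior the patch description aims for.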

This has significant performance implications when using NUMA-compliant memory allocations, improving throughput and CPU utilization.

So far, results with the Mellanox (mlx5) and ENIC network drivers look very good, based on NVIDIA’s tests on AMD EPYC 7763 servers.

See this series of patches for more details.