The post NVIDIA NCCL 2.28 Revolutionizes GPU Communication with New Device API appeared on BitcoinEthereumNews.com.

NVIDIA NCCL 2.28 Revolutionizes GPU Communication with New Device API

2025/11/12 08:01


Rebeca Moen
Nov 10, 2025 23:56

NVIDIA’s latest NCCL 2.28 release introduces a device API, enhancing communication and computation fusion for GPU networks, boosting performance and efficiency.

NVIDIA has released NCCL 2.28, the latest version of the NVIDIA Collective Communications Library (NCCL). This update focuses on the fusion of communication and computation, aiming to increase throughput, reduce latency, and maximize GPU utilization across multi-GPU and multi-node systems, according to NVIDIA.

Key Features of NCCL 2.28

NCCL 2.28 brings several new features, including GPU-initiated networking, device APIs for communication-compute fusion, and copy-engine-based collectives. These innovations are designed to empower developers to create efficient, scalable distributed applications. The release also includes expanded APIs, improved tooling, and cleaner integration paths, facilitating the development of custom communication kernels.

Device API and Copy Engine Collectives

The new device API lets developers write custom CUDA kernels that perform communication directly from device code, removing the need for host-initiated operations. This integration reduces synchronization overhead, which increases throughput and reduces latency. Three operation modes are introduced: Load/Store Accessible (LSA), Multimem, and GPU Initiated Networking (GIN), each supporting different communication scenarios.

In addition, the copy-engine-based collectives enable efficient NVLink transfers by offloading communication tasks from streaming multiprocessors (SMs) to dedicated copy-engine hardware. This approach minimizes resource contention, so communication and computation tasks can run simultaneously.

NCCL Inspector for Enhanced Profiling

The NCCL Inspector, a new profiling tool, provides always-on observability and analysis of NCCL communication patterns. It offers detailed performance and metadata logging, allowing developers to analyze and debug collective operations efficiently. The plugin tracks each NCCL communicator individually, offering insights into performance patterns across different communication contexts.

Developer Experience Improvements

NCCL 2.28 enhances the developer experience with new APIs for operations such as AllToAll, Gather, and Scatter. It introduces flexible configuration management through an environment plugin API, which supports programmatic version matching and setups that do not depend on where configuration is stored. Additionally, the release supports CMake for Linux builds, streamlining integration into larger build pipelines.
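For readers unfamiliar with the three collectives named above, the following is a minimal host-side sketch in plain Python of the data movement each one conventionally performs. It is an illustration only: it does not call NCCL, and the function names `all_to_all`, `gather`, and `scatter` are hypothetical helpers introduced here, not the library's API. Each list index stands in for a rank.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def _check_root(root: int, n_ranks: int) -> None:
    # Hypothetical helper: the root must be one of the participating ranks.
    if not 0 <= root < n_ranks:
        raise ValueError(f"root {root} is outside 0..{n_ranks - 1}")


def all_to_all(send: Sequence[Sequence[T]]) -> List[List[T]]:
    """send[i][j] is the chunk rank i sends to rank j.

    The returned recv satisfies recv[j][i] == send[i][j], i.e. every rank
    ends up with one chunk from every peer (a transpose of the chunk grid).
    """
    n = len(send)
    if any(len(row) != n for row in send):
        raise ValueError("each rank must provide exactly one chunk per peer")
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def gather(values: Sequence[T], root: int) -> List[List[T]]:
    """values[i] is rank i's contribution.

    Only the root receives data: all contributions, in rank order.
    Every other rank's receive list stays empty.
    """
    n = len(values)
    _check_root(root, n)
    return [list(values) if rank == root else [] for rank in range(n)]


def scatter(chunks: Sequence[T], root: int) -> List[T]:
    """chunks is the root's buffer, one chunk per rank.

    Rank i receives chunks[i]; the root keeps its own chunk as well.
    """
    n = len(chunks)
    _check_root(root, n)
    return [chunks[rank] for rank in range(n)]


if __name__ == "__main__":
    grid = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
    print(all_to_all(grid))
    print(gather(["x", "y", "z"], root=1))
    print(scatter(["p", "q", "r"], root=0))
```

Under these assumptions, Scatter and Gather are inverses of each other with respect to the root's buffer, and AllToAll behaves as if every rank performed a Scatter at the same time.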

For further details on NCCL 2.28 and its features, visit the official NVIDIA blog.


Source: https://blockchain.news/news/nvidia-nccl-2-28-revolutionizes-gpu-communication
