NVIDIA Hybrid-EP Speeds Up MoE AI Training by Up to 14%

2026/02/03 03:39

Alvin Lang Feb 02, 2026 19:39

NVIDIA's new Hybrid-EP communication library achieves up to 14% faster training for DeepSeek-V3 and other MoE models on Grace Blackwell hardware.

NVIDIA has released Hybrid-EP, a communication optimization library that delivers up to 14% higher training throughput for large-scale Mixture-of-Experts (MoE) AI models, the architecture behind DeepSeek-V3 and other frontier systems driving the current AI infrastructure buildout.

The technical breakthrough, detailed February 2, 2026, addresses what's become a critical bottleneck in training hyperscale MoE models: communication overhead that can consume more than 50% of total training time. For companies racing to train competitive AI models, that's expensive GPU time sitting idle.

Why This Matters for AI Infrastructure

MoE architectures have emerged as the dominant approach for building massive AI models efficiently. Rather than activating every parameter for each input, these models route each token to a small set of specialized "expert" subnetworks. DeepSeek-V3, for example, activates only 8 of its 256 experts per token. The catch? All that routing requires constant communication between GPUs.

Expert Parallelism distributes these experts across multiple GPUs, but the all-to-all communication pattern creates serious overhead. Tokens must be dispatched to correct experts, processed, then routed back—a process that's been notoriously difficult to optimize due to its dynamic, sparse nature.
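The dispatch, process, and combine steps described above can be sketched in a few lines of plain Python. This is a toy illustration, not Hybrid-EP's API: the function names (`top_k_experts`, `dispatch`, `combine`), the contiguous expert-to-GPU mapping, and the score-weighted sum are all assumptions made for the example, and a real system would move tensors between GPUs rather than Python tuples.

```python
from collections import defaultdict

def top_k_experts(scores, k):
    """Return the indices of the k highest router scores for one token."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]

def dispatch(token_scores, k, experts_per_gpu):
    """Group (token_id, expert_id) pairs by the GPU assumed to host the expert."""
    per_gpu = defaultdict(list)
    for token_id, scores in enumerate(token_scores):
        for expert_id in top_k_experts(scores, k):
            # Assumption: experts are laid out contiguously across GPUs.
            per_gpu[expert_id // experts_per_gpu].append((token_id, expert_id))
    return dict(per_gpu)

def combine(per_gpu, expert_fn, token_scores):
    """Send expert outputs back and sum them per token, weighted by router score."""
    out = defaultdict(float)
    for pairs in per_gpu.values():
        for token_id, expert_id in pairs:
            out[token_id] += token_scores[token_id][expert_id] * expert_fn(expert_id)
    return dict(out)

# Toy setup: 2 tokens, 4 experts spread over 2 GPUs, top-2 routing.
token_scores = [
    [0.1, 0.6, 0.3, 0.0],
    [0.5, 0.0, 0.1, 0.4],
]
routed = dispatch(token_scores, k=2, experts_per_gpu=2)
# Stand-in "expert": returns a scalar that depends only on the expert index.
combined = combine(routed, expert_fn=lambda e: float(e + 1), token_scores=token_scores)
print(routed)
print(combined)
```

Even in this tiny case both tokens send work to both GPUs, which is why the pattern is all-to-all, and the per-GPU load depends on the router's output at run time.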

Performance Numbers

NVIDIA's benchmarks on Grace Blackwell hardware show meaningful gains across multiple model configurations:

DeepSeek-V3 with 256 experts reached 943 TFLOPS per GPU with Hybrid-EP, compared with 829 TFLOPS for the previous DeepEP implementation, an improvement of roughly 14%. The Qwen 3 235B model gained 9.9% when running at MXFP8 precision, rising from 728 to 800 TFLOPS.
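The percentages follow directly from the reported TFLOPS figures. A quick check, where the helper name `speedup_pct` is just for illustration:

```python
def speedup_pct(baseline_tflops, new_tflops):
    """Relative throughput gain of new over baseline, in percent."""
    return (new_tflops / baseline_tflops - 1.0) * 100.0

deepseek_gain = speedup_pct(829, 943)  # DeepSeek-V3: DeepEP -> Hybrid-EP
qwen_gain = speedup_pct(728, 800)      # Qwen 3 235B at MXFP8

print(f"DeepSeek-V3: {deepseek_gain:.2f}%")
print(f"Qwen 3 235B: {qwen_gain:.2f}%")
```

The DeepSeek-V3 figure works out to about 13.75%, which the article rounds to 14%; the Qwen figure is about 9.89%.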

Perhaps more significant than raw throughput: Hybrid-EP achieves near-maximum NVLink bandwidth using as few as 4 streaming multiprocessors (SMs), fewer than standard implementations typically consume. On the GB200 NVL36 configuration, it fills NVLink bandwidth with just 16 SMs. That leaves substantially more GPU compute available for model training itself rather than for communication.

Technical Architecture

The library implements two core operators, dispatch and combine, that handle token routing between attention layers and expert networks. It uses NVIDIA's IBGDA (InfiniBand GPUDirect Async) technology for RDMA networks and TMA (Tensor Memory Accelerator) instructions for NVLink communication, combining intra-node and inter-node bandwidth into a hierarchical pipeline.

Each CUDA block operates as an independent data channel, processing chunks through multiple pipeline stages without cross-block synchronization. This design hides most communication latency by overlapping data transfers with computation.
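To see why chunked pipelining hides latency, a simple timing model helps. The sketch below is not taken from Hybrid-EP: the stage names and unit costs are hypothetical, and it assumes every chunk has identical per-stage cost, so that total time is the pipeline fill latency plus a steady state paced by the slowest stage.

```python
def serial_time(num_chunks, stage_times):
    """Each chunk runs through every stage before the next chunk starts."""
    return num_chunks * sum(stage_times)

def pipelined_time(num_chunks, stage_times):
    """Stages overlap across chunks: one fill, then one chunk per slowest-stage tick."""
    return sum(stage_times) + (num_chunks - 1) * max(stage_times)

# Hypothetical stages for one data channel: read, transfer, write (arbitrary units).
stages = [1, 1, 1]
chunks = 8

print("serial:   ", serial_time(chunks, stages))
print("pipelined:", pipelined_time(chunks, stages))
```

With these made-up numbers the serial schedule takes 24 units and the pipelined one takes 10. Because each channel advances on its own, adding channels does not add synchronization points in this model.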

Availability and Integration

Hybrid-EP is now available in the Hybrid-EP branch of the DeepEP repository on GitHub, with PyTorch operators ready for integration into existing Megatron Core training pipelines. Because MoE token routing is dynamic, the implementation preallocates its buffers for the worst case.
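A rough sense of what worst-case preallocation costs can be had from a back-of-the-envelope sizing. The formula below is an assumption, not Hybrid-EP's actual sizing rule: it supposes the worst case for one rank is that every token from every rank in the expert-parallel group is routed to it, and all parameter values are illustrative.

```python
def worst_case_recv_bytes(ep_size, max_tokens_per_rank, hidden_size, bytes_per_elem):
    """Receive buffer size if every token in the EP group lands on one rank (assumed worst case)."""
    return ep_size * max_tokens_per_rank * hidden_size * bytes_per_elem

# Illustrative values only: 8 ranks, 4096 tokens each, hidden size 7168, 2-byte elements.
size = worst_case_recv_bytes(
    ep_size=8, max_tokens_per_rank=4096, hidden_size=7168, bytes_per_elem=2
)
print(f"{size} bytes = {size / 2**20:.0f} MiB")
```

The trade-off is memory for predictability: a fixed buffer is larger than most steps need, but it avoids having to size or reallocate buffers each time the router changes the token distribution.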

For AI infrastructure investors and operators, the release signals continued optimization headroom in training efficiency, which is particularly relevant as competition intensifies around training costs for frontier models. The reported gains of up to 14% translate directly into lower compute costs and faster iteration cycles for labs pushing model capabilities.

