
NVIDIA Unveils Groq 3 LPX Rack System for Ultra-Low Latency AI Inference

2026/03/17 05:19


Timothy Morano Mar 16, 2026 21:19

NVIDIA's new Groq 3 LPX delivers 315 PFLOPS and 35x better inference throughput per megawatt, targeting agentic AI workloads on the Vera Rubin platform.


NVIDIA has pulled back the curtain on the Groq 3 LPX, a rack-scale inference accelerator built around 256 interconnected Language Processing Units that the company claims delivers up to 35x higher throughput per megawatt for trillion-parameter models. The system arrives as the seventh chip in full production for the Vera Rubin platform, following NVIDIA's $20 billion acquisition of Groq's intellectual property.

The timing matters. As AI workloads shift from batch processing toward real-time agentic systems—where multiple AI agents coordinate continuously—the bottleneck isn't raw compute anymore. It's latency. NVIDIA is betting that the future demands infrastructure capable of generating tokens at speeds approaching 1,000 per second per user, fast enough to enable what the company calls "speed of thought computing."
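To put that target in concrete terms: 1,000 tokens per second leaves roughly one millisecond of budget per decode step. The quick arithmetic below uses the article's figure; the multi-agent scenario is a hypothetical illustration, not something NVIDIA has published.

```python
# Per-token latency budget implied by a given decode rate.
def per_token_ms(tokens_per_second: float) -> float:
    return 1000.0 / tokens_per_second

# At the ~1,000 tokens/s/user target, each decode step gets ~1 ms.
budget = per_token_ms(1000)  # 1.0 ms

# Hypothetical agent chain: 5 sequential agent turns of 200 tokens each.
turns, tokens_per_turn = 5, 200
total_seconds = turns * tokens_per_turn * budget / 1000.0
print(f"{budget:.1f} ms/token -> {total_seconds:.1f} s end-to-end")
```

At those speeds, even a five-stage agent pipeline completes in about a second, which is the kind of responsiveness agentic products need.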

What's Actually Inside the Box

The LPX rack houses 32 liquid-cooled compute trays, each packing eight LP30 LPU chips. At full scale, the system delivers 315 PFLOPS of inference compute with 128 GB of on-chip SRAM and 40 PB/s of memory bandwidth. Scale-up bandwidth hits 640 TB/s across the 256-chip configuration.
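Taken at face value, those rack totals imply per-chip figures worth spelling out. The breakdown below is our back-of-envelope arithmetic from the quoted numbers, not an NVIDIA-published spec sheet.

```python
# Back-of-envelope per-chip figures derived from the quoted rack totals.
TRAYS, CHIPS_PER_TRAY = 32, 8
CHIPS = TRAYS * CHIPS_PER_TRAY            # 256 LP30 LPUs, as stated

RACK_PFLOPS = 315
RACK_SRAM_GB = 128
RACK_MEM_BW_PBS = 40                      # PB/s

per_chip_pflops = RACK_PFLOPS / CHIPS             # ~1.23 PFLOPS per LPU
per_chip_sram_mb = RACK_SRAM_GB * 1024 / CHIPS    # 512 MB of SRAM per LPU
per_chip_bw_tbs = RACK_MEM_BW_PBS * 1000 / CHIPS  # 156.25 TB/s per LPU
print(per_chip_pflops, per_chip_sram_mb, per_chip_bw_tbs)
```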

The architecture diverges sharply from traditional GPU approaches. Where GPUs rely on massive parallel throughput and external High Bandwidth Memory, LPUs keep their working set—weights, activations, KV cache state—entirely in on-chip SRAM. The compiler controls data movement explicitly rather than depending on hardware cache heuristics. NVIDIA claims this produces more deterministic execution with reduced latency jitter.
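One caveat the SRAM-first design raises: at any mainstream precision, a full trillion-parameter weight set overflows 128 GB, so the SRAM-resident claim presumably applies per-shard or to the subset of layers an LPX rack actually executes. A quick capacity check (our arithmetic, with assumed precisions):

```python
# Does a model's full weight footprint fit in the rack's 128 GB of SRAM?
def weight_gb(params: float, bits_per_param: int) -> float:
    return params * bits_per_param / 8 / 1e9

SRAM_GB = 128
for bits in (16, 8, 4):
    gb = weight_gb(1e12, bits)  # trillion-parameter model
    print(f"{bits}-bit: {gb:.0f} GB -> fits in SRAM: {gb <= SRAM_GB}")
```

Even at 4-bit quantization, a trillion parameters is 500 GB, which suggests multi-rack sharding or a layer-restricted role rather than whole-model residency.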

Each LPU connects through 96 chip-to-chip links running at 112 Gbps, enabling 2.5 TB/s of bidirectional bandwidth per chip. The plesiosynchronous protocol aligns hundreds of accelerators to operate as a single coordinated system.
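The quoted figures are internally consistent: 256 chips at 2.5 TB/s each yields exactly the 640 TB/s scale-up number, and the raw link arithmetic lands slightly above 2.5 TB/s, the gap plausibly being protocol or encoding overhead. A quick cross-check (our arithmetic):

```python
# Cross-checking the quoted interconnect figures.
LINKS, GBPS = 96, 112
raw_tbs_per_dir = LINKS * GBPS / 8 / 1000   # 1.344 TB/s each direction
raw_bidir = 2 * raw_tbs_per_dir             # ~2.69 TB/s raw vs 2.5 TB/s quoted
                                            # (gap plausibly link overhead)
CHIPS, PER_CHIP_TBS = 256, 2.5
scale_up = CHIPS * PER_CHIP_TBS             # 640 TB/s, matching the rack figure
print(raw_bidir, scale_up)
```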

The Heterogeneous Inference Play

NVIDIA isn't positioning LPX as a GPU replacement. Instead, it's designed to work alongside Vera Rubin NVL72 systems in what the company calls "attention-FFN disaggregation." GPUs handle the heavy lifting—long-context prefill, decode attention over accumulated KV caches—while LPUs accelerate the latency-sensitive feed-forward network execution within the decode loop.
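The shape of that split can be sketched in a few lines. This is a hypothetical illustration of the routing structure only: the two "device" functions below are trivial stand-ins, not real kernels, and the function names are ours.

```python
# Hypothetical sketch of attention-FFN disaggregation in one decode step.
# Stand-in kernels: only the routing structure reflects the description.
def gpu_attention(x, kv):   # GPU: decode attention over the accumulated KV cache
    return x * 0.5 + kv

def lpu_ffn(x, w):          # LPU: feed-forward pass served from on-chip SRAM
    return x * w

def decode_step(x, kv_cache, ffn_weights):
    for kv, w in zip(kv_cache, ffn_weights):
        x = x + gpu_attention(x, kv)   # routed to the GPU pool
        x = x + lpu_ffn(x, w)          # routed to the LPX rack
    return x

out = decode_step(1.0, kv_cache=[0.0, 0.0], ffn_weights=[1.0, 1.0])
```

The per-layer ping-pong is why activation-shuffling bandwidth between the two processor pools matters so much in this design.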

The NVIDIA Dynamo orchestration layer manages this split, routing work based on latency targets and shuffling intermediate activations between processors. For speculative decoding, LPX can serve as the draft-generation engine while GPUs handle verification.

NVIDIA claims this heterogeneous approach unlocks up to 10x more revenue per megawatt compared to GB200 NVL72 systems for premium interactive workloads. The math assumes operators can charge meaningfully more for responsive AI services than for throughput-optimized batch processing.

Market Implications

The LPX rack is slated for availability in the second half of 2026. For data center operators weighing infrastructure investments, the announcement signals NVIDIA's conviction that inference economics will increasingly favor specialized hardware as agentic AI scales.

Whether the 35x efficiency gains hold up under real-world production loads remains to be seen. But for anyone building multi-agent systems or interactive AI products where response latency directly impacts user experience, the architectural shift toward heterogeneous inference is worth tracking closely.

Image source: Shutterstock
