NVIDIA Nemotron 3 Super Hits Together AI With 1M Token Context Window
Jessie A Ellis Mar 11, 2026 21:43
NVIDIA's 120B-parameter Nemotron 3 Super model now available on Together AI, offering 5x throughput gains for multi-agent AI systems and enterprise workloads.
Together AI announced the availability of NVIDIA's Nemotron 3 Super on its Dedicated Inference platform on March 11, giving enterprise developers access to a 120-billion-parameter reasoning model optimized for multi-agent AI systems. NVIDIA stock traded at $186.03, up 0.66% on the news.
The timing matters. Nemotron 3 Super represents NVIDIA's second open-weight model in the Nemotron 3 family, following December's Nano release, and targets a specific pain point in production AI: the computational overhead of running complex agent workflows at scale.
Why the Architecture Matters
Here's what makes this model different from the typical parameter-count arms race. Despite its 120B total parameters, only 12B are active during inference. The hybrid design—combining Transformer attention with Mamba sequence processing—delivers what NVIDIA claims is 5x higher throughput than the previous Nemotron Super model.
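NVIDIA has not published the routing details here, but the 120B-total/12B-active split is characteristic of sparse architectures in which a router activates only a few expert sub-networks per token. The toy sketch below, with entirely made-up layer sizes, illustrates the principle: most parameters sit idle on any given forward pass.

```python
import numpy as np

# Toy sparse expert layer. The expert count, dimensions, and routing are
# hypothetical illustrations, NOT Nemotron 3 Super's actual architecture.
# The point: total parameters >> parameters touched per token.

rng = np.random.default_rng(0)

N_EXPERTS = 10   # total expert MLPs in the layer (assumed)
TOP_K = 1        # experts actually run per token (assumed)
D = 16           # hidden dimension (assumed)

# Each expert is a D x D weight matrix; a router scores experts per token.
experts = rng.standard_normal((N_EXPERTS, D, D))
router = rng.standard_normal((D, N_EXPERTS))

def sparse_forward(x):
    """Route token x to its top-k experts; the rest stay idle."""
    scores = x @ router                      # one score per expert
    top = np.argsort(scores)[-TOP_K:]        # indices of chosen experts
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()
    y = sum(wi * (experts[i] @ x) for i, wi in zip(top, w))
    return y, top

x = rng.standard_normal(D)
y, used = sparse_forward(x)

total_params = experts.size
active_params = TOP_K * D * D
print(f"active fraction: {active_params / total_params:.0%}")  # prints 10%
```

With one of ten equally sized experts active, only 10% of the layer's weights participate per token; Nemotron 3 Super's 12B-of-120B ratio works out the same way at scale.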
The 1-million-token context window addresses what developers call "context explosion." Multi-agent applications can consume 15x more tokens than standard chat interactions, and most models choke on that load. Nemotron 3 Super handles entire codebases, lengthy document stores, and extended agent trajectories without the performance cliff.
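A back-of-envelope calculation shows why the window size matters. The 15x multiplier comes from the article; the per-turn token count is an assumption for illustration only.

```python
# Rough token budget for a multi-agent workflow. AGENT_MULTIPLIER is the
# figure cited in the article; CHAT_TURN_TOKENS is an assumed typical
# value, not a measured one.

CONTEXT_WINDOW = 1_000_000   # Nemotron 3 Super's advertised window
CHAT_TURN_TOKENS = 2_000     # typical single chat exchange (assumed)
AGENT_MULTIPLIER = 15        # multi-agent overhead cited in the article

tokens_per_step = CHAT_TURN_TOKENS * AGENT_MULTIPLIER   # 30,000
steps_that_fit = CONTEXT_WINDOW // tokens_per_step       # 33

print(f"{tokens_per_step:,} tokens per agent step")
print(f"~{steps_that_fit} agent steps before the window fills")
```

Under these assumptions, a 128K-token model would fit only about four such steps; the 1M window buys roughly 33, enough for an extended agent trajectory.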
Multi-Token Prediction training allows the model to generate several tokens simultaneously per forward pass. For code generation or structured outputs, NVIDIA reports 50% faster token generation compared to leading open models.
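The mechanics can be sketched with a forward-pass counter. This is schematic only: a real MTP head predicts several future tokens jointly, and the tokens-per-pass figure below is an assumption, not NVIDIA's published number.

```python
# Schematic comparison of one-token-per-pass decoding vs multi-token
# prediction (MTP). We count only forward passes; real speedups also
# depend on acceptance rates and kernel efficiency.

def forward_passes(output_tokens: int, tokens_per_pass: int) -> int:
    """Ceiling division: passes needed to emit output_tokens."""
    return -(-output_tokens // tokens_per_pass)

OUTPUT = 900   # tokens in a generated code file (assumed)

baseline = forward_passes(OUTPUT, 1)   # one token per pass
mtp = forward_passes(OUTPUT, 3)        # 3 tokens per pass (assumed)

print(f"baseline: {baseline} passes, MTP: {mtp} passes "
      f"({baseline / mtp:.1f}x fewer)")
```

Structured outputs such as code benefit most because consecutive tokens are highly predictable, which is consistent with NVIDIA reporting the gain specifically for code generation.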
Together AI's Play
Running a 120B hybrid model with million-token context typically demands distributed compute across multiple nodes. Together AI's Dedicated Inference offering simplifies deployment to single NVIDIA H200 or H100 GPUs—no GPU provisioning required on the developer's end.
The platform promises a 99.9% uptime SLA and SOC 2 compliance, positioning this as enterprise-ready infrastructure rather than research-grade experimentation.
Production Applications
Target use cases include developer assistants analyzing codebases, enterprise document processing systems, cybersecurity vulnerability triage, and orchestration layers routing tasks across specialized agents.
The open-weights approach—released under NVIDIA's Nemotron Open Model License—allows teams to fine-tune for specific environments and deploy on-premise, a critical consideration for enterprises with data sovereignty requirements.
NVIDIA also announced NemoClaw on March 10, an open-source platform for AI agents that could complement Nemotron 3 Super deployments. Developers can access the model through Together AI's dedicated inference tier immediately.
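For developers evaluating access, the request shape follows Together AI's OpenAI-compatible chat completions endpoint. The model slug below is a placeholder assumption; check Together AI's model catalog for Nemotron 3 Super's actual identifier. No network call is made here; the sketch only constructs the payload.

```python
import json

# Sketch of a chat completions request to Together AI's OpenAI-compatible
# API. "nvidia/nemotron-3-super" is a PLACEHOLDER slug (assumption); the
# real identifier comes from Together AI's model catalog. An Authorization
# header with your Together AI API key would accompany the actual POST.

ENDPOINT = "https://api.together.xyz/v1/chat/completions"

payload = {
    "model": "nvidia/nemotron-3-super",   # placeholder slug (assumption)
    "messages": [
        {"role": "user",
         "content": "Summarize this repository's build system."},
    ],
    "max_tokens": 512,
}

body = json.dumps(payload)
print(f"POST {ENDPOINT}")
print(body[:80] + "...")
```

Because the endpoint is OpenAI-compatible, existing client libraries can typically be pointed at it by overriding the base URL rather than rewriting integration code.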
Image source: Shutterstock