
Jensen Huang announced eight new products in 1.5 hours, as NVIDIA goes all-in on AI inference and physical AI.

2026/01/06 12:30

Author | ZeR0 Junda, Zhidongxi

Edited by | Mo Ying

On January 5th, C114 reported from Las Vegas that NVIDIA founder and CEO Jensen Huang delivered his first keynote address of 2026 at CES 2026. As always, Huang wore a leather jacket and announced eight important developments in 1.5 hours, providing an in-depth overview of the entire new generation platform, from chips and racks to network design.

In the field of accelerated computing and AI infrastructure, NVIDIA released the NVIDIA Vera Rubin POD AI supercomputer, NVIDIA Spectrum-X Ethernet co-packaged optics, NVIDIA inference context memory storage platform, and NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.

The NVIDIA Vera Rubin POD uses six of NVIDIA's self-developed chips, covering CPU, GPU, scale-up, scale-out, storage, and processing capabilities. All components are co-designed to meet the needs of advanced models and reduce computing costs.

Among them, the Vera CPU adopts a custom Olympus core architecture, while the Rubin GPU, with its new Transformer engine, delivers NVFP4 inference performance of up to 50 PFLOPS and NVLink bandwidth of up to 3.6TB/s per GPU. The platform supports third-generation general-purpose confidential computing (the first rack-level TEE), providing a complete trusted execution environment spanning the CPU and GPU domains.

All of these chips are back from the fab: NVIDIA has validated the complete NVIDIA Vera Rubin NVL72 system, partners have begun running their AI models and algorithms on it, and the entire ecosystem is preparing for Vera Rubin deployment.

Other announcements include: NVIDIA Spectrum-X Ethernet co-packaged optics significantly optimizes power efficiency and application uptime; the NVIDIA Inference Context Memory Storage Platform redefines the storage stack to reduce redundant computations and improve inference efficiency; and NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72 reduces the token cost of large MoE models to 1/10.

Regarding open models, NVIDIA announced an expansion of its open-source model suite, releasing new models, datasets, and libraries. This includes new agentic RAG, safety, and speech models in the NVIDIA Nemotron open-source model family, as well as a brand-new open model applicable to all types of robots. Jensen Huang did not elaborate on these in his presentation, however.

In terms of physical AI, the ChatGPT moment for physical AI has arrived. NVIDIA's full-stack technology enables the global ecosystem to transform industries through AI-driven robotics. NVIDIA's extensive AI toolkit, including the new Alpamayo open-source model suite, enables the global transportation industry to quickly achieve safe Level 4 driving. The NVIDIA DRIVE autonomous driving platform is now in production and features in all new Mercedes-Benz CLAs for Level 2++ AI-defined driving.

01. Brand New AI Supercomputer: 6 self-developed chips, single-rack computing power reaches 3.6 EFLOPS

Jensen Huang believes that the computer industry is completely reshaped every 10 to 15 years, but this time two platform revolutions are happening simultaneously: from CPUs to GPUs, and from "programming software" to "training software." Accelerated computing and AI are reshaping the entire computing stack, and the $10 trillion computing industry of the past decades is undergoing modernization.

At the same time, demand for computing power has skyrocketed: model size grows tenfold every year, the number of tokens used for model reasoning grows fivefold every year, while the price of each token falls to a tenth every year.
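
A rough sketch of why demand outpaces falling prices, under the simplifying assumption that compute per query scales with model size times tokens generated; the 10x/5x/10x rates are the keynote's round numbers, not measured data:

```python
# Back-of-the-envelope view of the compounding trends cited in the keynote.
# The 10x / 5x / 10x rates are the talk's round numbers, not measured data.

model_size_growth = 10        # model size: 10x per year
thinking_tokens_growth = 5    # tokens spent reasoning: 5x per year
price_per_token_decay = 0.1   # price per token: falls to 1/10 per year

# Compute demand per query scales roughly with model size x tokens generated,
# while what a user pays scales with tokens x price per token.
demand_growth = model_size_growth * thinking_tokens_growth     # ~50x per year
cost_growth = thinking_tokens_growth * price_per_token_decay   # ~0.5x per year

print(f"compute demand per query: ~{demand_growth}x per year")
print(f"user cost per query:      ~{cost_growth:.1f}x per year")
```

Under these assumptions, compute demand per query rises roughly 50x per year even as what the user pays falls, which is the squeeze the new hardware cadence is meant to absorb.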

To meet this demand, NVIDIA has committed to releasing new computing hardware every year. Jensen Huang revealed that Vera Rubin is now in full production.

NVIDIA's new AI supercomputer, NVIDIA Vera Rubin POD, uses six self-developed chips: Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 (CX9) smart network card, BlueField-4 DPU, and Spectrum-X 102.4T CPO.

Vera CPU: Designed for data movement and agent processing, it features 88 custom NVIDIA Olympus cores, 176 threads via NVIDIA Spatial Multithreading, 1.8TB/s NVLink-C2C for CPU-GPU unified memory, up to 1.5TB of system memory (3 times that of the Grace CPU), 1.2TB/s of SOCAMM LPDDR5X memory bandwidth, and support for rack-level confidential computing, doubling data-processing performance.

Rubin GPU: Features a new Transformer engine, with NVFP4 inference performance of up to 50 PFLOPS, 5 times that of the Blackwell GPU. It is backward compatible, improving performance across BF16 down to FP4 while maintaining inference accuracy; NVFP4 training performance reaches 35 PFLOPS, 3.5 times that of Blackwell.

Rubin is also the first platform to support HBM4, which boasts a bandwidth of 22TB/s, 2.8 times that of the previous generation, providing the performance required for demanding MoE models and AI workloads.

NVLink 6 Switch: Single-lane speed increases to 400Gbps of high-speed SerDes signaling; each GPU gets 3.6TB/s of all-to-all communication bandwidth, twice the previous generation, with 28.8TB/s of total switch bandwidth and 14.4 TFLOPS of in-network computing at FP8 precision, in a 100% liquid-cooled design.

NVIDIA ConnectX-9 SuperNIC: Provides 1.6Tb/s of bandwidth per GPU, optimized for large-scale AI, with a fully software-defined, programmable, and accelerated data path.

NVIDIA BlueField-4: An 800Gbps DPU that serves as a smart NIC and storage processor, combining a 64-core Grace CPU with a ConnectX-9 SuperNIC to offload network and storage tasks while strengthening network security. It delivers 6 times the computing performance of the previous generation, 3 times the memory bandwidth, and up to 2 times faster GPU access to data storage.

NVIDIA Vera Rubin NVL72: Integrates all of the above components into a single-rack system, with 2 trillion transistors, NVFP4 inference performance of 3.6 EFLOPS, and NVFP4 training performance of 2.5 EFLOPS.

The system has 54TB of LPDDR5X memory, 2.5 times the previous generation; 20.7TB of total HBM4 memory, 1.5 times the previous generation; 1.6PB/s of HBM4 bandwidth, 2.8 times the previous generation; and 260TB/s of total scale-up bandwidth, which exceeds the total bandwidth of the global Internet.
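
These rack-level totals follow directly from the per-chip figures quoted above. A quick sanity check in Python, assuming the standard NVL72 split of 72 GPUs and 36 CPUs per rack (which also matches the quoted 54TB of LPDDR5X, i.e. 36 x 1.5TB):

```python
# Cross-checking the quoted NVL72 rack totals against per-chip figures.
# Assumes an NVL72 rack pairs 72 Rubin GPUs with 36 Vera CPUs.
GPUS, CPUS = 72, 36

checks = [
    ("NVFP4 inference", 50 * GPUS,  "PFLOPS", "3.6 EFLOPS"),  # 3,600 PFLOPS
    ("NVFP4 training",  35 * GPUS,  "PFLOPS", "2.5 EFLOPS"),  # 2,520 PFLOPS
    ("HBM4 bandwidth",  22 * GPUS,  "TB/s",   "1.6 PB/s"),    # 1,584 TB/s
    ("NVLink scale-up", 3.6 * GPUS, "TB/s",   "260 TB/s"),    # 259.2 TB/s
    ("LPDDR5X memory",  1.5 * CPUS, "TB",     "54 TB"),       # 54 TB
]

for name, value, unit, quoted in checks:
    print(f"{name}: {value:,.0f} {unit}  (quoted: {quoted})")
```

Each per-chip figure, multiplied out, lands on the quoted rack totals to within rounding.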

Based on the third-generation MGX rack design, the system features modular, hostless, cableless, and fanless compute trays, making assembly and maintenance 18 times faster than on the GB200: assembly that previously took two hours now takes about five minutes. Cooling moves from approximately 80% liquid to 100%. The system itself weighs two tons, reaching 2.5 tons with coolant added.

The NVLink Switch tray enables zero-downtime maintenance and fault tolerance, allowing the rack to continue operating even when the tray is removed or partially deployed. The second-generation RAS engine enables zero-downtime operational status checks.

These features improve system uptime and throughput, further reduce training and inference costs, and meet the data center's requirements for high reliability and maintainability.

More than 80 MGX partners are ready to support Rubin NVL72 deployments at hyperscale.

02. Three major new products revolutionize AI inference efficiency: new CPO device, new context storage layer, and new DGX SuperPOD.

At the same time, NVIDIA released three important new products: NVIDIA Spectrum-X Ethernet co-packaged optics, NVIDIA inference context memory storage platform, and NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.

1. NVIDIA Spectrum-X Ethernet co-packaged optics

The NVIDIA Spectrum-X Ethernet co-packaged optics are based on the Spectrum-X architecture, employ a two-chip design, utilize 200Gbps SerDes, and each ASIC can provide 102.4Tb/s bandwidth.

The switching platform includes a 512-port high-density system and a 128-port compact system, with every port running at 800Gb/s.
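
Those port counts line up with the per-ASIC bandwidth. A quick check below; the inference that the 512-port system aggregates roughly four ASICs is mine, not something NVIDIA stated:

```python
# Checking port counts against the 102.4 Tb/s per-ASIC figure.
PORT_GBPS = 800
ASIC_TBPS = 102.4

for ports in (128, 512):
    total_tbps = ports * PORT_GBPS / 1000
    print(f"{ports} ports x {PORT_GBPS} Gb/s = {total_tbps} Tb/s "
          f"(~{total_tbps / ASIC_TBPS:.0f}x the per-ASIC bandwidth)")
# -> 128 ports saturate one ASIC; 512 ports imply roughly four.
```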

CPO (co-packaged optics) switching systems achieve 5x energy efficiency, 10x reliability, and 5x application uptime.

This means that more tokens can be processed each day, thereby further reducing the total cost of ownership (TCO) of the data center.

2. NVIDIA Inference Context Memory Storage Platform

The NVIDIA Inference Context Memory Storage Platform is a POD-level AI-native storage infrastructure for storing key-value caches. It is accelerated by BlueField-4 and Spectrum-X Ethernet, and tightly coupled with NVIDIA Dynamo and NVLink to achieve collaborative context scheduling between memory, storage, and network.

The platform treats context as a first-class data type, achieving 5x the inference performance and 5x the energy efficiency.

This is crucial for improving long-context applications such as multi-turn dialogues, RAG, and Agentic multi-step inference, which rely heavily on the ability to efficiently store, reuse, and share context throughout the system.

AI is evolving from chatbots to agentic AI, which can reason, invoke tools, and maintain state over long horizons. Context windows have expanded to millions of tokens. These contexts are kept in a KV cache, and recomputing them at every step wastes GPU time and adds significant latency, so they need to be stored and reused.

However, GPU memory is fast but scarce, and traditional network storage is too slow for short-lived context. The bottleneck in AI inference is shifting from computation to context storage, so a new memory tier, optimized specifically for inference and positioned between the GPU and storage, is needed.

This tier cannot be a reactive patch; it must be co-designed with the network and storage so that context data moves with minimal overhead.

As a new storage tier, the NVIDIA Inference Context Memory Storage Platform does not reside directly in the host system but is connected to the computing device via BlueField-4. Its key advantage lies in its ability to scale the storage pool more efficiently, thereby avoiding redundant computation of the key-value cache.
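
To make the idea concrete, here is a minimal, purely illustrative Python sketch of such a two-tier context cache: a scarce GPU-resident tier backed by a larger external store, so a returning session reuses its KV cache instead of paying for prefill again. All names are hypothetical; this is not NVIDIA's API.

```python
# Toy model of a two-tier KV cache for inference context reuse.
# Purely illustrative: class and method names are invented, not NVIDIA's.

class TieredKVCache:
    def __init__(self, gpu_capacity: int):
        self.gpu = {}        # fast but scarce: GPU/HBM-resident entries
        self.store = {}      # large but slower: external context store
        self.gpu_capacity = gpu_capacity

    def get(self, session_id: str, compute_fn):
        """Return a session's KV cache, recomputing only on a true miss."""
        if session_id in self.gpu:          # hot: already in GPU memory
            return self.gpu[session_id]
        if session_id in self.store:        # warm: fetch instead of recompute
            kv = self.store.pop(session_id)
        else:                               # cold: pay for prefill once
            kv = compute_fn()
        self._admit(session_id, kv)
        return kv

    def _admit(self, session_id: str, kv):
        if len(self.gpu) >= self.gpu_capacity:
            evicted_id, evicted_kv = self.gpu.popitem()  # demote, don't discard
            self.store[evicted_id] = evicted_kv
        self.gpu[session_id] = kv


cache = TieredKVCache(gpu_capacity=2)
kv = cache.get("chat-42", compute_fn=lambda: "expensive prefill")
kv = cache.get("chat-42", compute_fn=lambda: "never called")  # GPU hit, no recompute
```

The point of the sketch is the demotion step: evicted context goes to the external store rather than being discarded, so a returning conversation never repeats its prefill.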

NVIDIA is working closely with storage partners to bring the NVIDIA Inference Context Memory Storage Platform to the Rubin platform, enabling customers to deploy it as part of a fully integrated AI infrastructure.

3. NVIDIA DGX SuperPOD built on Vera Rubin

At the system level, the NVIDIA DGX SuperPOD serves as a blueprint for large-scale AI factory deployments. It combines eight DGX Vera Rubin NVL72 systems, uses NVLink 6 to scale up and Spectrum-X Ethernet to scale out, incorporates the NVIDIA inference context memory storage platform, and has undergone engineering validation.
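
For a sense of scale, extrapolating the per-rack figures quoted earlier to the eight-rack SuperPOD (my arithmetic, not an official NVIDIA figure):

```python
# Implied scale of one DGX SuperPOD of eight DGX Vera Rubin NVL72 systems,
# extrapolated from the per-rack figures above (not an official figure).
RACKS, GPUS_PER_RACK, RACK_EFLOPS = 8, 72, 3.6

print(f"Rubin GPUs:      {RACKS * GPUS_PER_RACK}")           # 576
print(f"NVFP4 inference: {RACKS * RACK_EFLOPS:.1f} EFLOPS")  # 28.8
```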

The entire system is managed by NVIDIA Mission Control software for maximum efficiency. Customers can deploy it as a turnkey platform, completing training and inference tasks with fewer GPUs.

Thanks to co-design across the six chips, trays, racks, pods, data centers, and software, the Rubin platform achieves a significant reduction in training and inference costs. Compared with the previous-generation Blackwell, training an MoE model of the same size requires only 1/4 as many GPUs, and at the same latency, the token cost for large MoE models falls to 1/10.

An NVIDIA DGX SuperPOD based on the DGX Rubin NVL8 system was also released at the same time.

Leveraging the Vera Rubin architecture, NVIDIA is working with partners and customers to build the world’s largest, most advanced, and lowest-cost AI system, accelerating the mainstream adoption of AI.

Rubin infrastructure will be provided through CSPs and system integrators in the second half of this year, with Microsoft and others being among the first to deploy it.

03. Expanding the Open Model Universe: A Key Contributor to New Models, Data, and the Open Source Ecosystem

At the software and model levels, NVIDIA continues to increase its investment in open source.

According to data from mainstream development platforms such as OpenRouter, the usage of AI models has increased 20-fold in the past year, with about a quarter of the tokens coming from open-source models.

In 2025, NVIDIA was the largest contributor to open-source models, data, and recipes on Hugging Face, releasing 650 open-source models and 250 open-source datasets.

NVIDIA's open-source models consistently rank highly on various leaderboards. Developers can not only use these models, but also learn from them, continue training them, expand the datasets, and build AI systems using the accompanying open-source tools and documented techniques.

Inspired by Perplexity, Jensen Huang observed that agents should be multi-model, multi-cloud, and hybrid-cloud. This is the basic architecture of agentic AI systems, and almost all startups are adopting it.

With the open-source models and tools NVIDIA provides, developers can customize AI systems while drawing on frontier model capabilities. NVIDIA has packaged this architecture into "Blueprints" and integrated them into its SaaS platform, which users can employ for rapid deployment.

In the on-stage demonstration, the system automatically determined, based on the user's intent, whether a task should be handled by a local private model or a cloud-based frontier model. It could also call external tools (such as email APIs, robot control interfaces, and calendar services) and fuse modalities, uniformly processing text, voice, images, and robot sensor signals.
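
A minimal sketch of that routing pattern, with an invented intent classifier and invented model and tool names, purely to illustrate the control flow:

```python
# Hypothetical sketch of the routing pattern demonstrated on stage:
# choose a local private model or a cloud frontier model based on intent,
# then optionally call an external tool. All names here are invented.

def classify(query: str) -> dict:
    """Toy intent classifier; a real system would use a model for this."""
    return {
        "sensitive": "payroll" in query or "internal" in query,
        "needs_tool": "schedule" in query or "email" in query,
    }

def route(query: str) -> str:
    intent = classify(query)
    if intent["sensitive"]:
        answer = f"[local private model] {query}"    # data stays on-prem
    else:
        answer = f"[cloud frontier model] {query}"   # strongest reasoning
    if intent["needs_tool"]:
        answer += " -> [calendar/email tool call]"   # external tool step
    return answer

print(route("summarize internal payroll report"))
print(route("schedule a meeting with the robotics team"))
```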

These complex capabilities were unimaginable in the past, but are now commonplace. Similar capabilities can be found on enterprise platforms such as ServiceNow and Snowflake.

04. Open-source Alpamayo model, enabling self-driving cars to "think"

NVIDIA believes that physical AI and robotics will eventually become the world's largest consumer electronics segment. Everything that moves will eventually become fully autonomous, driven by physical AI.

AI has gone through the stages of perceptual AI, generative AI, and agentic AI, and is now entering the era of physical AI. Intelligence is entering the real world, and these models can understand physical laws and generate actions directly from the perception of the physical world.

To get there, physical AI must learn the world's common sense: object permanence, gravity, and friction. Acquiring these capabilities relies on three computers: a training computer (DGX) to build the AI model, an inference computer (the robot or vehicle chip) for real-time execution, and a simulation computer (Omniverse) to generate synthetic data and verify the physics.

The core model is the Cosmos World Foundation Model, which aligns language, images, 3D, and physical laws to support the entire chain from simulation to training data generation.

Physical AI will appear in three types of entities: buildings (such as factories and warehouses), robots, and self-driving cars.

Jensen Huang believes that autonomous driving will be the first large-scale application scenario for physical AI. Such systems need to understand the real world, make decisions, and execute actions, placing extremely high demands on safety, simulation, and data.

In response, NVIDIA released Alpamayo, a complete system of open-source models, simulation tools, and physical AI datasets that accelerates the development of safe, reasoning-based physical AI.

Its product portfolio provides global automakers, suppliers, startups and researchers with the basic modules for building Level 4 autonomous driving systems.

Alpamayo is the industry's first truly open-source model that enables self-driving cars to "think": it breaks a problem down into steps, reasons about the possibilities, and selects the safest path.

This reasoning-based task-action model enables autonomous driving systems to handle complex edge cases they have never encountered before, such as traffic light failures at busy intersections.

Alpamayo has 10 billion parameters, large enough to handle autonomous driving tasks, yet light enough to run on the workstations autonomous driving researchers use.

It takes text, surround-view camera data, vehicle state history, and navigation input, and outputs a driving trajectory along with its reasoning process, letting passengers understand why the vehicle took a given action.
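
A sketch of what that input/output contract might look like as code, with hypothetical field names based solely on the description above, not NVIDIA's actual API:

```python
# Illustrative data contract for the inputs and outputs the article
# attributes to Alpamayo. All field names are hypothetical.
from dataclasses import dataclass

@dataclass
class DrivingInput:
    instruction: str          # free-form text prompt
    camera_frames: list       # surround-view camera images
    vehicle_history: list     # recent ego states (pose, speed, ...)
    navigation: str           # route / navigation input

@dataclass
class DrivingOutput:
    trajectory: list          # planned path, e.g. (x, y, t) waypoints
    reasoning: str            # explanation surfaced to passengers

def drive(inp: DrivingInput) -> DrivingOutput:
    """Stub standing in for the 10B-parameter model's forward pass."""
    return DrivingOutput(
        trajectory=[(0.0, 0.0, 0.0), (1.5, 0.2, 0.5)],
        reasoning="Yielding: pedestrian detected at the crosswalk ahead.",
    )
```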

In the video played at the event, a self-driving car driven by Alpamayo autonomously completed tasks such as avoiding pedestrians, anticipating left-turning vehicles, and changing lanes to avoid them, without any intervention.

Jensen Huang stated that the Mercedes-Benz CLA equipped with Alpamayo is already in production and has just been rated the world's safest car by NCAP; every line of code, every chip, and every system has undergone safety certification. The system will be available in the US market and will gain enhanced driving capabilities later this year, including hands-free highway driving and end-to-end autonomous driving in urban environments.

NVIDIA also released a subset of the datasets used to train Alpamayo, as well as Alpha-Sim, an open-source simulation framework for evaluating reasoning models. Developers can fine-tune Alpamayo with their own data or generate synthetic data with Cosmos, then train and test autonomous driving applications on a mix of real and synthetic data. In addition, NVIDIA announced that the NVIDIA DRIVE platform is now in production.

NVIDIA announced that leading global robotics companies, including Boston Dynamics, Franka Robotics, Surgical, LG Electronics, NEURA, XRLabs, and Logic Robotics, are building their systems on NVIDIA Isaac and GR00T.

Jensen Huang also officially announced a new collaboration with Siemens. Siemens is integrating NVIDIA CUDA-X, AI models, and Omniverse into its portfolio of EDA, CAE, and digital twin tools and platforms. Physical AI will be widely used across the entire process from design and simulation to manufacturing and operations.

05. Conclusion: Embracing open source with one hand, and making hardware systems irreplaceable with the other.

As the focus of AI infrastructure shifts from training to large-scale inference, platform competition has evolved from single-point computing power to systems engineering spanning chips, racks, networks, and software. The goal is to deliver maximum inference throughput at the lowest TCO, and AI is entering a new stage of "factory-style operation."

NVIDIA places great emphasis on system-level design. Rubin improves both performance and cost-effectiveness in training and inference, and serves as a drop-in replacement for Blackwell, allowing a seamless transition.

In terms of platform positioning, NVIDIA still believes that training is crucial because only by quickly training state-of-the-art models can the inference platform truly benefit. Therefore, NVIDIA has introduced NVFP4 training in Rubin GPUs to further improve performance and reduce TCO.

At the same time, this AI computing giant continues to significantly enhance its network communication capabilities in both vertical and horizontal scaling architectures, and regards context as a key bottleneck to achieve collaborative design of storage, network, and computing.

While NVIDIA aggressively advances its open-source strategy, it is also making its hardware, interconnects, and system design increasingly "irreplaceable." This closed loop of continuously expanding demand, incentivizing token consumption, driving inference scaling, and providing cost-effective infrastructure is building an ever more impregnable moat for NVIDIA.
