Together AI Sets New Benchmark with Fastest Inference for Open-Source Models



Felix Pinkston
Dec 01, 2025 19:07

Together AI achieves unprecedented speed in open-source model inference, leveraging GPU optimization and quantization techniques to outperform competitors on NVIDIA Blackwell architecture.

Together AI has announced a major advance in open-source model inference, delivering up to twice the speed of its previous benchmarks. According to Together AI, the gains come from improvements in GPU optimization, speculative decoding, and low-bit quantization formats.

Technological Innovations Driving Performance

Central to this achievement is the adoption of next-generation GPU hardware, notably the NVIDIA Blackwell architecture. Together AI has re-engineered its inference engine to take full advantage of these GPUs, employing optimized kernels and advanced quantization formats such as FP4. The overhaul tunes compute kernels, memory layout, and execution graphs together, so the engine operates as a single high-efficiency system.
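
To make the low-bit quantization idea concrete, the sketch below shows a minimal block-wise 4-bit weight quantizer in NumPy, using one absmax scale per block. This is only an illustration of the general technique under simplifying assumptions; it is not Together AI's implementation, and real FP4 inference on Blackwell relies on hardware-native formats and fused GPU kernels rather than Python code.

import numpy as np

def quantize_blockwise_4bit(weights, block_size=32):
    """Quantize a 1-D float weight vector to signed 4-bit codes with per-block scales."""
    assert weights.ndim == 1 and weights.size % block_size == 0
    blocks = weights.reshape(-1, block_size)
    # One absmax scale per block maps values into the signed 4-bit range [-7, 7].
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero on all-zero blocks
    codes = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)  # 4-bit values stored in int8
    return codes, scales

def dequantize_blockwise_4bit(codes, scales):
    """Recover approximate float weights from 4-bit codes and per-block scales."""
    return (codes.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_blockwise_4bit(w)
w_hat = dequantize_blockwise_4bit(q, s)
print("mean absolute quantization error:", np.abs(w - w_hat).mean())

The point of the exercise is that per-block scaling keeps reconstruction error small even at 4 bits, which is why low-bit formats can shrink memory traffic without a large loss in accuracy.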

Quantization and Speculative Decoding

Together AI’s quantization strategy plays a crucial role in its performance gains. By converting large model weights to low-bit formats, the company significantly increases speed while maintaining high accuracy. Its speculative decoding algorithms further boost efficiency, sustaining high output speed without sacrificing quality across varied data domains.
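
For illustration, the sketch below shows the generic draft-and-verify loop behind speculative decoding, with a simple greedy acceptance rule. The draft_model and target_model callables are hypothetical stand-ins introduced for this example; the article does not describe Together AI's actual algorithms, and real systems verify a whole draft in one batched forward pass rather than token by token.

from typing import Callable, List

def speculative_decode(
    target_model: Callable[[List[int]], int],   # slow, high-quality model: returns next-token id
    draft_model: Callable[[List[int]], int],    # fast, approximate model: returns next-token id
    prompt: List[int],
    max_new_tokens: int = 64,
    draft_len: int = 4,
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft a short continuation cheaply with the small model.
        context, draft = list(tokens), []
        for _ in range(draft_len):
            nxt = draft_model(context)
            draft.append(nxt)
            context.append(nxt)
        # 2) Verify with the target model and keep the longest agreeing prefix.
        #    (Production systems do this in a single batched forward pass.)
        accepted = 0
        for i, tok in enumerate(draft):
            if target_model(tokens + draft[:i]) == tok:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])
        # 3) Always emit one token from the target model so progress is guaranteed
        #    and the output matches the target model's own choices.
        tokens.append(target_model(tokens))
    return tokens[: len(prompt) + max_new_tokens]

Because every emitted token is either confirmed or produced by the target model, output quality is preserved; the speedup comes from accepting several cheaply drafted tokens per expensive target-model step.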

Benchmark Results

Independent benchmarks from Artificial Analysis confirm Together AI’s platform as the fastest among GPU-based providers for demanding open-source models, including the GPT-OSS and Qwen series. Its output speed surpasses that of competitors, with some models running up to 2.75 times faster.

Future Developments

Looking ahead, Together AI is focused on expanding its capabilities, including faster generation for downstream applications and enhanced support for hybrid quantization. The company is committed to advancing the performance and scalability of open-source AI models.

For more information, you can visit the Together AI website.

Image source: Shutterstock

Source: https://blockchain.news/news/together-ai-fastest-inference-open-source-models
