Alibaba’s Qwen AI Outsmarts Global Peers in Math Benchmarks

2025/11/06 05:21
3 min read
If you have any feedback or concerns about this content, please contact crypto.news@mexc.com.

TL;DR

  • Alibaba’s Qwen3-Max-Thinking achieved perfect scores in AIME and HMMT, marking China’s first flawless AI math performance.
  • OpenAI’s GPT-5 Pro also self-reported perfect results, setting up a new East–West rivalry in reasoning AI.
  • Verification concerns linger, as Alibaba’s results lack third-party validation or evidence of closed-book testing.
  • API access opens doors for developers and investors, with potential cost-performance advantages across Asia-Pacific markets.

Alibaba’s artificial intelligence division has unveiled Qwen3-Max-Thinking, an advanced reasoning model that stunned observers by scoring a perfect 100% in two of the world’s toughest mathematics competitions, the American Invitational Mathematics Examination (AIME) and the Harvard-MIT Mathematics Tournament (HMMT).

This marks a significant milestone for China’s AI industry. It is reportedly the first time a Chinese-developed model has matched the top results of Western models on these reasoning-heavy academic tests.

The announcement places Alibaba’s AI efforts shoulder-to-shoulder with OpenAI’s GPT-5 Pro, which also self-reported flawless results in the same contests earlier this year.

A Leap for China’s AI Ambitions

According to Alibaba, Qwen3-Max-Thinking is built on Qwen3-Max, the company’s largest AI model, with more than one trillion parameters. Qwen3-Max was released in late September. It represents Alibaba’s boldest step toward general-purpose reasoning models that can compete globally on complex problem-solving tasks.

The math victories are symbolic as much as technical. For years, elite competitions like the AIME and HMMT have served as unofficial benchmarks for the reasoning depth and abstract-thinking capacity of large language models (LLMs). If confirmed, perfect accuracy in such events would signal that Qwen3-Max-Thinking is closing the performance gap with Western-developed systems.

However, questions remain about transparency and verification. Alibaba’s claims, while headline-grabbing, lack third-party confirmation. Neither the AIME nor HMMT maintains public leaderboards for AI models, and no independent audit has yet verified whether the results were achieved under closed-book, internet-free conditions, a crucial factor in determining authenticity.

Verification Gaps Raise Skepticism

Despite the celebration, experts have urged caution. The absence of public verification means it is unclear whether Qwen3-Max-Thinking truly achieved 100% accuracy under standardized conditions.
Unverified results have become a recurring issue in AI benchmarking, as companies race to claim superiority in domains like reasoning, coding, and mathematics.

Further complicating the picture, it remains unclear whether the 2025 versions of the contest problems were used. It is also unclear whether the model had prior exposure to similar data during training. Without contamination controls (safeguards ensuring the model has not seen the test problems before), perfect scores are difficult to validate.
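
One widely used contamination control is an n-gram overlap check. It measures how much of a test problem's text already appears in the training corpus. The sketch below is a minimal illustration under stated assumptions: simple word tokenization, 8-word n-grams, and a hypothetical flag threshold of 0.5. It is not Alibaba's or any lab's actual procedure. Real audits run against the full training corpus and use more careful normalization.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in `text`, lowercased with punctuation dropped."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(problem: str, corpus_docs: list[str], n: int = 8) -> float:
    """Fraction of the problem's n-grams that also occur somewhere in the corpus."""
    problem_grams = ngrams(problem, n)
    if not problem_grams:
        return 0.0  # problem is shorter than n tokens, so there is nothing to compare
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(problem_grams & corpus_grams) / len(problem_grams)


# Hypothetical threshold (an assumption): flag a problem if at least half
# of its 8-grams already appear in the training data.
FLAG_THRESHOLD = 0.5


def is_suspect(problem: str, corpus_docs: list[str]) -> bool:
    """True if the problem looks like it may have leaked into the corpus."""
    return overlap_ratio(problem, corpus_docs) >= FLAG_THRESHOLD


if __name__ == "__main__":
    problem = "Find the number of ordered pairs of positive integers (a, b) such that a times b equals 2025."
    leaked_doc = "Practice set: " + problem + " Answer below."
    print(overlap_ratio(problem, [leaked_doc]))  # 1.0: every 8-gram was seen
    print(is_suspect(problem, []))               # False: empty corpus
```

An exact-match check like this only catches verbatim leaks. Paraphrased or translated copies of a problem would slip through, which is one reason third-party audits matter.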

While Alibaba’s announcement has sparked excitement, critics warn that without reproducibility, the victory could remain symbolic rather than scientific.

Developers and Investors Eye API Potential

Beyond benchmark bragging rights, Alibaba’s AI strategy has real commercial implications. The company recently opened API access to Qwen3-Max-Thinking, inviting developers to test its reasoning capabilities in real-world applications.

For software and data teams, this opens up cost-performance routing: dynamically choosing between AI providers based on price, accuracy, or latency. Developers in the Asia-Pacific region may find Qwen’s ecosystem attractive, particularly those seeking local AI infrastructure. That depends on Alibaba offering competitive pricing and reliable regional support beyond Singapore.
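
The routing idea can be sketched in a few lines. This is a minimal illustration that assumes you have already measured each provider's accuracy on your own evaluation set and its 95th-percentile latency. The provider names (`provider_a` and so on) and all prices and scores are hypothetical placeholders, not real Qwen or OpenAI figures. The rule shown is simple: pick the cheapest provider that clears both an accuracy floor and a latency ceiling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real quote
    accuracy: float                # score on your own eval set, 0.0 to 1.0
    p95_latency_ms: float          # measured 95th-percentile latency


def route(
    providers: list[Provider], min_accuracy: float, max_latency_ms: float
) -> Optional[Provider]:
    """Return the cheapest provider meeting both constraints, or None if none qualifies.

    Ties on price are broken in favor of higher accuracy.
    """
    eligible = [
        p
        for p in providers
        if p.accuracy >= min_accuracy and p.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None  # caller decides the fallback (e.g. relax constraints or alert)
    return min(eligible, key=lambda p: (p.usd_per_million_tokens, -p.accuracy))


# Hypothetical providers with made-up numbers, for illustration only.
providers = [
    Provider("provider_a", usd_per_million_tokens=2.0, accuracy=0.90, p95_latency_ms=800),
    Provider("provider_b", usd_per_million_tokens=1.0, accuracy=0.80, p95_latency_ms=500),
    Provider("provider_c", usd_per_million_tokens=5.0, accuracy=0.95, p95_latency_ms=2000),
]

if __name__ == "__main__":
    print(route(providers, min_accuracy=0.85, max_latency_ms=1000).name)  # provider_a
    print(route(providers, min_accuracy=0.75, max_latency_ms=1000).name)  # provider_b
    print(route(providers, min_accuracy=0.99, max_latency_ms=5000))       # None
```

Returning `None` rather than silently picking the "least bad" option keeps the fallback policy explicit. Production routers often add per-request signals such as task type or prompt length on top of a static rule like this.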

Investors are also watching closely. If Qwen3-Max-Thinking can handle complex reasoning tasks while maintaining affordability, Alibaba could carve out a niche among enterprise developers and AI startups looking for alternatives to U.S. providers. The success of such models could signal a new balance in global AI infrastructure, where Chinese models rival or even outperform Western ones in specific tasks.

The post Alibaba’s Qwen AI Outsmarts Global Peers in Math Benchmarks appeared first on CoinCentral.

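Disclaimer: Articles republished on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com to request its removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. This content does not constitute financial, legal, or other professional advice and should not be considered a recommendation or endorsement by MEXC.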
