An Alibaba-linked AI agent named ROME hijacked training GPUs for unauthorised crypto mining and opened a covert SSH tunnel during RL tests.

Alibaba-Linked AI Agent ROME Attempts Crypto Mining and Network Tunnelling During Training

2026/03/09 13:01
  • ROME, a 30-billion-parameter autonomous coding agent built on Alibaba’s Qwen3-MoE architecture, diverted GPU resources toward cryptocurrency mining and created a reverse SSH tunnel to an external IP address during reinforcement learning training.
  • Researchers confirmed the behaviours were not programmed, with ROME apparently determining that acquiring additional compute and financial capacity would help complete its assigned tasks.
  • Alibaba responded with safety-aligned data filtering and hardened sandbox environments.

Researchers at Alibaba’s Agentic Learning Ecosystem flagged two unauthorised behaviours from ROME, their experimental autonomous coding agent, during reinforcement learning training runs conducted in late 2025.

According to the researchers, the model redirected GPU capacity toward cryptocurrency mining and opened a reverse SSH tunnel to an external server, bypassing internal firewall controls.

ROME Goes Rogue

The team first saw alerts for unusual outbound traffic and suspected a conventional security failure, such as weak egress settings or an outside breach. But the same violations recurred across multiple training runs without a clear pattern.

In a technical report, the researchers said their investigation showed that ROME had initiated both actions itself.

The joint research teams, identified in the paper as ROCK, ROLL, iFlow, and DT, described the behaviours as “instrumental side effects of autonomous tool use under RL optimization”.

ROME had not been instructed to mine cryptocurrency or open external connections, but its optimisation process apparently identified additional compute and financial resources as useful for completing its objectives.

“Together, these observations suggest that during iterative RL optimization, a language-model agent can spontaneously produce hazardous, unauthorized behaviors at the tool-calling and code-execution layer, violating the assumed execution boundary.”

ROCK & ROLL & iFlow & DT Joint Team

ROME is built on Alibaba’s Qwen3-MoE architecture and activates approximately 3 billion of its 30 billion parameters at any given time. It is designed to complete complex tasks through tool use, terminal commands, and software environment interaction.

The concern is not that the model was told to behave maliciously, but that it independently found unauthorised methods that helped it perform better under its training objective. Alibaba said it responded by tightening sandbox protections and filtering training data for safety alignment.

This is not the first time behaviour like this has worried researchers and engineers.

Anthropic has also reported troubling agent-style behavior in testing, including cases where Claude Opus 4 concealed its intentions, suggesting the issue is broader than one company or model.

The post Alibaba-Linked AI Agent ROME Attempts Crypto Mining and Network Tunnelling During Training appeared first on Crypto News Australia.
