The cost model leverages SMT-based solving (Z3) to maximize decoding speed under CPU, I/O, and memory constraints.

How PowerInfer‑2 Turns Your Smartphone Into an AI Workstation


Abstract and 1. Introduction

  2. Background and Motivation
  3. PowerInfer-2 Overview
  4. Neuron-Aware Runtime Inference
  5. Execution Plan Generation
  6. Implementation
  7. Evaluation
  8. Related Work
  9. Conclusion and References

5 Execution Plan Generation

Today’s smartphones come with a wide range of hardware specifications, such as differing CPU capabilities, I/O throughput, and DRAM sizes. Users deploying LLMs on these devices also have diverse objectives: some prioritize a balance between generation speed and memory usage, while others aim to maximize hardware utilization for higher speed. The models themselves also vary in parameter count, structure, and sparsity level. To manage this complexity, PowerInfer-2 includes an offline planner specifically designed to produce execution plans that best meet these varied requirements.


5.1 Execution Plan


5.2 Input Parameters

Table 2 also lists three categories of input parameters (see the sketch after this list):

• Hardware: Parameters profiled from the hardware, such as CPU FLOPS, I/O throughput, and memory bandwidth.

• User: Parameters specified by the user, such as CPU constraints, the memory limit, and a lower bound on decoding speed.

• Model: Parameters about the model collected by an offline profiler, such as model size, sparsity levels, and caching characteristics.
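Below is a minimal sketch, with assumed field names rather than the paper’s Table 2 symbols, of how these three categories of planner inputs could be grouped into a single configuration structure:

```python
from dataclasses import dataclass

@dataclass
class HardwareParams:            # profiled from the device
    cpu_flops: float             # FLOPS of the candidate CPU cores
    io_bandwidth: float          # flash read throughput, bytes/s
    mem_bandwidth: float         # DRAM bandwidth, bytes/s

@dataclass
class UserParams:                # specified by the user
    cpu_core_limit: int          # CPU constraint
    memory_limit_bytes: int      # memory limit
    min_decoding_speed: float    # lower bound on decoding speed, tokens/s

@dataclass
class ModelParams:               # collected by the offline profiler
    model_size_bytes: int        # total weight size
    activation_rate: float       # average fraction of FFN neurons activated per token
    cache_hit_rate: float        # caching characteristics of the neuron cache

@dataclass
class PlannerInput:              # everything the offline planner consumes
    hardware: HardwareParams
    user: UserParams
    model: ModelParams
```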


5.3 Cost Model

After collecting the input parameters, the planner uses a cost model to generate the execution plan. The goal is to maximize the generation speed s (as defined by Equation 1) while adhering to the user-specified constraints (Formulas 3-5). The decoding speed s is inversely proportional to the time taken to decode one token (Equation 1), which in turn is determined by the computation time for that token (Equation 2), since computation and I/O operations are efficiently overlapped. With the objective function and constraints defined, the resulting model can be solved by mature SMT solvers; our implementation uses the Z3 solver [11].
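To make the structure concrete, here is a minimal sketch of how such a cost model can be posed to Z3 through its Python bindings. This is not the paper’s actual formulation: the decision variables, parameter names, and numbers are illustrative, and only the overall shape (an objective of maximizing s plus user constraints in the spirit of Formulas 3-5) follows the text.

```python
from z3 import Int, Real, Optimize, sat

opt = Optimize()

# Decision variables (illustrative).
n_cores = Int('n_cores')        # number of big CPU cores assigned to decoding
cache_mb = Real('cache_mb')     # DRAM budget for the neuron cache, in MB
s = Real('s')                   # decoding speed, in tokens per second

# Hypothetical profiled / user-supplied parameters.
FLOPS_PER_CORE = 20e9           # hardware: FLOPS of one candidate core
WORKLOAD_FLOPS = 2e9            # model: per-token FLOPs (attention + predictors + FFN)
MAX_CORES = 6                   # user: CPU constraint
MEM_LIMIT_MB = 3000             # user: memory limit
MIN_SPEED = 10                  # user: lower bound on decoding speed

# Analogue of Equations 1-2: decoding speed is the inverse of the per-token
# computation time, i.e. the workload divided by the FLOPS of the selected
# cores (I/O is assumed to be hidden behind computation).
opt.add(s * WORKLOAD_FLOPS == n_cores * FLOPS_PER_CORE)

# Analogue of Formulas 3-5: user-specified constraints.
opt.add(n_cores >= 1, n_cores <= MAX_CORES)
opt.add(cache_mb >= 0, cache_mb <= MEM_LIMIT_MB)
opt.add(s >= MIN_SPEED)

opt.maximize(s)
if opt.check() == sat:
    print(opt.model())          # e.g. n_cores = 6, s = 60 tokens/s
```

Using Optimize rather than a plain Solver lets Z3 search directly for the constraint-satisfying assignment with the highest decoding speed, which is the kind of query the planner needs answered.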


To compute the decoding time, we first model the computation time. Since we observed that memory operations are not a significant factor compared to the computation, we do not include them in the computation time. The computation time (Equation 6) is primarily determined by the attention blocks, predictors, and FFN blocks: the computational workload of these components is divided by the CPU FLOPS (Equations 7-8), where the FLOPS of the selected CPU cores is given by Equation 9.
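As a hedged illustration of that calculation (the symbols below are placeholders, not the paper’s Equations 6-9):

```python
def compute_time(attn_flops: float, predictor_flops: float,
                 ffn_flops: float, selected_core_flops: list[float]) -> float:
    """Per-token computation time in seconds (analogue of Equation 6)."""
    workload = attn_flops + predictor_flops + ffn_flops   # per-token workload (Eq. 7-8 analogue)
    total_flops = sum(selected_core_flops)                # FLOPS of the chosen cores (Eq. 9 analogue)
    return workload / total_flops

# Hypothetical numbers: ~1.6 GFLOPs per token on four 20-GFLOPS cores -> 20 ms/token.
print(compute_time(0.6e9, 0.1e9, 0.9e9, [20e9] * 4))
```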


Table 2: Symbols used in execution planning.

As FFN block computation overlaps with neuron loading, the planner must also account for the I/O transmission time. This is calculated by dividing the volume of neurons transferred from flash storage (Equation 10) by the I/O bandwidth. The transferred volume depends on both the activation rate and the cache miss rate.


Finally, the planner calculates the time to load neurons from memory, which depends on the weight sizes of the attention blocks, the predictors, and the neurons activated at runtime. The memory time is obtained by dividing the total weight of the neurons activated for one token by the memory bandwidth (Equation 11).
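The following sketch mirrors these two terms under assumed symbol names (analogues of Equations 10-11, not the equations themselves); the final overlap check is our own illustrative sanity condition, not something stated in the paper.

```python
def io_time(ffn_bytes: float, activation_rate: float,
            cache_miss_rate: float, io_bandwidth: float) -> float:
    """Analogue of Equation 10: bytes of activated-but-uncached neurons read
    from flash for one token, divided by the flash read bandwidth."""
    transferred = ffn_bytes * activation_rate * cache_miss_rate
    return transferred / io_bandwidth

def mem_time(attn_bytes: float, predictor_bytes: float,
             activated_neuron_bytes: float, mem_bandwidth: float) -> float:
    """Analogue of Equation 11: weights touched per token (attention blocks,
    predictors, activated neurons) divided by the memory bandwidth."""
    return (attn_bytes + predictor_bytes + activated_neuron_bytes) / mem_bandwidth

def overlap_feasible(t_compute: float, t_io: float) -> bool:
    # Illustrative assumption only: FFN computation can hide neuron loading
    # only if the I/O time does not exceed the compute time.
    return t_io <= t_compute
```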


6 Implementation

PowerInfer-2 is developed on top of PowerInfer [30], a state-of-the-art serving framework for sparsely-activated LLMs, by integrating an additional 12K lines of C++ code. These enhancements cover several key areas, including the polymorphic neuron engine, the neuron cache, flexible neuron loading, and the neuron-cluster-level I/O pipeline.

Since PowerInfer-2 depends on privileged system APIs (e.g., mlock, which locks pages in memory) that require root permission, we built it on the Android [5] platform. Even though there is no need to alter the system kernel, a rooted Android system still gives us considerable flexibility in developing and debugging the system. Furthermore, PowerInfer-2 is designed to require no kernel modifications, making it easily portable to other operating systems, including the iOS [14] platform.

The current implementation of PowerInfer-2 supports a diverse array of LLMs with varying model sizes, including the Llama-2 family [27] (7B, 13B), TurboSparse-Mistral [31] (7B), and TurboSparse-Mixtral [31] (47B).

Table 3: Hardware specifications of the smartphones used in the evaluation. “DRAM” is the physical memory size. “Available” is the maximum memory size that can be occupied by an application.


:::info Authors:

(1) Zhenliang Xue, Co-first author from Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(2) Yixin Song, Co-first author from Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(3) Zeyu Mi, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University (yzmizeyu@sjtu.edu.cn);

(4) Le Chen, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(5) Yubin Xia, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(6) Haibo Chen, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University.

:::


:::info This paper is available on arxiv under CC BY 4.0 license.

:::
