In this section, we introduce Theorem 2, an important theoretical extension showing that merely giving a Transformer additional unguided computing space does not overcome the "local reasoning barrier." Under an "agnostic scratchpad," the model is permitted to produce a polynomial-length sequence of intermediate tokens (a scratchpad) with no supervision of their content. This theorem generalizes the earlier finding (Theorem 1). The task remains the high-locality "cycle task," which requires the model to determine whether three particular nodes in a graph belong to the same cycle.

Research from Apple and EPFL Explains Why AI Models Can’t Truly “Reason” Yet

Abstract and 1. Introduction

1.1 Syllogisms composition

1.2 Hardness of long compositions

1.3 Hardness of global reasoning

1.4 Our contributions

  2. Results on the local reasoning barrier

    2.1 Defining locality and auto-regressive locality

    2.2 Transformers require low locality: formal results

    2.3 Agnostic scratchpads cannot break the locality

  3. Scratchpads to break the locality

    3.1 Educated scratchpad

    3.2 Inductive Scratchpads

  4. Conclusion, Acknowledgments, and References

A. Further related literature

B. Additional experiments

C. Experiment and implementation details

D. Proof of Theorem 1

E. Comment on Lemma 1

F. Discussion on circuit complexity connections

G. More experiments with ChatGPT

D Proof of Theorem 1


At this point, we are finally ready to prove Theorem 1 as follows.


D.1 Extension to agnostic scratchpads

Theorem 1 can also be generalized to Transformers trained with agnostic scratchpads, yielding the following.

Theorem 2. Let G be a directed graph that consists of a single cycle of length 3n with probability 2/3, and of 3 cycles of length n otherwise. Next, if there are 3 cycles, pick one vertex from each; if there is one cycle, pick three vertices that are each n edges apart. Then, label these vertices uniformly at random with a_0, b_0, c_0. Next, index every other vertex by its distance from one of these three, and for each i, label the vertices at distance i uniformly at random with a_i, b_i, and c_i, and store in X the edges between a_{i−1}, b_{i−1}, c_{i−1} and a_i, b_i, c_i; i.e.

[display equation defining X]

where e(v) denotes the vertex that v's edge points to, every instance of i or i + 1 has the appropriate value substituted in, and the symbols in black are used exactly as stated. See Figure 2 for an example. Finally, let Y report whether a_0, b_0, c_0 are in the same cycle. Now, consider training a T-regular neural network with a scratchpad of polynomial length on (X, Y) generated in this manner. For any given (X, Y), we regard the network's loss on (X, Y) as the expectation, over all possible scratchpads it might generate on X, of the loss of its eventual output. If we train it on (X, Y) using population[14] gradient descent with polynomial hyperparameters[15] and a differentiable loss function, then the network fails to weakly learn to compute Y.
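To make the distribution concrete, the following Python sketch shows one way such (X, Y) pairs could be sampled. All function names here are illustrative, and because the exact token format of X is given by the display omitted above, the `e(a0)=a1`-style strings below are an assumed stand-in for the paper's encoding rather than a reproduction of it.

```python
import random

def sample_cycle_task(n: int, seed: int | None = None):
    """Sample the graph of Theorem 2: a single directed cycle of length 3n
    with probability 2/3, otherwise three disjoint directed cycles of
    length n. Returns the successor map e (e[v] is the vertex that v's
    edge points to) together with the three picked vertices."""
    rng = random.Random(seed)
    vertices = list(range(3 * n))
    rng.shuffle(vertices)  # random vertex identities hide the structure
    e = {}
    if rng.random() < 2 / 3:
        # One cycle of length 3n; the picked vertices are n edges apart.
        for i in range(3 * n):
            e[vertices[i]] = vertices[(i + 1) % (3 * n)]
    else:
        # Three cycles of length n; one picked vertex lands in each cycle.
        for c in range(3):
            block = vertices[c * n:(c + 1) * n]
            for i in range(n):
                e[block[i]] = block[(i + 1) % n]
    return e, (vertices[0], vertices[n], vertices[2 * n])

def same_cycle(e, u, v):
    """u and v lie on the same cycle iff following edges from u reaches v."""
    w = e[u]
    while w != u:
        if w == v:
            return True
        w = e[w]
    return False

def build_example(n: int, seed: int | None = None):
    """Assemble an illustrative (X, Y) pair for Theorem 2."""
    rng = random.Random(seed)
    e, picked = sample_cycle_task(n, seed=rng.randrange(2**32))

    # chains[k][i] is the vertex at distance i from the k-th picked vertex;
    # together the three n-step walks visit every vertex exactly once.
    chains = []
    for v in picked:
        chain = []
        for _ in range(n):
            chain.append(v)
            v = e[v]
        chains.append(chain)

    # Crucially, the three vertices at distance i receive the labels
    # a_i, b_i, c_i in a fresh uniformly random order for every i.
    label = {}
    for i in range(n):
        letters = ["a", "b", "c"]
        rng.shuffle(letters)
        for k in range(3):
            label[chains[k][i]] = f"{letters[k]}{i}"

    # One token per stored edge, grouped by the distance index of its
    # source (an assumed stand-in for the omitted display defining X).
    order = sorted(e, key=lambda v: int(label[v][1:]))
    X = " , ".join(f"e({label[v]})={label[e[v]]}" for v in order)

    # Y reports whether a_0, b_0, c_0 share a cycle; by construction it
    # suffices to test one pair of picked vertices.
    Y = int(same_cycle(e, picked[0], picked[1]))
    return X, Y
```

Because the letters are re-randomized at every distance i, no single token, not even the wrap-around ones, betrays which case was sampled; one has to chain on the order of n tokens starting from a_0 to find out, which is the locality property quantified in the comment on Lemma 1 below.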

The proof of this theorem is not meaningfully different from the proof of the previous version, but for completeness we include it below.

That in turn means that

E Comment on Lemma 1

For S such that |S| < n, X[S] is independent of Y , since the distribution of such subsets of edges is the same for both classes.

Let S be such that |S| = n, and let Z_S be the ternary random variable recording whether the edges in S form a cycle, an open path, or neither. Then,

Thus

Therefore, even for sets of size n, the mutual information is exponentially low, implying that loc(D) is at least n + 1.
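As an informal sanity check, the independence of X[S] and Y for small S can be probed by simulation with the illustrative encoding sketched in the Theorem 2 section above. The plug-in estimator below is biased upward at finite sample sizes, so it only conveys the qualitative picture; the lemma's argument is exact and needs no simulation.

```python
from collections import Counter
from math import log2
import random

def estimate_mi(n: int, S: tuple[int, ...], trials: int = 50_000, seed: int = 0):
    """Monte Carlo plug-in estimate of I(X[S]; Y), where X[S] is the tuple
    of tokens of X at the positions in S. Uses build_example() from the
    earlier sketch. For |S| < n the true value is 0, so the estimate
    should shrink toward 0 as `trials` grows."""
    rng = random.Random(seed)
    joint, marg_x, marg_y = Counter(), Counter(), Counter()
    for _ in range(trials):
        X, Y = build_example(n, seed=rng.randrange(2**32))
        tokens = X.split(" , ")
        view = tuple(tokens[i] for i in S)  # X restricted to the set S
        joint[(view, Y)] += 1
        marg_x[view] += 1
        marg_y[Y] += 1
    # I(X[S]; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) p(y)))
    return sum(
        (c / trials) * log2(c * trials / (marg_x[v] * marg_y[y]))
        for (v, y), c in joint.items()
    )

# For instance, estimate_mi(5, S=(0, 1, 2)) inspects a 3-token window,
# well below the n-token threshold, and should return a value near 0.
```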


:::info Authors:

(1) Emmanuel Abbe, Apple and EPFL;

(2) Samy Bengio, Apple;

(3) Aryo Lotfi, EPFL;

(4) Colin Sandon, EPFL;

(5) Omid Saremi, Apple.

:::


:::info This paper is available on arXiv under a CC BY 4.0 license.

:::

[14] This would also be true for batch GD with batches of size n^c, with c chosen as a function of the other hyperparameters.

[15] I.e., either a polynomial learning rate, polynomial clipping [12, 31], and weights stored using a logarithmic number of bits of precision with random rounding (for a < b < c, if b needs to be rounded to a or c, then it rounds to c with probability (b − a)/(c − a)); or a polynomial learning rate, polynomial clipping, and polynomial noise added to the gradients.
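For concreteness, the random-rounding rule in this footnote fits in a few lines. This is a generic sketch of unbiased stochastic rounding onto two neighboring representable values, not code from the paper.

```python
import random

def stochastic_round(a: float, b: float, c: float, rng=random) -> float:
    """Randomly round b, where a < b < c are the two neighboring
    representable values: round up to c with probability (b - a)/(c - a),
    and down to a otherwise. The result equals b in expectation, so the
    rounding is unbiased: E = c*(b - a)/(c - a) + a*(c - b)/(c - a) = b."""
    return c if rng.random() < (b - a) / (c - a) else a

# Example: with weights stored on a grid of step 0.25, the value 0.30
# rounds up to 0.50 with probability 0.2 and down to 0.25 otherwise.
```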
