This article evaluates RECKONING's generalizability on the real-world multi-hop logical reasoning task, FOLIO.This article evaluates RECKONING's generalizability on the real-world multi-hop logical reasoning task, FOLIO.

RECKONING: Reasoning through Dynamic Knowledge Encoding: Generalization to Real-World knowledge

2025/10/25 00:41

Abstract and 1. Introduction

  1. Background

  2. Method

  3. Experiments

    4.1 Multi-hop Reasoning Performance

    4.2 Reasoning with Distractors

    4.3 Generalization to Real-World knowledge

    4.4 Run-time Analysis

    4.5 Memorizing Knowledge

  4. Related Work

  5. Conclusion, Acknowledgements, and References

\ A. Dataset

B. In-context Reasoning with Distractors

C. Implementation Details

D. Adaptive Learning Rate

E. Experiments with Large Language Models

4.3 Generalization to Real-World knowledge

To investigate how generalizable our method is to real-world knowledge beyond the synthetic setting, we evaluate RECKONING on a more real-world multi-hop logical reasoning task, FOLIO [29], and report the result in Table 2. The dataset has a rich vocabulary, diverse logic patterns, and abundant language variations. It has been shown to challenge LLMs in both supervised fine-tuning and in-context learning settings. We fine-tune the GPT-2 model following the in-context reasoning setting as the baseline. As before, we train the GPT-2 model and RECKONING using the multi-task objective. We also compare to more advanced baselines, including GPT-3.5 (text-davinci-003 [55]) and ChatGPT(gpt-3.5-turbo[2]), two popular large language models with around 175B parameters. For these two large models, we evaluate both in the zero-shot and few-shot settings. In the few-shot setting, we prompt the model with 8 single-task examples randomly sampled from the training set to perform in-context learning. We find that RECKONING’s performance (which is initiated here from GPT-2) is better than the GPT-2 in-context reasoning baseline. Compared to the two advanced large language models, RECKONING outperforms them by a significant margin (12% 0-shot and 7% 8-shot). We conclude that RECKONING is effective and significantly benefits reasoning tasks using real-world knowledge.

\ Table 2: Evaluation results on FOLIO. We compare RECKONING against the FT-ICR baseline with GPT-2 and two popular large language models.

\

:::info Authors:

(1) Zeming Chen, EPFL (zeming.chen@epfl.ch);

(2) Gail Weiss, EPFL (antoine.bosselut@epfl.ch);

(3) Eric Mitchell, Stanford University (eric.mitchell@cs.stanford.edu)';

(4) Asli Celikyilmaz, Meta AI Research (aslic@meta.com);

(5) Antoine Bosselut, EPFL (antoine.bosselut@epfl.ch).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

2 https://openai.com/blog/chatgpt

Sorumluluk Reddi: Bu sitede yeniden yayınlanan makaleler, halka açık platformlardan alınmıştır ve yalnızca bilgilendirme amaçlıdır. MEXC'nin görüşlerini yansıtmayabilir. Tüm hakları telif sahiplerine aittir. Herhangi bir içeriğin üçüncü taraf haklarını ihlal ettiğini düşünüyorsanız, kaldırılması için lütfen service@support.mexc.com ile iletişime geçin. MEXC, içeriğin doğruluğu, eksiksizliği veya güncelliği konusunda hiçbir garanti vermez ve sağlanan bilgilere dayalı olarak alınan herhangi bir eylemden sorumlu değildir. İçerik, finansal, yasal veya diğer profesyonel tavsiye niteliğinde değildir ve MEXC tarafından bir tavsiye veya onay olarak değerlendirilmemelidir.

Ayrıca Şunları da Beğenebilirsiniz

Polygon Tops RWA Rankings With $1.1B in Tokenized Assets

Polygon Tops RWA Rankings With $1.1B in Tokenized Assets

The post Polygon Tops RWA Rankings With $1.1B in Tokenized Assets appeared on BitcoinEthereumNews.com. Key Notes A new report from Dune and RWA.xyz highlights Polygon’s role in the growing RWA sector. Polygon PoS currently holds $1.13 billion in RWA Total Value Locked (TVL) across 269 assets. The network holds a 62% market share of tokenized global bonds, driven by European money market funds. The Polygon POL $0.25 24h volatility: 1.4% Market cap: $2.64 B Vol. 24h: $106.17 M network is securing a significant position in the rapidly growing tokenization space, now holding over $1.13 billion in total value locked (TVL) from Real World Assets (RWAs). This development comes as the network continues to evolve, recently deploying its major “Rio” upgrade on the Amoy testnet to enhance future scaling capabilities. This information comes from a new joint report on the state of the RWA market published on Sept. 17 by blockchain analytics firm Dune and data platform RWA.xyz. The focus on RWAs is intensifying across the industry, coinciding with events like the ongoing Real-World Asset Summit in New York. Sandeep Nailwal, CEO of the Polygon Foundation, highlighted the findings via a post on X, noting that the TVL is spread across 269 assets and 2,900 holders on the Polygon PoS chain. The Dune and https://t.co/W6WSFlHoQF report on RWA is out and it shows that RWA is happening on Polygon. Here are a few highlights: – Leading in Global Bonds: Polygon holds 62% share of tokenized global bonds (driven by Spiko’s euro MMF and Cashlink euro issues) – Spiko U.S.… — Sandeep | CEO, Polygon Foundation (※,※) (@sandeepnailwal) September 17, 2025 Key Trends From the 2025 RWA Report The joint publication, titled “RWA REPORT 2025,” offers a comprehensive look into the tokenized asset landscape, which it states has grown 224% since the start of 2024. The report identifies several key trends driving this expansion. According to…
Paylaş
BitcoinEthereumNews2025/09/18 00:40