This article details the experimental setup for evaluating RECKONING, a novel bi-level learning algorithm, on three diverse multi-hop logical reasoning datasets.

Evaluating Dynamic Knowledge Encoding: Experimental Setup for Multi-Hop Logical Reasoning


Abstract and 1. Introduction

2. Background

3. Method

4. Experiments

4.1 Multi-hop Reasoning Performance

4.2 Reasoning with Distractors

4.3 Generalization to Real-World Knowledge

4.4 Run-time Analysis

4.5 Memorizing Knowledge

5. Related Work

6. Conclusion, Acknowledgements, and References

A. Dataset

B. In-context Reasoning with Distractors

C. Implementation Details

D. Adaptive Learning Rate

E. Experiments with Large Language Models

4 Experiments

Setup. We conduct our experiments on three datasets focusing on multi-hop logical reasoning over natural language knowledge: ProofWriter [73], which measures a model’s ability to emulate reasoning over facts and rules expressed in natural language; CLUTRR-SG [28], generated from the CLUTRR [71] benchmark, a logical reasoning task that involves reasoning over family relationships between entities, grounded in first-order logical proofs; and FOLIO [29], a benchmark of first-order logical reasoning problems written by expert annotators and based on real-world knowledge. Each problem in these datasets requires multiple reasoning hops to answer.[1]
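To make the input format concrete, the sketch below shows one way a multi-hop example from a ProofWriter-style dataset could be represented in code. The field names and the example facts, rules, and hypothesis are illustrative assumptions, not the datasets’ actual schema or content.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MultiHopExample:
    """A multi-hop reasoning problem: knowledge (facts and rules), a question, and a label."""
    facts: List[str]    # atomic natural-language statements, e.g. "Anne is kind."
    rules: List[str]    # natural-language rules used to chain facts together
    question: str       # hypothesis to verify against the knowledge
    label: bool         # True if the hypothesis follows from the knowledge
    hops: int           # number of reasoning steps needed (called proof depth in ProofWriter)

# Hypothetical 2-hop instance in the spirit of ProofWriter, written for illustration only.
example = MultiHopExample(
    facts=["Anne is kind."],
    rules=["If someone is kind then they are nice.",
           "If someone is nice then they are popular."],
    question="Anne is popular.",
    label=True,
    hops=2,
)
```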

We compare our method against the following baselines: (1) a fine-tuned model that performs a forward pass on only the question without access to the knowledge (No-Facts), (2) a fine-tuned model that performs a forward pass on only the knowledge without access to the question (No-Question), (3) a model trained using RECKONING with random knowledge that is not relevant to the questions (Random-Facts), and (4) an in-context reasoning (ICR) baseline that concatenates the knowledge K with the question x in a single context and is trained with supervised learning to predict the answer (FT-ICR). The first three baselines sanity-check whether surface-level patterns in the questions and facts can be exploited to make accurate predictions. The last baseline compares RECKONING to the conventional way of reasoning with language models. Unless stated otherwise, we use GPT-2-small [59] (∼124M parameters) as our initialization and refer to our method trained with the multi-task objective as RECKONING. We report each score as the average over three runs. For more details on the implementation, datasets, and examples, see Appendix A and Appendix C.
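As a rough sketch of how the input-only ablations and the FT-ICR baseline differ, the snippet below assembles the text that would be fed to the language model in each case. The separator text and prompt layout are assumptions made for illustration, not the paper’s exact template; the Random-Facts baseline is not shown because it reuses RECKONING’s training procedure with irrelevant knowledge rather than a different input format.

```python
from typing import List

def build_baseline_input(facts: List[str], question: str, baseline: str) -> str:
    """Assemble the model input for the No-Facts, No-Question, and FT-ICR baselines.

    The joining format ("Question:" prefix, single spaces) is an assumed layout;
    the paper's actual prompt formatting may differ.
    """
    knowledge = " ".join(facts)
    if baseline == "no_facts":      # question only, knowledge withheld
        return f"Question: {question}"
    if baseline == "no_question":   # knowledge only, question withheld
        return knowledge
    if baseline == "ft_icr":        # in-context reasoning: knowledge and question in one context
        return f"{knowledge} Question: {question}"
    raise ValueError(f"Unknown baseline: {baseline}")

# Example usage with a toy fact set (illustrative, not taken from the datasets).
print(build_baseline_input(
    facts=["Anne is kind.", "If someone is kind then they are nice."],
    question="Anne is nice.",
    baseline="ft_icr",
))
```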


:::info Authors:

(1) Zeming Chen, EPFL (zeming.chen@epfl.ch);

(2) Gail Weiss, EPFL (antoine.bosselut@epfl.ch);

(3) Eric Mitchell, Stanford University (eric.mitchell@cs.stanford.edu);

(4) Asli Celikyilmaz, Meta AI Research (aslic@meta.com);

(5) Antoine Bosselut, EPFL (antoine.bosselut@epfl.ch).

:::


:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::

[1] In ProofWriter, the number of reasoning hops is called the proof depth. To unify the presentation of the results, we use the term “hop” to describe the number of reasoning steps for both datasets.
