This article details the experimental setup for evaluating RECKONING, a novel bi-level learning algorithm, on three diverse multi-hop logical reasoning datasets.

Evaluating Dynamic Knowledge Encoding: Experimental Setup for Multi-Hop Logical Reasoning

2025/10/24 09:15

Abstract and 1. Introduction

  2. Background

  3. Method

  4. Experiments

    4.1 Multi-hop Reasoning Performance

    4.2 Reasoning with Distractors

    4.3 Generalization to Real-World Knowledge

    4.4 Run-time Analysis

    4.5 Memorizing Knowledge

  5. Related Work

  6. Conclusion, Acknowledgements, and References

A. Dataset

B. In-context Reasoning with Distractors

C. Implementation Details

D. Adaptive Learning Rate

E. Experiments with Large Language Models

4 Experiments

Setup. We conduct our experiments on three datasets that focus on multi-hop logical reasoning over natural-language knowledge: ProofWriter [73], which measures a model's ability to emulate reasoning over facts and rules expressed in natural language; CLUTRR-SG [28], which is generated from the CLUTRR [71] benchmark and requires reasoning about family relationships between entities, grounded in first-order logical proofs; and FOLIO [29], a benchmark of first-order logic reasoning problems written by expert annotators and based on real-world knowledge. Every problem in these datasets requires multiple reasoning hops to answer.[1]
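To make the notion of a reasoning hop concrete, here is a toy sketch (not the datasets' generation code) that counts hops as the number of forward-chaining rounds needed to derive a query from a set of facts and rules, the quantity ProofWriter calls proof depth. The `Rule` class, the `hops_needed` function, and the example statements are hypothetical, and rules are simplified to sets of premise strings rather than natural-language sentences.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical simplified rule: if all premises hold, the conclusion holds.
    premises: frozenset[str]
    conclusion: str


def hops_needed(facts: set[str], rules: list[Rule], query: str) -> int | None:
    """Return the minimum number of reasoning hops needed to derive `query`.

    Each hop is one breadth-first round in which every rule whose premises
    are already known fires at once. Returns 0 if the query is a given fact
    and None if it cannot be derived at all.
    """
    known = set(facts)
    if query in known:
        return 0
    depth = 0
    while True:
        # Conclusions newly derivable from what was known before this round.
        new = {
            r.conclusion
            for r in rules
            if r.premises <= known and r.conclusion not in known
        }
        if not new:
            return None  # saturated without reaching the query
        depth += 1
        known |= new
        if query in known:
            return depth


facts = {"Bob is big"}
rules = [
    Rule(frozenset({"Bob is big"}), "Bob is strong"),
    Rule(frozenset({"Bob is strong"}), "Bob is kind"),
]
print(hops_needed(facts, rules, "Bob is kind"))  # 2
```

Saturating breadth-first means the round in which the query first appears equals the depth of its shallowest proof, so a 2-hop question needs two chained rule applications.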

We compare our method against the following baselines: (1) a fine-tuned model that performs a forward pass on only the question, without access to the knowledge (No-Facts); (2) a fine-tuned model that performs a forward pass on only the knowledge, without access to the question (No-Question); (3) a model trained using RECKONING with random knowledge that is not relevant to the questions (Random-Facts); and (4) an ICR baseline that concatenates the knowledge K with the question x in a single context and is trained with supervised learning to predict the answer (FT-ICR). The first three baselines check whether surface-level patterns in the questions or facts alone can be exploited to make accurate predictions. The last baseline compares RECKONING with the conventional way of reasoning with language models. Unless stated otherwise, we use the GPT-2-small [59] model (∼124M parameters) as our initialization, and we use "RECKONING" to refer to our method trained with the multi-task objective. Each reported score is the average over three runs. For more details on the implementation, datasets, and examples, see Appendix A and Appendix C.
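The baselines differ only in which knowledge is encoded and which text the answering forward pass sees. In RECKONING, the knowledge is written into the model's parameters by inner-loop updates before the question is answered, while FT-ICR places it in the context instead. The minimal sketch below makes that contrast explicit under stated assumptions: the `Example` and `ModelInput` types and the `build_input` helper are hypothetical, knowledge statements are joined with single spaces, and Random-Facts is approximated by borrowing another example's knowledge. It is not the authors' implementation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Example:
    knowledge: list[str]  # natural-language facts and rules (K)
    question: str         # question (x)
    answer: str


@dataclass
class ModelInput:
    encode: list[str]  # knowledge encoded into the weights by the inner loop ([] if none)
    context: str       # text seen by the forward pass that predicts the answer


def build_input(setting: str, ex: Example, pool: list[Example],
                rng: random.Random) -> ModelInput:
    """Hypothetical helper: what each setting encodes and what it reads."""
    if setting == "RECKONING":
        return ModelInput(encode=ex.knowledge, context=ex.question)
    if setting == "Random-Facts":
        # Assumption: knowledge from a different example stands in for irrelevant facts.
        others = [e for e in pool if e is not ex]
        return ModelInput(encode=rng.choice(others).knowledge, context=ex.question)
    if setting == "No-Facts":
        return ModelInput(encode=[], context=ex.question)
    if setting == "No-Question":
        return ModelInput(encode=[], context=" ".join(ex.knowledge))
    if setting == "FT-ICR":
        # Knowledge K and question x concatenated into a single context.
        return ModelInput(encode=[], context=" ".join(ex.knowledge + [ex.question]))
    raise ValueError(f"unknown setting: {setting}")


ex1 = Example(["Bob is big.", "Big people are strong."], "Is Bob strong?", "True")
ex2 = Example(["Anne is red."], "Is Anne blue?", "False")
print(build_input("FT-ICR", ex1, [ex1, ex2], random.Random(0)).context)
```

Keeping the question fixed while varying only the knowledge channel is what lets No-Facts and Random-Facts act as sanity checks: if they score close to RECKONING, the model would be exploiting patterns in the question rather than reasoning over K.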


:::info Authors:

(1) Zeming Chen, EPFL (zeming.chen@epfl.ch);

(2) Gail Weiss, EPFL;

(3) Eric Mitchell, Stanford University (eric.mitchell@cs.stanford.edu);

(4) Asli Celikyilmaz, Meta AI Research (aslic@meta.com);

(5) Antoine Bosselut, EPFL (antoine.bosselut@epfl.ch).

:::


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::

[1] In ProofWriter, the number of reasoning hops is called the proof depth. To unify the presentation of the results, we use the term "hop" to describe the number of reasoning steps for all datasets.

