This article explores how GPT-4 was tested on BYTESIZED32 games to generate and explain object, action, and score rules. Researchers used deterministic prompts in JSON mode and compared AI-generated rules with human-written annotations to ensure accuracy. The study reveals both the potential and limits of LLMs in understanding game dynamics, with humans still needed to validate rules and gameplay integrity.This article explores how GPT-4 was tested on BYTESIZED32 games to generate and explain object, action, and score rules. Researchers used deterministic prompts in JSON mode and compared AI-generated rules with human-written annotations to ensure accuracy. The study reveals both the potential and limits of LLMs in understanding game dynamics, with humans still needed to validate rules and gameplay integrity.

Testing GPT-4 on Game State Predictions

Abstract and 1. Introduction and Related Work

  1. Methodology

    2.1 LLM-Sim Task

    2.2 Data

    2.3 Evaluation

  2. Experiments

  3. Results

  4. Conclusion

  5. Limitations and Ethical Concerns, Acknowledgements, and References

A. Model details

B. Game transition examples

C. Game rules generation

D. Prompts

E. GPT-3.5 results

F. Histograms

A Model details

For the GPT-3.5 model, we use the gpt-3.5-turbo-0125 model. For the GPT-4 model, we use the gpt-4-0125-preview model. For both models, the temperature is set to 0 to get deterministic results. We also turn on the JSON mode of both models, which ensures that the model gives a valid JSON response. Our experiments cost approximately $5,000 for OpenAI API usage.

B Game transition examples

We manually pick the wash-clothes game in BYTESIZED32 as the example game as it contains both state transitions driven by actions and game’s underlying dynamics. In tasks where the model predicts action transition, environment-driven transitions, or the game progress alone, we provide one corresponding in-context example. In the task that requires the model to predict everything, we offer two in-context examples in the prompt. The two examples are manually picked such that in one example the game state is changed directly by the action taken while in the other example the game state is changed by the game’s underlying dynamics.

C Game rules generation

C.1 LLM generated rules

For LLM generated rules, we manually check all of them to avoid misinformation and offensive content.

\ We prompt GPT-4 (gpt-4-0125-preview) with the code of each object class to acquire the rules of each object. We also provide one in-context example. We ask GPT-4 to describe the meaning of each critical property (i.e. properties that do not inherit from parent) of the object and the tick function of the object (i.e. a function that defines how object properties may change at each time step regardless of the action taken). Below is an example of our prompt of object rule generation:

\ \

\ \ For action rules generation, we prompt GPT-4 (gpt-4-0125-preview) with the code of the whole game, but unlike object rules, we do not offer any in-context example. We ask GPT-4 to describe each of the actions in the game. Below is an example of our prompt for action rule generation:

\ \

\ \ Similar to action rules, we generate score rules by prompting GPT-4 (gpt-4-0125-preview) with the code of the game and ask GPT-4 to describe how the game can be won or lose and how rewards can be earned. Below is an example of our prompt for score rule generation:

\ \

\

C.2 Human-Written Action Rules

The action rules describe how each action can change the game states. The expert annotator reads the game description and source code for each game. They went through the list of available actions in the game and their corresponding functions in the game. Each action rule has three main parts: Action, Description, and Rules. The Action specifies the name of the action (e.g., action). The Description explains the general purpose of the action (e.g., connect two objects with input terminals). The Rules is an unordered list of rule descriptions that describe the constraints of the action when interacting with different objects (e.g., At least one of the objects should be a wire or a multimeter) or how the rule might function under different conditions (e.g., Disconnect terminal if the terminal is already connected to other objects). To ensure accuracy, the annotator plays through the game and checks if the written object rules were correctly reflected in the gameplay.

C.3 Human-Written Object Rules

The object rules describe the meaning of each object property (e.g., temperature, size, weight, etc.) and how they will be changed at each time step. The expert annotators read the game description and source code for each game. They went through the object classes in the code script and wrote the object rules. Each object rule has three main parts: Object, Description, and Properties. The Object specifies the name of the object. The Description explains the general purpose of the object (e.g., GarbageCan is a container that can hold garbage). In the Description, the inheritance of the object class has been noted. The Properties is an unordered list of property descriptions that describe each property of that object (e.g., A Mold has its shape.) and their default value (e.g., By default, a GameObject is not combustible.) if the object is an abstract class. For objects with tick function, there is another property describing how an object may change under each tick. To ensure accuracy, the annotator plays through the game and checks if the written object rules were correctly reflected in the gameplay.

C.4 Human-Written Score Rules

Score rules describe the conditions to win or lose the game and how rewards can be earned. An expert annotator (one of the BYTESIZED32 game authors) creates the rules by reading the game description and the code of the score function.

\

:::info Authors:

(1) Ruoyao Wang, University of Arizona (ruoyaowang@arizona.edu);

(2) Graham Todd, New York University (gdrtodd@nyu.edu);

(3) Ziang Xiao, Johns Hopkins University (ziang.xiao@jhu.edu);

(4) Xingdi Yuan, Microsoft Research Montréal (eric.yuan@microsoft.com);

(5) Marc-Alexandre Côté, Microsoft Research Montréal (macote@microsoft.com);

(6) Peter Clark, Allen Institute for AI (PeterC@allenai.org).;

(7) Peter Jansen, University of Arizona and Allen Institute for AI (pajansen@arizona.edu).

:::


:::info This paper is available on arxiv under CC BY 4.0 license.

:::

\

Market Opportunity
Mode Network Logo
Mode Network Price(MODE)
$0,0003994
$0,0003994$0,0003994
+%12,12
USD
Mode Network (MODE) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

The Channel Factories We’ve Been Waiting For

The Channel Factories We’ve Been Waiting For

The post The Channel Factories We’ve Been Waiting For appeared on BitcoinEthereumNews.com. Visions of future technology are often prescient about the broad strokes while flubbing the details. The tablets in “2001: A Space Odyssey” do indeed look like iPads, but you never see the astronauts paying for subscriptions or wasting hours on Candy Crush.  Channel factories are one vision that arose early in the history of the Lightning Network to address some challenges that Lightning has faced from the beginning. Despite having grown to become Bitcoin’s most successful layer-2 scaling solution, with instant and low-fee payments, Lightning’s scale is limited by its reliance on payment channels. Although Lightning shifts most transactions off-chain, each payment channel still requires an on-chain transaction to open and (usually) another to close. As adoption grows, pressure on the blockchain grows with it. The need for a more scalable approach to managing channels is clear. Channel factories were supposed to meet this need, but where are they? In 2025, subnetworks are emerging that revive the impetus of channel factories with some new details that vastly increase their potential. They are natively interoperable with Lightning and achieve greater scale by allowing a group of participants to open a shared multisig UTXO and create multiple bilateral channels, which reduces the number of on-chain transactions and improves capital efficiency. Achieving greater scale by reducing complexity, Ark and Spark perform the same function as traditional channel factories with new designs and additional capabilities based on shared UTXOs.  Channel Factories 101 Channel factories have been around since the inception of Lightning. A factory is a multiparty contract where multiple users (not just two, as in a Dryja-Poon channel) cooperatively lock funds in a single multisig UTXO. They can open, close and update channels off-chain without updating the blockchain for each operation. Only when participants leave or the factory dissolves is an on-chain transaction…
Share
BitcoinEthereumNews2025/09/18 00:09
Ripple (XRP) CEO Brad Garlinghouse Makes Another Statement Regarding the Anticipated US Cryptocurrency Legislation

Ripple (XRP) CEO Brad Garlinghouse Makes Another Statement Regarding the Anticipated US Cryptocurrency Legislation

Ripple CEO Brad Garlinghouse, in his latest statement, once again expressed his support for the cryptocurrency legislation being debated in the US. Continue Reading
Share
Coinstats2026/01/22 05:30
Trump Dismisses Stock Market Dip as Minor While Solana and XRP Stand to Gain

Trump Dismisses Stock Market Dip as Minor While Solana and XRP Stand to Gain

Trump calls stock market dip “peanuts” and predicts big gains for Solana and XRP, despite recent market volatility and geopolitical tensions. President Donald Trump
Share
LiveBitcoinNews2026/01/22 06:00