This article explores how GPT-4 and GPT-3.5 perform in simulated game environments under three conditions: human-written rules, LLM-generated rules, and no rules. The results show that GPT-4 substantially outperforms GPT-3.5, especially when rules are absent, underscoring its stronger ability to apply common sense and accurately predict game states.

GPT-4 vs GPT-3.5 Performance in Game Simulations

Author: Hackernoon

Source: Hackernoon

2025/09/24 23:00


Table of Links

Abstract and 1. Introduction and Related Work

Methodology

2.1 LLM-Sim Task

2.2 Data

2.3 Evaluation

Experiments

Results

Conclusion

Limitations and Ethical Concerns, Acknowledgements, and References

A. Model details

B. Game transition examples

C. Game rules generation

D. Prompts

E. GPT-3.5 results

F. Histograms

D Prompts

The prompts introduced in this section include game rules, which can be either human-written or LLM-generated. For experiments without game rules, we simply remove the rules from the corresponding prompts.
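The way rules are included or dropped can be sketched in a few lines of Python. This is a minimal illustration, not the authors' prompt-building code: the function name `build_prompt`, its parameters, and the section labels ("Game rules:", "Current state:", "Action:") are all assumptions made for the example.

```python
from typing import Optional


def build_prompt(
    task_instruction: str,
    state_json: str,
    action: str,
    rules: Optional[str] = None,
) -> str:
    """Assemble a simulator prompt (hypothetical layout).

    `rules` may hold human-written or LLM-generated rules. Passing None
    (or an empty string) reproduces the no-rules condition by leaving
    the rules section out entirely.
    """
    parts = [task_instruction]
    if rules:
        # Only the with-rules conditions get a rules section.
        parts.append("Game rules:\n" + rules)
    parts.append("Current state:\n" + state_json)
    parts.append("Action: " + action)
    return "\n\n".join(parts)


with_rules = build_prompt(
    "Predict the next game state.",
    '{"stove": {"isOn": false}}',
    "turn on stove",
    rules="Turning on a device sets isOn to true.",
)
no_rules = build_prompt(
    "Predict the next game state.",
    '{"stove": {"isOn": false}}',
    "turn on stove",
)
```

Keeping the rules as a single optional argument means the three experimental conditions differ only in what is passed for `rules`, so the rest of the prompt stays identical across conditions.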

D.1 Prompt Example: F_act

D.1.1 Full State Prediction

D.1.2 State Difference Prediction

D.2 Prompt Example: F_env

D.2.1 Full State Prediction

D.2.2 State Difference Prediction

D.3 Prompt Example: F_R (Game Progress)

D.4 Prompt Example: F

D.4.1 Full State Prediction

D.4.2 State Difference Prediction

D.5 Other Examples

Below is an example of the rule of an action:

Below is an example of the rule of an object:

Below is an example of the score rule:

Below is an example of a game state:

Table 6: GPT-3.5 game progress prediction results

Below is an example of a JSON that describes the difference of two game states:
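To make the idea of a state difference concrete, the following is a minimal Python sketch. It is not the paper's actual JSON format: the keys `modified`, `removed`, and `added`, the function name `state_diff`, and the toy stove/pot state are all hypothetical and chosen only for illustration. It assumes each game state maps an object identifier to a dictionary of properties.

```python
import json
from typing import Any, Dict

State = Dict[str, Dict[str, Any]]
_MISSING = object()  # sentinel so a property set to None still counts as new


def state_diff(before: State, after: State) -> Dict[str, Any]:
    """Return only what changed between two states (hypothetical schema)."""
    modified: Dict[str, Dict[str, Any]] = {}
    for obj_id, props in after.items():
        if obj_id not in before:
            continue
        # Keep only properties whose value is new or different.
        # Properties deleted from an object are not tracked in this sketch.
        changed = {
            key: value
            for key, value in props.items()
            if before[obj_id].get(key, _MISSING) != value
        }
        if changed:
            modified[obj_id] = changed
    removed = [obj_id for obj_id in before if obj_id not in after]
    added = {obj_id: props for obj_id, props in after.items() if obj_id not in before}
    return {"modified": modified, "removed": removed, "added": added}


# Toy states, invented for this example.
before = {
    "1": {"name": "stove", "isOn": False},
    "2": {"name": "pot", "temperature": 20},
}
after = {
    "1": {"name": "stove", "isOn": True},
    "2": {"name": "pot", "temperature": 20},
}
diff = state_diff(before, after)
print(json.dumps(diff, indent=2))
```

A difference of this kind is much shorter than the full state when an action touches only a few objects, which is the motivation for treating state difference prediction as a separate task from full state prediction.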

E GPT-3.5 results

Table 5 and Table 6 show the performance of a GPT-3.5 simulator predicting object properties and game progress, respectively. There is a huge gap between GPT-4 and GPT-3.5 performance, providing yet another example of how quickly LLMs have developed over the past two years. It is also worth noting that the performance difference is larger when no rules are provided, indicating that GPT-3.5 is especially weak at applying common sense knowledge to this few-shot world simulation task.

F Histograms

1. In Figure 3, we show detailed experimental results on the full state prediction task performed by GPT-4.

2. In Figure 4, we show detailed experimental results on the state difference prediction task performed by GPT-4.

3. In Figure 5, we show detailed experimental results on the full state prediction task performed by GPT-3.5.

4. In Figure 6, we show detailed experimental results on the state difference prediction task performed by GPT-3.5.

Table 7: Description of object properties mentioned in Figure 2

Figure 3: GPT-4 - Full state prediction from (a) human-generated rules, (b) LLM-generated rules, and (c) no rules.

Figure 4: GPT-4 - Difference prediction from (a) human-generated rules, (b) LLM-generated rules, and (c) no rules.

Figure 5: GPT-3.5 - Full state prediction from (a) human-generated rules, (b) LLM-generated rules, and (c) no rules.

Figure 6: GPT-3.5 - Difference prediction from (a) human-generated rules, (b) LLM-generated rules, and (c) no rules.

:::info Authors:

(1) Ruoyao Wang, University of Arizona (ruoyaowang@arizona.edu);

(2) Graham Todd, New York University (gdrtodd@nyu.edu);

(3) Ziang Xiao, Johns Hopkins University (ziang.xiao@jhu.edu);

(4) Xingdi Yuan, Microsoft Research Montréal (eric.yuan@microsoft.com);

(5) Marc-Alexandre Côté, Microsoft Research Montréal (macote@microsoft.com);

(6) Peter Clark, Allen Institute for AI (PeterC@allenai.org);

(7) Peter Jansen, University of Arizona and Allen Institute for AI (pajansen@arizona.edu).

:::

:::info This paper is available on arXiv under the CC BY 4.0 license.

:::
