Detailing the process of training a conditional neural network to invert this observation model, starting with a randomly sampled program and incrementally denoising it to match a target output.

Inverting the Observation Model: How to Generate Code from Any Output

2025/09/24 23:30

Abstract and 1. Introduction

  2. Background & Related Work

  3. Method

    3.1 Sampling Small Mutations

    3.2 Policy

    3.3 Value Network & Search

    3.4 Architecture

  4. Experiments

    4.1 Environments

    4.2 Baselines

    4.3 Ablations

  5. Conclusion, Acknowledgments and Disclosure of Funding, and References

Appendix

A. Mutation Algorithm

B. Context-Free Grammars

C. Sketch Simulation

D. Complexity Filtering

E. Tree Path Algorithm

F. Implementation Details

3.2 Policy

3.2.1 Forward Process

3.2.2 Reverse Mutation Paths

Since we have access to the ground-truth mutations, we can generate targets to train a neural network by simply reversing the sampled trajectory through the forward-process Markov chain, z0 → z1 → .... At first glance, this may seem a reasonable choice. However, training the network to invert only the last mutation can create a much noisier training signal.
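The naive reversal described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the program is modeled as a flat tuple of color tokens rather than a syntax tree, and `COLORS`, `mutate`, `forward_trajectory`, and `naive_reverse_targets` are hypothetical names introduced here. Rendering the target output that the network is conditioned on is omitted.

```python
import random

# Hypothetical toy setup: a "program" is a tuple of color tokens and a
# mutation replaces exactly one token. The paper mutates syntax trees.
COLORS = ["Red", "Green", "Blue"]


def mutate(program, rng):
    """Return a copy of `program` with one randomly chosen token replaced."""
    i = rng.randrange(len(program))
    choices = [c for c in COLORS if c != program[i]]
    mutated = list(program)
    mutated[i] = rng.choice(choices)
    return tuple(mutated)


def forward_trajectory(z0, steps, rng):
    """Sample z0 -> z1 -> ... -> z_steps by applying random mutations."""
    trajectory = [z0]
    for _ in range(steps):
        trajectory.append(mutate(trajectory[-1], rng))
    return trajectory


def naive_reverse_targets(trajectory):
    """Pair each noised program z_t with z_{t-1}, undoing only the last mutation."""
    return [
        (trajectory[t], trajectory[t - 1])
        for t in range(len(trajectory) - 1, 0, -1)
    ]


rng = random.Random(0)
z0 = ("Red", "Blue", "Green")
trajectory = forward_trajectory(z0, steps=3, rng=rng)
pairs = naive_reverse_targets(trajectory)
for noised, target in pairs:
    print(noised, "->", target)
```

Each pair teaches the network to undo exactly one mutation. Nothing in this construction guarantees that the step moves the program closer to z0, since a later mutation may overwrite an earlier one at the same position.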

Consider the case where, within a much larger syntax tree, a color was mutated as,


:::info Authors:

(1) Shreyas Kapur, University of California, Berkeley;

(2) Erik Jenner, University of California, Berkeley;

(3) Stuart Russell, University of California, Berkeley.

:::

:::info This paper is available on arXiv under the CC BY-SA 4.0 DEED license.

:::

