test-time-training/discover

🔬 TTT-Discover

Learning to Discover at Test Time

Important

We'll refactor the code and share a much simpler API very soon! Sorry for the transition period.


Mert Yuksekgonul*, Daniel Koceja*, Xinhao Li*, Federico Bianchi*
Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou†, Carlos Guestrin†, Yu Sun*

Stanford · NVIDIA · Astera Institute · UC San Diego · Together AI


TTT-Discover performs reinforcement learning at test time, allowing the LLM to continue training with experience specific to the problem at hand. We achieve new state-of-the-art results across mathematics, GPU kernels, algorithms, and biology.

Key Results

               Mathematics       Kernel A100   Kernel H100   Algorithms   Biology
               Erdős Overlap ↓   TriMul ↓      TriMul ↓      AtCoder ↑    Denoising ↑
Best Human     0.380927          4531 μs       1371 μs       566,997      0.64
Prev. Best AI  0.380924          -             -             558,026      -
TTT-Discover   0.380876          2198 μs       1161 μs       567,062      0.71

Installation

pip install -r requirements/requirements-math.txt

Set environment variables:

export TINKER_API_KEY="..."
export WANDB_API_KEY="..."
export WANDB_ENTITY="..."
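
A missing key is easier to catch before a job reaches the cluster queue. The following is a minimal sketch of such a pre-flight check, not part of this repository: the variable names come from the export lines above, while the helper `missing_env_vars` is a hypothetical name introduced here for illustration.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("TINKER_API_KEY", "WANDB_API_KEY", "WANDB_ENTITY")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report the state of the current shell environment.
missing = missing_env_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```

Running this in the shell you plan to launch from shows which of the three variables still need to be exported.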

Task-specific requirements:

  • GPU kernels: requirements/requirements-gpumode.txt
  • AtCoder: requirements/requirements-ale.txt
  • Denoising: requirements/denoising/requirements-denoising.txt (see README)

Quick Start

Requires SLURM. Launch AC1 (autocorrelation inequality) on 4 nodes:

python main_tinker_submitit.py \
    --nodes 4 \
    --partition default \
    --cpus-per-task 100 \
    env=ac1 \
    model_name="openai/gpt-oss-120b" \
    sampler_type=puct_backprop \
    initial_exp_type=random \
    num_epochs=50 \
    wandb_project="my-project" \
    wandb_name="ac1-run-1"

Or use a preconfigured script:

bash scripts/tinker/ac1.sh

See docs/launching.md for all parameters and docs/intro.md for adding new tasks.
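
If you launch several runs that differ only in a few overrides, it can help to assemble the command programmatically. The sketch below rebuilds the Quick Start command from a dictionary and prints it without submitting anything. The function `build_launch_command` is a hypothetical helper, not part of the repository; the script name, flags, and `key=value` overrides are copied from the example above, and docs/launching.md remains the reference for which parameters exist.

```python
import shlex


def build_launch_command(nodes, partition, cpus_per_task, overrides):
    """Assemble the argv list for main_tinker_submitit.py (hypothetical helper)."""
    argv = [
        "python", "main_tinker_submitit.py",
        "--nodes", str(nodes),
        "--partition", partition,
        "--cpus-per-task", str(cpus_per_task),
    ]
    # Each override is passed as a single key=value token, as in the Quick Start.
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


# Values copied from the Quick Start example.
overrides = {
    "env": "ac1",
    "model_name": "openai/gpt-oss-120b",
    "sampler_type": "puct_backprop",
    "initial_exp_type": "random",
    "num_epochs": 50,
    "wandb_project": "my-project",
    "wandb_name": "ac1-run-1",
}

argv = build_launch_command(4, "default", 100, overrides)
# Print a shell-safe command line; nothing is executed here.
print(shlex.join(argv))
```

Changing an entry such as `wandb_name` in the dictionary and re-running prints a new command that can be pasted into the shell.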

Domains

Mathematics: Classic open problems in combinatorics and analysis

Task           Erdős Min. Overlap ↓   Autocorr. (AC1) ↓   Autocorr. (AC2) ↑
Best Human     0.380927               1.50973             0.9015
Prev. Best AI  0.380924               1.50314             0.9610
TTT-Discover   0.380876               1.50287             0.9591

Kernel Engineering: GPUMode TriMul competition for triangular matrix multiplication

Task           A100 ↓     H100 ↓     B200 ↓     MI300x ↓
Best Human     4531 μs    1371 μs    1005 μs    2462 μs
TTT-Discover   2198 μs    1161 μs    905 μs     1596 μs

Algorithm Engineering: AtCoder Heuristic Contests on real-world optimization [AHC39] [AHC58]

Task           AHC39 (Geometry) ↑   AHC58 (Scheduling) ↑
Best Human     566,997              847,674,723
Prev. Best AI  558,026              848,373,282
TTT-Discover   567,062              848,414,228

Biology: Single-cell RNA-seq denoising on OpenProblems benchmark

Task           PBMC ↑   Tabula ↑
Best Human     0.64     0.64
TTT-Discover   0.71     0.73
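
The Kernel Engineering timings are easier to compare as speedup ratios. The short calculation below is an illustrative sketch: the runtimes are copied from the Kernel Engineering results above, and speedup is taken here as the best human runtime divided by the TTT-Discover runtime, which is an assumption about how to summarize the numbers rather than a metric defined by this README.

```python
# TriMul runtimes in microseconds, copied from the Kernel Engineering results.
best_human = {"A100": 4531, "H100": 1371, "B200": 1005, "MI300x": 2462}
ttt_discover = {"A100": 2198, "H100": 1161, "B200": 905, "MI300x": 1596}


def speedup(baseline_us, new_us):
    """Ratio of baseline runtime to new runtime (values above 1 mean faster)."""
    return baseline_us / new_us


speedups = {gpu: speedup(best_human[gpu], ttt_discover[gpu]) for gpu in best_human}

for gpu, ratio in speedups.items():
    print(f"{gpu}: {ratio:.2f}x faster than the best human kernel")
```

On these numbers the ratio is largest on the A100 (about 2.06x) and smallest on the B200 (about 1.11x).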

Acknowledgments

This work builds on several outstanding projects and communities:

  • GPU Mode — Community for GPU kernel optimization and the TriMul competition
  • ALE-Bench — AtCoder-based benchmark for LLM evaluation
  • AlphaEvolve — DeepMind's evolutionary coding agent
  • OpenEvolve — Open-source implementation of AlphaEvolve
  • Tinker — LLM training recipes and RL framework

Citation

@article{ttt-discover2026,
  title   = {Learning to Discover at Test Time},
  author  = {Yuksekgonul, Mert and Koceja, Daniel and Li, Xinhao 
             and Bianchi, Federico and McCaleb, Jed and Wang, Xiaolong 
             and Kautz, Jan and Choi, Yejin and Zou, James 
             and Guestrin, Carlos and Sun, Yu},
  journal = {arXiv preprint arXiv:2601.16175},
  year    = {2026}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
