nanochat-omni/scripts at a641b6ca966fdabe81d8c30f25b287f3de9039a3

fam/nanochat-omni

Mathieu Lacage a641b6ca96 MMLU main split is named auxiliary_train, not train

2026-03-13 13:19:10 +01:00

base_eval.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly

2026-03-04 23:55:30 +00:00

base_train.py

All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.

2026-03-09 20:45:17 +00:00

chat_cli.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly

2026-03-04 23:55:30 +00:00

chat_eval.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly

2026-03-04 23:55:30 +00:00

chat_rl.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly

2026-03-04 23:55:30 +00:00

chat_sft.py

MMLU main split is named auxiliary_train, not train

2026-03-13 13:19:10 +01:00

chat_web.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly

2026-03-04 23:55:30 +00:00

tok_eval.py

initial commit

2025-10-13 06:49:24 -07:00

tok_train.py

quick fix to not OOM main speedrun script

2026-01-26 22:31:42 +00:00