nanochat-omni/nanochat at a8847a0f83d7a60be09d6762800feb204c3b4b64

fam/nanochat-omni

Sam Abrahams 11e68bf442 Fix comment: rotary embeddings final dimension size

2025-11-17 11:32:56 -05:00

__init__.py

initial commit

2025-10-13 06:49:24 -07:00

adamw.py

fix: remove unnecessary tensor allocation in DistAdamW optimizer

2025-10-20 12:03:26 +03:00

checkpoint_manager.py

big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.

2025-11-13 15:34:40 +00:00

common.py

big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.

2025-11-13 15:34:40 +00:00

configurator.py

initial commit

2025-10-13 06:49:24 -07:00

core_eval.py

initial commit

2025-10-13 06:49:24 -07:00

dataloader.py

big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.

2025-11-13 15:34:40 +00:00

dataset.py

initial commit

2025-10-13 06:49:24 -07:00

engine.py

Fix torch.dtype mismatching when running engine inline test.

2025-11-14 07:28:29 -08:00

execution.py

nit delete redundant catch/raise in execute

2025-10-29 08:10:03 -07:00

gpt.py

Fix comment: rotary embeddings final dimension size

2025-11-17 11:32:56 -05:00

logo.svg

initial commit

2025-10-13 06:49:24 -07:00

loss_eval.py

fix typos

2025-11-14 11:20:25 +01:00

muon.py

initial commit

2025-10-13 06:49:24 -07:00

report.py

ensure consistency of quotes within each statement

2025-11-03 21:52:02 +01:00

tokenizer.py

allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough

2025-10-24 13:27:05 +00:00

ui.html

fix(ui): prevent iOS Safari toolbar from covering input on initial load

2025-10-21 17:34:40 -07:00