fam/nanochat-omni
7b7fd0fe71cf496304d0b8d4e3571c2fc412356b
nanochat-omni/nanochat
Andrej Karpathy
c6abcdfe3a
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
2025-11-13 15:34:40 +00:00
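The commit message above describes writing checkpoints every --save_every steps and resuming from a chosen step with --resume_from_step. A minimal sketch of that pattern follows. It is a toy under stated assumptions, not the repository's code: the function names, the JSON file layout, and the single-number "weight" are all hypothetical, and the toy restores its state exactly, whereas the commit describes the real resumption as approximate.

```python
import json
import os
import tempfile


def checkpoint_path(ckpt_dir, step):
    # Hypothetical naming scheme: one JSON file per saved step.
    return os.path.join(ckpt_dir, f"step_{step:06d}.json")


def train(num_steps, ckpt_dir, save_every=-1, resume_from_step=-1):
    """Toy training loop with periodic checkpoints and optional resume.

    save_every and resume_from_step mirror the flag names in the commit
    message; a value of -1 disables the feature (an assumption of this sketch).
    """
    if resume_from_step >= 0:
        # Resume: load the state that was saved at the requested step.
        with open(checkpoint_path(ckpt_dir, resume_from_step)) as f:
            state = json.load(f)
    else:
        # Fresh run: start from step 0.
        state = {"step": 0, "weight": 0.0}

    while state["step"] < num_steps:
        state["weight"] += 0.5  # stand-in for one optimizer update
        state["step"] += 1
        if save_every > 0 and state["step"] % save_every == 0:
            with open(checkpoint_path(ckpt_dir, state["step"]), "w") as f:
                json.dump(state, f)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt_dir:
        full = train(10, ckpt_dir, save_every=5)
        resumed = train(10, ckpt_dir, resume_from_step=5)
        print(full, resumed)
```

In this sketch a run that is interrupted after step 5 can be restarted from the step-5 file and reaches the same final state as an uninterrupted run. A real training run would also need to restore optimizer and data-loader state, which is where approximation can enter.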
__init__.py
initial commit
2025-10-13 06:49:24 -07:00
adamw.py
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-20 12:03:26 +03:00
checkpoint_manager.py
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
2025-11-13 15:34:40 +00:00
common.py
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
2025-11-13 15:34:40 +00:00
configurator.py
initial commit
2025-10-13 06:49:24 -07:00
core_eval.py
initial commit
2025-10-13 06:49:24 -07:00
dataloader.py
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
2025-11-13 15:34:40 +00:00
dataset.py
initial commit
2025-10-13 06:49:24 -07:00
engine.py
minor fix comment in engine
2025-11-13 15:28:18 +00:00
execution.py
nit delete redundant catch/raise in execute
2025-10-29 08:10:03 -07:00
gpt.py
use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available
2025-10-21 18:07:33 +00:00
logo.svg
initial commit
2025-10-13 06:49:24 -07:00
loss_eval.py
Merge pull request #35 from bhaskar0210s/master
2025-10-29 08:06:24 -07:00
muon.py
initial commit
2025-10-13 06:49:24 -07:00
report.py
ensure consistency of quotes within each statement
2025-11-03 21:52:02 +01:00
tokenizer.py
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
2025-10-24 13:27:05 +00:00
ui.html
fix(ui): prevent iOS Safari toolbar from covering input on initial load
2025-10-21 17:34:40 -07:00