Files
nanochat-omni/dev
Andrej Karpathy 43c29dd9d5 Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training
The new DataLoader ensures that every token sequence in train/val batches has a BOS token
at the beginning. Therefore, no token streams start abruptly in the middle of a document,
which could be confusing for the model. Note that this changes the loss scale because there
are fewer confusing tokens in the train/val batches. The main downside is that we now waste
about 35% of tokens due to cropping. This is ok because we have a lot of data. See dev/LOG.md
entry for this change for a lot more information.
2026-01-13 20:05:47 +00:00
..
2026-01-04 19:29:22 +00:00
2025-10-13 06:49:24 -07:00
2025-10-14 14:19:44 -04:00
2025-10-13 06:49:24 -07:00