nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1
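
For readers unfamiliar with the term, "BOS-aligned" is read here as "every training row starts at a document boundary, i.e. with a BOS token". The sketch below illustrates that idea only. It is not the repository's dataloader: the function name `bos_aligned_rows`, the token id `BOS = 0`, and the greedy crop-to-fit policy are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List

BOS = 0  # hypothetical BOS token id, chosen only for this illustration


def bos_aligned_rows(docs: Iterable[List[int]], row_len: int) -> Iterator[List[int]]:
    """Yield fixed-length rows of token ids that each begin with BOS.

    Every document is prefixed with BOS and appended to the current row.
    A document that does not fit is cropped to fill the remaining space,
    its tail is discarded, and the next row starts with the next document.
    A trailing partial row is dropped.
    """
    row: List[int] = []
    for doc in docs:
        tokens = [BOS] + list(doc)
        room = row_len - len(row)
        row.extend(tokens[:room])  # crop the document if it overflows the row
        if len(row) == row_len:
            yield row
            row = []  # the next row will start with the next document's BOS


if __name__ == "__main__":
    docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
    for r in bos_aligned_rows(docs, row_len=4):
        print(r)
```

Because a row is only ever empty when a new document is about to be appended, the first token of every yielded row is BOS. The cost of this simple policy is that cropped document tails are thrown away.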

Andrej Karpathy
2026-01-31 19:12:25 +00:00
parent 348fbb301b
commit 1ddaad1c1c
8 changed files with 389 additions and 715 deletions
+4 -6
@@ -45,9 +45,9 @@ python -m scripts.base_train \
python -m scripts.base_loss --device-batch-size=1 --split-tokens=16384
python -m scripts.base_eval --max-per-task=16
-# midtraining (~10 minutes on my MacBook Pro M3 Max)
+# SFT (~10 minutes on my MacBook Pro M3 Max)
curl -L -o $NANOCHAT_BASE_DIR/identity_conversations.jsonl https://karpathy-public.s3.us-west-2.amazonaws.com/identity_conversations.jsonl
-python -m scripts.mid_train \
+python -m scripts.chat_sft \
--max-seq-len=512 \
--device-batch-size=32 \
--total-batch-size=16384 \
@@ -56,13 +56,11 @@ python -m scripts.mid_train \
--num-iterations=1500 \
--run=$WANDB_RUN
# (it's ~ok to skip SFT)
# Chat with the model over CLI
# The model should be able to say that it is Paris.
# It might even know that the color of the sky is blue.
# Sometimes the model likes it if you first say Hi before you ask it questions.
-# python -m scripts.chat_cli -i mid -p "What is the capital of France?"
+# python -m scripts.chat_cli -p "What is the capital of France?"
# Chat with the model over a pretty WebUI ChatGPT style
-# python -m scripts.chat_web -i mid
+# python -m scripts.chat_web