nuke midtraining from orbit; it's no longer as necessary now that we have a BOS-aligned dataloader. Also substantially rework the README. midtraining is not yet fully erased across the board, but this is good enough for step 1

2026-01-31 19:12:25 +00:00
parent 348fbb301b
commit 1ddaad1c1c
8 changed files with 389 additions and 715 deletions
@@ -45,9 +45,9 @@ python -m scripts.base_train \
 python -m scripts.base_loss --device-batch-size=1 --split-tokens=16384
 python -m scripts.base_eval --max-per-task=16

-# midtraining (~10 minutes on my MacBook Pro M3 Max)
+# SFT (~10 minutes on my MacBook Pro M3 Max)
 curl -L -o $NANOCHAT_BASE_DIR/identity_conversations.jsonl https://karpathy-public.s3.us-west-2.amazonaws.com/identity_conversations.jsonl
-python -m scripts.mid_train \
+python -m scripts.chat_sft \
    --max-seq-len=512 \
    --device-batch-size=32 \
    --total-batch-size=16384 \
@@ -56,13 +56,11 @@ python -m scripts.mid_train \
    --num-iterations=1500 \
    --run=$WANDB_RUN

-# (it's ~ok to skip SFT)
-
 # Chat with the model over CLI
 # The model should be able to say that it is Paris.
 # It might even know that the color of the sky is blue.
 # Sometimes the model likes it if you first say Hi before you ask it questions.
-# python -m scripts.chat_cli -i mid -p "What is the capital of France?"
+# python -m scripts.chat_cli -p "What is the capital of France?"

 # Chat with the model over a pretty WebUI ChatGPT style
-# python -m scripts.chat_web -i mid
+# python -m scripts.chat_web