delete the inline rustbpe project. it was ugly to have a project within project and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a depedency to uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size and 2) all of its complexity is not algorithmic (it's equivalent to minbpe), instead it is efficiency-related, so it is ok to hide relatively speaking

This commit is contained in:
Andrej Karpathy
2026-01-03 23:55:28 +00:00
parent 48abd7d85f
commit aa42f40e66
10 changed files with 44 additions and 1029 deletions
-3
View File
@@ -16,9 +16,6 @@ if [ -z "$WANDB_RUN" ]; then
WANDB_RUN=dummy
fi
python -m nanochat.report reset
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
uv run maturin develop --release --manifest-path rustbpe/Cargo.toml
curl -L -o $NANOCHAT_BASE_DIR/identity_conversations.jsonl https://karpathy-public.s3.us-west-2.amazonaws.com/identity_conversations.jsonl
# train tokenizer on ~4B characters and kick off download of the rest for pretraining