1ccbaf4416
nit delete redundant catch/raise in execute
Andrej
2025-10-29 08:10:03 -07:00
29ff38d94b
Merge pull request #35 from bhaskar0210s/master
Andrej
2025-10-29 08:06:24 -07:00
b996131570
Merge branch 'master' into logo/kerning-update
svlandeg
2025-10-29 11:45:40 +01:00
3fa974f93c
few more reverts
svlandeg
2025-10-29 11:45:02 +01:00
cbd560a83d
revert formatting changes to minimize diff and merge conflicts
svlandeg
2025-10-29 11:42:56 +01:00
a1de1f46ad
Merge pull request #156 from tlepoint/fix/export-base-dir
Andrej
2025-10-28 15:19:08 -07:00
ee00f523d0
fixing all the typos to make the pull requests stop
Andrej
2025-10-28 13:36:07 -07:00
5e0987a431
numpy isnt acting as a dependency for nanochat, so isnt it better to remove numpy from dependencies list
Ajeesh Sunil
2025-10-28 20:05:38 +00:00
8c9b004c99
typo fixes in scripts
svlandeg
2025-10-28 20:17:31 +01:00
0a3ce7b0ff
typo fixes in readme
svlandeg
2025-10-28 20:11:00 +01:00
fdda5826e3
Merge branch 'haowei01-fix_kv_cache_due_to_resize'
Andrej Karpathy
2025-10-28 16:54:30 +00:00
baf0b3fdda
also add a test that failed before the fix and passes now with the fix for kv cache resize
Andrej Karpathy
2025-10-28 16:54:17 +00:00
f1db6b4712
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
Andrej Karpathy
2025-10-28 15:17:43 +00:00
9415931f85
delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm
Andrej Karpathy
2025-10-28 15:17:43 +00:00
2b9c085559
update the kv_shape
Haowei Zhang
2025-10-27 02:47:13 -07:00
b062b422ac
Fix kv cache, given that resize destroys the logical structure
Haowei Zhang
2025-10-27 02:23:08 -07:00
a9de4b1038
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
water-vapor
2025-10-26 01:43:49 -05:00
c75fe54aa7
readme tweak, link to new discussion and add file structure
Andrej Karpathy
2025-10-25 19:39:16 +00:00
fca2b8cd07
harden eval: prevent the calc tool from accessing globals and locals By passing empty globals() and locals() to eval() we can prevent simple malicious cases where the user gets the model to output something like
Marius Wachtler
2025-10-24 14:29:35 -05:00
05a051dbe9
fix tokenization bug, there should be no space before first letter. sigh
Andrej Karpathy
2025-10-24 15:06:06 +00:00
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
Andrej Karpathy
2025-10-24 14:02:48 +00:00
81597cd616
move the lr schedule args up in base_train so they are tunable in configurator
Andrej Karpathy
2025-10-24 13:27:31 +00:00
cc3636b01c
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
Andrej Karpathy
2025-10-24 13:27:05 +00:00
d5cda11ab8
Export the base dir variable
Tancrède Lepoint
2025-10-22 18:13:04 -04:00
5eeb2b6ef9
experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme
Andrej Karpathy
2025-10-22 16:55:54 +00:00
2dda5c4c8d
Merge branch 'ulanch-fix/ios-safari-input-overlap'
Andrej Karpathy
2025-10-22 16:26:35 +00:00
80b203ea59
also bump run1000.sh to new uv sync
Andrej Karpathy
2025-10-22 16:08:31 +00:00
917c858136
Updates lockfile with CPU package support without overwriting other architectures
Luke Stanley
2025-10-21 20:53:18 +00:00
db1d5b595d
Git ignore eval_bundle
Luke Stanley
2025-10-21 20:39:31 +00:00
dd9387b362
Fix GPU-less CPU use on Linux with specific Torch indexes
Luke Stanley
2025-10-21 19:52:21 +00:00
32571664b1
Fix Torch crash caused by pinning on CPU
Luke Stanley
2025-10-21 19:43:38 +00:00
51e70f0d3c
Merge branch 'lukestanley-fix-cpu-support-with-extras'
Andrej Karpathy
2025-10-22 16:11:15 +00:00
48387cd895
also bump run1000.sh to new uv sync
Andrej Karpathy
2025-10-22 16:08:31 +00:00
796f84527f
fix(ui): prevent iOS Safari toolbar from covering input on initial load
ulanch
2025-10-21 17:34:40 -07:00
7a52f9bfbb
Updates lockfile with CPU package support without overwriting other architectures
Luke Stanley
2025-10-21 20:53:18 +00:00
760af62e11
Git ignore eval_bundle
Luke Stanley
2025-10-21 20:39:31 +00:00
901b075605
Fix GPU-less CPU use on Linux with specific Torch indexes
Luke Stanley
2025-10-21 19:52:21 +00:00
defd1246aa
Fix Torch crash caused by pinning on CPU
Luke Stanley
2025-10-21 19:43:38 +00:00
2e938530ce
delete spurious torch.empty allocation in adamw
Andrej
2025-10-21 11:35:17 -07:00
a088b7a6ec
use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available
Andrej Karpathy
2025-10-21 18:07:33 +00:00
94ee507054
quick fix base eval due to fewshot requirement
Andrej Karpathy
2025-10-21 17:56:08 +00:00
33e8a27f91
Merge karpathy/cpu-mps-dev, adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something.
Andrej
2025-10-21 10:26:04 -07:00
50bea28ef9
also add readme mention of the cpu mps changes
Andrej Karpathy
2025-10-21 17:24:48 +00:00
5bdc99abfb
merge and resolve conflict
Andrej Karpathy
2025-10-21 17:19:10 +00:00
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
Andrej Karpathy
2025-10-21 17:15:53 +00:00
bb71c64579
fix silly issue in dataloader, this version is much faster and more portable to mps too
Andrej Karpathy
2025-10-21 17:12:50 +00:00
bb786c5560
i shouldnt have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history...
karpathy
2025-10-21 10:07:40 -07:00
c9ea7a91e2
Add customization instructions to README
Andrej
2025-10-21 08:57:10 -07:00
03cddd9878
actually let's not brick code on git pull. change error to warning
Andrej Karpathy
2025-10-21 15:13:25 +00:00
fe5aed940b
add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok
Andrej Karpathy
2025-10-21 15:04:58 +00:00
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
karpathy
2025-10-20 10:15:17 -07:00
a09ac812ed
toml changes for cpu only install
Andrej
2025-10-20 07:53:15 -07:00
0abb0fa2e3
add both sides of the source check
burtenshaw
2025-10-20 10:44:07 +02:00
c7ae920a77
add check for linux on cpu
burtenshaw
2025-10-20 06:51:52 +02:00
0f007889dd
Add MIT License as a file to the project
Andrej
2025-10-19 17:22:19 -07:00
5a879f4947
export NANOCHAT_BASE_DIR so child processes get it too
Andrej
2025-10-19 17:07:56 -07:00
c1d2ed1c13
use orig_model in sampling, silly of me to miss this
Andrej Karpathy
2025-10-20 00:05:09 +00:00
2bc521a6de
use orig_model in sampling, silly of me to miss this
Andrej Karpathy
2025-10-20 00:04:15 +00:00
9467d83cf2
fix memory leak bug in rust tokenizer ty @mitsuhiko
Andrej Karpathy
2025-10-19 23:54:31 +00:00
b1443dc98c
export NANOCHAT_BASE_DIR so child processes get it too
Tancrède Lepoint
2025-10-19 14:05:40 -04:00
2b58e2dd2a
Update logo in code as well
obxium
2025-10-18 09:31:11 -04:00
cf2baf9933
fix typo
Andrej
2025-10-17 08:35:41 -07:00
e4f9b9c64d
revert to previous pyproject.toml
karpathy
2025-10-17 08:08:16 -07:00
e883b1d597
Merge pull request #99 from burtenshaw/cpu-mps-dev-ben
Andrej
2025-10-17 07:24:38 -07:00
ed519b0f24
Update engine.py with correct error message on assert
Phúc H. Lê Khắc
2025-10-17 17:21:25 +07:00
23b6351c1c
add groups and source selection
burtenshaw
2025-10-17 12:20:18 +02:00
ae02650afe
update the midtraining script too
karpathy
2025-10-16 16:33:17 -07:00
df600b6ed5
many small tweaks. base, eval, core work now i think
karpathy
2025-10-16 15:46:18 -07:00
d6d86cbf4c
update readme with a link to the CPU|MPS branch
Andrej Karpathy
2025-10-16 22:03:39 +00:00
ccfe7915ac
mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
Andrej Karpathy
2025-10-16 19:32:44 +00:00
786119d593
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
karpathy
2025-10-16 10:26:19 -07:00
279b74312c
adjust comment/guidance on device type
karpathy
2025-10-16 10:06:39 -07:00
306bc380ab
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
karpathy
2025-10-16 10:04:43 -07:00
722da4f543
trying to add basic cpu support, will try mps too
Andrej Karpathy
2025-10-16 16:14:38 +00:00
4346536ab2
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
Andrej Karpathy
2025-10-16 01:28:37 +00:00
2846999b8f
allow user to click on their message to edit it. conversation after that point is wiped
Andrej Karpathy
2025-10-16 01:16:22 +00:00
92d52ecc92
add slash commands to webui
Andrej Karpathy
2025-10-16 01:09:53 +00:00
fae3aca951
add script to train a 000 version of nanochat. currently it's a bit more like 00 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
Andrej Karpathy
2025-10-15 20:32:22 +00:00
4c3590c499
fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
Andrej Karpathy
2025-10-15 20:29:54 +00:00
03fa673b7d
add basic logging to chat_web, which i think might be fun
Andrej Karpathy
2025-10-15 19:51:06 +00:00
52bfeea8bd
add very basic abuse prevention limits to chat_web so it's ok to host endpoints
Andrej Karpathy
2025-10-15 19:42:54 +00:00
01fb290f53
allow multiple GPUs to do inference in a data parallel way
Andrej Karpathy
2025-10-15 19:12:19 +00:00
190d9515d0
dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports
Andrej Karpathy
2025-10-15 16:42:23 +00:00
b8076dd367
fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
Andrej Karpathy
2025-10-15 16:35:04 +00:00
67aaca98f5
export NANOCHAT_BASE_DIR so child processes get it too
Andrej
2025-10-14 16:01:28 -07:00
938cb31f1a
Update logo
obxium
2025-10-14 14:19:44 -04:00