nanochat-omni

fam/nanochat-omni

Commit Graph

1ccbaf4416 nit delete redundant catch/raise in execute Andrej 2025-10-29 08:10:03 -07:00
29ff38d94b Merge pull request #35 from bhaskar0210s/master Andrej 2025-10-29 08:06:24 -07:00
b996131570 Merge branch 'master' into logo/kerning-update svlandeg 2025-10-29 11:45:40 +01:00
3fa974f93c few more reverts svlandeg 2025-10-29 11:45:02 +01:00
cbd560a83d revert formatting changes to minimize diff and merge conflicts svlandeg 2025-10-29 11:42:56 +01:00
a1de1f46ad Merge pull request #156 from tlepoint/fix/export-base-dir Andrej 2025-10-28 15:19:08 -07:00
ee00f523d0 fixing all the typos to make the pull requests stop Andrej 2025-10-28 13:36:07 -07:00
5e0987a431 numpy isnt acting as a dependency for nanochat, so isnt it better to remove numpy from dependencies list Ajeesh Sunil 2025-10-28 20:05:38 +00:00
8c9b004c99 typo fixes in scripts svlandeg 2025-10-28 20:17:31 +01:00
0a3ce7b0ff typo fixes in readme svlandeg 2025-10-28 20:11:00 +01:00
fdda5826e3 Merge branch 'haowei01-fix_kv_cache_due_to_resize' Andrej Karpathy 2025-10-28 16:54:30 +00:00
baf0b3fdda also add a test that failed before the fix and passes now with the fix for kv cache resize Andrej Karpathy 2025-10-28 16:54:17 +00:00
f1db6b4712 delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm Andrej Karpathy 2025-10-28 15:17:43 +00:00
9415931f85 delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm Andrej Karpathy 2025-10-28 15:17:43 +00:00
2b9c085559 update the kv_shape Haowei Zhang 2025-10-27 02:47:13 -07:00
b062b422ac Fix kv cache, given resize will destroys the logical structure Haowei Zhang 2025-10-27 02:23:08 -07:00
a9de4b1038 Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 water-vapor 2025-10-26 01:43:49 -05:00
c75fe54aa7 readme tweak, link to new discussion and add file structure Andrej Karpathy 2025-10-25 19:39:16 +00:00
fca2b8cd07 harden eval: prevent the calc tool from accessing globals and locals By passing empty globals() and locals() to eval() we can prevent simple malicious cases where the user gets the model to output something like Marius Wachtler 2025-10-24 14:29:35 -05:00
05a051dbe9 fix tokenization bug, there should be no space before first letter. sigh Andrej Karpathy 2025-10-24 15:06:06 +00:00
8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think Andrej Karpathy 2025-10-24 14:02:48 +00:00
81597cd616 move the lr schedule args up in base_train so they are tunable in configurator Andrej Karpathy 2025-10-24 13:27:31 +00:00
cc3636b01c allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough Andrej Karpathy 2025-10-24 13:27:05 +00:00
d5cda11ab8 Export the base dir variable Tancrède Lepoint 2025-10-22 18:13:04 -04:00
5eeb2b6ef9 experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme Andrej Karpathy 2025-10-22 16:55:54 +00:00
2dda5c4c8d Merge branch 'ulanch-fix/ios-safari-input-overlap' Andrej Karpathy 2025-10-22 16:26:35 +00:00
80b203ea59 also bump run1000.sh to new uv sync Andrej Karpathy 2025-10-22 16:08:31 +00:00
917c858136 Updates lockfile with CPU package support without overwriting other architectures Luke Stanley 2025-10-21 20:53:18 +00:00
db1d5b595d Git ignore eval_bundle Luke Stanley 2025-10-21 20:39:31 +00:00
dd9387b362 Fix GPU-less CPU use on Linux with specific Torch indexes Luke Stanley 2025-10-21 19:52:21 +00:00
32571664b1 Fix Torch crash caused by pinning on CPU Luke Stanley 2025-10-21 19:43:38 +00:00
51e70f0d3c Merge branch 'lukestanley-fix-cpu-support-with-extras' Andrej Karpathy 2025-10-22 16:11:15 +00:00
48387cd895 also bump run1000.sh to new uv sync Andrej Karpathy 2025-10-22 16:08:31 +00:00
796f84527f fix(ui): prevent iOS Safari toolbar from covering input on initial load ulanch 2025-10-21 17:34:40 -07:00
7a52f9bfbb Updates lockfile with CPU package support without overwriting other architectures Luke Stanley 2025-10-21 20:53:18 +00:00
760af62e11 Git ignore eval_bundle Luke Stanley 2025-10-21 20:39:31 +00:00
901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes Luke Stanley 2025-10-21 19:52:21 +00:00
defd1246aa Fix Torch crash caused by pinning on CPU Luke Stanley 2025-10-21 19:43:38 +00:00
2e938530ce delete spurious torch.empty allocation in adamw Andrej 2025-10-21 11:35:17 -07:00
a088b7a6ec use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available Andrej Karpathy 2025-10-21 18:07:33 +00:00
94ee507054 quick fix base eval due to fewshot requirement Andrej Karpathy 2025-10-21 17:56:08 +00:00
33e8a27f91 Merge karpathy/cpu-mps-dev , adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something. Andrej 2025-10-21 10:26:04 -07:00
50bea28ef9 also add readme mention of the cpu mps changes Andrej Karpathy 2025-10-21 17:24:48 +00:00
5bdc99abfb merge and resolve conflict Andrej Karpathy 2025-10-21 17:19:10 +00:00
dfcb1c16f1 Merge branch 'master' into cpu-mps-dev Andrej Karpathy 2025-10-21 17:15:53 +00:00
bb71c64579 fix silly issue in dataloader, this version is much faster and more portable to mps too Andrej Karpathy 2025-10-21 17:12:50 +00:00
bb786c5560 i shouldnt have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history... karpathy 2025-10-21 10:07:40 -07:00
c9ea7a91e2 Add customization instructions to README Andrej 2025-10-21 08:57:10 -07:00
03cddd9878 actually let's not brick code on git pull. change error to warning Andrej Karpathy 2025-10-21 15:13:25 +00:00
fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok Andrej Karpathy 2025-10-21 15:04:58 +00:00
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming karpathy 2025-10-20 10:15:17 -07:00
a09ac812ed toml changes for cpu only install Andrej 2025-10-20 07:53:15 -07:00
49cd02f283 fix: remove unnecessary tensor allocation in DistAdamW optimizer Sermet Pekin 2025-10-20 12:03:26 +03:00
0abb0fa2e3 add both sides of the source check burtenshaw 2025-10-20 10:44:07 +02:00
c7ae920a77 add check for linux on cpu burtenshaw 2025-10-20 06:51:52 +02:00
0f007889dd Add MIT License as a file to the project Andrej 2025-10-19 17:22:19 -07:00
5a879f4947 export NANOCHAT_BASE_DIR so child processes get it too Andrej 2025-10-19 17:07:56 -07:00
c1d2ed1c13 use orig_model in sampling, silly of me to miss this Andrej Karpathy 2025-10-20 00:05:09 +00:00
2bc521a6de use orig_model in sampling, silly of me to miss this Andrej Karpathy 2025-10-20 00:04:15 +00:00
9467d83cf2 fix memory leak bug in rust tokenizer ty @mitsuhiko Andrej Karpathy 2025-10-19 23:54:31 +00:00
b1443dc98c export NANOCHAT_BASE_DIR so child processes get it too Tancrède Lepoint 2025-10-19 14:05:40 -04:00
2b58e2dd2a Update logo in code as well obxium 2025-10-18 09:31:11 -04:00
cf2baf9933 fix typo Andrej 2025-10-17 08:35:41 -07:00
e4f9b9c64d revert to previous pyproject.toml karpathy 2025-10-17 08:08:16 -07:00
e883b1d597 Merge pull request #99 from burtenshaw/cpu-mps-dev-ben Andrej 2025-10-17 07:24:38 -07:00
ed519b0f24 Update engine.py with correct error message on assert Phúc H. Lê Khắc 2025-10-17 17:21:25 +07:00
23b6351c1c add groups and source selection burtenshaw 2025-10-17 12:20:18 +02:00
ae02650afe update the midtraining script too karpathy 2025-10-16 16:33:17 -07:00
df600b6ed5 many small tweaks. base, eval, core work now i think karpathy 2025-10-16 15:46:18 -07:00
d6d86cbf4c update readme with a link to the CPU|MPS branch Andrej Karpathy 2025-10-16 22:03:39 +00:00
ccfe7915ac mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo Andrej Karpathy 2025-10-16 19:32:44 +00:00
786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip karpathy 2025-10-16 10:26:19 -07:00
279b74312c adjust comment/guidance on device type karpathy 2025-10-16 10:06:39 -07:00
306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights karpathy 2025-10-16 10:04:43 -07:00
722da4f543 trying to add basic cpu support, will try mps too Andrej Karpathy 2025-10-16 16:14:38 +00:00
1f7ee5d3ce Remove redundant exception handling in chdir Ram Rachum 2025-10-16 15:40:10 +03:00
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate Andrej Karpathy 2025-10-16 01:28:37 +00:00
2846999b8f allow user to click on their message to edit them. conversation after that point is wiped Andrej Karpathy 2025-10-16 01:16:22 +00:00
92d52ecc92 add slash commands to webui Andrej Karpathy 2025-10-16 01:09:53 +00:00
fae3aca951 add script to train a 000 version of nanochat. currently it's a bit more like 00 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now Andrej Karpathy 2025-10-15 20:32:22 +00:00
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. exampels are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints Andrej Karpathy 2025-10-15 20:29:54 +00:00
03fa673b7d add basic logging to chat_web, which i think might be fun Andrej Karpathy 2025-10-15 19:51:06 +00:00
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints Andrej Karpathy 2025-10-15 19:42:54 +00:00
01fb290f53 allow multiple GPUs to do inference in a data parallel way Andrej Karpathy 2025-10-15 19:12:19 +00:00
190d9515d0 dont evaluate the sampling evals during SFT they are too slow. keep the multiple choice evals. delete unused imports Andrej Karpathy 2025-10-15 16:42:23 +00:00
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation Andrej Karpathy 2025-10-15 16:35:04 +00:00
67aaca98f5 export NANOCHAT_BASE_DIR so child processes get it too Andrej 2025-10-14 16:01:28 -07:00
938cb31f1a Update logo obxium 2025-10-14 14:19:44 -04:00
f0855cbcc7 Update speedrun.sh Zach Mueller 2025-10-14 14:12:01 -04:00
02440f670d fix: return inf instead of crashing when evaluate_bpb has zero total_bytes Bhaskar 2025-10-14 17:21:11 +05:30
dd6ff9a1cc fix bug in fallback case of find_largest_model Andrej 2025-10-13 14:38:34 -07:00
afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model Mirza-Samad-Ahmed-Baig 2025-10-14 00:18:20 +03:00
5fd0b13886 Merge pull request #2 from epoyraz/patch-1 Andrej 2025-10-13 10:10:15 -07:00
6a795baf27 Update README.md Enes Poyraz 2025-10-13 18:40:12 +02:00
626bd3e260 Add image of the WebUI to readme Andrej 2025-10-13 08:03:00 -07:00
da96b46565 update link to the new discussion karpathy 2025-10-13 07:42:09 -07:00
a53833d04f add nanochat logo png karpathy 2025-10-13 06:59:59 -07:00
3a5e0bc50b initial commit karpathy 2025-10-13 06:49:24 -07:00
