minor cosmetics for the table
This commit is contained in:
@@ -14,11 +14,11 @@ For questions about the repo, I recommend either using [DeepWiki](https://deepwi
|
|||||||
|
|
||||||
## Leaderboard
|
## Leaderboard
|
||||||
|
|
||||||
| # | Record time | val_bpb | CORE | Description | Date | Commit | Contributors |
|
| # | time | val_bpb | CORE | Description | Date | Commit | Contributors |
|
||||||
|---|-------------|---------|------|-------------|------|--------|--------------|
|
|---|-------------|---------|------|-------------|------|--------|--------------|
|
||||||
| 0 | 168 hours | - | 0.256525 | Original OpenAI GPT-2 checkpoint | 2019 | - | OpenAI |
|
| 0 | 168 hours | - | 0.2565 | Original OpenAI GPT-2 checkpoint | 2019 | - | OpenAI |
|
||||||
| 1 | 3.04 | 0.74833 | 0.25851 | d24 baseline, slightly overtrained | Jan 29 2026 | 348fbb3 | @karpathy |
|
| 1 | 3.04 | 0.74833 | 0.2585 | d24 baseline, slightly overtrained | Jan 29 2026 | 348fbb3 | @karpathy |
|
||||||
| 2 | 2.91 | 0.74504 | 0.2578 | d26 slightly undertrained **+fp8** | Feb 2 2026 | TODO | @karpathy |
|
| 2 | 2.91 | 0.74504 | 0.2578 | d26 slightly undertrained **+fp8** | Feb 2 2026 | 8309b83 | @karpathy |
|
||||||
|
|
||||||
The primary metric we care about is "time to GPT-2" - the wall clock time needed to outperform the GPT-2 (1.6B) CORE metric on an 8XH100 GPU node. The GPT-2 CORE score is 0.256525. In 2019, the training of GPT-2 cost approximately $50,000 so it is incredible that due to many advances over 7 years across the stack, we can now do so much faster and for well below $100 (e.g. at the current ~$3/GPU/hr, an 8XH100 node is ~$24/hr, so 3 hours is ~$72).
|
The primary metric we care about is "time to GPT-2" - the wall clock time needed to outperform the GPT-2 (1.6B) CORE metric on an 8XH100 GPU node. The GPT-2 CORE score is 0.256525. In 2019, the training of GPT-2 cost approximately $50,000 so it is incredible that due to many advances over 7 years across the stack, we can now do so much faster and for well below $100 (e.g. at the current ~$3/GPU/hr, an 8XH100 node is ~$24/hr, so 3 hours is ~$72).
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user