Merge pull request #634 from 2bitbit/fix-docs-and-comments
fix: correct minor typos in help text, README, and comments
This commit is contained in:
@@ -53,7 +53,7 @@ A few more notes:
|
||||
|
||||
- The code will run just fine on the Ampere 8XA100 GPU node as well, but a bit slower.
|
||||
- All code will run just fine on even a single GPU by omitting `torchrun`, and will produce ~identical results (code will automatically switch to gradient accumulation), but you'll have to wait 8 times longer.
|
||||
- If your GPU(s) have less than 80GB, you'll have to tune some of the hyperparameters or you will OOM / run out of VRAM. Look for `--device_batch_size` in the scripts and reduce it until things fit. E.g. from 32 (default) to 16, 8, 4, 2, or even 1. Less than that you'll have to know a bit more what you're doing and get more creative.
|
||||
- If your GPU(s) have less than 80GB, you'll have to tune some of the hyperparameters or you will OOM / run out of VRAM. Look for `--device-batch-size` in the scripts and reduce it until things fit. E.g. from 32 (default) to 16, 8, 4, 2, or even 1. Less than that you'll have to know a bit more what you're doing and get more creative.
|
||||
- Most of the code is fairly vanilla PyTorch so it should run on anything that supports that - xpu, mps, or etc, but I haven't personally exercised all of these code paths so there might be sharp edges.
|
||||
|
||||
## Research
|
||||
|
||||
Reference in New Issue
Block a user