Recommended GPUs and GPU clouds for AI use-cases
Why does this exist? I wondered which GPU I should use for Falcon-40B, for MPT-30B, and for Stable Diffusion. For each one, I wanted to know which option is best purely for performance, which gives the best overall balance between price and performance, and which is best at a lower price. (Note that Stable Diffusion, at least, will definitely run on cheaper cards, too.)
This way you don’t need to read all the model specs, look up different GPU rankings and stats, check that a model runs on a given card, and then compare pricing across all the GPU clouds.
Use case | Priority | GPU | Price/hr | Cloud |
---|---|---|---|---|
Falcon-40B | 🏆 Performance | 2x H100s | N/A | No instant availability of 2x GPU instances |
Falcon-40B | 👌 Price to performance ratio | 2x RTX 6000 Ada (not A6000 or RTX 6000) | $2.38 | ✅ Runpod |
Falcon-40B | 🪙 Lower price | 2x A6000 | $1.58-$1.60 | ✅ Runpod, FluidStack, or Lambda |
MPT-30B | 🏆 Performance | 1x H100 | $1.99 | ✅ FluidStack or Lambda |
MPT-30B | 👌 Price to performance ratio | 1x H100 | $1.99 | ✅ FluidStack or Lambda |
MPT-30B | 🪙 Lower price | 1x A100 80GB | $1.79 | ✅ Runpod |
Stable Diffusion | 🏆 Performance | 1x H100 | $1.99 | ✅ FluidStack or Lambda |
Stable Diffusion | 👌 Price to performance ratio | 1x RTX 4090 | $0.69 | ✅ Runpod |
Stable Diffusion | 🪙 Lower price | 1x RTX 3090 (or 1x A5000) | $0.44 | ✅ Runpod |
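
To turn the hourly prices above into a rough budget, here is a minimal sketch that multiplies a row's price by the number of hours you plan to run. The dictionary keys, the `estimate_cost` name, and the assumption that each listed price covers the whole listed configuration (for example both GPUs in a 2x setup) are illustrative choices, not something the table states. The prices are a snapshot copied from the table and will drift.

```python
# Hourly prices copied from the summary table, as (low, high) in USD.
# Assumption: each price covers the full configuration listed in the row.
# None means the table lists no price (N/A).
PRICES_PER_HOUR = {
    ("Falcon-40B", "performance"): None,                # 2x H100, N/A
    ("Falcon-40B", "price-performance"): (2.38, 2.38),  # 2x RTX 6000 Ada
    ("Falcon-40B", "lower-price"): (1.58, 1.60),        # 2x A6000
    ("MPT-30B", "performance"): (1.99, 1.99),           # 1x H100
    ("MPT-30B", "price-performance"): (1.99, 1.99),     # 1x H100
    ("MPT-30B", "lower-price"): (1.79, 1.79),           # 1x A100 80GB
    ("Stable Diffusion", "performance"): (1.99, 1.99),        # 1x H100
    ("Stable Diffusion", "price-performance"): (0.69, 0.69),  # 1x RTX 4090
    ("Stable Diffusion", "lower-price"): (0.44, 0.44),        # 1x RTX 3090
}


def estimate_cost(use_case, priority, hours):
    """Return (low, high) total cost in USD, or None if no price is listed."""
    price = PRICES_PER_HOUR[(use_case, priority)]
    if price is None:
        return None
    low, high = price
    return (round(low * hours, 2), round(high * hours, 2))


if __name__ == "__main__":
    print(estimate_cost("Stable Diffusion", "price-performance", 10))
    print(estimate_cost("Falcon-40B", "lower-price", 100))
```

Storing each price as a range keeps the "$1.58-$1.60" row honest instead of silently picking one end.
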
Detailed tables
GPU requirements
Use case | GPU requirements | Recommended card |
---|---|---|
Running Falcon-40B | GPU with 85-100GB+ VRAM (Video RAM) | See Falcon-40B table |
Running MPT-30B | 80GB for 16-bit precision | See MPT-30B table |
Training LLaMA (65B) | “They had 8,000 Nvidia A100s at the time.” | Very large H100 cluster |
Training Falcon (40B) | “384 A100 40GB GPUs” | Large H100 cluster |
Fine tuning an LLM (large scale) | “64 A100 40GB GPUs” | H100 cluster |
Fine tuning an LLM (small scale) | “4x A100 80gb” | Multi-H100 instance |
Stable Diffusion image generation | “12GB+” or “16GB+” | See Stable Diffusion table |
Whisper transcription | Minimal; can run on CPU | A GPU is optional. For faster transcription, see the Stable Diffusion table |
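
The VRAM figures for running Falcon-40B and MPT-30B can be sanity-checked with a common rule of thumb: the weights alone take roughly (parameter count) × (bytes per parameter). The sketch below applies that rule. The function names and the 1.2× overhead factor for activations and cache are assumptions chosen for illustration, not measured values, and real usage depends on context length, batch size, and the inference framework.

```python
def weight_vram_gb(params_billions, bits_per_param=16):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def fits(params_billions, total_vram_gb, bits_per_param=16, overhead=1.2):
    """Rough check: do the weights plus an assumed overhead fit in VRAM?

    overhead=1.2 is a hypothetical allowance for activations and cache.
    """
    needed = weight_vram_gb(params_billions, bits_per_param) * overhead
    return needed <= total_vram_gb


if __name__ == "__main__":
    # Weights only, 16-bit: about 80 GB for a 40B model, 60 GB for a 30B model.
    print(weight_vram_gb(40), weight_vram_gb(30))
    print(fits(30, 80))    # 30B model on one 80 GB card
    print(fits(40, 80))    # 40B model on one 80 GB card
    print(fits(40, 160))   # 40B model across two 80 GB cards
```

Under these assumptions a 40B model at 16-bit does not fit on a single 80GB card, which is consistent with the two-GPU recommendations for Falcon-40B, while a 30B model fits on one 80GB card.
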