Recommended GPUs and GPU clouds for AI use-cases


Why does this exist? I wondered which GPU I should be using for Falcon-40B, for MPT-30B, and for running Stable Diffusion. For each, I wanted to know which option is best purely on performance, which gives the best balance between price and performance, and which is best at a lower price (note that at least for Stable Diffusion, you can definitely get it running on cheaper cards, too).

This way you don’t need to read all the model specs, look up GPU rankings and stats, check that a model actually runs on a given card, and then compare pricing across all the GPU clouds…

| Use case | Priority | GPU | Price/hr | Cloud |
| --- | --- | --- | --- | --- |
| Falcon-40B | 🏆 Performance | 2x H100 | N/A | No instant availability of 2x GPU instances |
| Falcon-40B | 👌 Price to performance ratio | 2x RTX 6000 Ada (not A6000 or RTX 6000) | $2.38 | ✅ Runpod |
| Falcon-40B | 🪙 Lower price | 2x A6000 | $1.58–$1.60 | ✅ Runpod, FluidStack, or Lambda |
| MPT-30B | 🏆 Performance | 1x H100 | $1.99 | ✅ FluidStack or Lambda |
| MPT-30B | 👌 Price to performance ratio | 1x H100 | $1.99 | ✅ FluidStack or Lambda |
| MPT-30B | 🪙 Lower price | 1x A100 80GB | $1.79 | ✅ Runpod |
| Stable Diffusion | 🏆 Performance | 1x H100 | $1.99 | ✅ FluidStack or Lambda |
| Stable Diffusion | 👌 Price to performance ratio | 1x RTX 4090 | $0.69 | ✅ Runpod |
| Stable Diffusion | 🪙 Lower price | 1x RTX 3090 (or 1x A5000) | $0.44 | ✅ Runpod |
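If you want to compare these options for your own workload, the hourly prices above convert directly into a per-job cost. Here is a minimal sketch using the table's prices; the 5-minute runtime in the example is a hypothetical placeholder, not a benchmark:

```python
# Hourly prices (USD/hr) taken from the summary table above.
PRICES = {
    "2x RTX 6000 Ada": 2.38,
    "2x A6000": 1.58,
    "1x H100": 1.99,
    "1x A100 80GB": 1.79,
    "1x RTX 4090": 0.69,
    "1x RTX 3090": 0.44,
}

def cost_per_job(price_per_hour: float, minutes_per_job: float) -> float:
    """Cost of a single job given the instance price and its runtime."""
    return price_per_hour * minutes_per_job / 60

# Hypothetical example: a 5-minute Stable Diffusion batch on a 4090.
print(round(cost_per_job(PRICES["1x RTX 4090"], 5), 4))  # 0.0575
```

Runtime varies by card, so the cheapest $/hr instance is not always the cheapest per job; benchmark your actual workload before committing.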

Detailed tables #

- Falcon-40B
- MPT-30B
- Stable Diffusion

GPU requirements #

| Use case | GPU requirements | Recommended card |
| --- | --- | --- |
| Running Falcon-40B | 85–100GB+ VRAM (Video RAM) | See Falcon-40B table |
| Running MPT-30B | 80GB VRAM for 16-bit precision | See MPT-30B table |
| Training LLaMA (65B) | “They had 8,000 Nvidia A100s at the time.” | Very large H100 cluster |
| Training Falcon (40B) | “384 A100 40GB GPUs” | Large H100 cluster |
| Fine-tuning an LLM (large scale) | “64 A100 40GB GPUs” | H100 cluster |
| Fine-tuning an LLM (small scale) | “4x A100 80GB” | Multi-H100 instance |
| Stable Diffusion image generation | “12GB+” or “16GB+” VRAM | See Stable Diffusion table |
| Whisper transcription | Minimal; can run on CPU | Any GPU for faster runs; see the Stable Diffusion table |
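The VRAM figures above follow from a simple back-of-the-envelope rule: weights take (parameter count × bytes per parameter), plus some headroom for activations and the KV cache. A minimal sketch, assuming a ~20% overhead factor (the overhead is a rough assumption, not a measured value):

```python
def estimate_vram_gb(params_billion: float, bits_per_param: int = 16,
                     overhead: float = 1.2) -> float:
    """Rough inference VRAM estimate in GB: weight memory scaled by an
    assumed ~20% overhead for activations and KV cache."""
    weight_gb = params_billion * bits_per_param / 8  # 1B params at 1 byte/param ~= 1 GB
    return weight_gb * overhead

# Falcon-40B at 16-bit: 80 GB of weights, ~96 GB total,
# which lines up with the 85-100GB+ requirement above.
print(round(estimate_vram_gb(40), 1))  # 96.0
```

Dropping to 8-bit or 4-bit quantization roughly halves or quarters the weight term, which is why the cheaper single-card options become viable for smaller or quantized models.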

Add comments on the Google Sheet here.