Recommended GPUs and GPU clouds for AI use-cases


Why does this exist? I wondered which GPU I should use for running Falcon-40B, MPT-30B, and Stable Diffusion. I also wanted to know which card would be best purely for performance, which would give the best balance of price and performance, and which would be best at a lower price (for Stable Diffusion at least, you can also get it running on cheaper cards).

This way you don’t need to read all the model specs, look up GPU rankings and stats, confirm that a model runs on a given card, and then compare pricing across all the GPU clouds.

Use case	Priority	GPU	Price/hr	Cloud
Falcon-40B	🏆 Performance	2x H100s	N/A	No instant availability of 2x GPU instances
Falcon-40B	👌 Price to performance ratio	2x RTX 6000 Ada (not A6000 or RTX 6000)	$2.38	✅ Runpod
Falcon-40B	🪙 Lower price	2x A6000	$1.58-$1.60	✅ Runpod, FluidStack, or Lambda
MPT-30B	🏆 Performance	1x H100	$1.99	✅ FluidStack or Lambda
MPT-30B	👌 Price to performance ratio	1x H100	$1.99	✅ FluidStack or Lambda
MPT-30B	🪙 Lower price	1x A100 80GB	$1.79	✅ Runpod
Stable Diffusion	🏆 Performance	1x H100	$1.99	✅ FluidStack or Lambda
Stable Diffusion	👌 Price to performance ratio	1x RTX 4090	$0.69	✅ Runpod
Stable Diffusion	🪙 Lower price	1x RTX 3090 (or 1x A5000)	$0.44	✅ Runpod
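The 2x counts in the table follow from simple arithmetic: divide a model's minimum VRAM by one card's VRAM and round up. The sketch below is illustrative only. It assumes the minimums from the GPU requirements table (85GB for Falcon-40B, 80GB for MPT-30B, and the higher quoted 16GB figure for Stable Diffusion) and the published VRAM size of each card. It also assumes the model can be split across cards so that their memory simply adds up, which ignores the extra overhead of multi-GPU inference. The names `CARD_VRAM_GB`, `MIN_VRAM_GB`, and `cards_needed` are hypothetical.

```python
import math

# Published VRAM per card, in GB.
CARD_VRAM_GB = {
    "H100": 80,
    "A100 80GB": 80,
    "RTX 6000 Ada": 48,
    "A6000": 48,
    "RTX 4090": 24,
    "RTX 3090": 24,
    "A5000": 24,
}

# Assumed minimum VRAM per use case, taken from the lower bound
# (or the higher of two quoted figures) in the GPU requirements table.
MIN_VRAM_GB = {
    "Falcon-40B": 85,
    "MPT-30B": 80,
    "Stable Diffusion": 16,
}


def cards_needed(use_case: str, card: str) -> int:
    """Smallest number of identical cards whose combined VRAM meets the minimum.

    Assumes the workload can be sharded across cards so VRAM adds up.
    """
    return math.ceil(MIN_VRAM_GB[use_case] / CARD_VRAM_GB[card])


for use_case, card in [
    ("Falcon-40B", "H100"),
    ("Falcon-40B", "A6000"),
    ("MPT-30B", "A100 80GB"),
    ("Stable Diffusion", "RTX 3090"),
]:
    print(f"{use_case} on {card}: {cards_needed(use_case, card)}x")
```

Under these assumptions, Falcon-40B needs two cards whether they are 80GB H100s or 48GB A6000s, while MPT-30B fits on a single 80GB A100 and Stable Diffusion on a single 24GB RTX 3090.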

Detailed tables

Falcon-40B

MPT-30B

Stable Diffusion

GPU requirements

Use case	GPU requirements	Recommended card
Running Falcon-40B	GPU with 85-100GB+ VRAM (Video RAM)	See Falcon-40B table
Running MPT-30B	80GB for 16-bit precision	See MPT-30B table
Training LLaMA (65B)	“They had 8,000 Nvidia A100s at the time.”	Very large H100 cluster
Training Falcon (40B)	“384 A100 40GB GPUs”	Large H100 cluster
Fine-tuning an LLM (large scale)	“64 A100 40GB GPUs”	H100 cluster
Fine-tuning an LLM (small scale)	“4x A100 80gb”	Multi-H100 instance
Stable Diffusion image generation	“12GB+” or “16GB+”	See Stable Diffusion table
Whisper transcription	Minimal; can run on a CPU	Also runs on a GPU; for faster transcription, see the Stable Diffusion table
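The inference requirements for the LLMs above can be roughly estimated as parameter count times bytes per parameter. This is a minimal sketch, assuming the nominal parameter counts in the model names (40B and 30B) and counting the weights only. Activations, the KV cache, and framework overhead come on top, which is consistent with Falcon-40B's roughly 80GB of 16-bit weights turning into an 85-100GB+ requirement, while MPT-30B's roughly 60GB fits on one 80GB card. `BYTES_PER_PARAM` and `weight_memory_gb` are hypothetical names, not part of any library.

```python
# Bytes needed to store one parameter at a given precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and framework overhead.
    """
    return n_params * BYTES_PER_PARAM[precision] / 1e9


# Nominal parameter counts taken from the model names (an approximation).
for name, n_params in [("Falcon-40B", 40e9), ("MPT-30B", 30e9)]:
    gb = weight_memory_gb(n_params, "fp16/bf16")
    print(f"{name} @ 16-bit: ~{gb:.0f} GB of weights")
```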
