How large is BLOOM exactly to load all the checkpoints into GPU RAM?

#127

by mishavee - opened Oct 22, 2022

Discussion

mishavee

Oct 22, 2022

How large is BLOOM exactly to load all the checkpoints into GPU RAM?
How much GPU RAM would be needed to load all the checkpoints and fine-tune it?

TimeRobber

BigScience Workshop org Oct 23, 2022

How large is BLOOM exactly to load all the checkpoints into GPU RAM?

You need about 352 GB of GPU RAM to load the weights in bfloat16.
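For reference, the arithmetic behind that figure can be sketched as below. This is a rough estimate only: it assumes a round 176 billion parameters and 2 bytes per parameter for bfloat16, and it counts the weights alone (no activations, no framework overhead). The function name is purely illustrative.

```python
# Rough estimate of the memory needed just to hold the model weights.
NUM_PARAMS = 176e9         # assumption: ~176 billion parameters (rounded)
BYTES_PER_PARAM_BF16 = 2   # bfloat16 stores each value in 2 bytes


def weights_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


print(weights_memory_gb(NUM_PARAMS, BYTES_PER_PARAM_BF16))  # 352.0
```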

How much GPU RAM would be needed to load all the checkpoints and fine-tune it?

You never need to load all the checkpoints at once. If you want to fine-tune, you have to take optimizer states into account. Luckily, you can try DeepSpeed ZeRO offload, which essentially moves part of the memory footprint elsewhere (either CPU RAM or disk). @stas has written great documentation on how to use it in transformers: https://huggingface.co/docs/transformers/main_classes/deepspeed
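To give a feel for why optimizer states matter, here is a minimal sketch. It assumes the commonly quoted accounting for mixed-precision Adam of roughly 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for fp32 master weights plus the two Adam moments), and it ignores activations entirely. The config dictionary uses DeepSpeed's ZeRO stage 3 offload keys, but the specific values (CPU as the offload target, micro-batch size 1) are illustrative assumptions, not a tested recipe for BLOOM.

```python
import json

NUM_PARAMS = 176e9  # assumption: ~176 billion parameters (rounded)
# Assumption: mixed-precision Adam, ~16 bytes of model state per parameter
# (2 weights + 2 gradients + 12 fp32 master weights and Adam moments).
BYTES_PER_PARAM_TRAINING = 16


def finetune_state_gb(num_params: float) -> float:
    """Weights + gradients + optimizer states in GB, excluding activations."""
    return num_params * BYTES_PER_PARAM_TRAINING / 1e9


# Illustrative DeepSpeed config: ZeRO stage 3 with parameters and optimizer
# states offloaded to CPU RAM. Use "nvme" plus an "nvme_path" entry to
# offload to disk instead.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,  # illustrative value
}

print(finetune_state_gb(NUM_PARAMS))  # 2816.0
print(json.dumps(ds_config, indent=2))
```

Under these assumptions the model states alone come to roughly 2.8 TB, which is why the offload target (CPU RAM or disk) has to be large even when the GPU count is small.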

mishavee

Oct 23, 2022

edited Oct 23, 2022

So what is the smallest number of A100 80GB GPUs I need if I use DeepSpeed ZeRO offload?

TimeRobber

BigScience Workshop org Oct 23, 2022

So what is the smallest number of A100 80GB GPUs I need if I use DeepSpeed ZeRO offload?

The very minimum is probably going to be 1 A100. It's going to be very slow, but it's going to run. Offloading just means that it's going to use the CPU memory / disk space as additional memory so that you're not going to go out of memory.
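For comparison, here is a back-of-the-envelope count of how many 80 GB GPUs it would take to keep the bfloat16 weights entirely in GPU memory, with no offloading. It assumes the 352 GB figure above and that every byte of each GPU is usable for weights, which is optimistic, so treat the result as a lower bound.

```python
import math

WEIGHTS_GB = 352.0   # bfloat16 weights, as estimated earlier in the thread
GPU_MEMORY_GB = 80   # A100 80GB; assumes all of it is usable for weights


def min_gpus_for_weights(weights_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on the GPU count needed to hold the weights without offload."""
    return math.ceil(weights_gb / gpu_memory_gb)


print(min_gpus_for_weights(WEIGHTS_GB, GPU_MEMORY_GB))  # 5
```

With offloading, that constraint moves from GPU memory to CPU RAM or disk, which is why a single GPU can work at the cost of speed.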
