LocalLLaMA

11 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Speed and energy use - RAM vs Mac Studio vs RTX cards (alien.top)

submitted 2 years ago by EvokerTCG@alien.top to c/localllama@poweruser.forum

5 comments fedilink hide all child comments

So I'm interested in applications that require memory more than speed, with high quality and a big context. I'm talking 100GB or more. Speed is still an important consideration. I don't need snappy conversations, but getting through more stuff 'overnight' is still valuable.

3090s are affordable, but it would take 4 to 8 to get into the big memory category, and the primary issue is energy use. For batch use the PC could shut down after finishing, so idle power use wouldn't be an issue. Are there motherboards that can completely shut off power to extra cards when they aren't needed?

Mac Studio M2 Ultra can get 192GB of unified memory, with about 140GB usable. This isn't as fast, obviously, but is meant to be acceptable for many applications.

What about PCs/servers with lots of mainboard RAM? Is this way slower than the Macs due to different architecture? If not it's probably a lot cheaper. The CPU would need to do all the work, and I don't know about how the energy efficiency would compare.

I would be grateful if anyone has data comparing speeds or joules per token for these broad options.

top 5 comments

sorted by: hot top controversial new old

[–] fnbr@alien.top 1 points 2 years ago (1 children)

How important is local processing for you? It might be worth looking into renting a cloud server. Datacenter GPUs, like the A/H100s, have much more memory. Could be better bang for your buck if all you care about it throughput.

[–] EvokerTCG@alien.top 1 points 2 years ago

A valid option. I haven't looked into prices for renting but it could make sense unless I will use it a lot.

[–] Ruin-Capable@alien.top 1 points 2 years ago (1 children)

What model are you going to run that can accept 100GB of context?

[–] EvokerTCG@alien.top 1 points 2 years ago (1 children)

I meant in total, but there do seem to be models with up to 100GB for context, like 01-ai/Yi-34B-200K.

[–] Ruin-Capable@alien.top 1 points 2 years ago

Ooh... now I've got another model to play with. :D