EvokerTCG

joined 11 months ago
 

I'm pricing out a large-RAM CPU build and want to share my ideas and get advice. I've seen others here say they've done something similar, but without details.

So starting with the motherboard: I want a single-CPU board with 12 memory channels and room for a GPU. The only one I can find is this Supermicro for around $900. https://www.supermicro.com/en/products/motherboard/h13ssl-n

Gigabyte does have a 12-channel board, but it has 24 DIMM slots, which would get in the way of a GPU. Are there other good options?

For RAM, 32GB DDR5-4800 server memory is about $140 per stick, so $1,680 for 12, giving 384GB.

For the CPU, the EPYC 9354 should be more than sufficient, at around $3,000.
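As a sanity check, the cost and theoretical peak bandwidth for this build work out like this (the per-stick, board, and CPU prices are the rough estimates above, not quotes):

```python
# Rough build-cost and bandwidth math for the 12-channel DDR5-4800 setup.
dimm_price, dimm_count, dimm_gb = 140, 12, 32
board_price, cpu_price = 900, 3000

total_ram_gb = dimm_count * dimm_gb        # 384 GB
ram_cost = dimm_count * dimm_price         # $1,680
build_cost = ram_cost + board_price + cpu_price

# DDR5-4800 moves 4800 MT/s * 8 bytes per channel = 38.4 GB/s per channel.
channels = 12
bandwidth_gbs = channels * 4800 * 8 / 1000  # ~460.8 GB/s theoretical peak

print(f"{total_ram_gb} GB for ${build_cost}, ~{bandwidth_gbs:.0f} GB/s peak")
```

Real sustained bandwidth will land somewhere below that theoretical peak, but it gives a ballpark for comparing against GPU and Mac options.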

[–] EvokerTCG@alien.top 1 points 11 months ago

Thanks. I can't find 'qualification samples' on ebay in the UK, unless you just find them through a serial number or something.

The DDR5 RAM is more expensive, but it should hold its value fairly well. I'll look for a 12-channel board.

[–] EvokerTCG@alien.top 1 points 11 months ago (2 children)

I want to keep my options open, and potentially have a large context, which can add up to 100GB to memory requirements.

I'm considering a single Genoa CPU with 12 channels. Something like the 9354 would be more than enough cores. I might start with a cheaper DDR4 machine first, though.

How was it getting the EPYC machine set up? Are you using Windows? What about a GPU?

[–] EvokerTCG@alien.top 1 points 11 months ago

Not true from what I've read here.

 

Continuing my quest to choose a rig with lots of memory, one possibility is dual-socket motherboards. Gen 1 to 3 EPYC chips have 8 channels of DDR4 each, giving 16 memory channels in total. That's good bandwidth, even if it doesn't beat GPUs, and it allows far more memory (up to 1024GB). Builds with 64+ threads can be pretty cheap.
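The bandwidth math for that dual-socket DDR4 setup, and what it implies for token speed, sketches out roughly like this (the 40GB model size is an assumed example, e.g. a ~70B model at 4-bit):

```python
# Back-of-envelope: dual-socket DDR4-3200, 8 channels per CPU.
# Token generation is roughly memory-bandwidth-bound: each token reads
# every model weight once, so peak tok/s <= bandwidth / model size.
channels = 2 * 8
bw_gbs = channels * 3200 * 8 / 1000   # 409.6 GB/s aggregate peak

model_gb = 40                         # assumed: ~70B model at 4-bit
ceiling_tps = bw_gbs / model_gb       # ~10 tok/s upper bound
print(f"{bw_gbs:.1f} GB/s -> at most ~{ceiling_tps:.1f} tok/s")
# Real numbers land well below this: NUMA cross-socket traffic means the
# two sockets' bandwidth doesn't simply add for a single model.
```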

My questions are

  • Does the dual CPU setup cause trouble with running LLM software?
  • Is it reasonably possible to get Windows, drivers, etc. working on 'server' architecture?
  • Is there anything else I should consider vs going for a single EPYC or Threadripper Pro?
 

So I'm looking into Threadripper Pro systems, which offer pretty good memory bandwidth (8 channels) and can take a huge amount of RAM. (I could put a 3090 or two in there too.)

I'm wondering how much the core count will affect performance. For example, the 5955WX has 16 cores while the 5995WX has 64, but both use the same memory. There's little point spending extra if the limiting factor is somewhere else.
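A very rough way to check whether cores or memory bandwidth is the bottleneck (the model size and per-core throughput figures here are illustrative guesses, not benchmarks):

```python
# Rough check on whether 16 vs 64 cores matters when both share the same
# 8-channel DDR4-3200 memory. All figures are illustrative assumptions.
bw_gbs = 8 * 3200 * 8 / 1000          # 204.8 GB/s peak for 8 channels
model_gb = 40                         # assumed: ~70B model at 4-bit
bandwidth_tps = bw_gbs / model_gb     # ~5 tok/s memory-side ceiling

# Compute side: ~2 FLOPs per weight per token, and say ~50 GFLOPS of
# sustained throughput per core for this workload (rough guess).
params = 70e9
flops_per_token = 2 * params
per_core_flops = 50e9
cores_needed = flops_per_token * bandwidth_tps / per_core_flops
print(f"~{cores_needed:.0f} cores saturate the memory bus")
```

Under these assumptions, somewhere in the mid-teens of cores already saturates the memory bus, which would suggest the 16-core part is the better value for pure inference. The per-core number is the shakiest input, so treat this as a way to frame the question rather than an answer.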

[–] EvokerTCG@alien.top 1 points 11 months ago

Aside from repetition, isn't this effectively a new sampling method? You could call it Fuzzed Greedy Sampling.

[–] EvokerTCG@alien.top 1 points 11 months ago (1 children)

I meant in total, but there do seem to be models with up to 100GB for context, like 01-ai/Yi-34B-200K.

[–] EvokerTCG@alien.top 1 points 11 months ago

A valid option. I haven't looked into rental prices, but it could make sense unless I end up using it a lot.

 

So I'm interested in applications that need memory more than speed, with high quality and a big context. I'm talking 100GB or more. Speed is still a consideration: I don't need snappy conversations, but getting through more work 'overnight' is still valuable.

3090s are affordable, but it would take 4 to 8 of them to reach the big-memory category, and the primary issue is energy use. For batch use the PC could shut down after finishing, so idle power wouldn't be an issue. Are there motherboards that can completely cut power to extra cards when they aren't needed?
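The card count and load power for the 3090 route are easy to pin down (the 350W figure is a rough nameplate number for a 3090 under load, not a measurement):

```python
import math

# How many 24 GB cards are needed for ~100 GB of weights + context,
# and roughly what that draws at load.
target_gb, card_gb = 100, 24
cards = math.ceil(target_gb / card_gb)   # 5 cards minimum
power_w = cards * 350                    # ~350 W per 3090 under load (assumed)
print(f"{cards} cards, ~{power_w} W at full load")
```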

A Mac Studio M2 Ultra can have 192GB of unified memory, with about 140GB usable. It isn't as fast, obviously, but is said to be acceptable for many applications.

What about PCs/servers with lots of mainboard RAM? Is this much slower than the Macs due to the different architecture? If not, it's probably a lot cheaper. The CPU would need to do all the work, and I don't know how the energy efficiency would compare.

I would be grateful if anyone has data comparing speeds or joules per token for these broad options.
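For anyone comparing, joules per token is just sustained watts divided by tokens per second. A quick sketch of the calculation with placeholder numbers (none of these watt or tok/s figures are measured; they're stand-ins to show the shape of the comparison):

```python
# Joules per token = sustained power (W) / throughput (tok/s).
# All numbers below are placeholder assumptions, not benchmarks.
rigs = {
    "multi-3090": (1500, 15.0),   # (watts, tok/s) - assumed
    "M2 Ultra":   (150,  5.0),    # assumed
    "EPYC CPU":   (400,  4.0),    # assumed
}
for name, (watts, tps) in rigs.items():
    print(f"{name}: {watts / tps:.0f} J/token")
```

With numbers like these the GPU rig and the CPU rig can end up with similar energy per token despite very different wall power, because the GPUs finish faster. Real measurements would obviously be better.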

[–] EvokerTCG@alien.top 1 points 11 months ago (1 children)

Thanks. I would guess the seqlen is the sum of the input and output length as it feeds back on itself.

[–] EvokerTCG@alien.top 1 points 11 months ago

Thanks. Yes, a 2kW heater pc would only be welcome in the winter, and could get pricy to run.

 

I understand that more memory means you can run a model with more parameters or less compression, but how does context size factor in? I believe it's possible to increase the context size, and that this will increase the initial processing time before the model starts outputting tokens, but does anyone have numbers?

Is the memory needed for context independent of the model size, or does a bigger model mean that each bit of extra context 'costs' more memory?
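For what it's worth, the context ('KV cache') memory grows linearly with context length, and the per-token cost depends on the model's layer count and width, so a bigger model does make each token of context cost more. A sketch using Llama-2-70B's published shape (80 layers, 8 KV heads of dimension 128, thanks to grouped-query attention) with an fp16 cache:

```python
# KV-cache size: keys and values (2x) for every layer, at every position.
# Defaults are Llama-2-70B's shape: 80 layers, 8 KV heads, head dim 128.
def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

per_token_kb = kv_cache_bytes(1) / 1024
gb_at_32k = kv_cache_bytes(32768) / 1024**3
print(f"{per_token_kb:.0f} KB/token, {gb_at_32k:.1f} GB at 32k context")
```

So for this model each token of context costs about 320 KB, or roughly 10GB at a 32k context. Models without grouped-query attention cache all heads, which multiplies the cost several times over.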

I'm considering an M2 Ultra for the large memory and low energy per token, although the speed is behind RTX cards. Is this the best option for tasks like writing novels, where quality and comprehension of lots of text beat speed?

 

So I'm considering getting a good LLM rig, and the M2 Ultra seems to be a good option for large memory, with much lower power usage and heat than 2 to 8 3090s or 4090s, albeit at lower speeds.

I want to know if anyone is using one, and what it's like. I've read that it's less well supported by software, which could be an issue. Also, is it good for Stable Diffusion?

Another question is about memory and context length. Does a big memory let you increase the context length with smaller models, where the parameters don't fill the memory? I feel a big context would be useful for writing books and the like.

Is there anything else to consider? Thanks.

[–] EvokerTCG@alien.top 1 points 11 months ago

I haven't tried Mac and don't know what the software ecosystem is like. Have you tried it or seen it working?

It looks like it doesn't have dedicated VRAM, just memory shared with the CPU. I would guess this is slower than dedicated GPU memory but faster than RAM sticks in a normal PC?

 

So for background, I've had some interest in LLMs and other AI for a year or so. I've used online LLMs like ChatGPT but haven't tried running my own due to 10-year-old hardware. I'm considering getting a new PC and want to know whether to splash out on one that can do high-end LLM stuff.

I've read up a fair bit but have some questions that hopefully aren't too stupid.

1.) It looks like VRAM is the biggest hardware limit for model size. What are some good hardware options at different price points? Are there really expensive options that blow consumer stuff out of the water? Is now a good time to buy, or is there something worth waiting for?
2.) Open-source models seem to depend on the trainers giving away their expensively acquired work. Are you anticipating model releases to replace Llama 2, and when?
3.) Is retraining or fine-tuning possible for ordinary users? Is this meaningfully different from having a 'mission' or instruction set prepended to each prompt/context?
4.) I think I understand parameter size and compression, but what determines the token context size a model can handle? GPT-4's new massive context size is very handy.
5.) I'm interested in 'AutoGPT'-type systems (or response + validation, etc.). Can this work in series mode, where only one model runs at a time? It seems like having specialised models could be useful. Would loading the model best suited to each particular 'subroutine' slow things down a lot? Are these systems difficult to set up, or is it just a matter of feeding the output of one query into the input of the next (while adding relevant previous context)?
6.) Is the same type of hardware setup good for both LLMs and Stable Diffusion, or do they have separate setups for good bang/buck?
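On the series-mode 'AutoGPT' question above, the chaining itself really is just feeding one step's output (plus relevant context) into the next. A minimal sketch, where `run_model` is a hypothetical stand-in for whatever local inference call you end up using (here it just echoes for demonstration):

```python
# Series-mode pipeline: one model at a time, each step's output feeding
# the next. `run_model` is a hypothetical placeholder, not a real API.
def run_model(name, prompt):
    return f"[{name} output for: {prompt[:40]}]"

def pipeline(task, steps):
    context = task
    for model_name, instruction in steps:
        # Prepend the step's instruction, carry forward the prior output.
        context = run_model(model_name, f"{instruction}\n\n{context}")
    return context

result = pipeline("Write an outline for a mystery novel.",
                  [("planner", "Break this task into steps:"),
                   ("writer", "Draft the result:"),
                   ("critic", "Validate and fix issues:")])
print(result)
```

The setup cost isn't in the chaining logic but in swapping models: loading a different large model per 'subroutine' means reloading tens of GB from disk each step unless everything fits in memory at once.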

Many thanks to anyone who can help!