FaustBargain

joined 10 months ago
[–] FaustBargain@alien.top 1 points 9 months ago

So there are CPU intrinsics for prefetching data. If you can get better at anticipating the next pieces of data that need to be computed, you can sprinkle in those prefetch instructions and pick up some speed.
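For concreteness, a minimal sketch in C using GCC/Clang's `__builtin_prefetch` builtin (x86 also exposes `_mm_prefetch` in `<xmmintrin.h>`); the loop, the lookahead distance, and the function name are all illustrative, not taken from any particular inference engine:

```c
#include <stddef.h>

/* Sketch: software prefetching in a streaming reduction.
 * The lookahead of 16 floats (one 64-byte cache line) is a
 * tuning knob you'd have to measure, not a magic number. */
float sum_with_prefetch(const float *data, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            /* args: address, rw (0 = read), locality (3 = keep in all cache levels) */
            __builtin_prefetch(&data[i + 16], 0, 3);
        sum += data[i];
    }
    return sum;
}
```

Whether this helps depends on how well the hardware prefetcher already predicts the access pattern; irregular accesses (e.g. gathers driven by token ids) are where explicit hints tend to pay off.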

[–] FaustBargain@alien.top 1 points 9 months ago

If you have the RAM, don't worry about disk at all. If you have to drop to any kind of disk, even a gen 5 SSD, your speeds will tank. Memory bandwidth matters so much more than compute for LLMs.

That said, it all depends on your needs. There are probably cheaper ways to go about this if you only need something occasionally, maybe runpod or similar. If you need a lot of inference, then running locally could save you money, but renting a big machine with A100s will always be faster. So: will a 7B model do what you need, or do you need the accuracy and comprehension of a 70B or one of the new 120B merges? Also, llama3 is supposed to be out in jan/feb, and if it's significantly better then everything changes again.
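Back-of-envelope for why bandwidth dominates: generating one token streams essentially the whole weight set through memory once, so tokens/sec is bounded by bandwidth divided by model bytes. A sketch, where the 200 GB/s figure is an assumption (roughly 8-channel DDR4-3200), not a measurement:

```c
#include <stdio.h>

/* Upper bound on generation speed: bandwidth / weight bytes.
 * 200 GB/s is an assumed figure (~8-channel DDR4-3200);
 * real throughput will come in lower. */
int main(void) {
    double bw_gbs = 200.0;
    double params_b[] = {7.0, 70.0, 120.0};   /* billions of params */
    double bytes_per_w = 2.0;                 /* fp16; ~0.5 for 4-bit */

    for (int i = 0; i < 3; i++) {
        double gb = params_b[i] * bytes_per_w; /* weight bytes in GB */
        printf("%4.0fB fp16 (%.0f GB): <= %.1f tok/s\n",
               params_b[i], gb, bw_gbs / gb);
    }
    return 0;
}
```

That works out to roughly 14 tok/s for a 7B and 1.4 tok/s for a 70B at fp16, which is why quantization and faster memory buy you more than extra cores here.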

[–] FaustBargain@alien.top 1 points 9 months ago

"organisations"...

[–] FaustBargain@alien.top 1 points 9 months ago

Wait, the 100B one says it's based on llama2-chat? Did they take the llama 2 foundation model, up the parameter count, and just continue training?

[–] FaustBargain@alien.top 1 points 9 months ago (2 children)

How much RAM do you think the 600B would take? I have 512GB and I can fit another 512GB in my box before I run out of slots. I think with 1TB I should be able to run it unquantized, because falcon 180b used slightly less than half my RAM.
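For a rough sanity check (weights only, ignoring KV cache and activations, which add on top): weight memory ≈ params × bytes per param.

```c
#include <stdio.h>

/* Rough sizing for a hypothetical 600B model: weights only,
 * ignoring KV cache / activation overhead. */
int main(void) {
    double params = 600e9;
    double bpp[] = {2.0, 1.0, 0.5};           /* fp16, int8, ~4-bit */
    const char *label[] = {"fp16", "int8", "4-bit"};

    for (int i = 0; i < 3; i++)
        printf("600B @ %-5s: ~%.0f GB of weights\n",
               label[i], params * bpp[i] / 1e9);
    return 0;
}
```

That puts fp16 at ~1200 GB of weights alone, so squeezing it into 1TB truly unquantized would be tight; 8-bit (~600 GB) fits comfortably.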

[–] FaustBargain@alien.top 1 points 9 months ago (1 children)

Qwen 72b

I can't seem to find anything about Qwen 72B except two tweets from a month ago saying it was coming out. Who makes it? What's it trained on? Any details?

[–] FaustBargain@alien.top 1 points 10 months ago

sounds like me ;) Thanks!

[–] FaustBargain@alien.top 1 points 10 months ago (3 children)

My setup:

EPYC Milan-X 7473X 24-Core 2.8GHz 768MB L3

512GB of HMAA8GR7AJR4N-XN HYNIX 64GB (1X64GB) 2RX4 PC4-3200AA DDR4-3200MHz ECC RDIMMs

MZ32-AR0 Rev 3.0 motherboard

6x 20TB WD Red Pros on ZFS with zstd compression

SABRENT Gaming SSD Rocket 4 Plus-G with Heatsink 2TB PCIe Gen 4 NVMe M.2 2280

You can probably get away with a non-X part without any real performance difference. It might make a difference on very tiny models, but that's not the point of getting such a beastly machine.

I got the Milan-X because I also use it for CAD, circuit board design, gaming, and video editing, so it's an all-in-one for me.

Also, my electric bill went from $40 a month to $228 a month, but some of that is because I haven't set up the suspend states yet and the machine isn't sleeping the way I want it to; I just haven't gotten around to it. I imagine that would cut the bill in half, and choosing the right fan manager and CPU governors might save me another $30 a month.

I can run falcon 180b unquantized and still have tons of RAM left over.

[–] FaustBargain@alien.top 1 points 10 months ago (1 children)

If llama3 can get really close to gpt4 after the best finetunes, then I could do some powerful autonomous agent stuff with it.

[–] FaustBargain@alien.top 1 points 10 months ago (1 children)

If it's a company, that could be a drop in the bucket.