this post was submitted on 25 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


This is the reason you can't find one at your local Best Buy. They are paying a premium for them. But it would indeed be very helpful if I could get my hands on a few for my build.

[–] trailer_dog@alien.top 1 points 10 months ago (2 children)

China needs to start pumping out their own dedicated AI accelerator cards. I'm sick of Nvidia's VRAM business model. Having to run multiple giant GPUs in parallel instead of simply soldering more RAM chips onto the board is extremely wasteful.

[–] fallingdowndizzyvr@alien.top 1 points 10 months ago (1 children)

China needs to start pumping out their own dedicated AI accelerator cards.

They already do. Or should I say would.

https://www.tomshardware.com/news/chinese-biren-rolls-out-new-gpus-with-77-billion-transistors-2-pflops-of-ai-performance

The problem for China is that, like the US, they have to have Taiwan build their chips for them. And the US has told Taiwan in no uncertain terms that they'd better not build chips for China, or else. So even though they have their own designs, they can't get them built. Or should I say they couldn't, since China pulled a July surprise when Huawei released domestically made 7nm chips. 7nm is the node the BR100 is designed to be made on. It was thought that China wouldn't be able to make 7nm chips for at least another 5 years.

Having to run multiple giant GPUs in parallel instead of simply soldering more RAM chips onto the board is extremely wasteful.

That's not what they do. They use custom PCBs. Even the 16GB RX580 is not a piggybacked 8GB RX580. They basically harvest the chips from an existing card and then solder them onto a newly designed PCB.

[–] MrArborsexual@alien.top 1 points 9 months ago (1 children)

I mean, Taiwan isn't exactly jumping at the chance to build advanced chips for the rogue provinces of West Taiwan, which constantly warn of their impending violent invasion of Taiwan.

Also, their yields at 7nm are apparently terrible, assuming they are being honest about the actual production. A few years ago China put out news of a home-grown x86 chip that was on par with Zen 1 or something like that, and it turned out they had just repackaged old chips.

[–] fallingdowndizzyvr@alien.top 1 points 9 months ago

I mean, Taiwan isn't exactly jumping at the chance to build advanced chips for the rogue provinces of West Taiwan, which constantly warn of their impending violent invasion of Taiwan.

Ah... then why are they trying to convince the US to give them a license to run factories in China? TSMC, you know, those Taiwanese chipmakers, want the US to give them a permanent license to run factories in those "rogue provinces of West Taiwan".

You know what's a good way to keep someone from invading you? Be so crucial that they don't even want to risk damaging you in any way. They would be making chips for them right now if the US allowed it.

[–] mcmoose1900@alien.top 1 points 10 months ago (1 children)

I'm sick of Nvidia's VRAM business model

At the top end, they are actually limited by how much they can physically hang off the die (48GB for current silicon, or 196GB(?) for the interposer silicon).

But yeah, below that it's price gouging. What are ya gonna do, buy an Arc?

AMD is going along with this game too. You'd see a lot more 7900s on this sub, and on GitHub, if AMD let their manufacturers double up the VRAM to 48GB.

[–] bassoway@alien.top 1 points 10 months ago

VRAM is not located on the GPU die

[–] metaprotium@alien.top 1 points 10 months ago

Man, I bet they're saving so much money too.

[–] ElectroFried@alien.top 1 points 10 months ago (2 children)

It is not just the 4090s getting vacuumed up. A large amount of the post-crypto-crash stock has been snapped up for AI use. The used market only a year ago was flooded with things like MI25s and above that were being liquidated. Even the P40s have started to become more expensive. When I purchased mine a year ago it was just over $150 USD; now you will struggle to find one for under $250. For some reason the AMD compute-capable cards coming out of China seem to be even scarcer than the Nvidia ones. I highly suspect there is some AMD-specific pipeline in use that has not become public knowledge yet.

People are starting to really come to grips with the idea that 'this time is different' when it comes to the AI boom, and it is starting to really impact GPU pricing and availability. The only upside compared to the crypto boom, I guess, is that for AI use cases PCIe bus speeds matter, and this is stopping people from buying anything and everything and slapping 8 GPUs into an AI mining rig.

Things are only going to get worse from here, though. Nvidia and AMD are both too caught up in the server space right now to bother with consumer offerings that might compete. The average gamer is not going to demand more than the existing 24GB on their GPU, as games simply do not need more at current resolutions. That leaves the limited workstation market, and those cards have always come with a premium. The Pascal-based Quadro cards are still selling for twice as much as a P40 and show no sign of coming down. They are not going to rush out and drop an "RTX AI Card" like they did with crypto, because the server market would snap them up to drive lower-speed training and inference farms.

[–] azriel777@alien.top 1 points 10 months ago (2 children)

China has a lot of used crypto GPU farms where you had racks of GPUs chugging away at crypto crunching. How hard would it be to convert them for AI use?

[–] ElectroFried@alien.top 1 points 10 months ago

That 'depends'. Most of the crypto farms run on low-cost motherboard/CPU combos with 8+ GPUs essentially connected via a single PCIe lane each. If you wanted to do training or even inference on that, you would need to relocate those GPUs to a more capable system and then limit yourself to at most 4 cards per system. At which point, if you are talking about cards with 8GB or less VRAM, you have an expensive-to-run-and-set-up system with 32GB of VRAM and fairly low performance. That is why the 16GB+ cards are all disappearing.

[–] fallingdowndizzyvr@alien.top 1 points 10 months ago

It depends on what you do with it. I think they can be very useful. Check my post elsewhere in this thread.

https://www.reddit.com/r/LocalLLaMA/comments/183na9z/china_is_retrofitting_consumer_rtx4090s_with_2/kasawk5/

[–] fallingdowndizzyvr@alien.top 1 points 10 months ago

The used market only a year ago was flooded with things like MI25s and above that were being liquidated.

The MI25 is finally getting the love it deserves. I wish I had bought more when they were $65-$70 a few months ago, but I was hoping they would go lower. Even last month or so, I think I saw that they were $90. Right now, I just checked before posting, and the seller with the most stock is selling them for $160. Crazy.

By the way, the one I got is in really good shape. As in really good. If the seller told me it was new, I would believe it. There's not a speck of dust on it, like nowhere, and I looked deep into the fins of the heatsink. Even the fingers on the slot connector looked basically new.

The only upside compared to the crypto boom, I guess, is that for AI use cases PCIe bus speeds matter, and this is stopping people from buying anything and everything and slapping 8 GPUs into an AI mining rig.

I don't think that's true across the board. It really depends what you do with them. I can think of a couple of uses off the top of my head where 8 GPUs sitting on janky PCIe 1x links would be fine.

  1. Use them as a team. Nothing says you can only use them to infer one large model. You can run eight 7B-13B models, one model per card. The 1x speed wouldn't really matter in that case once the models are loaded. Having a team of small models instead of one large model is a valid way to go (see the sketch at the end of this comment).

  2. Batch process 8 different prompts on a large model spread across the GPUs. Since inference is sequential, only 1 GPU is active at a time when processing a single prompt; the other 7 GPUs sit idle. Don't let them idle. Batch it. Process 8 or more prompts at the same time. Once the batch is full, all 8 GPUs will be running. Sure, the t/s for any one prompt won't be fast, but the overall throughput across all the prompts will be. It's best to keep the prompts coming, and thus the batch full, so all GPUs stay busy. A good application for this is a server inferring multiple prompts from multiple users, or multiple prompts from the same user, or even the same prompt 8 different times. Since you can ask the same model the same question 8 times and get 8 different answers, let it process the prompt 8 times and pick the best answer.

  3. There are techniques that allow inference itself to be parallelized. Those may run great on a mining rig with 8 GPUs.

So it's far from useless to repurpose an old mining rig. You just have to be creative.
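For what it's worth, here's a minimal sketch of idea #1 (my own illustration, not something from the thread): pin one copy of a small model to each visible GPU and treat the cards as independent workers. The model id, device count, and generation settings are all placeholder assumptions.

```python
# Sketch: one small model per GPU, used as a "team" of independent workers.
# Assumes transformers + torch are installed, several CUDA devices are visible,
# and the chosen model fits in a single card's VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(MODEL)
models = [
    AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16).to(f"cuda:{i}")
    for i in range(torch.cuda.device_count())
]

def generate(prompt: str, gpu: int) -> str:
    # The slow 1x PCIe link mostly matters during the initial weight load;
    # token generation afterwards stays on the assigned card.
    inputs = tokenizer(prompt, return_tensors="pt").to(f"cuda:{gpu}")
    out = models[gpu].generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

No single answer gets faster this way, but total throughput scales with the number of cards, which is the whole point of keeping a rig like that busy.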

[–] georgejrjrjr@alien.top 1 points 10 months ago

They used to be on eBay. They’re still listed on Alibaba.

[–] nexusjuan@alien.top 1 points 10 months ago (1 children)

I'm running a Tesla M40 12GB and I'm real close to pulling the trigger on a 24GB one. I also have one of the Tesla P4s in my server. With the M40 I can fully offload 13B models to VRAM.
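Back-of-the-envelope math on why a 13B fits in 12GB, assuming a ~4-bit quantized model (my assumption; none of these numbers come from the comment itself):

```python
# Rough VRAM estimate for a quantized 13B model on a 12GB card (assumed numbers).
params_b = 13.0          # billions of parameters
bits_per_weight = 4.5    # roughly a Q4_K_M-style quantization
weights_gb = params_b * bits_per_weight / 8   # ~7.3 GB of weights
overhead_gb = 2.0        # KV cache, scratch buffers, context (rough guess)
print(f"~{weights_gb + overhead_gb:.1f} GB needed vs 12 GB available")  # ~9.3 GB
```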

[–] thebliket@alien.top 1 points 10 months ago

How does an M40 compare with an A4000?

[–] thebliket@alien.top 1 points 10 months ago (1 children)

why are they getting 4090s when 3090s have the same 24gb memory?

[–] fallingdowndizzyvr@alien.top 1 points 9 months ago (2 children)

Because 4090s are faster. Companies don't use these things for inference like most people do at home. That's low compute and basically memory-bandwidth bound. Companies use these for training, which is high compute, and a 4090 is much faster than a 3090.

And they are busy putting 48GB on those 3090s.

https://www.techpowerup.com/img/erPhoONBSBprjXvM.jpg

[–] alexgand@alien.top 1 points 9 months ago

What? Was the 48GB 3090 ever available to consumers?

[–] NickUnrelatedToPost@alien.top 1 points 9 months ago (1 children)

Although I don't doubt you, the rendering looks as fake as it gets.

But is there a way to frankenstein more RAM onto an existing 3090? Are there shops I could send mine to?

[–] fallingdowndizzyvr@alien.top 1 points 9 months ago

Although I don't doubt you, the rendering looks as fake as it gets.

Here's the attribution for that image, "Social media is abuzz with a screengrab of a regional webpage of the NVIDIA website purporting a "GeForce RTX 3090 CEO Edition" graphics card. "

So tell Nvidia to up their rendering game. They should know a little something about graphics, or at least know someone who does.

But is there a way to frankenstein more RAM onto an existing 3090? Are there shops I could send mine to?

Supposedly those exact frankensteins are available in China. A poster here on this sub has reported buying some. If you were in China, you could take your 3090 to any of the endless Chinese tech-center booths with dudes who have the skills and equipment to try it. I would ask if they've done it before, though. You don't want to be the one they learn on.

[–] No-Activity-4824@alien.top 1 points 9 months ago

The entire country of China? The 1 billion people 😁?

[–] Opteron67@alien.top 1 points 9 months ago

Why not waterblocks??