this post was submitted on 07 Dec 2023
284 points (97.3% liked)
Technology
Does GPT4all not allow that? Or do you have specific other models?
I haven't really looked into it, but I'm not interested in competing with gamers for GPUs, so if AMD can do a Tesla equivalent with gobs of RAM and no display hardware, I'd be all about it.
Right now it's looking like I'm going to build a server with a pair of K80s off eBay for a hundred bucks, which will give me 48GB of VRAM to run models in.
That segment of the market is less price-sensitive than gamers, which is why Nvidia is demanding the prices that they are for it.
An Nvidia H100 will give you 80GB of VRAM, but you'll pay $30,000 for it.
AMD competing more strongly with Nvidia in that sector will improve pricing, but I doubt very much that it's going to make compute cards cheaper than GPUs.
Besides, if you did wind up with compute cards being cheaper, you'd have gamers just rendering frames on compute cards and then using something else to push the image to the screen. I know that Linux can do that with PRIME, and I assume that Windows can as well. That'd cause their attempt to split the market by price to fail. Nah, they're going to split things up by amount of VRAM on the card, not by whether there's a video interface on it.
I suspect that a better option is to figure out ways to reasonably split up models to run on lower-VRAM GPUs in parallel.
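To make the idea of splitting a model across lower-VRAM GPUs a bit more concrete, here's a rough sketch of the planning step only: assigning consecutive layers to GPUs in order so that each GPU's share fits in its memory. The function name `split_layers`, the per-layer sizes, and the 12GB budgets are all made up for illustration. A real setup also needs headroom for activations and has to move data between cards, which this ignores.

```python
from typing import List


def split_layers(layer_sizes_gb: List[float], gpu_vram_gb: List[float]) -> List[List[int]]:
    """Assign consecutive layers to GPUs in order, filling each GPU before moving on.

    Hypothetical helper: returns one list of layer indices per GPU.
    Raises ValueError if the layers do not fit in the given budgets.
    """
    assignment: List[List[int]] = [[] for _ in gpu_vram_gb]
    gpu = 0      # index of the GPU currently being filled
    used = 0.0   # GB already assigned to that GPU
    for idx, size in enumerate(layer_sizes_gb):
        # Move on to the next GPU while this layer does not fit on the current one.
        while gpu < len(gpu_vram_gb) and used + size > gpu_vram_gb[gpu]:
            gpu += 1
            used = 0.0
        if gpu == len(gpu_vram_gb):
            raise ValueError(f"layer {idx} does not fit on the remaining GPUs")
        assignment[gpu].append(idx)
        used += size
    return assignment


# Assumed example: 40 layers of 1 GB each across four 12 GB devices.
plan = split_layers([1.0] * 40, [12.0] * 4)
for gpu_index, layers in enumerate(plan):
    print(f"GPU {gpu_index}: {len(layers)} layers")
```

Keeping the layers consecutive means each GPU only has to hand its output to the next one, so the traffic between cards stays small compared to the weights themselves.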
Some of the LLMs it ships with are very reasonably sized and still impressive. I can run them on a laptop with 32GB of RAM.
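For a rough sense of why that works: the memory needed just for the weights is about the parameter count times the bytes per parameter. A back-of-the-envelope sketch, where the 7-billion-parameter size and the bit widths are assumed examples and not the specs of any particular model it ships with:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores the context cache, activations, and runtime overhead.
    """
    return n_params * bits_per_param / 8 / 1e9


# Assumed example: a 7-billion-parameter model at different precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
```

At 4 bits per weight, a model of that size needs only a few GB for its weights, which leaves plenty of room on a 32GB machine.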
This is very interesting! Thanks for the link. I'll dig into this when I manage to have some time.