Tiny_Arugula_5648

joined 10 months ago
[–] Tiny_Arugula_5648@alien.top 1 points 9 months ago

One of those cases where proving something can be done doesn't make it useful. This has to be one of the least efficient ways to do inference. Like the people who got Doom running on an HP printer: great, you did it, but it's the worst possible version.

 

I've been having a hell of a time getting notebooks running and finding A100 80GB cards (like everyone else). I'd like to train a 30B model at 16- or 8-bit. I need to use this model in an early version of our new product, and 4-bit just isn't going to cut it.

Anyone know of a point-and-click service that I can use? I don't need an enterprise-class service with a ton of MLOps capabilities for big $$. I just need some automation to get models built quickly.

I just need recommendations on services to consider. No need for any philosophical or political debate.

Much appreciated!!

[–] Tiny_Arugula_5648@alien.top 1 points 9 months ago

Unless you're doing this as a business, it's going to be massively cost prohibitive: hundreds of thousands of dollars of hardware. If it is a business, you'd better start talking to cloud vendors, because GPUs are an incredibly scarce resource right now.

[–] Tiny_Arugula_5648@alien.top 1 points 9 months ago (3 children)

No, absolutely not, not the way you described it. The issue isn't RAM, it's the number of calculations that need to be done. With GPUs you load the data into VRAM, and that memory is only available for that GPU's calculations; it's not a shared memory pool. So if you load data into the P40, only the P40 can use it for its calculations.

Yes, you can run the model on multiple GPUs. But if one of them is very slow with lots of VRAM, the layers you offload to that card will be processed slowly. No, there is no way to speed up the calculations themselves; VRAM only keeps the weights readily available so you're not constantly loading and unloading them.
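To make that concrete, here's a rough sketch of how layers get split across cards with the Hugging Face transformers/accelerate stack (the model name and per-GPU memory caps are just placeholders for whatever hardware you actually have):

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder model

# Cap how much of each GPU's VRAM the loader may fill. Layers placed on a
# slow card (say, a P40 on cuda:1) are computed by that card alone; the
# other GPUs cannot "borrow" its VRAM or do its math for it.
max_memory = {0: "22GiB", 1: "22GiB", "cpu": "48GiB"}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",      # accelerate assigns whole layers to devices
    max_memory=max_memory,
)
```

Whichever device a layer lands on is the device that runs it, which is why one slow card drags down every token that has to pass through its layers.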

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

Looks like there are already issues. It should be using LLMs to automate moderation.

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

What's the VRAM usage? A context that big can use an enormous amount.
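For a back-of-the-envelope estimate, the KV cache alone grows linearly with context length. A quick sketch with Llama-2-7B-ish dimensions (the exact dims here are assumptions; plug in the real model's config):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for the K and V tensors; fp16 = 2 bytes per element
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# 32 layers, 32 KV heads, head_dim 128, 32k context, fp16
print(kv_cache_bytes(32, 32, 128, 32_768) / 2**30, "GiB")  # 16.0 GiB per sequence
```

And that's on top of the weights themselves.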

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

Their needle-in-a-haystack test isn't very compelling. Sure, no test is flawless, but with a random out-of-context fact placed at different points in the context window, there are a lot of reasons the model could fail to retrieve it.
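For anyone who hasn't seen the setup, the test is roughly this (a hypothetical sketch; the needle, filler, and question are placeholders):

```python
# Insert an out-of-context "needle" fact at varying depths of filler text,
# then ask the model to retrieve it and check the reply for the fact.
needle = "The secret code for the blue door is 7421."
filler = "The quick brown fox jumps over the lazy dog. " * 2000

def build_prompt(depth: float) -> str:
    cut = int(len(filler) * depth)
    haystack = filler[:cut] + needle + " " + filler[cut:]
    return haystack + "\n\nQuestion: What is the secret code for the blue door?"

prompts = [build_prompt(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
# Each prompt is sent to the model and the reply is checked for "7421".
```

A fact that has nothing to do with the surrounding text is exactly the kind of thing a model can learn to ignore, so a miss doesn't tell you much about realistic long-context use.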

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago (1 children)

Go to Hugging Face and look at the multitude of datasets that have already been prepped, and read whatever documentation and papers have been published. Go through the data and get a sense of what it looks like and how it's structured.
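If you want a quick way to poke at one, the datasets library makes it easy to pull a split and look at the schema (the dataset name here is just one public example):

```python
from datasets import load_dataset

# Pull an instruction-tuning set and inspect how the records are structured.
ds = load_dataset("databricks/databricks-dolly-15k", split="train")

print(ds.features)  # column names and types
print(ds[0])        # one record: instruction, context, response, category
print(len(ds))      # how many rows you actually have to work with
```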

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

Textai is fantastic!!

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

You should also check out the Coral TPU boards; they're more power-efficient for running quantized models.
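They only run quantized .tflite models compiled for the Edge TPU, but getting one going is simple. A minimal sketch, assuming the Edge TPU runtime (libedgetpu) is installed and with a placeholder model path:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load an Edge TPU-compiled model and attach the Edge TPU delegate.
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",  # placeholder path
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

# Run a dummy input through it to check the pipeline end to end.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(out.shape)
```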