Tiny_Arugula_5648

joined 10 months ago
[–] Tiny_Arugula_5648@alien.top 1 points 9 months ago

One of those cases where proving something can be done doesn't make it useful. This has to be one of the least efficient ways to do inference. Like the people who got Doom running on an HP printer: great, you did it, but it's the worst possible version.

 

I've been having a hell of a time getting notebooks running and finding A100 80GB cards (like everyone else). I'd like to train a 30B model at 16- or 8-bit. I need to use this model in an early version of our new product, and 4-bit just isn't going to cut it.

Anyone know of a point-and-click service that I can use? I don't need an enterprise-class service with a ton of MLOps capabilities for big $$. I just need some automation to get models built quickly.

I just need recommendations on services to consider. No need for any philosophical or political debate.

Much appreciated!!

[–] Tiny_Arugula_5648@alien.top 1 points 9 months ago

Unless you're doing this as a business, it's going to be massively cost prohibitive: hundreds of thousands of dollars of hardware. If it is a business, you'd better start talking to cloud vendors, because GPUs are an incredibly scarce resource right now.

[–] Tiny_Arugula_5648@alien.top 1 points 9 months ago (3 children)

No, absolutely not, not the way you described it. The issue isn't RAM, it's the number of calculations that need to be done. With GPUs you load the data into VRAM, and that memory is only available for that GPU's calculations; it's not a shared memory pool. So if you load data into the P40, only the P40 can use it for its calculations.

Yes, you can run the model on multiple GPUs. But if one of them is very slow with lots of VRAM, the layers you offload to that card will be processed slowly. No, there is no way to speed up the calculations themselves; VRAM only keeps the weights readily available so you're not constantly loading and unloading them.
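To make that concrete, here's a rough sketch of how layers get split across cards with the Hugging Face transformers/accelerate stack (the model name and per-GPU memory caps are just placeholders for whatever hardware you actually have):

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder model

# Cap how much of each GPU's VRAM the loader may fill. Layers placed on a
# slow card (say, a P40 on cuda:1) are computed by that card alone; the
# other GPUs cannot "borrow" its VRAM or do its math for it.
max_memory = {0: "22GiB", 1: "22GiB", "cpu": "48GiB"}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",      # accelerate assigns whole layers to devices
    max_memory=max_memory,
)
```

Whichever device a layer lands on is the device that runs it, which is why one slow card drags down every token that has to pass through its layers.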

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

Looks like there are already issues. It should be using LLMs to automate moderation.

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

What's the VRAM usage? A context that big can use an enormous amount.
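For a back-of-the-envelope estimate, the KV cache alone grows linearly with context length. A quick sketch with Llama-2-7B-ish dimensions (the exact dims here are assumptions; plug in the real model's config):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for the K and V tensors; fp16 = 2 bytes per element
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# 32 layers, 32 KV heads, head_dim 128, 32k context, fp16
print(kv_cache_bytes(32, 32, 128, 32_768) / 2**30, "GiB")  # 16.0 GiB per sequence
```

And that's on top of the weights themselves.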

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

Their needle-in-a-haystack test isn't very compelling. Sure, no test is flawless, but with a random out-of-context fact placed at different points in the context window, there are a lot of reasons the model could fail to retrieve it.
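For anyone who hasn't seen the setup, the test is roughly this (a hypothetical sketch; the needle, filler, and question are placeholders):

```python
# Insert an out-of-context "needle" fact at varying depths of filler text,
# then ask the model to retrieve it and check the reply for the fact.
needle = "The secret code for the blue door is 7421."
filler = "The quick brown fox jumps over the lazy dog. " * 2000

def build_prompt(depth: float) -> str:
    cut = int(len(filler) * depth)
    haystack = filler[:cut] + needle + " " + filler[cut:]
    return haystack + "\n\nQuestion: What is the secret code for the blue door?"

prompts = [build_prompt(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
# Each prompt is sent to the model and the reply is checked for "7421".
```

A fact that has nothing to do with the surrounding text is exactly the kind of thing a model can learn to ignore, so a miss doesn't tell you much about realistic long-context use.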

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago (1 children)

Go to Hugging Face and look at the multitude of datasets that have already been prepped, and read whatever documentation and papers have been published. Go through the data and get a sense of what it looks like and how it's structured.
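If you want a quick way to poke at one, the datasets library makes it easy to pull a split and look at the schema (the dataset name here is just one public example):

```python
from datasets import load_dataset

# Pull an instruction-tuning set and inspect how the records are structured.
ds = load_dataset("databricks/databricks-dolly-15k", split="train")

print(ds.features)  # column names and types
print(ds[0])        # one record: instruction, context, response, category
print(len(ds))      # how many rows you actually have to work with
```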

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

Textai is fantastic!!

[–] Tiny_Arugula_5648@alien.top 1 points 10 months ago

You should also check out the Coral TPU boards; they're more power-efficient for running quantized models.
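They only run quantized .tflite models compiled for the Edge TPU, but getting one going is simple. A minimal sketch, assuming the Edge TPU runtime (libedgetpu) is installed and with a placeholder model path:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load an Edge TPU-compiled model and attach the Edge TPU delegate.
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",  # placeholder path
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

# Run a dummy input through it to check the pipeline end to end.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(out.shape)
```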