One I used before is runpod.io, but it is a pay-per-time platform, not an API.
3090s might be faster or around the same speed, as they have NVLink.
I think at that point it becomes faster to run on CPU.
Pretty much not at all. The main bottleneck is memory speed.
I barely see a difference between 4 and 12 cores on a 5900X when running on CPU.
When running multi-GPU, the PCIe lanes are the biggest bottleneck.
On a single GPU, the CPU does not matter.
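As a rough back-of-envelope to show why memory speed dominates: each generated token has to stream essentially all of the model's weights through memory once, so bandwidth divided by model size is an upper bound on tokens per second. The concrete numbers in the sketch below (a ~7.5 GB quantized 13B model, ~50 GB/s of dual-channel DDR4) are my own illustrative assumptions, not measurements.

    # Back-of-envelope: memory bandwidth caps token generation speed on CPU.
    def max_tokens_per_second(model_size_gb: float, mem_bandwidth_gbps: float) -> float:
        # Every token streams (roughly) all weights through memory once,
        # so bandwidth / model size is an upper bound on tokens/s.
        return mem_bandwidth_gbps / model_size_gb

    # Example: a 13B model at ~4-bit is about 7.5 GB of weights;
    # dual-channel DDR4-3200 gives roughly 50 GB/s in practice.
    print(max_tokens_per_second(7.5, 50))  # ~6-7 tokens/s ceiling, regardless of core count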
I think it means no display in.
While the benchmarks tend to be gamed, especially by small models, I honestly think something is wrong with how you are running it.
Yi-34B trades blows with Llama 2 70B in my personal tests, where I make it do novel tasks I invented myself, not the gamed benchmarks.
ALL 7B models are like putting a 7-year-old against a renowned professor when compared to 34B and 70B.
Why the hell would you get a two-generation-old 16 GB GPU for 7.7K when you can get 3-4 4090s? Each one will roflstomp it in ANY use case, let alone running three of them.
Get either an A6000 (Ampere 48 GB card), an A6000 Ada, or 3x 4090s paired with an AMD Threadripper system, or something like that. Any of those will still run laps around the V100 and be cheaper.
https://github.com/oobabooga/text-generation-webui
How much RAM do you have? It matters a lot.
For a BIG simplification, think of the largest model you can run (in billions of parameters, for example 13B means 13 billion) as 50-60% of your RAM in GB.
If you have 16 GB, you can run a 7B model, for example.
If you have 128 GB, you can run a 70B.
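To make that rule of thumb concrete, here is a tiny Python sketch. The 0.55 ratio and the helper name are my own placeholders for the rough 50-60% figure above, not an exact formula.

    # Rough sketch of the rule of thumb: the largest model you can run,
    # in billions of parameters, is roughly 50-60% of your RAM in GB.
    def largest_model_b(ram_gb: float, ratio: float = 0.55) -> float:
        # Approximate parameter count (in billions) that fits in ram_gb of RAM.
        return ram_gb * ratio

    for ram in (16, 32, 64, 128):
        print(f"{ram} GB RAM -> roughly a {largest_model_b(ram):.0f}B model")
    # 16 GB -> ~9B (a 7B fits comfortably), 128 GB -> ~70B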
closed-source model
You gave your own answer:
Not monitored
Not controlled
Uncensored
Private
Anonymous
Flexible
The whole AI ecosystem was pretty much designed for Python from the ground up.
I am guessing you can run C# as the front end and Python as the back end.
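If it helps, here is a minimal sketch of what the Python back end could look like, so any front end (C# included) can just talk to it over HTTP. Flask, the /generate route, and the fake_generate() placeholder are my own assumptions, not tied to any specific project.

    # Minimal Python back end a C# (or any other) front end could call over HTTP.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def fake_generate(prompt: str) -> str:
        # Placeholder: swap in your real inference call (llama.cpp bindings, ExLlama, etc.)
        return f"echo: {prompt}"

    @app.route("/generate", methods=["POST"])
    def generate():
        data = request.get_json(silent=True) or {}
        prompt = data.get("prompt", "")
        return jsonify({"completion": fake_generate(prompt)})

    if __name__ == "__main__":
        app.run(port=5000)

The C# side would then just POST JSON to http://localhost:5000/generate and read the completion field back.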
I don't know if ExLlama 2 supports Mac, but if it does, 70B.
Nothing, sadly.
Models are trained on the benchmark questions to improve their scores, making the tests moot.