ThisGonBHard

joined 1 year ago
[–] ThisGonBHard@alien.top 1 points 11 months ago

Nothing, sadly.

Models are trained on the benchmark questions to improve their scores, making the tests moot.

[–] ThisGonBHard@alien.top 1 points 11 months ago

One I have used before is runpod.io, but it is a pay-per-time platform, not an API.

[–] ThisGonBHard@alien.top 1 points 11 months ago

A 3090 might be faster or around the same speed, as they have NVLink.

[–] ThisGonBHard@alien.top 1 points 11 months ago

I think running it on CPU becomes faster than that.

[–] ThisGonBHard@alien.top 1 points 11 months ago (1 children)

Pretty much not at all. The main bottleneck is memory speed.

I barely see a difference between 4 and 12 cores on a 5900X when running on CPU.

When running multi-GPU, the PCIe lanes are the biggest bottleneck.

On single GPU, CPU does not matter.
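
If you want to check this yourself, here is a minimal sketch using llama-cpp-python to compare tokens/sec at different thread counts. The model path is a placeholder, point it at whatever GGUF file you actually have:

```python
# Rough benchmark: tokens/sec at different CPU thread counts.
import time
from llama_cpp import Llama

MODEL_PATH = "models/llama-2-13b.Q4_K_M.gguf"  # hypothetical path, use your own

for n_threads in (4, 8, 12):
    # Reload the model with a different thread count each time.
    llm = Llama(model_path=MODEL_PATH, n_threads=n_threads, verbose=False)
    start = time.time()
    out = llm("Write one sentence about memory bandwidth.", max_tokens=128)
    n_tokens = out["usage"]["completion_tokens"]
    print(f"{n_threads} threads: {n_tokens / (time.time() - start):.1f} tok/s")
```

On a memory-bound system, the numbers barely move past a few threads, which is the point.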

[–] ThisGonBHard@alien.top 1 points 11 months ago

I think it means no display input.

[–] ThisGonBHard@alien.top 1 points 11 months ago

While the benchmarks tend to be cheated, especially by small models, I honestly think something is wrong with how you are running it.

Yi-34B trades blows with Llama 2 70B in my personal tests, where I make it do novel tasks I invented myself, not the gamed benchmarks.

ALL 7B models are like putting a 7-year-old against a renowned professor when they are compared to 34B and 70B.

[–] ThisGonBHard@alien.top 1 points 11 months ago (1 children)

Why the hell would you get a two-generation-old 16 GB GPU for $7.7K when you can get 3-4 4090s? Each one will roflstomp it in ANY use case, let alone running three.

Get either an A6000 (Ampere 48 GB card), an A6000 Ada, or three 4090s with an AMD Threadripper system, or something like that. It will still run laps around the V100 and be cheaper.

[–] ThisGonBHard@alien.top 1 points 11 months ago

https://github.com/oobabooga/text-generation-webui

How much RAM do you have? It matters a lot.

For a BIG simplification, think of the largest model you can run as one whose size (in billions of parameters; for example, 13B means 13 billion) is about 50-60% of your RAM in GB.

If you have 16 GB, you can run a 7B model for example.

If you have 128 GB, you can run 70B.
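
If it helps, here is that same rule of thumb as a few lines of Python. The 0.55 ratio is just the midpoint of the 50-60% guess above:

```python
# Rule of thumb: largest runnable model size (in billions of parameters)
# is roughly 50-60% of system RAM in GB.
def max_model_size_b(ram_gb: float, ratio: float = 0.55) -> float:
    """Rough largest parameter count (in billions) a machine can run."""
    return ram_gb * ratio

for ram in (16, 32, 64, 128):
    print(f"{ram} GB RAM -> roughly a {max_model_size_b(ram):.0f}B model")
# 16 GB -> ~9B (so a 7B fits), 128 GB -> ~70B, matching the examples above.
```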

[–] ThisGonBHard@alien.top 1 points 11 months ago

> closed-source model

You gave your own answer:

Not monitored

Not controlled

Uncensored

Private

Anonymous

Flexible

[–] ThisGonBHard@alien.top 1 points 11 months ago

The whole AI ecosystem was pretty much designed around Python from the ground up.

I am guessing you can run C# as the front end and Python as the back end.
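
As a rough sketch of that split, assuming a Flask back end (the /generate route and the generate_text() stub are made up for illustration), the C# front end would just POST JSON to it:

```python
# Minimal Python back end a C# (or any other) front end could call over HTTP.
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_text(prompt: str) -> str:
    # Placeholder: call your actual model here (transformers, llama.cpp, ...).
    return f"echo: {prompt}"

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.get_json().get("prompt", "")
    return jsonify({"completion": generate_text(prompt)})

if __name__ == "__main__":
    app.run(port=5000)
```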

[–] ThisGonBHard@alien.top 1 points 11 months ago (1 children)

I don't know if ExLlamaV2 supports Mac, but if it does, 70B.
