a_beautiful_rhind
[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Good luck. Centrism is not allowed. You would have to skip the last decade of internet data. Social engineering works much the same on people and on language models.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

From the issue about this in the exllamav2 repo, quip was using more memory and running slower than exl. How much context can you fit?

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

I'm not getting a super huge jump with the bigger models yet, just a mild bump. I got a P100 so I can load models in the low 100Bs and still have exllama work. That's 64GB of VRAM at FP16.

For bigger models I can run FP32 and put the 2 P40s back in. That's 120GB of VRAM. Also 6 vidya cards :P

It required building for this type of system from the start. I'm not made of money either; I just upgrade it over time.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

It really is Christmas.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

I got a P100 for like $150 to see how well it will work with exllama + 3090s and if it is any faster at SD.

These guys are all gone already.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Would be cool to see this in a 34b and 70b.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Aren't there people selling such services to companies here? Implementing RAG, etc.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Heh, 72b with 32k and GQA seems reasonable. Will make for interesting tunes if it's not super restricted.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

That's a good sign if anything.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

one is not enough

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Does it give refusals on the base model? 67B sounds like a full foundation train.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Something is wrong with your environment. Even P40s give more than that.

The other option is that you're not generating enough tokens to get a proper t/s reading. What was the total inference time?
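Since the point is that a too-short run can make t/s look terrible, here's a minimal sketch of how I'd sanity-check the measurement. It assumes you're timing the generation yourself; `generate`, the variable names, and the 50-token cutoff are just placeholders, not anything from exllama or the original setup.

```python
import time

def measure_tps(generate, prompt, max_new_tokens):
    """Time a single generation and return (tokens/sec, total seconds).

    `generate` is any callable returning the generated token ids;
    it stands in for whatever backend is being benchmarked.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start

    n = len(tokens)
    tps = n / elapsed
    # With very short generations, fixed costs (prompt processing,
    # warm-up) dominate `elapsed`, so tps reads far below the card's
    # real decode speed.
    if n < 50:  # arbitrary illustrative threshold
        print(f"warning: only {n} tokens generated; t/s figure is unreliable")
    return tps, elapsed
```

If the total inference time is dominated by the prompt pass, generate a longer response and the t/s number should come back up.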
