ortegaalfredo

joined 10 months ago
[–] ortegaalfredo@alien.top 1 points 10 months ago

Check panchovix's repo on Hugging Face.

[–] ortegaalfredo@alien.top 0 points 10 months ago (2 children)

I'm hosting Goliath 120B with a much better quant (4.5 bpw exl2, needs 3x 3090) and it's scary, it feels alive sometimes. Also, with ExLlamaV2 it runs at about the same speed as a 70B model.
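
For anyone wondering why it takes three cards, here is a rough back-of-the-envelope sketch. It assumes the quant averages 4.5 bits per weight, a round 120e9 parameters, and 24 GB per 3090. It only counts the weights, so KV cache and runtime overhead come on top of this.

```python
# Rough VRAM estimate for quantized weights only (no KV cache, no overhead).
# N_PARAMS and BPW are assumptions for illustration, not measured values.

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 120e9   # assumed parameter count for a "120B" model
BPW = 4.5          # assumed average bits per weight of the quant
GPU_GB = 24        # VRAM of one RTX 3090
N_GPUS = 3

needed = weight_gb(N_PARAMS, BPW)
available = GPU_GB * N_GPUS
print(f"weights: {needed:.1f} GB, VRAM: {available} GB")
```

With those numbers the weights alone don't fit on two cards (48 GB), and three cards leave only a few GB for context.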

[–] ortegaalfredo@alien.top 1 points 10 months ago

Because Llama2-70B is similar or better on most metrics, and it's small enough to not need distributed inference.

[–] ortegaalfredo@alien.top 1 points 10 months ago

LLMs on neuroengine.ai should support way more than 400 words. I don't know the exact limit.