this post was submitted on 21 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
you are viewing a single comment's thread
I'm the author of this article, thank you for posting it! If you don't want to use Medium, here's the link to the article on my blog: https://mlabonne.github.io/blog/posts/ExLlamaV2_The_Fastest_Library_to_Run%C2%A0LLMs.html
I'm a little surprised by the mention of chatcode.py, which was merged into chat.py almost two months ago. Also, it doesn't really require flash-attn-2 to run "properly"; it just runs a little better that way. It's perfectly usable without it. Great article, though. Thanks. :)
Thanks for your excellent library! That makes sense, because I started writing this article about two months ago (chatcode.py is still mentioned in the README.md, by the way). I was getting very low throughput using ExLlamaV2 without flash-attn-2. Do you know if that's still the case? I've updated these two points, thanks for your feedback.