LocalLLaMA

4 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

ctransformers VS llama-cpp-python which one should I use? (alien.top)

submitted 2 years ago by ZiadHAsan23@alien.top to c/localllama@poweruser.forum

6 comments fedilink hide all child comments

I'm on my way to deploy a GGUF model on Huggingface space (free hardware CPU and RAM).Currently I'm using a GGUF model because I need to run it using CPU. Later, I have plans to run AWQ models on GPU. I'm currently thinking about ctransformers or llama-cpp-python. Please suggest me which one should I use as a beginner with a plan of integrating llms with websites in future.

Comparison Aspects

Speed

Computational Power Efficiency

Readability and ease of use

Popularity and availability of educational resources

Extra Questions

If I learn ctransformers, is it gonna help me when I will use the huggingface transformers library to load gpu based models in the future? Which one has more resources to solve problems? which one requires less code to run? consider all these aspects and you must choose one between the two

Do I need to learn llama.cpp or C++ to deploy models using llama-cpp-python library?

I used to run AWQ quantized models in my local machine and there is a huge difference in quality. Same model with same bit precision performs much, much worse in GGUF format compared to AWQ. Is there something wrong? Suggest me some fixes

top 6 comments

sorted by: hot top controversial new old

[–] andershaf@alien.top 1 points 2 years ago

Have you tried ollama?

[–] vatsadev@alien.top 1 points 2 years ago

"Do I need to learn llama.cpp or C++ to deploy models using llama-cpp-python library?" No its pure python

[–] vatsadev@alien.top 1 points 2 years ago (1 children)

Also AWQ has entire engines for efficieny, look into aphrodite engine, supposably the fastest for awq

[–] mcmoose1900@alien.top 1 points 2 years ago

vLLM is way faster, but its pretty barebones and VRAM spikes hard.

[–] randull@alien.top 1 points 2 years ago

I use both, and they're pretty interchangeable in my experience. It looks like it's been 2 months since the last ctransformers update, and llama-cpp has always been a little more popular with more contributors and stars on github.

[–] _Lee_B_@alien.top 1 points 2 years ago

Learn docker compose. Run ollama as one of your docker containers (it's already available as a docker container). Run your website server as another docker container. Deploy it securely, and you're done. If/when you want to scale it and make it more enterprisey, upgrade from docker compose to kubernetes.