Have you tried ollama?
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
"Do I need to learn llama.cpp or C++ to deploy models using the llama-cpp-python library?" No, it's pure Python on your side: the library wraps the compiled llama.cpp code, so you only ever write Python.
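To show what that looks like, here's a minimal sketch of loading and prompting a model with llama-cpp-python. The model path is a placeholder (point it at any GGUF file you've downloaded), and the context size and sampling parameters are arbitrary:

```python
# Minimal llama-cpp-python sketch. "./models/model.gguf" is a placeholder path,
# not a real file you can expect to exist.
from llama_cpp import Llama

# Load a GGUF model; n_ctx sets the context window.
llm = Llama(model_path="./models/model.gguf", n_ctx=2048)

# Run a completion. The return value is an OpenAI-style dict.
out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(out["choices"][0]["text"])
```

No C++ involved: you install the wheel with pip and call it like any other Python library.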
Also, AWQ has entire inference engines built around it for efficiency; look into the Aphrodite engine, supposedly the fastest for AWQ.
vLLM is way faster, but it's pretty barebones and its VRAM usage spikes hard.
I use both, and they're pretty interchangeable in my experience. It looks like it's been two months since the last ctransformers update, and llama-cpp has always been a little more popular, with more contributors and stars on GitHub.
Learn Docker Compose. Run Ollama as one of your containers (an official Docker image is already available). Run your website server as another container. Deploy it securely, and you're done. If/when you want to scale it and make it more enterprisey, upgrade from Docker Compose to Kubernetes.
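A compose file for that setup could look roughly like this (a sketch, not a hardened config: the service names, the `./web` build path, and the port mapping are assumptions; the `ollama/ollama` image and its default port 11434 are real):

```yaml
# Hypothetical docker-compose.yml for an Ollama backend + web frontend.
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama        # persist downloaded models across restarts
    # no "ports:" entry: only reachable by other services on the compose network

  web:
    build: ./web                    # assumption: your site builds from ./web
    environment:
      - OLLAMA_HOST=http://ollama:11434   # talk to Ollama by service name
    ports:
      - "8080:8080"                 # the only port exposed to the outside
    depends_on:
      - ollama

volumes:
  ollama:
```

Keeping Ollama off the host network and letting only the web service reach it is most of the "deploy it securely" part.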