I wrote a post today about serving 7B models with `llama.cpp` on cheap AWS instances; it might be useful:
https://github.com/ggerganov/llama.cpp/discussions/4225