I deployed Llama 2 (GGUF, CPU-only) as an Amazon ECS Fargate service.
I simply built my Docker image, pushed it to ECR, and fired up the container as a Fargate task.
TheBloke publishes several quantized variants on Hugging Face.
A GGUF 7B example: https://huggingface.co/TheBloke/Orca-2-7B-GGUF
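For anyone wanting to reproduce this, here is a minimal Dockerfile sketch of the kind of image I mean. It assumes the `llama-cpp-python` OpenAI-compatible server and bakes one of TheBloke's quantized files into the image; the exact model filename, port, and quantization level (Q4_K_M) are assumptions you should adjust to the variant you pick.

```dockerfile
# Hypothetical sketch -- model filename, quant level, and port are assumptions
FROM python:3.11-slim

# llama-cpp-python ships an OpenAI-compatible HTTP server; the default
# build is CPU-only, which matches a Fargate (no-GPU) deployment
RUN pip install --no-cache-dir "llama-cpp-python[server]"

# Bake the quantized GGUF into the image so the task needs no volume mounts
ADD https://huggingface.co/TheBloke/Orca-2-7B-GGUF/resolve/main/orca-2-7b.Q4_K_M.gguf /models/model.gguf

EXPOSE 8000
CMD ["python", "-m", "llama_cpp.server", \
     "--model", "/models/model.gguf", \
     "--host", "0.0.0.0", "--port", "8000"]
```

After `docker build` and a push to ECR, point the Fargate task definition at the ECR image and give the task enough memory for the quantized weights (a Q4 7B file is roughly 4 GB on disk, so something in the 8 GB task-memory range is a reasonable starting point).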