Download koboldcpp and grab a GGUF version of any model you want, preferably a 7B from our pal TheBloke. Only get Q4 quantization or higher; Q6 is a bit slower but works well. In koboldcpp.exe, select CuBLAS and set the GPU layers to 35-40.
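If you'd rather skip the GUI, koboldcpp also accepts flags on the command line. A rough sketch of the equivalent launch (the model filename here is just a placeholder, and it's worth double-checking the flag names against your koboldcpp version's --help):

koboldcpp.exe --model mistral-7b.Q4_K_M.gguf --usecublas --gpulayers 35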
You should get about 5 T/s or more.
From my testing, this is the simplest way to run LLMs.