It's been a couple of months since I used less-than-complete GPU offloading. When I was using my Alienware laptop (8th-gen i7, 2060 6GB) to run 13B models with 13/25 layers offloaded, I was getting 1-2 t/s, so yours sounds low.