Perhaps you are using the wrong fork of KoboldAI; I get many more tokens per second. Did you open Task Manager and check that GPU memory usage actually increases when loading and using the model?
Otherwise, try out Koboldcpp. It needs GGUF instead of GPTQ, but requires no special fork. With CUBLAS enabled you should get good token speeds for a 13B model.
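For reference, a minimal Koboldcpp launch with CUBLAS looks something like this (the model filename and layer count are placeholders; tune `--gpulayers` to how much VRAM you have):

```shell
# Hypothetical invocation - adjust paths, layer count, and context size for your setup.
# --usecublas enables the CUDA backend; --gpulayers offloads that many layers to the GPU.
python koboldcpp.py --usecublas --gpulayers 40 --contextsize 4096 --model your-13b-model.Q4_K_M.gguf
```

If it still runs slowly, lower `--gpulayers` until the model fits in VRAM without spilling into shared memory.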