If you have installed or use Oobabooga text-generation-webui, download a model that has been quantized for NVIDIA GPUs. Those are the models with the GPTQ and the newer AWQ suffixes.
On Hugging Face, the user "TheBloke" has aggregated dozens, maybe hundreds, of these models.
The YouTube channel Aitrepreneur has a couple of good videos on installing ooga and on running the GPU-quantized models.
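
If you are scanning a long list of model names, the suffix rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the example repository names are made up, and it assumes the quantization format appears at the very end of the name.

```python
# Suffixes that mark GPU-quantized models, per the rule of thumb above.
GPU_QUANT_SUFFIXES = ("GPTQ", "AWQ")


def is_gpu_quantized(repo_name: str) -> bool:
    """Return True if the name ends with a GPTQ or AWQ suffix (case-insensitive)."""
    return repo_name.upper().endswith(GPU_QUANT_SUFFIXES)


def pick_gpu_models(repo_names):
    """Keep only the names that look GPU-quantized, preserving their order."""
    return [name for name in repo_names if is_gpu_quantized(name)]


if __name__ == "__main__":
    # Hypothetical names for illustration only, not real repositories.
    candidates = [
        "TheBloke/Example-7B-GPTQ",
        "TheBloke/Example-7B-AWQ",
        "TheBloke/Example-7B-GGUF",  # different format, not matched
    ]
    for name in pick_gpu_models(candidates):
        print(name)
```

Paste the names you are considering into `candidates`; whatever gets printed is what you would then download through the webui.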