If a model fits completely inside 12 GB, then it would run faster on the desktop. If it doesn't fit into 12 GB but fits fully in 16 GB, then there's a good chance it would run faster on a laptop with a 16 GB GPU.
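A rough way to apply that rule of thumb is to estimate the model's footprint as parameters × bits per weight / 8, plus some headroom, and compare it against each card's VRAM. This is a minimal sketch, not a precise sizing tool. The flat 1 GB overhead for KV cache and runtime buffers is an assumption (real usage grows with context length), GB is treated loosely as 10^9 bytes, and the function names are hypothetical.

```python
def estimate_footprint_gb(params_billion: float, bits_per_weight: float,
                          overhead_gb: float = 1.0) -> float:
    """Approximate VRAM needed for a quantized model.

    Weights: params (in billions) * bits / 8 gives roughly GB.
    overhead_gb is an assumed flat allowance for KV cache and buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def faster_machine(footprint_gb: float, desktop_vram_gb: float = 12.0,
                   laptop_vram_gb: float = 16.0) -> str:
    """Encode the heuristic: fits in the desktop's 12 GB -> desktop;
    only fits in the laptop's 16 GB -> laptop is likely faster."""
    if footprint_gb <= desktop_vram_gb:
        return "desktop"
    if footprint_gb <= laptop_vram_gb:
        return "laptop"
    return "neither fits fully"


# Q6_K in llama.cpp is about 6.56 bits/weight; Q8_0 is about 8.5.
print(faster_machine(estimate_footprint_gb(13, 6.5625)))  # 13B Q6
print(faster_machine(estimate_footprint_gb(13, 8.5)))     # 13B Q8
```

With these assumptions a 13B model at Q6 (~11.7 GB) still squeezes into the desktop card, while the same model at Q8 (~14.8 GB) only fits on the 16 GB laptop, which is exactly the case where the laptop should pull ahead.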
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
I can't speak for the desktop 3080 Ti, but I have the laptop version of that card, and its performance is roughly equivalent to my desktop 4060 Ti.
That's odd, considering the desktop 4060 Ti has 8 GB of VRAM. Are you talking about speed only, or can you run larger-parameter LLMs on your laptop that your desktop can't handle?
I have the 16 GB version of the 4060 Ti, so the two cards have nearly identical capabilities.
Would you mind running a few tests on the laptop version to get some real-world numbers? For example, what speeds are you getting with a 7B Q6 and a 13B Q6 model? Both should fit fully in VRAM.
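For comparable numbers, the usual metric is tokens per second: tokens generated divided by wall-clock time. The sketch below times a single call to a hypothetical `generate(prompt, max_tokens)` callable that returns how many tokens it produced; swap in whatever runtime you actually use. It lumps prompt processing and generation together, so it may read lower than a runtime's own generation-only figure.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[str, int], int], prompt: str,
                      max_tokens: int = 128) -> float:
    """Time one generation call and return tokens per second.

    `generate` is a hypothetical stand-in for your model's generate
    function; it must return the number of tokens actually produced.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Dummy generator for illustration: pretends to emit 10 tokens.
    def fake_generate(prompt: str, max_tokens: int) -> int:
        time.sleep(0.05)
        return 10

    print(f"{tokens_per_second(fake_generate, 'Hello'):.1f} tok/s")
```

Running the same prompt and `max_tokens` on both machines, and averaging a few runs after a warm-up call, keeps the comparison fair.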