this post was submitted on 14 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


(title)

top 2 comments
[–] Susp-icious_-31User@alien.top 1 points 10 months ago

I store all mine on slow drives, because no matter where you load it, RAM or VRAM, the model gets fully loaded once and the original file isn't touched again. And the sequential read speed on huge files isn't terrible, even on a spinning disk. Even if you overload your RAM and start swapping, you'll be hitting your designated pagefile/swap drive rather than the drive your LLM files live on.
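
If you want to sanity-check that on your own hardware, timing a sequential read of the file gives a decent estimate of cold-load time. A minimal Python sketch; the model path is a placeholder, and note the OS page cache will make any run after the first look artificially fast.

```python
import sys
import time
from pathlib import Path

# Placeholder path; point this at any big GGUF/safetensors file.
MODEL_PATH = Path("/mnt/hdd/models/llama-2-13b.Q4_K_M.gguf")
CHUNK = 16 * 1024 * 1024  # 16 MiB sequential reads

def time_sequential_read(path: Path) -> None:
    total = 0
    start = time.perf_counter()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    print(f"{total / 2**30:.2f} GiB in {elapsed:.1f} s "
          f"({total / 2**20 / elapsed:.0f} MiB/s)")

if __name__ == "__main__":
    time_sequential_read(Path(sys.argv[1]) if len(sys.argv) > 1 else MODEL_PATH)
```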

[–] rgar132@alien.top 1 points 10 months ago

Eh, I'll differ here and say yes, it sometimes matters. For example, when you're writing code that requires loading the model to debug it, you watch it break on some mundane thing, tweak it a bit, and then have to sit through a full reload again. I store most models on slow drives but keep a RAM disk set up for the one I'm working with. Copying from spinning rust to the RAM disk sometimes takes a few minutes, but after that it's quick to load the model into VRAM.
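
Roughly what that staging step looks like, as a Python sketch. It assumes a tmpfs is already mounted at /mnt/ramdisk (something like `mount -t tmpfs -o size=48G tmpfs /mnt/ramdisk`); the paths and the model filename are placeholders.

```python
import shutil
import time
from pathlib import Path

SLOW_STORE = Path("/mnt/hdd/models")  # placeholder: slow archive drive
RAMDISK = Path("/mnt/ramdisk")        # placeholder: pre-mounted tmpfs

def stage_model(name: str) -> Path:
    """Copy a model from the slow archive to the ramdisk, once per session."""
    src = SLOW_STORE / name
    dst = RAMDISK / name
    if not dst.exists():
        start = time.perf_counter()
        shutil.copyfile(src, dst)  # the slow part: spinning rust -> RAM
        print(f"staged {name} in {time.perf_counter() - start:.0f} s")
    return dst  # point your loader (llama.cpp, transformers, ...) at this path

if __name__ == "__main__":
    print(stage_model("llama-2-70b.Q4_K_M.gguf"))  # placeholder filename
```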

As for the quality or speed of the output after the model loads: no, it doesn't really matter where the file came from, as long as the model fully fits in VRAM or other fast memory while you're using it.

All that said, a T7 is plenty fast anyway, much faster than a spinning platter or even many SATA SSDs, so you'll be fine.