GGUF via TheBloke:
farkinga
As so often happens, the real LPT is in the comments. Using sysctl to change the VRAM allocation is amazing. Thanks for this post.
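For anyone who wants to try it, here is a rough sketch of how you might work out the number to pass to sysctl. The key name `iogpu.wired_limit_mb` is my assumption (it is the one commonly reported for recent macOS on Apple Silicon; the comment above doesn't name it), and the 8 GB reserve for the OS is just an illustrative default. The helper only builds the command string; it does not run anything.

```python
def wired_limit_command(total_ram_gb: int, reserve_gb: int = 8) -> str:
    """Build (but do not execute) a sysctl command that raises the GPU memory limit.

    Assumptions: the sysctl key name and the reserve size are illustrative,
    not taken from the original post.
    """
    if reserve_gb <= 0 or reserve_gb >= total_ram_gb:
        raise ValueError("reserve_gb must be positive and smaller than total_ram_gb")
    # Leave reserve_gb for the OS and give the rest to the GPU, expressed in MB.
    limit_mb = (total_ram_gb - reserve_gb) * 1024
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


# Example: a 64 GB machine, keeping 8 GB back for the system.
print(wired_limit_command(64))
```

Check the key against `sysctl -a` on your own machine before running the printed command, and note that a setting like this would normally not survive a reboot.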
The words you see were generated by a neural network based on the words it was trained on. That text is not related to the intentions or capabilities of the model.
Since it is running in gpt4all, we can see from the source code that the model cannot call functions. Therefore, the model cannot "do" anything; it just generates text.
If, for example, the model said it was buying a book from a website, that doesn't mean anything. We know it can't do that because the code running the model doesn't provide that kind of feature. The model lives inside a sandbox, cut off from the outside world.
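To make that concrete, here is a toy sketch of a host program. It is not gpt4all's code; `run_turn`, the `CALL name: argument` convention, and the `buy_book` tool are all hypothetical names I made up for illustration. The point is that generated text only turns into an action if the host has a handler for it.

```python
from typing import Callable, Dict


def run_turn(model_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Decide what the host does with one piece of generated text."""
    # Hypothetical convention: the model asks for a tool with "CALL name: argument".
    if model_output.startswith("CALL "):
        name, _, arg = model_output[len("CALL "):].partition(":")
        handler = tools.get(name.strip())
        if handler is not None:
            # Only here does text become an action, and only because the host allows it.
            return handler(arg.strip())
    # No handler registered: the text is simply passed through for display.
    return model_output


text = "CALL buy_book: Dune"

# A host with no tools (the sandboxed case): nothing happens, the text is just text.
print(run_turn(text, {}))

# A host that chooses to wire up a tool: the same text now triggers the handler.
print(run_turn(text, {"buy_book": lambda title: f"ordered {title}"}))
```

With an empty tool table, the "purchase" is just a string on the screen, which is the situation described above.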
Nice post. This got me thinking...
While many commenters are discussing the computation aspect, which leads to Petals and the Horde, I am thinking about BitTorrent (since you mentioned it).
We do need a hub for torrenting LLMs. HF is amazing for their bandwidth (okay for the UI) - but once that VC money dries up, we'll be on our own. So, distributing the models - just the data, not the computation - is also important.
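One reason torrents suit this: peers can verify the data without trusting each other. BitTorrent v1 does it by splitting a file into fixed-size pieces and publishing a SHA-1 hash of each piece. Below is a minimal sketch of that idea; the function names and the 256 KiB piece size are my own choices, and it hashes in-memory bytes rather than streaming a multi-gigabyte model file.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int = 256 * 1024) -> List[str]:
    """Split data into fixed-size pieces and return the SHA-1 hex digest of each."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, expected_hex: str) -> bool:
    """Check a downloaded piece against the hash published with the torrent."""
    return hashlib.sha1(piece).hexdigest() == expected_hex


# Tiny stand-in for model weights, with a 4-byte piece size so the split is visible.
weights = b"0123456789"
hashes = piece_hashes(weights, piece_length=4)
print(len(hashes))                         # number of pieces
print(verify_piece(weights[:4], hashes[0]))  # a good piece passes
print(verify_piece(b"XXXX", hashes[0]))      # a corrupted piece fails
```

Because every piece is checked on arrival, it doesn't matter which peer a chunk of the model came from.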
Yeah! That's what I'm talking about. Would you happen to remember what it was reporting before? If it's like the rest, I'm assuming it said something like 40 or 45 GB, right?