LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

I have an 8GB M1 MacBook Air and a 16GB MBP (that I haven't turned in for repair yet) that I'd like to run an LLM on, to ask questions and get answers from the notes in my Obsidian vault (100s of markdown files). I've been lurking this subreddit, but I'm not sure whether I could run LLMs <7B with 1-4GB of RAM, or whether the models would be too low quality.
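
For scale: a 4-bit quantized 7B in GGUF format is roughly 4 GB on disk and in memory, so it sits at the upper end of the 1-4 GB range asked about here, and the 8GB Air will be tight once macOS and the context buffer are counted. A minimal sketch with llama-cpp-python, where the model path, file name, and quantization level are assumptions rather than anything from this thread:

```python
# Minimal sketch, not from the thread: load a 4-bit quantized 7B GGUF with
# llama-cpp-python on Apple Silicon. The model path/file name are hypothetical;
# a Q4_K_M 7B file is roughly 4 GB, so RAM on an 8GB machine will be tight.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,       # context window; larger values need more RAM
    n_gpu_layers=-1,  # offload all layers to Metal on M1/M2
)

out = llm("Q: What is retrieval-augmented generation?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```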

[–] gentlecucumber@alien.top 1 points 10 months ago

I haven't tried Mistral yet, but RAG with a 7B might not give accurate answers from the context you pass it; even larger models can struggle with accurate Q&A over documents, though there are things you can do to help with that.
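
One of the usual things that helps is tightening the retrieval step: split the notes into chunks, embed them, and pass only the top few matches into the prompt instead of whole files. A minimal sketch, assuming sentence-transformers and a hypothetical vault path (none of this is from the comment itself):

```python
# Minimal sketch of the retrieval step (assumptions: sentence-transformers is
# installed, the vault path is hypothetical). Embed paragraph-sized chunks of
# the markdown notes, then pull only the most relevant ones into the prompt.
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly

# Split each note into rough paragraph-sized chunks.
chunks = []
for md in Path("~/ObsidianVault").expanduser().glob("**/*.md"):  # hypothetical path
    for para in md.read_text(encoding="utf-8").split("\n\n"):
        if para.strip():
            chunks.append(para.strip())

chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def top_k(question: str, k: int = 4) -> list[str]:
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

context = "\n---\n".join(top_k("What did I write about spaced repetition?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```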

Why not just make API calls to GPT-3.5 Turbo instead of barely running a 7B model at a snail's pace for sub-par results? It's fractions of a penny for thousands of tokens.
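
For reference, the call itself is short. A minimal sketch assuming the openai Python package, an OPENAI_API_KEY in the environment, and placeholder context/question values standing in for whatever your retrieval step produces:

```python
# Minimal sketch of the GPT-3.5 Turbo call suggested above. Assumes the openai
# package and an OPENAI_API_KEY in the environment; `context` and `question`
# are placeholders for output from your own retrieval step.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

context = "..."   # retrieved note chunks go here
question = "..."  # the user's question

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer strictly from the provided notes."},
        {"role": "user", "content": f"Notes:\n{context}\n\nQuestion: {question}"},
    ],
    temperature=0,
)
print(resp.choices[0].message.content)
```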