LocalLLaMA


I wonder if there's a way to run an LLM without loading it into RAM (alien.top)

submitted 1 year ago by wjohhan@alien.top to c/localllama@poweruser.forum


https://preview.redd.it/txoqaubzehzb1.png?width=1062&format=png&auto=webp&s=5ce1e0599c1b0430106cd828cad77dc516a42a4a

https://reddit.com/link/17rzqfm/video/fqtexzq5fhzb1/player

https://preview.redd.it/s60h7gh1fhzb1.png?width=1016&format=png&auto=webp&s=23f963f561d4f57c8562924032301ce0256e4249

I heard Apple is working on an on-device Siri powered by LLMs, but these models are memory-intensive, especially given the iPhone's limited RAM. This isn't just an Apple issue: other big tech companies that want to run ML models on device, like Samsung, Google, and Meta, will face the same problem.

What if models could run directly from storage instead of RAM?

Samsung is onto something with their MRAM tech: it's non-volatile, power-efficient, and can handle some logic and AI processing in memory. Imagine your phone running models from storage!

I'm not an ML expert, but this tech evolution is intriguing. Are there other attempts like this?


[–] xadiant@alien.top 1 points 1 year ago (1 children)

Sure, it's just going to generate 5 tokens per week

[–] Aaaaaaaaaeeeee@alien.top 1 points 1 year ago

It will never be that bad; at worst, it would be about 2 min per token.
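For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes a dense model where generating each token requires reading every weight once, and that none of the weights stay cached in RAM, so storage read bandwidth sets a lower bound on the time per token. The model size and bandwidth figures are illustrative assumptions, not measurements, and `seconds_per_token` is a hypothetical helper name.

```python
def seconds_per_token(model_bytes, read_bytes_per_second):
    """Lower bound on time per token when all weights stream from storage.

    Assumes every weight is read once per generated token and nothing
    is served from RAM; real systems cache part of the model, so this
    is a pessimistic estimate.
    """
    return model_bytes / read_bytes_per_second


if __name__ == "__main__":
    model_bytes = 7e9  # assumed: roughly a 13B model at ~4 bits per weight
    devices = [
        ("slow flash card (assumed 100 MB/s)", 100e6),
        ("NVMe SSD (assumed 3.5 GB/s)", 3.5e9),
    ]
    for label, bandwidth in devices:
        estimate = seconds_per_token(model_bytes, bandwidth)
        print(f"{label}: ~{estimate:.1f} s/token")
```

Under these assumptions the result is seconds to about a minute per token, which is slow but nowhere near tokens per week.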

[–] fallingdowndizzyvr@alien.top 1 points 1 year ago (1 children)

What if models could run directly from storage instead of RAM?

You can already do that. That's what mmap does: it maps a file on storage into memory so it can be used as if it were RAM. It's not speedy, though, since even the fastest SSD is slow compared to RAM.
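As a minimal sketch of the idea, the snippet below uses Python's standard `mmap` module to read a small slice out of a file without loading the whole file first; the operating system pages in only the parts that are touched. The file here is a throwaway demo file, not a real model, and `read_slice_via_mmap` and `make_demo_file` are hypothetical helper names made up for this example.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` from a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return bytes(mapped[offset:offset + length])


def make_demo_file(size_bytes=1 << 20):
    """Write a stand-in 'weights' file whose byte at position i is i % 256."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size_bytes)))
    return path


if __name__ == "__main__":
    path = make_demo_file()
    try:
        chunk = read_slice_via_mmap(path, 4096, 8)
        print(list(chunk))
    finally:
        os.remove(path)
```

The program never reads the file into a buffer of its own; whether a given page comes from RAM (the page cache) or from storage is decided by the operating system, which is why speed drops sharply once the file no longer fits in memory.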

[–] wjohhan@alien.top 1 points 1 year ago

Thanks, I didn't know that.

[–] Herr_Drosselmeyer@alien.top 1 points 1 year ago

RAM is storage, just faster to access and write to.

[–] nazihater3000@alien.top 1 points 1 year ago

Do you want it to bake cookies, too?

[–] MINIMAN10001@alien.top 1 points 1 year ago

Spoiler alert: MRAM is just RAM, but more expensive. You would be better off just buying more RAM.

RAM is just faster storage.

None of that is relevant to LLMs.

[–] 2016YamR6@alien.top 1 points 1 year ago

Did they confirm the LLM runs in on-device memory? That wouldn't make much sense to me at all. Siri already takes an input and sends it to the cloud to get a response. Why wouldn't they use the same concept and just connect to an LLM in the cloud to process the request, then send the response to the phone?

[–] SlowSmarts@alien.top 1 points 1 year ago

I ran a 13B Q_4 model on a Raspberry Pi 4 (8 GB) with llama.cpp with no special settings, and it just automatically cached from disk. It was mega slow and got worse with more tokens, but it worked. I don't know whether it was llama.cpp or Raspberry Pi OS that did the caching automatically.

You can build llama.cpp with CMake on many platforms.