this post was submitted on 21 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.


With high-end Android phones now packing upwards of 24GB of RAM, I think there's huge potential for an app like this. It would be amazing to have something as powerful as the future Mistral 13B model running natively on smartphones!

You could interact with it privately without an internet connection. The convenience and capabilities would be incredible.

top 14 comments
[–] GermanK20@alien.top 1 points 10 months ago (1 children)

People are always building, but the smaller models are kinda pointless

[–] Winter_Tension5432@alien.top 1 points 10 months ago (1 children)

Smaller models are the future of smartphones. Everyone will be running 10B models on their phones by 2025, and these are more than enough for writing emails, doing translations, and just asking questions. A lot more useful than Siri and Alexa.

[–] GermanK20@alien.top 1 points 10 months ago (1 children)

Well, I've just tested a few models for my workflows and found that only 70B cuts it.

[–] Winter_Tension5432@alien.top 1 points 10 months ago

For now, but you will have 13B models as good as 70B models by the end of next year.

[–] SlowSmarts@alien.top 1 points 10 months ago (1 children)

The direction I took was to start building a Kivy app that connects to an LLM API at home via OpenVPN. I have Ooba and llama.cpp API servers that I can point the Android app at, so it works on old or new phones and runs at the speed of the server.

The downsides are that you need a static IP address or DDNS for the VPN to connect to, and cell reception can cause issues.

I have a static IP to my house, but you could host the API server in the cloud with a static IP if you wanted to do things similarly.
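
The client side is really just an HTTP call once the VPN is up. A minimal sketch of the idea in Python/Kivy, assuming the llama.cpp server's /completion endpoint and a made-up VPN address (Ooba's API would need its own endpoint and parameters):

```python
# Minimal sketch: Kivy client posting a prompt to a home llama.cpp server over the VPN.
# The address/port are hypothetical; a real app would do the request on a background
# thread so the UI doesn't freeze while the server generates.
import requests
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.button import Button
from kivy.uix.label import Label
from kivy.uix.textinput import TextInput

API_URL = "http://10.8.0.1:8080/completion"  # hypothetical OpenVPN address of the home server

class ChatClient(App):
    def build(self):
        root = BoxLayout(orientation="vertical")
        self.prompt = TextInput(hint_text="Ask something...", multiline=True)
        self.output = Label(text="")
        send = Button(text="Send", size_hint_y=0.2)
        send.bind(on_release=self.ask)
        root.add_widget(self.prompt)
        root.add_widget(send)
        root.add_widget(self.output)
        return root

    def ask(self, *_):
        # llama.cpp's server exposes POST /completion and returns {"content": ...}
        resp = requests.post(
            API_URL,
            json={"prompt": self.prompt.text, "n_predict": 256},
            timeout=120,
        )
        self.output.text = resp.json().get("content", "")

if __name__ == "__main__":
    ChatClient().run()
```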

[–] Winter_Tension5432@alien.top 1 points 10 months ago (1 children)

A normal person would not be able to do that. The first people to create an Oobabooga app for Android and iPhone and put it on the store at $15 will have my money for sure, and probably the money of a million other people too.

[–] SlowSmarts@alien.top 1 points 10 months ago

🤔 hmmm... I have some ideas to test...

[–] MrOogaBoga@alien.top 1 points 10 months ago (1 children)

Why isn't anyone building an Oogabooga-like app

You spoke the sacred words, so here I am.

[–] Winter_Tension5432@alien.top 1 points 10 months ago

I am dreaming of an S24 Ultra with an app that lets me run a hypothetical future Mistral 13B at 15 tokens/sec with TTS. A person can dream.

[–] a_beautiful_rhind@alien.top 1 points 10 months ago

Apple is literally doing this with their ML framework built into devices, but for tool applications, not a chatbot.

[–] BlackSheepWI@alien.top 1 points 10 months ago (1 children)

It's a lot of work. Phones use a different OS and a different processor instruction set. The latter can be a big pain, especially if you're really dependent on low-level optimizations.

I also feel that *most* people who would choose a phone over a PC for this kind of thing would rather just use a high-quality, easily accessible commercial option (ChatGPT, etc.) than a homebrew option that requires some work to get running. So demand for such a thing is pretty low.

[–] Winter_Tension5432@alien.top 1 points 10 months ago

I'm not so sure. ChatGPT has privacy issues, and a small but completely uncensored model has value too. There is a market for this: convenience and privacy.

[–] Nixellion@alien.top 1 points 10 months ago

Check out Ollama. They have links on their GitHub page to projects using it, and they have an Android app that I believe runs locally on the phone. It uses llama.cpp.
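
If you just want to hit it programmatically, Ollama also exposes a local REST API on port 11434. A quick sketch (the model name is just an example, use whatever you've pulled):

```python
# Sketch: calling a locally running Ollama server's generate endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",  # example; substitute whatever you've pulled with `ollama pull`
        "prompt": "Draft a short email declining a meeting.",
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])
```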

[–] _Lee_B_@alien.top 1 points 10 months ago

It's not just RAM; you also need the processing power. Phones can't run *good* LLMs yet.

If you watch the ChatGPT voice chat mode closely on Android, what it does is listen with a local voice model (whisper.cpp), and then answer generically and quickly, LOCALLY, for the first response/paragraph. While that's happening, it sends what you asked to the servers, where the real text processing takes place. By the time your phone has run the simple local model and read a simple first sentence to you, it has MOSTLY gotten the full paragraphs of text back from the server and can read those. Even then, you still notice a slight delay.
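
Schematically, that split looks something like this. A rough sketch, using the openai-whisper Python package to stand in for whisper.cpp and a placeholder endpoint/key for the remote model:

```python
# Sketch of the "transcribe locally, answer remotely" pattern described above.
# The local step is fast and runs on-device; the remote step is where the big model lives.
import requests
import whisper  # openai-whisper; needs ffmpeg installed for audio decoding

# Fast local step: a small speech-to-text model turns the recording into text.
stt = whisper.load_model("tiny")
question = stt.transcribe("question.wav")["text"]

# Slow remote step: ship the transcript to the server-side model (placeholder URL/key).
resp = requests.post(
    "https://api.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": question}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

In the real app the quick local reply is spoken while this request is still in flight, which is why the delay is mostly hidden.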