this post was submitted on 20 Nov 2023


LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

founded 2 years ago


What are your thoughts on the future of LLMs running mobile? (alien.top)

submitted 2 years ago by Tree-Sheep@alien.top to c/localllama@poweruser.forum


Following the release of Dimensity 9300 and S8G3 phones, I am expecting LLMs running on mobile phones to grow in popularity, as quantized 3B or 7B models can already run on high-end phones released within the last five years. But even though it is possible, there are a few concerns, including power consumption and storage size. I've seen posts about successfully running LLMs on mobile devices, but I seldom see people discussing future trends. What are your thoughts?
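For a rough sense of the storage concern, here is a back-of-the-envelope sketch in Python. It assumes the weights dominate the footprint and that each weight takes exactly the stated number of bits; real quantized formats add some overhead for scales and metadata, and inference also needs memory for the KV cache. The function name `model_size_gib` is made up for illustration.

```python
def model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB: parameters * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 2**30


# Nominal parameter counts for the "3B" and "7B" classes mentioned in the post.
for name, n_params in [("3B", 3e9), ("7B", 7e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {model_size_gib(n_params, bits):.2f} GiB")
```

By this estimate a 7B model at 4 bits needs a bit over 3 GiB for the weights alone, which is why quantization matters so much on phones.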

top 17 comments


[–] Flying_Madlad@alien.top 1 points 2 years ago

This guy gets it

[–] oe-g@alien.top 1 points 2 years ago (3 children)

My personal take: what are the use cases for user-friendly local LLMs on mobile compared to higher-performance closed LLMs?

Privacy is the only serious benefit I can think of.

[–] NDBellisario@alien.top 1 points 2 years ago (2 children)

Latency is one thing with the internet.

Any model that can run locally doesn’t need a round trip to a datacenter. This of course depends on compute power.

[–] Maykey@alien.top 1 points 2 years ago

At current capabilities it's faster to query a server on the opposite hemisphere than to generate locally.

[–] CocksuckerDynamo@alien.top 1 points 2 years ago

Round-trip latency of an HTTP request (or gRPC or whatever, pick your poison) is utterly insignificant compared to the time it takes to run the inference process, even for the smallest models with the fastest inference.
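A minimal sketch of that comparison, assuming response time is just network round trip plus generation time at a fixed tokens-per-second rate. The function name and all the numbers in the usage lines are illustrative assumptions, not measurements from any device or service.

```python
def response_time_s(n_tokens: int, tokens_per_s: float, round_trip_s: float = 0.0) -> float:
    """Total wall-clock time: network round trip plus token generation."""
    return round_trip_s + n_tokens / tokens_per_s


# Hypothetical numbers: a phone generating slowly with no network hop,
# versus a faster server that costs an extra network round trip.
local = response_time_s(n_tokens=200, tokens_per_s=5.0)
remote = response_time_s(n_tokens=200, tokens_per_s=40.0, round_trip_s=0.3)
print(f"local: {local:.1f} s, remote: {remote:.1f} s")
```

Under these assumptions the round trip is a small fraction of the total, so generation speed decides which side wins.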

[–] Combinatorilliance@alien.top 1 points 2 years ago

It's not going to be just chat. The LLMs are going to be integrated into everything in the OS.

Suggesting emails, finding appointments in e-mail (I believe this already exists somewhat for Apple? In any case it will be private, local and more reliable), improved search, way improved personal assistant, APIs to access the model from any app. Lots of stuff...

[–] GraceRaccoon@alien.top 1 points 2 years ago

Privacy I don't care about, too late for that lol. If it becomes as normal to use AI as it is to google something, my worry would be it intentionally using language to fuck with my head, or skew my perspective on something I'm trying to get info on. Social engineering is a spooky thing. Algorithms on social media are already causing damage lol.

[–] nntb@alien.top 1 points 2 years ago (1 children)

I run a LLaMA on my Galaxy Fold 4.

[–] RealLordDevien@alien.top 1 points 2 years ago

How? Via Termux? What package / config do you use?

[–] vikarti_anatra@alien.top 1 points 2 years ago

Just my thoughts on this:

Would be great.

Would be rather limited but possible (thanks to https://llm.mlc.ai/ and increasing memory).

A lot of CHEAP Chinese devices will say they can actually do it. They will. At 2-bit quantization and <1 t/s, and it would be 7B models or even less. They will be unusable.

Google will say it's not necessary because you can use their Firebase services for AI and you can use NNAPI anyway. You must also censor your LLM-using apps in the Play Store to adhere to their rules.

Apple will say it's not necessary; later they will advertise it as a very good thing and provide optimized libraries and some pretrained models, but you will need to buy the latest iPhone (last year's won't work because Apple). You must also censor your apps AND mark them as 18+.

Areas of usage?

- Language translation (including voice-to-voice). Basically a much improved Google Translate.

- AI assistant (basically a MUCH improved Siri, used not only as a command interface).

[–] a_beautiful_rhind@alien.top 1 points 2 years ago

My thought is even if it runs fairly decently, it's going to drain your battery.

[–] jamesstarjohnson@alien.top 1 points 2 years ago

If an LLM is packaged in an app and installed on a phone with all the permissions granted, and it has good function-calling ability to use the OS API, it will be able to do lots of things with voice commands even without being fully integrated like Siri.
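A minimal sketch of what that function-calling loop could look like on the app side, in Python for brevity. It assumes the model emits a JSON object with `name` and `arguments` fields; that format, and the `set_alarm` / `send_message` handlers standing in for OS APIs, are hypothetical and not taken from any particular framework.

```python
import json
from typing import Callable, Dict


# Hypothetical stand-ins for OS-level APIs the app has permission to call.
def set_alarm(time: str) -> str:
    return f"Alarm set for {time}"


def send_message(to: str, body: str) -> str:
    return f"Message to {to}: {body}"


# Registry of the tools the model is allowed to invoke.
TOOLS: Dict[str, Callable[..., str]] = {
    "set_alarm": set_alarm,
    "send_message": send_message,
}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching handler."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: tool call must be a JSON object"
    handler = TOOLS.get(call.get("name"))
    if handler is None:
        return f"error: unknown tool {call.get('name')!r}"
    try:
        return handler(**call.get("arguments", {}))
    except TypeError as exc:
        # Raised for missing, unexpected, or non-mapping arguments.
        return f"error: bad arguments: {exc}"


print(dispatch('{"name": "set_alarm", "arguments": {"time": "07:30"}}'))
```

The registry is the important design choice: the model can only trigger actions the app has explicitly exposed, and malformed output becomes an error string instead of a crash.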

[–] AreYouOKAni@alien.top 1 points 2 years ago

Theoretically doable, practically unlikely. Battery life will take a significant hit, and the 3B/7B models don't provide THAT much benefit to just take that hit.

It is something to consider in the future, though. Like, 5 years from now we will probably have SoCs that are efficient enough to do it live.

[–] ab2377@alien.top 1 points 2 years ago

I am running TinyLlama and DeepSeek 1.3B on an almost 3-year-old cheap Poco X3 (Snapdragon 732G) and it's great. Will post the video soon. So on the new phones, and the high-end ones, I am sure some people can run Mistral. But I also wish that phones would get cheaper; high-end phones are becoming more expensive than most laptops, and I can't afford them.

[–] Maykey@alien.top 1 points 2 years ago

My hot take is that local models will become truly feasible on phones (and in general) only once we move past transformers towards something more FLOP- and memory-efficient (RetNet, S5).

[–] sshan@alien.top 1 points 2 years ago

Hard to make a broad use case here until power consumption drops. The best approach is still to push to the cloud.

Edge cases like robotics / cars / high availability likely exist though and could be a big niche.

[–] gabbalis@alien.top 1 points 2 years ago

I know the question here is about running LLMs on mobile, but that's building in too many assumptions I think.

The future of LLM technology is as follows:

- Large models learn to do a new task.
- Specific tasks get broken down into foundational subtasks.
- Foundational subtasks are distilled into memoized code, hardcoded transformers and traditional code.
- You no longer use a large model for that subtask; instead you use a highly specialized module that fits on a toaster.

This loop is going to get faster and faster, and once it's generally accessible, you're no longer concerned with what LLMs run on your phone. You're instead concerned with which specific subtasks can be designed to run on your phone and how to assemble them into your application's specific needs. At the end of the day, you are not going to need to ask an AGI to fill out API calls.
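As a toy illustration of the simplest piece of that loop, the "memoized code" step, here is a sketch that caches the answers of an expensive call so repeated subtask inputs never reach the large model again. `large_model` is a hypothetical stand-in (it just normalizes text), not a real LLM call, and real distillation into specialized modules is far more involved than a cache.

```python
from functools import lru_cache


def large_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive LLM call; counts invocations."""
    large_model.calls += 1
    return prompt.strip().lower()


large_model.calls = 0


@lru_cache(maxsize=1024)
def memoized_subtask(prompt: str) -> str:
    """Answer from the cache when possible; fall back to the large model once."""
    return large_model(prompt)


memoized_subtask("  Set An Alarm  ")
memoized_subtask("  Set An Alarm  ")  # served from the cache, no model call
print(large_model.calls)
```

The second call is answered without touching the model, which is the cheapest version of replacing a large model with something that "fits on a toaster" for inputs it has already seen.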