I have an app on the App Store that does exactly that. It ships with a 4-bit quantised 3B-parameter LLM baked in (the app is a 1.67GB download), and users on newer phones (iPhone 14/15 Pro and Pro Max) can optionally download a 3-bit quantised 7B-parameter LLM.
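Those download sizes line up with a simple back-of-envelope estimate: parameters times bits-per-weight, divided by 8, plus some overhead for embeddings and quantization scales stored at higher precision. A minimal sketch of that arithmetic (the `overhead` factor here is an illustrative assumption, not the app's actual packaging math):

```python
# Rough on-disk size estimate for a quantised LLM:
#   params * bits_per_weight / 8 bytes, plus ~10% assumed overhead
# for embeddings, quantization scales, and metadata.

def model_size_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.1) -> float:
    """Approximate file size in GB for a quantised model."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8 * overhead
    return bytes_total / 1e9

# 4-bit 3B model: lands around 1.65 GB, close to the 1.67GB download above.
print(round(model_size_gb(3, 4), 2))   # → 1.65
# 3-bit 7B model: roughly 2.89 GB.
print(round(model_size_gb(7, 3), 2))   # → 2.89
```

The same estimate also explains why a 3-bit quant is needed to make a 7B model fit comfortably in the RAM budget of a phone.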
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
Hey, I actually just tried this on my iPhone SE 2nd gen to see if it would run the 3B model, even slowly, but it says it's not compatible. Any suggestions?
TinyLlama 1.1B may have potential (see the TinyLlama 1.1B project).
TheBloke has already made a GGUF of the v0.3 chat model.
Looking on HuggingFace, there may be more that have been fine-tuned for instruct, etc.
Check out the TinyLlama project! 1.1B parameters, pretty solid performance for its size, and the currently available checkpoints are only about halfway through the full pre-training run.
Lol, sounds rough. 3B is better than no B. And that should mean I can have several models loaded at once.
It will be very unfriendly for the user to have a 1-2GB app that eats RAM and battery like a mobile game, and you still won't get good or fast results. Check Replicate, RunPod, or vast.ai for cheap GPUs instead.