this post was submitted on 16 Nov 2023

LocalLLaMA


I've been thinking about this for a while: does anyone know how feasible this is? Basically, just applying some sort of "LoRA" on top of models to give them vision capabilities, making them multimodal.

mcmoose1900@alien.top 1 points 1 year ago

There's already more than one image-ingestion model, including several for Llama/Mistral.
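As a rough illustration of how image ingestion is commonly wired up (this is an assumption about the general approach, not something stated in this thread): a vision encoder turns the image into patch feature vectors, a small trained projection maps those into the LLM's embedding space, and the result is prepended to the text token embeddings. The sketch below uses plain Python lists and random numbers in place of real models; every name and size in it (`W_proj`, `D_VISION`, `D_MODEL`, and so on) is hypothetical.

```python
import random

def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def project_image_features(patch_features, W_proj):
    """Map each vision-encoder patch vector into the LLM's embedding space."""
    return [matvec(W_proj, patch) for patch in patch_features]

def build_multimodal_input(patch_features, text_embeddings, W_proj):
    """Prepend the projected image 'tokens' to the text token embeddings."""
    return project_image_features(patch_features, W_proj) + text_embeddings

# Hypothetical sizes, chosen only for illustration.
D_VISION, D_MODEL = 4, 6   # vision feature width, LLM embedding width
N_PATCHES, N_TEXT = 3, 5   # image patches, text tokens

rng = random.Random(0)

# Stand-in for the trained projection (shape D_MODEL x D_VISION).
W_proj = [[rng.uniform(-1, 1) for _ in range(D_VISION)] for _ in range(D_MODEL)]

# Stand-ins for vision-encoder output and text token embeddings.
patches = [[rng.uniform(-1, 1) for _ in range(D_VISION)] for _ in range(N_PATCHES)]
text = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(N_TEXT)]

sequence = build_multimodal_input(patches, text, W_proj)
print(len(sequence), len(sequence[0]))  # 8 6
```

In this kind of setup the projection is the newly trained piece; whether the LLM itself is also tuned (with LoRA or otherwise) varies from model to model.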

If you are talking about generating images, I dunno about that. Some people hook up LLMs to prompt Stable Diffusion, but that's not really the same thing.

Glat0s@alien.top 1 points 1 year ago