this post was submitted on 25 Nov 2023
LocalLLaMA
CogVLM is supposed to support this with prompts like "Can you provide a description of the image and include the coordinates [[x0,y0,x1,y1]] for each mentioned object?"
However, I couldn't get it to work properly; it would just hallucinate.
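For anyone trying it, here is a rough Python sketch of how you could pull `[[x0,y0,x1,y1]]` groups out of a reply and flag boxes that are obviously malformed. The helper names, the sample reply, and the `limit=1000` coordinate range are my own assumptions for illustration, not something taken from the CogVLM code, so adjust them to whatever your setup actually returns. It also can't tell you whether a well-formed box is in the right place, only whether it is well formed.

```python
import re

# Matches one box written as [[x0,y0,x1,y1]] with non-negative integer coordinates.
BOX_RE = re.compile(r"\[\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\]")


def extract_boxes(reply):
    """Return every [[x0,y0,x1,y1]] group found in the reply as a tuple of ints."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_RE.finditer(reply)]


def plausible(box, limit=1000):
    """Check that corners are ordered and inside [0, limit).

    `limit` is an assumed coordinate range, not a documented value.
    """
    x0, y0, x1, y1 = box
    return 0 <= x0 < x1 < limit and 0 <= y0 < y1 < limit


# Made-up reply text, only to show the parsing.
reply = "A dog [[120,340,560,900]] sits next to a ball [[600,700,580,760]]."
for box in extract_boxes(reply):
    print(box, "ok" if plausible(box) else "suspicious")
```

The second box in the sample has `x0 > x1`, so it gets flagged. A reply with no boxes at all just gives an empty list, which is a quick way to spot the model ignoring the coordinate part of the prompt.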
If you want to give it a shot, here are the official visual QA prompts