GeraltOfRiga

I'm looking for something like https://huggingface.co/spaces/Lin-Chen/ShareGPT4V-7B, but one that understands audio instead of images.

Thanks!

GeraltOfRiga@alien.top · 1 point · 11 months ago

This is kinda nuts (first time I've tried an LLM + vision).

I tried it with a first-person shooter screenshot that had an enemy on screen. I asked it to give me the 2D coordinates of the enemy, and it did, precisely.