There's fuyu-8b, but it has no commercial license.
It can really cover the "GPT-4 reads websites" use case and things like that, and it's helpful with complex charts too. Other than that, LLaVA is your best hope.
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/neva-22b
https://replicate.com/joehoover/instructblip-vicuna13b/api
Here are a couple that haven't been mentioned; they're quite a lot weaker than GPT-4V though, as is to be expected from small models.
Have you checked out the new release from OpenVL? Their vision API is gaining traction and might fit your needs.
Have you checked out LLaVA? It's still early, but it seems like a promising alternative. Not sure about commercial offerings though.