this post was submitted on 14 Nov 2023

LocalLLaMA

I found this model while browsing the LLaMA-Adapter repo. It was released a few days ago.

Model page
Weights (40GB)
Paper
Demo

It seems able to handle a range of image tasks, such as bounding boxes, object detection, and text extraction. Its benchmark numbers are a bit lower than CogVLM's, so I tested how well it can reason and compared it against CogVLM. SPHINX consistently gave good results at a higher temperature, while CogVLM missed the point with every configuration I tried:

CogVLM

SPHINX

Lirezh@alien.top

It's definitely better than LLaVA 1.5, remarkably so.
Given how they feed the image into the projector, I'm not surprised.