LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

SPHINX: a new multi-modal LLM from the creators of LLaMA-Adapter (alien.top)

submitted 1 year ago by remixer_dec@alien.top to c/localllama@poweruser.forum

I found out about this model while browsing the LLaMA-Adapter repo; it was released a few days ago.

Model page
Weights (40GB)
Paper
Demo

It seems able to handle a range of image tasks, such as bounding boxes, object detection, and text extraction. On benchmarks it scores slightly lower than CogVLM, so I tested how well it reasons compared with CogVLM. I consistently got good results from SPHINX at a higher temperature, while CogVLM missed the point with every configuration I tried:

CogVLM

SPHINX
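The comparison above hinges on the sampling temperature. As a generic illustration only (not SPHINX's or CogVLM's actual decoding code), temperature divides the model's logits before the softmax: values below 1 sharpen the distribution and values above 1 flatten it. The function names and logit values below are hypothetical, chosen just to show the effect.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by temperature.

    temperature < 1 sharpens the distribution (more deterministic);
    temperature > 1 flattens it (lower-ranked tokens become more likely).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Raising the temperature shifts probability mass from the top token toward the alternatives, which is why the same model can give noticeably different answers at different settings.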

Lirezh@alien.top 1 point 1 year ago

It's definitely better than LLaVA 1.5, remarkably so.
Given how they feed the image into the projector, I'm not surprised.