I found out about this model while browsing the LLaMA-Adapter repo; it was released a few days ago.
- Model page
- Weights (40 GB)
- Paper
- Demo
It seems able to handle a range of image tasks, such as object detection with bounding boxes and text extraction. On benchmarks it reports slightly lower numbers than CogVLM, so I tested how well it can reason compared to CogVLM. I consistently got good results with SPHINX, even at higher temperatures, while CogVLM missed the point regardless of configuration:
- CogVLM
- SPHINX
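For context on the temperature remark: a minimal sketch of what raising the temperature does to the sampling distribution (plain softmax math, no SPHINX-specific code; the logit values are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw next-token logits into sampling probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # toy scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# Low temperature concentrates probability mass on the top token;
# higher temperature flattens the distribution, so sampling explores more.
```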
Thanks a lot for converting and quantizing these. I have a couple of questions:

1. How does it compare to ALMA (13B)?
2. Is it capable of translating more than one sentence at a time?
3. Is there a way to specify the source language, or does it always detect it on its own? (See the sketch below for the kind of prompt I have in mind.)
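On (2) and (3), assuming these are GGUF files run with llama-cpp-python (my assumption, as is the file name and the prompt template below; check the model card for the expected format): a minimal sketch of passing several sentences in one request while naming the source language explicitly in the prompt:

```python
from llama_cpp import Llama

# Hypothetical path to one of the quantized files; adjust to your download.
llm = Llama(model_path="./translation-model.Q4_K_M.gguf", n_ctx=2048)

# Two sentences in a single request, with the source language stated up front
# (the "Translate this from X to Y" template is a guess, not a confirmed format).
prompt = (
    "Translate this from German to English:\n"
    "German: Der Zug hat Verspätung. Wir nehmen stattdessen den Bus.\n"
    "English:"
)

out = llm(prompt, max_tokens=128, temperature=0.0, stop=["\n"])
print(out["choices"][0]["text"].strip())
```

If the model was trained to auto-detect the source language, the language name in the prompt may simply be ignored; that's part of what I'm asking.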