Doesn't the llama.cpp server host a GUI for multimodal? You could potentially visit it, open your browser's developer tools, and observe the HTTP requests it sends.
I ended up just scrutinizing the server code to understand it better and found that the prompt needs to follow a very specific format or else it won't work well:
```js
prompt: `A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\nUSER:[img-12]${message}\nASSISTANT:`
```
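For context, here's a rough sketch of what a full request might look like against the server's `/completion` endpoint. This is an assumption-laden example, not a verbatim copy of the server code: the host/port, the `message` and `imageBase64` placeholders, and the `n_predict` value are all mine. The key idea is that the `[img-12]` tag in the prompt is matched to the `image_data` entry whose `id` is 12.

```js
// Hedged sketch: sending a multimodal prompt to a locally running llama.cpp
// server (assumed at localhost:8080 with a LLaVA-style model loaded).
// "message" and "imageBase64" are hypothetical placeholders.
const message = "What is shown in this picture?";
const imageBase64 = "<base64-encoded image bytes>"; // no data: URL prefix

const response = await fetch("http://localhost:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    // The prompt must follow the exact chat format, with the [img-12] tag
    // placed where the image should appear.
    prompt:
      "A chat between a curious human and an artificial intelligence assistant. " +
      "The assistant gives helpful, detailed, and polite answers to the human's questions." +
      `\nUSER:[img-12]${message}\nASSISTANT:`,
    // The id here must match the number inside the [img-N] tag above.
    image_data: [{ data: imageBase64, id: 12 }],
    n_predict: 256,
  }),
});

const result = await response.json();
console.log(result.content);
```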