I'm pretty new here, so apologies ahead of time if my request comes off as a bit green.
I'm looking for the best options for running an LVLM (any LLM with visual recognition capabilities, e.g. one you can supply an image to) locally. Bonus points for anything that can also help with video/GIF generation.
I'm also interested in any LMs (if there are any at all) that handle sound/voice recognition and can be run locally.