LocalLLaMA

4 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Is there any sort of project that is combines Text + Image + TTSVoice generation in one single UI ? (alien.top)

submitted 2 years ago by Starkboy@alien.top to c/localllama@poweruser.forum

7 comments fedilink hide all child comments

So I have the text-generation-ui by oogabooga running at one place, then I also have stable diffusion in the other tab. But I'm looking for ways to expose these project's APIs, and then combine them to then produce output like what GPT-4 does, where it can call APIs when it needs to, to other models.

I'm also looking for a solution where the text generation output is also able to execute the said code, and then infer from its results to do next things. (iknow the risks but yeah).

top 7 comments

sorted by: hot top controversial new old

[–] DanIngenius@alien.top 1 points 2 years ago (1 children)

This is something I'm interested in working on, i want to crowd fund a good LLM + SD + TTSvoice host, DM me if you are interested in taking part!

[–] BuzzLightr@alien.top 1 points 2 years ago

We did a vtt llm ttv sadtalker project.. Ended up replacing the images for a gif thingy due to generation delays..

Im looking forward to a good project that does this the right way

[–] a_beautiful_rhind@alien.top 1 points 2 years ago (2 children)

Two off the top of my head: https://heyamica.com/ and silly tavern for fun stuff.

For agents there are https://github.com/spyglass-search/talos or https://github.com/Josh-XT/AGiXT

I think the problems is work + play aren't really the same goals.

[–] Starkboy@alien.top 1 points 2 years ago

Thanks for your answer! I get it. These projects do give me some ideas. I didn't know such things are called 'agents' in this space

[–] jkende@alien.top 1 points 2 years ago

Really just a UX problem. Work is a subset of play. Play is how we simulate and practice anything and everything

[–] Material1276@alien.top 1 points 2 years ago

Its probably not what you're looking for, but SillyTavern does do all those things via API calls.

https://docs.sillytavern.app/

https://docs.sillytavern.app/usage/api-connections/

https://docs.sillytavern.app/extras/extensions/stable-diffusion/

https://docs.sillytavern.app/extras/extensions/tts/

[–] LyPreto@alien.top 1 points 2 years ago

you have all the APIs whats stopping you from putting something like this together? personally for me the only challenge is finding projects compatible with M1 that offer Metal offloading— but for linux it should be relatively straightforward to implement