this post was submitted on 04 Dec 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.


It's working great so far. Just wanted to share and spread awareness that running multiple instances of the webui (oobabooga) is basically just a matter of having enough RAM. I just finished running three models simultaneously (taking turns, of course). I only offloaded one layer to the GPU per model, used 5 threads per model, and set all contexts to 4K. (The computer has a 6-core CPU, 6GB of VRAM, and 64GB of RAM.)

The models used were:

dolphin-2.2.1-ashhlimarp-mistral-7b.Q8_0.gguf

causallm_7b.Q5_K_M.gguf

mythomax-l2-13b.Q8_0.gguf (I meant to load a 7B on this one, though)
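
For reference, here's a minimal sketch of how three instances like these could be launched, one per port, with the settings from the post (one GPU layer, 5 threads, 4K context). The flag names are assumed from the text-generation-webui llama.cpp loader and may differ between versions:

```python
# Sketch: launch three separate text-generation-webui instances, one per model.
# Flag names (--model, --loader, --n-gpu-layers, --threads, --n_ctx,
# --listen-port) are assumptions based on the llama.cpp loader's CLI options
# and may differ between webui versions.
import subprocess

MODELS = [
    ("dolphin-2.2.1-ashhlimarp-mistral-7b.Q8_0.gguf", 7860),
    ("causallm_7b.Q5_K_M.gguf", 7861),
    ("mythomax-l2-13b.Q8_0.gguf", 7862),
]

procs = []
for model, port in MODELS:
    cmd = [
        "python", "server.py",
        "--model", model,
        "--loader", "llama.cpp",
        "--n-gpu-layers", "1",   # only one layer offloaded to the 6GB GPU
        "--threads", "5",        # 5 CPU threads per model
        "--n_ctx", "4096",       # 4K context
        "--listen-port", str(port),
    ]
    # Each instance is its own process, so system RAM is the real limit.
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()
```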

I like it because it's similar to the group chat on character.ai, but without the censorship, and I can edit any of the responses. The downsides are having to copy/paste between all the instances of the webui, and it seems that one of the models kept focusing on one character instead of both. Also, I'm not sure what the actual context limit would be before the GPU runs out of memory.
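
The copy/pasting could probably be scripted away. A rough sketch (not part of my actual setup), assuming each instance is started with --api and a distinct --api-port so it exposes the OpenAI-compatible /v1/chat/completions endpoint; the ports and exact endpoint may vary by webui version:

```python
# Sketch: pass each model's reply to the next instance instead of copy/pasting.
# Assumes each webui instance was started with --api and its own --api-port,
# exposing an OpenAI-compatible chat endpoint (ports below are placeholders).
import requests

API_PORTS = [5000, 5001, 5002]
history = [{"role": "user",
            "content": "The three of you are in a tavern. Introduce yourselves."}]

for turn in range(6):
    port = API_PORTS[turn % len(API_PORTS)]
    resp = requests.post(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        json={"messages": history, "max_tokens": 200},
        timeout=300,
    )
    reply = resp.json()["choices"][0]["message"]["content"]
    print(f"[instance on port {port}] {reply}\n")
    # Feed each model's reply back to the others as the next turn.
    history.append({"role": "user", "content": reply})
```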

https://preview.redd.it/8i6wwjjtt54c1.png?width=648&format=png&auto=webp&s=26adca2a850f62165301390cdd4ba11548447c0d

https://preview.redd.it/3c9z5ee9u54c1.png?width=1154&format=png&auto=webp&s=210d7c67bcf0efafeb3f328e76199f13159dae64

https://preview.redd.it/lt8aizhbu54c1.png?width=1154&format=png&auto=webp&s=d24f8b2bf899084bbdb11d73e34b5564b629e0be

https://preview.redd.it/8lbl4nzeu54c1.png?width=1154&format=png&auto=webp&s=a81b8f1d8630e3d17ad37885915f8c7e3077584c

top 2 comments
[–] fediverser@alien.top 1 points 11 months ago

This post is an automated archive from a submission made on /r/LocalLLaMA, powered by Fediverser software running on alien.top. Responses to this submission will not be seen by the original author until they claim ownership of their alien.top account. Please consider reaching out to them to let them know about this post and help them migrate to Lemmy.

Lemmy users: you are still very much encouraged to participate in the discussion. There are still many other subscribers on !localllama@poweruser.forum that can benefit from your contribution and join in the conversation.

Reddit users: you can also join the fediverse right away by visiting https://portal.alien.top. If you are looking for a Reddit alternative made for and by an independent community, check out Fediverser.

[–] Murky-Ladder8684@alien.top 1 points 11 months ago

If you learn AutoGen, you could assign each model to a different agent and have them interact. If using the same model and having multiple characters talk is your thing, then the SillyTavern group chat option is the way to go.
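
For illustration, a minimal sketch of that AutoGen idea, assuming the pyautogen 0.2-style API and that each webui instance exposes an OpenAI-compatible endpoint on its own port; the ports, agent names, and prompts here are placeholders:

```python
# Sketch: one AutoGen agent per local webui instance, chatting in a group.
# Assumes pyautogen (0.2-style API) and OpenAI-compatible local endpoints.
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

def local_config(port):
    # Point an agent at one local webui instance instead of OpenAI.
    return {"config_list": [{
        "model": "local",
        "base_url": f"http://127.0.0.1:{port}/v1",
        "api_key": "not-needed",
    }]}

alice = AssistantAgent("alice", system_message="You are Alice.",
                       llm_config=local_config(5000))
bob = AssistantAgent("bob", system_message="You are Bob.",
                     llm_config=local_config(5001))
user = UserProxyAgent("user", human_input_mode="ALWAYS",
                      code_execution_config=False)

chat = GroupChat(agents=[user, alice, bob], messages=[], max_round=8)
manager = GroupChatManager(groupchat=chat, llm_config=local_config(5002))
user.initiate_chat(manager, message="Start the scene in the tavern.")
```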