I do the same thing but with 70b. Then I run regular SD on a P40.
My image gen is a little slow, so I'm going to try MLC as it now supports AWQ models.
The goal is to run Goliath-120b on the 2 P40s + a 3090 together at more than 8 t/s, leaving the other 3090 free for image gen.
To use this kind of thing away from home, I run a Telegram bot (rough sketch below).
This setup beats any service for chatting hands down.
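The bot side is only a few lines. A minimal sketch with python-telegram-bot; the token and the `generate_reply` helper are placeholders for your real token and whatever local backend answers the chat:

```python
# Rough sketch of the Telegram bot side (python-telegram-bot v20+).
# "YOUR_TOKEN" and generate_reply() are placeholders, not the actual setup.
from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

def generate_reply(text: str) -> str:
    # Call the local 70b backend here and return its answer.
    return "placeholder reply to: " + text

async def on_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text(generate_reply(update.message.text))

app = ApplicationBuilder().token("YOUR_TOKEN").build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, on_message))
app.run_polling()
```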
Why do you need a 70b? For prompting SD?
I found that even Mistral 7B does the job well for good prompts!
You don't need 3 GPUs to run it all; I do it on one 3090.
I just installed TensorRT, which improves speeds by a big margin (automatic1111).
I generate a 1024x1024, 30-step image in 3.5 seconds instead of 9.
I use the 70b to chat, and it also prompts SD during the convo. I agree that for just SD you can use almost any LLM.
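The SD hookup is basically just hitting the automatic1111 API whenever the chat output asks for a picture. A stripped-down sketch; the [IMG] tag convention and the local URL are illustrative, not the exact code I run:

```python
# Minimal sketch: when the LLM emits an image prompt, send it to the
# automatic1111 txt2img API. The [IMG]...[/IMG] tag and the URL are examples.
import base64
import re
import requests

A1111_URL = "http://127.0.0.1:7860"

def maybe_generate_image(llm_output: str) -> None:
    match = re.search(r"\[IMG\](.+?)\[/IMG\]", llm_output, re.DOTALL)
    if not match:
        return
    payload = {"prompt": match.group(1).strip(), "steps": 30,
               "width": 1024, "height": 1024}
    resp = requests.post(f"{A1111_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
    resp.raise_for_status()
    with open("chat_image.png", "wb") as f:
        f.write(base64.b64decode(resp.json()["images"][0]))
```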
IME, TensorRT didn't help; it just shaved a second off. I also tried the vlad version (diffusers) and compiling the model. If I use the 3090 I get somewhere around 6 seconds for 1024x1024, and I found that XL doesn't do as well at smaller resolutions.
For chat rather than serious SD, even 576x576 is "enough" on this 1080p laptop. On the P40 that takes 12 seconds.
Ideally, for actual SD, I'll try ComfyUI at some point. AFAIK it's the only UI that does XL properly, where the latent image is passed to the refiner model. That's probably why my XL outputs don't look much better than good 1.5 models.
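From what I understand, that base-to-refiner latent handoff is the same thing diffusers exposes. A rough sketch, assuming the stock SDXL base/refiner checkpoints and the commonly documented 0.8 denoising split:

```python
# Sketch of the SDXL base -> refiner handoff (diffusers): the base pipeline
# outputs latents and the refiner finishes them, instead of decoding to an
# image in between. Model IDs and the 0.8 split are the documented defaults.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2, vae=base.vae,
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

prompt = "a photo of an astronaut riding a horse"
latents = base(prompt=prompt, num_inference_steps=30,
               denoising_end=0.8, output_type="latent").images
image = refiner(prompt=prompt, num_inference_steps=30,
                denoising_start=0.8, image=latents).images[0]
image.save("sdxl_refined.png")
```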