this post was submitted on 26 Dec 2023

28 points (81.8% liked)

Best text to image generator (lemmy.world)

submitted 1 year ago by Billd111@lemmy.world to c/technology@lemmy.world

13 comments

I have used several different generators. What they all seem to have in common is that they don't always display what I ask for. Example: if I am looking for a person in jeans and a t-shirt, I get images of a person wearing totally different clothing, and it isn't consistent. Another example: if I want a full body picture, that instruction seems to be ignored, giving me just the waist up or just below the waist. The same goes for side views or back views. Sometimes they work. Sometimes they don't. More often they don't. I have also found that none of the negative requests seem to work. If I ask for pictures of people without cell phones or tattoos, like magic they have cell phones, and some have tattoos. I have noticed this in every single generator I have used. Am I asking for things the wrong way, or is the AI doing whatever it wants and not paying attention to my actual request?

Thanks

top 13 comments

[–] Vibi@lemmy.world 15 points 1 year ago (1 children)

My favorite has been locally hosting Automatic1111's UI. The setup process was super easy, and you can get great checkpoints and models on Civitai. This gives me complete control over the models and the generation process. I think it's an expectation thing as well. Learning how to write the right prompt, adjust the right settings for the loaded checkpoint, and run enough iterations to get what you're looking for can take some patience and time. It may be worth learning how the AI actually 'draws' things so you can adjust how you interact with it and write prompts. There's actually A LOT of control you gain by hosting locally: ControlNet, LoRA, checkpoint merging, etc. Definitely look up guides on prompt writing and learn about weights, order, and how negative prompts actually influence generation.
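To make the point about weights concrete: in the AUTOMATIC1111 web UI, writing a term as `(term:1.3)` raises the attention given to it, and things you want kept out go in the separate negative prompt box instead of the main prompt. The sketch below handles only that explicit `(text:weight)` form (it ignores the `(text)` / `[text]` shorthand, nesting, and terms containing commas), and `format_terms` / `parse_terms` are hypothetical helpers for illustration, not part of the web UI.

```python
import re

# Matches one explicit A1111-style weighted term, e.g. "(blue jeans:1.3)".
# Assumption: only the explicit "(text:weight)" form, no nesting or shorthand.
_WEIGHTED = re.compile(r"^\((?P<text>[^():]+):(?P<weight>\d*\.?\d+)\)$")


def format_terms(terms: list[tuple[str, float]]) -> str:
    """Join (text, weight) pairs into a comma-separated prompt string.

    Terms with weight 1.0 are written plainly; others use "(text:weight)".
    """
    parts = []
    for text, weight in terms:
        if weight == 1.0:
            parts.append(text)
        else:
            parts.append(f"({text}:{weight:g})")
    return ", ".join(parts)


def parse_terms(prompt: str) -> list[tuple[str, float]]:
    """Inverse of format_terms for prompts that use only the explicit form."""
    terms = []
    for part in prompt.split(","):
        part = part.strip()
        if not part:
            continue
        match = _WEIGHTED.match(part)
        if match:
            terms.append((match["text"].strip(), float(match["weight"])))
        else:
            terms.append((part, 1.0))
    return terms


# Positive prompt goes in the main box, negative prompt in the negative box.
positive = format_terms([("full body shot", 1.3), ("blue jeans", 1.2), ("white t-shirt", 1.0)])
negative = format_terms([("cell phone", 1.0), ("tattoos", 1.0)])
print(positive)  # (full body shot:1.3), (blue jeans:1.2), white t-shirt
print(negative)  # cell phone, tattoos
```

Keeping the unwanted items only in the negative field, and never mentioning them in the positive prompt, is the main habit this illustrates.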

[–] EdgeRunner@lemmy.dbzer0.com 2 points 1 year ago

I've started with stable-diffusion-webui, I feel you!!

[–] rickdg@lemmy.world 12 points 1 year ago (1 children)

Can you give an example of a complete prompt? Are you using Dall-E, Midjourney, Stable Diffusion…?

It seems that all models need prompts crafted specifically for them, and you need to follow up with corrections. The follow-up is critical for pretty much anything these LMMs output.

[–] Ragdoll_X@lemmy.world 4 points 1 year ago* (last edited 1 year ago) (1 children)

Image-to-image also helps a lot with SD. Even some roughly drawn blobs can be the difference between the image almost matching what you had in mind and looking exactly how you intended.

[–] BlueEther@no.lastname.nz 1 points 1 year ago

I just can't get img2img in SD to give me the images I want (A1111 front end).
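With img2img, the setting that usually decides whether your rough blobs survive is the denoising strength. As far as I know, the Hugging Face diffusers img2img pipeline noises the input image part-way and then runs only about `int(steps * strength)` of the requested steps, and the A1111 front end behaves similarly by default. The helper below is a hypothetical sketch of that arithmetic, not either tool's actual code.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps actually run in img2img.

    Assumption: mirrors a diffusers-style schedule, where the input image is
    noised to a level set by `strength` (0.0-1.0) and only that fraction of
    the schedule is denoised.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


# With 50 requested steps this prints 12, 25, 37 and 50 steps respectively.
for s in (0.25, 0.5, 0.75, 1.0):
    print(f"strength={s}: {img2img_steps(50, s)} of 50 steps")
```

Roughly speaking, a low strength keeps most of your drawing's layout, while a strength near 1.0 lets the model repaint almost everything, which can make img2img look like it ignored the input.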

[–] EdgeRunner@lemmy.dbzer0.com 6 points 1 year ago (1 children)

It's time to promote https://lemmy.dbzer0.com/c/stable_diffusion_art.

Very helpful and relaxing.

[–] CommunityLinkFixer@lemmings.world 9 points 1 year ago

Hi there! Looks like you linked to a Lemmy community using a URL instead of its name, which doesn't work well for people on different instances. Try fixing it like this: !stable_diffusion_art@lemmy.dbzer0.com

[–] silas@programming.dev 6 points 1 year ago* (last edited 1 year ago)

Talking to a text-to-image model is kinda like meeting someone from a different generation and culture that only half knows your language. You have to spend time with them to be able to communicate with them better and understand the “generational and cultural differences” so to speak.

Try checking out PromptHero or Civit.ai to see what prompts people are using to generate certain things.

Also, most text-to-image models are not made to be conversational and will work better if your prompts are similar to what you’d type in when searching for a photo on Google Images. For example, instead of a command like “Generate a photo for me of a…”, do “Disposable camera portrait photo, from the side, backlight…”
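As an illustration of the search-style approach, here is a hypothetical helper that turns a few structured fields into a comma-separated keyword prompt instead of a conversational request. The field names and the ordering (style, framing, view, subject, clothing) are assumptions made for the example; since order can matter, it is worth experimenting with it for your model.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical structured description of the image you want."""
    subject: str
    framing: str = ""  # e.g. "full body shot"
    view: str = ""  # e.g. "from the side", "from behind"
    clothing: list[str] = field(default_factory=list)
    style: list[str] = field(default_factory=list)  # e.g. "disposable camera photo"


def to_keyword_prompt(spec: PromptSpec) -> str:
    """Build a comma-separated keyword prompt, skipping empty fields.

    Order used here (an assumption): style, framing, view, subject, clothing.
    """
    parts = [*spec.style, spec.framing, spec.view, spec.subject, *spec.clothing]
    return ", ".join(p.strip() for p in parts if p and p.strip())


spec = PromptSpec(
    subject="a person standing on a beach",
    framing="full body shot",
    view="from the side",
    clothing=["blue jeans", "plain white t-shirt"],
    style=["disposable camera photo", "backlight"],
)
print(to_keyword_prompt(spec))
```

The point is less the helper itself than the output shape: short descriptive phrases, with framing and view stated explicitly rather than implied.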

[–] altima_neo@lemmy.zip 2 points 1 year ago* (last edited 1 year ago)

DALL-E 3 seems to be the easiest to use and, in my experience, does pretty well with prompts like that.

The issue is that it's quick to throttle you after a while, and it's heavily censored, blocking even seemingly innocuous words.

Stable Diffusion can be a bit dumb sometimes, occasionally giving you an image of a person wearing jean everything. But if you're willing to put in the time to learn Stable Diffusion, and you can run it on your PC, it gives you a lot of freedom and unlimited image output, as fast as your GPU can handle. You could use the "regional prompter" extension to mark the zones where you want the jeans, a specific shirt, etc. Or use inpainting to regenerate a masked area. It's more work, but it's very flexible and controllable.
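For the inpainting route, the mask is just an image the same size as the picture. In the diffusers inpaint pipeline, and as far as I know in A1111's default "inpaint masked" mode, white pixels are regenerated and black pixels are kept. The pure-Python sketch below builds such a mask for a rectangular region (for example, the lower half where the jeans should be) and writes it as a plain-text PGM image; the function names are hypothetical, and in practice you would usually just paint the mask in the UI.

```python
def rect_mask(width: int, height: int, box: tuple[int, int, int, int]) -> list[list[int]]:
    """Return a height x width grayscale mask as nested lists.

    Pixels inside box = (left, top, right, bottom), right/bottom exclusive,
    are 255 (white, "repaint this"); everything else is 0 (black, "keep").
    """
    left, top, right, bottom = box
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("box must lie inside the image")
    return [
        [255 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


def to_pgm(mask: list[list[int]]) -> str:
    """Serialize the mask as an ASCII (P2) PGM image, max value 255."""
    height, width = len(mask), len(mask[0])
    lines = ["P2", f"{width} {height}", "255"]
    lines += [" ".join(str(v) for v in row) for row in mask]
    return "\n".join(lines) + "\n"


# Hypothetical 8x8 image: repaint only the lower half (e.g. the legs/jeans).
mask = rect_mask(8, 8, (0, 4, 8, 8))
for row in mask:
    print("".join("#" if v else "." for v in row))
```

A real mask would match your image's actual resolution; the tiny size here just keeps the printout readable.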

[–] simple@lemmy.world 2 points 1 year ago

DALL-E 3 is the easiest to use and usually understands prompts the best. You can use it for free via Bing Image Creator.

[–] Billd111@lemmy.world 1 points 1 year ago (1 children)

Let me add to my post: I am using Perchance. I didn't make that clear. I am not sure what system it uses, and I am only a beginner at this. I use it primarily because it is completely free. There are no credits to earn or anything to buy. Most other "free" sites offer 5-10 start-up credits and then want you to purchase a credit package to keep using them.

My original intent was to take a picture I have of someone I know and use that face to create a character. I was told that was image-to-image AI. I found a few free-trial ones, but they either just take your original and digitize it or make you use it in the predefined environments they offer. You can't take an image and use it to create your own environment.

So I am stuck with text-to-image, which only works sometimes and only works if the AI knows the person you name. For example, if I say Taylor Swift on a beach in a bikini, it will generate an image of her likeness in the environment I specified, but ONLY if I use words they allow. If I say put my friend Cheryl in a similar picture, I get some stranger who looks nothing like her. I tried the Taylor prompt on Bing to test it, and it refused because I used words Bing considered inappropriate. That isn't exactly porn; there are actual pictures like that on the internet. There is no creative freedom on most of the AI sites available. Perchance was the only one that allowed it, was totally free, and had unlimited usage. I am just trying to learn this technique. I am not looking to spend big money on this, but I would like something that is consistent.

[–] Usernameblankface@lemmy.world 2 points 1 year ago

Taylor Swift alone gets the image blocked every time on Bing.

Since all my experience is with Bing, all I can add with confidence is that, for your goal, Bing is not the tool for the job.

I'd hazard a guess that you're better off learning to edit pictures to get what you're going for.

[–] Usernameblankface@lemmy.world 1 points 1 year ago* (last edited 1 year ago)

I use DALL-E 3 through the Bing Image Creator website. It's free and happens to work well with the way I describe things.

For the full body picture, describe their shoes as well as their hat or hair. Or describe what they're standing on and what they're looking at.

Most of the time, DALL-E will take "do not include thing" to mean "include thing." Sometimes it works better to start from Bing Chat and ask it to draw a picture that doesn't include the thing.
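One likely reason negative requests backfire, at least with Stable Diffusion-style models, is that the prompt becomes an embedding in which "tattoos" is present whether or not "no" precedes it, so mentioning it tends to pull it into the image. Where a tool has a separate negative prompt field (Stable Diffusion front ends such as A1111 do), a hypothetical helper like the one below can move phrases starting with "no", "not", "without", or "do not include" out of the main prompt, and append shoe and hair details per the full-body tip above. The negation word list and the comma-splitting rule are simplifying assumptions.

```python
import re

# Assumption: these leading words mark a phrase the user wants excluded.
_NEGATION = re.compile(
    r"^(?:no|not|without|do not include|don't include)\s+", re.IGNORECASE
)


def split_prompt(request: str, extra_details: tuple[str, ...] = ()) -> tuple[str, str]:
    """Split a comma-separated request into (positive, negative) prompts.

    Phrases that start with a negation word go to the negative prompt with
    the negation removed; everything else stays in the positive prompt.
    extra_details (e.g. shoes and hair for full-body shots) are appended
    to the positive prompt.
    """
    positive, negative = [], []
    for phrase in request.split(","):
        phrase = phrase.strip()
        if not phrase:
            continue
        match = _NEGATION.match(phrase)
        if match:
            negative.append(phrase[match.end():])
        else:
            positive.append(phrase)
    positive.extend(extra_details)
    return ", ".join(positive), ", ".join(negative)


positive, negative = split_prompt(
    "a person in jeans and a t-shirt, full body shot, no cell phones, without tattoos",
    extra_details=("white sneakers", "short brown hair"),
)
print(positive)
print(negative)
```

For tools with only one text box, the same idea applies manually: leave the unwanted item out entirely and describe what should be there instead.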