this post was submitted on 17 Nov 2023

Machine Learning

Researchers at Meta AI announced Emu Edit today. It can edit images precisely based on text instructions. It's a big advance for "instructable" image editing.

Existing systems often misinterpret instructions, making imprecise edits or changing the wrong parts of an image. Emu Edit tackles this through multi-task training.

They trained it on 16 diverse image-editing and vision tasks, such as object removal, style transfer, and segmentation.

Emu Edit learns a distinct "task embedding" for each task, which steers the model toward the right kind of edit for a given instruction, for example a "texture change" versus an "object removal".
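
To make the idea concrete, here is a minimal, hypothetical sketch of how a learned per-task embedding could be combined with the instruction text to condition an editing model. This is not Meta's implementation; the text encoder, denoiser, dimensions, and names are all assumptions for illustration.

```python
# Minimal sketch (assumption, not Meta's code): condition an image-editing
# denoiser on both the instruction text and a learned per-task embedding.
# `text_encoder` and `denoiser` are placeholder modules.
import torch
import torch.nn as nn

NUM_TASKS = 16   # the paper trains on 16 editing/vision tasks
COND_DIM = 768   # assumed conditioning width shared by text and task features

class TaskConditionedEditor(nn.Module):
    def __init__(self, text_encoder: nn.Module, denoiser: nn.Module):
        super().__init__()
        self.text_encoder = text_encoder                        # instruction encoder
        self.task_embeddings = nn.Embedding(NUM_TASKS, COND_DIM)
        self.denoiser = denoiser                                # editing backbone

    def forward(self, noisy_image, timestep, instruction_tokens, task_id):
        text_feats = self.text_encoder(instruction_tokens)      # (B, T, COND_DIM)
        task_feat = self.task_embeddings(task_id).unsqueeze(1)  # (B, 1, COND_DIM)
        # Prepend the task embedding so the denoiser sees both "what to do"
        # (the instruction) and "what kind of edit this is" (the task).
        cond = torch.cat([task_feat, text_feats], dim=1)
        return self.denoiser(noisy_image, timestep, cond)
```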

In evaluations, Emu Edit significantly outperformed prior systems like InstructPix2Pix on following instructions faithfully while preserving unrelated image regions.

With just a few examples, it can adapt to entirely new tasks such as image inpainting by learning a new task embedding rather than retraining the full model.
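
Here is a hedged sketch of what that few-shot adaptation could look like, assuming the editing model accepts a task-embedding vector directly and that a simple pixel reconstruction loss is used; the `model` interface and loss are placeholders, not the paper's actual training objective.

```python
# Hedged sketch of few-shot adaptation (assumed interface, not the paper's
# training code): freeze the editor and optimize only a new task embedding
# on a handful of (source, instruction, target) examples.
import torch

def learn_new_task_embedding(model, examples, cond_dim=768, steps=200, lr=1e-2):
    new_task = torch.nn.Parameter(torch.empty(1, cond_dim))
    torch.nn.init.normal_(new_task, std=0.02)
    for p in model.parameters():
        p.requires_grad_(False)          # the editor itself stays frozen
    optimizer = torch.optim.Adam([new_task], lr=lr)
    for _ in range(steps):
        for source_image, instruction, target_image in examples:
            optimizer.zero_grad()
            # `task_embedding=` is a hypothetical hook for injecting the vector.
            prediction = model(source_image, instruction, task_embedding=new_task)
            loss = torch.nn.functional.mse_loss(prediction, target_image)
            loss.backward()
            optimizer.step()
    return new_task.detach()
```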

There's still room for improvement on complex instructions, but Emu Edit demonstrates how multi-task training can substantially improve AI editing abilities, bringing it much closer to human-level performance at translating natural language into precise visual edits.

TLDR: Emu Edit combines multi-task training across diverse editing and vision tasks with learned task embeddings to achieve big improvements in instruction-based image editing fidelity.

Full summary is here. Paper here.

top 3 comments
crantob@alien.top

Looks like too much work to recreate easily.

Xanian123@alien.top

I was just talking to a friend yesterday about how AI images won't take off unless tweaks can be done using natural language. If the paper's claims are true, this is going to be revolutionary.

evanthebouncy@alien.top

We'll need a finer definition of what counts as an edit.

Currently, everything from flipping an image vertically, to swapping out sub-regions, to truly semantic edits like "make the person stand up" gets lumped together and called an "edit".

Something like the tiers of autonomous driving will be needed: tier-1 edits all the way up to tier-5.

The proposed method is around tier 2: capable of swapping out sub-regions via style transfer, but not of meaningfully changing the structure of the scene, e.g. "make the man stand up".