this post was submitted on 17 Nov 2023

Machine Learning

Researchers at Meta AI announced Emu Edit today, a model that edits images precisely based on text instructions. It's a notable step forward for "instructable" image editing.

Existing systems struggle to interpret instructions correctly: they make imprecise edits or change the wrong parts of an image. Emu Edit tackles this through multi-task training.

They trained it on 16 diverse image editing and vision tasks, such as object removal, style transfer, and segmentation.

Emu Edit learns a unique "task embedding" for each task, which steers it towards the right kind of edit for a given instruction, for example a "texture change" rather than an "object removal".
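
To make the idea concrete, here is a toy sketch of a task-embedding table: one learnable vector per task, looked up and combined with the instruction features. This is not Emu Edit's code. The task names, the dimensions, and the choice to simply concatenate the vectors are all assumptions for illustration; the summary above doesn't say how the real model injects the embedding.

```python
import random

EMBED_DIM = 4
TASKS = ["texture_change", "object_removal"]  # hypothetical task names

rng = random.Random(0)
# One vector per task. Randomly initialised here; in a real system these
# would be learned jointly with the rest of the model.
task_embeddings = {t: [rng.gauss(0, 1) for _ in range(EMBED_DIM)] for t in TASKS}

def condition(text_features, task):
    """Append the task's embedding to the instruction features (assumed scheme)."""
    return list(text_features) + task_embeddings[task]

text_features = [0.1, 0.2, 0.3]  # stand-in for an encoded instruction
cond_texture = condition(text_features, "texture_change")
cond_removal = condition(text_features, "object_removal")
```

The same instruction features end up with different conditioning depending on the task, which is the signal that nudges the model towards one type of edit over another.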

In evaluations, Emu Edit significantly outperformed prior systems like InstructPix2Pix on following instructions faithfully while preserving unrelated image regions.

With just a few examples, it can adapt to entirely new tasks such as image inpainting by learning a new task embedding while the rest of the model stays frozen, so no full retraining is needed.
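
A rough sketch of what "update only the task embedding" looks like in practice: the model weights stay fixed and gradient descent runs on the new embedding alone. The tiny linear "model", the example data, and the hyperparameters below are made up for illustration and have nothing to do with the actual architecture.

```python
# Frozen "model" weights: never updated below.
FROZEN_W = [2.0, -1.0]

def model(x, task_emb):
    # Toy model: output_i = w_i * x_i + task_emb_i
    return [w * xi + e for w, xi, e in zip(FROZEN_W, x, task_emb)]

def fit_task_embedding(examples, dim=2, lr=0.1, steps=200):
    """Fit only the task embedding with gradient descent on mean squared error."""
    emb = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, target in examples:
            pred = model(x, emb)
            for i in range(dim):
                # d/d(emb_i) of (pred_i - target_i)^2, averaged over examples
                grad[i] += 2 * (pred[i] - target[i]) / len(examples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

# A handful of (input, target) pairs for the "new task" (made-up data,
# generated with a true embedding of [0.5, -0.3]).
examples = [
    ([1.0, 2.0], [2.5, -2.3]),
    ([0.0, 1.0], [0.5, -1.3]),
    ([3.0, 0.0], [6.5, -0.3]),
]
new_emb = fit_task_embedding(examples)
```

Because only a small vector is being optimised, a few examples are enough to pin it down, which is the intuition behind the few-shot claim.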

There's still room for improvement on complex instructions. But Emu Edit demonstrates how multi-task training can substantially improve AI editing abilities, and it moves the field closer to human-level performance at translating natural language into precise visual edits.

TLDR: Emu Edit uses multi-task training on diverse edits/vision tasks and task embeddings to achieve big improvements in instruction-based image editing fidelity.

Full summary is here. Paper here.

[–] Xanian123@alien.top 1 points 10 months ago

I was just talking to a friend yesterday about how AI images won't take off unless tweaks can be made using natural language. If the paper's claims are true, this is going to be revolutionary.