LocalLLaMA

4 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Colud LLaVa be finetuned to perform image to markdown or even image to html conversion? (alien.top)

submitted 2 years ago by tortistic_turtle@alien.top to c/localllama@poweruser.forum

3 comments fedilink hide all child comments

Hello! I am wondering, since this would be a very interesting use case and there is more than enough training material out there (pretty much every MD file could be rendered, then the image and markdown code could be used for training/finetuning)

however I have pretty much no idea about llava. Do you think this would be feasible to do?

top 3 comments

sorted by: hot top controversial new old

[–] herozorro@alien.top 1 points 2 years ago

what you are looking for is OCR. then feed the LLM to the markdown

[–] Byt3G33k@alien.top 1 points 2 years ago

Facebook Nougat OCR model does PDF to Markdown. Also fine tuned versions of it doing PDF to LaTeX. I plan on making a fine tuned version to do PDF to XML later this winter break too!

[–] arthurwolf@alien.top 1 points 2 years ago

I'd really like a version of llava that can process comic/manga pages (read the text, say which character is saying what, doing what, in what order. pretty much turn the manga into a novel or something like that).

Anyone know of any project that is going in that direction/working on that?