What you are looking for is OCR. Then feed the Markdown to the LLM.
this post was submitted on 26 Nov 2023
LocalLLaMA
Facebook's Nougat OCR model does PDF to Markdown. There are also fine-tuned versions of it that do PDF to LaTeX. I plan on making a fine-tuned version to do PDF to XML later this winter break too!
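For anyone who wants to try the OCR-then-LLM route, here is a rough sketch of how the pieces could fit together. It assumes the `nougat` command from the `nougat-ocr` package is installed and, as far as I know, writes a `.mmd` file per PDF into the output directory. The helper names (`run_nougat`, `chunk_markdown`, `build_prompt`) and the 4000-character chunk size are made up for illustration, and the LLM call is left out since it depends on whatever backend you run locally.

```python
import subprocess
from pathlib import Path


def run_nougat(pdf_path: str, out_dir: str) -> Path:
    """Run the Nougat CLI on one PDF and return the expected output path.

    Assumption: `nougat` is on PATH and writes <pdf stem>.mmd into out_dir.
    """
    subprocess.run(["nougat", pdf_path, "-o", out_dir], check=True)
    return Path(out_dir) / (Path(pdf_path).stem + ".mmd")


def chunk_markdown(text: str, max_chars: int = 4000) -> list[str]:
    """Group paragraphs (split on blank lines) into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_prompt(question: str, chunk: str) -> str:
    """Wrap one Markdown chunk and a question into a plain-text prompt."""
    return (
        "Below is part of a document converted to Markdown by OCR.\n\n"
        f"{chunk}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Usage would be something like `mmd = run_nougat("paper.pdf", "out")`, then `chunk_markdown(mmd.read_text())`, and sending each `build_prompt(...)` result to your model. Splitting on blank lines is just the simplest option; splitting on Markdown headings would probably keep sections together better.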
I'd really like a version of LLaVA that can process comic/manga pages: read the text, say which character is saying what, who is doing what, and in what order. Pretty much turn the manga into a novel or something like that.
Anyone know of any project that is going in that direction/working on that?