What you are looking for is OCR. Then feed the resulting markdown to the LLM.
this post was submitted on 26 Nov 2023
LocalLLaMA
Facebook's Nougat OCR model does PDF to Markdown. There are also fine-tuned versions of it that do PDF to LaTeX. I plan on making a fine-tuned version to do PDF to XML later this winter break too!
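For the "feed the markdown to the LLM" half, here is a rough sketch of what that could look like once the OCR step has already given you the Markdown as a string. Everything named here is made up for illustration: `split_markdown`, `ask_about_document`, the prompt wording, and the `max_chars` budget are all assumptions, and `llm` is a stand-in for whatever local model call you actually use (it just needs to take a prompt string and return a string).

```python
from typing import Callable, List


def split_markdown(markdown: str, max_chars: int = 4000) -> List[str]:
    """Group blank-line-separated blocks into chunks of roughly max_chars.

    A single block longer than max_chars is kept whole rather than cut,
    so a chunk can exceed the budget in that case.
    """
    chunks: List[str] = []
    current = ""
    for block in markdown.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this block would overflow the budget: close the chunk.
            chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks


def ask_about_document(
    markdown: str,
    question: str,
    llm: Callable[[str], str],  # hypothetical: your own model wrapper
    max_chars: int = 4000,
) -> List[str]:
    """Ask the same question against each chunk and collect the answers."""
    answers = []
    for chunk in split_markdown(markdown, max_chars):
        prompt = (
            "Document excerpt (Markdown):\n\n"
            f"{chunk}\n\n"
            f"Question: {question}\nAnswer:"
        )
        answers.append(llm(prompt))
    return answers
```

Chunking on blank lines is only one option; it keeps paragraphs and headings intact, which tends to matter more for OCR output than hitting an exact size. How you merge the per-chunk answers is left open.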
I'd really like a version of LLaVA that can process comic/manga pages: read the text, say which character is saying what and doing what, and in what order. Pretty much turn the manga into a novel or something like that.
Anyone know of any project that is going in that direction/working on that?