this post was submitted on 27 Nov 2023

LocalLLaMA

Does anyone know of a robust open-source library for extracting tables from PDFs? Even an OCR library is fine.

P.S. I have already tried tabula, camelot, img2table, unstructured.io, and most of the document loaders in langchain; none of them is even 95% robust.

[–] Dry_Long3157@alien.top 1 points 9 months ago (2 children)

nougat by Facebook is your best bet.

[–] happy_dreamer10@alien.top 1 points 9 months ago (1 children)

Thanks, will check it out. Have you tried it?

[–] Dry_Long3157@alien.top 1 points 9 months ago

Yup, it's the best I've tried for tables and math formulas.