G_S_7_wiz

joined 2 years ago

submitted 2 years ago by G_S_7_wiz@alien.top to c/localllama@poweruser.forum


I am working on a project where I have to extract tables from PDFs, usually financial reports that contain a lot of tables (both simple tables and tables with merged cells) and graphs.
I have tried the following libraries without great results:
Nougat, PyMuPDF (fitz), PyPDF2, pdfplumber, PDFMiner, Camelot, Tabula, pdfquery
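
To make the merged-cell problem concrete, here is a minimal sketch of the kind of post-processing such tables tend to need. It assumes the extractor returns each table as a list of rows and leaves `None` or an empty string wherever a vertically merged cell spans more than one row. The helper name `fill_down` and the sample data are hypothetical, for illustration only.

```python
from typing import List, Optional

Row = List[Optional[str]]


def fill_down(rows: List[Row]) -> List[List[str]]:
    """Fill blank cells with the value from the row above.

    Assumption: a blank cell (None or whitespace) is the remainder of a
    vertically merged cell. A cell that is genuinely empty in the source
    table would be filled too, so this is only safe for label columns.
    """
    filled: List[List[str]] = []
    for row in rows:
        prev = filled[-1] if filled else None
        out: List[str] = []
        for col, cell in enumerate(row):
            if cell is None or cell.strip() == "":
                # Inherit from the row above when that row has this column.
                cell = prev[col] if prev is not None and col < len(prev) else ""
            out.append(cell.strip())
        filled.append(out)
    return filled


# Hypothetical extractor output: "Segment" is merged across two rows each.
raw_table: List[Row] = [
    ["Segment", "Year", "Revenue"],
    ["Retail", "2022", "10"],
    [None, "2023", "12"],
    ["Wholesale", "2022", "7"],
    ["", "2023", "9"],
]

clean_table = fill_down(raw_table)
for row in clean_table:
    print(row)
```

This only handles vertical merges. Horizontally merged headers would need a similar fill across columns, and neither step helps if the extractor misdetects the cell boundaries in the first place.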

What other OCR tools, LLMs, or other approaches would you recommend I try next? Thanks in advance!