We use GPT-4o for data extraction from documents, and it's really good. I published a small library that does a lot of the document conversion and output parsing: https://npmjs.com/package/llm-document-ocr
For straight OCR, it does work really well, but at the end of the day it's still not 100% accurate.