
The "Using python to dump the PDF to text" step dramatically underestimates how hard this is.

Tables and especially multi-column PDFs often need one-off handling and - worse - you don't know when one is being misparsed until you start getting weird search results. At that point you need to debug your entire search pipeline, which isn't fun!
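One way to catch some of these cases before they show up as weird search results is a cheap sanity check on the extracted text. The sketch below is only a heuristic, and everything in it (the function names, the 3-space gap, the 0.3 threshold) is my own assumption rather than anything from the article. It assumes your extractor preserves layout, so side-by-side columns or table cells land on one line separated by a run of spaces. It will not catch extractors that interleave columns without leaving gaps.

```python
import re

# Assumption: a run of 3+ spaces between two non-space characters suggests
# that two columns (or table cells) were merged onto a single line.
_GAP = re.compile(r"\S {3,}\S")


def suspicious_line_ratio(page_text: str) -> float:
    """Return the fraction of non-blank lines that contain a wide internal gap."""
    lines = [ln for ln in page_text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    flagged = sum(1 for ln in lines if _GAP.search(ln))
    return flagged / len(lines)


def flag_pages(pages, threshold=0.3):
    """Return indices of pages whose suspicious-line ratio meets the threshold.

    `pages` is a list of already-extracted page texts; the threshold is a
    guess and would need tuning against your own documents.
    """
    return [
        i for i, text in enumerate(pages)
        if suspicious_line_ratio(text) >= threshold
    ]


if __name__ == "__main__":
    pages = [
        "This is a normal paragraph.\nIt has ordinary spacing.",
        "Left column text     right column text\nmore left here       more right here",
    ]
    print(flag_pages(pages))
```

Flagged pages still need a human look or one-off handling. The point is just to find out at ingest time rather than while debugging the whole search pipeline.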


