The "Using python to dump the PDF to text" dramatically underestimates how hard this is.
Tables and especially multi-column PDFs often need one-off handling and - worse - you don't know when one is being misparsed until you start getting weird search results. At that point you need to debug your entire search pipeline, which isn't fun!
Tables and especially multi-column PDFs often need one-off handling and - worse - you don't know when one is being misparsed until you start getting weird search results. At that point you need to debug your entire search pipeline, which isn't fun!