I just use the PDF defaults from whatever browser I'm using at the time. Nothing special involved, just the defaults.
I do use 'pdftotext' for more fine-grained searching when I need it - but for the most part I find that a simple "ls -l | grep <search>" suffices, since my filenames preserve the page-title text too.
I did the same search for this thread and had no issues whatsoever with this command:
> EDIT: Seeing the command line you're using: the search you do is over the files' names, correct? The text content of the PDF (the original web page) is not indexed, right? Just to make sure I understand correctly.
pdftotext gets the actual text from the PDF. I don't do this, but I'm sure that you could automate the process of generating a text file for each PDF in a directory with pdftotext and then ripgrep the text files when it's time to search the contents. That would be doable with a makefile or a couple of shell scripts.
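The automation described above can be sketched as a small shell script. This is a minimal sketch, not anything posted in the thread: it assumes poppler's pdftotext and ripgrep (rg) are installed, and the `pdftxt` cache directory name is purely illustrative.

```shell
#!/bin/sh
# Sketch: extract every PDF under a source directory into a cache of
# .txt files, re-extracting only when the PDF is newer than its text.
# Assumes poppler-utils (pdftotext); directory names are illustrative.
SRC="${1:-.}"        # where the PDFs live
CACHE="${2:-pdftxt}" # where the extracted text goes

mkdir -p "$CACHE"
find "$SRC" -name '*.pdf' | while IFS= read -r pdf; do
    txt="$CACHE/$(basename "$pdf" .pdf).txt"
    # Skip PDFs whose cached text is already up to date.
    if [ ! -f "$txt" ] || [ "$pdf" -nt "$txt" ]; then
        pdftotext "$pdf" "$txt"
    fi
done

# When it's time to search the contents:
#   rg -i someSearchTerm pdftxt/
```

Run periodically (or as a makefile rule), this keeps the text cache current, and the search itself becomes an ordinary ripgrep over plain text.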
Yeah, my computer is fast enough that I can just do "find . -name '*.pdf' -exec pdftotext {} - \; | grep -i someSearchTerm" and come back later. (Note the "-" argument: it makes pdftotext write to stdout rather than to a .txt file beside each PDF; without it, nothing reaches the grep.) Bonus points that the output stays in my Terminal for reference later in the day as needed.
Is there a reason you don't use mdfind instead (built-in Spotlight search from the terminal)?
That way you can search pdf files directly from the terminal without converting to text first, and the directory is already indexed.
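For reference, the mdfind route looks like this. A macOS-only sketch (the directory and search term are placeholders), guarded so it merely reports when Spotlight's CLI is unavailable:

```shell
# Sketch: query Spotlight's existing index directly, no conversion step.
# macOS-only, so guard on the tool being present.
if command -v mdfind >/dev/null 2>&1; then
    # Full-content search (includes PDF text), scoped to one directory:
    mdfind -onlyin "$HOME/Documents" "someSearchTerm"
    status="searched"
else
    status="mdfind unavailable (not macOS)"
fi
echo "$status"
```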
1: Force of habit, since I use grep and silversearcher a lot elsewhere; but 2: I hate the Spotlight indexer (which mdfind relies on) scattering its index files all over my disks, so I've turned it off and forgotten about it.
Results (from the `ls -l | grep` search above):
".. nothing beats print-to-PDF. Its just awesome."
"fancier laid out pages), I Print-to-PDF again after enabling Reader"