OK, I tried your benchmark with Tesseract 4.0.0-beta.1 on Ubuntu 18.04, and it gave me:
> EARTHQUAKE HOUSE
> ‘his building was erected around 1874 to provide a location for a seismometer of the British
> Association. A seismometer is an instrument which is designed to record earthquarkes and the
> ‘one located in this building was only one of a series of such instruments located in the vicinity
> ‘earthquakes which had been, and continue to be, prevalent in the area.
> of Comrie to investigate
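Clearly, Tesseract 4.0 has problems following the baseline of the text: the fragment "of Comrie to investigate" ends up on a line of its own instead of inside the last sentence. Otherwise, though, it is much better than the output from the website and even got the title right, which makes me think the website uses an older version (?)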
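One way to make "much better" concrete is the character error rate (CER): the Levenshtein edit distance between an OCR output and a reference transcription, divided by the reference length. The only reference readily available here is the title, EARTHQUAKE HOUSE, so this minimal Python sketch scores just the two titles: Tesseract's and the website's "ARTHQUAKE HOUSE 1". A comparison of the full text would need a ground-truth transcription of the whole sign, which this thread does not include.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (classic dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    return levenshtein(hypothesis, reference) / len(reference)


reference = "EARTHQUAKE HOUSE"
for name, title in [("Tesseract 4.0.0-beta.1", "EARTHQUAKE HOUSE"),
                    ("Project Naptha demo", "ARTHQUAKE HOUSE 1")]:
    print(f"{name}: distance={levenshtein(title, reference)}, CER={cer(title, reference):.3f}")
```

On the title alone, Tesseract scores a distance of 0, while the website's output is 3 edits away (one missing `E` plus the two extra characters ` 1`), i.e. a CER of 3/16.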
This project is badly in need of an update. It's using an ancient, four-year-old version of tesseract.js. The current version,
https://github.com/naptha/tesseract.js
is based on Tesseract v4.1.1, which is newer than the 4.0.0-beta.1 you are running on Ubuntu 18.04.
Version 4.0 added a new neural-network recognition engine based on LSTMs, with major accuracy gains.
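For a local comparison with the LSTM engine, the `tesseract` CLI can be told to use only that engine with `--oem 1`. Below is a minimal Python sketch, assuming a Tesseract 4.x binary on `PATH` with the English `eng` traineddata installed; the helper names `tesseract_cmd` and `run_ocr` and the image name `earthquake_house.png` are hypothetical, and this is an illustration rather than a reconstruction of anyone's command line in this thread.

```python
import shutil
import subprocess


def tesseract_cmd(image_path: str, lang: str = "eng", psm: int = 3) -> list[str]:
    """Build a tesseract invocation that forces the LSTM engine (--oem 1)
    and writes the recognized text to stdout. psm 3 is fully automatic
    page segmentation (the CLI default)."""
    return ["tesseract", image_path, "stdout",
            "-l", lang, "--oem", "1", "--psm", str(psm)]


def run_ocr(image_path: str, lang: str = "eng", psm: int = 3) -> str:
    """Run the command built above and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    result = subprocess.run(tesseract_cmd(image_path, lang, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(run_ocr("earthquake_house.png", psm=6))` would treat the image as a single uniform block of text (`--psm 6`); whether that helps with the baseline problem described above is untested.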
OK, I tried the Project Naptha demo page on exactly the same image, and it gave me:
> ARTHQUAKE HOUSE 1
> Ths builing was erecied Ground 1874 to provide a location for a seismomater of the Bitish Aasociation. A sefémomatar is an nsirument which is designed 10 racord earthquarkes and the ne ocate n his buiding was only one of a seies of such instruments located in the vicinity of Coms tonvestgete the earthauakes which had been, and continue to be, prevalet n the area.
My commandline: