[clue-tech] OCR.

David L. Anselmi anselmi at anselmi.us
Sat Nov 28 18:50:38 MST 2009


I had a chance to try out Free OCR tools.  tesseract seemed to be the 
best, gocr and ocrad not so good.  But...

On the first try, the file tesseract made resembled the original, but 
barely.  The original was a dot-matrix printout, and in the 300dpi scan 
you could see the individual dots.  150dpi was the sweet spot for that 
page; at that resolution tesseract produced something accurate enough 
to be better than starting from scratch.
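
For anyone who wants to repeat the experiment, here is a rough sketch of 
that workflow. It is not what was actually run for this test. It assumes 
ImageMagick's convert is available to halve the 300dpi scan, and that 
tesseract is called in its basic "tesseract image outputbase" form. The 
function and file names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(scan, target_dpi=150, source_dpi=300):
    """Return (resize_cmd, ocr_cmd, text_path) for one scanned page.

    Assumption: ImageMagick's `convert` does the downscaling and the
    `tesseract` CLI does the OCR. Names here are illustrative only.
    """
    scan = Path(scan)
    # 300dpi -> 150dpi means scaling the pixel dimensions to 50%.
    percent = 100 * target_dpi / source_dpi
    small = scan.with_name(f"{scan.stem}-{target_dpi}dpi.tif")
    # tesseract takes an output base name and appends .txt itself.
    out_base = small.with_suffix("")
    resize_cmd = ["convert", str(scan), "-resize", f"{percent:g}%", str(small)]
    ocr_cmd = ["tesseract", str(small), str(out_base)]
    return resize_cmd, ocr_cmd, Path(f"{out_base}.txt")


def ocr_page(scan, target_dpi=150, source_dpi=300):
    """Downscale the scan, run tesseract on it, and return the text."""
    resize_cmd, ocr_cmd, text_path = build_commands(scan, target_dpi, source_dpi)
    for cmd in (resize_cmd, ocr_cmd):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
    return text_path.read_text(errors="replace")
```

Calling ocr_page("page.tif") would leave page-150dpi.tif and 
page-150dpi.txt next to the original. The sweet spot will likely differ 
for other printers and scanners, so target_dpi is worth varying.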

I didn't try gocr or ocrad at 150dpi.  gocr especially might have been 
better--at 300dpi it produced a useless set of dots, commas, and 
whatnot.  So perhaps it was more confused than the others by the 
dot-matrix letters, and it would have done a whole lot better with 
laser print.

But it's nice that tesseract was usable.  It's a command-line tool and 
fairly picky (it works on .tif but not on the same file named .tiff), so 
there's plenty of polishing for anyone looking for a project to 
contribute to.
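
Until that gets polished, one way around the .tif/.tiff pickiness is to 
hand tesseract a copy with the name it expects. A minimal sketch using 
only the Python standard library; as_tif is a hypothetical helper name, 
not part of tesseract.

```python
import shutil
import tempfile
from pathlib import Path


def as_tif(image, workdir=None):
    """Return a path ending in .tif that holds the same image data.

    Files already named .tif are returned unchanged; .tiff files are
    copied into workdir (a fresh temporary directory by default).
    """
    image = Path(image)
    if image.suffix == ".tif":
        return image
    if image.suffix.lower() not in (".tif", ".tiff"):
        raise ValueError(f"not a TIFF file name: {image}")
    workdir = Path(workdir) if workdir else Path(tempfile.mkdtemp())
    target = workdir / (image.stem + ".tif")
    # Copy rather than rename so the original file is left untouched.
    shutil.copyfile(image, target)
    return target
```

The returned path can then be passed to tesseract in place of the 
original file name.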

Dave

