[clue-tech] OCR.
David L. Anselmi
anselmi at anselmi.us
Sat Nov 28 18:50:38 MST 2009
I had a chance to try out Free OCR tools. tesseract seemed to be the
best, gocr and ocrad not so good. But...
On the first try, the file tesseract made resembled the original, but barely.
The original was a dot-matrix print, and when scanned at 300dpi you could see
the individual dots. 150dpi was the sweet spot for that page and then
tesseract produced something accurate enough to be better than starting
from scratch.
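
For anyone wanting to repeat the experiment, here is a rough sketch in Python. The post doesn't say whether the 150dpi image was a fresh scan or a downsample, so this assumes downsampling an existing 300dpi scan. The function names are made up for illustration, and Pillow plus the tesseract binary are assumed to be installed. The command uses the basic "tesseract <image> <output base>" form.

```python
import subprocess


def scaled_size(width_px, height_px, src_dpi, dst_dpi):
    """Pixel dimensions after resampling a scan from src_dpi to dst_dpi."""
    factor = dst_dpi / src_dpi
    return max(1, round(width_px * factor)), max(1, round(height_px * factor))


def tesseract_command(image_path, output_base):
    """Basic CLI form; tesseract writes its text to <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base)]


def downsample_and_ocr(src_path, dst_path, output_base, src_dpi=300, dst_dpi=150):
    """Hypothetical helper: shrink the scan, then run tesseract on the result."""
    # Pillow is imported lazily so the helpers above work without it.
    from PIL import Image

    with Image.open(src_path) as img:
        new_size = scaled_size(img.width, img.height, src_dpi, dst_dpi)
        resized = img.resize(new_size, Image.LANCZOS)
        # dst_path should end in .tif (see the extension pickiness noted below).
        resized.save(dst_path, dpi=(dst_dpi, dst_dpi))
    subprocess.run(tesseract_command(dst_path, output_base), check=True)
```

Something like downsample_and_ocr("page-300.tif", "page-150.tif", "page") would then leave the text in page.txt. Whether 150dpi is the right target will depend on the page. It was only the sweet spot for this one dot-matrix print.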
I didn't try gocr or ocrad at 150dpi. gocr especially might have been
better--at 300dpi it produced a useless set of dots, commas, and
whatnot. So perhaps it was more confused than the others by the dot-matrix
letters, and it would have done a whole lot better with laser print.
But it's nice that tesseract was usable. It's command line and fairly
picky (works on .tif but not the same file named .tiff), so there's
plenty of polishing for anyone looking for a project to contribute to.
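
Until that gets polished, the extension pickiness is easy to work around. A minimal sketch, assuming the only problem is the file name (as seen with the version tried here; other versions may behave differently). The helper name is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_tif_extension(image_path):
    """Return a path ending in .tif, copying a .tiff file if needed."""
    path = Path(image_path)
    if path.suffix.lower() != ".tiff":
        # Already .tif (or some other format); leave it alone.
        return path
    target = path.with_suffix(".tif")
    if not target.exists():
        # Copy rather than rename so the original scan is untouched.
        shutil.copyfile(path, target)
    return target
```

The returned path can then be handed to tesseract in place of the original file.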
Dave