NearOCR

NearOCR splash image.

About

A screenshot of NearOCR.

NearOCR is an OCR program built on NearNeural, written in Java as a first-year project. It uses a series of sensor vectors to learn the ideal shape of each letter from a training corpus. Letters identified in a document are then compared against this gold standard using the neural net, together with a covariance measure.
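To give a feel for what a covariance measure between two sensor vectors looks like, here is a minimal sketch. It is an illustration only: the class name CovarianceSketch, the method covariance and the sample vectors are all made up for this example and are not taken from the NearOCR source, which may compute its measure differently.

```java
// Hypothetical helper for illustration; not part of NearOCR's actual API.
final class CovarianceSketch {

    /** Population covariance of two equal-length sensor vectors. */
    static double covariance(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        // First pass: the mean of each vector.
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;

        // Second pass: average product of the deviations from the means.
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - meanA) * (b[i] - meanB);
        }
        return sum / a.length;
    }

    public static void main(String[] args) {
        double[] learned = {0.0, 1.0, 1.0, 0.0}; // made-up "ideal" sensor readings
        double[] similar = {0.1, 0.9, 0.8, 0.0}; // a glyph with a similar shape
        double[] inverse = {1.0, 0.0, 0.0, 1.0}; // a glyph with the opposite shape
        System.out.println("similar: " + covariance(learned, similar));
        System.out.println("inverse: " + covariance(learned, inverse));
    }
}
```

A positive value means the two vectors rise and fall together, so the candidate glyph has roughly the learned shape; a negative value means they move in opposite directions.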

The code is relatively clean, but since I was young and impulsive when I wrote it, it might not be the highest-quality work ever. As a pure-Java, open source OCR tool and library, it is probably best suited to use as an educational tool.

Download

Download NearOCR-0.1.tar.gz.

Documentation & Use

The download comes with three scripts: compile, document and run. Run each one for its respective function!

When it comes to using the tool, the following is a rough outline:

  1. Load or create a net
  2. Load a series of symbols
  3. Train the net against the symbols and optionally save these weightings
  4. Load a document
  5. Slice the document into letters
  6. Run the recognition (the bars on the side show the weightings used for the neural net, subtractive similarity and covariance)
  7. Look at the output ;-)
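Step 6 weights three measures against each other. The sketch below shows one way such a weighting could work, assuming each measure produces a score per candidate letter and the weights act as a simple weighted average. Everything here (the class WeightedRecognitionSketch, the methods combine and best, and the sample scores) is hypothetical and written for this example; it is not NearOCR's actual code, and the tool may combine its measures differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of weighting three recognition scores; not NearOCR's API.
final class WeightedRecognitionSketch {

    /** Weighted average of the neural net, subtractive similarity and covariance scores. */
    static double combine(double netScore, double subtractive, double covariance,
                          double wNet, double wSub, double wCov) {
        double total = wNet + wSub + wCov;
        if (total <= 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return (wNet * netScore + wSub * subtractive + wCov * covariance) / total;
    }

    /**
     * Returns the candidate letter with the highest combined score.
     * Each map value holds {netScore, subtractiveSimilarity, covariance}.
     */
    static char best(Map<Character, double[]> scores, double wNet, double wSub, double wCov) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("no candidate letters");
        }
        char bestLetter = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Character, double[]> entry : scores.entrySet()) {
            double[] s = entry.getValue();
            double combined = combine(s[0], s[1], s[2], wNet, wSub, wCov);
            if (combined > bestScore) {
                bestScore = combined;
                bestLetter = entry.getKey();
            }
        }
        return bestLetter;
    }

    public static void main(String[] args) {
        // Made-up scores for one sliced glyph against two candidate letters.
        Map<Character, double[]> scores = new LinkedHashMap<>();
        scores.put('O', new double[] {0.9, 0.6, 0.5});
        scores.put('Q', new double[] {0.7, 0.8, 0.7});

        System.out.println("net only:      " + best(scores, 1.0, 0.0, 0.0));
        System.out.println("equal weights: " + best(scores, 1.0, 1.0, 1.0));
    }
}
```

With these sample scores, the chosen letter depends on the weights, which is why it can be worth moving the bars and re-running the recognition when the output looks wrong.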

There is a help file in the help/ folder, which can be opened with anything that supports ODF. There is also a report in HTML format, written for the project.

The OCR and neural net parts are both libraries, used by a front-end written in Swing. The javadoc is here.