Re: NTLK Jot v Calligrapher v Newton

From: J Moylan (moylan@cutterjohn.net)
Date: Wed Jun 07 2000 - 22:24:38 CDT


Handwritten character recognition: remember that character recognition is
NOT an exact science. Recognition depends upon an algorithm that returns a
number of probable results for each character, which may then be applied as
an aggregate against a lexicon (or dictionary), if one is available, to
further refine the overall recognition probability.
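The lexicon step described above can be illustrated with a minimal sketch. The post does not say how the per-character results are aggregated, so this assumes the simplest scheme: each character position is treated independently, each lexicon word of the right length is scored by the sum of log-probabilities of its letters, and letters the recognizer never proposed get a small floor value. The function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical recognizer output: one dict per character position, mapping
# candidate characters to estimated probabilities.
CharScores = List[Dict[str, float]]


def word_log_score(word: str, scores: CharScores, floor: float = 1e-4) -> float:
    """Sum of log-probabilities of each letter of `word` at its position.

    Assumes positions are independent; a letter the recognizer did not
    propose gets a small floor probability instead of zero.
    """
    return sum(math.log(pos.get(ch, floor)) for ch, pos in zip(word, scores))


def best_lexicon_match(scores: CharScores, lexicon: List[str]) -> Tuple[Optional[str], str]:
    """Return (best lexicon word or None, raw per-character reading)."""
    # Raw reading: the single most probable character at each position.
    raw = "".join(max(pos, key=pos.get) for pos in scores)
    # Only consider words with as many characters as the segmented input.
    candidates = [w for w in lexicon if len(w) == len(scores)]
    if not candidates:
        return None, raw
    best = max(candidates, key=lambda w: word_log_score(w, scores))
    return best, raw


# Toy example: the raw reading is "newtan", but the lexicon pulls it to "newton".
example_scores: CharScores = [
    {"n": 0.6, "h": 0.3},
    {"e": 0.5, "c": 0.4},
    {"w": 0.7, "u": 0.2},
    {"t": 0.8, "l": 0.1},
    {"a": 0.55, "o": 0.45},
    {"n": 0.9},
]
example_lexicon = ["newton", "neuron", "hectic", "apple"]
print(best_lexicon_match(example_scores, example_lexicon))  # ('newton', 'newtan')
```

The point of the example is that a misread character ("a" for "o") is recovered because the lexicon word as a whole is far more probable than any alternative, which is the "further refine" effect the post describes.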

The problem with connected character recognition lies mainly in image
processing: reducing the written text to individual components/characters,
which can be extremely difficult without proper training. (A
neural-network-based algorithm is generally used, which is why many
connected-component recognition engines have a training subprogram that
allows the weights to be recalculated dynamically.) In any case, the initial
separation/segmentation of the text is problematic and prone to a certain
percentage of errors. Hence the jokes about recognition engines.
ParaGraph's Calligrapher is one of the best commercially produced engines I
have ever seen, and it works well with little or no end-user training.
(I suspect that a large portion of their development time was spent
training/tweaking the engine.)
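One common way to cope with unreliable segmentation (not necessarily what Calligrapher or any particular engine does) is to over-segment the writing into small pieces and let recognition decide how to group them. The sketch below assumes a hypothetical classifier that scores a group of consecutive pieces as one character, and uses dynamic programming over the cut points to pick the grouping with the highest total log-probability. The toy table mimics the classic "m" versus "rn" confusion; all names and numbers are made up for illustration.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical classifier: given primitive pieces start..end-1, return the best
# character label for that group and its probability, or None if the group
# does not look like any character.
Classifier = Callable[[int, int], Optional[Tuple[str, float]]]


def best_segmentation(n_pieces: int, classify: Classifier,
                      max_width: int = 3) -> Optional[Tuple[str, float]]:
    """Group n_pieces primitives into characters, maximizing total probability.

    best[i] holds (log-probability, text) for the best reading of pieces 0..i-1.
    A single character may span at most max_width pieces.
    """
    best: List[Optional[Tuple[float, str]]] = [None] * (n_pieces + 1)
    best[0] = (0.0, "")
    for end in range(1, n_pieces + 1):
        for start in range(max(0, end - max_width), end):
            prefix = best[start]
            if prefix is None:
                continue  # pieces 0..start-1 have no valid reading
            guess = classify(start, end)
            if guess is None:
                continue  # this group of pieces is not a plausible character
            ch, p = guess
            score = prefix[0] + math.log(p)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, prefix[1] + ch)
    final = best[n_pieces]
    if final is None:
        return None
    return final[1], math.exp(final[0])


# Toy scores: pieces 0-2 are the three humps of a handwritten "m", piece 3 is "e".
toy = {
    (0, 1): ("r", 0.3),
    (1, 3): ("n", 0.6),
    (0, 2): ("n", 0.5),
    (2, 3): ("i", 0.3),
    (0, 3): ("m", 0.8),
    (3, 4): ("e", 0.9),
}
print(best_segmentation(4, lambda s, e: toy.get((s, e))))  # reads "me", probability about 0.72
```

Deferring the cut decision this way means a bad early split ("rn") can still lose to a better grouping ("m"), which is exactly where an engine that commits to a single segmentation up front tends to fail.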

Recognition of printed or separately written characters, on the other hand,
is a relatively trivial task, and an engine should easily achieve
near-perfect results with almost anyone's printing, assuming the characters
are written reasonably neatly and kept separate. Most errors in this case
will be errors of spacing/formatting.

(I worked for several years in a small research group at a large university,
developing a connected character recognition engine for the USPS. We ended
up achieving a 99.5% recognition rate with a 2% rejection rate on 100 DPI
images; rejected items were either too "dirty" for any chance of accurate
recognition or returned too low a recognition probability score. The
original algorithm was developed for 300 DPI images and was trained mainly
on samples provided by the USPS. Address blocks on packages/letters are
scanned at all major letter processing centers in the country;
separated-character and machine-print recognition has been in operation
there for over a decade. Additional caveats: the engine focused only on
English text and relied on a limited lexicon predetermined by the USPS DPF
database of addresses/delivery routes in the US.)
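The rejection rate mentioned above comes from refusing to answer when the recognition probability score is too low. A minimal sketch of that trade-off, assuming a single hypothetical score threshold (the post does not give the actual rule or values): raising the threshold rejects more items but makes the accepted ones more accurate. The labelled samples are invented toy data, not USPS results.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (label, recognition probability score)


def decide(candidates: List[Candidate], threshold: float) -> Optional[str]:
    """Return the top-scoring label, or None (reject) if it scores below threshold."""
    if not candidates:
        return None
    label, score = max(candidates, key=lambda c: c[1])
    return label if score >= threshold else None


def rates(samples: List[Tuple[str, List[Candidate]]], threshold: float) -> Tuple[float, float]:
    """Return (accuracy on accepted items, rejection rate) over labelled samples."""
    accepted = correct = 0
    for truth, candidates in samples:
        answer = decide(candidates, threshold)
        if answer is None:
            continue  # rejected: counts toward the rejection rate only
        accepted += 1
        if answer == truth:
            correct += 1
    accuracy = correct / accepted if accepted else 0.0
    rejection = 1 - accepted / len(samples)
    return accuracy, rejection


# Toy labelled samples: (true value, recognizer candidates with scores).
samples = [
    ("60615", [("60615", 0.95), ("60616", 0.03)]),
    ("10027", [("10021", 0.55), ("10027", 0.40)]),
    ("94305", [("94305", 0.70)]),
    ("02139", [("02138", 0.65), ("02139", 0.30)]),
]
for t in (0.5, 0.6, 0.7):
    print(t, rates(samples, t))
```

Reporting a recognition rate together with a rejection rate, as the post does, only makes sense as a pair: either number alone can be pushed arbitrarily high by moving the threshold.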

In short, the out-of-the-box performance of ParaGraph's Calligrapher engine
is no small achievement, given the widely varying input the engine must
handle, and I find the results it provides more than adequate. Printed text
and foreign scripts (Cyrillic, Japanese, Chinese, Korean, etc.) should see
near-perfect recognition rates out of the box, except for the sloppiest
writing. (In other words, I entirely reject the claim that users must learn
a special alphabet for a recognition engine to function
adequately/acceptably.)

(Don't even get me started on voice recognition, which is even more trivial,
since it reduces the problem to an essentially one-dimensional one...)

JTM

***************************************
Need Subscribe/Unsubscribe info?

Visit the NewtonTalk section at http://www.planetnewton.com



This archive was generated by hypermail 2b29 : Sat Jul 01 2000 - 00:00:05 CDT