9. Lexicon Driven (LDR)
Find the best way of accounting for the characters ‘w’, ‘o’, ‘r’, ‘d’ by consuming all segments 1 to 8.
Distance between the first character ‘w’ of lexicon entry ‘word’ and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
[Figure: match lattice over segments 1–9, annotating candidate segment spans with character-distance pairs such as w[5.0], o[6.6], r[3.8], d[4.4]]
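A minimal sketch of this lexicon-driven matching: find the segmentation of the image that consumes all segments while minimizing the total character-to-span distance, by dynamic programming. The distance table below reuses a few illustrative values from the slide; the span limit and helper names are assumptions, not the actual recognizer.

```python
# Sketch of lexicon-driven matching (assumed formulation): each character of
# the lexicon entry must account for a contiguous run of image segments, and
# we seek the segmentation that minimizes total character-image distance.
from functools import lru_cache

NUM_SEGMENTS = 8
WORD = "word"
MAX_SPAN = 4  # assumption: a character spans at most 4 segments

# Hypothetical distance table: (char, first_segment, last_segment) -> distance
DIST = {
    ("w", 1, 2): 7.6, ("w", 1, 3): 7.2, ("w", 1, 4): 5.0,
    ("o", 3, 4): 7.6, ("o", 4, 5): 6.6, ("o", 5, 6): 6.0,
    ("r", 5, 6): 6.3, ("r", 6, 7): 6.4, ("r", 5, 7): 3.8,
    ("d", 7, 8): 4.9, ("d", 8, 8): 4.4, ("d", 6, 8): 6.5,
}

def best_match():
    """Minimum-cost assignment of WORD's characters to segment spans."""
    @lru_cache(maxsize=None)
    def solve(char_idx, next_seg):
        if char_idx == len(WORD):
            # valid only if all segments have been consumed
            return 0.0 if next_seg == NUM_SEGMENTS + 1 else float("inf")
        best = float("inf")
        for last in range(next_seg, min(next_seg + MAX_SPAN, NUM_SEGMENTS) + 1):
            d = DIST.get((WORD[char_idx], next_seg, last))
            if d is not None:
                best = min(best, d + solve(char_idx + 1, last + 1))
        return best
    return solve(0, 1)
```

With these example distances the best complete path pairs w with segments 1–2, o with 3–4, r with 5–7 and d with 8.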
10. Grapheme Models (LFR)
Writer-specific modeling with holistic features.
[Table: grapheme | position | orientation angle — e.g. down cusp, 3.0, −90°; up loop; down arc]
12. Interactive Models (LDR), Phrase Level
Holistic features: t-crossings, loops, ascenders, descenders, length.
- Lexicon 1: West Central Street, West Main Street, Sunset Avenue
- Lexicon 2: West Central Street, East Central Street, Sunset Avenue
- Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue
[Figure: interactive model with 2-way interaction between features and image]
17. Fusion of Recognizers (Type III)
Identification task: rank lexicon entries (Amherst, Buffalo, …).
  LDR scores: 5.6, 7.4, …
  LFR scores: .52, .81, …
Verification task: accept or reject a single entry, e.g. Amherst with scores 5.6 and .52.
Question: if we find the optimal … and …, is it necessarily …?
21. Independence of Scores (in a single trial)
[Table: scores per lexicon entry — LDR: Amherst 5.6, Buffalo 7.4, …; LFR: Amherst .52, Buffalo .81, …]
22. Independence of Scores (in a single trial)
[Figure: score matrix over Lexicon 1 … Lexicon i … Lexicon N versus Recognizer 1 … Recognizer M; scores from different recognizers may be independent, but scores assigned to different lexicon entries are dependent]
Tulyakov & Govindaraju, TIFS 2009
23. Optimal Combination? Correlated Scores
Scores are dependent on the input signal and on lexicon set size.
Accuracy: LFR 54.8%, LDR 77.2%; both correct 48.9%, either correct 83.0%; LR combination 69.8%, weighted sum 81.6%.
(Set sizes: 6147, 3366, 4744, 3005, 5105, 4293, 5015.)
[Table: correlation of the top score with lower-ranked scores]
        2nd choice   3rd choice   4th choice   Mean
  LFR   .4359        .4755        .4771        .1145
  LDR   .7885        .7825        .7673        .5685
24. Optimal Trainable Combination Function
Minimizing the misclassification cost: classify as … rather than … .
Assume that the scores assigned to different classes are independent: …
Tulyakov & Govindaraju, IJPRAI 2009
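Under the independence assumption, one natural trainable combination is the product of per-recognizer likelihood ratios p(s | genuine) / p(s | impostor). A minimal sketch, with hypothetical Gaussian score densities standing in for the trained models (the numbers are illustrative, not the actual LDR/LFR distributions):

```python
# Sketch of likelihood-ratio combination under the independence assumption:
# each recognizer's score contributes the ratio p(s|genuine)/p(s|impostor),
# and the combined score is the product of these ratios.
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical per-recognizer score models: (genuine mean/std, impostor mean/std)
SCORE_MODELS = [
    ((3.0, 1.5), (7.0, 2.0)),   # LDR-style distances: lower is better
    ((0.8, 0.15), (0.4, 0.2)),  # LFR-style confidences: higher is better
]

def combined_lr(scores):
    """Product of per-recognizer likelihood ratios (independence assumed)."""
    lr = 1.0
    for s, ((gm, gs), (im, istd)) in zip(scores, SCORE_MODELS):
        lr *= gaussian_pdf(s, gm, gs) / gaussian_pdf(s, im, istd)
    return lr

# Rank lexicon entries by combined likelihood ratio, highest first
candidates = {"Amherst": (5.6, 0.52), "Buffalo": (7.4, 0.81)}
ranked = sorted(candidates, key=lambda w: combined_lr(candidates[w]), reverse=True)
```

Note that fusion can reorder the candidates relative to either recognizer alone, which is exactly why the choice of combination function matters.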
25. Combination Methods for Identification Tasks
No! Traditional training mixes the genuine and impostor scores from different trials.
[Figure: scatter plots of recognizer score 1 vs recognizer score 2, with genuine and impostor points pooled across trials]
26. Combination Methods for Identification Tasks
Model training MUST process the scores from one identification trial as a single training sample.
[Figure: scatter plots of recognizer score 1 vs recognizer score 2, with genuine and impostor points grouped per trial]
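The per-trial training requirement can be sketched as follows (the trial records, scores, and helper names are hypothetical): each training sample keeps the genuine score pair together with the impostor pairs from the same trial, rather than pooling scores across trials.

```python
# Sketch (hypothetical data): for identification tasks, the combination
# function is trained on whole trials, not on pooled genuine/impostor scores.
# One trial holds the recognizer scores for every lexicon entry plus the truth.
trials = [
    {"scores": {"Amherst": (5.6, 0.52), "Buffalo": (7.4, 0.81)}, "truth": "Amherst"},
    {"scores": {"Amherst": (8.1, 0.33), "Buffalo": (4.2, 0.77)}, "truth": "Buffalo"},
]

def trial_training_samples(trials):
    """Yield one sample per trial: the genuine score pair together with all
    impostor pairs from the SAME trial (never mixed across trials)."""
    for t in trials:
        genuine = t["scores"][t["truth"]]
        impostors = [s for w, s in t["scores"].items() if w != t["truth"]]
        yield genuine, impostors
```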
32. Topic Categorization for Lexicon Reduction
[Figure: pipeline — handwritten medical documents → ICR features → lexicon-free pass → topic categorization → select reduced lexicon (~2.5K out of a large lexicon of >5K) → lexicon-driven recognition; ~33% word recognition rate, a 10-point gain]
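A minimal sketch of the reduction step, assuming hypothetical topic lexicons and a set of confidently recognized words from the lexicon-free pass: assign the document to the topic whose lexicon overlaps most with those words, then hand that topic's reduced lexicon to the lexicon-driven recognizer.

```python
# Sketch of lexicon reduction by topic categorization (hypothetical lexicons):
# a lexicon-free pass yields a few high-confidence words, the document is
# assigned to the best-matching topic, and the lexicon-driven recognizer
# then runs with that topic's reduced lexicon.
TOPIC_LEXICONS = {
    "cardiac": {"chest", "pain", "pulse", "cardiac", "arrest"},
    "trauma": {"fracture", "bleeding", "laceration", "splint"},
}

def reduced_lexicon(confident_words):
    """Pick the topic whose lexicon overlaps most with the words the
    lexicon-free recognizer read with high confidence."""
    best = max(TOPIC_LEXICONS,
               key=lambda t: len(TOPIC_LEXICONS[t] & confident_words))
    return TOPIC_LEXICONS[best]
```

In a real system the overlap count would be replaced by the category-vector similarity described in the notes below, but the control flow is the same.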
½ min Good Afternoon: I am Venu Govindaraju, Professor at the University at Buffalo. The title of my talk today is “Paradigms in Handwriting Recognition”. This will be in the context of the English language and the Roman alphabet. The idea is to see whether some of the techniques that have proved successful for English are also applicable to Arabic or Chinese. This will be an overview-style presentation, describing paradigms, applications, and accuracy figures.
In the postal application, we are able to operate with a lexicon of size 30 (on average). When we do not have such collateral information, how does one reduce the lexicon size?
1 min The problem of handwriting recognition has typically been defined as follows:
- The inputs are a bit-map image of the word to be recognized and a lexicon of possible choices. The lexicon usually captures the context of the application at hand. When the lexicon is not provided by the application, it assumes the size of the entire English dictionary, or at least the words in common usage; in such cases, the lexicon can run to tens of thousands of words.
- The output is a ranked list of the lexical choices, often associated with confidence scores.
In this talk, we will make two assumptions. First, we are dealing with single words or short phrases of a few words. There has been a considerable body of work on recognition of entire sentences; an early paper on the topic was published by Kim, Govindaraju and Srihari in IJDAR 1997, and since then several papers have appeared, most notably from Prof. Suen’s group at Concordia and Prof. Bunke’s group in Switzerland. Second, we are dealing with offline handwriting recognition.
We are looking at the narrative text in medical forms, using medical dictionaries; it can be seen that the techniques scale to other applications as well. We want to develop a search engine for such medical forms, where a health official could search the forms by querying with medical terms. We demonstrated the method of keyword spotting at the demo session yesterday. We will now describe an alternative method that attempts full transcription, which is expected to be errorful, and see whether search engines are still viable. The handwriting is sloppy, written in ambulances and other emergency scenarios, and abbreviations are freely used. The documents are carbon copies, so binarization itself is a challenge; we presented this work at DAS 06. Lexicon-free recognition can pick up only a few characters in each word with reasonable confidence. Lexicon-driven recognition needs lexicons greater than 5K, for which the accuracy is in the 20s. What should we do?
One problem with cohesive phrases alone is that during the recognition phase we do not know the words. Therefore, we extract terms from these cohesive phrases to model the category with which they are associated. This is the basis for the hypothesis. For example [read slide]
The pseudo-category vector is then attached to the matrix of category column vectors.
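A minimal sketch of this step, with illustrative weights (the category names and numbers are made up): the pseudo-category vector becomes one more column of the term-by-category matrix, and a document vector can then be compared against every column by cosine similarity.

```python
# Sketch (hypothetical weights): attach the pseudo-category vector as an
# extra column of the term-by-category matrix, then pick the closest
# category column by cosine similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Term-by-category matrix stored column-wise
categories = {
    "cat_a": [0.9, 0.1, 0.0],
    "cat_b": [0.0, 0.8, 0.2],
}

# Pseudo-category vector built from the terms extracted from cohesive
# phrases, attached as one more column of the matrix
categories["pseudo"] = [0.5, 0.5, 0.1]

def closest_category(doc_vector):
    """Category column with the highest cosine similarity to the document."""
    return max(categories, key=lambda c: cosine(categories[c], doc_vector))
```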
Some more detail concerning the impact of ruled-line removal on word recognition. We extracted all the test word images from lined pages and measured top-choice recognition performance:
-- Total word images in the test set: 848, from 274 pages.
-- Word images from pages with ruled lines: 460, from 146 lined pages.
-- Ratio with ruled lines in the 34-PAW data set: 460/848 = 54.25% (words), 146/274 = 53.28% (pages).
-- Top-1 performance on words from lined pages: earlier 318/460 = 69.13%; now 349/460 = 75.87%.
Ruled-line removal thus improves top-1 word recognition by 6.74 points (evaluated on words from lined pages). The overall top-1 improvement is 4.13 points (evaluated on the full test set of lined and non-lined pages, as reported earlier). The PAW recognizer is a straightforward k-nearest-neighbor classifier over CUBS Gradient, Structure and Concavity features; it is a very simple implementation whose purpose was to test the effectiveness of our features, and it can be improved.
Digital libraries like the George Washington Papers collection at the Library of Congress consist of approximately 152,000 handwritten document images and associated transcripts, and the Newton Project aims to make all of Newton's writings available online. The task of aligning the transcription with the handwritten text in these libraries would enable one to automatically generate an immense database of word images, which in turn can be used as truth data by word recognizers to create transcriptions for the remaining scanned documents. The tedious process of manually dragging a box around each word in an image and keying in the annotations could thus be avoided. In forensic document evaluation, capturing characteristics specific to a writer is of paramount importance in both writer identification and writer verification. Thus, if a mapping algorithm correctly maps word images to lexicon words during preprocessing, the accuracy of writer recognition would improve remarkably. For existing scanned images, the alignment enables one to build interfaces where the transcript text can be browsed alongside the manuscript.
Existing keyword spotting approaches can be classified into two categories: (a) image based and (b) OCR based. (a) In image-feature-based indexing approaches, after preprocessing of the document images and word segmentation, feature vectors are extracted from the word images and stored in a database. When a user provides a query word, the similarity between the query and each word image in the database is computed, and word images are returned in decreasing order of similarity. (b) In OCR-based approaches, the indices are built from OCR scores such as posterior probabilities or feature-vector observation likelihoods (probability densities) converted from the distances returned by the word recognizer.
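The image-based approach can be sketched as follows (the feature vectors and similarity measure are illustrative; real systems use much richer image features):

```python
# Sketch of image-feature-based keyword spotting (hypothetical features):
# word images are indexed by feature vectors; a query vector is matched
# against the index and images are returned in decreasing similarity order.
import math

def similarity(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Index: word-image id -> feature vector extracted after preprocessing
index = {
    "img_001": [0.2, 0.9, 0.1],
    "img_002": [0.8, 0.1, 0.3],
    "img_003": [0.25, 0.85, 0.05],
}

def spot(query_features, top_k=3):
    """Return word-image ids ranked by decreasing similarity to the query."""
    ranked = sorted(index,
                    key=lambda k: similarity(query_features, index[k]),
                    reverse=True)
    return ranked[:top_k]
```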