International Association of Scientific Innovation and Research (IASIR) 
(An Association Unifying the Sciences, Engineering, and Applied Research) 
International Journal of Emerging Technologies in Computational 
and Applied Sciences (IJETCAS) 
www.iasir.net 
IJETCAS 14- 619; © 2014, IJETCAS All Rights Reserved Page 280 
ISSN (Print): 2279-0047 
ISSN (Online): 2279-0055 
A Script Recognizer Independent Bi-lingual Character Recognition System for Printed English and Kannada Documents 
N. Shobha Rani, Deepika B.D., Pavan Kumar S.
Department of Computer Science 
Amrita Vishwa Vidyapeetham, Mysore Campus 
Bogadi, Mysore 
INDIA 
_____________________________________________________________________________________ 
Abstract: Recognizing the text in document images is the goal of any optical character recognition (OCR) system. This paper extends an OCR system to recognize more than one language. At present, OCR technologies typically handle only one language at a time, and multilingual capability is achieved by incorporating a script recognizer. This paper eliminates the need to identify the script type and achieves automatic recognition of two different scripts with a single OCR system, which we refer to as bilingual OCR. The bilingual OCR recognizes text document images composed of both English and Kannada scripts. It is constructed using multiple projection profiles, connected component analysis and principal component analysis. The devised system achieves a recognition accuracy of around 95%-100% in our experiments.
Keywords: Bilingual document, script recognizer, principal component analysis (PCA), Optical Character Recognition (OCR), bilingual OCR. 
______________________________________________________________________________________ 
I. Introduction 
OCR is software that recognizes characters by exploiting their structural and visual characteristics, which depend on the script, and represents them in a readable character format. The development of efficient OCR has been an interesting and challenging research area in pattern recognition and image processing since the 1950s. During the 1960s and 1970s numerous OCR systems were developed and sprang up in retail businesses, banks, hospitals, post offices, insurance, railway and aircraft companies, newspaper publishers and many other industries to meet the needs of users of different regional languages. In a multilingual country like India, documents frequently contain printed English characters alongside the regional languages of the various states.
In a multilingual country like India, documents containing two different scripts are common and widely used. Application requirements include the creation of language learning books in digital libraries, the processing of invoices, applications, forms and bank cheques, and the sorting of mail and magazines for government and private organizations. These factors imply an increasing demand for the recognition of bilingual documents through an efficient bilingual OCR.
Document processing plays a very significant role in the country. Since eighteen official languages are in use, every government office uses at least two languages: English and the official language of the corresponding state.
The official language of the state of Karnataka is Kannada, and most documents in the government offices of Karnataka use both English and Kannada. The proposed system can interpret the Kannada and English words in question papers, newspapers, magazines, books, application forms, railway reservation forms and the documents of national organizations such as banks. It considerably reduces the effort and time needed to process document images by using a single bilingual OCR instead of two different OCRs.
II. Literature Survey 
Research on unilingual optical character recognition systems is extensive and largely successful. Some of the work on script recognition and optical character recognition systems is reviewed below.
Sanghamitra Mohanty et al. [1] proposed an approach for processing printed documents containing both English and Oriya text. The method works by considering the paragraph-wise or line-wise features of the text. Sanghamitra Mohanty et al. [2] proposed an approach for distinguishing scripts in a bilingual OCR for Oriya and Roman by employing horizontal projection profile features. Even though the method can efficiently distinguish the two types of scripts, it still requires a separate OCR for each language to process the data. Rahiman M.A. et al. [3] presented a bilingual OCR system for printed Malayalam and English text using bitmaps obtained by segmenting the document line-wise and character-wise. The comparison is done using a pixel-match algorithm, and the matched character is displayed in Notepad. An efficiency of 87.25% is obtained using this approach, but several performance issues remain to be investigated. D. Dhanya et al. [4] devised optimal feature extraction techniques for distinguishing Tamil and Roman by incorporating structural and geometrical features, DCT-based features, wavelet-transform-based features, etc. S. Basava Raju Patil [5] implemented a neural-network-based bilingual OCR system that can read printed document images written in the two scripts of the English and Kannada languages. A dynamic feature extractor extracts an equal number of distinctive features from each separated word irrespective of the size of the word. These features are accepted by a probabilistic neural classifier and sorted by script, Kannada or Roman.
N. Shobha Rani et al., International Journal of Emerging Technologies in Computational and Applied Sciences, 9(3), June-August, 2014,
pp. 280-285
All the approaches discussed above contribute towards script recognition. The neural-network-based experimentation in [5] proves effective in processing bilingual documents with a bilingual OCR, but the approach still relies on script recognition. The proposed system focuses on an efficient feature extraction technique that can distinguish every component of the image and processes bilingual documents through a single bilingual OCR.
III. Proposed Methodology 
The proposed bilingual OCR system for English and Kannada is composed of six phases: pre-processing, segmentation, feature extraction, classification, recognition and post-processing. Any OCR system begins by reading a scanned document image as input and produces output in an editable document format. The architecture of the bilingual OCR is depicted in Figure 1.
Figure 1: Architecture of Bi-Lingual OCR 
In general, bilingual OCR systems incorporate a script recognizer to distinguish the types of scripts, and script-wise processing is then performed with a separate OCR for each script. In the proposed system the computational complexity involved in script identification is eliminated, and effective feature extraction, classification and post-processing are performed by employing principal component analysis and template matching. Document processing starts with pre-processing, since the performance of any recognition system depends on careful pre-processing and segmentation. The bilingual OCR system accepts a scanned image as input in any supported image format, such as JPEG, BMP or PIX.
A. Pre-processing 
Data pre-processing describes any type of processing performed on raw data to prepare it for another processing procedure. Pre-processing is therefore the preliminary step that transforms the data into a format that can be processed more easily and effectively. It involves representation, noise reduction, binarization, skew estimation/detection, zoning and character segmentation. The main task in pre-processing the captured data is to decrease the variation that reduces the recognition rate and increases complexity.
Pre-processing of the raw input character strokes is crucial for the success of a character recognition system. It is an essential stage prior to feature extraction, since it controls the suitability of the results for the successive stages. The stages of a pattern recognition system form a pipeline, meaning that each stage depends on the success of the previous stage in order to produce optimal and valid results.
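The binarization and noise-reduction steps named in this section could be sketched as follows. This is a minimal illustration in Python, not the MATLAB implementation used in the paper; the fixed threshold of 128, the isolated-pixel filter and all function names are assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold is an assumption; the paper does not state one.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def remove_isolated_pixels(binary):
    """Clear ink pixels that have no ink among their 8 neighbours (salt noise)."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c]:
                continue
            neighbours = sum(
                binary[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ) - 1  # the window sum includes the pixel itself
            if neighbours == 0:
                cleaned[r][c] = 0
    return cleaned


# Tiny synthetic image: a small stroke plus one isolated dark pixel.
gray = [
    [250, 250, 250, 250, 250],
    [250,  20,  30, 250, 250],
    [250,  25, 250, 250, 250],
    [250, 250, 250, 250,  10],
]
binary = binarize(gray)
clean = remove_isolated_pixels(binary)
```

A production system would more likely pick the threshold adaptively and estimate skew before segmentation; those steps are omitted here.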
B. Segmentation 
Segmentation is one of the most important and crucial phases of any optical character recognition system. For a South Indian language like Kannada in particular, segmentation is non-trivial. Kannada complicates segmentation because of its typical structure: its consonant and vowel modifier groups give rise to a very wide collection of compound or connected characters. The proposed methodology uses a hybrid approach that incorporates traditional segmentation constructs such as projection profiles, XY-cut analysis and connected component analysis. Segmentation in the proposed system comprises two stages. In the first stage, mathematical morphology is used to construct bridges between components. The morphological operations avoid the intrusions that would otherwise arise during the recognition of a character. The second stage is the core of the segmentation process.
The hybrid approach can handle isolated text, connected components, overlapping lines/characters, broken characters and touching characters. Initially, the pre-processed image from the first stage of segmentation is subjected to line segmentation using line-wise connected component analysis. The result of segmenting an input image is shown in figure 2a and figure 2b.
Figure 2a: Binarized Image Figure 2b: Line Images Extracted 
Figure 2a depicts the input image after binarization, and figure 2b shows the lines extracted from it. The segmentation algorithm automatically extracts all the line segments and stores each line as a separate image. The extracted line images are normalized to a fixed size using interpolation techniques. The vertical projection profile of each line image is then analyzed to perform word segmentation. The proposed system is also able to deal with touching characters to some extent. The document considered for touching character segmentation is shown in figure 3a, figure 3b and figure 3c.
Figure 3a: Original Image with few touching components 
Figure 3b: A Binarized Image corresponding to Figure 3a 
Figure 3c: Lines Extracted from figure 3b 
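Word segmentation from the vertical projection profile of a line image can be illustrated with a short sketch. The code below is an assumption-laden example in Python rather than the authors' MATLAB code: the function names and the `min_gap` parameter (the number of empty columns treated as a word gap) are hypothetical and not taken from the paper.

```python
def vertical_projection(line_image):
    """Count ink pixels (value 1) in every column of a binary line image."""
    return [sum(column) for column in zip(*line_image)]


def split_words(line_image, min_gap=3):
    """Return (start, end) column ranges of words; end is exclusive.

    A run of at least `min_gap` empty columns is treated as a word gap;
    shorter runs are treated as spacing between characters of one word.
    """
    profile = vertical_projection(line_image)
    words, start, gap = [], None, 0
    for col, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = col          # first ink column of a new word
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:       # gap is wide enough to close the word
                words.append((start, col - gap + 1))
                start, gap = None, 0
    if start is not None:            # word that runs to the end of the line
        words.append((start, len(profile) - gap))
    return words


# Two-row synthetic line image with two words separated by four empty columns.
line = [[int(ch) for ch in row] for row in ("110110000110", "100100000100")]
profile = vertical_projection(line)
words = split_words(line)
```

The same idea applied to row sums (a horizontal projection profile) gives a simple line splitter, although the paper reports using connected component analysis for line extraction.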
C. Character Segmentation 
Character segmentation is concerned with extracting individual character components from the segmented word images. Each extracted word is divided into two zones, an upper zone and a bottom zone, as depicted in figure 4a.
Figure 4a: The upper zone and bottom zone of a segmented word block 
In the proposed methodology the segmented words are first subjected to connected component analysis [10]. Bounding boxes are used to enclose the connected characters with respect to the assumed widths of the various characters, which are defined in a knowledge base. Bounding boxes are assigned separately for the upper zone and the bottom zone. First, the characters in the upper zone are assigned bounding boxes. The maximum character width in the upper zone is 13, which is equal to the number of columns in the character segment considered. A bounding box wider than 13 columns is considered to contain touching characters, which are handled through the water reservoir principle [14].
The maximum character width in the bottom zone is fixed at 7, which is inferred from the knowledge base. Since the proposed system concentrates on printed characters, the character widths determined from the knowledge base tend to be accurate and reliable. The extracted printed English text blocks are segmented with the same procedure as the Kannada characters, but in more than 98% of cases the bottom zone of an English word block does not contain any connected components, since only printed text is considered here. The remaining segmented word blocks are processed iteratively in the same fashion. The results obtained through the segmentation algorithm are presented in figure 4b.
Figure 4b: Few Segmented character Images corresponding to line 1 and line 2 
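The bounding-box step for the upper zone can be sketched as below. This is an illustrative Python example, not the paper's implementation: only the width limit of 13 columns comes from the text, while the 8-connectivity, the breadth-first labelling and the function names are assumptions. The water reservoir splitting of touching characters [14] is not implemented here; the sketch only flags the components that would need it.

```python
from collections import deque

MAX_UPPER_WIDTH = 13  # maximum upper-zone character width stated in the text


def connected_components(binary):
    """Return inclusive bounding boxes (top, left, bottom, right) of
    8-connected ink regions, sorted left to right."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(max(0, y - 1), min(rows, y + 2)):
                    for nx in range(max(0, x - 1), min(cols, x + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])


def is_touching(box, max_width=MAX_UPPER_WIDTH):
    """Flag a component whose width in columns exceeds the expected width."""
    _, left, _, right = box
    return right - left + 1 > max_width


# Synthetic word image: one 4-column component and one 15-column component.
row = [1] * 4 + [0] * 2 + [1] * 15 + [0] * 3
word = [row[:], row[:]]
boxes = connected_components(word)
flags = [is_touching(box) for box in boxes]
```

For the bottom zone the same check would be called with `max_width=7`, the limit given in the text.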
D. Feature Extraction 
Feature extraction is an integral part of any recognition system. Its aim is to describe the pattern by means of a minimum number of features or attributes that are effective in discriminating among pattern classes. The accuracy of feature extraction depends on how the characters in the document are segmented. In the proposed system, principal component analysis (PCA) [15] features are extracted from each segmented character block to uniquely identify the characters in both English and Kannada. PCA is a linear transformation that chooses a new coordinate system for the data set such that the greatest variance by any projection of the data set comes to lie on the first axis (called the first principal component), the second greatest variance on the second axis, and so on. PCA can be used to reduce the dimensionality of a dataset while retaining those characteristics that contribute most to its variance, by eliminating the later principal components (a more or less heuristic decision). These characteristics may be the "most important", but this is not necessarily the case, depending on the application.
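The covariance between two variables x and y is the mean of the product $(x - \bar{x})(y - \bar{y})$, where $\bar{x}$ and $\bar{y}$ denote the means of x and y. For a mean-centered data matrix M, the covariance matrix can be written, up to a normalization factor, as $C = M M^{T}$. The covariance matrix C is then used to compute the eigenvectors.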
Assuming zero empirical mean (the empirical mean of the distribution has been subtracted from the data set), the principal components $w_i$ of a dataset x can be calculated by finding the eigenvalues and eigenvectors of the covariance matrix of x. The eigenvectors with the largest eigenvalues correspond to the dimensions that have the strongest correlation in the dataset. The original measurements are finally projected onto the reduced vector space. The variance features of PCA make it possible to handle both types of script and to identify the characters of the two alphabets uniquely. PCA is thus a good alternative for differentiating the two types of script in the proposed system.
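The PCA steps described above (mean-centering, covariance, leading eigenvector, projection) can be worked through on a toy data set. The sketch below is a hedged Python illustration, not the authors' code. It assumes samples are stored as rows, so the covariance is computed as M^T M / (n - 1) rather than in the M M^T layout used in the text, and it substitutes power iteration for a full eigendecomposition, so it recovers only the first principal component. It also assumes a non-zero covariance matrix and a start vector that is not orthogonal to the leading eigenvector. All function names are hypothetical.

```python
import math


def mean_center(samples):
    """Subtract the per-feature mean from every sample (rows are samples)."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[s[j] - means[j] for j in range(d)] for s in samples], means


def covariance(centered):
    """Sample covariance matrix (features x features) of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(s[i] * s[j] for s in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue


def project(centered, component):
    """Project each centered sample onto one principal axis."""
    return [sum(a * b for a, b in zip(s, component)) for s in centered]


# Toy data lying exactly on the direction (2, 1).
samples = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
centered, means = mean_center(samples)
cov = covariance(centered)
component, eigenvalue = first_component(cov)
scores = project(centered, component)
```

In a character recognizer the rows would be flattened, size-normalized character images, and several leading components would be kept as the feature vector.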
E. Classification and Post-Processing 
The classification stage is the decision-making stage of the recognition system. The extracted features are given as input to the classification process. In the proposed system the PCA features of 2346 character samples, covering both English and Kannada font styles and font sizes, are used for training. The covariance features extracted in the feature extraction process are compared with the trained set of features in the 2346 classes defined. Once a class matches the test character features, it is immediately post-processed. In post-processing, the Unicode of the matched character class is printed in the Microsoft word processor, as shown in figure 5e and figure 5f.
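Matching a test feature vector against trained templates and emitting the Unicode character of the best class might look like the following sketch. It is an assumption-based Python illustration: the three templates and their feature values are invented for the example, and Euclidean distance is assumed as the matching criterion, which the paper does not specify. Because each template carries its own code point, no separate script identification step is needed.

```python
import math

# Hypothetical trained templates: (feature vector, Unicode code point).
TEMPLATES = [
    ([0.9, 0.1, 0.3], 0x0041),  # LATIN CAPITAL LETTER A
    ([0.2, 0.8, 0.5], 0x0042),  # LATIN CAPITAL LETTER B
    ([0.4, 0.4, 0.9], 0x0C85),  # KANNADA LETTER A
]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, templates=TEMPLATES):
    """Return the character whose template is nearest to `features`."""
    _, code_point = min(templates, key=lambda t: distance(features, t[0]))
    return chr(code_point)


def recognize(feature_vectors):
    """Post-processing sketch: join the recognized characters into text."""
    return "".join(classify(f) for f in feature_vectors)


text = recognize([[0.85, 0.15, 0.25], [0.35, 0.45, 0.85]])
```

With 2346 classes, as reported in the text, a linear scan like this remains feasible, although a rejection threshold on the best distance would help to catch unseen fonts.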
IV. Experimental Results and Discussion 
A user-friendly graphical user interface has been designed using MATLAB and made accessible to all types of users in a simple and comprehensive way. We conducted three different experiments. The first tests only printed English document images. The second tests only printed Kannada document images. Finally, bilingual printed document images containing both English and Kannada are tested. The graphical user interface and the results of the experiments are shown in figures 5a to 5f.
Figure 5a: The GUI loaded with English document image
Figure 5b: The editable document output of fig. 5a
Figure 5c: The Kannada Document image loaded Figure 5d: The editable document output of fig. 5c 
Figure 5e: The Bilingual document input Figure 5f: The editable document output of fig. 5e 
The figures indicate that the results obtained are good and encouraging. Table 1 shows the recognition accuracy for independent input of printed English and Kannada documents and for mixed printed Kannada and English documents.
A data set of more than 2000 character sample images was collected and trained into our database using the template matching method with respect to principal component analysis features. A recognition accuracy of 95.25% - 97.05% was achieved for printed bilingual document images of English and Kannada, and the results were encouraging in most cases.
Table 1: Recognition accuracy by document type
Document Type          Recognition Accuracy
English only           100%
Kannada only           98%
English and Kannada    95-97%
V. Conclusion and Future Scope 
By employing the concepts of image processing and MATLAB, it is possible to design a system that identifies the different scripts used in a document containing more than one script. In general, when a bilingual document is to be processed, the OCR of each respective language has to be used. The proposed system effectively eliminates the need for a script recognizer and produces reliable results for certain font styles and sizes. The output of post-processing is displayed in the Microsoft word processor; testing the Unicode output of more than 2000 classes covering English and Kannada was both interesting and challenging.
Along with its advantages, the proposed system has some limitations. The first is that the complete system works only for the font sizes and font styles on which it has been trained; future work focuses on extending it to a font style and size independent bilingual OCR system. The second limitation, also to be addressed in future, is segmentation errors when there are many touching components: if a document contains only touching characters, the proposed system fails.
VI. References 
[1] Sanghamitra Mohanty, Himadri Nandini Das Bebartta, Tarun Kumar Behera, "An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents," Seventh International Conference on Advances in Pattern Recognition (ICAPR), pp. 398-401, 2009.
[2] Sanghamitra Mohanty, Himadri Nandini Das Bebartta, “A Novel Approach for Bilingual (English - Oriya) Script Identification and Recognition in a Printed Document” International Journal of Image Processing (IJIP), Volume (4): Issue 2. 
[3] Rahiman, M.A, Adheena, C.V., Anitha, R., Deepa, N., “Bilingual OCR system for printed documents in Malayalam and English”, Electronics Computer Technology (ICECT), 2011 3rd International Conference Volume 3. 
[4] D. Dhanya, A. G. Ramakrishnan, “An Optimal feature extractor for bilingual OCR”, department of Electrical engineering, Indian Institute of Science, Springer, pp 25-36, 2002. 
[5] S. Basava Raju Patil, “Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents”, International Journal of Computer Applications (0975 – 8887) Volume 13– No.8, January 2011. 
[6] U. Pal, B. B. Choudhuri, “Script line separation from Indian multi-Script documents,” Proc. of fifth Intl. Conf. on Document Analysis and Recognition (IEEE computer society press), pp. 406-409, 1999. 
[7] Santanu Choudhury, Gaurav Harit, Shekar Madnani, R.B. Shet, “Identification of Scripts of Indian Languages by Combining Trainable Classifiers” ICVGIP, Bangalore, India, Dec.20-22, 2000. 
[8] S. Chanda, U. Pal, “English, Devanagari and Urdu Text Identification,” Proc. Intl. Conf. on Document Analysis and Recognition, pp. 538-545.
[9] B.V. Dhandra, Mallikarjun Hangarge, Ravindra Hegadi and V.S. Malemath, “Word Level Script Identification in Bilingual Documents through Discriminating Features”, IEEE ICSCN 2007, Chennai, India, pp. 630-635, Feb. 2007.
[10] P. A. Vijaya, M. C. Padma, “Text line identification from a multilingual document,” Proc. of Intl.Conf. on digital image processing (ICDIP2009) Bangkok, pp. 302-305, March 2009. 
[11] B.V.Dhandra, H.Mallikarjun, Ravindra Hegadi V.S Malemath. “Word-wise Script Identification from Bilingual Documents Based on Morphological Reconstruction” PG department of study and research in computer science Gulbarg University, Gulbarg, INDIA. 
[12] Prakash K. Aithal, Rajesh G., Dinesh U. Acharya, Krishnamoorthi M. Subbareddy N. V. “Text Line Script Identification for Tri- lingual Document” Manipal Institute of Technology Manipal, Karnataka,INDIA. 
[13] M.C. Padma and P. A. Vijaya, “Identification and separation of Text words of Kannada, Telugu, Tamil, Hindi and English languages through visual discriminating features,” Proc. of Intl. conf. on Advances in Computer Vision and Information Technology (ACVIT-2007), Aurangabad, India, pp.
[14] U. Pal, A. Belaïd, C. Choisy, “Water Reservoir Based Approach for Touching Numeral Segmentation”, Indian Statistical Institute, 203 B. T. Road, Calcutta-35, India.
[15] Mirosław Miciak, “Character Recognition Using Radon Transformation and Principal Component Analysis in Postal Applications”, Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 495-500, ISBN 978-83-60810-14-9, ISSN 1896-7094.

Más contenido relacionado

La actualidad más candente

An Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptAn Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptIJERA Editor
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESijnlc
 
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...CSCJournals
 
Design and implementation of optical character recognition using template mat...
Design and implementation of optical character recognition using template mat...Design and implementation of optical character recognition using template mat...
Design and implementation of optical character recognition using template mat...eSAT Journals
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognitionVikas Dongre
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGEScscpconf
 
Comparative Analysis of PSO and GA in Geom-Statistical Character Features Sel...
Comparative Analysis of PSO and GA in Geom-Statistical Character Features Sel...Comparative Analysis of PSO and GA in Geom-Statistical Character Features Sel...
Comparative Analysis of PSO and GA in Geom-Statistical Character Features Sel...IJERA Editor
 
IRJET- Image to Text Conversion using Tesseract
IRJET-  	  Image to Text Conversion using TesseractIRJET-  	  Image to Text Conversion using Tesseract
IRJET- Image to Text Conversion using TesseractIRJET Journal
 
DISCRIMINATION OF ENGLISH TO OTHER INDIAN LANGUAGES (KANNADA AND HINDI) FOR O...
DISCRIMINATION OF ENGLISH TO OTHER INDIAN LANGUAGES (KANNADA AND HINDI) FOR O...DISCRIMINATION OF ENGLISH TO OTHER INDIAN LANGUAGES (KANNADA AND HINDI) FOR O...
DISCRIMINATION OF ENGLISH TO OTHER INDIAN LANGUAGES (KANNADA AND HINDI) FOR O...IJCSEA Journal
 
Character recognition of kannada text in scene images using neural
Character recognition of kannada text in scene images using neuralCharacter recognition of kannada text in scene images using neural
Character recognition of kannada text in scene images using neuralIAEME Publication
 
Optical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalOptical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalBiniam Asnake
 
Multitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionMultitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionDr. Syed Hassan Amin
 
Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) Systemiosrjce
 
OCR-THE 3 LAYERED APPROACH FOR DECISION MAKING STATE AND IDENTIFICATION OF TE...
OCR-THE 3 LAYERED APPROACH FOR DECISION MAKING STATE AND IDENTIFICATION OF TE...OCR-THE 3 LAYERED APPROACH FOR DECISION MAKING STATE AND IDENTIFICATION OF TE...
OCR-THE 3 LAYERED APPROACH FOR DECISION MAKING STATE AND IDENTIFICATION OF TE...ijaia
 
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKijaia
 
Optical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyOptical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyEr. Ashish Pandey
 
Signboard Text Translator: A Guide to Tourist
Signboard Text Translator: A Guide to TouristSignboard Text Translator: A Guide to Tourist
Signboard Text Translator: A Guide to TouristIJECEIAES
 

La actualidad más candente (19)

An Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptAn Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari Script
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
 
Hf3413291335
Hf3413291335Hf3413291335
Hf3413291335
 
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
 
Design and implementation of optical character recognition using template mat...
Design and implementation of optical character recognition using template mat...Design and implementation of optical character recognition using template mat...
Design and implementation of optical character recognition using template mat...
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognition
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
Ijetcas14 619

  • 1. International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)
International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS), www.iasir.net
IJETCAS 14-619; © 2014, IJETCAS All Rights Reserved. ISSN (Print): 2279-0047, ISSN (Online): 2279-0055

A Script Recognizer Independent Bi-lingual Character Recognition System for Printed English and Kannada Documents
N. Shobha Rani, Deepika B.D., Pavan Kumar S.
Department of Computer Science, Amrita Vishwa Vidyapeetham, Mysore Campus, Bogadi, Mysore, INDIA

Abstract: Recognizing text in document images is the goal of any optical character recognition (OCR) system. This paper aims at extending the functionality of an OCR system to recognize more than one language. At present, OCR technologies are able to recognize and translate only one language; multi-lingual recognition capabilities are accomplished by incorporating a script recognizer. This paper eliminates the need to identify the script type and achieves automatic recognition of two different scripts with a single OCR system, which we refer to as a bilingual OCR. The bilingual OCR recognizes text document images composed of both English and Kannada scripts. It is constructed by employing efficient constructs such as multiple projection profiles, connected component analysis and principal component analysis. The devised system proves effective and reliable, with a reported accuracy of around 95%-100%.

Keywords: Bilingual document, script recognizer, principal component analysis (PCA), Optical Character Recognition (OCR), bilingual OCR.
I. Introduction
OCR is software that recognizes characters by exploiting their structural and visual characteristics, on the basis of script, and represents them in a readable character format. The development of an efficient OCR has been an interesting and challenging research area in pattern recognition and image processing since 1950. During the 1960s and 1970s numerous OCR systems were developed and sprang up in retail businesses, banks, hospitals, post offices, insurance, railway and aircraft companies, newspaper publishers and many other industries to meet the needs of speakers of different regional languages.

In a multi-lingual country like India, documents that contain printed English alongside the regional language of a state are very common and widely used. Application requirements include creating language learning books for digital libraries, processing invoices, applications, forms and bank cheques, and sorting mail and magazines for government and private organizations. These factors imply an increasing demand for recognition of bilingual documents through an efficient bilingual OCR.

Document processing plays a very significant role in the country. Since eighteen official languages are in use, every government office uses at least two languages: English and the official language of the corresponding state. The official language of the state of Karnataka is Kannada. The proposed system can interpret the Kannada and English words in question papers, newspapers, magazines, books, application forms, railway reservation forms, and the documents of national organizations such as banks.
Most documents in the government offices of Karnataka use both English and Kannada. The proposed system considerably reduces the effort and time needed to process document images by using a single bilingual OCR instead of two separate OCRs.

II. Literature Survey
Research on uni-lingual optical character recognition systems is considerably wide and largely successful. Some of the work on script recognition and optical character recognition systems is reviewed below.

Sanghamitra Mohanty et al. [1] proposed an approach for processing printed documents containing both English and Oriya text. The method takes into consideration the paragraph-wise or line-wise features of the text. Sanghamitra Mohanty et al. [2] proposed an approach for distinguishing scripts in a bilingual OCR for Oriya and Roman using horizontal projection profile features. Even though the method can efficiently distinguish the two types of script, it still requires a different OCR for each language to process the data.

  • 2. N. Shobha Rani et al., International Journal of Emerging Technologies in Computational and Applied Sciences, 9(3), June-August, 2014, pp. 280-285

Rahiman M.A. et al. [3] presented a bilingual OCR system for printed Malayalam and English text using bitmaps obtained by segmenting the document line-wise and character-wise. The comparison is done using a pixel-match algorithm, and the matched character is displayed in the notepad. This approach achieves an efficiency of 87.25%, but many performance issues remain to be investigated. D. Dhanya et al. [4] devised optimal feature extraction techniques for distinguishing Tamil and Roman script, incorporating structural and geometrical features, DCT-based features, wavelet transform based features, etc. S. Basava Raju Patil [5] implemented a neural network based bilingual OCR system that can read printed document images written in two scripts, English and Kannada. A dynamic feature extractor extracts an equal number of distinctive features from each separated word, irrespective of the size of the word. These features are passed to a probabilistic neural classifier and sorted by script, Kannada or Roman.

All of the approaches discussed above contribute towards script recognition, and the neural network based experimentation in [5] proves effective in processing bilingual documents with a bilingual OCR, but the approach still relies on script recognition. The proposed system focuses on an efficient feature extraction technique that can distinguish each component of the image, and processes bilingual documents through a bilingual OCR.

III. Proposed Methodology
The proposed bilingual OCR system for English and Kannada is composed of six phases: pre-processing, segmentation, feature extraction, classification, recognition and post-processing. Any OCR system begins by reading a scanned document image as input and produces output in an editable document format.
The architecture of the bilingual OCR is depicted in Figure 1.

Figure 1: Architecture of Bi-Lingual OCR

In general, bilingual OCR systems incorporate a script recognizer to distinguish the types of script, and script-wise processing is then performed with a separate OCR for each script. In the proposed system the computational complexity involved in script identification is eliminated, and effective feature extraction, classification and post-processing are performed using techniques such as principal component analysis and template matching. Document processing begins with pre-processing, since the performance of any recognition system (OCR systems included) depends on detailed pre-processing and segmentation operations. The bilingual OCR system accepts a scanned image as input in any valid image format such as JPEG, BMP, PIX, etc.

A. Pre-processing
Data pre-processing describes any type of processing performed on raw data to prepare it for another processing procedure. It is the preliminary step that transforms the data into a format that can be processed more easily and effectively. Pre-processing involves representation, noise reduction, binarization, skew estimation/detection, zoning and character segmentation. The main task in pre-processing the captured data is to decrease the variation that reduces the recognition rate and increases complexity; for example, pre-processing of the raw input strokes of characters is crucial to the success of efficient character recognition systems. Pre-processing is thus an essential stage prior to feature extraction, since it controls the suitability of the results for the successive stages. The stages in a pattern recognition system form a pipeline, meaning that each stage depends on the success of the previous stage in order to produce optimal and valid results.
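Binarization is one of the pre-processing steps listed above, but the paper does not say which thresholding method it uses. The following is a minimal sketch only, assuming a grayscale image stored as rows of integers in the range 0-255 and assuming Otsu's global threshold, which is a common choice rather than the authors' confirmed method. The function names `otsu_threshold` and `binarize` are illustrative.

```python
def otsu_threshold(gray):
    """Return the gray level t that maximises the between-class variance.

    Assumption: `gray` is a list of rows of integers in 0..255.
    Pixels with value <= t form one class, the rest form the other.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split here
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(gray):
    """Map dark (ink) pixels to 1 and light (paper) pixels to 0."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]
```

The convention of ink as 1 and background as 0 is a choice made here so that later projection profiles can simply sum pixel values; a scanned page with light text on a dark background would need the comparison inverted.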
  • 3.
B. Segmentation
Segmentation is one of the most important and crucial phases of any optical character recognition system. For a South Indian language like Kannada in particular, segmentation is non-trivial. Kannada complicates segmentation because of its typical structure: its consonants and vowel modifiers give rise to a very wide collection of compound or connected characters. In the proposed methodology a hybrid approach is devised by incorporating traditional segmentation constructs such as projection profiles, XY-cut analysis and connected component analysis.

Segmentation in the proposed system comprises two stages. In the first stage, mathematical morphology is used to construct bridges between components. The morphological operations avoid the intrusions that arise during the recognition of a character. The second stage is the core of the segmentation process. The hybrid approach can handle isolated text, connected components, overlapping lines/characters, broken characters and touching characters. Initially, the pre-processed image from the first stage is subjected to line segmentation using line-wise connected component analysis. The result of segmenting the original image is shown in Figure 2a and Figure 2b.

Figure 2a: Binarized Image
Figure 2b: Line Images Extracted

Figure 2a depicts the original image in binarized form and Figure 2b shows the lines extracted from the binarized image. The segmentation algorithm automatically extracts all the line segments and stores each line as a separate image. The extracted line images are normalized to a fixed size using interpolation techniques.
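The paper extracts lines with line-wise connected component analysis and gives no further detail. As a simpler stand-in, the sketch below splits a binarized page into line images with a horizontal projection profile, one of the segmentation constructs the paper lists. It assumes a binary image stored as rows of 0/1 values with ink as 1, and the names `horizontal_profile`, `find_runs` and `segment_lines` are hypothetical.

```python
def horizontal_profile(binary):
    """Count the ink pixels in every row of a binary image (ink = 1)."""
    return [sum(row) for row in binary]


def find_runs(profile, min_value=1):
    """Return (start, end) pairs, end exclusive, where profile >= min_value."""
    runs, start = [], None
    for index, value in enumerate(profile):
        if value >= min_value and start is None:
            start = index                  # a text line begins
        elif value < min_value and start is not None:
            runs.append((start, index))    # a blank row closes the line
            start = None
    if start is not None:                  # text reaches the last row
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary):
    """Split a page into a list of line images, each a list of rows."""
    return [binary[top:bottom]
            for top, bottom in find_runs(horizontal_profile(binary))]
```

A plain profile like this separates lines only when at least one blank row lies between them; overlapping lines, which the paper says its hybrid approach handles, would need the connected component step.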
The vertical projection profile of each line image is then analyzed to perform word segmentation. The proposed system is also able to deal with touching characters to some extent. The document considered for touching character segmentation is shown in Figure 3a, Figure 3b and Figure 3c.

Figure 3a: Original Image with few touching components
Figure 3b: A Binarized Image corresponding to Figure 3a
Figure 3c: Lines Extracted from Figure 3b

C. Character Segmentation
Character segmentation is concerned with extracting individual character components from the segmented word images. Each extracted word is divided into two zones, an upper zone and a bottom zone, as depicted in Figure 4a.
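Returning to the word segmentation step, in which the vertical projection profile of each line image is analyzed, a minimal sketch might look as follows. It assumes a binary line image stored as rows of 0/1 values with ink as 1. The `min_gap` parameter, the number of blank columns that separates two words, is an assumption introduced here; the paper does not state how word gaps are told apart from character gaps.

```python
def vertical_profile(line):
    """Count the ink pixels in every column of a binary line image."""
    return [sum(column) for column in zip(*line)]


def segment_words(line, min_gap=3):
    """Return (start, end) column ranges, end exclusive, one per word.

    Two ink columns belong to different words when at least `min_gap`
    blank columns lie between them (hypothetical threshold).
    """
    words = []
    start = None      # first ink column of the current word
    last_ink = None   # most recent ink column seen
    for x, count in enumerate(vertical_profile(line)):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words
```

Lowering `min_gap` to 1 turns the same routine into a naive character splitter, which shows why a width rule or connected component analysis is still needed for touching characters.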
  • 4.
Figure 4a: The upper zone and bottom zone of a segmented word block

In the proposed methodology the segmented words are first subjected to connected component analysis [10]. Bounding boxes are used to enclose the connected characters with respect to the assumed widths of the various characters, which are defined in a knowledge base. Bounding boxes are assigned separately for the upper zone and the bottom zone. Characters in the upper zone are assigned bounding boxes first; the maximum character width in the upper zone is 13, which equals the number of columns in the character segment considered. A bounding box wider than 13 is considered to contain touching characters, which are handled using the water reservoir principle [14]. The maximum character width in the bottom zone is fixed at 7, as inferred from the knowledge base. Since the proposed system concentrates on printed characters, the character widths determined from the knowledge base tend to be accurate and reliable. The printed English text blocks are segmented using the same procedure as the Kannada characters, but in more than 98% of cases the bottom zone of an English word block does not contain any connected components, since only printed text is considered here. The remaining segmented word blocks are processed iteratively in a similar fashion. The results obtained with the segmentation algorithm are presented in Figure 4b.

Figure 4b: Few Segmented character Images corresponding to line 1 and line 2

D. Feature Extraction
Feature extraction is an integral part of any recognition system.
The aim of feature extraction is to describe the pattern by means of a minimum number of features or attributes that are effective in discriminating among pattern classes. The accuracy of feature extraction depends on how the characters in the document are segmented. In the proposed system, principal component analysis (PCA) [15] features are extracted from each segmented character block to uniquely identify the characters in both English and Kannada.

PCA is a linear transformation that chooses a new coordinate system for the data set such that the greatest variance by any projection of the data set comes to lie on the first axis (then called the first principal component), the second greatest variance on the second axis, and so on. PCA can be used to reduce the dimensionality of a data set while retaining those characteristics that contribute most to its variance, by eliminating the later principal components (a more or less heuristic decision). These characteristics may be the "most important", but this is not necessarily the case, depending on the application.

The covariance of two variables $x$ and $y$ is computed from the products of their deviations from the mean, $(x - \bar{x})(y - \bar{y})$; for a mean-centered data matrix $M$, the covariance matrix can be represented formally as $C = M M^{T}$. The covariance matrix $C$ is then used to compute the eigenvectors. Assuming zero empirical mean (the empirical mean of the distribution has been subtracted from the data set), the principal components $w_i$ of a data set $x$ can be calculated by finding the eigenvalues and eigenvectors of the covariance matrix of $x$: the eigenvectors with the largest eigenvalues correspond to the dimensions that have the strongest correlation in the data set. The original measurements are finally projected onto the reduced vector space. The variance features of PCA make it possible to handle both types of script and to identify the various alphabetic sets uniquely. Thus PCA is a good alternative for differentiating the two types of script in the proposed system.

E. Classification and Post-Processing

The classification stage is the decision-making stage of the recognition system. The extracted features are given as input to the classification process. In the proposed system, the PCA features of 2346 character samples, covering both English and Kannada font styles and font sizes, are used for training. The covariance features extracted in the feature extraction process are compared with the trained set of features in the 2346 classes defined. Once a class is matched with the test character features, the character is immediately post-processed. In post-processing, the Unicode of the matched character class is printed into Microsoft Word, as shown in figure 5e and figure 5f.

IV. Experimental Results and Discussion

A user-friendly graphical user interface has been designed using MATLAB and made accessible to all types of users in a simple and comprehensive way. We conducted three different experiments. The first tests only printed English document images. The second tests only printed Kannada document images. Finally, bilingual printed document images containing both English and Kannada are tested. The graphical user interface and the experimental results are shown in figures 5a to 5f.

Figure 5a: The GUI loaded with an English document
Figure 5b: The editable document output for the image in fig. 5a
Figure 5c: The Kannada document image loaded
Figure 5d: The editable document output for fig. 5c
Figure 5e: The bilingual document input
Figure 5f: The editable document output for fig. 5e

The figures indicate that the results obtained are quite good and encouraging.
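To make the feature extraction and classification stages (D and E above) more concrete, here is a minimal Python sketch of PCA features combined with nearest-template matching. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names `fit_pca`, `project` and `classify`, the number of retained components and the use of Euclidean distance for matching are all hypothetical choices. Samples are stored as rows here, so the covariance matrix is computed as the transposed centered data times itself, which corresponds to $M M^{T}$ when samples are stored as columns.

```python
import numpy as np

def fit_pca(samples, n_components):
    """Learn a PCA basis from flattened character images.

    samples: array of shape (n_samples, n_pixels), one character per row.
    Returns the mean vector and a (n_pixels, n_components) basis matrix.
    """
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean                         # zero empirical mean
    cov = centered.T @ centered / (len(X) - 1)  # covariance matrix
    # eigh handles symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    return mean, eigvecs[:, order]

def project(x, mean, components):
    """Project one sample (or a batch of rows) onto the reduced space."""
    return (np.asarray(x, dtype=float) - mean) @ components

def classify(x, mean, components, train_features, train_labels):
    """Return the label of the closest training template in PCA space.

    Euclidean distance is an assumption; the paper only states that the
    extracted features are compared with the trained features.
    """
    feats = project(x, mean, components)
    dists = np.linalg.norm(train_features - feats, axis=1)
    return train_labels[int(np.argmin(dists))]
```

If the labels are stored as Unicode characters (for example "A" or "\u0C95", the Kannada letter KA), the value returned by `classify` can be written directly to the output document, which mirrors the post-processing step described above.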
A data set of more than 2000 character sample images was collected and trained into our database using a template matching method based on principal component analysis features. A recognition accuracy of 95.25% to 97.05% was achieved for printed bilingual document images of English and Kannada, and the results appeared encouraging in most cases. Table 1 shows the recognition accuracy for independent input of printed English and Kannada documents and for mixed printed Kannada and English documents.

Table 1: Recognition accuracy

Document Type          Recognition Accuracy
English only           100%
Kannada only           98%
English and Kannada    95-97%

V. Conclusion and Future Scope

By employing the concepts of image processing in MATLAB, it is possible to design a system that can recognize the different scripts used in a document containing more than one script. In general, when a bilingual document is to be processed, the OCRs of the respective languages have to be used. The proposed system, however, effectively eliminates the need for a script recognizer and produces reliable results for certain font styles and sizes. The output of the post-processing stage of the proposed system is displayed in Microsoft Word; testing the Unicode output of all of the more than 2000 classes covering English and Kannada was quite interesting and challenging. Along with a reasonable set of advantages, the proposed system also has some limitations. The first limitation is that the complete system works only for the font sizes and font styles on which it has been trained; future work will focus on extending it to a font style and size independent bilingual OCR system.
The second limitation, to be addressed in future work, is that the segmentation process produces errors when there are many touching components; if a document contains only touching characters, the proposed system fails to work.

VI. References

[1] Sanghamitra Mohanty, Himadri Nandini Dasbebartta, Tarun Kumar Behera, "An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents," Seventh International Conference on Advances in Pattern Recognition (ICAPR), pp. 398-401, 2009.
[2] Sanghamitra Mohanty, Himadri Nandini Das Bebartta, "A Novel Approach for Bilingual (English - Oriya) Script Identification and Recognition in a Printed Document," International Journal of Image Processing (IJIP), Volume 4, Issue 2.
[3] Rahiman, M.A., Adheena, C.V., Anitha, R., Deepa, N., "Bilingual OCR system for printed documents in Malayalam and English," Electronics Computer Technology (ICECT), 2011 3rd International Conference, Volume 3.
[4] D. Dhanya, A. G. Ramakrishnan, "An Optimal feature extractor for bilingual OCR," Department of Electrical Engineering, Indian Institute of Science, Springer, pp. 25-36, 2002.
[5] S. Basava Raju Patil, "Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents," International Journal of Computer Applications (0975-8887), Volume 13, No. 8, January 2011.
[6] U. Pal, B. B. Choudhuri, "Script line separation from Indian multi-Script documents," Proc. of Fifth Intl. Conf. on Document Analysis and Recognition (IEEE Computer Society Press), pp. 406-409, 1999.
[7] Santanu Choudhury, Gaurav Harit, Shekar Madnani, R.B. Shet, "Identification of Scripts of Indian Languages by Combining Trainable Classifiers," ICVGIP, Bangalore, India, Dec. 20-22, 2000.
[8] S. Chanda, U. Pal, "English, Devanagari and Urdu Text Identification," Proc. Intl. Conf. on Document Analysis and Recognition, pp. 538-545.
[9] B.V. Dhandra, Mallikarjun Hangarge, Ravindra Hegadi and V.S. Malemath, "Word Level Script Identification in Bilingual Documents through Discriminating Features," IEEE ICSCN 2007, Chennai, India, pp. 630-635, Feb. 2007.
[10] P. A. Vijaya, M. C. Padma, "Text line identification from a multilingual document," Proc. of Intl. Conf. on Digital Image Processing (ICDIP 2009), Bangkok, pp. 302-305, March 2009.
[11] B.V. Dhandra, H. Mallikarjun, Ravindra Hegadi, V.S. Malemath, "Word-wise Script Identification from Bilingual Documents Based on Morphological Reconstruction," PG Department of Study and Research in Computer Science, Gulbarga University, Gulbarga, India.
[12] Prakash K. Aithal, Rajesh G., Dinesh U. Acharya, Krishnamoorthi M., Subbareddy N. V., "Text Line Script Identification for Tri-lingual Document," Manipal Institute of Technology, Manipal, Karnataka, India.
[13] M.C. Padma and P. A. Vijaya, "Identification and separation of Text words of Kannada, Telugu, Tamil, Hindi and English languages through visual discriminating features," Proc. of Intl. Conf. on Advances in Computer Vision and Information Technology (ACVIT-2007), Aurangabad, India, pp.
[14] U. Pal, A. Belaïd, C. Choisy, "Water Reservoir Based Approach for Touching Numeral Segmentation," Indian Statistical Institute, 203 B. T. Road, Calcutta-35, India.
[15] Mirosław Miciak, "Character Recognition Using Radon Transformation and Principal Component Analysis in Postal Applications," Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 495-500, ISBN 978-83-60810-14-9, ISSN 1896-7094.