ISSN: 2277 – 9043
International Journal of Advanced Research in Computer Science and Electronics Engineering
Volume 1, Issue 6, August 2012
Handwritten Script Recognition using Soft Computing
Akhilesh Pandey, Sunita Singh, Rajiv Kumar, Amod Tiwari
Abstract - Handwritten script recognition is a challenging problem in computer science. It is important to know which script a document is written in: script recognition has many important applications, such as automatic transcription of multilingual documents, searching document images, and script sorting. The proposed work focuses on a “block level technique”, in which the script of a given document is recognized within a mixture of documents in various scripts. Computational fields such as artificial intelligence and expert systems play an important role here. Feature extraction is an important step in script recognition. In this project, we use a combined approach of the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) for feature extraction, and a neural network classifier (feed-forward back-propagation) for classification and recognition. Since the human mind can easily recognize handwritten script, we use artificial intelligence in the form of a neural network classifier. The proposed system has been tested on three handwritten scripts: Hindi, English and Urdu. Our database contains 961 handwritten samples written in these three scripts. Every script (Hindi, English and Urdu) contains 320 samples (160 written in a small font and another 160 in a large font).

Keywords: Multi-script documents, handwritten script, Discrete Cosine Transform, Wavelets, neural network classifier.

I. INTRODUCTION

Today, much research has been done on multi-script recognition, but exchanging data between human beings and computing machines remains a challenging problem. Many algorithms have been proposed so that multiple scripts (Hindi, English, Urdu) can be recognized easily, but the efficiency of these algorithms is still not satisfactory. A multi-script document is a document that contains text in more than one script. Handwritten script recognition is complex for the following reasons: complexity in preprocessing, feature extraction and classification; sensitivity of the scheme to variations in the handwritten text of documents, such as font size, font style and document skew; and the performance of the scheme. Much research has addressed the handwritten multi-script recognition problem in related areas such as image processing, pattern recognition, artificial intelligence and cognitive science, and further research is being done to improve accuracy and efficiency.

Recognition of offline handwritten multi-scripts is the goal of many research efforts in the pattern recognition field. A survey of offline cursive script word recognition is presented in [1]. The survey is organized into three sections; the first introduces automatic handwriting recognition and the official regional scripts of India. Nine regional scripts are then covered and categorized into four subgroups based on their similarity and evolution. OCR work on Indian scripts, including a benchmark database, is reported in [2]. Many techniques have been applied to the recognition of handwritten multi-scripts, but their efficiency and recognition accuracy are still limited. Artificial intelligence concepts such as neural networks are used to perform this task the way the human mind does; they build on how humans recognize text in general and are used to develop machines that simulate this process. Developing such intelligent machines for multi-script recognition is not an easy task, because a script can be written in many different ways, and handwriting has many imperfections and variations, such as alignment, noise and slant angle, which make handwritten multi-script recognition difficult to implement on a machine.

Script identification based on DCT and DWT feature extraction is presented in [3]. In [4], OCR techniques are applied to the Devanagari script. In [5], metadata describe the text at the page, paragraph and line levels, and tools have also been developed to extract paragraphs from pages and to segment paragraphs into lines. Two approaches to Amharic word recognition in unconstrained handwritten text using HMMs are described in [6]: the first builds word models from the combined features of constituent characters, and in the second, the HMMs of constituent characters are concatenated to form word models. In [7], an offline Arabic/Farsi handwriting recognition algorithm is proposed for a subset of Farsi names; it uses an RBF neural network and a combination of a genetic algorithm (GA) and K-means clustering. The work in [8] addresses handwritten street name recognition for Indian languages; because some street names contain two or more words, these words are concatenated into a single word.

In this paper, we present a multiple-feature approach that combines Discrete Cosine Transform (DCT) and wavelet-based frequency content for three scripts used in India: English, Hindi and Urdu. Classification is done using a feed-forward back-propagation neural network classifier. The experiments are carried out on the database at block level.

II. BACKGROUND INFORMATION

Multi-script recognition is a process that identifies the various script objects (words) drawn on an image; i.e., multi-script recognition techniques associate a word identity with the image of a word in a multi-script document. As in any recognition system, a multi-script recognition machine first takes the raw data and preprocesses it.

Based on the data acquisition process, script recognition can be divided into the following two categories:
1. Online Script Recognition
2. Offline Script Recognition

Offline handwriting recognition refers to the process of recognizing words that have been scanned from paper and stored digitally in grayscale format. After storage, further processing is conventionally performed to enable recognition. In online handwritten script recognition, the handwriting is captured and stored in digital form by different means. Usually, a special pen is used in conjunction with an electronic surface. As the pen moves across the surface, the two-dimensional coordinates of successive points are represented as a function of time and stored in order [1]. It is generally accepted that online recognition of handwritten text has achieved better results than its offline counterpart. This may be attributed to the fact that more information can be captured in the online case, such as the direction, speed and order of the strokes of the handwriting.

A. Handwritten Multi Script Recognition

Handwritten Multi Script Recognition (HMSR) is an area of pattern recognition that has been the subject of considerable research over the last few decades. Many Indian offices, such as banks, sales-tax offices and railways, use English, Hindi and Urdu. Many forms and applications are filled in these languages, and sometimes those forms have to be scanned directly. Without a standard HMSR system, the scanned page is stored only as an image, and there is no option for editing those documents. Handwritten script recognition (HSR) is the automatic computer recognition of scripts in optically scanned and digitized pages of text. The main objective of an HMSR system is to recognize multiple scripts, given in the form of digital images, without any human intervention. This is done by searching for a match between the features extracted from the given script image and a library of image models.

B. Pre-processing

In HMSR, typical preprocessing operations include:
1. Binarization
2. Noise reduction
3. Skew detection

In our preprocessing stage, we perform two operations:

Binarization: transform the colored (or grayscale) image into a black & white image.
gray = im2double(imread('coins.png'));   % coins.png is grayscale; apply rgb2gray first to a color scan
img = im2bw(gray, graythresh(gray));     % black & white image using Otsu's threshold

Thinning: a morphological operation on binary images that removes selected foreground pixels.
img = bwmorph(~img, 'thin', Inf);        % invert so ink is foreground, then thin until the image stops changing

After the preprocessing phase, a cleaned image is available and goes to the segmentation phase. Depending on the data acquisition type, the raw data are subjected to a number of preliminary processing steps to make them usable in the descriptive stages of script analysis. Preprocessing aims to produce data on which the HMSR system can operate accurately.
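The two MATLAB lines in this subsection depend on the Image Processing Toolbox. The following is a minimal pure-Python sketch of the same two steps, offered only as an illustration under stated assumptions: the input is an 8-bit grayscale image held as a list of lists with dark ink on light paper, the threshold is chosen with Otsu's method (the method used by MATLAB's `graythresh`), and thinning uses the Zhang-Suen algorithm, which is our substitution and not necessarily what `bwmorph` implements. All function names are hypothetical.

```python
def otsu_threshold(gray):
    """Return the gray level t (0..255) that maximizes Otsu's between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of the levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray):
    """Ink (levels <= threshold) becomes 1 and paper becomes 0, i.e. already inverted."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def zhang_suen_thin(img):
    """Thin a 0/1 image (1 = ink) to one-pixel-wide strokes; border pixels are never removed."""
    img = [row[:] for row in img]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    # neighbours P2..P9, clockwise starting from north
                    p = [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                         img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]
                    b = sum(p)  # number of ink neighbours
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)  # 0->1 transitions
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    n, e, s, w = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = n * e * s == 0 and e * s * w == 0
                    else:
                        ok = n * e * w == 0 and n * s * w == 0
                    if ok:
                        to_clear.append((r, c))
            for r, c in to_clear:  # delete in parallel after scanning the whole image
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

Chaining `zhang_suen_thin(binarize(gray))` corresponds roughly to panels b, c and e of Figures 6 to 8; the border-clearing step shown in panel d is not reproduced here.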
Figure 1: Block Diagram of Script Identification
Figure 2: Script Sample of English Language
Figure 3: Script Sample of Hindi Language
Figure 4: Script Sample of Urdu Language
Figure 5: Combined Sample of Multi-Script

C. Segmentation

Segmentation is an operation that seeks to decompose an image of a sequence of script characters into sub-images of individual symbols. In conventional systems, script segmentation is a key requirement. Segmentation methods can be classified by the type of text and the strategy followed, for example straight segmentation, recognition-based segmentation and cut classification. To achieve broad utility, a segmentation method should have the following properties:
1. Capture perceptually important groupings, which often reflect global aspects of the image. Two central issues are to provide precise characterizations of what is perceptually important, and to be able to specify what a given segmentation technique does. There should be precise definitions of the properties of a resulting segmentation, in order to better understand the method and to facilitate the comparison of different approaches.
2. Be efficient. To be of practical use, segmentation must be fast; for example, a segmentation method that runs at several frames per second can be used in video processing applications.

D. Feature extraction

Every script has features that play a big role in pattern recognition, and English, Hindi and Urdu scripts each have many distinctive features. Feature extraction is the name given to a family of procedures for measuring the relevant shape information contained in a pattern, so that the task of classifying the pattern is made easy by a formal procedure. The feature extraction stage of an HMSR system analyses the script segments and selects a set of features that can be used to uniquely identify each script segment. This stage is the heart of the HMSR system, because the expected output depends on these features. Among the different design issues involved in building a recognition system, perhaps the most significant one is the selection of the set of features.

Feature extraction for exploratory data projection enables visualization of high-dimensional data, for better understanding of the data structure and for cluster analysis. In feature extraction for classification, it is desirable to extract highly discriminative, reduced-dimensionality features, which reduce the computational requirements of classification. However, feature extraction criteria for exploratory data projection usually aim to minimize an error function, such as the mean square error or the inter-pattern distance difference, whereas feature extraction criteria for classification aim to increase class separability as much as possible. Hence, the optimal features calculated for exploratory data projection are not necessarily the optimal features for classification.
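The paper combines DCT and DWT features of a text block but does not state which coefficients are kept. As a minimal sketch of a reduced-dimensionality feature vector of that kind, the code below assumes a square, even-sized grayscale block (for example, a text block resized beforehand), keeps the k x k lowest-frequency coefficients of an orthonormal 2-D DCT-II, and appends the mean energy of each subband of a one-level Haar DWT. The Haar wavelet, the value of k, the energy statistic and all function names are our assumptions, not the authors' settings.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n))
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y] * basis[u][x] * basis[v][y]
                    for x in range(n) for y in range(n))
            out[u][v] = scale[u] * scale[v] * s
    return out


def haar_1d(vec):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    r2 = math.sqrt(2.0)
    lo = [(vec[i] + vec[i + 1]) / r2 for i in range(0, len(vec), 2)]
    hi = [(vec[i] - vec[i + 1]) / r2 for i in range(0, len(vec), 2)]
    return lo, hi


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def haar_dwt2(block):
    """One-level 2-D Haar DWT of an even-sized block.

    Returns (LL, LH, HL, HH); the first letter is the filter applied along
    rows, the second the filter applied along columns.
    """
    rows_lo, rows_hi = zip(*(haar_1d(row) for row in block))

    def along_columns(mat):
        lo, hi = zip(*(haar_1d(col) for col in transpose(mat)))
        return transpose(lo), transpose(hi)

    ll, lh = along_columns(rows_lo)
    hl, hh = along_columns(rows_hi)
    return ll, lh, hl, hh


def script_features(block, k=4):
    """k*k lowest-frequency DCT coefficients followed by the mean energy of each Haar subband."""
    coeffs = dct2(block)
    dct_part = [coeffs[u][v] for u in range(k) for v in range(k)]
    dwt_part = [sum(v * v for row in band for v in row) / (len(band) * len(band[0]))
                for band in haar_dwt2(block)]
    return dct_part + dwt_part
```

For an 8x8 block and k = 4 this gives 16 DCT values plus 4 subband energies, a 20-element vector that could serve as the classifier's input. Both transforms are orthonormal, so they preserve the block's energy; discarding high-frequency DCT coefficients is what reduces the dimensionality.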
III. REPRESENTATION OF SCRIPT FEATURES

After the features are extracted, the data should be represented in one of two ways: as a boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics such as corners and modulations, while regional representation is appropriate when the focus is on internal properties such as texture or skeleton shape. In some applications, such as script recognition, both representations coexist, which often requires algorithms based on boundary shape as well as on skeletons and other internal properties.
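To make the two representations concrete, here is a small sketch assuming a binary image stored as a list of lists with 1 for ink. The boundary is taken as the ink pixels that have at least one 4-neighbour in the background (or outside the image), and two simple regional properties, area and centroid, summarize the whole region. These particular definitions and the function names are illustrative assumptions, not the paper's method.

```python
def boundary_pixels(img):
    """Ink pixels with at least one 4-neighbour that is background or outside the image."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or img[rr][cc] == 0:
                    out.append((r, c))
                    break
    return out


def region_properties(img):
    """Area (ink pixel count) and centroid (row, column) of all ink pixels."""
    pts = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v == 1]
    area = len(pts)
    if area == 0:
        return {"area": 0, "centroid": None}
    return {"area": area,
            "centroid": (sum(r for r, _ in pts) / area, sum(c for _, c in pts) / area)}
```

A boundary-based descriptor would work on the list returned by `boundary_pixels`, while regional descriptors such as area, centroid or the thinned skeleton use every ink pixel.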
Figure 6. a. Original cropped image of English script b. Black & white image c. Inverted colors d. Border-touching components cleared e. Thinning applied f. DCT form g. DWT form of English script

Figure 7. a. Original cropped image of Hindi script b. Black & white image c. Inverted colors d. Border-touching components cleared e. Thinning applied f. DCT form g. DWT form of Hindi script

Figure 8. a. Original cropped image of Urdu script b. Black & white image c. Inverted colors d. Border-touching components cleared e. Thinning applied f. DCT form g. DWT form of Urdu script

Scripts        No. of samples    Train/test    Recognition result
Hindi          373
English        369
Urdu           320
All scripts                      481/480       82.70%
Table 1. Result of Multiple classifiers
IV. RESULTS

Sets of handwritten script samples were prepared, and the data set was partitioned into two parts: the first is used for training the system and the second for testing, with a 50-50 split between them. For each script, features were computed and stored for training the network. The network has three layers: one input layer, one hidden layer and one output layer. Increasing the number of neurons in the hidden layer caused memory allocation problems. With this configuration, the overall recognition rate over all three scripts is 82.70%.
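The paper does not give the network's layer sizes or training parameters. The sketch below only illustrates what a three-layer feed-forward network trained with back-propagation can look like, together with a shuffled 50-50 split: sigmoid units, squared-error loss and per-sample (online) weight updates. The layer sizes, learning rate, epoch count, the class `FeedForwardNet` and the function `split_half` are hypothetical; MATLAB's own training functions differ in detail.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """Input -> hidden -> output network with sigmoid units, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # the last entry of each weight row is the bias weight
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def _forward(self, x):
        xb = list(x) + [1.0]  # input plus bias
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def predict(self, x):
        """Index of the output neuron with the largest activation."""
        y = self._forward(x)[2]
        return max(range(len(y)), key=lambda k: y[k])

    def train(self, samples, labels, epochs=1000, lr=0.5):
        n_out, n_hidden = len(self.w2), len(self.w1)
        for _ in range(epochs):
            for x, label in zip(samples, labels):
                xb, hb, y = self._forward(x)
                target = [1.0 if k == label else 0.0 for k in range(n_out)]
                # error terms for squared error with sigmoid outputs
                d_out = [(y[k] - target[k]) * y[k] * (1.0 - y[k]) for k in range(n_out)]
                # back-propagate through w2 before w2 is updated
                d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(n_out))
                         for j in range(n_hidden)]
                for k in range(n_out):
                    for j in range(n_hidden + 1):
                        self.w2[k][j] -= lr * d_out[k] * hb[j]
                for j in range(n_hidden):
                    for i in range(len(xb)):
                        self.w1[j][i] -= lr * d_hid[j] * xb[i]


def split_half(samples, labels, seed=0):
    """Shuffle, then return (train_x, train_y, test_x, test_y) with ceil(n/2) training samples."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    cut = (len(order) + 1) // 2

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return pick(order[:cut]) + pick(order[cut:])
```

With 961 samples, `split_half` yields 481 training and 480 test samples, which matches the Train/test column of Table 1. In this sketch, the hidden-layer size is the parameter that governs memory use, since both weight matrices grow with it.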
Table 2. Confusion Matrix
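The contents of Table 2 are not reproduced in this copy. As a reminder of how a confusion matrix and the overall recognition rate are computed, a minimal sketch follows. The function names are hypothetical, and the example labels are made up for illustration; they are not the paper's data.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_rate(matrix):
    """Fraction of samples on the diagonal, i.e. correctly recognized."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def per_class_rate(matrix):
    """Recognition rate of each true class (diagonal entry divided by its row sum)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Illustrative labels only (hypothetical, not taken from the experiments)
classes = ["Hindi", "English", "Urdu"]
truth = ["Hindi", "Hindi", "English", "English", "Urdu", "Urdu"]
guess = ["Hindi", "Urdu", "English", "English", "Urdu", "Hindi"]
m = confusion_matrix(truth, guess, classes)  # -> [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
```

Here `overall_rate(m)` is 4/6 and `per_class_rate(m)` is `[0.5, 1.0, 0.5]`. The off-diagonal cells show which scripts are mistaken for one another, which the single overall rate does not reveal.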
REFERENCES

[1] Nabin Sharma, U. Pal and R. Jayadevan, "Handwriting recognition in Indian regional scripts: A survey of offline techniques".
[2] Ram Sarkar, Nibaran Das, Subhadip Basu, Mahantapas Kundu, Mita Nasipuri and Dipak Kumar Basu, "CMATERdb1: a database of unconstrained handwritten Bangla and Bangla-English mixed script document image", International Journal on Document Analysis and Recognition, vol. 15, no. 1, pp. 71-83, 2012, doi: 10.1007/s10032-011-0148-6.
[3] G. G. Rajput and Anita H. B., "Handwritten Script Recognition using DCT and Wavelet Features at Block Level", 2010.
[4] R. Jayadevan, S. R. Kolhe, P. M. Patil and U. Pal, "Offline Recognition of Devanagari Script: A Survey", vol. 41, no. 6, 2011.
[5] J. H. AlKhateeb, "A new approach for off-line handwritten Arabic word recognition using KNN classifier", 18-19 Nov. 2009.
[6] Y. Assabie, "HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation", 10th International Conference on Document Analysis and Recognition (ICDAR '09), 2009.
[7] Z. Bahmani, F. Alamdar, R. Azmi and S. Haratizadeh, "Off-line Arabic/Farsi handwritten word recognition using RBF neural network and genetic algorithm", IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS), 2010.
[8] U. Pal, R. K. Roy and F. Kimura, "Handwritten street name recognition for Indian postal automation", International Conference on Document Analysis and Recognition (ICDAR), 2011.
[9] Liangrui Peng, Changsong Liu, Xiaoqing Ding and Hua Wang, "Multilingual document recognition research and its application in China", Second International Conference on Document Image Analysis for Libraries (DIAL'06), pp. 126-132, 2006.
[10] U. Pal and B. Chaudhuri, "Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line", International Conference on Document Analysis and Recognition, pp. 790-794, 2001.
[11] U. Bhattacharya, T. K. Das, A. Datta, S. K. Parui and B. B. Chaudhuri, "A hybrid scheme for hand printed numeral recognition based on a self-organizing network and MLP classifiers", International Journal of Pattern Recognition and Artificial Intelligence, vol. 16, pp. 845-864, 2002.
[12] K. H. Aparna, V. Subramaniam, M. Kasirajan, G. V. Prakash, V. S. Chakravarthy and S. Madhvanath, "Online handwriting recognition for Tamil", Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 438-443, 2004.
[13] C. V. Lakshmi and C. Patvardhan, "A high accuracy OCR system for printed Telugu text", Proceedings of the Conference on Convergent Technologies for Asia-Pacific Region (TENCON 2003), vol. 2, pp. 725-729, 2003.
[14] Lei Han, Jue Zhong and Arkady Voloshin, "Image analysis and data processing of time series fringe pattern of PCBs by using moiré interferometry", Proceedings of HDP'04, pp. 141-145, 2004.
[15] Ping Zhong, Chenjie Song and Nian Luo, "Method of extracting high-resolution digital moiré fringe in warpage measurement", Physical and Failure Analysis of Integrated Circuits (IPFA), pp. 527-530, 2009.
[16] V. Ablavsky and M. R. Stevens, "Automatic Feature Selection with Applications to Script Identification of Degraded Documents", Proc. Int'l Conf. Document Analysis & Recognition, Edinburgh, pp. 750-754, Aug. 2003.
[17] D. Dhanya, A. G. Ramakrishnan and Peeta Basa Pati, "Script identification in printed bilingual documents", Sadhana, vol. 27, part 1, pp. 73-82, 2002.
AUTHORS PROFILE:

Akhilesh Pandey is an Asst. Professor in the Department of Computer Science and Engineering, Shridhar University, Pilani. He did his MCA at IGNOU in 2002, after which he worked as a faculty member in different engineering colleges. He then acquired his M.Tech. (CSE) at Sharda University, Gr. Noida, India. His areas of interest are pattern recognition and neural networks.

Sunita Singh received her B.Tech. (CSE) from Lord Krishna College, Ghaziabad, and her M.Tech. from Sharda University, Gr. Noida, India. She is a member of our team and works in MATLAB; she is an excellent programmer. Her area of interest is image processing.

Rajiv Kumar is an Assistant Professor at the School of Engineering & Technology, Sharda University, Greater Noida, India. He acquired his Master of Technology degree in Information Technology from Bengal Engineering College, Shibpur (DU), West Bengal. His main areas of interest are image processing, pattern recognition and neural networks.

Dr. Amod Tiwari acquired his Bachelor's degree in Mathematics and Science from CSJM Kanpur University, Kanpur, and his Master's degree in Computer Science and Engineering from Bilaspur Central University, Bilaspur (CG), India. His academic excellence shines further with a PhD in Computer Science and Engineering from the Indian Institute of Technology Kanpur, awarded by UPTU Lucknow. He has more than two years of senior-level experience with reputed firms such as LML Scooter India Ltd, Kanpur. He was associated with the Indian Institute of Technology Kanpur from 2005 to 2010 and is currently working as an Associate Professor in the Department of Computer Science and Engineering, PSIT Kanpur. Dr. Tiwari has more than 37 publications to his credit.
All Rights Reserved © 2012 IJARCSEE