SlideShare a Scribd company logo
1 of 5
Download to read offline
ONLINE HANDWRITTEN SCRIPT RECOGNITION
                                    IEEE APPROACH
Automatic identification of handwritten script facilitates many important applications such as
automatic transcription of multilingual documents and search for documents on the Web containing
a particular script. The increase in usage of handheld devices which accept handwritten input has
created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data.
This project proposes a method to classify words and lines in an online handwritten document into
one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification
is based on 11 different spatial and temporal features extracted from the strokes of the words.
The proposed system attains an overall classification accuracy of 87.1 percent at the word level with
5-fold cross validation on a data set containing 13,379 words. The classification accuracy improves
to 95 percent as the number of words in the test sample is increased to five, and to 95.5 percent for
complete text lines consisting of an average of seven words.


OBJECTIVE
•   Automatic identification of handwritten script facilitates many important applications such as
    automatic transcription of multilingual documents and search for documents on the Web
    containing a particular script.
•   The increase in usage of handheld devices which accept handwritten input has created a
    growing demand for algorithms that can efficiently analyze and retrieve handwritten data.
•   This project proposes a method to classify words and lines in an online handwritten document
    into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman.


DESCRIPTION OF PROBLEM
The field of work on automatic script recognition deals with offline documents, i.e., documents which
are either handwritten or printed on a paper and then scanned to obtain a two-dimensional digital
representation. The existing method deals with script recognition in off-line.
Today we have lot of electronic devices and lot of input devices are available such as digital pen,
digital tablet, stylus through which user can input any type of script in side computer. Such devices,
which generate handwritten documents with online or dynamic (temporal) information, require
efficient algorithms for processing and retrieving handwritten data.
Online documents may be written in different languages and scripts. A single document page in
itself may contain text written in multiple scripts. So the project aims to develop efficient algorithm to
recognize the script.

Existing System
The existing method deals with languages are identified:
•   Using projection profiles of words and character shapes.
•   Using horizontal projection profiles and looking for the presence or absence of specific shapes
    in different scripts.
•   Existing method deals with only few characteristics.
•   Most of the method does this in off-line.

PROPOSED SYSTEM
The proposed method uses the features of connected components to classify six different scripts
(Arabic, Chinese, Cyrillic, Devnagari, Japanese, and Roman) and reported a classification accuracy
of 88 percent on document pages.
There are a few important aspects of online documents that enable us to process them in a
fundamentally different way than offline documents.
The most important characteristic of online documents is that they capture the temporal sequence of
strokes while writing the document. This allows us to analyze the individual strokes and use the
additional temporal information for both script identification as well as text recognition.
In the case of online documents, segmentation of foreground from the background is a relatively
simple task as the captured data, i.e., the (x; y) coordinates of the locus of the stylus, defines the
characters and any other point on the page belongs to the background.
We use stroke properties as well as the spatial and temporal information of a collection of strokes to
identify the script used in the document. Unfortunately, the temporal information also introduces
additional variability to the handwritten characters, which creates large intraclass variations of
strokes in each of the script classes.


Data collection and Preprocessing
The control for drawing that is writing the script is provided. User can choose the language and
write the script document with several pages and store inside script folder. For one language
various script can be written and stored. These documents are samples used for recognition
process. When the number of sample increases the error rate for recognition can be reduced.
So better keep as such script as possible before you are going for recognition.           During
preprocessing, the individual strokes are resampled to make the sampled points equidistant. This
helps to reduce the variations in scripts due to different writing speeds and to avoid anomalous
cases such as having a large number of samples at the same position when the user holds the pen
down at a point. The strokes are then smoothed using a Gaussian (lowpass) filter.
The x and y coordinates of the sequence of points are independently filtered by convolution with
one-dimensional Gaussian kernels. This reduces noise due to pen vibrations and errors in the
sensing mechanism.
The individual strokes are again resampled to make the points equidistant. The resampling
distances in both cases were set to 10 pixels and the standard deviation of the Gaussian was set to
6, both determined experimentally for the data. During the lowpass filtering and resampling
operations, the critical points in a stroke are retained.
A critical point is defined as a point in the stroke where the x or y direction of the stroke reverses
(changes sign), in addition to the extreme points (pen-up or pen-down) of the stroke. Fig. 3 shows
an example of preprocessing the online character a (consisting of a single stroke).




The character “r” written in two different styles. The offline representations ((a) and (b)) look similar,
but the temporal variations make them look very different ((c) and (d)).
The writing direction is indicated using arrows in (a) and (b). The vertical axes in (c) and (d)
represent time.
Each user was asked to write one page of text in a particular script, with each page containing
approximately 20 lines of text. No restriction was imposed on the content or style of writing (cursive
or handprint).
The writers consisted of college graduate students, professors, and employees in private companies
and businessmen. The details of the database used for this work. Multiple pages from some of the
writers were collected at different times.
Line and Word Detection
The data available to a script recognizer is usually a complete handwritten page or a subset of it. To
recognize the script of individual lines or words in the page, we first need to segment the page into
lines and words. The problem of text line identification in online documents has been attempted.
To identify the individual lines, first the interline distance is estimated. The interline distance, d, is
defined as the distance between successive peaks in the autocorrelation of the y-axis projection of
the text.The y-axis projection of the document and the autocorrelation of the projection.
The interline distance estimate is indicated as d. Lines are identified by finding valleys in the
projection, keeping the interline distance as a guiding factor. To avoid local minima, we choose only
those points, which have the smallest magnitude within a window of width d, as valleys.
The text-line boundaries identified for the document along with the y-axis projection. Once the line
boundaries are obtained, the text is divided into lines by collecting all the strokes which fall in
between two successive line boundaries.
The temporal information from stroke order is used to disambiguate strokes which fall across line
boundaries and to correctly group small strokes, which may fall into an adjacent line.
A word is defined as a set of strokes that overlap horizontally. The segmentation of a line into words
is done using an x-axis projection of the text in the line.
The valleys in the projection are noted as word boundaries and the strokes which fall between two
boundaries are collected and labeled as a word. The minimum width of a valley in the projection for
word segmentation was experimentally determined as n numbers pixels for the resolution of the
device used.


FEATURE EXTRACTION
Each sample or pattern that we attempt to classify is either a word or a set of contiguous words in a
line. It is helpful to study the general properties of each of the six scripts for feature extraction.
1.   Arabic: Arabic is written from right to left within a line and the lines are written from top to
     bottom. A typical Arabic character contains a relatively long main stroke which is drawn from
     right to left, along with one to three dots.
     The character set contains three long vowels. Short markings (diacritics) may be added to the
     main character to indicate short vowels. Due to these diacritical marks and the dots in the
     script, the length of the strokes vary considerably.
2.   Cyrillic: Cyrillic script looks very similar to the cursive Roman script. The most distinctive
     features of Cyrillic script, compared to Roman script are: 1) individual characters, connected
     together in a word, form one long stroke, and 2) the absence of delayed strokes. Delayed
     strokes cause movement of the pen in the direction opposite to the regular writing direction.
3.   Devnagari: The most important characteristic of Devnagari script is the horizontal line present
     at the top of each word, called “Shirorekha”. These lines are usually drawn after the word is
     written and hence are similar to delayed strokes in Roman script. The words are written from
     left to right in a line.
4.   Han: Characters of Han script are composed of multiple short strokes. The strokes are usually
     drawn from top to bottom and left to right within a character. The direction of writing of words in
     a line is either left to right or top to bottom. The database used in this study contains Han script
     text of the former type (horizontal text lines).
5.   Hebrew: Words in a line of Hebrew script are written from right to left and, hence, the script is
     temporally similar to Arabic. The most distinguishing factor of Hebrew from Arabic is that the
     strokes are more uniform in length in the former.
6.   Roman: Roman script has the same writing direction as Cyrillic, Devnagari, and Han scripts. We
     have already noted the distinguishing features of these scripts compared to the Roman script.
     In addition, the length of the strokes tends to fall between that of Devnagari and Cyrillic scripts.
     The features are extracted either from the individual strokes or from a collection of strokes.
     Here, we describe the features and their method of computation.
Horizontal Interstroke Direction (HID)
This is the sum of the horizontal directions between the starting points ofconsecutive strokes in the
pattern. The feature essentially captures the writing direction within a line.

Shirorekha Strength
This feature measures the strength of the horizontal line component in the pattern.

Average Stroke length
Average length of individual strokes in the pattern.

Shirorekha Confidence
We compute a confidence measure for a stroke being a Shirorekha Each stroke in the pattern is
inspected for three properties width of a word, always occur at the top of the word, and are
horizontal.

Recognition
The above mentioned statistics are calculated and the classifier is used to find which script is used
in the document.
In this project the classifier used is k-Nearest Neighbor classifier. It finds the distance between two
feature vectors was computed using the Euclidean distance metric.
The transformation was applied to every feature to make their mean zero, and variance unity, in the
training set. This transformation is referred to as normalization.
With help of this training set database, the experiments are performed and this classifier is invoked
to recognize the script.

System Analysis
System analysis can be defined, as a method that is determined to use the resources, machine in
the best manner and perform tasks to meet the information needs of an organization.

System Description
It is also a management technique that helps us in designing a new systems or improving an
existing system.
The four basic elements in the system analysis are
     •    Output
     •    Input
     •    Files
     •    Process
The above-mentioned are mentioned are the four basis of the System Analysis.

System Design
Design is concerned with identifying software components specifying relationships among
components. Specifying software structure and providing blue print for the document phase.
Modularity is one of the desirable properties of large systems. It implies that the system is divided
into several parts. In such a manner, the interaction between parts is minimal clearly specified.
Design will explain software components in detail. This will help the implementation of the system.
Moreover, this will guide the further changes in the system to satisfy the future requirements.

Form Design
Form is a tool with a message; it is the physical carrier of data or information. The user interface
form gives option for the user to select the language and user can write a new document, can open
an existing document to view or can able to delete.
If new document written it can be saved into the database.
HARDWARE & SOFTWARE SPECIFICATION
Processor    :    Intel Processor IV
RAM          :    128 MB
Hard disk    :    20 GB
CD drive     :    40 x Samsung
Floppy drive :    1.44 MB
Monitor      :    15’ color Monitor
Keyboard     :    108 mercury keyboard
Mouse        :    Logitech mouse


Software Specification
Operating System      –     Windows XP/2000
Language used         –     Java


                 ONLINE HANDWRITTEN SCRIPT RECOGNITION

System Flow Chart

More Related Content

Similar to Online Handwritten Script Recognition with 87.1% Accuracy

A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesiosrjce
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGEScscpconf
 
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKijaia
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character RecognitionIOSR Journals
 
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...Editor IJCATR
 
An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...IJMER
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSIJCSEA Journal
 
International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)IJCSEA Journal
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSIJCSEA Journal
 
Script identification using dct coefficients 2
Script identification using dct coefficients 2Script identification using dct coefficients 2
Script identification using dct coefficients 2IAEME Publication
 
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character RecognitionPreprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character RecognitionEditor IJCATR
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESijnlc
 
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSDIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSijait
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONIJNLC Int.Jour on Natural Lang computing
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONkevig
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONijnlc
 
Wavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationWavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationCSCJournals
 
A study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognitionA study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognitionIJECEIAES
 

Similar to Online Handwritten Script Recognition with 87.1% Accuracy (20)

P01725105109
P01725105109P01725105109
P01725105109
 
A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document images
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
 
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character Recognition
 
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
 
An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
 
International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
 
Script identification using dct coefficients 2
Script identification using dct coefficients 2Script identification using dct coefficients 2
Script identification using dct coefficients 2
 
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character RecognitionPreprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
 
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSDIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
Wavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationWavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script Identification
 
A study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognitionA study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognition
 

More from ncct

Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological SignalsBiomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological Signalsncct
 
Digital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy DetectionDigital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy Detectionncct
 
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...ncct
 
Cockpit White Box
Cockpit White BoxCockpit White Box
Cockpit White Boxncct
 
Rail Track Inspector
Rail Track InspectorRail Track Inspector
Rail Track Inspectorncct
 
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...ncct
 
Bot Robo Tanker Sound Detector
Bot Robo  Tanker  Sound DetectorBot Robo  Tanker  Sound Detector
Bot Robo Tanker Sound Detectorncct
 
Distance Protection
Distance ProtectionDistance Protection
Distance Protectionncct
 
Bluetooth Jammer
Bluetooth  JammerBluetooth  Jammer
Bluetooth Jammerncct
 
Crypkit 1
Crypkit 1Crypkit 1
Crypkit 1ncct
 
I E E E 2009 Java Projects
I E E E 2009  Java  ProjectsI E E E 2009  Java  Projects
I E E E 2009 Java Projectsncct
 
B E Projects M C A Projects B
B E  Projects  M C A  Projects  BB E  Projects  M C A  Projects  B
B E Projects M C A Projects Bncct
 
J2 E E Projects, I E E E Projects 2009
J2 E E  Projects,  I E E E  Projects 2009J2 E E  Projects,  I E E E  Projects 2009
J2 E E Projects, I E E E Projects 2009ncct
 
J2 M E Projects, I E E E Projects 2009
J2 M E  Projects,  I E E E  Projects 2009J2 M E  Projects,  I E E E  Projects 2009
J2 M E Projects, I E E E Projects 2009ncct
 
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...ncct
 
B E M E Projects M C A Projects B
B E  M E  Projects  M C A  Projects  BB E  M E  Projects  M C A  Projects  B
B E M E Projects M C A Projects Bncct
 
I E E E 2009 Java Projects, I E E E 2009 A S P
I E E E 2009  Java  Projects,  I E E E 2009  A S PI E E E 2009  Java  Projects,  I E E E 2009  A S P
I E E E 2009 Java Projects, I E E E 2009 A S Pncct
 
Advantages Of Software Projects N C C T
Advantages Of  Software  Projects  N C C TAdvantages Of  Software  Projects  N C C T
Advantages Of Software Projects N C C Tncct
 
Engineering Projects
Engineering  ProjectsEngineering  Projects
Engineering Projectsncct
 
Software Projects Java Projects Mobile Computing
Software  Projects  Java  Projects  Mobile  ComputingSoftware  Projects  Java  Projects  Mobile  Computing
Software Projects Java Projects Mobile Computingncct
 

More from ncct (20)

Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological SignalsBiomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
 
Digital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy DetectionDigital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy Detection
 
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
 
Cockpit White Box
Cockpit White BoxCockpit White Box
Cockpit White Box
 
Rail Track Inspector
Rail Track InspectorRail Track Inspector
Rail Track Inspector
 
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
 
Bot Robo Tanker Sound Detector
Bot Robo  Tanker  Sound DetectorBot Robo  Tanker  Sound Detector
Bot Robo Tanker Sound Detector
 
Distance Protection
Distance ProtectionDistance Protection
Distance Protection
 
Bluetooth Jammer
Bluetooth  JammerBluetooth  Jammer
Bluetooth Jammer
 
Crypkit 1
Crypkit 1Crypkit 1
Crypkit 1
 
I E E E 2009 Java Projects
I E E E 2009  Java  ProjectsI E E E 2009  Java  Projects
I E E E 2009 Java Projects
 
B E Projects M C A Projects B
B E  Projects  M C A  Projects  BB E  Projects  M C A  Projects  B
B E Projects M C A Projects B
 
J2 E E Projects, I E E E Projects 2009
J2 E E  Projects,  I E E E  Projects 2009J2 E E  Projects,  I E E E  Projects 2009
J2 E E Projects, I E E E Projects 2009
 
J2 M E Projects, I E E E Projects 2009
J2 M E  Projects,  I E E E  Projects 2009J2 M E  Projects,  I E E E  Projects 2009
J2 M E Projects, I E E E Projects 2009
 
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
 
B E M E Projects M C A Projects B
B E  M E  Projects  M C A  Projects  BB E  M E  Projects  M C A  Projects  B
B E M E Projects M C A Projects B
 
I E E E 2009 Java Projects, I E E E 2009 A S P
I E E E 2009  Java  Projects,  I E E E 2009  A S PI E E E 2009  Java  Projects,  I E E E 2009  A S P
I E E E 2009 Java Projects, I E E E 2009 A S P
 
Advantages Of Software Projects N C C T
Advantages Of  Software  Projects  N C C TAdvantages Of  Software  Projects  N C C T
Advantages Of Software Projects N C C T
 
Engineering Projects
Engineering  ProjectsEngineering  Projects
Engineering Projects
 
Software Projects Java Projects Mobile Computing
Software  Projects  Java  Projects  Mobile  ComputingSoftware  Projects  Java  Projects  Mobile  Computing
Software Projects Java Projects Mobile Computing
 

Recently uploaded

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Online Handwritten Script Recognition with 87.1% Accuracy

  • 1. ONLINE HANDWRITTEN SCRIPT RECOGNITION IEEE APPROACH Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multilingual documents and search for documents on the Web containing a particular script. The increase in usage of handheld devices which accept handwritten input has created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data. This project proposes a method to classify words and lines in an online handwritten document into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification is based on 11 different spatial and temporal features extracted from the strokes of the words. The proposed system attains an overall classification accuracy of 87.1 percent at the word level with 5-fold cross validation on a data set containing 13,379 words. The classification accuracy improves to 95 percent as the number of words in the test sample is increased to five, and to 95.5 percent for complete text lines consisting of an average of seven words. OBJECTIVE • Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multilingual documents and search for documents on the Web containing a particular script. • The increase in usage of handheld devices which accept handwritten input has created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data. • This project proposes a method to classify words and lines in an online handwritten document into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. DESCRIPTION OF PROBLEM The field of work on automatic script recognition deals with offline documents, i.e., documents which are either handwritten or printed on a paper and then scanned to obtain a two-dimensional digital representation. The existing method deals with script recognition in off-line. Today we have lot of electronic devices and lot of input devices are available such as digital pen, digital tablet, stylus through which user can input any type of script in side computer. Such devices, which generate handwritten documents with online or dynamic (temporal) information, require efficient algorithms for processing and retrieving handwritten data. Online documents may be written in different languages and scripts. A single document page in itself may contain text written in multiple scripts. So the project aims to develop efficient algorithm to recognize the script. Existing System The existing method deals with languages are identified: • Using projection profiles of words and character shapes. • Using horizontal projection profiles and looking for the presence or absence of specific shapes in different scripts. • Existing method deals with only few characteristics. • Most of the method does this in off-line. PROPOSED SYSTEM The proposed method uses the features of connected components to classify six different scripts (Arabic, Chinese, Cyrillic, Devnagari, Japanese, and Roman) and reported a classification accuracy of 88 percent on document pages. There are a few important aspects of online documents that enable us to process them in a fundamentally different way than offline documents.
  • 2. The most important characteristic of online documents is that they capture the temporal sequence of strokes while writing the document. This allows us to analyze the individual strokes and use the additional temporal information for both script identification as well as text recognition. In the case of online documents, segmentation of foreground from the background is a relatively simple task as the captured data, i.e., the (x; y) coordinates of the locus of the stylus, defines the characters and any other point on the page belongs to the background. We use stroke properties as well as the spatial and temporal information of a collection of strokes to identify the script used in the document. Unfortunately, the temporal information also introduces additional variability to the handwritten characters, which creates large intraclass variations of strokes in each of the script classes. Data collection and Preprocessing The control for drawing that is writing the script is provided. User can choose the language and write the script document with several pages and store inside script folder. For one language various script can be written and stored. These documents are samples used for recognition process. When the number of sample increases the error rate for recognition can be reduced. So better keep as such script as possible before you are going for recognition. During preprocessing, the individual strokes are resampled to make the sampled points equidistant. This helps to reduce the variations in scripts due to different writing speeds and to avoid anomalous cases such as having a large number of samples at the same position when the user holds the pen down at a point. The strokes are then smoothed using a Gaussian (lowpass) filter. The x and y coordinates of the sequence of points are independently filtered by convolution with one-dimensional Gaussian kernels. This reduces noise due to pen vibrations and errors in the sensing mechanism. The individual strokes are again resampled to make the points equidistant. The resampling distances in both cases were set to 10 pixels and the standard deviation of the Gaussian was set to 6, both determined experimentally for the data. During the lowpass filtering and resampling operations, the critical points in a stroke are retained. A critical point is defined as a point in the stroke where the x or y direction of the stroke reverses (changes sign), in addition to the extreme points (pen-up or pen-down) of the stroke. Fig. 3 shows an example of preprocessing the online character a (consisting of a single stroke). The character “r” written in two different styles. The offline representations ((a) and (b)) look similar, but the temporal variations make them look very different ((c) and (d)). The writing direction is indicated using arrows in (a) and (b). The vertical axes in (c) and (d) represent time. Each user was asked to write one page of text in a particular script, with each page containing approximately 20 lines of text. No restriction was imposed on the content or style of writing (cursive or handprint). The writers consisted of college graduate students, professors, and employees in private companies and businessmen. The details of the database used for this work. Multiple pages from some of the writers were collected at different times.
  • 3. Line and Word Detection The data available to a script recognizer is usually a complete handwritten page or a subset of it. To recognize the script of individual lines or words in the page, we first need to segment the page into lines and words. The problem of text line identification in online documents has been attempted. To identify the individual lines, first the interline distance is estimated. The interline distance, d, is defined as the distance between successive peaks in the autocorrelation of the y-axis projection of the text.The y-axis projection of the document and the autocorrelation of the projection. The interline distance estimate is indicated as d. Lines are identified by finding valleys in the projection, keeping the interline distance as a guiding factor. To avoid local minima, we choose only those points, which have the smallest magnitude within a window of width d, as valleys. The text-line boundaries identified for the document along with the y-axis projection. Once the line boundaries are obtained, the text is divided into lines by collecting all the strokes which fall in between two successive line boundaries. The temporal information from stroke order is used to disambiguate strokes which fall across line boundaries and to correctly group small strokes, which may fall into an adjacent line. A word is defined as a set of strokes that overlap horizontally. The segmentation of a line into words is done using an x-axis projection of the text in the line. The valleys in the projection are noted as word boundaries and the strokes which fall between two boundaries are collected and labeled as a word. The minimum width of a valley in the projection for word segmentation was experimentally determined as n numbers pixels for the resolution of the device used. FEATURE EXTRACTION Each sample or pattern that we attempt to classify is either a word or a set of contiguous words in a line. It is helpful to study the general properties of each of the six scripts for feature extraction. 1. Arabic: Arabic is written from right to left within a line and the lines are written from top to bottom. A typical Arabic character contains a relatively long main stroke which is drawn from right to left, along with one to three dots. The character set contains three long vowels. Short markings (diacritics) may be added to the main character to indicate short vowels. Due to these diacritical marks and the dots in the script, the length of the strokes vary considerably. 2. Cyrillic: Cyrillic script looks very similar to the cursive Roman script. The most distinctive features of Cyrillic script, compared to Roman script are: 1) individual characters, connected together in a word, form one long stroke, and 2) the absence of delayed strokes. Delayed strokes cause movement of the pen in the direction opposite to the regular writing direction. 3. Devnagari: The most important characteristic of Devnagari script is the horizontal line present at the top of each word, called “Shirorekha”. These lines are usually drawn after the word is written and hence are similar to delayed strokes in Roman script. The words are written from left to right in a line. 4. Han: Characters of Han script are composed of multiple short strokes. The strokes are usually drawn from top to bottom and left to right within a character. The direction of writing of words in a line is either left to right or top to bottom. The database used in this study contains Han script text of the former type (horizontal text lines). 5. Hebrew: Words in a line of Hebrew script are written from right to left and, hence, the script is temporally similar to Arabic. The most distinguishing factor of Hebrew from Arabic is that the strokes are more uniform in length in the former. 6. Roman: Roman script has the same writing direction as Cyrillic, Devnagari, and Han scripts. We have already noted the distinguishing features of these scripts compared to the Roman script. In addition, the length of the strokes tends to fall between that of Devnagari and Cyrillic scripts. The features are extracted either from the individual strokes or from a collection of strokes. Here, we describe the features and their method of computation.
  • 4. Horizontal Interstroke Direction (HID) This is the sum of the horizontal directions between the starting points ofconsecutive strokes in the pattern. The feature essentially captures the writing direction within a line. Shirorekha Strength This feature measures the strength of the horizontal line component in the pattern. Average Stroke length Average length of individual strokes in the pattern. Shirorekha Confidence We compute a confidence measure for a stroke being a Shirorekha Each stroke in the pattern is inspected for three properties width of a word, always occur at the top of the word, and are horizontal. Recognition The above mentioned statistics are calculated and the classifier is used to find which script is used in the document. In this project the classifier used is k-Nearest Neighbor classifier. It finds the distance between two feature vectors was computed using the Euclidean distance metric. The transformation was applied to every feature to make their mean zero, and variance unity, in the training set. This transformation is referred to as normalization. With help of this training set database, the experiments are performed and this classifier is invoked to recognize the script. System Analysis System analysis can be defined, as a method that is determined to use the resources, machine in the best manner and perform tasks to meet the information needs of an organization. System Description It is also a management technique that helps us in designing a new systems or improving an existing system. The four basic elements in the system analysis are • Output • Input • Files • Process The above-mentioned are mentioned are the four basis of the System Analysis. System Design Design is concerned with identifying software components specifying relationships among components. Specifying software structure and providing blue print for the document phase. Modularity is one of the desirable properties of large systems. It implies that the system is divided into several parts. In such a manner, the interaction between parts is minimal clearly specified. Design will explain software components in detail. This will help the implementation of the system. Moreover, this will guide the further changes in the system to satisfy the future requirements. Form Design Form is a tool with a message; it is the physical carrier of data or information. The user interface form gives option for the user to select the language and user can write a new document, can open an existing document to view or can able to delete. If new document written it can be saved into the database.
  • 5. HARDWARE & SOFTWARE SPECIFICATION Processor : Intel Processor IV RAM : 128 MB Hard disk : 20 GB CD drive : 40 x Samsung Floppy drive : 1.44 MB Monitor : 15’ color Monitor Keyboard : 108 mercury keyboard Mouse : Logitech mouse Software Specification Operating System – Windows XP/2000 Language used – Java ONLINE HANDWRITTEN SCRIPT RECOGNITION System Flow Chart