4. Problem definition (continued)
Current Approaches
OCR: What is it?
Optical Character Recognition, or OCR, is a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data.
6. Problem definition (continued)
Solution
Apply a new feature extraction technique
Using an Artificial Neural Network
Expected 100% accuracy of character identification.
8. Preprocessing
The preprocessing stage has several tasks:
Binarization
Noise filtering
Smoothing
Normalization
Pipeline stages: Preprocessing, Segmentation, Feature Extraction, Classification and recognition
9. Segmentation
An image of a sequence of characters is decomposed into sub-images of individual characters.
10. Feature Extraction
Feature Extraction Based on Character Geometry
It extracts the different line types that form a particular character.
The feature extraction technique explained was tested using a neural network that was trained with the feature vectors obtained from the proposed system.
11. Feature Extraction (continued)
Universe of Discourse
[Figure: original image and its universe of discourse]
17. Artificial Neural Network (continued)
Parameters Used for the ANN
Number of layers    Layer     Nodes
3                   Input     108
                    Hidden    78
                    Output    34
3                   Input     108
                    Hidden    76
                    Output    34
22. Result
Tested on a PC with an Intel Core i5-6200U @ 2.30 GHz processor and 8 GB RAM, in a Windows 10 premium environment.
Technique used: Artificial Neural Network
Total characters in database: 850
No. of training characters: 680
No. of testing characters: 170
Performance: 82.1%
23. Conclusion
The proposed neural network architecture is able to classify character patterns to some degree.
But it has difficulty classifying unknown samples. As a future enhancement, the current architecture is expected to be improved.
24. Conclusion and future works
Make the system more font independent.
Increase the number of nodes and layers in the ANN.
Try different recognition algorithms such as HMM (Hidden Markov Model).
Improve the separation of touching characters.
Improve the efficiency of the feature extraction method.
Improve the system to identify any other characters.
25. References
1. https://www.abbyy.com/en-apac/finereader/about-ocr/what-is-ocr/
2. https://in.mathworks.com/?requestedDomain=www.mathworks.com
3. Dinesh Deleep. A feature extraction technique based on character geometry for character recognition.
4. Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, L. Malik, M. Kundu, D. K. Basu. Performance Comparison of SVM and ANN for Handwritten Devanagari Character Recognition. International Journal of Computer Science Issues (IJCSI), Vol. 7, Issue 4, p. 18 (July 2010).
5. Ranpreet Karu, Baljith Singh. A hybrid neural Approach for Character Recognition System. International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 2 (2), 721-726 (2011).
I have divided my presentation into 6 main topics.
Handwritten character recognition has been one of the most fascinating and challenging research areas in image processing and pattern recognition in recent years.
Many efforts have been made to recognize both online and offline handwritten characters automatically.
Many approaches have been proposed, but most of them focus on the English language.
Little attention has been given to Sinhala character recognition.
A number of different technologies are being used and tested for Sinhala handwritten character identification. But because of the variations in handwritten characters, handwritten character recognition remains a challenging task for researchers.
It is also harder to develop a general system for Sinhala character identification.
This is due to the variation of the characters, because:
Handwritten characters vary from person to person.
They also vary for the same person from one time to another.
The feature extraction algorithms in use are poor.
Keeping this problem in mind, I started my research work.
By analyzing the existing work, I arrived at a solution.
There are 4 steps in my methodology. Those are ….
The pre-processing is a series of operations performed on the scanned input image.
It essentially enhances the image, rendering it suitable for segmentation.
The role of pre-processing is to segment the interesting pattern from the background.
Generally, noise filtering, smoothing, and normalization should be done in this step.
The binarization process converts a grayscale image into a binary image. Edges in the binarized image are found using the Sobel technique and then dilated.
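As a rough illustration of the binarization step only, here is a minimal sketch in plain Python. The function name `binarize`, the fixed threshold of 128, and the tiny sample image are assumptions made for this example; the slides do not say how the threshold was chosen.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to a binary image.

    Pixels darker than the threshold become 1 (ink), all others 0
    (background). The threshold value is an assumption, not from the slides.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Hypothetical 3 x 4 grayscale patch: dark stroke on a light background.
gray_image = [
    [250, 240, 30, 245],
    [248, 20, 25, 251],
    [252, 249, 35, 247],
]
binary_image = binarize(gray_image)
```

Here dark pixels are treated as ink, which suits dark writing on light paper; the comparison would be flipped for the opposite case.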
The pre-processed input image is segmented into isolated characters by assigning a number to each character using a labeling process. This labeling provides information about the number of characters in the image. Each individual character is then uniformly resized to a fixed size in pixels.
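The labeling idea can be sketched as connected-component labeling. This is only one possible reading of the "labeling process": the function `label_components`, the use of 8-connectivity, and the sample image are assumptions for illustration.

```python
from collections import deque


def label_components(binary):
    """Label 8-connected groups of foreground pixels (value 1).

    Returns (labels, count): labels has the same shape as the input, with 0
    for background and 1..count for each connected group.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1  # found an unlabeled foreground pixel: new component
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over the 8 neighbours
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Hypothetical binary image containing two separate shapes.
line_image = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
labels, num_characters = label_components(line_image)
```

Note that with this simple approach a character written as several disconnected strokes would receive several labels, so a real system would need an extra grouping step.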
Feature extraction also concentrates on the positional features of those line types.
The feature extraction technique explained was tested using a Neural Network which was trained with the feature vectors obtained from the system proposed.
The universe of discourse is defined as the smallest matrix that fits the entire character skeleton.
The Universe of discourse is selected because the features extracted from the character image include the positions of different line segments in the character image.
So every character image should be independent of its image size.
So I have converted the images into 50 x 50 pixel images.
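A minimal sketch of these two steps, cropping to the universe of discourse and then resizing to 50 x 50 pixels, might look like this. The function names and the nearest-neighbour resizing method are assumptions; the slides only give the target size.

```python
def universe_of_discourse(binary):
    """Crop a binary image to the smallest box containing all 1-pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        return []  # no foreground pixels at all
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def resize_nearest(image, height=50, width=50):
    """Resize by nearest-neighbour sampling (an assumed method)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


# Hypothetical character with empty margins around it.
padded = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cropped = universe_of_discourse(padded)   # margins removed
normalized = resize_nearest(cropped)      # fixed 50 x 50 size
```

Cropping first means the margins around a character do not influence where its line segments end up after resizing.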
The image is divided into windows of equal size, and feature extraction is performed on the individual windows.
The image was zoned into 9 equal-sized windows.
Feature extraction was applied to individual zones, rather than to the whole image.
This gives more information about the fine details of the character skeleton.
Also, the positions of different line segments in a character skeleton become a feature if zoning is used.
This is because a particular line segment of a character occurs in a particular zone in almost all cases.
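The zoning step can be sketched as follows. The function `split_into_zones` and the 3 x 3 arrangement of the 9 windows are assumptions for illustration. Since 50 is not a multiple of 3, this sketch lets zone sizes differ by at most one pixel rather than being exactly equal.

```python
def split_into_zones(image, grid=3):
    """Split an image into grid x grid zones, returned in row-major order.

    Zone boundaries use integer division, so when the side length is not a
    multiple of `grid` (e.g. 50 pixels) zone sizes differ by at most one.
    """
    height, width = len(image), len(image[0])
    zones = []
    for zr in range(grid):
        top, bottom = zr * height // grid, (zr + 1) * height // grid
        for zc in range(grid):
            left, right = zc * width // grid, (zc + 1) * width // grid
            zones.append([row[left:right] for row in image[top:bottom]])
    return zones


# Hypothetical 6 x 6 image with alternating columns of 0s and 1s.
image = [[c % 2 for c in range(6)] for r in range(6)]
zones = split_into_zones(image)           # nine 2 x 2 zones

# The same function applied to a blank 50 x 50 image.
big_zones = split_into_zones([[0] * 50 for _ in range(50)])
```

Features would then be computed per zone and concatenated, so that the zone index itself encodes where in the character a line segment lies.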
Starters are those pixels with exactly one neighbor in the character skeleton. Before character traversal starts, all the starters in the particular zone are found and stored in a list.
I have circled the starting points in the image.
For a pixel to be an intersection, it should have more than one neighbor.
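Finding starters by counting neighbors can be sketched like this. The helper names and the use of the 8-neighbourhood are assumptions for illustration; the input is assumed to be a one-pixel-wide skeleton of a zone.

```python
def count_neighbors(skeleton, r, c):
    """Count foreground pixels among the 8 neighbours of (r, c)."""
    rows, cols = len(skeleton), len(skeleton[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue  # skip the pixel itself
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                total += skeleton[nr][nc]
    return total


def find_starters(skeleton):
    """Return (row, col) of skeleton pixels with exactly one neighbour."""
    return [
        (r, c)
        for r, row in enumerate(skeleton)
        for c, pixel in enumerate(row)
        if pixel == 1 and count_neighbors(skeleton, r, c) == 1
    ]


# Hypothetical zone holding a one-pixel-wide "L" shape:
# its two end points are the starters.
zone = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
starters = find_starters(zone)
```

The same neighbor count could also be used as a first filter for other pixel types, since pixels with more than one neighbor lie in the middle of a stroke or where strokes meet.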
The training features are extracted from the characters using the feature extraction technique described in the previous slides.
The ANN is given 108 feature values for each character.
Next, the ANN is used in the recognition process.
The artificial neural network used for recognizing the handwritten characters consists of three layers. The experimentally finalized parameters of the artificial neural network, for a training set of 850 handwritten characters, are as follows.
Implementation of the input layer.
The input layer of the neural network contains 108 nodes. When the neural network was implemented using the MATLAB neural network toolbox, the input layer was represented as a two-dimensional matrix (108 x 850).
Implementation of the hidden layer.
The hidden layer is represented as a one-dimensional array. The size of the array depends on the number of nodes used for the hidden layer. In this implementation, 76 nodes were used for the hidden layer. The outputs of the hidden nodes are calculated with the tangent sigmoid function.
Implementation of the output layer.
The output layer of the neural network is represented as a one-dimensional array of doubles with 34 elements. The array stores the values received from the hidden layer after the activation function has been applied to them.
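To make the layer sizes concrete, here is a minimal forward-pass sketch in plain Python rather than MATLAB. The random weights, the zero biases, the linear output layer, and the helper names are all assumptions; only the sizes 108, 76 and 34 and the tangent sigmoid hidden activation (mathematically the same as tanh) come from the text.

```python
import math
import random


def make_layer(n_inputs, n_nodes, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_nodes)]
    biases = [0.0] * n_nodes
    return weights, biases


def forward(features, hidden_layer, output_layer):
    """Forward pass: tanh hidden layer, then a linear output layer (assumed)."""
    w_h, b_h = hidden_layer
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_h, b_h)]
    w_o, b_o = output_layer
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_o, b_o)]


rng = random.Random(0)
hidden_layer = make_layer(108, 76, rng)   # 108 inputs -> 76 hidden nodes
output_layer = make_layer(76, 34, rng)    # 76 hidden -> 34 output nodes

features = [rng.random() for _ in range(108)]  # stand-in feature vector
scores = forward(features, hidden_layer, output_layer)

# Index of the output node with the highest score.
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```

With untrained random weights the predicted class is meaningless; the sketch only shows how one 108-value feature vector flows through the three layers to 34 output values.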
In order to test the ANN for character identification, the neural network was trained using character patterns. The convergence of the ANN can be monitored using a graph of the mean squared error against the number of iterations. Over time, the error of the neural network should decrease. The following figure shows how the error is minimized over the iterations.
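The quantity plotted on such a graph can be computed as in the sketch below. The function name and the sample target and output values are made up for illustration and are not results from this project.

```python
def mean_squared_error(targets, outputs):
    """Average squared difference over all samples and output nodes."""
    total, count = 0.0, 0
    for target_row, output_row in zip(targets, outputs):
        for t, y in zip(target_row, output_row):
            total += (t - y) ** 2
            count += 1
    return total / count


# Made-up targets and network outputs for two samples and two output nodes.
targets = [[1.0, 0.0], [0.0, 1.0]]
early_outputs = [[0.5, 0.5], [0.5, 0.5]]   # early in training
later_outputs = [[0.9, 0.1], [0.2, 0.8]]   # after more iterations

mse_early = mean_squared_error(targets, early_outputs)
mse_later = mean_squared_error(targets, later_outputs)
```

Recording this value after every training iteration and plotting it gives the convergence curve: a network that is learning should show the value falling, as `mse_later` is lower than `mse_early` here.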
1. Font size
2. Hidden Markov Model
3. The system is not able to identify touching characters. To solve this problem, the water reservoir concept has to be used in future.
4. Since all of the performance in character pattern identification depends on the feature extraction methodology, it is important to produce error-free (noise-free) character features. As the next step, high-level image processing techniques are expected to be added to the feature extraction process.
My presentation is almost over now. It is time for me to express my gratitude.
First, I would like to express my sincere thanks to my supervisor, Dr. HMM Naleer, Senior Lecturer in Computer Science, for his kind cooperation and advice throughout this project.
Then I would like to thank the Computer Science department and my friends, who helped me in various ways.
Finally, and most of all, I would like to thank my parents for their eternal support, love and encouragement.