National Congress on Communications and Computer Aided Electronic Systems (CCAES 2012)
[1] A novel method is proposed to differentiate between palm leaf scribers using 2D correlation. Palm leaves from two scribers are used as samples.
[2] The method captures depth information from palm leaf images and compares pixel point coordinates of characters written by different scribers in the YZ plane.
[3] 2D correlation values are calculated between test character images and database images of the same character written by different scribers. Higher correlation with one scriber indicates the test character was written by that scriber. The method correctly identified the scriber for over 300 tested characters.
A Novel Method of Differentiating Palm Leaf
Scribers Using 2D Correlation
Panyam Narahari Sastry*, Dr. N.V.Srinivasulu**
*Department of Electronics and Communication Engineering, CBIT, Hyderabad, India
**Department of Mechanical Engineering, C.B.I.T., Hyderabad, India.
E-Mail: *ananditahari@yahoo.com
Abstract- Character Recognition (CR) is one of the oldest applications of automatic pattern recognition. Recognizing Hand-Written Characters (HWC) is an effortless task for humans, but for a computer it is an extremely tricky job. Research in character recognition is very popular because of its potential applications in banks, post offices, defense organizations, reading aids for the blind, library automation, language processing and multimedia design. Even though epigraphical work dealing with stone inscriptions has been analyzed, this has been done largely manually and also on 2D traces. A large collection of palm leaf manuscripts is available in classical Indian languages like Sanskrit, Tamil and Pali as well as in more modern languages like Telugu. These palm leaves contain religious texts and treatises on a host of subjects such as art, medicine, astronomy, astrology, mathematics, law and music. However, they have not been exploited in the manner they deserve to be. While the reasons for this are manifold, the scarcity of methods applicable to the specific purpose of Palm Leaf Character Recognition (PLCR) and digitization is one of the primary reasons. The characters on a palm leaf have an additional property, depth, which can be gainfully exploited in character recognition. This paper describes a method that uses 2D correlation values to find out whether the palm leaves of two different folios or sets have been mixed up. The results obtained show very distinct 2D correlation values between the test samples and the database samples.

Key Words- Palm Leaf Character Recognition, 2D Correlation, Folio, Pattern Recognition.

I. INTRODUCTION

Palm leaf manuscripts were one of the popular written documents for over a thousand years in South and Southeast Asia [1, 2]. In Indian history, dried palm leaves have been used to record Buddhist teachings and doctrines, folklore, knowledge and use of indigenous medicines, stories of dynasties, traditional arts and architecture, astrology, astronomy and techniques of traditional massage. Recently, several universities and institutes, including medical departments and religious organizations, have initiated projects to collect, recover and preserve Indian palm leaf manuscripts. It is recognized that these documents contain invaluable knowledge, history, culture and local wisdom of Indian civilization. In particular, knowledge concerning indigenous medicines has been studied with great attention due to its potential in treating many ailments and diseases.

With the passage of time, most of these palm leaves, if left unattended, will deteriorate, as they are approaching the end of their natural lifetime and face destructive elements such as dampness, fungus, bacteria, ants and cockroaches. For this reason, Rashtriya Sanskrit Vidyapeeth (RSVP), Tirupati, Andhra Pradesh, India and the Oriental Research Institute (ORI) are establishing the Palm Leaf Manuscript Preservation Project for the discovery, preservation and protection of palm leaf manuscripts and to extract knowledge from the ancient world [3, 4]. Currently, computer technology can store and process ancient image documents in multimedia systems. It is possible to collect and access those manuscripts and preserve them in digital formats on a computer [5]. Although current storage systems can store document images, there is no specific system to retrieve the knowledge from these ancient documents. It is recognized that this is not an easy task, as there are many styles of traditional handwriting, noise in the images, and fragmentation or cracks due to the fragility of the aged leaves. Images of the collected ancient documents are commonly of poor quality because insufficient attention was paid to the storage conditions and the quality of the writing material. As a result, the foreground and background in the scanned images are difficult to separate. Many of the palm leaf images have varying contrast and illumination, smudges, smears, stains, and contamination due to ink seeping from the other side of the palm leaf; elimination is also proposed.

Character Recognition (CR) is one of the oldest applications of automatic pattern recognition. Recognizing Hand-Written Characters (HWC) is an effortless task for humans, but for a computer it is an extremely tricky job. This is mainly due to the vast differences or the imprecision associated with handwritten patterns written by different individuals [6, 7]. Machine recognition involves the ability of a computer to receive input from sources such as paper and other documents, photographs, touch screens and other devices, and is an ongoing research area [8].

II. DATA ACQUISITION METHOD

Palm leaves were provided by the Oriental Research Institute (ORI), S.V. University Campus, Tirupati, Andhra Pradesh. For the present research, we have chosen palm leaves of two different scribers. The photographs of the palm leaves are shown in Fig. 1, wherein the red arrow indicates the holes of the folio, which help to store the leaves between the wooden boards.
III. PROPOSED SCRIBER RECOGNITION METHOD
In the proposed method, 2D correlation is used to find whether a leaf pertains to a specific scriber’s folio or was written by a different scriber. In handwritten documents on paper, writers can be differentiated distinctly on the basis of the appearance of the letters. Since it is easy for counterfeiters to copy the appearance of characters, attributing a piece of writing or a signature to a writer can also be a tricky affair. Lipikaras were highly trained professionals and scribing on palm leaves was an extremely serious affair. Thus distinguishing the scribings on the basis of the appearance of the letters is not simple. However, the pressure applied at the various pixel points in a character differs from scriber to scriber. It is presumed that pressure is directly proportional to the depth of indentation (in microns), which is available to us from the Z axis data. Using this concept, images were compared in the YZ plane for two different scribers for the various Telugu palm leaf characters. For the same character, 5 samples from 2 different scribers were considered for testing and training. Table No. 1 and Table No. 2 show the co-ordinates for the Telugu characters “Aa” and “Tha”.
Our basis for the differentiation was to compare the correlation value obtained for a character scribed by the same author at different positions with the correlation obtained for the same character when scribed by a different author. Thus if the test image of a particular character, say “Ae”, belongs to scriber 1, then the correlation coefficient obtained between the test image and any other sample image of “Ae” of scriber 1 should be distinctly greater than the correlation coefficient obtained between the test image and the “Ae” samples of scriber 2. Recognizing the scriber of a certain document is a great challenge in terms of pattern recognition but is also of immense value.

Traditional paper-based documents are being replaced by digital documents for official and legal purposes. Hence authenticating these digital documents is extremely critical. Authentication of security documents including banknotes, passports, etc., which may be printed on paper or any other support, is a very important application of Digital Document Analysis. An automatic document authentication system consists of an image acquisition system such as a CCD camera and a processor whose job is to compare the acquired intensity profile with a pre-stored reference image. The document handling device, which is connected to the comparing processor, accepts or rejects the document depending on the match. E-mails are electronic documents which have replaced paper documents due to the need for quick response and faster means of communication, but they still lack accountability. Digital watermarking and public key encryption-based authentication are the most common methods used for authentication of digital documents. It is possible to apply this concept of pen pressure to online signature verification in addition to two-dimensional character matching.

Fig. 1 Palm leaves chosen for the study

Table No. 1 Co-ordinates of Aa

Pixel points  X (mm)  Y (mm)   Z (mm)
1             1.091   0.16     25
2             1.456   0.49     24
3             0.925   0.999    26
4             0.338   0.725    29
5             0       0        29
6             0.338   -0.547   28
7             1.832   -0.825   27
8             2.797   -0.547   28
9             3.002   0.396    29
10            2.51    0.756    33
11            2.281   0.306    28
12            3.042   -0.087   34
13            2.098   -0.047   34
14            0.741   -0.06    36
15            0.741   -0.08    38

Table No. 2 Co-ordinates of Tha

Pixel points  X (mm)  Y (mm)   Z (mm)
1             0.308   -0.400   26
2             0.687   0.066    26
3             0.300   0.327    34
4             0.000   0.000    31
5             0.188   -0.426   20
6             0.418   -0.789   25
7             0.833   -0.658   25
8             1.428   -0.618   39
9             1.620   -0.423   38
10            1.670   -0.144   35
11            1.180   0.412    38
12            1.310   -0.127   39
13            1.400   0.390    28
14            0.842   0.798    34
15            0.284   0.876    96
16            0.842   0.812    94
17            1.345   1.342    48
18            1.949   1.554    59
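The YZ-plane comparison described in this section can be illustrated with a small sketch. The paper does not state how the 50x50 YZ images are produced from the measured co-ordinates, so the rasterization rule below (linear scaling of Y to columns and Z to rows) and the function names `project_yz` and `rasterize` are assumptions made only for illustration. The input is the first five pixel points of “Aa” from Table No. 1.

```python
# Hypothetical sketch: project measured (X, Y, Z) pixel points onto the
# YZ plane and mark them on a small binary grid. The scaling rule is an
# assumption; the source does not describe how its YZ images are built.

# First five pixel points of "Aa" from Table No. 1, as (X, Y, Z).
AA_POINTS = [
    (1.091, 0.16, 25),
    (1.456, 0.49, 24),
    (0.925, 0.999, 26),
    (0.338, 0.725, 29),
    (0.0, 0.0, 29),
]


def project_yz(points):
    """Drop the X co-ordinate, keeping (Y, Z) for each pixel point."""
    return [(y, z) for _x, y, z in points]


def rasterize(points_2d, size=50):
    """Scale (Y, Z) pairs linearly onto a size x size grid of 0/1 pixels."""
    ys = [p[0] for p in points_2d]
    zs = [p[1] for p in points_2d]

    def scale(value, low, high):
        # Map [low, high] onto grid indices 0 .. size - 1.
        if high == low:
            return 0
        return round((value - low) / (high - low) * (size - 1))

    image = [[0] * size for _ in range(size)]
    for y, z in points_2d:
        col = scale(y, min(ys), max(ys))
        row = scale(z, min(zs), max(zs))
        image[row][col] = 1
    return image


yz_points = project_yz(AA_POINTS)
yz_image = rasterize(yz_points)
```

Under these assumptions, two scribers who trace the same outline in XY but press with different force would produce different Z values, and therefore different YZ images, which is the property the method relies on.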
IV. IMPLEMENTATION OF SCRIBER RECOGNITION METHOD
The flow chart of the proposed method of implementing scriber recognition is shown in Fig. 2.

Step 1  Load test images and database images of size 50x50 pixels
Step 2  Read all the images
Step 3  Convert the images into binary type using a threshold value of 0.7
Step 4  Find the average correlation coefficient of test images belonging to scriber 1 with all the scriber 1 database images of the same character
Step 5  Find the average correlation coefficient of test images belonging to scriber 1 with all the scriber 2 database images of the same character
Step 6  Allot the test character to the scriber displaying the higher average correlation coefficient
Step 7  Check for the number of matched and mismatched characters

Fig. 2 Implementation of scriber authentication algorithm

V. RESULTS AND DISCUSSIONS
All the experiments were carried out on a PC with a P4 3GHz CPU and 512MB RAM under the Matlab 7.0 platform. The database consists of 4 different images of each class and hence 29X4=116 images. These images are of size 50X50 pixels and are in the .tiff format. More than 300 character images were tested. All the database images and the test images were projections onto the YZ plane (XY data failed to differentiate the characters between authors). All the 28 different Telugu characters (classes) were used as test characters to test the accuracy of the proposed method. The database images came from both Scriber No. 1 and Scriber No. 2, whereas the test characters (images) were of Scriber No. 1 (one from each class).

Each test character image was tested using correlation with all the available database images (consisting of both Scriber 1 and Scriber 2) for that character. The average correlation coefficient of the test character with Scriber 1 and with Scriber 2 was determined separately and the results are tabulated in Table No. 3. The time taken to run the program is also recorded in the same table. The test character set and the training character set are disjoint sets in this work.

CC* in the table denotes the correlation coefficient value (r) calculated using equation (1):

(1)

where A and B are the matrices of images of the same size and r indicates the correlation coefficient in the range of 0 to 1.
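A minimal sketch of Steps 3 to 6 follows. It assumes that equation (1) is the standard 2D correlation coefficient (the quantity computed by MATLAB's `corr2`), that is,

$$ r = \frac{\sum_m \sum_n (A_{mn} - \bar{A})(B_{mn} - \bar{B})}{\sqrt{\left(\sum_m \sum_n (A_{mn} - \bar{A})^2\right)\left(\sum_m \sum_n (B_{mn} - \bar{B})^2\right)}} $$

where \bar{A} and \bar{B} are the mean values of A and B. Under this assumption r formally lies between -1 and 1; all the averages reported in Table No. 3 are positive. The binarization rule (a pixel is set to 1 when it exceeds the threshold), the function names and the toy 3x3 images are illustrative assumptions and are not taken from the original Matlab program.

```python
import math


def binarize(image, threshold=0.7):
    """Step 3: turn grey levels in [0, 1] into 0/1 pixels.

    Assumption: a pixel becomes 1 when it is strictly above the threshold.
    """
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def corr2(a, b):
    """Assumed form of equation (1): 2D correlation coefficient of A and B."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("A and B must be matrices of the same size")
    flat_a = [value for row in a for value in row]
    flat_b = [value for row in b for value in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    denominator = math.sqrt(
        sum((x - mean_a) ** 2 for x in flat_a)
        * sum((y - mean_b) ** 2 for y in flat_b)
    )
    # A constant image has zero variance; report 0 instead of dividing by zero.
    return numerator / denominator if denominator else 0.0


def average_cc(test_image, database_images):
    """Steps 4 and 5: mean coefficient of a test image against one scriber."""
    total = sum(corr2(test_image, ref) for ref in database_images)
    return total / len(database_images)


def identify_scriber(test_image, database_by_scriber):
    """Step 6: allot the test character to the scriber with the higher average."""
    averages = {
        name: average_cc(test_image, images)
        for name, images in database_by_scriber.items()
    }
    return max(averages, key=averages.get), averages


# Toy 3x3 data (hypothetical, not taken from the palm leaf database).
test_image = binarize([[0.9, 0.1, 0.9], [0.1, 0.9, 0.1], [0.9, 0.1, 0.9]])
database = {
    "Scriber 1": [
        [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
        [[1, 0, 1], [0, 1, 0], [1, 0, 0]],
    ],
    "Scriber 2": [[[0, 1, 0], [1, 0, 1], [0, 1, 0]]],
}
winner, averages = identify_scriber(test_image, database)
```

In this toy case the test image is allotted to "Scriber 1", whose database images resemble it, while the inverted "Scriber 2" image gives a negative coefficient. Step 7 would then count how many test characters were allotted to the correct scriber.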
Table No. 3 Scriber Authentication results

S.No.  Test       Author 1     Author 2     Difference  % Difference    Program time  Scriber
       Character  average CC*  average CC*  of average  of Correlation  in Seconds    Recognized
                                            CC                                        (Yes / No)
1.     a          0.2096       0.0277       0.1819      86.78435115     0.89          Y
2.     aa         0.1182       0.0051       0.1131      95.68527919     0.39          Y
3.     ala        0.1093       0.0253       0.084       76.85269899     0.46          Y
4.     bra        0.1505       0.0049       0.1456      96.74418605     0.44          Y
5.     khaa       0.2414       0.0204       0.221       91.54929577     0.53          Y
6.     la         0.0706       0.0614       0.0092      13.03116147     0.39          Y
7.     tha        0.1614       0.1154       0.046       28.50061958     0.48          Y
8.     ae         0.0384       0.001        0.0374      97.39583333     0.46          Y
9.     gha        0.1433       0.1284       0.0149      10.39776692     0.39          Y
10.    haa        0.1022       0.0067       0.0955      93.44422701     0.46          Y
11.    na         0.0898       0.0199       0.0699      77.83964365     0.4           Y
12.    pa         0.2665       0.0115       0.255       95.684803       0.46          Y
13.    sa         0.1255       0.0744       0.0511      40.71713147     0.46          Y
14.    shaa       0.3062       0.0005       0.3057      99.83670803     0.46          Y
15.    va         0.1858       0.0371       0.1487      80.03229279     0.45          Y
16.    ya         0.2905       0.1143       0.1762      60.65404475     0.45          Y
17.    ka         0.1223       0.0038       0.1185      96.89288635     0.47          Y
18.    ksha       0.1633       0.0044       0.1589      97.30557257     0.46          Y
19.    ba         0.1104       0.0657       0.0447      40.48913043     0.45          Y
20.    bha        0.0996       0.0091       0.0905      90.86345382     0.46          Y
21.    ja         0.127        0.0406       0.0864      68.03149606     0.46          Y
22.    ru         0.3121       0.2251       0.087       27.87568087     0.4           Y
23.    da         0.1697       0.0571       0.1126      66.35238656     0.47          Y
24.    cha        0.0905       0.0402       0.0503      55.5801105      0.39          Y
25.    dha        0.128        0.0074       0.1206      94.21875        0.46          Y
26.    ee         0.0538       0.0439       0.0099      18.40148699     0.45          Y
27.    ga         0.058        0.0244       0.0336      57.93103448     0.51          Y
28.    saa        0.0311       0.0165       0.0146      46.94533762     0.45          Y
Characters in red are the lowest CC values and in blue are > 95% CC difference
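The "% Difference of Correlation" column is consistent with dividing the difference of the two averages by the Author 1 average. This formula is inferred from the tabulated values rather than stated in the text, so it should be read as an assumption. The sketch below recomputes the column for a few rows of Table No. 3 and separates weak differentiators (below 30%) from strong ones (above 95%); the names `ROWS` and `percent_difference` are illustrative.

```python
# (test character, average CC with Author 1, average CC with Author 2),
# copied from a subset of the rows of Table No. 3.
ROWS = [
    ("a", 0.2096, 0.0277),
    ("la", 0.0706, 0.0614),
    ("tha", 0.1614, 0.1154),
    ("gha", 0.1433, 0.1284),
    ("shaa", 0.3062, 0.0005),
    ("ru", 0.3121, 0.2251),
    ("ee", 0.0538, 0.0439),
    ("ksha", 0.1633, 0.0044),
]


def percent_difference(cc_author1, cc_author2):
    """Assumed rule: difference of averages relative to the Author 1 average."""
    return (cc_author1 - cc_author2) / cc_author1 * 100.0


percentages = {name: percent_difference(cc1, cc2) for name, cc1, cc2 in ROWS}

# Characters that separate the two scribers poorly or very clearly.
weak = [name for name, _, _ in ROWS if percentages[name] < 30.0]
strong = [name for name, _, _ in ROWS if percentages[name] > 95.0]
```

For the first row this gives (0.2096 - 0.0277) / 0.2096 x 100, which rounds to the tabulated 86.78. Among the selected rows, la, tha, gha, ru and ee fall below 30%, while shaa and ksha exceed 95%.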
VI. CONCLUSIONS

1. All the test characters belonging to Scriber 1 had a higher average correlation coefficient when tested against Scriber 1 characters located at other positions on the leaf than when tested against the characters of Scriber 2. The test and the training character sets are disjoint sets.
2. The characters La, Tha, Gha, Ru and Ee showed less than 30% difference in average correlation between the test character and the database characters of Scriber 1 and Scriber 2.
3. The time taken for identification of the scriber is very low, less than 1 second.
4. If the right characters are selected as test characters, then the scriber identification is 100%.
5. A rigorous test of this idea needs data from a greater number of samples/scribers, which is beyond the scope of the present work. If the above-mentioned characters remain poor differentiators, we can select specific characters (the characters in blue in Table No. 3) which can be used for differentiation/authentication more accurately.

ACKNOWLEDGMENT
The author wholeheartedly acknowledges the co-operation extended by Sri S. Anand, Finance Officer, RSVP (Rashtriya Sanskrit Vidyapeeth), Tirupati, in procuring the palm leaves from the Oriental Research Institute, Tirupati, A.P., India. Further, the author expresses sincere gratitude to Dr. Vally Maya, who actively participated in the technical discussions and offered appropriate suggestions at every stage of the work.

REFERENCES
[1] O. Surinta and R. Chamchong, "Image Segmentation of Historical Handwriting from Palm Leaf Manuscripts," in 5th IFIP International Conference on Intelligent Information Processing, Beijing, China, 2008, p. 280.
[2] Z. Shi, S. Setlur and V. Govindaraju, "Digital Enhancement of Palm Leaf Manuscript Images using Normalization Techniques," in 5th International Conference on Knowledge Based Computer Systems, Hyderabad, India, 2004.
[3] Panyam Narahari Sastry, Ramakrishnan Krishnan and Bhagavatula Venkata Sanker Ram, "Telugu Character Recognition on Palm Leaves - A Three Dimensional Approach," Technology Spectrum (JNTU Hyderabad), Vol. 2, No. 3, pp. 19-26, November 2008.
[4] Panyam Narahari Sastry, Ramakrishnan Krishnan and Bhagavatula Venkata Sanker Ram, "Classification and Identification of Telugu Hand Written Characters Extracted from Palm Leaves using Decision Tree Approach," ARPN Journal of Engineering and Applied Sciences, Vol. 5, No. 3, March 2010.
[5] Panyam Narahari Sastry, Ramakrishnan Krishnan and T. V. Rajanikant, "Palm Leaf Telugu Character Recognition using Hough Transform," International Conference on Advanced Computing Methodologies, Elsevier, pp. 21-28, December 2011.
[6] V. N. Manjunath Aradhya, G. Hemantha Kumar and S. Noushath, "Multilingual OCR System for South Indian Scripts and English Documents: An Approach Based on Fourier Transform and PCA," Engineering Applications of Artificial Intelligence, Elsevier, pp. 658-668, 2008.
[7] B. B. Chaudhuri and Ujwal Bhattacharya, "Handwritten Numeral Databases of Indian Scripts and Multistage Recognition of Mixed Numerals," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 3, pp. 444-457, March 2009.
[8] Senior and Robinson, "An Off-Line Cursive Handwriting Recognition System," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, pp. 309-321, 1998.