SlideShare una empresa de Scribd logo
1 de 13
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
10
Finding Similarities between Structured Documents as a Crucial
Stage for Generic Structured Document Classifier
Hamam Mokayed (Corresponding author)
Universiti Teknology Mara UiTM, Faculty of Information Technology and Quantitative Sciences
Universiti Teknologi MARA (UiTM) 40450 Shah Alam, Selangor Darul Ehsan, Malaysia
E-mail : Homammo@gmail.com
Azlinah Hj. Mohamed
Universiti Teknology Mara UiTM, Faculty of Information Technology and Quantitative Sciences
Universiti Teknologi MARA (UiTM) 40450 Shah Alam, Selangor Darul Ehsan, Malaysia
E-mail : Azlinah@tmsk.uitm.edu.my
Abstract
One of the addressed problems of classifying structured documents is the definition of a similarity measure that
is applicable in real situations, where query documents are allowed to differ from the database templates.
Furthermore, this approach might have rotated [1], noise corrupted [2], or manually edited form and documents
as test sets using different schemes, making direct comparison crucial issue [3]. Another problem is huge amount
of forms could be written in different languages, for example here in Malaysia forms could be written in Malay,
Chinese, English, etc languages. In that case text recognition (like OCR) could not be applied in order to classify
the requested documents taking into consideration that OCR is considered more easier and accurate rather than
the layout detection.
Keywords: Feature Extraction, Document processing, Document Classification.
1. Introduction
In the light of globalization and its radical economic and technological evolution, offices and banks are forced to
manage a large number of documents in different formats from customers, partners, etc. which must be classified
and organized quickly in order to provide a differential service. This involves staff training costs, the establishing
of procedures that ensure the categorization criteria, verification process, etc. which increase costs and document
processing time, without ensuring the utmost quality in the process.
In this work we find a way to extract similarities between the documents based on reference lines and distinctive
blobs of structured document. A dynamic tilting technique based on clustering has been proposed on the adaptive
thresholded image. The tilted image is used as inputs to the two different feature extraction modules, the first one
is based on extracting the reference lines (vertical, horizontal, and diagonal), and the second detects distinctive
data in the structured document such as Logo and title area. The software is developed using C++ programming
language and its block diagram is shown as in Fig. 1.
This paper has been organized with the introduction first. The pre-processing steps are described in the next
section. The proposed tilting module is next described. Feature extraction modules are discussed in the sections
that follow.
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
11
2. Pre-processing Stage
The pre-processing part of the proposed system consists of three steps as is shown in stage1 in Fig.1
2.1 Grayscale Conversion
This method tries to convert the scanned image to grayscale
2.2 Size Normalization
The important idea underlying the stage carried out is to get normalized length images to enhance system
performance by scaling each image both horizontally and vertically to get the same scaled image.
W
xx
xx
x
o
i
i
minmax
min
−
−
=
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
12
H
yy
yy
y
o
i
i
minmax
min
−
−
=
Where
( )o
i
o
i yx , are the original points
( )ii yx ,
are the points after transformation.
{ }o
ii xx minmin =
{ }o
ii xx maxmax =
{ }o
ii yy minmin =
{ }o
ii yy maxmax =
W is the width of the normalized data = 500
H is the height of the normalized data = 500
Figure 2. Example of Normalization Process Applied on the Scanned Image.
2.3 Adaptive Thresholding
The system thresholding technique is developed based on the ordinal structure fuzzy logic model which has an
advantage due to their ability in handling multiple inputs and outputs. Based on the 3 highest accurate
pre-developed thresholding techniques such as SIS, Deravi, and Rosenfeld the fuzzy logic model is developed to
predict the value of the pixel after thresholding whether it is black or white. Software has been developed for
such purpose and the performance of the structured document classifier is compared to the different thresholding
techniques. The results based on using SOFM for thresholding show a high level of accuracy which can be used
(0,0)
500
X
Y
0 500
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
13
for many applications.
3. Tilting Structured Document
Tilting and skewing the scanned structured document is considered as a crucial stage as it has a great impact on
the accuracy of the whole structured document classification system. Various techniques have been developed to
calculate the value of proper rotation angle to adjust the tilted image. However, most of these techniques were
applied as pre-stage of OCR systems and had the problem of expensive computational costs. In the proposed
system, tilting will be accomplished by using the reference lines in the structured documents as a base of
calculation. The objective is to find a way with lower computation time and higher accuracy results over
different layouts that might contain non-text areas. Several algorithms were implemented to find the proper skew
angle of tilted image [4,5], some of the applied methods can be classified as the following:
• Projection-based Methods [6,7]: basic concept of these methods is to use the extracted features out
of projection profile calculation to determine the skew value.
• Hough transform-based methods [8,9]: methods depend on transforming the coordinates from
Cartesian to polar and using the new values to calculate the skew angle.
• Cross – correlation methods [10]: vertical deviation along the image is used to calculate the skew
of image.
• Clustering – based methods: these methods try to find a way to connect the pixels and use this
connection to calculate the skew value. Shivakumara and Kumar [11] apply the nearest neighbor
technique by labeling the connected black pixels and grouping them in blocks, these blocks were
used to calculate the skew value. In the other hand, Sarfraz et al. [12] identify words within the
same line to produce blob used in skew angle calculation. Chou et al [13] proposed a method based
on drawing scan lines over the document at various angles. The skew angle is estimated by
constructing parallelograms with these scan lines that cover objects in the image.
3.1 Clustering – Based Proposed Approach
As structured document contain of different components such as lines, text, check boxes and circles, logos, etc.,
lines are the essential layout structure of the document and can be detected more reliably than other components.
So, we calculate the skew angle based on the document detected lines. Proposed approach is based on looking
for the connected lines and estimating the optimum skew angles of the reference line. The idea behind this
approach is that the distance between the consecutive points of the line is too small than the other components.
So if we are able to detect the lines and find the reference line (line which has the largest number of pixels) out
of them. We then estimate the skew angle based on that line.
To achieve the previous mentioned aim, the following stages should be executed:
3.2 Noise removal
After binarizing the image, this stage hopes to eliminate the randomly distributed noisy pixels that might affect
on the calculation of the tilting value. The method is based on segmenting the whole binarized image to 5*5
sub-images, it statistically examines the intensity values of the active pixels in the sub window by counting total
number of them and comparing the value with a threshold value in order to decide whether to flip their values to
background or not. The size of the window has to be large enough to cover sufficient foreground and
background pixels and to avoid either keeping the noise in case of too small size or disconnecting the line
because of choosing too large region. After applying the proposed method to remove the noise, the values of the
first and last three columns and rows of the image will be set to zero to avoid the edge noise caused by the low
scanning resolution of the image.
If < Th then = 0
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
14
Else Do Nothing
Where
Total Number of active pixels within 5*5 window.
Th = (is* is + is ) / 3 : is=2 increment Step for a pixel to constitute the sub window
000000111000000
000000001000000
000010010010000
000100000000010
000100010000000
001000100100010
010000000011001
110001000000000
000000000000000
000010000000100
000000000000000
000010000001000
000000000000010
001100000000001
010001000000000
Figure 3. Noise removal process.
3.3 Tilting and skewing
Method using clustering technique will be presented to calculate the value of the angle. before starting the
proposed method, a calculation of the first black pixel Pi(xi,yi) will be done and based on the condition of
finding a small value of Y axis histogram at yi (histY[yi]), the angle calculation will start, otherwise no
skewing is required .
Sample of image tilted to left Sample of image tilted to Right
If xi < width/2 then image might be tilted to left, otherwise right
Figure 3. Examples of Different Tilted Images.
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
15
Four referencing points are used to calculate the best value of the angle as the following:
Pclk: first black pixel by scanning the columns form left to right
Pcrk: first black pixel by scanning the columns form right to left
Prtk: first black pixel by scanning the rows from top to bottom
Prbk: first black pixel by scanning the rows form bottom to top
A sequence of operations should be executed over each of the previous mentioned referencing points to calculate
the most accurate tilting angle.
Applying the proposed method over Pclk as an example
:
( )}
After calculating Number of pixels in each group of the referencing points (Num []), group which has the highest
number of pixels will be used to calculate the skewing angle. Let us suppose that is the nominated group
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
16
1- Original Image 2-thresholding and removing noise 3- Reference line used for
tilting 4- Tilted Image
Figure 5. Result of Applying Tilting Process on Different Samples.
4. Feature Extraction
Finding similarities in document has offered great challenge to the solution provider due to the huge amount and
unsorted documents which were collected every day and differed in terms of outlook, layout, size, meaning etc
[14]. In addition to, finding generic features that able to interpret different types of forms is not feasible using
present techniques, it is possible to develop a specific method to process the specific type of forms contained in
the documents. A little research had concentrated on the layout of document, as a primary key for the feature
extraction to be used in classification. The most recent work applied watermark embedding method to obtain
layout information of the document with concentration of documents integrity detection and offered secret
information to be hidden, and it was robust to hard and soft documents [15].
As the proposed technique is language separable, reference lines and blobs could be considered as good features
in order to classify the structured documents. Next section will clarify the way of extracting features and all the
pre-processing stages that carried out to enhance the extraction process.
4.1 Extraction of reference lines
Proposed system recognizes ruled lines by extracting vertical, horizontal, and diagonal lines from the binary
image, obtained by adaptive thresholding method. Proposed line extraction algorithm uses pruning as one of the
morphological operations, thinning, connecting gaps of broken and disconnected lines, and the clustering
technique to extract the reference lines and use them as a one of the feature in the classification system.
a. Noise removal
After thresholding and skewing the image, this stage hopes to eliminate the noisy pixels and spurs on the end
points in order to enhance the lines extraction process. Pruning as one of the morphological operations is applied to
remove these noisy scattered pixels which are not related to any one of the objects in the document.
The applied pruning uses two basic structuring elements and iterates them to produce other six structuring
elements. All structuring elements are applied once over the processed image to get the required results.
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
17
Figure 6. The result of Applying Pruning Structuring Element ver Structured Document.
b. Thinning
Thinning is usually used in applications that aiming to extract lines for different purposes such as OCR to
improve the recognition accuracy [16]. Thinning is applied to reduce thickness of each line to just a single pixel
line in order to facilitate the extraction process of the reference lines. The applied thinning method uses two
basic structuring elements and iterates them till the convergence. Iterating the elements will produce eight
structuring elements used for skeletonizing the whole image. As shown in the below figure, the image is first
thinned by two basic structuring elements, and then with the remaining six 90° rotations of the two elements.
The process is repeated till there is no change by applying anyone of the structuring element.
0 0 0 0 0 0 0 0 0 -1 0 0 -1 -1 0 0 -1 -1
0 1 0 0 1 0 -1 1 0 -1 1 0 0 1 0 0 1 0
0 -1 -1 -1 -1 0 -1 0 0 0 0 0 0 0 0 0 0 0
1st
structured
Element
2nd
structured
Element
1st
clockwise(1)
2nd
clockwise(1)
1st
clockwise(2)
2nd
clockwise(2)
0 0 -1 0 0 0
0 1 -1 0 1 -1
0 0 0 0 0 -1
1st
clockwise(3)
2nd
clockwise(3)
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
18
Figure 7. The Result of Applying Thinning Structuring Element over Structured Document.
c. Joining broken lines
before starting the proposed method to detect the lines, values of the histograms (histY , hsitX) will be used as
inputs to decide whether to process joining for horizontal and vertical lines or not.
After specifying the target column or row, the decision of connecting tow groups of sequential active points
separated by spaces will be depended on the number of active pixel in the first part, number of non active points
(Numspaces), and number of active points (Num2active) in the second part. As shown in the following formula
0 0 0 -1 0 0 1 -1 0 -1 1 -1 1 1 1 -1 1 -1
-1 1 -1 1 1 0 1 1 0 1 1 0 -1 1 -1 0 1 1
1 1 1 -1 1 -1 1 -1 0 -1 0 0 0 0 0 0 0 -1
1st
structured
Element
2nd
structured
Element
1st
clockwise(1)
2nd
clockwise(1)
1st
clockwise(2)
2nd
clockwise(2)
0 -1 1 0 0 -1
0 1 1 0 1 1
0 -1 1 -1 1 -1
1st
clockwise(3)
2nd
clockwise(3)
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
19
d. Line Segments Detecting and Grouping
Proposed method will be used to extract the vertical, horizontal, and diagonal lines out of the previous processed
binary image. First of all, pruning operation is applied to find the end points of the lines and use them as starting
points of the line tracing process. The applied pruning method is exactly like the once implemented before but it
will be used here to mark the pixels which are detected by the process not to eliminate them. After specifying the
pixels, the following steps will be executed
• Get one of the pruning points
• Mark the point in order not to be used anymore, save it as one of the line’s pixel, increase the
number of line pixels, and compare the coordinates to get the value of Xmin , Ymin, Xmax, Ymax
of that line.
• Scan all the eight neighboring pixels and count how many pixels are active and not marked.
• In case the value retrieving back from stage 3 is
o 1: repeat steps starting from 2
o >1 obtain the coordinates of each active pixel, push them into a branch vector, and jump
to step 5
o 0: jump to step 5
• Analyze the detected line and decide whether it is one of the reference line or not by applying the
following steps:
Xdiff = Xmax - Xmin
Ydiff = Ymax - Ymin
Rval = (Ydiff / Xdiff ) in case Ydiff < Xdiff , otherwise Rval = (Xdiff / Ydiff )
o If (number of line’s pixel > 30 and Rval > 0.2 and (Ydiff < 25 || Xdiff < 25) ) then the line
isn’t one of the reference line and no need to save it; otherwise save the line
o If the branch vector has any values get one of the value and jump to 2
• Repeat the steps starting from one.
Figure 8. Applying Line Extraction Process over Two Different Samples.
4.2 Extraction of blobs
This stage tries to detect distinctive data in the structured document such as Logo and title area, the places of
required information can be notices as connected regions in binary image. The aim of this stage is to enhance the
accuracy by adding the useful information of the detected places as inputs to the classifier beside the information
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
20
extracted from the reference lines in the previous stage. Proposed blob extraction algorithm uses dilation as one of
the morphological operations, joining pixel, and proposed technique to extract the blobs and use them as a one of
the feature in the classification system.
a. Dilation
Dilation is usually applied to merge nearby regions by expanding all the white parts (active pixels) and facilitating
the extraction process of the blobs. The applied method uses one structuring element (3*7) as shown in the figure
below. The process is flipped all the neighboring pixels of the target pixel to one in case the pixel is active,
otherwise nothing will change.
b. Joining Pixels
Before starting the proposed method to detect the accurate blobs, this step joins the connected target pixel into
different blobs and removes non distinctive blobs that contain area below specific threshold.
-1 -1 -1 -1 -1 -1 -1
-1 -1 -1 1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1
Figure 9. Applying Joining Pixels Process over Two Different Headers.
c. Determining distinctive Blobs
Due to the fact that not all the detected blobs could be used as unique features for the classifiers system, blobs
should be achieving the following conditions in order to be nominated to the next stages.
First condition related to the dimension Ratio
Second condition is ratio of the height
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
21
Third condition is ratio of the width
Last condition is location of the blobs should be located at top 10% part of the image
Figure 10. Applying Blob Extraction Process over Two Different Samples.
1- Conclusion
As the performance of the document classification depends upon the uniqueness of extracted features, the paper
proposed to find both referencing lines and blobs to address the problem of retrieving similarities in document
which has offered great challenge to the solution provider due to the huge amount and unsorted documents which
were collected every day. Another potential avenue is that the proposed technique able to classify structured
documents written in different language as it is mainly based on the layout of the form regardless the language
used.
References
Zhang Junyou. (2010). A Quickly Skew Correction Algorithm of Bill Image. Paper presented at Third
International Conference on Information and Computing
Daniel Lopresti & Ergina Kavallieratou. (2010) , Ruling Line Removal in Handwritten Page Images. Paper
presented at International Conference on Pattern Recognition.
Hernâni Gonçalves, José Alberto Gonçalves, and Luís Corte-Real. (2011). A Method for Automatic Image
Registration Through Histogram-Based Image Segmentation. IEEE transaction in image processing, VOL. 20,
NO. 3.
Jonathan J. H (1998), Document Image Skew Detection: Survey and Annotated Bibliography, World Scientific,
40-64.
Hull, J., (1998). Document image skew detection: Survey and annotated bibliography. Document Analysis
Systems II. World Scientific Pub. Co. Inc. pp. 40–64.
Postl, W., (1986). Detection of linear oblique structures and skew scan in digitized documents. In: Proc. 8th
Internat. Conf. on Pattern Recognition, Paris, France, 687–689.
Li, S., Shen, Q., Sun, J., (2007). Skew detection using wavelet decomposition and projection profile analysis.
Pattern Recognition Lett. 28 (5), 555–562.
Amin, A.; Fisher, S (2000). A Document Skew Detection Method Using the Hough Transform, Pattern Analysis
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.5, 2013
22
& Applications 3 (3) pp. 243-253.
Singh, C., Bhatia, N., Kaur, A., (2008). Hough transform based fast skew detection and accurate skew correction
methods. Pattern Recognition 41 (12), 3528–3546.
Avanindra, S., (1997). Robust detection of skew in document images. IEEE Trans. Image Process. 6 (2),
344–349.
Shivakumara, P., Kumar, G.,(2006). A novel boundary growing approach for accurate skew estimation of binary
document images. Pattern Recognition Lett. 27 (7), 791–801.
Muhammad Sarfraz , Zeehasham Rasheed (2008). Skew Estimation and Correction of Text using Bounding
Box” Proc. of Fifth International Conference on Computer Graphics, Imaging and Visualization.IEEE CGIV
2008 pp259-264.
Chou, C.; Chu, S.; Chang, F (2007). Estimation of skew angles for scanned documents based on piecewise
covering by parallelograms”, Pattern Recognition 40 (2) . pp. 443-455.
Y. Cao, S.H. & Wang, H. Li (2002). Automatic recognition of tables in construction tender documents. Paper
presented at Automation in Construction, 11 (5) ,573 – 584.
He, Y., Luo, L., Su, M., Shao, L., & Xiang, Z. (2011). Embedding and detecting watermarks based on embedded
positions in document layout: Google Patents.
V.N Manjunath Aradhya (2007). Skew Estimation Technique for Binary Document Images based on Thinning
and Moments, Journal of Engineering Letters, Vol 14, No.1, pp. 127-134.

Más contenido relacionado

La actualidad más candente

DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSDETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSijaia
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1bPRAWEEN KUMAR
 
Enhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical RecordsEnhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical Recordscsandit
 
Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...Konstantinos Zagoris
 
IRJET- Object Detection using Hausdorff Distance
IRJET-  	  Object Detection using Hausdorff DistanceIRJET-  	  Object Detection using Hausdorff Distance
IRJET- Object Detection using Hausdorff DistanceIRJET Journal
 
Scalable and efficient cluster based framework for
Scalable and efficient cluster based framework forScalable and efficient cluster based framework for
Scalable and efficient cluster based framework foreSAT Publishing House
 
Scalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingScalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingeSAT Journals
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataAlexander Decker
 
Image Compression Through Combination Advantages From Existing Techniques
Image Compression Through Combination Advantages From Existing TechniquesImage Compression Through Combination Advantages From Existing Techniques
Image Compression Through Combination Advantages From Existing TechniquesCSCJournals
 
Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932Editor IJARCET
 
An effective and robust technique for the binarization of degraded document i...
An effective and robust technique for the binarization of degraded document i...An effective and robust technique for the binarization of degraded document i...
An effective and robust technique for the binarization of degraded document i...eSAT Publishing House
 
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGES
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGESGEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGES
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGEScsandit
 
Self-Directing Text Detection and Removal from Images with Smoothing
Self-Directing Text Detection and Removal from Images with SmoothingSelf-Directing Text Detection and Removal from Images with Smoothing
Self-Directing Text Detection and Removal from Images with SmoothingPriyanka Wagh
 

La actualidad más candente (16)

Verification of Efficacy of Inside-Outside Judgement in Respect of a 3D-Primi...
Verification of Efficacy of Inside-Outside Judgement in Respect of a 3D-Primi...Verification of Efficacy of Inside-Outside Judgement in Respect of a 3D-Primi...
Verification of Efficacy of Inside-Outside Judgement in Respect of a 3D-Primi...
 
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSDETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
 
isprsarchives-XL-3-381-2014
isprsarchives-XL-3-381-2014isprsarchives-XL-3-381-2014
isprsarchives-XL-3-381-2014
 
MultiModal Retrieval Image
MultiModal Retrieval ImageMultiModal Retrieval Image
MultiModal Retrieval Image
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
Enhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical RecordsEnhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical Records
 
Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...
 
IRJET- Object Detection using Hausdorff Distance
IRJET-  	  Object Detection using Hausdorff DistanceIRJET-  	  Object Detection using Hausdorff Distance
IRJET- Object Detection using Hausdorff Distance
 
Scalable and efficient cluster based framework for
Scalable and efficient cluster based framework forScalable and efficient cluster based framework for
Scalable and efficient cluster based framework for
 
Scalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexingScalable and efficient cluster based framework for multidimensional indexing
Scalable and efficient cluster based framework for multidimensional indexing
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
Image Compression Through Combination Advantages From Existing Techniques
Image Compression Through Combination Advantages From Existing TechniquesImage Compression Through Combination Advantages From Existing Techniques
Image Compression Through Combination Advantages From Existing Techniques
 
Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932
 
An effective and robust technique for the binarization of degraded document i...
An effective and robust technique for the binarization of degraded document i...An effective and robust technique for the binarization of degraded document i...
An effective and robust technique for the binarization of degraded document i...
 
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGES
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGESGEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGES
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGES
 
Self-Directing Text Detection and Removal from Images with Smoothing
Self-Directing Text Detection and Removal from Images with SmoothingSelf-Directing Text Detection and Removal from Images with Smoothing
Self-Directing Text Detection and Removal from Images with Smoothing
 

Destacado

Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...
Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...
Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...Alexander Decker
 
Cheap Serviced Apartment Providers in London, UK.
Cheap Serviced Apartment Providers in London, UK.Cheap Serviced Apartment Providers in London, UK.
Cheap Serviced Apartment Providers in London, UK.Troy Scott
 
Farmers’ assessment of the government spraying program in ghana
Farmers’ assessment of the government spraying program in ghanaFarmers’ assessment of the government spraying program in ghana
Farmers’ assessment of the government spraying program in ghanaAlexander Decker
 
First model of one stop service for drug users in drug dependent centers in s...
First model of one stop service for drug users in drug dependent centers in s...First model of one stop service for drug users in drug dependent centers in s...
First model of one stop service for drug users in drug dependent centers in s...Alexander Decker
 
Factors determining capital structure a case study of listed
Factors determining capital structure a case study of listedFactors determining capital structure a case study of listed
Factors determining capital structure a case study of listedAlexander Decker
 
Factor analysis of service quality in university libraries in sri lanka
Factor analysis of service quality in university libraries in sri lankaFactor analysis of service quality in university libraries in sri lanka
Factor analysis of service quality in university libraries in sri lankaAlexander Decker
 

Destacado (6)

Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...
Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...
Fabrication and electrical characteristic of quaternary ultrathin hf tiero th...
 
Cheap Serviced Apartment Providers in London, UK.
Cheap Serviced Apartment Providers in London, UK.Cheap Serviced Apartment Providers in London, UK.
Cheap Serviced Apartment Providers in London, UK.
 
Farmers’ assessment of the government spraying program in ghana
Farmers’ assessment of the government spraying program in ghanaFarmers’ assessment of the government spraying program in ghana
Farmers’ assessment of the government spraying program in ghana
 
First model of one stop service for drug users in drug dependent centers in s...
First model of one stop service for drug users in drug dependent centers in s...First model of one stop service for drug users in drug dependent centers in s...
First model of one stop service for drug users in drug dependent centers in s...
 
Factors determining capital structure a case study of listed
Factors determining capital structure a case study of listedFactors determining capital structure a case study of listed
Factors determining capital structure a case study of listed
 
Factor analysis of service quality in university libraries in sri lanka
Factor analysis of service quality in university libraries in sri lankaFactor analysis of service quality in university libraries in sri lanka
Factor analysis of service quality in university libraries in sri lanka
 

Similar a Finding similarities between structured documents as a crucial stage for generic structured document classifier

IRJET - Object Detection using Hausdorff Distance
IRJET -  	  Object Detection using Hausdorff DistanceIRJET -  	  Object Detection using Hausdorff Distance
IRJET - Object Detection using Hausdorff DistanceIRJET Journal
 
IRJET- Intelligent Character Recognition of Handwritten Characters
IRJET- Intelligent Character Recognition of Handwritten CharactersIRJET- Intelligent Character Recognition of Handwritten Characters
IRJET- Intelligent Character Recognition of Handwritten CharactersIRJET Journal
 
Improving AI surveillance using Edge Computing
Improving AI surveillance using Edge ComputingImproving AI surveillance using Edge Computing
Improving AI surveillance using Edge ComputingIRJET Journal
 
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...IRJET Journal
 
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET Journal
 
Feature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual AnalysisFeature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual Analysisvivatechijri
 
Web Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink StructureWeb Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink Structureaciijournal
 
­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...
­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...
­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...IRJET Journal
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 
Document Images Are Acquired By Scanning Journal
Document Images Are Acquired By Scanning JournalDocument Images Are Acquired By Scanning Journal
Document Images Are Acquired By Scanning JournalSusan White
 
A Simple Signature Recognition System
A Simple Signature Recognition System A Simple Signature Recognition System
A Simple Signature Recognition System iosrjce
 
PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...
PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...
PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...ijcsit
 
Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...ijfcstjournal
 
Binarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf ManuscriptsBinarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf ManuscriptsIRJET Journal
 
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...IRJET Journal
 
Hangul Recognition Using Support Vector Machine
Hangul Recognition Using Support Vector MachineHangul Recognition Using Support Vector Machine
Hangul Recognition Using Support Vector MachineEditor IJCATR
 

Similar a Finding similarities between structured documents as a crucial stage for generic structured document classifier (20)

IRJET - Object Detection using Hausdorff Distance
IRJET -  	  Object Detection using Hausdorff DistanceIRJET -  	  Object Detection using Hausdorff Distance
IRJET - Object Detection using Hausdorff Distance
 
IRJET- Intelligent Character Recognition of Handwritten Characters
IRJET- Intelligent Character Recognition of Handwritten CharactersIRJET- Intelligent Character Recognition of Handwritten Characters
IRJET- Intelligent Character Recognition of Handwritten Characters
 
F045053236
F045053236F045053236
F045053236
 
Improving AI surveillance using Edge Computing
Improving AI surveillance using Edge ComputingImproving AI surveillance using Edge Computing
Improving AI surveillance using Edge Computing
 
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
 
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
 
Assignment-1-NF.docx
Assignment-1-NF.docxAssignment-1-NF.docx
Assignment-1-NF.docx
 
Feature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual AnalysisFeature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual Analysis
 
Web Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink StructureWeb Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink Structure
 
­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...
­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...
­­­­Cursive Handwriting Recognition System using Feature Extraction and Artif...
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
Document Images Are Acquired By Scanning Journal
Document Images Are Acquired By Scanning JournalDocument Images Are Acquired By Scanning Journal
Document Images Are Acquired By Scanning Journal
 
K012647982
K012647982K012647982
K012647982
 
K012647982
K012647982K012647982
K012647982
 
A Simple Signature Recognition System
A Simple Signature Recognition System A Simple Signature Recognition System
A Simple Signature Recognition System
 
PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...
PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...
PARALLEL GENERATION OF IMAGE LAYERS CONSTRUCTED BY EDGE DETECTION USING MESSA...
 
Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...
 
Binarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf ManuscriptsBinarization of Degraded Text documents and Palm Leaf Manuscripts
Binarization of Degraded Text documents and Palm Leaf Manuscripts
 
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
 
Hangul Recognition Using Support Vector Machine
Hangul Recognition Using Support Vector MachineHangul Recognition Using Support Vector Machine
Hangul Recognition Using Support Vector Machine
 

Más de Alexander Decker

Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Alexander Decker
 
A validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale inA validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale inAlexander Decker
 
A usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesA usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesAlexander Decker
 
A universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksA universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksAlexander Decker
 
A unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dA unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dAlexander Decker
 
A trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceA trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceAlexander Decker
 
A transformational generative approach towards understanding al-istifham
A transformational  generative approach towards understanding al-istifhamA transformational  generative approach towards understanding al-istifham
A transformational generative approach towards understanding al-istifhamAlexander Decker
 
A time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaA time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaAlexander Decker
 
A therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenA therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenAlexander Decker
 
A theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksA theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksAlexander Decker
 
A systematic evaluation of link budget for
A systematic evaluation of link budget forA systematic evaluation of link budget for
A systematic evaluation of link budget forAlexander Decker
 
A synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabA synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabAlexander Decker
 
A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...Alexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 
A survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesA survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesAlexander Decker
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbAlexander Decker
 
A survey on challenges to the media cloud
A survey on challenges to the media cloudA survey on challenges to the media cloud
A survey on challenges to the media cloudAlexander Decker
 
A survey of provenance leveraged
A survey of provenance leveragedA survey of provenance leveraged
A survey of provenance leveragedAlexander Decker
 
A survey of private equity investments in kenya
A survey of private equity investments in kenyaA survey of private equity investments in kenya
A survey of private equity investments in kenyaAlexander Decker
 
A study to measures the financial health of
A study to measures the financial health ofA study to measures the financial health of
A study to measures the financial health ofAlexander Decker
 

Más de Alexander Decker (20)

Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...
 
A validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale inA validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale in
 
A usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesA usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websites
 
A universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksA universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banks
 
A unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dA unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized d
 
A trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceA trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistance
 
A transformational generative approach towards understanding al-istifham
A transformational  generative approach towards understanding al-istifhamA transformational  generative approach towards understanding al-istifham
A transformational generative approach towards understanding al-istifham
 
A time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaA time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibia
 
A therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenA therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school children
 
A theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksA theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banks
 
A systematic evaluation of link budget for
A systematic evaluation of link budget forA systematic evaluation of link budget for
A systematic evaluation of link budget for
 
A synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabA synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjab
 
A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
A survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesA survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniques
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
 
A survey on challenges to the media cloud
A survey on challenges to the media cloudA survey on challenges to the media cloud
A survey on challenges to the media cloud
 
A survey of provenance leveraged
A survey of provenance leveragedA survey of provenance leveraged
A survey of provenance leveraged
 
A survey of private equity investments in kenya
A survey of private equity investments in kenyaA survey of private equity investments in kenya
A survey of private equity investments in kenya
 
A study to measures the financial health of
A study to measures the financial health ofA study to measures the financial health of
A study to measures the financial health of
 

Último

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 

Último (20)

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 

Finding similarities between structured documents as a crucial stage for generic structured document classifier

  • 1. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 10 Finding Similarities between Structured Documents as a Crucial Stage for Generic Structured Document Classifier Hamam Mokayed (Corresponding author) Universiti Teknology Mara UiTM, Faculty of Information Technology and Quantitative Sciences Universiti Teknologi MARA (UiTM) 40450 Shah Alam, Selangor Darul Ehsan, Malaysia E-mail : Homammo@gmail.com Azlinah Hj. Mohamed Universiti Teknology Mara UiTM, Faculty of Information Technology and Quantitative Sciences Universiti Teknologi MARA (UiTM) 40450 Shah Alam, Selangor Darul Ehsan, Malaysia E-mail : Azlinah@tmsk.uitm.edu.my Abstract One of the addressed problems of classifying structured documents is the definition of a similarity measure that is applicable in real situations, where query documents are allowed to differ from the database templates. Furthermore, this approach might have rotated [1], noise corrupted [2], or manually edited form and documents as test sets using different schemes, making direct comparison crucial issue [3]. Another problem is huge amount of forms could be written in different languages, for example here in Malaysia forms could be written in Malay, Chinese, English, etc languages. In that case text recognition (like OCR) could not be applied in order to classify the requested documents taking into consideration that OCR is considered more easier and accurate rather than the layout detection. Keywords: Feature Extraction, Document processing, Document Classification. 1. Introduction In the light of globalization and its radical economic and technological evolution, offices and banks are forced to manage a large number of documents in different formats from customers, partners, etc. which must be classified and organized quickly in order to provide a differential service. This involves staff training costs, the establishing of procedures that ensure the categorization criteria, verification process, etc. which increase costs and document processing time, without ensuring the utmost quality in the process. In this work we find a way to extract similarities between the documents based on reference lines and distinctive blobs of structured document. A dynamic tilting technique based on clustering has been proposed on the adaptive thresholded image. The tilted image is used as inputs to the two different feature extraction modules, the first one is based on extracting the reference lines (vertical, horizontal, and diagonal), and the second detects distinctive data in the structured document such as Logo and title area. The software is developed using C++ programming language and its block diagram is shown as in Fig. 1. This paper has been organized with the introduction first. The pre-processing steps are described in the next section. The proposed tilting module is next described. Feature extraction modules are discussed in the sections that follow.
  • 2. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 11 2. Pre-processing Stage The pre-processing part of the proposed system consists of three steps as is shown in stage1 in Fig.1 2.1 Grayscale Conversion This method tries to convert the scanned image to grayscale 2.2 Size Normalization The important idea underlying the stage carried out is to get normalized length images to enhance system performance by scaling each image both horizontally and vertically to get the same scaled image. W xx xx x o i i minmax min − − =
  • 3. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 12 H yy yy y o i i minmax min − − = Where ( )o i o i yx , are the original points ( )ii yx , are the points after transformation. { }o ii xx minmin = { }o ii xx maxmax = { }o ii yy minmin = { }o ii yy maxmax = W is the width of the normalized data = 500 H is the height of the normalized data = 500 Figure 2. Example of Normalization Process Applied on the Scanned Image. 2.3 Adaptive Thresholding The system thresholding technique is developed based on the ordinal structure fuzzy logic model which has an advantage due to their ability in handling multiple inputs and outputs. Based on the 3 highest accurate pre-developed thresholding techniques such as SIS, Deravi, and Rosenfeld the fuzzy logic model is developed to predict the value of the pixel after thresholding whether it is black or white. Software has been developed for such purpose and the performance of the structured document classifier is compared to the different thresholding techniques. The results based on using SOFM for thresholding show a high level of accuracy which can be used (0,0) 500 X Y 0 500
  • 4. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 13 for many applications. 3. Tilting Structured Document Tilting and skewing the scanned structured document is considered as a crucial stage as it has a great impact on the accuracy of the whole structured document classification system. Various techniques have been developed to calculate the value of proper rotation angle to adjust the tilted image. However, most of these techniques were applied as pre-stage of OCR systems and had the problem of expensive computational costs. In the proposed system, tilting will be accomplished by using the reference lines in the structured documents as a base of calculation. The objective is to find a way with lower computation time and higher accuracy results over different layouts that might contain non-text areas. Several algorithms were implemented to find the proper skew angle of tilted image [4,5], some of the applied methods can be classified as the following: • Projection-based Methods [6,7]: basic concept of these methods is to use the extracted features out of projection profile calculation to determine the skew value. • Hough transform-based methods [8,9]: methods depend on transforming the coordinates from Cartesian to polar and using the new values to calculate the skew angle. • Cross – correlation methods [10]: vertical deviation along the image is used to calculate the skew of image. • Clustering – based methods: these methods try to find a way to connect the pixels and use this connection to calculate the skew value. Shivakumara and Kumar [11] apply the nearest neighbor technique by labeling the connected black pixels and grouping them in blocks, these blocks were used to calculate the skew value. In the other hand, Sarfraz et al. [12] identify words within the same line to produce blob used in skew angle calculation. Chou et al [13] proposed a method based on drawing scan lines over the document at various angles. The skew angle is estimated by constructing parallelograms with these scan lines that cover objects in the image. 3.1 Clustering – Based Proposed Approach As structured document contain of different components such as lines, text, check boxes and circles, logos, etc., lines are the essential layout structure of the document and can be detected more reliably than other components. So, we calculate the skew angle based on the document detected lines. Proposed approach is based on looking for the connected lines and estimating the optimum skew angles of the reference line. The idea behind this approach is that the distance between the consecutive points of the line is too small than the other components. So if we are able to detect the lines and find the reference line (line which has the largest number of pixels) out of them. We then estimate the skew angle based on that line. To achieve the previous mentioned aim, the following stages should be executed: 3.2 Noise removal After binarizing the image, this stage hopes to eliminate the randomly distributed noisy pixels that might affect on the calculation of the tilting value. The method is based on segmenting the whole binarized image to 5*5 sub-images, it statistically examines the intensity values of the active pixels in the sub window by counting total number of them and comparing the value with a threshold value in order to decide whether to flip their values to background or not. The size of the window has to be large enough to cover sufficient foreground and background pixels and to avoid either keeping the noise in case of too small size or disconnecting the line because of choosing too large region. After applying the proposed method to remove the noise, the values of the first and last three columns and rows of the image will be set to zero to avoid the edge noise caused by the low scanning resolution of the image. If < Th then = 0
  • 5. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 14 Else Do Nothing Where Total Number of active pixels within 5*5 window. Th = (is* is + is ) / 3 : is=2 increment Step for a pixel to constitute the sub window 000000111000000 000000001000000 000010010010000 000100000000010 000100010000000 001000100100010 010000000011001 110001000000000 000000000000000 000010000000100 000000000000000 000010000001000 000000000000010 001100000000001 010001000000000 Figure 3. Noise removal process. 3.3 Tilting and skewing Method using clustering technique will be presented to calculate the value of the angle. before starting the proposed method, a calculation of the first black pixel Pi(xi,yi) will be done and based on the condition of finding a small value of Y axis histogram at yi (histY[yi]), the angle calculation will start, otherwise no skewing is required . Sample of image tilted to left Sample of image tilted to Right If xi < width/2 then image might be tilted to left, otherwise right Figure 3. Examples of Different Tilted Images.
  • 6. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 15 Four referencing points are used to calculate the best value of the angle as the following: Pclk: first black pixel by scanning the columns form left to right Pcrk: first black pixel by scanning the columns form right to left Prtk: first black pixel by scanning the rows from top to bottom Prbk: first black pixel by scanning the rows form bottom to top A sequence of operations should be executed over each of the previous mentioned referencing points to calculate the most accurate tilting angle. Applying the proposed method over Pclk as an example : ( )} After calculating Number of pixels in each group of the referencing points (Num []), group which has the highest number of pixels will be used to calculate the skewing angle. Let us suppose that is the nominated group
  • 7. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 16 1- Original Image 2-thresholding and removing noise 3- Reference line used for tilting 4- Tilted Image Figure 5. Result of Applying Tilting Process on Different Samples. 4. Feature Extraction Finding similarities in document has offered great challenge to the solution provider due to the huge amount and unsorted documents which were collected every day and differed in terms of outlook, layout, size, meaning etc [14]. In addition to, finding generic features that able to interpret different types of forms is not feasible using present techniques, it is possible to develop a specific method to process the specific type of forms contained in the documents. A little research had concentrated on the layout of document, as a primary key for the feature extraction to be used in classification. The most recent work applied watermark embedding method to obtain layout information of the document with concentration of documents integrity detection and offered secret information to be hidden, and it was robust to hard and soft documents [15]. As the proposed technique is language separable, reference lines and blobs could be considered as good features in order to classify the structured documents. Next section will clarify the way of extracting features and all the pre-processing stages that carried out to enhance the extraction process. 4.1 Extraction of reference lines Proposed system recognizes ruled lines by extracting vertical, horizontal, and diagonal lines from the binary image, obtained by adaptive thresholding method. Proposed line extraction algorithm uses pruning as one of the morphological operations, thinning, connecting gaps of broken and disconnected lines, and the clustering technique to extract the reference lines and use them as a one of the feature in the classification system. a. Noise removal After thresholding and skewing the image, this stage hopes to eliminate the noisy pixels and spurs on the end points in order to enhance the lines extraction process. Pruning as one of the morphological operations is applied to remove these noisy scattered pixels which are not related to any one of the objects in the document. The applied pruning uses two basic structuring elements and iterates them to produce other six structuring elements. All structuring elements are applied once over the processed image to get the required results.
  • 8. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 17 Figure 6. The result of Applying Pruning Structuring Element ver Structured Document. b. Thinning Thinning is usually used in applications that aiming to extract lines for different purposes such as OCR to improve the recognition accuracy [16]. Thinning is applied to reduce thickness of each line to just a single pixel line in order to facilitate the extraction process of the reference lines. The applied thinning method uses two basic structuring elements and iterates them till the convergence. Iterating the elements will produce eight structuring elements used for skeletonizing the whole image. As shown in the below figure, the image is first thinned by two basic structuring elements, and then with the remaining six 90° rotations of the two elements. The process is repeated till there is no change by applying anyone of the structuring element. 0 0 0 0 0 0 0 0 0 -1 0 0 -1 -1 0 0 -1 -1 0 1 0 0 1 0 -1 1 0 -1 1 0 0 1 0 0 1 0 0 -1 -1 -1 -1 0 -1 0 0 0 0 0 0 0 0 0 0 0 1st structured Element 2nd structured Element 1st clockwise(1) 2nd clockwise(1) 1st clockwise(2) 2nd clockwise(2) 0 0 -1 0 0 0 0 1 -1 0 1 -1 0 0 0 0 0 -1 1st clockwise(3) 2nd clockwise(3)
  • 9. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 18 Figure 7. The Result of Applying Thinning Structuring Element over Structured Document. c. Joining broken lines before starting the proposed method to detect the lines, values of the histograms (histY , hsitX) will be used as inputs to decide whether to process joining for horizontal and vertical lines or not. After specifying the target column or row, the decision of connecting tow groups of sequential active points separated by spaces will be depended on the number of active pixel in the first part, number of non active points (Numspaces), and number of active points (Num2active) in the second part. As shown in the following formula 0 0 0 -1 0 0 1 -1 0 -1 1 -1 1 1 1 -1 1 -1 -1 1 -1 1 1 0 1 1 0 1 1 0 -1 1 -1 0 1 1 1 1 1 -1 1 -1 1 -1 0 -1 0 0 0 0 0 0 0 -1 1st structured Element 2nd structured Element 1st clockwise(1) 2nd clockwise(1) 1st clockwise(2) 2nd clockwise(2) 0 -1 1 0 0 -1 0 1 1 0 1 1 0 -1 1 -1 1 -1 1st clockwise(3) 2nd clockwise(3)
  • 10. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 19 d. Line Segments Detecting and Grouping Proposed method will be used to extract the vertical, horizontal, and diagonal lines out of the previous processed binary image. First of all, pruning operation is applied to find the end points of the lines and use them as starting points of the line tracing process. The applied pruning method is exactly like the once implemented before but it will be used here to mark the pixels which are detected by the process not to eliminate them. After specifying the pixels, the following steps will be executed • Get one of the pruning points • Mark the point in order not to be used anymore, save it as one of the line’s pixel, increase the number of line pixels, and compare the coordinates to get the value of Xmin , Ymin, Xmax, Ymax of that line. • Scan all the eight neighboring pixels and count how many pixels are active and not marked. • In case the value retrieving back from stage 3 is o 1: repeat steps starting from 2 o >1 obtain the coordinates of each active pixel, push them into a branch vector, and jump to step 5 o 0: jump to step 5 • Analyze the detected line and decide whether it is one of the reference line or not by applying the following steps: Xdiff = Xmax - Xmin Ydiff = Ymax - Ymin Rval = (Ydiff / Xdiff ) in case Ydiff < Xdiff , otherwise Rval = (Xdiff / Ydiff ) o If (number of line’s pixel > 30 and Rval > 0.2 and (Ydiff < 25 || Xdiff < 25) ) then the line isn’t one of the reference line and no need to save it; otherwise save the line o If the branch vector has any values get one of the value and jump to 2 • Repeat the steps starting from one. Figure 8. Applying Line Extraction Process over Two Different Samples. 4.2 Extraction of blobs This stage tries to detect distinctive data in the structured document such as Logo and title area, the places of required information can be notices as connected regions in binary image. The aim of this stage is to enhance the accuracy by adding the useful information of the detected places as inputs to the classifier beside the information
  • 11. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 20 extracted from the reference lines in the previous stage. Proposed blob extraction algorithm uses dilation as one of the morphological operations, joining pixel, and proposed technique to extract the blobs and use them as a one of the feature in the classification system. a. Dilation Dilation is usually applied to merge nearby regions by expanding all the white parts (active pixels) and facilitating the extraction process of the blobs. The applied method uses one structuring element (3*7) as shown in the figure below. The process is flipped all the neighboring pixels of the target pixel to one in case the pixel is active, otherwise nothing will change. b. Joining Pixels Before starting the proposed method to detect the accurate blobs, this step joins the connected target pixel into different blobs and removes non distinctive blobs that contain area below specific threshold. -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Figure 9. Applying Joining Pixels Process over Two Different Headers. c. Determining distinctive Blobs Due to the fact that not all the detected blobs could be used as unique features for the classifiers system, blobs should be achieving the following conditions in order to be nominated to the next stages. First condition related to the dimension Ratio Second condition is ratio of the height
  • 12. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 21 Third condition is ratio of the width Last condition is location of the blobs should be located at top 10% part of the image Figure 10. Applying Blob Extraction Process over Two Different Samples. 1- Conclusion As the performance of the document classification depends upon the uniqueness of extracted features, the paper proposed to find both referencing lines and blobs to address the problem of retrieving similarities in document which has offered great challenge to the solution provider due to the huge amount and unsorted documents which were collected every day. Another potential avenue is that the proposed technique able to classify structured documents written in different language as it is mainly based on the layout of the form regardless the language used. References Zhang Junyou. (2010). A Quickly Skew Correction Algorithm of Bill Image. Paper presented at Third International Conference on Information and Computing Daniel Lopresti & Ergina Kavallieratou. (2010) , Ruling Line Removal in Handwritten Page Images. Paper presented at International Conference on Pattern Recognition. Hernâni Gonçalves, José Alberto Gonçalves, and Luís Corte-Real. (2011). A Method for Automatic Image Registration Through Histogram-Based Image Segmentation. IEEE transaction in image processing, VOL. 20, NO. 3. Jonathan J. H (1998), Document Image Skew Detection: Survey and Annotated Bibliography, World Scientific, 40-64. Hull, J., (1998). Document image skew detection: Survey and annotated bibliography. Document Analysis Systems II. World Scientific Pub. Co. Inc. pp. 40–64. Postl, W., (1986). Detection of linear oblique structures and skew scan in digitized documents. In: Proc. 8th Internat. Conf. on Pattern Recognition, Paris, France, 687–689. Li, S., Shen, Q., Sun, J., (2007). Skew detection using wavelet decomposition and projection profile analysis. Pattern Recognition Lett. 28 (5), 555–562. Amin, A.; Fisher, S (2000). A Document Skew Detection Method Using the Hough Transform, Pattern Analysis
  • 13. Computer Engineering and Intelligent Systems www.iiste.org ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online) Vol.4, No.5, 2013 22 & Applications 3 (3) pp. 243-253. Singh, C., Bhatia, N., Kaur, A., (2008). Hough transform based fast skew detection and accurate skew correction methods. Pattern Recognition 41 (12), 3528–3546. Avanindra, S., (1997). Robust detection of skew in document images. IEEE Trans. Image Process. 6 (2), 344–349. Shivakumara, P., Kumar, G.,(2006). A novel boundary growing approach for accurate skew estimation of binary document images. Pattern Recognition Lett. 27 (7), 791–801. Muhammad Sarfraz , Zeehasham Rasheed (2008). Skew Estimation and Correction of Text using Bounding Box” Proc. of Fifth International Conference on Computer Graphics, Imaging and Visualization.IEEE CGIV 2008 pp259-264. Chou, C.; Chu, S.; Chang, F (2007). Estimation of skew angles for scanned documents based on piecewise covering by parallelograms”, Pattern Recognition 40 (2) . pp. 443-455. Y. Cao, S.H. & Wang, H. Li (2002). Automatic recognition of tables in construction tender documents. Paper presented at Automation in Construction, 11 (5) ,573 – 584. He, Y., Luo, L., Su, M., Shao, L., & Xiang, Z. (2011). Embedding and detecting watermarks based on embedded positions in document layout: Google Patents. V.N Manjunath Aradhya (2007). Skew Estimation Technique for Binary Document Images based on Thinning and Moments, Journal of Engineering Letters, Vol 14, No.1, pp. 127-134.