Image Processing with Character Recognition
using Matlab
Ciaran Cooney
This thesis is submitted to Dundalk Institute of Technology in partial
fulfilment of the requirements for the degree of
B.Eng. (Hons) in Sustainable Design
School of Engineering
Dundalk Institute of Technology
Supervisor: Tim Daly, Paul Egan, Tommy Gartland, Alan
Kennedy
2016
Abstract
Text detection and character recognition in natural scene images is a challenging and
complex operation due to the potential for varying degrees of quality expected from
the input data. Therefore, development of a robust and adaptable algorithm requires
several stages of pre-processing to identify regions of interest before character
recognition can be applied. This paper presents a methodology for implementation of
a character recognition algorithm based on identification of the alphanumeric digits
on vehicle registration plates.
The text detection algorithm has been integrated within a system requiring
initial image acquisition and a visual indication of results. The reason for this
development is to promote the use of the technique in a commercial application. A
wireless network and graphical user interface are incorporated to supplement the
primary utility of the system i.e. image processing and character recognition.
Results demonstrate the strengths and weaknesses of the techniques employed.
The quality of the input image, ambient conditions and various parameters within the
algorithm itself are found to impact the Optical Character Recognition (OCR) engine's
ability to accurately detect text.
Acknowledgments
Declaration
I, the undersigned declare that this thesis entitled:
Image Processing with Character Recognition using Matlab
is entirely the author’s own work and has not been taken from the work of others,
except as cited and acknowledged within the text.
The thesis has been prepared according to the regulations of Dundalk Institute of
Technology and has not been submitted in whole or in part for an award in this or any
other institution.
Author Name: Ciaran Cooney
Author Signature:
Date:
List of Abbreviations and Symbols
RoI Region of Interest
OCR Optical Character Recognition
MSER Maximally Stable Extremal Regions
PCB Printed Circuit Board
CPU Central Processing Unit
GPIO General Purpose Input/Output
LED Light Emitting Diode
Table of Contents
Abstract
Acknowledgments
Declaration
List of Abbreviations and Symbols
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
2 Literature Review
2.1 Introduction
2.2 Technique
2.3 Optical Character Recognition
2.4 Software
3 Theory
3.8 Common Issues with Text Detection
4 Methodology
4.2 System Design
4.3 Hardware Specification – Raspberry Pi 2 Model B
4.5 PCB Design and Manufacture
4.7 MSER Regions
4.8 Regionprops
4.9 Stroke-Width Variation
4.11 OCR Function
4.12 String Comparison
5 Experimental Testing
6 Results and Discussion
6.1 Introduction
6.2 Basic Detection
6.3 Complex Detection
6.6 Further Work
7 Conclusions
Appendix A
List of Figures
Figure 1 System Flowchart
Figure 2 System Design Flowchart
Figure 3 Raspberry Pi Pin Layout
Figure 4 PCB Design
Figure 5 Software Design Flowchart
Figure 6 Polling a Switch
Figure 7 Matlab MSER command
Figure 8 MSER Example Result
Figure 9 Geometric Properties Thresholds
Figure 10 Stroke-Width Thresholding
Figure 11 Bounding Boxes
Figure 12 Merging of Bounding Boxes
Figure 13 Merged Bounding Boxes
Figure 14 OCR function code
Figure 15 Cell Arrays
Figure 16 'if' statement in Matlab
Figure 17 Graphical User Interface
Figure 18 Complete System Hardware
Figure 19 Basic Detection - Input Image
Figure 20 Basic Detection - MSER regions
Figure 21 Basic Detection - Geometric Properties method
Figure 22 Basic Detection - Stroke-width thresholding
Figure 23 Basic Detection - Bounding Box comparison
Figure 24 Basic Detection - Bounding Box Comparison (1)
Figure 25 OCR result (1)
Figure 26 OCR result (2)
Figure 27 Alfa Romeo Input Image
Figure 28 Processing Results - First Iteration
Figure 29 Processing Results - Second Iteration
Figure 30 Processing Results - Third Iteration
Figure 31 Processing Results - Fourth Iteration
Figure 32 Complete Test (Basic) - Input
Figure 33 Complete Test (Basic) - MSER regions
Figure 34 Complete Test (Basic) - Bounding Boxes
Figure 35 Complete Test (Basic) - Text Region
Figure 36 Complete Test (Basic) - Result
Figure 37 Complete Test (Complex) - Input
Figure 38 Complete Test (Complex) - MSER regions
Figure 39 Complete Test (Complex) - Post-Geometric Properties
Figure 40 Complete Test (Complex) - Post-Stroke-width thresholding
Figure 41 Complete Test (Complex) - Bounding Boxes
Figure 42 Complete Test (Complex) - Text Region
Figure 43 Complete Test (Complex) - Result
Figure 44 Schematic Diagram
Figure 45 Breadboard Construction
List of Tables
Table 1 Parameter Values - First Iteration
Table 2 Parameter Values - Second Iteration
Table 3 Parameter Values - Third Iteration
Table 4 Parameter Values - Fourth Iteration
1 Introduction
1.1 Introduction
Image processing in general, and object recognition in particular, is becoming an
increasingly important facet of modern electronics and communications. Some of the
more prevalent applications include medical imaging using fMRI (Steele et al., 2016),
process automation in industrial settings (Choi, Yun, Koo, & Kim, 2012) and text
detection in natural scene images (Zhao, Fang, Lin, & Wu, 2015) (Liu, Su, Yi, & Hu,
2016). The techniques deployed across these applications are wide-ranging and
diverse due to the different requirements of each. With such a vast array of criteria for
investigation it is necessary to define a specific area of interest.
Text Detection, or Character Recognition, is a field of study with an extensive
literature behind it and a burgeoning market for applications. Typical applications
where character recognition is especially important include scanning of text
documents, reading license plate numbers and language translation of text images.
Just as there are many applications for text detection, there are many techniques and
methodologies for implementation of a detection algorithm. Edge-detection,
thresholding and Hough transforms are three of the most common methods employed.
In fact, Otsu’s Method (Otsu, 1979) is a thresholding technique often implemented
within commercial Optical Character Recognition (OCR) algorithms.
License plate recognition is a standard paradigm for investigation and
experimentation of character recognition techniques and is the frame in which this
project has been carried out. A variety of methods have been implemented in license
plate detection such as Harris Corner and Character Segmentation (Panchal, Patel, &
Panchal, 2016), the use of SIFT descriptors (Yu Wang, Ban, Chen, Hu, & Yang,
2015) and probabilistic neural networks (Öztürk & Özen, 2012).
Much of the preliminary work undertaken has been focused on obtaining a
deeper understanding of the various techniques involved in text detection processes,
particularly those related to natural-scene images. Although the theory is extremely
important, practical usage must also be considered. With this, hardware and software
platforms are investigated in the literature review for this project to ascertain their
relative compatibility with image processing applications.
To test the efficacy of the investigation into the various detection and
recognition methods a practical implementation of these techniques is developed. In
most cases character recognition systems will consist of several component parts
including acquisition, pre-processing and recognition. The system proposed here
incorporates each of these elements within a wireless network which will provide an
automated response to positive character detection and an equivalent alert to failed or
negative detection.
The system is framed as a method for detecting the characters of a vehicle
registration plate and permitting or denying entry based on comparison of the detected
text and a pre-existing vehicle-registration database. However, there is inherent
flexibility in the model and it may be adapted to service other applications. Figure 1 is
a flowchart depicting a high-level description of the required functionality of the
system.
Figure 1 System Flowchart
The methodology is based upon use of a central microcontroller which will acquire an
image when triggered. The acquired image is then transmitted wirelessly to a laptop
or PC on which the filtering and pre-processing of the image will take place. Post-
processing, the image is applied to a commercial OCR algorithm which will output a
digital representation of the vehicle registration number obtained. Finally, comparison
of the number obtained with a database of expected numbers is carried out to
determine the action of the automated response.
Figure 1 stages, in order: Image Acquisition, Image Transmission, Pre-Processing, OCR, Results Comparison, Automated Response
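The final two stages described above, comparing the OCR output with a database of expected numbers and choosing a response, can be illustrated with a minimal Python sketch. This is not the Matlab code used in the project; the function names, the normalisation rule and the plate values are hypothetical and chosen only for illustration.

```python
import re


def normalise_plate(text):
    """Upper-case the OCR output and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def automated_response(ocr_text, registered):
    """Return 'permit' if the normalised plate is in the database, else 'deny'."""
    return "permit" if normalise_plate(ocr_text) in registered else "deny"


# Hypothetical database of expected registration numbers (made-up values).
registered_plates = {normalise_plate(p) for p in ["12-D-12345", "161-LH-789"]}

# OCR output often carries stray whitespace or inconsistent case.
print(automated_response(" 12-d-12345\n", registered_plates))
print(automated_response("99-XX-00000", registered_plates))
```

Normalising both the database entries and the OCR output before comparison means that differences in spacing, hyphenation or case do not cause a false denial.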
All the relevant theory, methodology and results relating to implementation of the
system described are contained within the main body of this document.
2 Literature Review
2.1 Introduction
Image processing and text recognition are increasingly important areas for research
and development in the modern world. Sectors in which image processing techniques
provide the basis for critical applications include medical, communications and
security. In the medical industry image processing techniques, such as improving the
quality of fMRI scans, have been employed in diagnostics (Misaki et al., 2015), with
some modern applications facilitating automated diagnosis of certain conditions.
Text recognition is an area with increasing relevance and the technology in this
area is keeping pace with this need. One of the most impressive applications present
in the literature is the use of text recognition technology in the development of a text-
to-speech synthesis system (Rebai & BenAyed, 2015).
Not only are the potential applications for image processing widespread but the
techniques used to extract the information are equally diverse. Methods deployed are
of course dependent on the desired outcome and there is no shortage of techniques
that can be tailored towards a specific target. Image processing is not unlike other
types of data processing in that the particular process is chosen based on the exact
requirements of the intended application.
With the project for which this literature review has been compiled being
primarily concerned with character recognition in a static image, much of this report
has been written with reference to this area (Zhao et al., 2015; Zhu, Wang, & Dong,
2015).
The expected outcome of this paper is to review, understand and analyse the
present literature on image processing techniques, the platforms used to implement
these techniques and the applications which most commonly employ image
processing as a means of achieving a desired outcome. Section 2.2 gives an
overview of the techniques employed in the processing of images, usually to extract a
specific piece of information. Section 2.3 discusses the operation of Optical
Character Recognition (OCR), which is an adaptable algorithm designed to recognise
specific features contained within an image, i.e. text. Sections 2.4 and 2.5
assess the software and hardware platforms which
could be used to implement the specific techniques associated with image processing.
The report will conclude with a concise summary of the key findings from the
literature review. An outline will be included providing some of the relevant
information which will inform the future progress of this project.
2.2 Technique
There are numerous techniques documented and discussed in the literature available
on image processing. Among those most prominently featured are segmentation,
edge-detection and thresholding. Of course, the technique(s) employed by researchers
or professionals are largely dependent upon the requirements of a given application,
although not exclusively so. In some cases the limitations of software or hardware
may be the deciding factor in choices regarding technique.
Edge-detection is one of the most common approaches to segmentation; it works by
detecting meaningful discontinuities in intensity values (Rafael C. Gonzalez,
Woods, & Eddins). The method makes use of derivatives and is generally computed
using a Laplacian filter. Smith and Brady (1997) document an
approach to low-level image processing, labelled the SUSAN principle, which
builds on existing edge-detection and corner-detection techniques.
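As a rough illustration of derivative-based edge detection, the following Python sketch applies the standard 4-neighbour Laplacian kernel to a small greyscale image held as a list of lists. It is a simplified stand-in rather than the method of any cited paper: edges are marked by thresholding the magnitude of the response, whereas a fuller treatment would look for zero crossings. The test image and the threshold value are assumptions.

```python
def laplacian(image):
    """4-neighbour Laplacian response for interior pixels; the border is left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (image[r - 1][c] + image[r + 1][c]
                         + image[r][c - 1] + image[r][c + 1]
                         - 4 * image[r][c])
    return out


def edge_map(image, threshold):
    """Mark pixels whose absolute Laplacian response reaches the threshold."""
    return [[1 if abs(v) >= threshold else 0 for v in row]
            for row in laplacian(image)]


# Synthetic image: dark left half, bright right half (a vertical step edge).
image = [[10, 10, 10, 200, 200, 200] for _ in range(5)]

for row in edge_map(image, threshold=100):
    print(row)
```

Only the two columns either side of the intensity step respond strongly, which is the "discontinuity in intensity values" the text refers to.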
Another method with considerable presence within the literature is the use of
Moment Invariants. Moments are used to analyse and characterize the patterns
contained within an image and are thus useful in character recognition. For instance,
Zernike moment invariants have been shown to be extremely effective in pattern
recognition applications (Belkasim, Shridhar, & Ahmadi, 1991).
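The idea of a moment invariant can be sketched in Python with the first of Hu's invariants, which is far simpler than the Zernike moments cited above and is used here only as a stand-in. The sketch computes normalised central moments of a binary shape and shows that the resulting value is unchanged when the shape is translated. The shapes are made-up test data.

```python
def raw_moment(img, p, q):
    """Raw image moment m_pq = sum of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_first_invariant(img):
    """First Hu invariant: eta_20 + eta_02 (normalised central moments)."""
    m00 = raw_moment(img, 0, 0)
    x_bar = raw_moment(img, 1, 0) / m00
    y_bar = raw_moment(img, 0, 1) / m00

    def mu(p, q):
        # Central moment about the centroid (x_bar, y_bar).
        return sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * v
                   for y, row in enumerate(img) for x, v in enumerate(row))

    # For p + q = 2 the normalising factor is m00 ** 2.
    return (mu(2, 0) + mu(0, 2)) / m00 ** 2


# An L-shaped blob and the same blob shifted two pixels right and one down.
shape_a = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
shape_b = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

print(hu_first_invariant(shape_a), hu_first_invariant(shape_b))
```

Because the value depends only on the shape and not on its position, features of this kind can help compare a candidate character against stored templates.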
Alongside Edge-Detection, Thresholding is one of the most commonly used
techniques in image processing, specifically segmentation. The reason for this
prevalence seems to be its simplicity of implementation as well as the intuitive
properties it exhibits (Rafael C. Gonzalez et al.). Thresholding is used for all sorts of
applications that require the extraction of information from a given image. One such
application is the detection of glioblastoma multiforme tumors from brain magnetic
resonance images (Banerjee, Mitra, & Uma Shankar, 2016). Global thresholding is
shown in this case to estimate the statistical parameters of the “object” and
“background” of an image. The literature in this area certainly supports the view that
thresholding is among the primary techniques used in image processing.
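A well-known global thresholding technique is Otsu's Method (Otsu, 1979), mentioned in the introduction. The Python sketch below is a minimal illustration of it, not the implementation used in any cited work: it builds a histogram, then picks the grey level that maximises the between-class variance of the "background" and "object" pixel groups. The sample pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0      # number of pixels in the lower class so far
    sum_b = 0    # sum of grey levels in the lower class so far
    for t in range(levels):
        w_b += hist[t]
        sum_b += t * hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between_var = w_b * w_f * (mean_b - mean_f) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Made-up bimodal data: a dark background and a bright object.
pixels = [10] * 50 + [12] * 50 + [200] * 40 + [205] * 60
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, sum(binary))
```

The class means computed inside the loop are the kind of "object" and "background" statistics that the paragraph above describes global thresholding as estimating.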
Alongside the most common image processing techniques, the literature also contains
some that are more specialised. One such technique is Nonnegative Matrix
Factorisation (NMF). Problems can occur with this method and several algorithms
have been proposed to solve these (Hu, Guo, & Ma, 2015). Although NMF is
purported to be an effective tool for large scale data processing it is not one that is
likely to be pursued for the requirements of this project.
Another less prominent but interesting method sometimes used for image
processing is Fuzzy Logic (Amza & Cicic, 2015). One of its current uses is in
automated quality control image processing systems. It works by extracting
geometrical characteristics of an object and then using this information with a fuzzy
pre-filtering unit to estimate the probability of a foreign body being present on the
object being analyzed. Although the use of fuzzy logic is extremely successful in
these types of applications, it does not appear to be the logical approach to a text
recognition application.
Before the more technical aspects of the image processing algorithm are
activated, it may be necessary to implement some of the more basic image processing
techniques to prepare an image for this. These basic adjustments may come in the
form of an image resizing, rotation or cropping, depending on the particular
characteristics of the image and the data to be extracted. In an article on low-quality
underwater images (Abdul Ghani & Mat Isa, 2015), the authors reference Eustace et
al. by adapting a contrast-limited adaptive histogram specification (CLAHS) as a pre-
processing step.
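The basic adjustments named above (cropping, rotation and resizing) can be sketched in Python on an image stored as a list of rows. This is an illustrative sketch only: the helper names are hypothetical, rotation is limited to 90 degrees, and resizing uses nearest-neighbour sampling rather than the interpolation a full image library would offer.

```python
def crop(img, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def rotate90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def resize_nearest(img, new_h, new_w):
    """Resize by nearest-neighbour sampling of the source pixels."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


img = [[1, 2, 3],
       [4, 5, 6]]

print(crop(img, 0, 1, 2, 2))
print(rotate90_clockwise(img))
print(resize_nearest([[1, 2], [3, 4]], 4, 4))
```

In a registration plate application, a crop of this kind would isolate the candidate text region before it is passed on for recognition.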
In most cases, the literature presents a combination of techniques that have been
chosen because of a particular capability to carry out a specific function or as a means
of experimentation in order to improve existing techniques. With regards to any
nascent image processing project or assignment, it is quite clear that a pragmatic
approach should be taken from the outset so that a suitable technique(s) can be
chosen.
2.3 Optical Character Recognition
One of the more dominant themes present in the literature surrounding image
processing techniques is that of Optical Character Recognition (OCR). OCR appears
as the final processing step in many of the research papers on image extraction and
recognition. There is clearly a wide range of applications and extraction methods that
OCR can be used in conjunction with. Among the potential applications for
OCR are keyword searches and document characterization in printed
documents (M. R. Gupta, Jacobson, & Garcia, 2007).
A summary of the theories underpinning the OCR function is provided in Optical
Character Recognition-Theory and Practice (Nagy, 1982). Among the topics
discussed in this book is the classical decision-theoretic formulation of the character
recognition problem. Statistical approximations, including dimensionality reduction,
feature extraction and feature detection are discussed with regard to the appropriate
statistical techniques.
Commercially available OCR algorithms are primarily designed to interpret
binary (black and white) images. However, more and more pre-processing techniques
are being developed as a means of preparing images for use with this function. An
example of this is the denoising and binarizing of historical documents as a pre-
processing step (M. R. Gupta et al., 2007). Many researchers have pursued methods
based on development of a new or unique method of extraction that can be used along
with existing OCR functions (Roy et al., 2015).
One of the limitations associated with OCR-based applications is that they may
not work well when properties of the captured character images are significantly
different from those in the training data set. A supervised adaptation strategy is one
that has been developed as a potential solution to this problem (Du & Huo, 2013).
Nagy et al. also demonstrated that a character classifier trained on many typefaces can
be adapted effectively to text in a single unknown typeface by using a self-adaptation
strategy.
A further problem which can sometimes be faced when using an OCR algorithm
for text recognition is the assumption that individual characters can be isolated
(Fernández-Caballero, López, & Castillo, 2012). Some traditional methods of OCR
implementation have less than ideal recognition performance because of the difficulty
in achieving clear binary character images.
The literature clearly indicates that OCR is a vital function in relation to image
processing and text recognition. However, due to some of the limitations stated above,
it is important that any image be properly processed and segmented before being put
through an OCR algorithm.
2.4 Software
The extensive literature on image processing and text recognition techniques
incorporates the use of several types of software for implementation. Whether it is due
to personal preference or application specific criteria, it appears that there are a large
number of platforms available for consideration when undertaking an image
processing project.
Software which has been developed with the specific intention of being used for
image processing applications is available, often originating in academic research. A
classic example of this is ImageJ, software written in Java and designed to run on any
operating system. ImageJ supports various functions and capabilities. For instance, it
is able to acquire images directly from scanners, cameras or video sources. The program
also supports all common image manipulations including reading and writing of
image files and operations on individual pixels (Abràmoff et al., 2004).
The use of Labview as a tool for image acquisition and processing is an interesting
proposition and does have some presence in the literature. A program named Image-
Sensor Software (ISS) is one that is based on the Labview programming
language (Jurjo, Magluta, Roitman, & Batista Gonçalves, 2015). Use of this type of
software enables image acquisition tools such as zoom, focus and capture. The
features required by the overall image recognition system must be defined by the user
when programming.
Matlab is a powerful piece of software with many uses in modelling,
experimentation and signal analysis. Its connectivity with many advanced
programming languages (like C, Java, VB) and availability of a wide range of
toolboxes make it popular among the scientific and research community (R. Gupta,
Bera, & Mitra, 2010). It possesses an extensive array of tools which can be harnessed
in the interests of image recognition. The use of the segmentation method is
particularly powerful within Matlab. Its use has been demonstrated by tracing yarn to
accurately compute useful parameters of fibre migration by statistically calculating
mean yarn axis and tracing out mean fibre axis (Khandual, Luximon, Rout, Grover, &
Kandi, 2015).
By employing Matlab as the means of processing an image for some form of
character recognition, the user has the ability to tailor code to develop algorithms with
specific image properties in mind. This may involve text or shape recognition, simple
colour recognition or perhaps properties contained within the image such as depth
perception.
Matlab has the additional advantage of being compatible for use in connection with
some form of hardware acquisition unit that may be implemented as part of an
embedded system. Its use in this context has been proven successful (R. Gupta et
al., 2010), as a method for controlling image acquisition as well as image processing.
There are some specialised software packages that have been designed to facilitate
a specific function. A prime example of one of these is Xmipp, software developed
primarily as a means of image processing in electron microscopy (de la Rosa-Trevín et
al., 2013). Graphical tools incorporated within this software include data visualisation
and particle picking which can allow visual selection of some of the key parameters of
an image. It can be seen from reviewing the literature that image processing software
is both prevalent and sophisticated. At times it can appear overwhelming from the
sheer density of techniques available; however, this does suggest that the type of
application being pursued in this project is very much achievable.
Although not always used exclusively, Matlab is very often used as a sub-section
in an overall processing technique. This seems to be due to the vast array of different
commands available within its image processing toolboxes. Images can be treated
using commands such as “fspecial” and “imfilter” in Matlab (HashemiSejzei &
Jamzad), before being processed elsewhere for different reasons. This is certainly a
consideration for the progress of the project being considered here, particularly in the
earlier stages of development when the use of some of these Matlab commands could
prove to be extremely informative.
2.5 Hardware
As with software, hardware is an important factor that must be given careful
consideration when entering into an image processing project. The relative strengths
and weaknesses of a specific hardware platform must be carefully gauged with
reference to the processing requirements. Not only this, but compatibility with a
chosen piece of software must be given due consideration. The presence of discussion
and critique of specific hardware units is not as strong as in software. This is primarily
due to the fact that most of the experimental work in this area is focused on the
various image processing algorithms, which are generally cross-platform.
The presence of embedded systems as a means of computing image processing is
fairly extensive in the literature. An ARM processor in conjunction with Matlab and a
Linux based operating system has been used to automatically identify cracks in a wall
(Pereira & Pereira, 2015).
Some applications may require the use of high-speed image processing systems.
Due to demands that may include increasing the speed of a transform process or
decreasing overall processing time, it may be necessary to design a specific
architecture to support the function. This is often the case with complex algorithms
which can be implemented using an FPGA for prototyping and verification (Mondal,
Biswal, & Banerjee).
As commented upon at the beginning of this section, there is a comparative lack
of hardware-related literature. The obvious conclusion to draw from this fact is that
the choice of hardware is secondary to the choices of technique, algorithm and
software. However, one of the key hardware considerations is the processing
capability of any PC or laptop being used. A powerful CPU and specifically the
inclusion of a Graphics Processing Unit (GPU) can dramatically improve the
performance of any image processing application (Cugola & Margara, 2012).
2.6 Conclusions
There are several component factors to be investigated when considering a project
related to image processing. The relative importance of each of these factors is
reflected in their presence in the literature. Certainly the techniques or algorithms to
be implemented are critical factors which will determine the success or failure of a
given project. As has been documented previously in this report, there are many
potential techniques that can be useful in a variety of applications. This being the
case, it is always an important first step to define an application's functionality
before determining the correct method for achieving this aim.
With one of the potential objectives of a project being text recognition from a scene
image, segmentation, and thresholding techniques in particular, are very
likely to be required in some form. As well as these processing techniques, Optical
Character Recognition (OCR) in one form or another is almost ubiquitous across text
recognition applications. As there are many commercially available OCR engines, the
decision of which to use is closely tied to the choice of software
platform. Matlab, for example, has an OCR algorithm associated with its own image
processing toolboxes.
With regard to software selection for image processing functions, it appears that
this may come down to a personal preference for a particular interface in many cases.
However, an analytical approach should be taken to ensure that the chosen software
has the desired capabilities. A secondary, or perhaps even primary, factor worth
consideration is the relative expense of some of the software available for image
processing tasks. As noted in this literature review, there are free image processing
programs currently available and extensively developed, although it is possible that
they may come with certain compatibility issues. At the opposite end of the spectrum,
software such as Matlab may only offer its best image processing tools at
additional expense, separate from the main program license.
One of the key decisions to be made is in the choice between the possible
implementation of an embedded system or developing the process on a PC or laptop.
Depending on the overall functionality of a system, it may be more desirable to have
an embedded image processing algorithm that acts as a device for detecting very
specific types of data. Alternatively, the use of a PC or laptop in this area allows for
continuing flexibility in the processing techniques even after completion of the final
design. As with every aspect related to this topic, decisions must be primarily based
upon the end-requirements of the application.
Overall impressions of the available literature on image processing techniques are
that the research and experimentation in this area is both extensive and expanding. It
is a field that is extremely relevant in the technology and communications sector
today and the work being undertaken reflects this status. Of course this means that its
pace of development is exceptionally fast but it also means that the potential
applications for its use will continue to grow.
3 Theory
3.1 Introduction
Text detection has been heavily researched, with multiple methods being
suggested for application (cite). There are some differences in the literature as to how
these methods are categorised. Zhang, Zhao, Song, & Guo (2013), for example,
categorise these techniques into four groups: edge-based, texture-based, connected-
component (CC)-based and others. However, Chen et al. (2011) have categorised
these techniques into two primary classes: texture-based and CC-based.
Maximally Stable Extremal Regions (MSER) is the technique being employed in this
case. The use of an MSER approach to text detection is advocated for several reasons.
Among these are the observations that text regions tend to have quite high colour
contrast with their backgrounds and that they typically consist of homogeneous colour
formations (Liu et al., 2016).
The following sections introduce the theory underpinning the methodology
being implemented for this image processing algorithm in various stages. Each of the
key components of the algorithm is discussed individually and its anticipated effect
on a given input image stated. The theory in this section is interspersed with references to
Matlab and the methods available in this software for applying these techniques. The
section begins with a note on the image formats typically used in this type of
application. In many instances the image format itself is not a critical factor in image
processing but it is nevertheless worthy of consideration.
3.2 Image Formats
There are certain specifications that an input image must meet for use with the Matlab
OCR function. The image file format, i.e. ‘.png’, ‘.jpeg’, ‘.tiff’ etc., is not a critical
factor in this implementation but the image data must be real and non-sparse
(Mathworks.com, 2016a). This simply means that the image matrix must contain real
rather than complex values and must be stored as a full array rather than in Matlab’s
sparse matrix format.
The OCR function accepts any of the following three input image types:
M-by-N-by-3 truecolour – A true colour image is a 24-bit image (8 bits for each
of the colours Red, Green and Blue (RGB)) such as a JPEG, capable of displaying millions of
colours (2^24) (Robbins, 2007). The quantity of possible colours is due to the fact that each
byte is able to represent 256 different shades.
M-by-N 2D grayscale – This is an image in which every pixel is a shade of
grey. One of the virtues of this format is that less information is required for each
pixel. Each pixel is stored as an 8-bit integer, allowing for 256 (2^8) shades of grey from
black to white (Fisher, Perkins, Walker, & Wolfart, 2003b). Grayscale is a common
format in image processing.
M-by-N binary – In binary images pixels have only two possible intensity values.
These values are typically displayed as black and white, with 0 used for black and 1
or 255 used for white (Fisher, Perkins, Walker, & Wolfart, 2003a). The binary format
is often used to distinguish between text and background in pattern recognition
algorithms.
As stated above, the image file format is not a defining factor in the success or
failure of the recognition algorithm. For this reason, two image formats have been
used throughout the testing and experimentation process: PNG and JPEG.
PNG is a relatively new image format and uses 24-bit true colour
(Willamette.edu, 2016). Although the files can be considerably larger than their JPEG
equivalents, this is not a major concern in this instance as all image files are to be deleted
immediately after use.
JPEG is said to be a ‘lossy format’ (Willamette.edu, 2016) as its compression
discards some image data. These losses result in slight degradation of the image
but have minimal impact on the visual perception of the image. JPEG is not limited in
colour and is a popular format for images containing natural scenes and vibrant
colours. However, the vibrancy of the colour image is not a primary factor for
consideration in this case.
3.3 Maximally Stable Extremal Regions
The first detection method employed in the text recognition algorithm is known as
Maximally Stable Extremal Regions (MSER). MSER is a technique used extensively
in many image processing applications from text recognition (Chen et al., 2011) to
visual tracking (Gao et al.). One of the basic principles of an MSER approach has
been defined as “blob detection” (Matas, Chum, Urban, & Pajdla, 2004), meaning that
the MSER command in Matlab will return relevant information pertaining to MSER
features in a given input image.
Due to the fact that an input image will present significant variation in
granulation, resolution and grey-scale levels, amongst other features, the roughness or
smoothness of the edges within that image can vary also (Moreno-Díaz, Pichler, &
Quesada-Arencibia, 2012). For this reason blob detection is applied, using an
MSER algorithm to detect regions of distinctive intensity within an image. The
Extremal region referred to in the MSER acronym is a connected component of the
image whose pixels are all darker (or all brighter) than the pixels on its outer
boundary; such a region is maximally stable when its area changes little as the
intensity threshold is varied.
Through this technique areas of interest can be filtered to allow an OCR
algorithm to attempt character recognition.
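The notion of a stable extremal region can be illustrated with a small sketch. The Python fragment below is not the Matlab detector used in this project; it assumes a tiny hand-made greyscale grid, 4-connectivity and hypothetical helper names, and simply records how the area of a dark connected region changes as the threshold is swept. A region whose area stays almost constant over a range of thresholds is 'stable' in the MSER sense.

```python
from collections import deque

def region_area(image, seed, threshold):
    """Area of the 4-connected region of pixels <= threshold that contains seed."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    if image[r0][c0] > threshold:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and image[nr][nc] <= threshold:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)

# Hypothetical 5x5 image: a dark 2x2 "character" blob (value 10) on a bright
# background (200), with one mid-grey pixel (120) touching the blob.
IMAGE = [
    [200, 200, 200, 200, 200],
    [200,  10,  10, 200, 200],
    [200,  10,  10, 120, 200],
    [200, 200, 200, 200, 200],
    [200, 200, 200, 200, 200],
]

def area_profile(image, seed, thresholds):
    """Region area at each threshold; flat stretches indicate a stable region."""
    return [region_area(image, seed, t) for t in thresholds]

profile = area_profile(IMAGE, (1, 1), [5, 50, 100, 150, 250])
```

In this example the area is unchanged between thresholds 50 and 100, which is the kind of plateau an MSER detector looks for, before the region grows and finally merges with the background.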
3.4 Removal Based on Geometric Properties
MSER algorithms in general, and particularly the Matlab one in use on this project, are
quite good at detecting most of the text regions within an image. However, they are not
immune to the possibility of detecting other non-text stable regions present within an
image. Matlab facilitates a rule-based methodology for removal of these non-text
regions (Mathworks.com, 2015).
The principle behind this method is the removal of unwanted regions based on
a series of geometric properties that are ideal for distinguishing between text and non-
text areas of an image. The regionprops command is used to measure properties of an
image region. Several properties can be selected for measurement and their statistics
returned; ‘Orientation’ and ‘Area’ for example.
Thresholds are required to be set for each of the properties selected for
measurement. This may be considered one of the more dynamic sections of the
algorithm as these threshold values can be tuned to perform better with different
images. The comparisons against these thresholds produce an index array, which is
then applied to mserRegions, the set of regions returned by the MSER detector, so
that the flagged regions of the image can be removed. This effectively works as a
filter, eliminating those “blobs” within the image that do not conform to certain
characteristics of the image text.
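A rule-based filter of this kind can be sketched as follows. This is a minimal Python illustration rather than the Matlab code used in the project: the region measurements are made-up values standing in for the output of a region-measurement step, only two properties (aspect ratio and extent) are used, and the threshold values are assumptions chosen for the example.

```python
def is_non_text(region, max_aspect_ratio=3.0, min_extent=0.2, max_extent=0.9):
    """Flag a region for removal using simple geometric rules (illustrative thresholds)."""
    width, height = region["width"], region["height"]
    aspect_ratio = width / height
    # Extent: fraction of the bounding box that the region actually fills.
    extent = region["area"] / (width * height)
    return (aspect_ratio > max_aspect_ratio
            or extent < min_extent
            or extent > max_extent)

def filter_regions(regions, **thresholds):
    """Keep only the regions that pass every geometric rule."""
    return [r for r in regions if not is_non_text(r, **thresholds)]

# Hypothetical measurements for three candidate regions.
regions = [
    {"name": "letter", "area": 120, "width": 12, "height": 20},       # plausible character
    {"name": "long bar", "area": 300, "width": 100, "height": 4},     # far too wide
    {"name": "solid block", "area": 396, "width": 20, "height": 20},  # almost fully filled
]
kept = [r["name"] for r in filter_regions(regions)]
```

Tuning the keyword thresholds per image set mirrors the "dynamic" nature of this stage described above.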
3.5 Stroke-Width Thresholding
In an effort to obtain more consistent results a stroke width transform of the MSER
regions is generated and applied to perform filtering and pairing of the connected
components (Chen et al., 2011). The stroke width is computed with the bwdist
command which calculates the Euclidean distance transform of a binary image.
Epshtein, Ofek, & Wexler (2010) designed a method of stroke-width transformation
based on the premise that text characters could be detected from the regions where
stable stroke widths occurred.
The reason for including this approach within a character detection algorithm
is that it can be effectively implemented as a means of reducing background noise.
This is because regions contained within the image are grouped into blocks, having
been further verified as containing properties relating to likely text characters (Yi &
Tian, 2011). For example, the stroke-width of the letter ‘T’ should be identical to the
stroke-width of the letter ‘D’ assuming the text font is the same. However a non-text
region is not likely to share this stroke-width and can therefore be eliminated as a text
region.
Thinning is a method of reducing binary objects in an image to strokes which
are a single pixel wide (R.C. Gonzalez, Woods, & Eddins, 2010). The Matlab
command bwmorph implements this approach with a series of operations including
dilations and erosions. Matlab enables the programmer to set the number of iterations
for which the thinning operation occurs. In fact, the number of iterations can be set to
infinity (inf) indicating that the operation will continue until the image ceases to
change.
The results from the distance transform and the thinning operation are then
combined to provide the stroke width values contained within the image. A
measurement for stroke width is calculated by dividing the standard deviation of the
stroke width values by the mean of the same stroke width values:
Stroke Width Measurement = (Standard Deviation of Stroke Width) / (Mean of Stroke Width)
An index array is computed which identifies those regions of the image with
a greater stroke width measurement value than the value of the predefined stroke
width threshold. Because the strokes of a text character vary little in width, it is
expected that those regions with a greater-than-threshold value will be the non-text
regions of the image. This index is then applied to mserRegions so that these
non-text regions can be removed.
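The stroke width measurement can be worked through with a short Python sketch. It is not the project's Matlab code: the stroke width samples and the threshold of 0.4 are illustrative assumptions, and the sample standard deviation (dividing by N - 1) is used. The sketch follows the usual stroke-width argument that characters have nearly constant stroke width, so a large ratio marks a region as unlikely to be text.

```python
from statistics import mean, stdev

def stroke_width_metric(stroke_widths):
    """Standard deviation of the stroke widths divided by their mean."""
    return stdev(stroke_widths) / mean(stroke_widths)

def is_non_text(stroke_widths, threshold=0.4):
    """True when the stroke width varies more than the (illustrative) threshold allows."""
    return stroke_width_metric(stroke_widths) > threshold

# Hypothetical stroke width samples taken along the skeleton of two regions.
character_like = [4, 4, 5, 4, 4, 5, 4]   # nearly constant stroke width
blob_like = [1, 9, 2, 12, 3, 15, 1]      # widely varying stroke width

character_metric = stroke_width_metric(character_like)
blob_metric = stroke_width_metric(blob_like)
```

The first region yields a small ratio and is retained, while the second yields a ratio close to 1 and would be discarded.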
3.6 Bounding Boxes
Bounding boxes are often employed within image processing applications as a
method of making some sense from the data obtained. Examples of the use of
bounding boxes include collision detection as applied to computer graphics and
animation (Yao Wang, Hu, Fan, Zhang, & Zhang, 2012), and the segmentation of
hand-written Chinese characters which are prone to overlap (Tseng & Chen, 1998). In
a typical text recognition system it is essential that the OCR engine is able to return
complete words or paragraphs, rather than a list of the individual characters acquired.
To help ensure that order is maintained so that the correct registration can be
obtained from an input image, bounding boxes are used to amalgamate the individual
character regions into lines of text (Mathworks.com, 2015). These bounding boxes
surround each of the individual text regions and can be expanded to overlap with each
other, thus forming a chain of overlapping boxes which are used to form complete
words, or in this case a vehicle registration number.
Bounding boxes are obtained for all the regions of interest remaining from the
image by concatenating the MSER properties for bounding boxes previously obtained
with the regionprops command. These bounding boxes are then expanded in line with
the theory of having characters overlap with their neighbours. A value for the level of
expansion, or Expansion Coefficient (E.C.) is entered into the algorithm and is used to
set new limits for the x and y axes of the bounding boxes. Minimum and maximum
axis values for the expanded bounding boxes are calculated in the following manner:
xmin = (1 - E.C.) × xmin
xmax = (1 + E.C.) × xmax
Prudence dictates the minor precaution of ensuring that the expanded bounding boxes
do not exceed the outer limits of the image. This is achieved by comparing the
maximum axis limits calculated from the expansion coefficient with the axis limits
defined by the size of the image. The new axis limits of the bounding boxes are then
taken as the minimum value from this comparison. For the x axis, this is
implemented in Matlab in the following fashion:
xmax = min(xmax, size(I,2))
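The expansion and clipping steps can be combined in a short sketch. The Python function below is an illustration under stated assumptions rather than the project code: boxes are given as (xmin, ymin, xmax, ymax) in 1-based pixel coordinates as in Matlab, the same coefficient is applied to both axes, and the coefficient of 0.25 is chosen only to keep the arithmetic exact.

```python
def expand_box(box, expansion, image_width, image_height):
    """Expand (xmin, ymin, xmax, ymax) by the expansion coefficient and clip to the image.

    Coordinates are 1-based, so the smallest valid value on either axis is 1.
    """
    xmin, ymin, xmax, ymax = box
    xmin = (1 - expansion) * xmin
    ymin = (1 - expansion) * ymin
    xmax = (1 + expansion) * xmax
    ymax = (1 + expansion) * ymax
    # Clip to the image bounds; the xmax line mirrors xmax = min(xmax, size(I,2)).
    xmin = max(xmin, 1)
    ymin = max(ymin, 1)
    xmax = min(xmax, image_width)
    ymax = min(ymax, image_height)
    return (xmin, ymin, xmax, ymax)

# Hypothetical box near the right-hand edge of a 240x480 image.
expanded = expand_box((100, 50, 200, 80), 0.25, image_width=240, image_height=480)
```

Here the unclipped right edge would be 250, so it is limited to the image width of 240, while the other three edges are expanded freely.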
Overlapping bounding boxes can be combined to form a single box around
multiple characters. This method of merging overlapping bounding box components
to make a single component has been used in the processing and segmentation of
ancient historical documents (Kavitha, Shivakumara, Kumar, & Lu). However, it is
most often implemented to distinguish between separate words. In the case of a
vehicle registration there are two likely outcomes: either the entire registration will be
surrounded by a single bounding box or the two distinct sections of the registration
will be surrounded by separate bounding boxes. The effect on the overall result of one
of these events occurring over the other is negligible.
An overlap-ratio is applied to quantify the distance between each of the text
regions detected in the image. Matlab provides a function for this purpose which is
activated with the bboxOverlapRatio command. The function returns the overlap
ratio between each pair of bounding boxes contained within the image. Those
characters with non-zero overlap ratios are considered to be connected in the context
of the bounding box and are therefore likely to exist as part of the same line of text.
Any characters with zero overlap ratios are not connected and are thus considered as
separate sections within the text image.
A graph of these overlap ratios is generated in Matlab to determine which
regions are connected for the purpose of merging them into a single text region.
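The merging step can be sketched in Python as follows. This is a simplified stand-in for the Matlab functions named above, not a reproduction of them: boxes are assumed to be (x, y, width, height) tuples, the overlap ratio is taken as intersection area divided by union area, and a small union-find structure plays the role of the graph of connected regions.

```python
def overlap_ratio(a, b):
    """Intersection area divided by union area for boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def merge_overlapping(boxes):
    """Group boxes linked by a non-zero overlap ratio; return one enclosing box per group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlap_ratio(boxes[i], boxes[j]) > 0:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    merged = []
    for members in groups.values():
        x1 = min(x for x, _, _, _ in members)
        y1 = min(y for _, y, _, _ in members)
        x2 = max(x + w for x, _, w, _ in members)
        y2 = max(y + h for _, y, _, h in members)
        merged.append((x1, y1, x2 - x1, y2 - y1))
    return sorted(merged)

# Hypothetical expanded character boxes: three chained neighbours and one isolated box.
boxes = [(10, 10, 20, 30), (25, 10, 20, 30), (40, 10, 20, 30), (200, 10, 20, 30)]
merged = merge_overlapping(boxes)
```

The first and third boxes do not overlap directly, but both overlap the second, so all three end up in one merged region while the isolated box is left on its own.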
3.7 Optical Character Recognition
The OCR function is applied to the area of the filtered version of the input image
encompassed by the merged bounding boxes. By defining the area of the image upon
which the OCR function is going to operate a more consistent detection performance
is expected.
Use of the Matlab command txt = ocr(I, roi) enables recognition of text in the
image (I) within a specified region of interest (roi). The region of interest is the area
defined by the bounding boxes generated in the detection algorithm and must take the
form of one or more rectangular regions defined by an m-by-4 matrix. The position,
width and height of each region of interest are determined by the x and y coordinates
established for the bounding boxes, and the regions must not extend beyond the area
of the image.
Almost all of the commercially available OCR functions are designed to operate
on binary images and Matlab is one of these. Matlab’s OCR function uses Otsu’s
method of thresholding (Otsu, 1979) to convert an input image into a binary
equivalent before the recognition process is implemented. Otsu’s method has been
demonstrated to exhibit better overall performance in OCR than other techniques (M.
R. Gupta et al., 2007).
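The principle of Otsu's method, choosing the grey level that maximises the variance between the two resulting classes, can be shown with a compact Python sketch. This is a textbook-style illustration operating on a flat list of 8-bit pixel values; it is not Matlab's internal implementation, which may differ in detail, and the sample pixel values are invented for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    """
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        if weight_low == 0:
            continue                      # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break                         # every pixel is now in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold

# Hypothetical bimodal image: dark character pixels and bright plate background.
pixels = [10] * 6 + [12] * 4 + [200] * 5 + [210] * 5
threshold = otsu_threshold(pixels)
binary = [1 if p > threshold else 0 for p in pixels]
```

With these values the threshold falls between the dark and bright clusters, so the resulting binary list separates the ten dark pixels from the ten bright ones.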
Modern OCR algorithms like the one employed by Matlab incorporate multiple
neural network algorithms to analyse the character stroke edge. This stroke
edge is effectively the collision point between the concentration of character pixels
and the background image. The algorithm takes averages of the black and white
values along the edge of each character. The result is then matched to the characters
contained in the dataset and the closest estimation is selected as the output character
(Potocnik & Zadnik, 2016).
When the OCR algorithm has completed the recognition process the results are
printed in the Matlab command line with the following entry: [txt.Text]. Should the
user require information on the properties of the OCR output, the ocrText object
returned by the function contains the recognised text and metadata collected during
optical character recognition (Mathworks.com, 2016b). However, some of these
features are not available with the student edition of Matlab used during this project.
3.8 Common Issues with Text Detection
There are many difficulties associated with character recognition in scene images.
Typically there is a significant amount of inter-character and intra-character confusion
leading to mistaken identification (Mishra, Alahari, & Jawahar, 2016). For instance,
partial capture of a character can result in it being recognised as a completely different
alpha-numeric digit.
Extraction of text regions from natural scene images is a challenging task due to
the many factors influencing the quality of detection. These factors include variation
in light intensity, alignment of text, colour, font-size and camera angles (Zhang et al.,
2013). Some text components do not display a high level of colour contrast and fail to
be detected with MSER (Liu et al., 2016). This is one of the weaknesses associated
with implementation of an MSER methodology. However, under favourable lighting
conditions this should not pose a substantial problem, as vehicle registration plates have
reasonably distinct contrast between character and background regions.
4 Methodology
4.1 Introduction
Implementation of a system which will acquire an image, process it and provide
automated indication of the success or failure of the operation requires an elaborate
methodology to incorporate each of the individual components into one design. This
section documents how the complete system has been put together.
There are two distinct sections covered in this methodology section. The first is
the hardware element of the project containing all the communications involved,
including image acquisition and an automated response. The second is the procedure
implemented for processing of the acquired image and extraction of the desired text
region.
The overall system design is discussed, providing insight into how the various
components are expected to interact. The selection of the Raspberry Pi 2 Model B and
the specifications of this microcontroller which lend themselves to the application are
documented before the additional circuitry required of the system is discussed with
particular regard to the design of a PCB.
Following the hardware description of the project a detailed description of the
image processing algorithm is provided. This description discusses each of the major
sections of the algorithm individually, highlighting the effects of each technique on a
given input image. As stated previously, this image processing algorithm is the
technical focus of the project and the level of detail reflects this.
The concluding paragraphs of this methodology section are intended to provide
the reader with information on how the extracted number plate text is compared to an
existing text string and how this comparison is used to provide indication of
recognition.
4.2 System Design
Figure 2 is a flowchart depicting how the overall system to be implemented has been
conceived.
Figure 2 System Design Flowchart
Having framed the project in the context of a system for extracting the digits from a
vehicle registration plate and using these to produce an automated response, each of
the six steps in the flowchart is an essential element in this process.
Beginning with image acquisition, it is conceived that some form of wireless
sensor network will be used to trigger a camera to capture an image. This may be
something analogous to an infrared (IR) transmitter/receiver circuit. In this case an IR
sensor would be positioned to allow an incoming vehicle to break the beam and
consequently cause the camera module to acquire the image.
In order to facilitate image acquisition that would be automated in this way a
microcontroller will act as the central node in the system. Some of the
microcontrollers with potential for selection are documented in the literature review
section.
The second stage of the system design is entitled ‘Image transmission’. The
concept behind this title is that the acquired image will be transmitted wirelessly to a
laptop or PC for image processing and character recognition. The central
microcontroller must be equipped with a wireless protocol such as Wifi or Bluetooth
and be capable of transmitting the data in this way.
The third and fourth stages of the system design are Image Processing and
Optical Character Recognition. Matlab has been selected as the software platform for
implementing this process for several reasons, including its Image Processing
Toolbox and OCR engine. The image processing stage of the system will incorporate
a series of steps designed to provide the best possible image for the OCR function to
operate on. This will involve some of the methods mentioned in the introduction and
literature review of this document. The OCR function is used to produce a text string
output which is expected to match the characters present in the input image.
[Figure 2 flowchart stages: Image Acquisition, Image Transmission, Pre-Processing,
OCR, Results Comparison, Automated Response]
The result from the OCR function will then be used alongside some existing
database of vehicle registration numbers in a comparison function which will
determine whether or not the character string obtained is one of the registrations
expected. Finally, the result from the comparison function, which will be a Boolean 1
or 0, will be used to initiate an automated response, tailored to each condition.
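A comparison function of this kind can be sketched as follows. The Python below is only an illustration of the idea, not the function written for the project: the registration strings are invented, and the normalisation step (upper-casing and discarding spaces, hyphens and newlines) is an assumption about how OCR output might reasonably be cleaned before matching.

```python
def normalise(plate):
    """Upper-case the text and drop everything except letters and digits."""
    return "".join(ch for ch in plate.upper() if ch.isalnum())

def is_expected(ocr_text, expected_plates):
    """Return 1 if the recognised text matches a stored registration, otherwise 0."""
    expected = {normalise(p) for p in expected_plates}
    return 1 if normalise(ocr_text) in expected else 0

# Hypothetical stored registrations and OCR outputs.
database = ["12-D-12345", "161-LH-789"]
match = is_expected("12 d 12345\n", database)
no_match = is_expected("99-G-1", database)
```

The returned 1 or 0 could then drive the two branches of the automated response.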
4.3 Hardware Specification – Raspberry Pi 2 Model B
The Raspberry Pi 2 Model B is the second generation of the Raspberry Pi
microcontroller and has been selected as the central device for this project. The device
offers a flexible format for embedded projects, particularly those requiring low power
(Raspberrypi.org, 2016b). There are several features of the Raspberry Pi that make it
an ideal candidate for selection in this project. Central to this is the 900MHz quad-
core ARM Cortex-A7 CPU with 1GB of RAM. According to Arm.com (2016), the
Cortex A7 is the “most power-efficient multi-core processor.” This becomes a
particularly important factor when considering the sustainability of a given system or
product.
The Cortex-A7 is designed to run at 1.2-1.6GHz while requiring
less than 100mW of total power in typical conditions (Arm.com, 2016). The low-
power and high performance of the Raspberry Pi has led to its implementation in
many power-critical projects. Tomar & Bhatia (2015) employ the Pi as the central
device in development of a Software Defined Radio (SDR) for use in disaster affected
areas. Wireless sensor networks are a common application for this type of
microcontroller and the Raspberry Pi compares favourably with devices such as the
Arduino Uno (Ferdoush & Li, 2014).
Additional features of the Raspberry Pi making it a suitable device for the type of
application considered for this project include the following (Raspberrypi.org,
2016b):
• 4 USB ports
• 40 GPIO ports
• Full HDMI port
• Ethernet port
• Camera interface
• Display interface
• MicroSD card slot
• VideoCore IV 3D graphics core
The camera interface included in the list above enables the user to connect the
custom-designed camera add-on module for Raspberry Pi hardware (Mathworks.com, 2016d).
This small and lightweight device supports both still capture and video mode, making
it ideal for mobile projects. The camera has a 5 megapixel native resolution for still
capture and supports 1080p30 and 720p60 video modes.
The Raspberry Pi camera module is popular in home security applications and
wildlife camera traps and is often used for time-lapse and slow motion imaging
(Raspberrypi.org, 2016a).
The GPIO pins on the Raspberry Pi model B are an essential element in its use
as the central node of a system as they facilitate connection with external electronic
circuitry and sensors (Vujović & Maksimović, 2015). These pins can be programmed
to act as inputs or outputs as required. With particular
reference to this project, input pins can be used to monitor the status of switches
or sensors which can be implemented as triggers for other components of the system.
The pin layout can be seen in Figure 3, taken from element14.com and
included below:
Figure 3 Raspberry Pi Pin Layout
As shown in the pin diagram, the Raspberry Pi model B is equipped with several
DC power lines which can be used as a power source for external circuitry. In terms
of portability and using the microcontroller remotely this is a powerful feature as it
eliminates the necessity for further external power supplies which may otherwise be
required.
The facility to integrate a wireless network, database server and web server into
a single compact, low-power computer, which can be configured to run without a
monitor, keyboard or mouse is a major advantage when working with the Raspberry
Pi (Ferdoush & Li, 2014). This became a particularly important feature for use in this
project as the Pi could be controlled remotely following initial setup. As the system
was developed and became more refined, the wireless element grew in importance,
not only as a means of data transmission but as a method for implementing overall
control. For this reason the selection of the Raspberry Pi for the hardware
requirements of the project proved correct.
There are several options for powering the Raspberry Pi with the condition
that the source is able to provide enough current to the device (Vujović &
Maksimović, 2015). The device is powered by 5V from a micro-USB connector;
however the current requirements differ for each model of the device and depend on
the number of connections drawing power from the microcontroller. For the model
being used in this case (2B), a PSU current capacity of 1.8Amps is recommended
(Raspberrypi.org, 2016c).
With a device such as the Raspberry Pi acting as the central node of a system
like this one, there is a possibility that an excessive number of parasitic devices may
be connected and drawing more current than the Pi can supply. It is therefore essential
that the number of connected devices and components is kept to the minimum
required. Typical connections to the Raspberry Pi including HDMI cable, keyboard
and mouse require between 50mA and several hundred milliamps of current
(Raspberrypi.org, 2016c) and the camera module being used here requires a
significant draw of 250mA. Those external devices are required during the testing and
prototyping stages of this project. However, due to the specification of the system
some of these current drawing devices are not required for the final construction. With
remote connectivity there is no need for GUI-related connections to the Raspberry Pi,
thus relieving the power-burden on the device somewhat.
4.4 Wireless Network
The system design specifies that some form of wireless network is used for
communication between the microcontroller and the computer containing Matlab.
Wifi has been selected as the protocol for this purpose and there are several reasons
behind this decision.
The prevalence of Wifi in commercial and academic premises makes it an
easily accessible resource for implementation of this system. Wifi also enables
greater range than could be provided by a single Bluetooth device. The use of Wifi for
transmission of the acquired image is not a major concern as only one picture is being
sent at any one time.
The simple fact that Matlab is able to communicate directly with the
Raspberry Pi by forming a connection via the device’s IP address made the selection of
Wifi a certainty. An IP address along with a username and password for the
Raspberry Pi is all that is required to enable remote control of the device from Matlab.
The choice of Wifi as the network model may have been premature with regard
to experimental testing of the system due to the intermittent coverage in the lab
setting. This issue is discussed further in the section of this paper relating to testing.
4.5 PCB Design and Manufacture
As the primary area of investigation and experimentation undertaken for this project is
the image processing and character recognition elements, a model for some of the
hardware requirements is necessary to ensure effective use of the time allocated. The
use of modelling particularly relates to the inputs and output of the system i.e. the
initial triggering of acquisition and the automated response.
As the initial triggering of the camera module is premised on a traditional IR
sensor, a simple push-button switch can be used to model this action. In a real-world
scenario the automated response of the system may be used as a means of enabling or
restricting access to a parking facility or even alerting an operator. The Raspberry Pi
GPIO pins can be utilised to initiate an automated response and in this case the use of
two LEDs has been chosen as the method for affirming the results of the overall
system. A green LED will denote positive detection and recognition while a red LED
will signify the corresponding negative result.
Initial testing of the system in this configuration required that construction of a
circuit be carried out on a breadboard. Once successfully tested and a final design
settled upon, this circuit could be designed and constructed as a Printed Circuit Board
(PCB).
The circuit design incorporated two push-button switches: one to simulate the arrival
of a vehicle at the position where image acquisition takes place and a second to
simulate the end of the operation and system reset. These two switches are connected
to one of the Raspberry Pi GPIO pins which will be polled for a change in state.
The two LEDs being used to simulate the system’s output response are also
connected to GPIO pins on the Raspberry Pi. Due to a lack of intensity experienced
while testing with the LEDs two NPN transistors have been included in the circuit to
enable extra current to be driven to the LEDs.
The design of the circuit can be seen in the appendix and the PCB design in
Figure 4. Both the schematic and PCB layout have been drawn in Proteus.
Figure 4 PCB Design
4.6 Software Structure
To implement the system on Matlab several important requirements of the design
specification must be met. This requires a systematic approach to development of the
program to ensure that none of the critical stages are overlooked. Figure 5 is a
flowchart depicting the various stages of the software design as it has been
programmed in Matlab.
Figure 5 Software Design Flowchart
The first critical objective of the program is to connect with the Raspberry Pi device
and take control of the onboard camera module. At this point the external LEDs are
set to ‘0’ to ensure they are not considered as false positives.
The program then ‘polls’ the appropriate GPIO pin which is connected to the
switch being used to trigger image acquisition. This ‘polling’ effectively sees the
system wait for this switch to be pressed before any other action can begin. Figure 6
shows how this has been implemented in two simple lines of code.
Figure 6 Polling a Switch
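The polling behaviour described above can be sketched independently of the Matlab code in Figure 6. The following Python fragment is only an illustration and makes two assumptions: `read_pin` is a hypothetical stand-in for whatever call reads the GPIO pin, and the poll interval is an arbitrary value.

```python
import time


def wait_for_press(read_pin, poll_interval=0.05):
    """Block until read_pin() reports a pressed switch (a truthy value).

    read_pin is a hypothetical callable standing in for the real GPIO read.
    Returns the number of reads that were needed.
    """
    polls = 1
    while not read_pin():
        time.sleep(poll_interval)  # pause between reads instead of busy-waiting
        polls += 1
    return polls


# Simulated switch: released for three reads, then pressed.
states = iter([0, 0, 0, 1])
print(wait_for_press(lambda: next(states), poll_interval=0))
```

Nothing else in the program can run while this loop is waiting, which is the behaviour the text describes: image acquisition only begins once the switch changes state.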
When the switch is pressed the Raspberry Pi camera module acquires an image and
transmits it to Matlab on a laptop. The image is saved into the associated Matlab
folder and passed to the image processing algorithm.
The image is converted to the grayscale format before it is processed through
each of the stages discussed in the theory section. The method employed for applying
these techniques in Matlab is documented in the following sections.
4.7 MSER Regions
As stated previously in the theory section, the initial phase of image processing is
application of the MSER technique. The Matlab command detectMSERFeatures,
seen in Figure 7, returns information on region pixel lists and is used to determine the
‘blob’ regions in an image.
Figure 7 Matlab MSER command
The lines of code in Figure 7 show how this command is implemented in the program.
There are a number of parameter values associated with the command, allowing the
user to determine certain ranges depending on their application. The
‘RegionAreaRange’ parameter sets the permitted size of the detected regions in pixels
and defaults to the range of 30 to 14,000. In Matlab’s user guide the ‘ThresholdDelta’
value is stated as a method for specifying the step between the threshold intensity
levels used in selecting extremal regions while testing for their stability. Put simply, a
greater parameter value will return fewer regions of interest.
The parameter values seen in Figure 7 have generally been used throughout
testing but they can be adjusted and additional parameters included if required.
The image in Figure 8 has been operated on by the MSER technique discussed
and exhibits all the potential text regions detected. Due to the relatively wide scope of
this image a large number of MSER regions have been returned. The weakness of
this technique is obvious from this image as the number of non-text regions
identified vastly outnumbers the text regions.
Figure 8 MSER Example Result
4.8 Regionprops
The Matlab command regionprops is used to remove MSER regions
based on geometric properties. Data from the MSER regions must be converted to
linear indices so that it can be operated on with regionprops. The regionprops
command then measures and returns statistics for the MSER regions
previously identified. There are numerous property types which can be used for
geometric thresholding and selection may depend on the requirements of an
application. Those properties selected in this instance are included in the section of
Matlab code in Figure 9.
Figure 9 Geometric Properties Thresholds
The ‘Extent’ parameter, for example, returns a value that specifies the ratio of pixels in
the region to pixels in the total bounding box (Mathworks.com, 2016c). A threshold
range for this property is set, as in Figure 9, removing MSER regions based on these
criteria. ‘Eccentricity’, ‘Solidity’ and ‘Euler Number’ are each calculated in a similar
manner using the regionprops command.
The ‘Aspect Ratio’ is calculated as the ratio of the width of the image area to
its height. Information is extracted from the bounding box regions of the image using
regionprops and the ratio calculated as the width divided by the height. A threshold is
applied in the same way as with the other geometric properties.
MSER regions that fail any of these thresholds are removed from the image. It is
anticipated that this would result in a significant
reduction in the number of those non-text regions present in an image like the one
seen in Figure 8.
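The geometric test can be illustrated with a short sketch. It is written in Python rather than Matlab and is not the code from Figure 9: the `Region` structure and the two sample regions are hypothetical, the threshold values are those listed in Table 1, and the extent test is read as "below 0.2 or above 0.9".

```python
from dataclasses import dataclass


@dataclass
class Region:
    width: int           # bounding-box width in pixels
    height: int          # bounding-box height in pixels
    area: int            # number of pixels belonging to the region
    eccentricity: float
    solidity: float
    euler_number: int


def is_non_text(r):
    """Return True if the region fails any geometric test and should be removed."""
    aspect_ratio = r.width / r.height
    extent = r.area / (r.width * r.height)  # region pixels / bounding-box pixels
    return (
        aspect_ratio > 3
        or r.eccentricity > 0.995
        or r.solidity < 0.3
        or extent < 0.2
        or extent > 0.9
        or r.euler_number < -4
    )


# Hypothetical regions: a character-like blob and a long thin streak.
letter = Region(width=20, height=40, area=400, eccentricity=0.90,
                solidity=0.60, euler_number=0)
streak = Region(width=200, height=10, area=1900, eccentricity=0.999,
                solidity=0.95, euler_number=1)
print(is_non_text(letter), is_non_text(streak))
```

A region is discarded as soon as one property falls outside its range, so loosening a single threshold can only increase the number of regions passed on to the next stage.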
4.9 Stroke-Width Variation
A series of steps are required for analysis of the stroke-width of the region images
before a threshold can be used to eliminate the remaining non-text regions. The
padarray command is used to ‘zero pad’ the image region, effectively encasing it in a
number of zeroes along its edge. This is to avoid corruption due to boundary effects
which can occur as a result of filtering (stackexchange.com, 2016).
In Matlab the bwdist command is used to calculate the distance transform of a
binary image. For each pixel, this function calculates the distance between that
pixel and the nearest non-zero pixel. Morphological thinning is then applied to the
image to remove some of its foreground pixels. This is commonly known as
skeletonisation and produces a drastically thinned image which retains the
connectivity and form of the original.
The results of the distance calculation and the thinning operation are combined
to determine the stroke-width values present in the image. The standard deviation and
the mean of the values are used in a calculation to determine a stroke-width
measurement. This measurement is then used along with a threshold value with the
intention of removing all remaining non-text regions. The section of Matlab code used
to determine the stroke-width measurement and threshold is included in Figure 10.
Figure 10 Stroke-Width Thresholding
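As a rough illustration of the final calculation, the sketch below computes the stroke-width measurement as the standard deviation of the stroke widths divided by their mean, which is how the measurement is defined in the results section. It is written in Python, not taken from Figure 10, and rests on stated assumptions: the stroke-width lists are invented, the sample standard deviation is used, and a region is treated as non-text when its measurement exceeds the threshold.

```python
from statistics import mean, stdev


def stroke_width_metric(stroke_widths):
    """Sample standard deviation of the stroke widths divided by their mean."""
    return stdev(stroke_widths) / mean(stroke_widths)


def is_non_text(stroke_widths, threshold=0.4):
    """Flag a region whose stroke widths vary more than the threshold allows."""
    return stroke_width_metric(stroke_widths) > threshold


# Hypothetical stroke widths sampled along each region's skeleton.
character = [4, 4, 5, 4, 5, 4]   # nearly uniform strokes
blob = [1, 9, 2, 12, 3, 15]      # widely varying widths
print(is_non_text(character), is_non_text(blob))
```

Text characters tend to be drawn with strokes of similar width, so their measurement stays small, while irregular background shapes produce a large spread relative to the mean.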
4.10 Bounding Boxes
As stated in the theory section, bounding boxes are used to bring form to the data
present in the image. Matlab is equipped with considerable functionality for applying
bounding boxes and the process is initiated by determining bounding boxes for each
of the remaining text regions. These bounding boxes can be expanded slightly to help
ensure overlap between connected components. This is achieved by applying a small
expansion amount to the bounding boxes and is an important feature in determining the
structure of a text string returned from the OCR function. The effect of varying the
expansion amount is discussed in greater detail in the results section. Figure 11 shows
the effect of applying bounding boxes to each of the character regions and an
overlap is clearly visible among the components.
Figure 11 Bounding Boxes
A bounding box overlap ratio is calculated and used to build a graph so that connected
regions within the image can be identified. These connected components are then
merged together based on a non-zero overlap ratio to form a text string or word. In
Figure 12 the lines of code remove the bounding boxes that only contain single
components and the text region presented to the OCR function is displayed in Figure
13.
Figure 12 Merging of Bounding Boxes
The example in Figure 13 is an ideal scenario as the entire number plate has been
identified as a single text string, making future comparison with stored registrations
much simpler.
Figure 13 Merged Bounding Boxes
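The expand, overlap and merge steps can be sketched as follows. This is a simplified Python illustration, not the Matlab code in Figure 12: the box coordinates are invented, the expansion is assumed to be a fraction of each box's own size, and a small union-find structure stands in for the graph of overlapping boxes.

```python
def expand(box, amount):
    """Grow an (x, y, w, h) box by a fraction of its size on every side."""
    x, y, w, h = box
    return (x - amount * w, y - amount * h,
            w * (1 + 2 * amount), h * (1 + 2 * amount))


def overlaps(a, b):
    """True if two (x, y, w, h) boxes share a non-zero area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (min(ax + aw, bx + bw) > max(ax, bx)
            and min(ay + ah, by + bh) > max(ay, by))


def merge_boxes(boxes, amount):
    """Expand the boxes, group those that overlap, return one box per group."""
    grown = [expand(b, amount) for b in boxes]
    parent = list(range(len(grown)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(grown)):
        for j in range(i + 1, len(grown)):
            if overlaps(grown[i], grown[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, (x, y, w, h) in enumerate(grown):
        x0, y0, x1, y1 = groups.get(find(i), (x, y, x + w, y + h))
        groups[find(i)] = (min(x0, x), min(y0, y),
                           max(x1, x + w), max(y1, y + h))
    return sorted(groups.values())


# Hypothetical character boxes: three letters, a wider gap, then four digits.
chars = [(0, 0, 10, 20), (12, 0, 10, 20), (24, 0, 10, 20),
         (42, 0, 10, 20), (54, 0, 10, 20), (66, 0, 10, 20), (78, 0, 10, 20)]
print(len(merge_boxes(chars, 0.15)))  # small expansion amount
print(len(merge_boxes(chars, 0.45)))  # larger expansion amount
```

With these invented coordinates the smaller expansion leaves the letters and digits as separate groups, while the larger one bridges the central gap so that the whole plate becomes a single text region.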
4.11 OCR Function
The OCR function used for this project is an existing function in Matlab, the
operation of which has been discussed in the theory section of this document. In terms
of the methodology employed for using this function the process is a simple matter of
applying the text image, appropriately processed, with the correctly merged bounding
boxes to the OCR command. The section of code in Figure 14 depicts how this is
accomplished.
Figure 14 OCR function code
The result of the OCR function is a text string of the recognised alphanumeric digits
printed in the Matlab command line.
4.12 String Comparison
For the digits recovered from a vehicle registration plate to be relevant in any kind of
automated system a method is required for comparing them with what is expected or
required. In an operational, fully-automated system the alphanumeric digits extracted
from an image may be compared with an extensive database of all registration
numbers cleared for access. The result of the comparison would be a simple positive
or negative, resulting in action or inaction. This seems to be a relatively simple
procedure but due to the various data types and array structures present in Matlab, a
certain amount of manipulation is required to implement direct comparison.
Matlab is equipped with a function for comparing strings, called as
strcmp(A,B), which returns true (1) or false (0) depending on whether or not the
strings match. It is important to note that the data operated on within this function
must be of the same type. Therefore the 1x10 character array generated from the OCR
function cannot be directly compared with the string entered as the expected
registration digits.
The solution to the problem of comparing different data types is to convert
them both to a mutual type. This requires the use of the cellstr(S) function in Matlab
which facilitates the creation of a cell array of strings from any character array. A cell
array in Matlab is one whose elements are cells. Each cell in a cell array can hold any
Matlab data type including numerical arrays, character strings, symbolic objects and
structures (Hanselman & Littlefield, 2001).
Taking the example of a vehicle registration plate accurately detected by the
OCR engine as ‘XJZ 7743’, the answer returned is a 1x10 character array and is
stored in Matlab as such. The expected string, entered as B = 'XJZ 7743', is stored in
Matlab simply as the value ‘XJZ 7743’. Comparison of these two results with the string
compare function returns a ‘0’ as the values are in different formats.
This error is overcome by creating two cell arrays from the stated values. The
lines of code in Figure 15 below show the method for comparing these two cell
arrays:
Figure 15 Cell Arrays
B is the manually entered string to provide comparison with; A is the output of the
OCR function converted to a character array; D is this OCR output generated as a 1x1
cell array; E is the string generated as a 1x1 cell array. F is the result of comparison
between the cell arrays D and E using the string compare function.
In this example the output of the string comparison function, F, is equal to ‘1’.
This provides positive confirmation of a match which can then be implemented as a
condition for the execution of an automated response function.
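The same idea, bringing both values into a common form before comparing them, can be sketched in Python. This is an illustration only and not the Matlab code in Figure 15. It assumes that the two extra characters in the 1x10 OCR output are trailing line breaks, which the text does not state, and the list of cleared registrations is hypothetical.

```python
def normalise(text):
    """Strip surrounding whitespace and use a single case for comparison."""
    return text.strip().upper()


def is_cleared(ocr_text, cleared_plates):
    """Return True if the recognised plate is among the cleared registrations."""
    cleared = {normalise(p) for p in cleared_plates}
    return normalise(ocr_text) in cleared


# Assumed OCR output: the eight plate characters plus two trailing line breaks.
ocr_output = "XJZ 7743\n\n"
print(len(ocr_output), ocr_output == "XJZ 7743")
print(is_cleared(ocr_output, ["XJZ 7743"]))
print(is_cleared(ocr_output, ["ABC 1234"]))
```

A direct comparison of the raw values fails even though the plate was read correctly, whereas the normalised comparison succeeds. Checking against a set also extends naturally to the database of cleared registrations mentioned above.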
4.13 Indication of Recognition
In order to indicate whether or not the system has produced a positive match and to
represent an automated response to this match, some form of output from the system
would be required. The basic premise of this function, as stated previously in this
report, is to enable or block entry to a parking facility and to alert an operator when
this is considered necessary.
The Raspberry Pi provides a suitable platform for this purpose as its GPIO
pins can be implemented to trigger an external response to the system inputs. In real-
world applications this output may be tailored to meet the specific requirements of a
given system. For example, a servo motor may be triggered to raise a barrier or an
alarm sounded to alert a system operator. In this case a simple LED can be used for
simulation and testing of the efficacy of the Matlab code and external circuitry.
In Matlab code an ‘if’ statement can be used to determine a response which is
dependent upon the presence of a specified input condition(s). For instance, it may be
used to implement a certain set of conditions when the ‘if’ statement is ‘true’,
otherwise the status-quo persists. Alternatively it could be used to determine an output
based on several potential input conditions, determining the required output upon the
presence of a given condition.
For testing the output of the system the input condition is provided for by the
result of the string compare function discussed in the previous section. Therefore the
code could be compiled to trigger some form of response when the output of the
string compare function (F) is equal to ‘1’. In cases where the OCR function is unable
to determine a positive match F is equal to ‘0’, in which case the system can be
configured to produce no response at all or an alternative response such as a red light
to indicate that the comparison is negative.
A section of code containing the ‘if’ statement is inserted in Figure 16 below.
Figure 16 'if' statement in Matlab
The initial condition to this section is ‘F == 1’. When this condition is met due to a
positive match from the OCR function and the string compare function, a digital
output pin on the Raspberry Pi is sent HIGH. An ‘else’ statement is included to ensure
that should this condition not be met the digital output pin will remain LOW.
To provide visual indication of a positive or negative match an external LED is
connected to the relevant output pin via a transistor. The transistor is required to
provide enough current so that the LED is easily visible. The current provided from
the GPIO pins on the Raspberry Pi is insufficient for this purpose.
In the event of a positive match, i.e. F==1, the green LED is switched on. When
the code is run and the result is a negative match, the red LED is switched
on.
A single iteration of the system is completed when the second push-button switch
is pressed, as this switches off all external LEDs, closes all open figures, deletes the
input image and exits the while loop.
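The decision logic of this section reduces to a small mapping from the comparison result F to the two LED outputs. The Python sketch below is illustrative and not the Matlab code in Figure 16: the function names and the dictionary of pin levels are hypothetical, with 1 standing for HIGH and 0 for LOW.

```python
def led_states(match):
    """Map the comparison result F to the desired LED pin levels."""
    if match == 1:
        return {"green": 1, "red": 0}   # positive match
    else:
        return {"green": 0, "red": 1}   # negative match


def reset():
    """Pin levels restored when the second push-button ends the iteration."""
    return {"green": 0, "red": 0}


print(led_states(1))
print(led_states(0))
print(reset())
```

Keeping the decision separate from the pin writes means the same result could later drive a barrier motor or an alarm instead of an LED, as suggested earlier in this section.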
4.14 Graphical User Interface
For the purpose of improving the utility of the character recognition algorithm a
Graphical User Interface (GUI) has been developed in Matlab. The software provides
tools for creation of the GUI and facilitates inclusion of push-buttons, graphs and text
etc. Among the virtues of using a GUI in Matlab is that it can disguise a vast and
complex program code behind an easy to use interface. The GUI created for this
system can be viewed in Figure 17.
Figure 17 Graphical User Interface
The image in Figure 17 depicts the GUI following acquisition of an image. Text
regions and the detected text are displayed on the screen, which may be useful to a
system operator. The GUI also enables the user to establish connection with the
remote device and to manually override the system. Finally the Raspberry Pi can be
shut down remotely by pressing the associated push-button on the GUI.
5 Experimental Testing
5.1 Initial Testing
Testing has been carried out on the various components of this project throughout the
duration of the academic year. Having considerable elements of both hardware and
software, testing required a modular methodology to ensure that each part of the
system worked correctly before it could be integrated within the overall design.
Testing of the image processing and character recognition aspect of the project
required considerable experimentation to ascertain the effectiveness of various
components and to understand the reasons for disappointing results. Clearly not all of
these tests can be documented in the results section but a detailed overview and
analysis of the work undertaken is provided.
Many of the experiments employed previously acquired images of vehicles
taken from different angles and distances to provide an adequate range of complexity.
These can then be used to run experiments on the image processing algorithm without
having to include the communications element of the project. This testing
methodology led to several instances of successful recognition, enabling the project to
progress towards integration of a complete system.
As well as applying different images to the processing algorithm, tests included
making changes to certain elements of the program to observe the results and use the
information to refine the process. Results from this type of testing are provided in the
Complex Detection section.
Initial testing of the hardware elements of the system has been carried out by
interfacing the Raspberry Pi with a circuit constructed on a breadboard. These tests
were designed to determine the effectiveness of the switches and LEDs for modelling
the acquisition trigger and the automated response. These initial test results proved
successful, enabling work to proceed on the PCB design and manufacture.
Running concurrently with the breadboard testing was testing of the wireless
communication between the Raspberry Pi and a laptop computer. Simple testing such
as programming LEDs to flash progressed on to more complex tasks such as
transmitting an image from the Raspberry Pi to the laptop.
5.2 Complete System Test
Complete system testing combined each of the component elements of the project to
determine whether or not it would operate as expected. This process has not
proceeded as smoothly as anticipated although it has provided some positive results.
The hardware used for the complete system test, including the Raspberry Pi and PCB,
can be viewed in Figure 18.
Figure 18 Complete System Hardware
On a number of occasions the system has operated entirely as expected,
providing a lit green LED to signify correct recognition of the input characters.
However certain issues have arisen with the system that have limited the time spent on
refining the final product. For example, the system when left idle for a significant
period of time tends to lose connectivity with the wireless network in the lab. This can
lead to errors when attempting to reconnect, as the program running in Matlab
considers the device as being still connected but unable to respond to commands.
However this appears to be an issue with the network itself as testing in other venues
has not produced the same problem.
An additional feature of the complete system test was the discovery that the
Raspberry Pi camera module tended to deliver four snapshots to the Matlab program
when only one is expected. This would lead to a type of backlog in which triggering
of the camera module would lead to Matlab processing a leftover image that may have
been acquired several minutes earlier. This particular issue was solved by making
some minor adjustments to the Matlab code to ensure that only one image is acquired
with each iteration.
5.3 Limitations to Testing
Several limitations to testing of the system have been experienced, some of which
may be relevant to the results obtained. Perhaps the most debilitating of these has
been the difficulty in obtaining and maintaining adequate wireless connectivity in the
lab. Intermittent connectivity led to a significant amount of time being expended on
troubleshooting network problems. Occasionally it was not possible to establish any
connectivity between the Raspberry Pi and the college network, making testing of the
overall system very difficult.
On reflection, a more prudent approach may have been to perform all testing
with an Ethernet connection to avoid time wasted on wireless issues. Final completion
of the system could, in that case, have incorporated the wireless element.
Lighting proved to be something of a restriction to results obtained from live
input images. The less than adequate lighting in the lab setting combined with
intermittent changes in intensity due to sunlight made consistency of results extremely
difficult during testing. However this can also be interpreted as a positive aspect as
solutions to these problems are required in real-world scenarios.
Finally with regards to limitations, it is important to understand that all of the
testing completed for this project has been in relation to static text images. What is
meant by the word static is that the text content of the image is stationary at the
moment of image acquisition. This is in contrast to more advanced systems that use
sophisticated techniques to extract text from moving vehicles for example.
6 Results and Discussion
6.1 Introduction
The results obtained from testing of the image processing algorithm and the overall
system are numerous and generally successful in relation to prior expectations. The
Raspberry Pi is able to acquire an image when triggered. This image can be
transmitted to a laptop wirelessly via a Wi-Fi network where it is applied to the image
processing algorithm. In many instances the correct characters are obtained and a
green LED switched on in response.
With specific regards to the image processing and character recognition element
of the project it is important to understand that those results demonstrated in the
following paragraphs have been obtained through many stages of experimentation
with the various components of the algorithm. It is not possible to discuss the result of
each test but a detailed overview is provided.
Not all tests have been successful in achieving the desired target of the system
i.e. to correctly identify the characters in a vehicle registration plate. However each
unsuccessful test has provided information on the effects of the various processing
techniques which has helped in refining elements of the program. Some of the more
interesting results, obtained from unsuccessful tests, are documented in the Complex
Detection section of this report.
The Basic Detection section will show how the algorithm has been successful in
identifying text regions and recognising them correctly as those in the input image.
The title of the section relates to the relatively low complexity of the input image, which
is a primary reason for the positive results. The Complex Detection section employs a
series of examples to demonstrate how changes to thresholding parameters in the
algorithm affect its performance.
The results presented are supplemented by discussion and analysis of the overall
system and recommendations for further work.
6.2 Basic Detection
The image in Figure 19 is one used prominently in tests carried out throughout the
duration of this project and is a typical example. The particular features of this image
that make it conducive to character recognition are the clearly defined black character
regions against a yellow background, the lack of external image regions that may be
miscalculated as potential text regions and the close to ideal angle from which the
image has been acquired.
Figure 19 Basic Detection - Input Image
In Figure 20 the input image has been converted to grayscale and the MSER regions
technique applied. With a fairly basic image like this one it is anticipated that this
method should have no difficulty in detecting all of the character regions and should
only detect minimal non-text regions, or perhaps even zero non-text regions.
Figure 20 Basic Detection - MSER regions
As expected, the character regions have been detected with only two non-text regions
below the number plate being identified as potential text. With so few non-text
regions initially detected due to the lack of complexity in the image the next two
stages of the algorithm have a greater chance of identifying the text regions which are
to be operated on by the OCR function.
Figure 21 depicts the effect of applying the regionprops command and
statistical thresholding to the image post-MSER.
Figure 21 Basic Detection - Geometric Properties method
In this case, as in many of the experiments with this type of image, the non-text MSER
regions seen in Figure 20 have been removed, leaving only text regions.
The effects of applying stroke-width thresholding can be viewed in Figure 22.
Figure 22 Basic Detection - Stroke-width thresholding
The fact that the second stage in the pre-processing algorithm has successfully
identified all of the text regions makes the third stage somewhat redundant in this
instance. This can be a common occurrence in image processing algorithms when
well cropped images like this one are involved. However it is necessary to include the
stroke-width thresholding stage due to its effect on the consistency of results in more
complex situations. The importance of including each of the three pre-processing
stages will be made clear in the following section.
One of the problems encountered when attempting to generate a character
output was in returning the full registration in the correct order. Following the stroke-
width thresholding, bounding boxes are applied to the image in an attempt to form a
coherent structure from the data. As stated in the methodology section these bounding
boxes are calculated within Matlab but can be adjusted to suit specific applications.
Because the plate contains two distinct sections, “LLZ” and “2268”, and the
bounding boxes are used to establish text regions, the resultant output tended to
return the two sections in reverse order. A certain amount of trial and error can be
required to overcome an issue like this one, but adjusting the expansion amount
used to increase the size of each box proved effective.
Figures 23 and 24 show how the bounding boxes have been applied
differently in two iterations of the same algorithm.
Figure 23 Basic Detection - Bounding Box comparison
In Figure 23 the bounding boxes have been applied using the associated Matlab
command. However two different expansion amounts have been used to extend the
reach of the boxes. In the top image in Figure 23 a relatively small expansion
amount has been applied. Although this has resulted in most of the character
components being connected, the central gap between ‘Z’ and ‘2’ has resulted in
these two not being identified as connected components.
With the expansion amount increased, the second image in Figure 23 shows
how these larger bounding boxes extend over a greater area and result in slight
overlap between the ‘Z’ and ‘2’. With the overlap threshold set to zero, so that any
overlap counts, all connected components are considered as part of a single line of
text. The effect this process has on the input to the OCR function can be seen in
Figure 24.
Figure 24 Basic Detection - Bounding Box Comparison (1)
The top image in Figure 24 shows how a small expansion amount can result in a
vehicle registration plate being separated into two distinct lines of text. This is an
unwanted situation as it can lead to errors when comparing the text string with an
existing database of registration numbers.
In the second image the increased expansion amount has ensured that the OCR
function will consider the text regions on the image as a single string. This is the ideal
scenario when inputting the image to an OCR function as it eliminates alternative
interpretation of the order of the data.
The algorithm has been extremely successful in identifying and correctly
recognising the characters when operating on basic input images like the one in
Figure 19. The processed image, having passed through each of the stages
documented in this section, is applied to the Matlab OCR function, which provides a
result based on its interpretation of the image. Comparing the edges of the character
regions in the image, it returns a text string based on correlation to existing templates.
Figures 25 and 26 show the result of the OCR operation on the processed
image, as printed on the Matlab command line.
Figure 25 OCR result (1)
Figure 26 OCR result (2)
In Figure 25 the result has been returned as two distinct text strings. Although it has
returned the correct characters and proven the effectiveness of the various
pre-processing stages as well as the OCR function, it is preferable that the result be
returned as a single line of text.
6.3 Complex Detection
Results obtained from the image processing algorithm highlight the importance of
ambient conditions, the quality of the input image and the effectiveness of well-
refined thresholding properties. One of the more interesting aspects of the
experimentation process has been the fact that each iteration of the algorithm provides
information on the functional operation of the process, regardless of whether or not
positive recognition has been achieved. A typical example of this is the variation in
results obtained when applying different camera angles to the vehicle registration
plates.
In order to further demonstrate the results obtained from experimentation with
the algorithm in Matlab a single input image is being used to display the effects of
various adjustments to the image processing properties. The image is displayed in
Figure 27 below and the intention is to isolate the registration plate characters as the
only Regions of Interest to the OCR function. This particular image has been selected
due to certain properties it exhibits. These include the substantial light contrast from
the top to the bottom of the picture and the offset angle of the vehicle registration
plate.
Figure 27 Alfa Romeo Input Image
Tables 1 to 4 contain property types used in the image processing algorithm to
differentiate between RoIs in an image. Each property is associated with a threshold
value which can be adjusted to determine the effect of each property in distinguishing
between general colour concentrations and text regions. The first five properties in
each table are the geometric properties discussed in Section 3.4 and the sixth is the
Stroke-width threshold discussed in Section 3.5.
Table 1 contains the base values used to configure the region properties and
stroke-width thresholding levels. With these values in place several separate instances
of character recognition have been successful. In fact, with this configuration the
system has been able to produce the automated response to positive recognition
anticipated in the design. However, these successful cases have been induced in ideal
conditions or with much less complex input images than the one seen in Figure 27.
Table 1 Parameter Values - First Iteration
Parameter                 Threshold Value
Aspect Ratio              >3
Eccentricity              >0.995
Solidity                  <0.3
Extent                    <0.2 OR >0.9
Euler Number              <-4
Stroke-width Threshold    0.4
From left to right the images in Figure 28, as well as in Figures 29, 30 and 31, depict
the three key stages of the image processing algorithm: 1) MSER region detection, 2)
removal of MSER regions based on geometric properties and 3) removal of remaining
non-text regions based on stroke-width detection.
Figure 28 Processing results - First Iteration
In the first image in Figure 28, the MSER technique demonstrates both its strengths
and weaknesses. The method has successfully identified the seven character regions in
the image but has also identified a very high number of additional regions which are
considered potential text regions. It is the sheer quantity of potential RoIs determined
using the MSER methodology that makes further pre-processing of the image
necessary. However it should be noted that the volume of MSER regions
detected in this image is a consequence of the inherent complexity presented. Much of
the experimentation carried out for this project has been undertaken with extremely
basic text images, often resulting in detection of text regions only, or very limited
non-text regions.
In the second image presented in Figure 28 the regionprops command has
been employed to measure the specified geometric properties with the intention of
eliminating non-text regions based on the threshold values seen in Table 1. In this
instance the technique has been fairly successful in removing many of those ‘blob’
regions detected using MSER. The areas surrounding the license plate have been
removed, as have many of those on the grille and windscreen wipers of the vehicle. This
stage of the process has also been successful in maintaining the character regions on
the number plate for further processing.
Despite many of the non-text regions being removed during this stage of the
process it can be deduced from those remaining regions that the parameters
documented in Table 1 are not ideally refined for this image.
The final image in Figure 28 depicts the result of applying stroke-width
analysis and a threshold of 0.4 to the picture. As stated previously, the stroke-width
measurement is calculated as the standard deviation of the stroke widths divided by
the mean of the stroke widths. Those areas of the image with a stroke-width
measurement greater than 0.4 are indexed and removed as likely non-text regions.
Use of stroke-width analysis has been partially successful in removing some
of the remaining non-text regions, particularly the significant ‘blob’ of colour to the
top-left of the image. However, there are still several non-text regions which
have not been eliminated. Perhaps of even greater significance is the fact that
application of the stroke-width threshold has actually resulted in removal of one of the
license plate characters as a potential text character. In this instance the ‘W’ does not
meet the specified criteria.
There are a number of possible reasons for the disappointing results obtained
in this example. One is that the large number of non-text regions
remaining after geometric property thresholding may have
skewed the stroke-width average, so that it is not based primarily on actual
text character values.
Another potential reason for the removal of the ‘W’ character is that the
threshold setting may be too low when applying the stroke-width analysis. Although
increasing the threshold may result in this character being detected, it may also result
in additional non-text regions being identified.
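The trade-off can be illustrated with a small sketch. The measurement values below are hypothetical and are not taken from the test image; the sketch also assumes, as the paragraph above implies, that a region is discarded when its stroke-width measurement exceeds the threshold.

```matlab
% Hypothetical stroke-width measurements for four candidate regions.
labels  = {'clean character', 'W-like character', 'non-text blob', 'large blob'};
metrics = [0.22, 0.45, 0.48, 0.90];

% Assumption: a region is kept when its measurement does not exceed
% the threshold, and discarded otherwise.
keptAt04 = metrics <= 0.4;
keptAt05 = metrics <= 0.5;

for k = 1:numel(labels)
    fprintf('%-18s kept at 0.4: %d, kept at 0.5: %d\n', ...
            labels{k}, keptAt04(k), keptAt05(k));
end
```

With these values, raising the threshold from 0.4 to 0.5 recovers the W-like character but also readmits one of the non-text regions.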
The results demonstrated in Figures 26, 27 and 28 will show how making
changes to parameter values in the image processing algorithm can improve or worsen
the overall performance in detection of RoIs.
In Table 2 the stroke-width threshold has been increased from 0.4 to 0.5, with
the other parameters remaining constant from the previous example. The expectation
is that the increased stroke-width threshold will result in inclusion of the ‘W’
character as a detected text region. However, this change is unlikely to be a panacea
for the many non-text regions seen previously.
Table 2 Parameter Values - Second Iteration
Parameter                 Threshold Value
Aspect Ratio              >3
Eccentricity              >0.995
Solidity                  <0.3
Extent                    0.2<OR<0.9
Euler Number              -4
Stroke-width Threshold    0.5
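The following sketch shows one way thresholds of this kind could be applied to a set of measured regions. It rests on several assumptions that are not confirmed by the table: each threshold is taken as a removal condition, the Extent entry is read as "less than 0.2 or greater than 0.9", and the Euler Number entry is read as "less than -4". The region statistics and field names are hypothetical.

```matlab
% Hypothetical statistics for three candidate regions.
stats = struct( ...
    'AspectRatio',  {0.6,  4.2,  0.8}, ...
    'Eccentricity', {0.80, 0.90, 0.999}, ...
    'Solidity',     {0.55, 0.60, 0.25}, ...
    'Extent',       {0.45, 0.50, 0.95}, ...
    'EulerNumber',  {0,    1,    -6});

% A region is flagged for removal if it meets any of the conditions.
% Assumed readings: Extent < 0.2 or > 0.9, and Euler number < -4.
removeIdx = ([stats.AspectRatio]  > 3)     | ...
            ([stats.Eccentricity] > 0.995) | ...
            ([stats.Solidity]     < 0.3)   | ...
            ([stats.Extent]       < 0.2)   | ...
            ([stats.Extent]       > 0.9)   | ...
            ([stats.EulerNumber]  < -4);

keep = ~removeIdx;              % regions passed on to stroke-width analysis
remaining = stats(keep);
fprintf('%d of %d regions retained\n', numel(remaining), numel(stats));
```

Here only the first region survives: the second is removed on aspect ratio alone, and the third fails several conditions at once.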
As in the previous example, Figure 26 depicts the effect of the three important pre-
processing techniques on the input image.
  • 1. Image Processing with Character Recognition using Matlab Ciaran Cooney This thesis is submitted to Dundalk Institute of Technology in partial fulfilment of the requirements for the degree of B.Eng. (Hons) in Sustainable Design School of Engineering Dundalk Institute of Technology Supervisor: Tim Daly, Paul Egan, Tommy Gartland, Alan Kennedy 2016
  • 2. i Abstract Text detection and character recognition in natural scene images is a challenging and complex operation due to the potential for varying degrees of quality expected from the input data. Therefore development of a robust and adaptable algorithm requires several stages of pre-processing to identify regions of interest before character recognition can be applied. This paper presents a methodology for implementation of a character recognition algorithm based on identification of the alphanumeric digits on vehicle registration plates. The text detection algorithm has been integrated within a system requiring initial image acquisition and a visual indication of results. The reason for this development is to promote the use of the technique in a commercial application. A wireless network and graphical user interface are incorporated to supplement the primary utility of the system i.e. image processing and character recognition. Results demonstrate the strengths and weaknesses of the techniques employed. The quality of the input image, ambient conditions and various parameters within the algorithm itself are found to impact the Optical Character Recognition (OCR) engines ability to accurately detect text.
  • 4. iii Declaration I, the undersigned declare that this thesis entitled: Image Processing with Character Recognition using Matlab is entirely the author’s own work and has not been taken from the work of others, except as cited and acknowledged within the text. The thesis has been prepared according to the regulations of Dundalk Institute of Technology and has not been submitted in whole or in part for an award in this or any other institution. Author Name: Ciaran Cooney Author Signature: Date:
  • 5. iv List of Abbreviations and Symbols RoI Region of Interest OCR Optical Character Recognition MSER Maximally Stable Extremal Regions PCB Printed Circuit Board CPU Central Processing Unit GPIO General Purpose Input/Output LED Light Emitting Diode
  • 6. v Table of Contents Abstract......................................................................................................................... i Acknowledgments........................................................................................................ ii Declaration..................................................................................................................iii List of Abbreviations and Symbols............................................................................. iv Table of Contents......................................................................................................... v List of Figures............................................................................................................ vii List of Tables .............................................................................................................. ix 1 Introduction.......................................................................................................... 1 1.1 Introduction.................................................................................................. 1 2 Literature Review................................................................................................. 3 2.1 Introduction.................................................................................................. 3 2.2 Technique..................................................................................................... 4 2.3 Optical Character Recognition..................................................................... 6 2.4 Software....................................................................................................... 7 3 Theory................................................................................................................ 11 3.8 Common Issues with Text Detection......................................................... 
17 4 Methodology...................................................................................................... 19 4.2 System Design ........................................................................................... 19 4.3 Hardware Specification – Raspberry Pi 2 Model B................................... 21 4.5 PCB Design and Manufacture ................................................................... 24 4.7 MSER Regions .......................................................................................... 27 4.8 Regionprops............................................................................................... 28 4.9 Stroke-Width Variation.............................................................................. 29 4.11 OCR Function............................................................................................ 31 4.12 String Comparison..................................................................................... 31 5 Experimental Testing......................................................................................... 35
  • 7. vi 6 Results and Discussion ...................................................................................... 37 6.1 Introduction................................................................................................ 37 6.2 Basic Detection.......................................................................................... 38 6.3 Complex Detection .................................................................................... 42 6.6 Further Work.............................................................................................. 56 7 Conclusions........................................................................................................ 57 Appendix A................................................................................................................ 64
  • 8. vii List of Figures Figure 1 System Flowchart.......................................................................................... 2 Figure 2 System Design Flowchart............................................................................ 20 Figure 3 Raspberry Pi Pin Layout.............................................................................. 22 Figure 4 PCB Design ................................................................................................. 25 Figure 5 Software Design Flowchart ......................................................................... 26 Figure 6 Polling a Switch........................................................................................... 27 Figure 7 Malab MSER command .............................................................................. 27 Figure 8 MSER Example Result................................................................................ 28 Figure 9 Geometric Properties Thresholds ................................................................ 28 Figure 10 Stoke-Width Thresholding ........................................................................ 29 Figure 11 Bounding Boxes ........................................................................................ 30 Figure 12 Merging of Bounding Boxes ..................................................................... 30 Figure 13 Merged Bounding Boxes........................................................................... 31 Figure 14 OCR function code.................................................................................... 31 Figure 15 Cell Arrays................................................................................................. 32 Figure 16 'if' statement in Matlab .............................................................................. 33 Figure 17 Graphical User Interface............................................................................ 
34 Figure 18 Complete System Hardware...................................................................... 36 Figure 19 Basic Detection - Input Image................................................................... 38 Figure 20 Basic Detection - MSER regions............................................................... 39 Figure 21 Basic Detection - Geometric Properties method ....................................... 39 Figure 22 Basic Detection - Stoke-width thresholding.............................................. 39 Figure 23 Basic Detection - Bounding Box comparison ........................................... 40 Figure 24 Basic Detection - Bounding Box Comparison (1)..................................... 41 Figure 25 OCR result (1) ........................................................................................... 41 Figure 26 OCR result (2) ........................................................................................... 41
  • 9. viii Figure 27 Alfa Romeo Input Image........................................................................... 42 Figure 28 Processing results - First Iteration............................................................. 43 Figure 29 Processing Results - Second Iteration ....................................................... 46 Figure 30 Processing Results - Third Iteration .......................................................... 47 Figure 31 Processing Results - Fourth Iteration ........................................................ 48 Figure 32 Complete Test (Basic) - Input ................................................................... 49 Figure 33 Complete Test (Basic) - MSER regions .................................................... 50 Figure 34 Complete Test (Basic) - Bounding Boxes................................................. 51 Figure 35 Complete Test (Basic) - Text Region........................................................ 51 Figure 36 Complete Test (Basic) - Result.................................................................. 52 Figure 37 Complete Test (Complex) - Input.............................................................. 52 Figure 38 Complete Test (Complex) - MSER regions .............................................. 53 Figure 39 Complete Test (Complex) - Post-Geometric Properties............................ 53 Figure 40 Complete Test (Complex) - Post-Stroke-width thresholding.................... 54 Figure 41 Complete Test (Complex) - Bounding Boxes ........................................... 54 Figure 42 Complete Test (Complex) - Text Region.................................................. 55 Figure 43 Complete Test (Complex) - Result............................................................ 55 Figure 44 Schematic Diagram ................................................................................... 
64 Figure 45 Breadboard Construction........................................................................... 64
  • 10. ix List of Tables Table 1 Parameter Values - First Iteration................................................................. 43 Table 2 Parameter Values - Second Iteration ............................................................ 45 Table 3 Parameter Values - Third Iteration ............................................................... 46 Table 4 Parameter Values - Fourth Iteration.............................................................. 48
  • 11. 1 Introduction
1.1 Introduction
Image processing in general, and object recognition in particular, are becoming increasingly important facets of modern electronics and communications. Some of the more prevalent applications include medical imaging using fMRI (Steele et al., 2016), process automation in industrial settings (Choi, Yun, Koo, & Kim, 2012) and text detection in natural scene images (Zhao, Fang, Lin, & Wu, 2015; Liu, Su, Yi, & Hu, 2016). The techniques deployed across these applications are wide-ranging and diverse due to the different requirements of each. With such a vast array of criteria for investigation it is necessary to define a specific area of interest.
Text Detection, or Character Recognition, is a field of study with an extensive literature behind it and a burgeoning market for applications. Typical applications where character recognition is especially important include scanning of text documents, reading license plate numbers and language translation of text images.
Just as there are many applications for text detection, there are many techniques and methodologies for implementation of a detection algorithm. Edge-detection, thresholding and Hough transforms are three of the most common methods employed. In fact, Otsu’s Method (Otsu, 1979) is a thresholding technique often implemented within commercial Optical Character Recognition (OCR) algorithms.
License plate recognition is a standard paradigm for investigation of, and experimentation with, character recognition techniques and is the frame in which this project has been carried out. A variety of methods have been implemented in license plate detection such as Harris Corner and Character Segmentation (Panchal, Patel, & Panchal, 2016), the use of SIFT descriptors (Yu Wang, Ban, Chen, Hu, & Yang, 2015) and probabilistic neural networks (Öztürk & Özen, 2012).
Much of the preliminary work undertaken has been focused on obtaining a deeper understanding of the various techniques involved in text detection processes, particularly those related to natural-scene images. Although the theory is extremely important, practical usage must also be considered. With this, hardware and software platforms are investigated in the literature review for this project to ascertain their relative compatibility with image processing applications.
  • 12. To test the efficacy of the investigation into the various detection and recognition methods, a practical implementation of these techniques is developed. In most cases character recognition systems will consist of several component parts including acquisition, pre-processing and recognition. The system proposed here incorporates each of these elements within a wireless network which will provide an automated response to positive character detection and an equivalent alert to failed or negative detection.
The system is framed as a method for detecting the characters of a vehicle registration plate and permitting or denying entry based on comparison of the detected text and a pre-existing vehicle-registration database. However, there is inherent flexibility in the model and it may be adapted to service other applications. Figure 1 is a flowchart depicting a high-level description of the required functionality of the system.
Figure 1 System Flowchart (Image Acquisition, Image Transmission, Pre-Processing, OCR, Results Comparison, Automated Response)
The methodology is based upon use of a central microcontroller which will acquire an image when triggered. The acquired image is then transmitted wirelessly to a laptop or PC on which the filtering and pre-processing of the image will take place. Post-processing, the image is applied to a commercial OCR algorithm which will output a digital representation of the vehicle registration number obtained. Finally, comparison of the number obtained with a database of expected numbers is carried out to determine the action of the automated response.
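The final comparison stage can be illustrated with a minimal Matlab sketch. It is not the implementation used in this project: the database contents, the example OCR string and the variable names (registeredPlates, ocrText, response) are all hypothetical, and it assumes the OCR output may contain stray spaces, hyphens and lower-case letters that should be ignored when matching.

```matlab
% Hypothetical registration database; a real system would load this
% from a file or database table.
registeredPlates = {'161D12345', '09LH4321', '12MH777'};

% Assumed example of raw OCR output, with stray whitespace and hyphens.
ocrText = sprintf(' 09-lh-4321\n');

% Normalise: convert to upper case and keep only letters and digits.
plate = regexprep(upper(ocrText), '[^A-Z0-9]', '');

% Compare against the database and choose the automated response.
if ismember(plate, registeredPlates)
    response = 'permit';
else
    response = 'deny';
end
fprintf('Plate %s: %s\n', plate, response);
```

Normalising both the OCR output and the stored registrations to a single canonical form before comparison avoids rejecting a valid plate because of spacing or case differences alone.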
  • 13. All the relevant theory, methodology and results relating to implementation of the system described are contained within the main body of this document.
2 Literature Review
2.1 Introduction
Image processing and text recognition are increasingly important areas for research and development in the modern world. Sectors in which image processing techniques provide the basis for critical applications include medical, communications and security. In the medical industry image processing techniques, such as improving the quality of fMRI scans, have been employed in diagnostics (Misaki et al., 2015), with some modern applications facilitating automated diagnosis of certain conditions.
Text recognition is an area with increasing relevance and the technology in this area is keeping pace with this need. One of the most impressive applications present in the literature is the use of text recognition technology in the development of a text-to-speech synthesis system (Rebai & BenAyed, 2015).
Not only are the potential applications for image processing widespread but the techniques used to extract the information are equally diverse. Methods deployed are of course dependent on the desired outcome and there is no shortage of techniques that can be tailored towards a specific target. Image processing is not unlike other types of data processing in that the particular process is chosen based on the exact requirements of the intended application. With the project for which this literature review has been compiled being primarily concerned with character recognition in a static image, much of this report has been written with reference to this area (Zhao et al., 2015; Zhu, Wang, & Dong, 2015).
The expected outcome of this paper is to review, understand and analyse the present literature on image processing techniques, the platforms used to implement these techniques and the applications which most commonly employ image processing as a means of achieving a desired outcome.
Section 2 of the report gives an overview of the techniques employed in the processing of images, usually to extract a specific piece of information. Section 3 will discuss the operation of Optical Character Recognition (OCR), which is an adaptable algorithm designed to recognise
  • 14. specific features contained within an image i.e. text. The fourth and fifth sections of the report will feature an assessment of the hardware and software platforms which could be used to implement the specific techniques associated with image processing. The report will conclude with a concise summary of the key findings from the literature review. An outline will be included providing some of the relevant information which will inform the future progress of this project.
2.2 Technique
There are numerous techniques documented and discussed in the literature available on image processing. Among those most prominently featured are segmentation, edge-detection and thresholding. Of course, the technique(s) employed by researchers or professionals are largely dependent upon the requirements of a given application, although not exclusively so. In some cases the limitations of software or hardware may be the deciding factor in choices regarding technique.
Edge-Detection is one of the most common approaches to segmentation, with its method of detecting meaningful discontinuities in intensity values (Rafael C. Gonzalez, Woods, & Eddins). The method makes use of derivatives and is generally computed using a Laplacian filter. In their 1997 paper, Smith and Brady (1997) document an approach to low-level image processing, labelled the SUSAN principle, which was developed from existing edge-detection and corner-detection techniques.
Another method with considerable presence within the literature is the use of Moment Invariants. Moments are used to analyse and characterize the patterns contained within an image and are thus useful in character recognition. For instance, Zernike moment invariants have been shown to be extremely effective in pattern recognition applications (Belkasim, Shridhar, & Ahmadi, 1991).
Alongside Edge-Detection, Thresholding is one of the most commonly used techniques in image processing, specifically segmentation.
The reason for this prevalence seems to be its simplicity of implementation as well as the intuitive properties it exhibits (Rafael C. Gonzalez et al.). Thresholding is used for all sorts of applications that require the extraction of information from a given image. One such application is the detection of glioblastoma multiforme tumors from brain magnetic resonance images (Banerjee, Mitra, & Uma Shankar, 2016). Global thresholding is shown in this case to estimate the statistical parameters of the “object” and
  • 15. “background” of an image. The literature in this area certainly supports the view that thresholding is among the primary techniques used in image processing.
Alongside the most common image processing techniques, the literature also contains some that are more specialized. One such technique is Nonnegative Matrix Factorisation (NMF). Problems can occur with this method and several algorithms have been proposed to solve these (Hu, Guo, & Ma, 2015). Although NMF is purported to be an effective tool for large scale data processing it is not one that is likely to be pursued for the requirements of this project.
Another less prominent but interesting method sometimes used for image processing is Fuzzy Logic (Amza & Cicic, 2015). Among its current uses are automated quality control image processing systems. It works by extracting geometrical characteristics of an object and then using this information with a fuzzy pre-filtering unit to estimate the probability of a foreign body being present on the object being analyzed. Although the use of fuzzy logic is extremely successful in these types of applications it does not appear to be the logical approach to a text recognition application.
Before the more technical aspects of the image processing algorithm are activated, it may be necessary to implement some of the more basic image processing techniques to prepare an image for this. These basic adjustments may come in the form of image resizing, rotation or cropping, depending on the particular characteristics of the image and the data to be extracted. In an article on low-quality underwater images (Abdul Ghani & Mat Isa, 2015), the authors reference Eustace et al. by adapting a contrast-limited adaptive histogram specification (CLAHS) as a pre-processing step.
In most cases, the literature presents a combination of techniques that have been chosen because of a particular capability to carry out a specific function or as a means of experimentation in order to improve existing techniques. With regard to any nascent image processing project or assignment, it is quite clear that a pragmatic approach should be taken from the outset so that a suitable technique (or techniques) can be chosen.
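As a brief illustration of two of the techniques reviewed in this section, the following Matlab sketch applies global thresholding by Otsu's method and Laplacian-of-Gaussian edge detection to a synthetic test image. It assumes the Image Processing Toolbox is installed (imbinarize requires R2016a or later); the test image and variable names are illustrative only and are not taken from the project.

```matlab
% Synthetic 64x64 grayscale image: a bright square on a dark background.
I = zeros(64, 64, 'uint8');
I(17:48, 17:48) = 200;

% Global threshold chosen by Otsu's method, normalised to the range [0, 1].
level = graythresh(I);

% Binarise the image with that threshold.
BW = imbinarize(I, level);

% Edge map from the Laplacian of Gaussian detector.
E = edge(I, 'log');
```

On an image with only two intensity levels the Otsu threshold falls between them, so the binary result separates the square from the background exactly; natural-scene images are rarely this clean, which is why further filtering stages are normally needed.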
  • 16. 2.3 Optical Character Recognition
One of the more dominant themes present in the literature surrounding image processing techniques is that of Optical Character Recognition (OCR). OCR appears as the final processing step in many of the papers researched on image extraction and recognition. There is clearly a wide range of applications and extraction methods that OCR can be used in conjunction with. Among the potential applications for the use of OCR are keyword searches and document characterization in printed documents (M. R. Gupta, Jacobson, & Garcia, 2007).
A summary of the theories underpinning the OCR function is provided in Optical Character Recognition-Theory and Practice (Nagy, 1982). Among the topics discussed in this book is the classical decision-theoretic formulation of the character recognition problem. Statistical approximations, including dimensionality reduction, feature extraction and feature detection are discussed with regard to the appropriate statistical techniques.
Commercially available OCR algorithms are primarily designed to interpret binary (black and white) images. However, more and more pre-processing techniques are being developed as a means of preparing images for use with this function. An example of this is the denoising and binarizing of historical documents as a pre-processing step (M. R. Gupta et al., 2007). Many researchers have pursued methods based on development of a new or unique method of extraction that can be used along with existing OCR functions (Roy et al., 2015).
One of the limitations associated with OCR-based applications is that they may not work well when properties of the captured character images are significantly different from those in the training data set. A supervised adaptation strategy is one that has been developed as a potential solution to this problem (Du & Huo, 2013). Nagy et al.
also demonstrated that a character classifier trained on many typefaces can be adapted effectively to text in a single unknown typeface by using a self-adaptation strategy. A further problem which can sometimes be faced when using an OCR algorithm for text recognition is the assumption that individual characters can be isolated (Fernández-Caballero, López, & Castillo, 2012). Some traditional methods of OCR implementation have less than ideal recognition performance because of the difficulty in achieving clear binary character images.
  • 17. The literature clearly indicates that OCR is a vital function in relation to image processing and text recognition. However, due to some of the limitations stated above, it is important that any image be properly processed and segmented before being put through an OCR algorithm.
2.4 Software
The extensive literature on image processing and text recognition techniques incorporates the use of several types of software for implementation. Whether it is due to personal preference or application specific criteria, it appears that there are a large number of platforms available for consideration when undertaking an image processing project.
Software developed with the specific intention of being used for image processing applications is available, and has often originated from academic research. A classic example of this is ImageJ, software written in Java and designed to run on any operating system. ImageJ supports various functions and capabilities. For instance, it is able to acquire images directly from scanners, cameras or video sources. The program also supports all common image manipulations including reading and writing of image files and operations on individual pixels (Abràmoff et al., 2004).
The use of Labview as a tool for image acquisition and processing is an interesting proposition and does have some presence in the literature. A program named Image-Sensor Software (ISS) is one that is based on the Labview programming language (Jurjo, Magluta, Roitman, & Batista Gonçalves, 2015). Use of this type of software enables image acquisition tools such as zoom, focus and capture. The features required by the overall image recognition system must be defined by the user when programming.
Matlab is a powerful piece of software with many uses in modelling, experimentation and signal analysis.
Its connectivity with many advanced programming languages (like C, Java, VB) and availability of a wide range of toolboxes make it popular among the scientific and research community (R. Gupta, Bera, & Mitra, 2010). It possesses an extensive array of tools which can be harnessed in the interests of image recognition. The use of the segmentation method is particularly powerful within Matlab. Its use has been demonstrated by tracing yarn to accurately compute useful parameters of fibre migration by statistically calculating
  • 18. mean yarn axis and tracing out mean fibre axis (Khandual, Luximon, Rout, Grover, & Kandi, 2015).
By employing Matlab as the means of processing an image for some form of character recognition, the user has the ability to tailor code to develop algorithms with specific image properties in mind. This may involve text or shape recognition, simple colour recognition or perhaps properties contained within the image such as depth perception. Matlab has the additional advantage of being compatible for use in connection with some form of hardware acquisition unit that may be implemented as part of an embedded system. Its use in this context has been proven successfully (R. Gupta et al., 2010), as a method for controlling image acquisition as well as image processing.
There are some specialised software packages that have been designed to facilitate a specific function. A prime example of one of these is Xmipp, software developed primarily as a means of image processing in electron microscopy (de la Rosa-Trevín et al., 2013). Graphical tools incorporated within this software include data visualisation and particle picking which can allow visual selection of some of the key parameters of an image.
It can be seen from reviewing the literature that image processing software is both prevalent and sophisticated. At times it can appear overwhelming from the sheer density of techniques available; however, this does suggest that the type of application being pursued in this project is very much achievable.
Although not always used exclusively, Matlab is very often used as a sub-section in an overall processing technique. This seems to be due to the vast array of different commands available within its image processing toolboxes. Images can be treated using commands such as “fspecial” and “imfilter” in Matlab (HashemiSejzei & Jamzad), before being processed elsewhere for different reasons.
This is certainly a consideration for the progress of the project being considered here, particularly in the earlier stages of development when the use of some of these Matlab commands could prove to be extremely informative.
2.5 Hardware
As with software, hardware is an important factor that must be given careful consideration when entering into an image processing project. The relative strengths and weaknesses of a specific hardware platform must be carefully gauged with
  • 19. reference to the processing requirements. Not only this, but compatibility with a chosen piece of software must be given due consideration.
Discussion and critique of specific hardware units is not as prominent in the literature as that of software. This is primarily due to the fact that most of the experimental work in this area is focused on the various image processing algorithms, which are generally cross-platform.
The presence of embedded systems as a means of computing image processing is fairly extensive in the literature. An ARM processor in conjunction with Matlab and a Linux based operating system has been used to automatically identify cracks in a wall (Pereira & Pereira, 2015).
Some applications may require the use of high-speed image processing systems. Due to demands that may include increasing the speed of a transform process or decreasing overall processing time, it may be necessary to design a specific architecture to support the function. This is often the case with complex algorithms which can be implemented using an FPGA for prototyping and verification (Mondal, Biswal, & Banerjee).
As commented upon at the beginning of this section, there is a comparative lack of hardware-related literature. The obvious conclusion to draw from this fact is that the choice of hardware is secondary to the choices of technique, algorithm and software. However, one of the key hardware considerations is the processing capability of any PC or laptop being used. A powerful CPU and specifically the inclusion of a Graphics Processing Unit (GPU) can dramatically improve the performance of any image processing application (Cugola & Margara, 2012).
2.6 Conclusions
There are several component factors to be investigated when considering a project related to image processing. The relative importance of each of these factors is reflected in their presence in the literature.
Certainly the techniques or algorithms to be implemented are critical factors which will determine the success or failure of a given project. As has been documented previously in this report, there are many potential techniques that can be useful in a variety of applications. This being the case, it is always an important first step to define the functionality of an application before determining the correct method for achieving this aim.
  • 20. With one of the potential objects of a project being text recognition from a scene image, the use of segmentation and particularly thresholding techniques is very likely to be required in some form. As well as these processing techniques, Optical Character Recognition (OCR) in one form or other is almost ubiquitous across text recognition applications. As there are many commercially available OCR engines, the decision of which to use is almost entirely intertwined with the choice of software platform. Matlab, for example, has an OCR algorithm associated with its own image processing toolboxes.
With regard to software selection for image processing functions, it appears as if this may come down to a personal preference for a particular interface in many cases. However, an analytical approach should be taken to ensure that the chosen software has the desired capabilities. A secondary, or perhaps even primary, factor worth consideration is the relative expense of some of the software available for image processing tasks. As noted in this literature review, there are free image processing programs currently available and extensively developed, although it is possible that they may come with certain compatibility issues. At the opposite end of the spectrum software such as Matlab may only include its best image processing software at additional expense, separate from the main program license.
One of the key decisions to be made is in the choice between the possible implementation of an embedded system or developing the process on a PC or laptop. Depending on the overall functionality of a system, it may be more desirable to have an embedded image processing algorithm that acts as a device for detecting very specific types of data. Alternatively, the use of a PC or laptop in this area allows for continuing flexibility in the processing techniques even after completion of the final design.
As with every aspect related to this topic, decisions must be primarily based upon the end-requirements of the application. Overall impressions of the available literature on image processing techniques are that the research and experimentation in this area is both extensive and expanding. It is a field that is extremely relevant in the technology and communications sector today and the work being undertaken reflects this status. Of course this means that its pace of development is exceptionally fast but it also means that the potential applications for its use will continue to grow.
  • 21. 3 Theory
3.1 Introduction
Text detection has of course been heavily researched with multiple methods being suggested for application (cite). There are some differences in the literature as to how these methods are categorised. Zhang, Zhao, Song, and Guo (2013), for example, categorise these techniques into four groups: edge-based, texture-based, connected-component (CC)-based and others. However, Chen et al. (2011) have categorised these techniques into two primary classes: texture-based and CC-based. Maximally Stable Extremal Regions (MSER) is the technique being employed in this case.
The use of an MSER approach to text detection is advocated for several reasons. Among these are the observations that text regions tend to have quite high colour contrast with their backgrounds and that they typically consist of homogeneous colour formations (Liu et al., 2016).
The following sections introduce the theory underpinning the methodology being implemented for this image processing algorithm in various stages. Each of the key components of the algorithm is discussed individually and its anticipated effect on a given input image stated. The theory in this section is laced with references to Matlab and the methods available in this software for applying these techniques.
The section begins with a note on the image formats typically used in this type of application. In many instances the image format itself is not a critical factor in image processing but it is nevertheless worthy of consideration.
3.2 Image Formats
There are certain specifications that an input image must meet for use with the Matlab OCR function. The image file type, i.e. ‘.png’, ‘.jpeg’, ‘.tiff’ etc., is not a critical factor in this implementation, but the image data must be a real, non-sparse value (Mathworks.com, 2016a). This simply means that the image matrix must contain no complex-valued elements and must be held as an ordinary full array rather than in Matlab’s sparse storage format.
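The real and non-sparse requirements can be checked in base Matlab before an image is passed on for recognition. The sketch below is illustrative only and is not part of the project code; the small test matrix and the variable names are assumptions. Here isreal tests that the array has no imaginary part and issparse tests whether the array uses Matlab's sparse storage class.

```matlab
% Small 8-bit grayscale array standing in for a captured image.
I = uint8(magic(4));

% A full (non-sparse), real-valued array passes both checks.
isValid = isreal(I) && ~issparse(I);

% The same pixel values held in sparse storage would be rejected, even
% though the numbers themselves are unchanged.
S = sparse(double(I));
isValidSparse = isreal(S) && ~issparse(S);

% validateattributes raises an error if either requirement is not met.
validateattributes(I, {'numeric', 'logical'}, {'real', 'nonsparse'});
```

Images loaded with imread are returned as full numeric arrays, so in practice this check would only fail if the data had been converted elsewhere in a processing chain.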
The OCR function accepts any of the following three input image types:
M-by-N-by-3 truecolour – A true colour image is a 24-bit image (8 bits for each of the colours Red, Green and Blue (RGB)) such as a JPEG, capable of displaying millions of
  • 22. colours ($2^{24}$) (Robbins, 2007). The quantity of possible colours is due to the fact each byte is able to represent 256 different shades.
M-by-N 2D grayscale – This is an image in which all colours are a different shade of grey. One of the virtues of this format is that less information is required for each pixel. They are stored in an 8-bit integer allowing for 256 ($2^{8}$) shades of grey from white to black (Fisher, Perkins, Walker, & Wolfart, 2003b). Grayscale is a common format in image processing.
M-by-N binary – In binary images pixels have only two possible intensity values. These values are typically displayed as black and white, with 0 used for black and 1 or 255 used for white (Fisher, Perkins, Walker, & Wolfart, 2003a). The binary format is often used to distinguish between text and background in pattern recognition algorithms.
As stated above, the class of image is not a defining factor in the success or failure of the recognition algorithm. Because of this, two image formats have been used throughout the testing and experimentation process: PNG and JPEG.
PNG is a relatively new image format and uses 24-bit true colour (Willamette.edu, 2016). Although the files can be considerably larger than the JPEG format this is not a major concern in this instance as all image files are to be deleted immediately after use.
JPEG is said to be a ‘lossy format’ (Willamette.edu, 2016) as it has the potential for some data loss. These losses result in slight degradation of the image but have minimal impact on the visual perception of the image. JPEG is not limited in colour and is a popular format for images containing natural scenes and vibrant colours. However, the vibrancy of the colour image is not a primary factor for consideration in this case.
3.3 Maximally Stable Extremal Regions
The first detection method employed in the text recognition algorithm is known as Maximally Stable Extremal Regions (MSER).
MSER is a technique used extensively in many image processing applications from text recognition (Chen et al., 2011) to visual tracking (Gao et al.). One of the basic principles of an MSER approach has
  • 23. been defined as “blob detection” (Matas, Chum, Urban, & Pajdla, 2004), meaning that the MSER command in Matlab will return relevant information pertaining to MSER features in a given input image. Because an input image will present significant variation in granulation, resolution and grey-scale levels, amongst other features, the roughness or smoothness of the edges within that image can vary also (Moreno-Díaz, Pichler, & Quesada-Arencibia, 2012). For this reason the blob detection is applied with an MSER algorithm for detecting sections of significant intensity within an image. The Extremal region associated with the MSER acronym is an area within an image with connected components which maintain intensity levels below a threshold. Through this technique areas of interest can be filtered to allow an OCR algorithm to attempt character recognition.
3.4 Removal Based on Geometric Properties
MSER algorithms in general, and particularly the Matlab one in use on this project, are quite good at detecting most of the text regions within an image. However, they are not immune to detecting other non-text stable regions present within an image. Matlab facilitates a rule-based methodology for removal of these non-text regions (Mathworks.com, 2015).
The principle behind this method is the removal of unwanted regions based on a series of geometric properties that are ideal for distinguishing between text and non-text areas of an image. The regionprops command is used to measure properties of an image region. Several properties can be selected for measurement and their statistics returned; ‘Orientation’ and ‘Area’ for example. Thresholds are required to be set for each of the properties selected for measurement. This may be considered one of the more dynamic sections of the algorithm as these threshold values can be tuned to perform better with different images.
A logical index array, built from the thresholds on each of the geometric properties selected, can then be applied to the mserRegions object returned by the MSER detection step so that the corresponding regions of the image are removed. This is effectively working as a filter, eliminating those “blobs” within the image that do not conform to certain characteristics of the image text.
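The detection and geometric filtering steps can be sketched in Matlab as follows. This follows the pattern of the MathWorks example cited above rather than the exact code of this project: the sample image handicapSign.jpg is assumed to be available on the path (it is the image used in that example and is supplied with the Computer Vision System Toolbox), and the properties and threshold values shown are those of the example, given purely for illustration. They would need tuning for registration-plate images.

```matlab
% Load the sample image (assumed to be on the path) and convert to grayscale.
colorImage = imread('handicapSign.jpg');
I = rgb2gray(colorImage);

% Detect MSER regions; the second output lists the pixels of each region.
[mserRegions, mserConnComp] = detectMSERFeatures(I, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);

% Measure geometric properties of every detected region.
mserStats = regionprops(mserConnComp, 'BoundingBox', 'Eccentricity', ...
    'Solidity', 'Extent', 'EulerNumber');

% Aspect ratio (width / height) from the bounding boxes.
bbox = vertcat(mserStats.BoundingBox);
aspectRatio = bbox(:, 3) ./ bbox(:, 4);

% Flag regions whose properties fall outside the illustrative thresholds.
filterIdx = aspectRatio' > 3;
filterIdx = filterIdx | [mserStats.Eccentricity] > 0.995;
filterIdx = filterIdx | [mserStats.Solidity] < 0.3;
filterIdx = filterIdx | [mserStats.Extent] < 0.2 | [mserStats.Extent] > 0.9;
filterIdx = filterIdx | [mserStats.EulerNumber] < -4;

% Remove the flagged regions from both the statistics and the region set.
numBefore = numel(mserStats);
mserStats(filterIdx) = [];
mserRegions(filterIdx) = [];
```

Combining the individual tests with a logical OR means a region is discarded as soon as it fails any one rule, which keeps each threshold independently tunable.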
  • 24. 3.5 Stroke-Width Thresholding
In an effort to obtain more consistent results a stroke width transform of the MSER regions is generated and applied to perform filtering and pairing of the connected components (Chen et al., 2011). The stroke width is computed with the bwdist command which calculates the Euclidean distance transform of a binary image. Epshtein, Ofek, and Wexler (2010) designed a method of stroke-width transformation based on the premise that text characters could be detected from the regions where stable stroke widths occurred.
The reason for including this approach within a character detection algorithm is that it can be effectively implemented as a means of reducing background noise. This is because regions contained within the image are grouped into blocks, having been further verified as containing properties relating to likely text characters (Yi & Tian, 2011). For example, the stroke-width of the letter ‘T’ should be identical to the stroke-width of the letter ‘D’ assuming the text font is the same. However, a non-text region is not likely to share this stroke-width and can therefore be eliminated as a text region.
Thinning is a method of reducing binary objects in an image to strokes which are a single pixel wide (R.C. Gonzalez, Woods, & Eddins, 2010). The Matlab command bwmorph implements this approach with a series of operations including dilations and erosions. Matlab enables the programmer to set the number of iterations for which the thinning operation occurs. In fact, the number of iterations can be set to infinity (inf) indicating that the operation will continue until the image ceases to change. The results from the distance transform and the thinning operation are then combined to provide the stroke width values contained within the image.
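The combination of the distance transform and the thinning operation can be sketched on a synthetic region as follows. This is an illustrative example assuming the Image Processing Toolbox; the bar-shaped test region and the variable names are not taken from the project. The final line forms the ratio of standard deviation to mean of the sampled values, which is the variation measure used for thresholding in this section.

```matlab
% Synthetic binary region: a horizontal bar 5 pixels thick, standing in
% for the binary image of a single MSER region.
regionImage = false(21, 61);
regionImage(9:13, 6:56) = true;

% Pad by one pixel so the region cannot touch the image border.
regionImage = padarray(regionImage, [1 1]);

% Distance from each foreground pixel to the nearest background pixel.
distanceImage = bwdist(~regionImage);

% Thin the region to single-pixel-wide strokes (Inf: repeat until stable).
skeletonImage = bwmorph(regionImage, 'thin', Inf);

% Distance values sampled along the thinned strokes indicate stroke width.
strokeWidthValues = distanceImage(skeletonImage);

% Variation of the stroke width: standard deviation divided by mean.
strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);
```

A region of near-uniform thickness such as this bar yields a small variation value, whereas irregular background clutter would be expected to give a larger one.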
A measurement for stroke width is calculated by dividing the standard deviation of the stroke width values by the mean of the same stroke width values:
\[ \text{Stroke Width Measurement} = \frac{\text{Standard Deviation of Stroke Width}}{\text{Mean of Stroke Width}} \]
An array index is computed which is comprised of those regions of the image with a greater stroke width measurement value than the value of the predefined stroke width threshold. It is expected that those regions with a greater than threshold value
  • 25. will be the non-text regions of the image, because the stroke width of genuine text characters varies little. This index is then applied to mserRegions so that the unwanted regions of the image, i.e. the non-text regions, can be removed.
3.6 Bounding Boxes
Bounding boxes are often employed within image processing applications as a method of making some sense from the data obtained. Examples of the use of bounding boxes include collision detection as applied to computer graphics and animation (Yao Wang, Hu, Fan, Zhang, & Zhang, 2012), and the segmentation of hand-written Chinese characters which are prone to overlap (Tseng & Chen, 1998).
In a typical text recognition system it is essential that the OCR engine is able to return complete words or paragraphs, rather than a list of the individual characters acquired. To help ensure that order is maintained so that the correct registration can be obtained from an input image, bounding boxes are used to amalgamate the individual character regions into lines of text (Mathworks.com, 2015). These bounding boxes surround each of the individual text regions and can be expanded to overlap with each other, thus forming a chain of overlapping boxes which are used to form complete words, or in this case a vehicle registration number.
Bounding boxes are obtained for all the regions of interest remaining from the image by concatenating the MSER properties for bounding boxes previously obtained with the regionprops command. These bounding boxes are then expanded in line with the theory of having characters overlap with their neighbours. A value for the level of expansion, or Expansion Coefficient (E.C.), is entered into the algorithm and is used to set new limits for the x and y axes of the bounding boxes. Minimum and maximum axis values for the expanded bounding boxes are calculated in the following manner:
\[ x_{min} = (1 - E.C.) \times x_{min} \]
\[ x_{max} = (1 + E.C.) \times x_{max} \]
Prudence dictates the minor precaution of ensuring that the expanded bounding boxes do not exceed the outer limits of the image. This is achieved by comparing the maximum axis limits calculated from the expansion coefficient with the axis limits defined by the size of the image. The new axis limits of the bounding boxes are then
  • 26. 16 taken as the minimum value computed from the previous comparison. This is implemented in Matlab in the following fashion: xmax = min(xmax, size(I,2)) Overlapping bounding boxes can be combined to form a single box around multiple characters. This method of merging overlapping bounding box components to make a single component has been used in the processing and segmentation of ancient historical documents (Kavitha, Shivakumara, Kumar, & Lu). However it is most often implemented to distinguish between separate words. In the case of a vehicle registration there are two likely outcomes: Either the entire registration will be surrounded by a single bounding box or the two distinct sections of the registration will be surrounded by separate bounding boxes. The effect on the overall result of one of these events occurring over the other is negligible. An overlap-ratio is applied to quantify the distance between each of the text regions detected in the image. Matlab provides a function for this purpose which is activated with the bboxOverlapRatio command. The function returns the overlap ratio between each pair of bounding boxes contained within the image. Those characters with non-zero overlap ratios are considered to be connected in the context of the bounding box and are therefore likely to exist as part of the same line of text. Any characters with zero overlap ratios are not connected and are thus considered as separate sections within the text image. A graph of these overlap ratios is generated in Matlab for to determine which regions are connected for the purpose of merging them into a single text region. 3.7 Optical Character Recognition The OCR function is applied to the area of the filtered version of the input image encompassed by the merged bounding boxes. By defining the area of the image upon which the OCR function is going to operate a more consistent detection performance is expected. 
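As a minimal sketch of restricting recognition to a region of interest, the lines below clamp a rectangle to the image limits and pass it to the ocr command. The sketch assumes that the Computer Vision System Toolbox is installed. The sample image businessCard.png (shipped with that toolbox) and the deliberately oversized rectangle are stand-ins chosen for illustration and are not data from this project.

```matlab
% Sketch only: restrict OCR to a rectangular region of interest (ROI).
% Assumes the Computer Vision System Toolbox. 'businessCard.png' is a sample
% image shipped with that toolbox and stands in for a number plate image.
I = imread('businessCard.png');

% Hypothetical ROI in [x y width height] form, deliberately larger than the image.
roi = [-20, 1, size(I, 2) + 100, size(I, 1) + 100];

% Clamp the ROI corners so that the rectangle does not extend beyond the image.
x1 = max(roi(1), 1);
y1 = max(roi(2), 1);
x2 = min(roi(1) + roi(3) - 1, size(I, 2));
y2 = min(roi(2) + roi(4) - 1, size(I, 1));
roi = [x1, y1, x2 - x1 + 1, y2 - y1 + 1];

txt = ocr(I, roi);               % returns an ocrText object
disp(strtrim(txt.Text));         % recognised text without surrounding whitespace
```

In the detection algorithm the rectangle would instead come from the merged bounding boxes, one row of the m-by-4 matrix per text region.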
Use of the Matlab command txt = ocr(I, roi) enables recognition of text in the image (I) within a specified region of interest (roi). The region of interest is the area defined by the bounding boxes generated in the detection algorithm and must take the form of one or more rectangular regions defined by an m-by-4 matrix. The width and height of the region of interest are determined by the x and y coordinates established for the bounding boxes and these must not extend beyond the area of the image.

Almost all of the commercially available OCR functions are designed to operate on binary images and Matlab's is one of these. Matlab’s OCR function uses Otsu’s method of thresholding (Otsu, 1979) to convert an input image into a binary equivalent before the recognition process is implemented. Otsu’s method has been demonstrated to exhibit better overall performance in OCR than other techniques (M. R. Gupta et al., 2007).

Modern OCR algorithms like the one employed by Matlab incorporate neural network techniques to analyse the character stroke edge. This stroke edge is effectively the collision point between the concentration of character pixels and the background image. The algorithm takes averages of the black and white values along the edge of each character. The result is then matched to the characters contained in the dataset and the closest estimation is selected as the output character (Potocnik & Zadnik, 2016).

When the OCR algorithm has completed the recognition process the results are printed in the Matlab command line with the following entry: [txt.Text]. Should the user require information on the properties of the OCR output, the ocrText object contains the recognised text and metadata collected during optical character recognition (Mathworks.com, 2016b). However, some of these features are not available with the student edition of Matlab used during this project.

3.8 Common Issues with Text Detection

There are many difficulties associated with character recognition in scene images. Typically there is a significant amount of inter-character and intra-character confusion leading to mistaken identification (Mishra, Alahari, & Jawahar, 2016). For instance, partial capture of a character can result in it being recognised as a completely different alphanumeric digit.
Extraction of text regions from natural scene images is a challenging task due to the many factors influencing the quality of detection. These factors include variation in light intensity, alignment of text, colour, font size and camera angles (Zhang et al., 2013). Some text components do not display a high level of colour contrast and fail to be detected with MSER (Liu et al., 2016). This is one of the weaknesses associated with implementation of an MSER methodology. However, under favourable lighting conditions this should not pose a substantial problem, as vehicle registration plates have reasonably distinct contrast between character and background regions.
4 Methodology

4.1 Introduction

Implementation of a system which will acquire an image, process it and provide automated indication of the success or failure of the operation requires an elaborate methodology to incorporate each of the individual components into one design. This section documents how the complete system has been put together.

There are two distinct parts covered in this methodology section. The first is the hardware element of the project containing all the communications involved, including image acquisition and an automated response. The second is the procedure implemented for processing of the acquired image and extraction of the desired text region.

The overall system design is discussed, providing insight into how the various components are expected to interact. The selection of the Raspberry Pi 2 Model B and the specifications of this microcontroller which lend themselves to the application are documented before the additional circuitry required of the system is discussed, with particular regard to the design of a PCB.

Following the hardware description of the project a detailed description of the image processing algorithm is provided. This description discusses each of the major sections of the algorithm individually, highlighting the effects of each technique on a given input image. As stated previously, this image processing algorithm is the technical focus of the project and the level of detail reflects this.

The concluding paragraphs of this methodology section are intended to provide the reader with information on how the extracted number plate text is compared to an existing text string and how this comparison is used to provide indication of recognition.

4.2 System Design

Figure 2 is a flowchart depicting how the overall system to be implemented has been conceived.
Figure 2 System Design Flowchart (stages: Image Acquisition, Image Transmission, Pre-Processing, OCR, Results Comparison, Automated Response)

Having framed the project in the context of a system for extracting the digits from a vehicle registration plate and using these to produce an automated response, each of the six steps in the flowchart is an essential element in this process.

Beginning with image acquisition, it is conceived that some form of wireless sensor network will be used to trigger a camera to capture an image. This may be something analogous to an infrared (IR) transmitter/receiver circuit. In this case an IR sensor would be positioned to allow an incoming vehicle to break the beam and consequently cause the camera module to acquire the image. In order to facilitate image acquisition that is automated in this way, a microcontroller will act as the central node in the system. Some of the microcontrollers with potential for selection are documented in the literature review section.

The second stage of the system design is entitled ‘Image Transmission’. The concept behind this title is that the acquired image will be transmitted wirelessly to a laptop or PC for image processing and character recognition. The central microcontroller must be equipped with a wireless protocol such as Wifi or Bluetooth and be capable of transmitting the data in this way.

The third and fourth stages of the system design are Image Processing and Optical Character Recognition. Matlab has been selected as the software platform for implementing this process for several reasons, including its Image Processing Toolbox and OCR engine. The image processing stage of the system will incorporate a series of steps designed to provide the best possible image for the OCR function to operate on. This will involve some of the methods mentioned in the introduction and literature review of this document. The OCR function is used to produce a text string output which is expected to match the characters present in the input image.

The result from the OCR function will then be used alongside some existing database of vehicle registration numbers in a comparison function which will determine whether or not the character string obtained is one of the registrations expected. Finally, the result from the comparison function, which will be a Boolean 1 or 0, will be used to initiate an automated response, tailored to each condition.

4.3 Hardware Specification – Raspberry Pi 2 Model B

The Raspberry Pi 2 Model B is the second generation of the Raspberry Pi microcontroller and has been selected as the central device for this project. The device offers a flexible format for embedded projects, particularly those requiring low power (Raspberrypi.org, 2016b).

There are several features of the Raspberry Pi that make it an ideal candidate for selection in this project. Central to this is the 900MHz quad-core ARM Cortex-A7 CPU with 1GB of RAM. According to Arm.com (2016), the Cortex-A7 is the “most power-efficient multi-core processor.” This becomes a particularly important factor when considering the sustainability of a given system or product. The Cortex-A7 is reported to be capable of running at 1.2-1.6GHz while requiring less than 100mW of total power in typical conditions (Arm.com, 2016).

The low power and high performance of the Raspberry Pi have led to its implementation in many power-critical projects. Tomar and Bhatia (2015) employ the Pi as the central device in development of a Software Defined Radio (SDR) for use in disaster affected areas.
Wireless sensor networks are a common application for this type of microcontroller and the Raspberry Pi compares favourably with devices such as the Arduino Uno (Ferdoush & Li, 2014).

Additional features of the Raspberry Pi making it a suitable device for the type of application considered for this project include the following (Raspberrypi.org, 2016b):

• 4 USB ports
• 40 GPIO pins
• Full HDMI port
• Ethernet port
• Camera interface
• Display interface
• MicroSD card slot
• VideoCore IV 3D graphics core

The camera interface included in the list above enables the user to connect the custom-designed camera add-on module for Raspberry Pi hardware (Mathworks.com, 2016d). This small and lightweight device supports both still capture and video mode, making it ideal for mobile projects. The camera has a 5 megapixel native resolution in still capture mode and supports 1080p30 and 720p60 video. The Raspberry Pi camera module is popular in home security applications and wildlife camera traps and is often used for time-lapse and slow motion imaging (Raspberrypi.org, 2016a).

The GPIO pins on the Raspberry Pi Model B are an essential element in its use as the central node of a system as they facilitate connection with external electronic circuitry and sensors (Vujović & Maksimović, 2015). These pins can be programmed to act as inputs or outputs as required. With particular reference to this project, the input pins can be used to monitor the status of switches or sensors which can be implemented as triggers for other components of the system. The pin layout can be seen in Figure 3, taken from element14.com.

Figure 3 Raspberry Pi Pin Layout
As shown in the pin diagram, the Raspberry Pi Model B is equipped with several DC power lines which can be used as a power source for external circuitry. In terms of portability and using the microcontroller remotely this is a powerful feature as it eliminates the necessity for further external power supplies which may otherwise be required.

The facility to integrate a wireless network, database server and web server into a single compact, low-power computer, which can be configured to run without a monitor, keyboard or mouse, is a major advantage when working with the Raspberry Pi (Ferdoush & Li, 2014). This became a particularly important feature for use in this project as the Pi could be controlled remotely following initial setup. As the system was developed and became more refined, the wireless element grew in importance, not only as a means of data transmission but as a method for implementing overall control. For this reason the selection of the Raspberry Pi for the hardware requirements of the project proved correct.

There are several options for powering the Raspberry Pi, with the condition that the source is able to provide enough current to the device (Vujović & Maksimović, 2015). The device is powered by 5V from a micro-USB connector; however, the current requirements differ for each model of the device and depend on the number of connections drawing power from the microcontroller. For the model being used in this case (2B), a PSU current capacity of 1.8A is recommended (Raspberrypi.org, 2016c).

With a device such as the Raspberry Pi acting as the central node of a system like this one, there is a possibility that an excessive number of parasitic devices may be connected and drawing more current than the Pi can supply. It is therefore essential that the number of connected devices and components is kept to the minimum required. Typical connections to the Raspberry Pi including HDMI cable, keyboard and mouse require between 50mA and several hundred milliamps of current (Raspberrypi.org, 2016c), and the camera module being used here draws a significant 250mA. Those external devices are required during the testing and prototyping stages of this project. However, due to the specification of the system some of these current-drawing devices are not required for the final construction. With remote connectivity there is no need for GUI-related connections to the Raspberry Pi, thus relieving the power burden on the device somewhat.
4.4 Wireless Network

The system design specifies that some form of wireless network is used for communication between the microcontroller and the computer running Matlab. Wifi has been selected as the protocol for this purpose and there are several reasons behind this decision. The prevalence of Wifi in commercial and academic premises makes it an easily accessible resource for implementation of this system. Wifi also enables greater range than could be provided by a single Bluetooth device. Transmitting the acquired image over Wifi does not raise bandwidth concerns, as only one picture is sent at any one time.

The simple fact that Matlab is able to communicate directly with the Raspberry Pi by forming a connection via the device's IP address made the selection of Wifi a certainty. An IP address along with a username and password for the Raspberry Pi is all that is required to enable remote control of the device from Matlab.

The choice of Wifi as the network model may have been premature with regard to experimental testing of the system due to the intermittent coverage in the lab setting. This issue is discussed further in the section of this paper relating to testing.

4.5 PCB Design and Manufacture

As the primary area of investigation and experimentation undertaken for this project is the image processing and character recognition elements, a model for some of the hardware requirements is necessary to ensure effective use of the time allocated. The use of modelling particularly relates to the inputs and outputs of the system, i.e. the initial triggering of acquisition and the automated response.

As the initial triggering of the camera module is premised on a traditional IR sensor, a simple push-button switch can be used to model this action. In a real-world scenario the automated response of the system may be used as a means of enabling or restricting access to a parking facility or even alerting an operator.

The Raspberry Pi GPIO pins can be utilised to initiate an automated response and in this case the use of two LEDs has been chosen as the method for affirming the results of the overall system. A green LED will denote positive detection and recognition while a red LED will signify the corresponding negative result.
Initial testing of the system in this configuration required that construction of a circuit be carried out on a breadboard. Once successfully tested and a final design settled upon, this circuit could be designed and constructed as a Printed Circuit Board (PCB).

The circuit design incorporated two push-button switches: one to simulate the arrival of a vehicle at the position where image acquisition takes place and a second to simulate the end of the operation and system reset. Each of these two switches is connected to a Raspberry Pi GPIO pin which is polled for a change in state. The two LEDs being used to simulate the system's output response are also connected to GPIO pins on the Raspberry Pi. Due to a lack of intensity experienced while testing with the LEDs, two NPN transistors have been included in the circuit to enable extra current to be driven to the LEDs.

The design of the circuit can be seen in the appendix and the PCB design in Figure 4. Both the schematic and PCB layout have been drawn in Proteus.

Figure 4 PCB Design

4.6 Software Structure

To implement the system in Matlab several important requirements of the design specification must be met. This requires a systematic approach to development of the program to ensure that none of the critical stages are overlooked. Figure 5 is a flowchart depicting the various stages of the software design as it has been programmed in Matlab.

Figure 5 Software Design Flowchart
The first critical objective of the program is to connect with the Raspberry Pi device and take control of the onboard camera module. At this point the external LEDs are set to ‘0’ to ensure they are not considered as false positives. The program then ‘polls’ the appropriate GPIO pin which is connected to the switch being used to trigger image acquisition. This ‘polling’ effectively sees the system wait for this switch to be pressed before any other action can begin. Figure 6 shows how this has been implemented in two simple lines of code.

Figure 6 Polling a Switch

When the switch is finally pressed the Raspberry Pi camera module acquires an image and it is transmitted to Matlab on a laptop. The image is saved into the associated Matlab folder and applied to the image processing algorithm. The image is converted to the grayscale format before it is processed through each of the stages discussed in the theory section. The method employed for applying these techniques in Matlab is documented in the following sections.

4.7 MSER Regions

As stated previously in the theory section, the initial phase of image processing is application of the MSER technique. The Matlab command detectMSERFeatures, seen in Figure 7, returns information on region pixel lists and is used to determine the ‘blob’ regions in an image.

Figure 7 Matlab MSER command

The lines of code in Figure 7 show how this command is implemented in the program. There are a number of parameter values associated with the command, allowing the user to determine certain ranges depending on their application. The ‘RegionAreaRange’ parameter sets the allowable size of the detected regions in pixels; it defaults to the range of 30 to 14,000 and can be adjusted. In Matlab’s user guide the ‘ThresholdDelta’ value is stated as a method for specifying the step size between threshold intensity levels used in selecting extremal regions while testing for their stability. Put simply, a greater parameter value will return fewer regions of interest.
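The call described above can be sketched as follows. This is an illustration only: it assumes the Computer Vision System Toolbox, uses a synthetic grayscale image in place of a photograph of a number plate, and the parameter values shown are illustrative rather than the ones used in this project.

```matlab
% Sketch only: detect MSER regions in a grayscale image.
% Assumes the Computer Vision System Toolbox. The synthetic image and the
% parameter values are illustrative assumptions, not those of the project.
I = zeros(200, 400, 'uint8') + 200;     % light background
I(60:140, 50:80)   = 30;                % three dark bars standing in for
I(60:140, 150:180) = 30;                % character strokes on a plate
I(60:140, 250:280) = 30;

mserRegions = detectMSERFeatures(I, ...
    'RegionAreaRange', [200 8000], ...  % accepted region size in pixels
    'ThresholdDelta', 4);               % larger step returns fewer regions

fprintf('%d MSER regions detected\n', mserRegions.Count);
```

The returned MSERRegions object holds a pixel list for each region, which is the information the later filtering stages operate on. Repeating the call with a larger ‘ThresholdDelta’ is a quick way to observe the reduction in the number of regions returned.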
The parameter values seen in Figure 7 have generally been used throughout testing but they can be adjusted and additional parameters included if required. The image in Figure 8 has been operated on by the MSER technique discussed and exhibits all the potential text regions detected. Due to the relatively wide scope of this image a large number of MSER regions have been returned. The weakness of this technique is obvious from this image as the number of non-text regions identified vastly outnumbers the text regions.

Figure 8 MSER Example Result

4.8 Regionprops

The Matlab command regionprops is deployed to remove MSER regions based on geometric properties. Data from the MSER regions must be converted to linear indices so that it can be operated on with regionprops. The regionprops command then measures and returns a set of properties for the MSER regions previously identified. There are numerous property types which can be used for geometric thresholding and selection may depend on the requirements of an application. Those properties selected in this instance are included in the section of Matlab code in Figure 9.

Figure 9 Geometric Properties Thresholds

The ‘Extent’ parameter, for example, returns a value that specifies the ratio of pixels in the region to pixels in the total bounding box (Mathworks.com, 2016c). A threshold range for this property is set, as in Figure 9, removing MSER regions based on these criteria. ‘Eccentricity’, ‘Solidity’ and ‘Euler Number’ are each calculated in a similar manner using the regionprops command.

The ‘Aspect Ratio’ is calculated as the ratio of the width of the region to its height. Information is extracted from the bounding box regions of the image using regionprops and the ratio calculated as the width divided by the height. A threshold is applied in the same way as with the other geometric properties.

MSER regions determined by the thresholds are removed from the image based on this technique. It is anticipated that this would result in a significant reduction in the number of those non-text regions present in an image like the one seen in Figure 8.

4.9 Stroke-Width Variation

A series of steps is required for analysis of the stroke-width of the region images before a threshold can be used to eliminate the remaining non-text regions. The padarray command is used to ‘zero pad’ the image region, effectively encasing it in a number of zeroes along its edge. This is to avoid corruption due to boundary effects which can occur as a result of filtering (stackexchange.com, 2016).

In Matlab the bwdist command is used to calculate the distance transform of a binary image. This function calculates a number that is the distance between a given pixel and the nearest non-zero pixel. Morphological thinning is then applied to the image to remove some foreground pixels. This is commonly known as skeletonisation and produces a drastically thinned image which retains the connectivity and form of the original.

The results of the distance calculation and the thinning operation are combined to determine the stroke-width values present in the image. The standard deviation and the mean of the values are used in a calculation to determine a stroke-width measurement. This measurement is then used along with a threshold value with the intention of removing all remaining non-text regions. The section of Matlab code used to determine the stroke-width measurement and threshold is included in Figure 10.

Figure 10 Stroke-Width Thresholding
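The steps of Section 4.9 can be sketched for a single region as follows. This is a minimal illustration assuming the Image Processing Toolbox. The synthetic region (a bar of uniform width, loosely resembling a letter ‘I’) and the threshold of 0.4 are assumptions made for the example and are not values taken from this project.

```matlab
% Sketch only: stroke-width measurement for one binary region.
% Assumes the Image Processing Toolbox (padarray, bwdist, bwmorph).
% The synthetic region and the threshold of 0.4 are illustrative assumptions.
regionImage = false(40, 20);
regionImage(5:35, 8:12) = true;                      % bar of uniform width

regionImage   = padarray(regionImage, [1 1]);        % zero pad against boundary effects
distanceImage = bwdist(~regionImage);                % distance to nearest background pixel
skeletonImage = bwmorph(regionImage, 'thin', inf);   % thin the region to its skeleton

% Distances sampled along the skeleton act as the stroke-width values.
strokeWidthValues = distanceImage(skeletonImage);
strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);

strokeWidthThreshold = 0.4;
isNonText = strokeWidthMetric > strokeWidthThreshold;   % true marks the region for removal

fprintf('metric = %.3f, remove region = %d\n', strokeWidthMetric, isNonText);
```

In the full algorithm the same calculation would be repeated for every remaining region, with the resulting logical index used to discard the regions flagged for removal.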
4.10 Bounding Boxes

As stated in the theory section, bounding boxes are used to bring form to the data present in the image. Matlab is equipped with considerable functionality for applying bounding boxes and the process is initiated by determining bounding boxes for each of the remaining text regions. These bounding boxes can be expanded slightly to help ensure overlap between connected components. This is achieved by applying a small expansion amount to the bounding boxes and is an important feature in determining the structure of a text string returned from the OCR function. The effect of varying the expansion amount is discussed in greater detail in the results section. Figure 11 shows the effect of applying bounding boxes to each of the character regions and an overlap is clearly visible among the components.

Figure 11 Bounding Boxes

A bounding box overlap ratio is calculated and graphed so that connected regions within the image can be identified. These connected components are then merged together based on a non-zero overlap ratio to form a text string or word. In Figure 12 the lines of code remove the bounding boxes that only contain single components and the text region presented to the OCR function is displayed in Figure 13.

Figure 12 Merging of Bounding Boxes

The example in Figure 13 is an ideal scenario as the entire number plate has been identified as a single text string, making future comparison with stored registrations much simpler.
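The expansion and merging steps described above can be sketched as follows. The sketch assumes the Computer Vision System Toolbox (for bboxOverlapRatio) and a Matlab release that provides graph and conncomp. The boxes, image size and expansion value are hypothetical and chosen only to show three neighbouring characters merging while an isolated region is discarded.

```matlab
% Sketch only: expand character bounding boxes and merge the overlapping ones.
% Assumes the Computer Vision System Toolbox (bboxOverlapRatio) and a release
% with graph/conncomp. Boxes, image size and expansion value are hypothetical.
imageSize = [100 300];                  % [rows cols]
bboxes = [ 20 30 20 40;                 % [x y width height]: three adjacent
           41 30 20 40;                 % characters and one isolated region
           62 30 20 40;
          250 10 10 10];
expansionAmount = 0.05;                 % the Expansion Coefficient (E.C.)

% Convert to corner form and expand each limit.
xmin = (1 - expansionAmount) * bboxes(:, 1);
ymin = (1 - expansionAmount) * bboxes(:, 2);
xmax = (1 + expansionAmount) * (bboxes(:, 1) + bboxes(:, 3) - 1);
ymax = (1 + expansionAmount) * (bboxes(:, 2) + bboxes(:, 4) - 1);

% Keep the expanded boxes inside the image.
xmin = max(xmin, 1);             ymin = max(ymin, 1);
xmax = min(xmax, imageSize(2));  ymax = min(ymax, imageSize(1));
expanded = [xmin, ymin, xmax - xmin + 1, ymax - ymin + 1];

% Pairwise overlap ratios; a box's overlap with itself is ignored.
overlapRatio = bboxOverlapRatio(expanded, expanded);
n = size(overlapRatio, 1);
overlapRatio(1:n+1:n^2) = 0;

% Boxes linked by non-zero overlap fall into the same connected component.
groupIdx = conncomp(graph(overlapRatio));

% One merged box per component, spanning all of its members.
gx1 = accumarray(groupIdx', xmin, [], @min);
gy1 = accumarray(groupIdx', ymin, [], @min);
gx2 = accumarray(groupIdx', xmax, [], @max);
gy2 = accumarray(groupIdx', ymax, [], @max);
textBBoxes = [gx1, gy1, gx2 - gx1 + 1, gy2 - gy1 + 1];

% Discard components that contain only a single region.
groupSize = accumarray(groupIdx', 1);
textBBoxes(groupSize == 1, :) = [];
disp(textBBoxes);
```

With these hypothetical boxes the three neighbouring characters form one component and the isolated region is dropped, so a single merged box remains to be passed to the OCR function.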
Figure 13 Merged Bounding Boxes

4.11 OCR Function

The OCR function used for this project is an existing function in Matlab, the operation of which has been discussed in the theory section of this document. In terms of the methodology employed for using this function, the process is a simple matter of applying the text image, appropriately processed, with the correctly merged bounding boxes to the OCR command. The section of code in Figure 14 depicts how this is accomplished.

Figure 14 OCR function code

The result of the OCR function is a text string of the recognised alphanumeric digits printed in the Matlab command line.

4.12 String Comparison

For the digits recovered from a vehicle registration plate to be relevant in any kind of automated system, a method is required for comparing them with what is expected or required. In an operational, fully-automated system the alphanumeric digits extracted from an image may be compared with an extensive database of all registration numbers cleared for access. The result of the comparison would be a simple positive or negative, resulting in action or inaction.

This seems to be a relatively simple procedure but due to the various data types and array structures present in Matlab, a certain amount of manipulation is required to implement direct comparison. Matlab is equipped with a function for comparing strings, called as strcmp(A,B), which returns true (1) or false (0) depending on whether or not the strings match. It is important to note that the data operated on within this function must be of the same type. Therefore the 1x10 character array generated from the OCR function cannot be directly compared with the string entered as the expected registration digits.

The solution to the problem of comparing different data types is to convert them both to a common type. This requires the use of the cellstr(S) function in Matlab, which facilitates the creation of a cell array of strings from any character array. A cell array in Matlab is one whose elements are cells. Each cell in a cell array can hold any Matlab data type including numerical arrays, character strings, symbolic objects and structures (Hanselman & Littlefield, 2001).

Taking the example of a vehicle registration plate accurately detected by the OCR engine as ‘XJZ 7743’, the answer returned is a 1x10 character array and is stored in Matlab as such. Entering the string B = 'XJZ 7743' stores it in Matlab simply as the value ‘XJZ 7743’. Comparison of these two results with the string compare function returns a ‘0’ as the values are in different formats. This error is overcome by creating two cell arrays from the stated values. The lines of code in Figure 15 below show the method for comparing these two cell arrays:

Figure 15 Cell Arrays

• B is the manually entered string to provide comparison with.
• A is the output of the OCR function converted to a character array.
• D is this OCR output generated as a 1x1 cell array.
• E is the string generated as a 1x1 cell array.
• F is the result of comparison between the cell arrays D and E using the string compare function.

In this example the output of the string comparison function, F, is equal to ‘1’. This provides positive confirmation of a match which can then be implemented as a condition for the execution of an automated response function.

4.13 Indication of Recognition

In order to indicate whether or not the system has produced a positive match and to represent an automated response to this match, some form of output from the system would be required. The basic premise of this function, as stated previously in this report, is to enable or block entry to a parking facility and to alert an operator when this is considered necessary.

The Raspberry Pi provides a suitable platform for this purpose as its GPIO pins can be used to trigger an external response to the system inputs. In real-world applications this output may be tailored to meet the specific requirements of a given system. For example, a servo motor may be triggered to raise a barrier or an alarm sounded to alert a system operator. In this case a simple LED can be used for simulation and testing of the efficacy of the Matlab code and external circuitry.

In Matlab code an ‘if’ statement can be used to determine a response which is dependent upon the presence of one or more specified input conditions. For instance, it may be used to implement a certain set of actions when the ‘if’ condition is ‘true’; otherwise the status quo persists. Alternatively it could be used to determine an output based on several potential input conditions, determining the required output upon the presence of a given condition.

For testing the output of the system the input condition is provided by the result of the string compare function discussed in the previous section. Therefore the code could be written to trigger some form of response when the output of the string compare function (F) is equal to ‘1’. In cases where the OCR function is unable to determine a positive match F is equal to ‘0’, in which case the system can be configured to produce no response at all or an alternative response such as a red light to indicate that the comparison is negative. A section of code containing the ‘if’ statement is inserted in Figure 16 below.

Figure 16 'if' statement in Matlab

The initial condition to this section is ‘F == 1’.
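The comparison of Section 4.12 and the conditional response can be sketched together as follows. The variable names A, B, D, E and F follow the description in the text. The rest is an assumption made for illustration: in particular, the sketch assumes that the two extra characters in the 1x10 OCR output are trailing newlines and removes them with strtrim, and it only computes the LED states rather than driving real pins.

```matlab
% Sketch only: compare the OCR output with an expected registration and
% derive the LED states. The trailing newlines in A are an assumption about
% what the extra characters in the 1x10 OCR output might be.
A = sprintf('XJZ 7743\n\n');     % stand-in for txt.Text, a 1x10 character array
B = 'XJZ 7743';                  % manually entered expected registration

D = cellstr(strtrim(A));         % OCR output as a 1x1 cell array, whitespace removed
E = cellstr(B);                  % expected string as a 1x1 cell array
F = strcmp(D, E);                % logical 1 when the two strings match

if F == 1
    greenState = 1;              % positive match: green LED on
    redState   = 0;
else
    greenState = 0;              % negative match: red LED on
    redState   = 1;
end

% On the hardware these states would be written to the GPIO pins, for example
% with writeDigitalPin from the Raspberry Pi support package.
fprintf('match = %d, green = %d, red = %d\n', F, greenState, redState);
```

Changing B to a different registration makes F equal to 0 and swaps the two LED states, which is a simple way to exercise the negative branch without the camera.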
When this condition is met due to a positive match from the OCR function and the string compare function, a digital output pin on the Raspberry Pi is sent HIGH. An ‘else’ statement is included to ensure that should this condition not be met the digital output pin will remain LOW.

To provide visual indication of a positive or negative match an external LED is connected to the relevant output pin via a transistor. The transistor is required to provide enough current so that the LED is easily visible. The current provided from the GPIO pins on the Raspberry Pi is insufficient for this purpose. In the event of a positive match, i.e. F == 1, the green LED is switched on. When the code is run and the result is a negative match, the red LED will be switched on.

A single iteration of the system is completed when the second push-button switch is pressed, as this switches off all external LEDs, closes all open figures, deletes the input image and exits the while loop.

4.14 Graphical User Interface

For the purpose of improving the utility of the character recognition algorithm a Graphical User Interface (GUI) has been developed in Matlab. The software provides tools for creation of the GUI and facilitates inclusion of push-buttons, graphs, text and so on. Among the virtues of using a GUI in Matlab is that it can disguise a vast and complex program code behind an easy to use interface. The GUI created for this system can be viewed in Figure 17.

Figure 17 Graphical User Interface

The image in Figure 17 depicts the GUI following acquisition of an image. Text regions and the detected text are displayed on the screen, which may be useful to a system operator. The GUI also enables the user to establish connection with the remote device and to manually override the system. Finally, the Raspberry Pi can be shut down remotely by pressing the associated push-button on the GUI.
5 Experimental Testing

5.1 Initial Testing

Testing has been carried out on the various components of this project throughout the academic year. Because the project has considerable elements of both hardware and software, testing required a modular methodology to ensure that each part of the system worked correctly before it was integrated within the overall design.

Testing of the image processing and character recognition aspect of the project required considerable experimentation to ascertain the effectiveness of various components and to understand the reasons for disappointing results. Clearly not all of these tests can be documented in the results section, but a detailed overview and analysis of the work undertaken is provided. Many of the experiments employed previously acquired images of vehicles taken from different angles and distances to provide an adequate range of complexity. These could then be used to run experiments on the image processing algorithm without having to include the communications element of the project. This testing methodology led to several instances of successful recognition, enabling the project to progress towards integration of a complete system. As well as applying different images to the processing algorithm, tests included making changes to certain elements of the program to observe the results and use the information to refine the process. Results from this type of testing are provided in the Complex Detection section.

Initial testing of the hardware elements of the system has been carried out by interfacing the Raspberry Pi with a circuit constructed on a breadboard. These tests were designed to determine the effectiveness of the switches and LEDs for modelling the acquisition trigger and the automated response. These initial tests proved successful, enabling work to proceed on the PCB design and manufacture.
Running concurrently with the breadboard testing was testing of the wireless communication between the Raspberry Pi and a laptop computer. Simple testing such as programming LEDs to flash progressed on to more complex tasks such as transmitting an image from microcontroller to laptop.
5.2 Complete System Test

Complete system testing combined each of the component elements of the project to determine whether or not the system would operate as expected. This process has not proceeded as smoothly as anticipated, although it has provided some positive results. The hardware used for the complete system test, including the Raspberry Pi and PCB, can be viewed in Figure 18.

Figure 18 Complete System Hardware

On a number of occasions the system has operated entirely as expected, lighting the green LED to signify correct recognition of the input characters. However, certain issues have arisen that have limited the time spent on refining the final product. For example, when left idle for a significant period of time the system tends to lose connectivity with the wireless network in the lab. This can lead to errors when attempting to reconnect, as the program running in Matlab considers the device to be still connected although it is unable to respond to commands. This appears to be an issue with the network itself, as testing in other venues has not produced the same problem.

An additional finding of the complete system test was that the Raspberry Pi camera module tended to deliver four snapshots to the Matlab program when only one was expected. This led to a backlog in which triggering the camera module caused Matlab to process a leftover image that may have been acquired several minutes earlier. This particular issue was solved by making some minor adjustments to the Matlab code to ensure that only one image is acquired with each iteration.
5.3 Limitations to Testing

Several limitations to testing of the system have been experienced, some of which may be relevant to the results obtained. Perhaps the most debilitating of these has been the difficulty in obtaining and maintaining adequate wireless connectivity in the lab. Intermittent connectivity led to a significant amount of time being expended on troubleshooting network problems. Occasionally it was not possible to establish any connectivity between the Raspberry Pi and the college network, making testing of the overall system very difficult. On reflection, a more prudent approach may have been to perform all testing with an Ethernet connection to avoid time wasted on wireless issues. The wireless element could, in that case, have been incorporated at final completion of the system.

Lighting proved to be something of a restriction on the results obtained from live input images. The less than adequate lighting in the lab setting, combined with intermittent changes in intensity due to sunlight, made consistency of results extremely difficult to achieve during testing. However, this can also be interpreted as a positive aspect, as solutions to these problems are required in real-world scenarios.

Finally, with regard to limitations, it is important to understand that all of the testing completed for this project has been in relation to static text images. The word static here means that the text content of the image is stationary at the moment of image acquisition. This is in contrast to more advanced systems that use sophisticated techniques to extract text from moving vehicles, for example.

6 Results and Discussion

6.1 Introduction

The results obtained from testing of the image processing algorithm and the overall system are numerous and generally successful in relation to prior expectations. The Raspberry Pi is able to acquire an image when triggered. This image can be transmitted wirelessly over a Wi-Fi network to a laptop, where it is applied to the image processing algorithm. In many instances the correct characters are obtained and a green LED is switched on in response.
With specific regard to the image processing and character recognition element of the project, it is important to understand that the results demonstrated in the following paragraphs have been obtained through many stages of experimentation with the various components of the algorithm. It is not possible to discuss the result of each test, but a detailed overview is provided. Not all tests have been successful in achieving the desired target of the system, i.e. to correctly identify the characters in a vehicle registration plate. However, each unsuccessful test has provided information on the effects of the various processing techniques, which has helped in refining elements of the program. Some of the more interesting results obtained from unsuccessful tests are documented in the Complex Detection section of this report.

The Basic Detection section will show how the algorithm has been successful in identifying text regions and recognising them correctly as those in the input image. The title of the section relates to the relative simplicity of the input image, which is a primary reason for the positive results. The Complex Detection section employs a series of examples to demonstrate how changes to thresholding parameters in the algorithm affect its performance. The results presented are supplemented by discussion and analysis of the overall system and recommendations for further work.

6.2 Basic Detection

The image in Figure 19 is one used prominently in tests carried out throughout the duration of this project and is a typical example. The particular features of this image that make it conducive to character recognition are the clearly defined black character regions against a yellow background, the lack of external image regions that may be misidentified as potential text regions and the close to ideal angle from which the image has been acquired.
Figure 19 Basic Detection - Input Image

In Figure 20 the input image has been converted to grayscale and the MSER regions technique applied. With a fairly basic image like this one it is anticipated that this method should have no difficulty in detecting all of the character regions and should detect only minimal non-text regions, or perhaps none at all.

Figure 20 Basic Detection - MSER regions

As expected, the character regions have been detected, with only two non-text regions below the number plate being identified as potential text. With so few non-text regions initially detected, owing to the lack of complexity in the image, the next two stages of the algorithm have a greater chance of isolating the text regions that are to be operated on by the OCR function. Figure 21 depicts the effect of applying the regionprops command and statistical thresholding to the image post-MSER.

Figure 21 Basic Detection - Geometric Properties method

In this case, as in many of the experiments with this type of image, the non-text regions remaining from the MSER stage have been removed, leaving only text regions. The effects of applying stroke-width thresholding can be viewed in Figure 22.

Figure 22 Basic Detection - Stroke-width thresholding

The fact that the second stage in the pre-processing algorithm has successfully identified all of the text regions makes the third stage somewhat redundant in this instance. This can be a common occurrence in image processing algorithms when well cropped images like this one are involved. However, it is necessary to include the stroke-width thresholding stage because of its effect on the consistency of results in more complex situations. The importance of including each of the three pre-processing stages will be made clear in the following section.
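The geometric thresholding stage described above can be sketched as follows. The threshold values are those listed later in Table 1, while the three sets of region measurements are invented for illustration and stand in for what regionprops would return. Reading the Extent entry as "remove below 0.2 or above 0.9" is an assumption.

```matlab
% Hypothetical measurements for three candidate regions, in the form
% regionprops returns them (one struct element per region).
stats = struct( ...
    'BoundingBox',  {[10 20 18 40], [5 90 120 6], [60 20 18 40]}, ...
    'Eccentricity', {0.90, 0.999, 0.85}, ...
    'Solidity',     {0.55, 0.95,  0.25}, ...
    'Extent',       {0.45, 0.95,  0.25}, ...
    'EulerNumber',  {1,    1,     -6});

% Aspect ratio is width divided by height of each bounding box.
bbox        = vertcat(stats.BoundingBox);
aspectRatio = bbox(:, 3) ./ bbox(:, 4);

% A region is flagged for removal if it breaches any one threshold.
removeIdx = aspectRatio' > 3;
removeIdx = removeIdx | [stats.Eccentricity] > 0.995;
removeIdx = removeIdx | [stats.Solidity] < 0.3;
removeIdx = removeIdx | [stats.Extent] < 0.2 | [stats.Extent] > 0.9;  % assumed reading
removeIdx = removeIdx | [stats.EulerNumber] < -4;

% Regions that survive go forward to stroke-width thresholding.
keptStats = stats(~removeIdx);
```

In this made-up data the first region has character-like proportions and is kept, the second is a long thin strip and the third is a sparse, hole-ridden blob, so both are flagged.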
One of the problems encountered when attempting to generate a character output was in returning the full registration in the correct order. Following the stroke-width thresholding, bounding boxes are applied to the image in an attempt to form a coherent structure from the data. As stated in the methodology section, these bounding boxes are calculated within Matlab but can be adjusted to suit specific applications. Because there are two distinct sections, “LLZ” and “2268”, and the bounding boxes are used to establish text regions, the resultant output tended to return the two sections in reverse order. A certain amount of trial and error can be required to overcome an issue like this one, but adjusting the expansion amount used to increase the size of each box proved effective. Figures 23 and 24 show how the bounding boxes have been applied differently in two iterations of the same algorithm.

Figure 23 Basic Detection - Bounding Box comparison

In Figure 23 the bounding boxes have been applied using the associated Matlab command. However, two different expansion amounts have been used to extend the reach of the boxes. In the top image in Figure 23 a relatively small expansion amount has been applied. Although this has resulted in most of the character components being connected, the central gap between ‘Z’ and ‘2’ has resulted in these two not being identified as connected components. With the expansion amount increased, the second image in Figure 23 shows how these larger bounding boxes extend over a greater area and result in a slight overlap between the ‘Z’ and ‘2’. With the overlap ratio set to zero, all connected components are considered part of a single line of text. The effect this process has on the input to the OCR function can be seen in Figure 24.
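The effect of the expansion amount can be illustrated with a simplified, one-dimensional sketch. The box coordinates, the two expansion amounts and the rule of padding each box by a fraction of its own width are all hypothetical, and the grouping is a plain interval sweep rather than the overlap-ratio computation used in the algorithm itself.

```matlab
% Hypothetical character boxes [x y width height] on one text row:
% three for "LLZ", a wider gap, then four for "2268".
bboxes = [ 10 20 18 40;  32 20 18 40;  54 20 18 40; ...
           92 20 18 40; 114 20 18 40; 136 20 18 40; 158 20 18 40];

expansions = [0.15 0.60];            % small and large expansion amounts
numLines   = zeros(size(expansions));

for k = 1:numel(expansions)
    % Grow each box horizontally by a fraction of its width on both sides.
    pad  = expansions(k) * bboxes(:, 3);
    xmin = bboxes(:, 1) - pad;
    xmax = bboxes(:, 1) + bboxes(:, 3) + pad;

    % Sweep left to right, chaining boxes that overlap into one group.
    [xmin, order] = sort(xmin);
    xmax   = xmax(order);
    groups = 1;
    reach  = xmax(1);                % right edge of the current group
    for i = 2:numel(xmin)
        if xmin(i) > reach
            groups = groups + 1;     % gap found: start a new text line
            reach  = xmax(i);
        else
            reach  = max(reach, xmax(i));
        end
    end
    numLines(k) = groups;
end
```

With these numbers the small expansion leaves the gap between the third and fourth boxes unbridged, giving two groups, whereas the larger expansion chains all seven boxes into one.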
Figure 24 Basic Detection - Bounding Box Comparison (1)

The top image in Figure 24 shows how a small expansion amount can result in a vehicle registration plate being separated into two distinct lines of text. This is an unwanted situation as it can lead to errors when comparing the text string with an existing database of registration numbers. In the second image the increased expansion amount has ensured that the OCR function will consider the text regions in the image as a single string. This is the ideal scenario when inputting the image to an OCR function as it eliminates alternative interpretations of the order of the data.

The algorithm has been extremely successful in identifying and correctly recognising the characters when operating on basic input images like the one in Figure 19. The processed image, having passed through each of the stages documented in this section, is applied to the Matlab OCR function, which provides a result based on its interpretation of the image. Comparing the edges of the character regions in the image, it returns a text string based on correlation to existing templates. Figures 25 and 26 show the result of the OCR operation on the processed image, as printed on the Matlab command line.

Figure 25 OCR result (1)

Figure 26 OCR result (2)
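As a possible complement to tuning the expansion amount, the OCR text could be normalised before it is compared with a stored registration. This is a suggestion rather than a step taken in this project, and the two OCR output strings below are hypothetical stand-ins for a split and a single-line result. Stripping whitespace only helps when the two sections are returned in the correct order.

```matlab
% Hypothetical OCR outputs: the plate split over two lines, and on one line.
ocrTextSplit  = sprintf('LLZ\n2268\n\n');
ocrTextSingle = sprintf('LLZ 2268\n\n');

% Registration held in the (hypothetical) database.
expected = 'LLZ2268';

% Remove every whitespace character, including spaces and newlines.
cleanSplit  = regexprep(ocrTextSplit,  '\s', '');
cleanSingle = regexprep(ocrTextSingle, '\s', '');

% String compare as used elsewhere in the system: 1 on match, 0 otherwise.
F_split  = strcmp(cleanSplit,  expected);
F_single = strcmp(cleanSingle, expected);
```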
In Figure 25 the result has been returned as two distinct text strings. Although it has returned the correct characters and proven the effectiveness of the various pre-processing stages as well as the OCR function, it is preferable that the result be returned as a single line of text.

6.3 Complex Detection

Results obtained from the image processing algorithm highlight the importance of ambient conditions, the quality of the input image and the effectiveness of well-refined thresholding properties. One of the more interesting aspects of the experimentation process has been the fact that each iteration of the algorithm provides information on the functional operation of the process, regardless of whether or not positive recognition has been achieved. A typical example of this is the variation in results obtained when applying different camera angles to the vehicle registration plates.

In order to further demonstrate the results obtained from experimentation with the algorithm in Matlab, a single input image is used to display the effects of various adjustments to the image processing properties. The image is displayed in Figure 27 below and the intention is to isolate the registration plate characters as the only Regions of Interest (RoIs) for the OCR function. This particular image has been selected because of certain properties it exhibits. These include the substantial light contrast from the top to the bottom of the picture and the offset angle of the vehicle registration plate.

Figure 27 Alfa Romeo Input Image

Tables 1 to 4 contain the property types used in the image processing algorithm to differentiate between RoIs in an image. Each property is associated with a threshold value which can be adjusted to determine the effect of each property in distinguishing between general colour concentrations and text regions. The first five properties in each table are the geometric properties discussed in Section 3.4 and the sixth is the stroke-width threshold discussed in Section 3.5.

Table 1 contains the base values used to configure the region properties and stroke-width thresholding levels. With these values in place several separate instances of character recognition have been successful. In fact, with this configuration the system has been able to produce the automated response to positive recognition anticipated in the design. However, these successful cases have been achieved in ideal conditions or with much less complex input images than the one seen in Figure 27.

Table 1 Parameter Values - First Iteration

Parameter                 Threshold Value
Aspect Ratio              >3
Eccentricity              >0.995
Solidity                  <0.3
Extent                    0.2< OR <0.9
Euler Number              <-4
Stroke-width Threshold    0.4

From left to right, the images in Figure 28, as well as in Figures 29, 30 and 31, depict the three key stages of the image processing algorithm: 1) MSER region detection, 2) removal of MSER regions based on geometric properties and 3) removal of remaining non-text regions based on stroke-width detection.

Figure 28 Processing results - First Iteration

In the first image in Figure 28, the MSER technique demonstrates both its strengths and its weaknesses. The method has successfully identified the seven character regions in the image but has also identified a very high number of additional regions which are considered potential text regions. It is the sheer quantity of potential RoIs determined using the MSER methodology that makes further pre-processing of the image necessary. However, it should be noted that the volume of MSER regions detected in this image is a consequence of the inherent complexity presented. Much of the experimentation carried out for this project has been undertaken with extremely basic text images, often resulting in detection of text regions only, or of very limited non-text regions.

In the second image presented in Figure 28 the regionprops command has been employed to measure the specified geometric properties with the intention of eliminating non-text regions based on the threshold values seen in Table 1. In this instance the technique has been fairly successful in removing many of the ‘blob’ regions detected using MSER. The areas surrounding the license plate have been removed, as have many of those on the grill and window wipers of the vehicle. This stage of the process has also been successful in retaining the character regions on the number plate for further processing. Although many of the non-text regions have been removed during this stage, it can be deduced from the regions that remain that the parameters documented in Table 1 are not ideally refined for this image.

The final image in Figure 28 depicts the result of applying stroke-width analysis with a threshold of 0.4 to the picture. As stated previously, the stroke-width measurement is calculated as the standard deviation of the stroke widths divided by the mean of the stroke widths. Those areas of the image with a stroke-width measurement greater than 0.4 are indexed and identified as likely non-text regions to be removed.
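The stroke-width measurement can be made concrete with a short sketch. The stroke-width samples for the three regions below are invented for illustration; in practice they would be read from a distance transform along each region's skeleton. It is assumed here that a region is removed when its measurement exceeds the threshold, which is consistent with the later observation that raising the threshold may recover a missing character.

```matlab
% Hypothetical stroke-width samples (in pixels) for three candidate regions.
strokeWidths = { [4 4 5 5 4 5], ...      % even strokes, typical of a character
                 [2 3 4 6 7 8], ...      % borderline variation
                 [1 2 9 14 3 20] };      % erratic widths, typical of a 'blob'

% Measurement = standard deviation of the widths divided by their mean.
metric = zeros(1, numel(strokeWidths));
for k = 1:numel(strokeWidths)
    w = strokeWidths{k};
    metric(k) = std(w) / mean(w);
end

% Regions whose measurement exceeds the threshold are flagged for removal.
removedAt04 = metric > 0.4;
removedAt05 = metric > 0.5;
```

With these samples the borderline region is removed at a threshold of 0.4 but retained at 0.5, while the erratic region is removed in both cases.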
Use of stroke-width analysis has been partially successful in removing some of the remaining non-text regions, particularly the significant ‘blob’ of colour to the top-left of the image. However, there are still several non-text regions which have not been eliminated. Perhaps of even greater significance is the fact that application of the stroke-width threshold has actually resulted in the removal of one of the license plate characters as a potential text character. In this instance the ‘W’ does not meet the specified criteria.

There are a number of possible reasons for the disappointing results obtained in this example. One of these is the likelihood that the number of non-text regions remaining after the geometric property thresholding had been applied has resulted in a skewed calculation of the stroke-width average which is not primarily based on actual text character values. Another potential reason for the removal of the ‘W’ character is that the threshold setting may be too low when applying the stroke-width analysis. Although increasing the threshold may result in this character being detected, it may also result in additional non-text regions being identified.

The results demonstrated in Figures 29, 30 and 31 will show how making changes to parameter values in the image processing algorithm can improve or worsen the overall performance in detection of RoIs. In Table 2 the stroke-width threshold has been increased from 0.4 to 0.5, with the other parameters remaining constant from the previous example. The expectation is that the increased stroke-width threshold will result in inclusion of the ‘W’ character as a detected text region. However, this change is not going to be a panacea for the many non-text regions seen previously.

Table 2 Parameter Values - Second Iteration

Parameter                 Threshold Value
Aspect Ratio              >3
Eccentricity              >0.995
Solidity                  <0.3
Extent                    0.2< OR <0.9
Euler Number              <-4
Stroke-width Threshold    0.5

As in the previous example, Figure 29 depicts the effect of the three important pre-processing techniques on the input image.