Thesis

Image Processing with Character Recognition
using Matlab
Ciaran Cooney
This thesis is submitted to Dundalk Institute of Technology in partial
fulfilment of the requirements for the degree of
B.Eng. (Hons) in Sustainable Design
School of Engineering
Dundalk Institute of Technology
Supervisors: Tim Daly, Paul Egan, Tommy Gartland, Alan Kennedy
2016

Abstract
Text detection and character recognition in natural scene images is a challenging and
complex operation due to the potential for varying degrees of quality expected from
the input data. Therefore, development of a robust and adaptable algorithm requires
several stages of pre-processing to identify regions of interest before character
recognition can be applied. This paper presents a methodology for implementation of
a character recognition algorithm based on identification of the alphanumeric characters
on vehicle registration plates.
The text detection algorithm has been integrated within a system requiring
initial image acquisition and a visual indication of results. The reason for this
development is to promote the use of the technique in a commercial application. A
wireless network and graphical user interface are incorporated to supplement the
primary utility of the system, i.e. image processing and character recognition.
Results demonstrate the strengths and weaknesses of the techniques employed.
The quality of the input image, ambient conditions and various parameters within the
algorithm itself are found to impact the Optical Character Recognition (OCR) engine's
ability to accurately detect text.

Declaration
I, the undersigned, declare that this thesis entitled:
Image Processing with Character Recognition using Matlab
is entirely the author’s own work and has not been taken from the work of others,
except as cited and acknowledged within the text.
The thesis has been prepared according to the regulations of Dundalk Institute of
Technology and has not been submitted in whole or in part for an award in this or any
other institution.
Author Name: Ciaran Cooney
Author Signature:
Date:

List of Abbreviations and Symbols
RoI Region of Interest
OCR Optical Character Recognition
MSER Maximally Stable Extremal Regions
PCB Printed Circuit Board
CPU Central Processing Unit
GPIO General Purpose Input/Output
LED Light Emitting Diode

Table of Contents
Abstract
Acknowledgments
Declaration
List of Abbreviations and Symbols
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
2 Literature Review
2.1 Introduction
2.2 Technique
2.3 Optical Character Recognition
2.4 Software
3 Theory
3.8 Common Issues with Text Detection
4 Methodology
4.2 System Design
4.3 Hardware Specification – Raspberry Pi 2 Model B
4.5 PCB Design and Manufacture
4.7 MSER Regions
4.8 Regionprops
4.9 Stroke-Width Variation
4.11 OCR Function
4.12 String Comparison
5 Experimental Testing
6 Results and Discussion
6.1 Introduction
6.2 Basic Detection
6.3 Complex Detection
6.6 Further Work
7 Conclusions
Appendix A

List of Figures
Figure 1 System Flowchart
Figure 2 System Design Flowchart
Figure 3 Raspberry Pi Pin Layout
Figure 4 PCB Design
Figure 5 Software Design Flowchart
Figure 6 Polling a Switch
Figure 7 Matlab MSER command
Figure 8 MSER Example Result
Figure 9 Geometric Properties Thresholds
Figure 10 Stroke-Width Thresholding
Figure 11 Bounding Boxes
Figure 12 Merging of Bounding Boxes
Figure 13 Merged Bounding Boxes
Figure 14 OCR function code
Figure 15 Cell Arrays
Figure 16 'if' statement in Matlab
Figure 17 Graphical User Interface
Figure 18 Complete System Hardware
Figure 19 Basic Detection - Input Image
Figure 20 Basic Detection - MSER regions
Figure 21 Basic Detection - Geometric Properties method
Figure 22 Basic Detection - Stroke-width thresholding
Figure 23 Basic Detection - Bounding Box comparison
Figure 24 Basic Detection - Bounding Box Comparison (1)
Figure 25 OCR result (1)
Figure 26 OCR result (2)
Figure 27 Alfa Romeo Input Image
Figure 28 Processing results - First Iteration
Figure 29 Processing Results - Second Iteration
Figure 30 Processing Results - Third Iteration
Figure 31 Processing Results - Fourth Iteration
Figure 32 Complete Test (Basic) - Input
Figure 33 Complete Test (Basic) - MSER regions
Figure 34 Complete Test (Basic) - Bounding Boxes
Figure 35 Complete Test (Basic) - Text Region
Figure 36 Complete Test (Basic) - Result
Figure 37 Complete Test (Complex) - Input
Figure 38 Complete Test (Complex) - MSER regions
Figure 39 Complete Test (Complex) - Post-Geometric Properties
Figure 40 Complete Test (Complex) - Post-Stroke-width thresholding
Figure 41 Complete Test (Complex) - Bounding Boxes
Figure 42 Complete Test (Complex) - Text Region
Figure 43 Complete Test (Complex) - Result
Figure 44 Schematic Diagram
Figure 45 Breadboard Construction

List of Tables
Table 1 Parameter Values - First Iteration
Table 2 Parameter Values - Second Iteration
Table 3 Parameter Values - Third Iteration
Table 4 Parameter Values - Fourth Iteration

1 Introduction
1.1 Introduction
Image processing in general, and object recognition in particular, is becoming an
increasingly important facet of modern electronics and communications. Some of the
more prevalent applications include medical imaging using fMRI (Steele et al., 2016),
process automation in industrial settings (Choi, Yun, Koo, & Kim, 2012) and text
detection in natural scene images (Zhao, Fang, Lin, & Wu, 2015; Liu, Su, Yi, & Hu,
2016). The techniques deployed across these applications are wide-ranging and
diverse due to the different requirements of each. With such a vast array of criteria for
investigation it is necessary to define a specific area of interest.
Text Detection, or Character Recognition, is a field of study with an extensive
literature behind it and a burgeoning market for applications. Typical applications
where character recognition is especially important include scanning of text
documents, reading license plate numbers and language translation of text images.
Just as there are many applications for text detection, there are many techniques and
methodologies for implementation of a detection algorithm. Edge-detection,
thresholding and Hough transforms are three of the most common methods employed.
In fact, Otsu’s Method (Otsu, 1979) is a thresholding technique often implemented
within commercial Optical Character Recognition (OCR) algorithms.
License plate recognition is a standard paradigm for investigation and
experimentation of character recognition techniques and is the frame in which this
project has been carried out. A variety of methods have been implemented in license
plate detection such as Harris Corner and Character Segmentation (Panchal, Patel, &
Panchal, 2016), the use of SIFT descriptors (Yu Wang, Ban, Chen, Hu, & Yang,
2015) and probabilistic neural networks (Öztürk & Özen, 2012).
Much of the preliminary work undertaken has been focused on obtaining a
deeper understanding of the various techniques involved in text detection processes,
particularly those related to natural-scene images. Although the theory is extremely
important, practical usage must also be considered. To this end, hardware and software
platforms are investigated in the literature review for this project to ascertain their
relative compatibility with image processing applications.

To test the efficacy of the investigation into the various detection and
recognition methods a practical implementation of these techniques is developed. In
most cases character recognition systems will consist of several component parts
including acquisition, pre-processing and recognition. The system proposed here
incorporates each of these elements within a wireless network which will provide an
automated response to positive character detection and an equivalent alert to failed or
negative detection.
The system is framed as a method for detecting the characters of a vehicle
registration plate and permitting or denying entry based on comparison of the detected
text and a pre-existing vehicle-registration database. However, there is inherent
flexibility in the model and it may be adapted to service other applications. Figure 1 is
a flowchart depicting a high-level description of the required functionality of the
system.
Figure 1 System Flowchart
The methodology is based upon use of a central microcontroller which will acquire an
image when triggered. The acquired image is then transmitted wirelessly to a laptop
or PC on which the filtering and pre-processing of the image will take place. After
pre-processing, the image is passed to a commercial OCR algorithm which will output a
digital representation of the vehicle registration number obtained. Finally, comparison
of the number obtained with a database of expected numbers is carried out to
determine the action of the automated response.
[Figure 1 stages, in order: Image Acquisition, Image Transmission, Pre-Processing, OCR, Results Comparison, Automated Response]
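The final two stages of the flowchart, results comparison and automated response, can be illustrated with a minimal Matlab sketch. It is not the implementation used in this project: the plate values, the variable names and the normalisation rule (upper case, alphanumeric characters only) are assumptions chosen purely for illustration.

```matlab
% Hypothetical database of permitted registrations (illustrative values only).
permittedPlates = {'161D12345', '12LH4567', '08MH987'};

% Hypothetical raw OCR output, including whitespace, hyphens and lower case.
ocrText = sprintf(' 12-lh-4567 \n');

% Normalise: convert to upper case, then keep only the characters A-Z and 0-9.
plate = upper(ocrText);
plate = regexprep(plate, '[^A-Z0-9]', '');

% Permit entry only on an exact match with one of the database entries.
isPermitted = any(strcmp(plate, permittedPlates));

if isPermitted
    response = 'permit';
else
    response = 'deny';
end
fprintf('Detected plate %s: %s entry\n', plate, response);
```

An exact string match is the simplest possible comparison. A practical system might need to tolerate common OCR confusions such as 'O' and '0', which this sketch does not attempt.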

All the relevant theory, methodology and results relating to implementation of the
system described are contained within the main body of this document.
2 Literature Review
2.1 Introduction
Image processing and text recognition are increasingly important areas for research
and development in the modern world. Sectors in which image processing techniques
provide the basis for critical applications include medical, communications and
security. In the medical industry image processing techniques, such as improving the
quality of fMRI scans, have been employed in diagnostics (Misaki et al., 2015), with
some modern applications facilitating automated diagnosis of certain conditions.
Text recognition is an area with increasing relevance and the technology in this
area is keeping pace with this need. One of the most impressive applications present
in the literature is the use of text recognition technology in the development of a
text-to-speech synthesis system (Rebai & BenAyed, 2015).
Not only are the potential applications for image processing widespread but the
techniques used to extract the information are equally diverse. Methods deployed are
of course dependent on the desired outcome and there is no shortage of techniques
that can be tailored towards a specific target. Image processing is not unlike other
types of data processing in that the particular process is chosen based on the exact
requirements of the intended application.
Because the project for which this literature review has been compiled is
primarily concerned with character recognition in a static image, much of this review
has been written with reference to this area (Zhao et al., 2015; Zhu, Wang, & Dong,
2015).
The expected outcome of this review is to understand and analyse the
present literature on image processing techniques, the platforms used to implement
these techniques and the applications which most commonly employ image
processing as a means of achieving a desired outcome. Section 2.2 gives an
overview of the techniques employed in the processing of images, usually to extract a
specific piece of information. Section 2.3 discusses the operation of Optical
Character Recognition (OCR), which is an adaptable algorithm designed to recognise
specific features contained within an image, i.e. text. Sections 2.4 and 2.5
assess the software and hardware platforms which
could be used to implement the specific techniques associated with image processing.
The report will conclude with a concise summary of the key findings from the
literature review. An outline will be included providing some of the relevant
information which will inform the future progress of this project.
2.2 Technique
There are numerous techniques documented and discussed in the literature available
on image processing. Among those most prominently featured are segmentation,
edge-detection and thresholding. Of course, the technique(s) employed by researchers
or professionals are largely dependent upon the requirements of a given application,
although not exclusively so. In some cases the limitations of software or hardware
may be the deciding factor in choices regarding technique.
Edge-Detection is one of the most common approaches to segmentation, with its
method of detecting meaningful discontinuities in intensity values (Rafael C. Gonzalez,
Woods, & Eddins). The method makes use of derivatives and is generally computed
using a Laplacian filter. Smith and Brady (1997) document an
approach to low-level image processing, labelled the SUSAN principle, which was
developed from existing edge-detection and corner-detection techniques.
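The role of the Laplacian filter in locating intensity discontinuities can be shown with a short sketch in base Matlab. The 7x7 synthetic image and the 4-neighbour kernel are assumptions made for illustration; a practical detector would normally smooth the image first and then threshold the response or look for zero crossings.

```matlab
% Synthetic 7x7 greyscale image: a bright 3x3 square on a dark background.
img = zeros(7, 7);
img(3:5, 3:5) = 1;

% 4-neighbour discrete Laplacian kernel (sum of second differences in x and y).
laplacianKernel = [0  1 0;
                   1 -4 1;
                   0  1 0];

% 'same' returns an output of the same size as the input (zero padding at borders).
response = conv2(img, laplacianKernel, 'same');

% Flat regions give a zero response; pixels beside a discontinuity do not.
edgeMask = abs(response) > 0;
disp(edgeMask);
```

The interior of the square and the far background both produce a zero response, so only the pixels on either side of the square's boundary are marked in edgeMask.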
Another method with considerable presence within the literature is the use of
Moment Invariants. Moments are used to analyse and characterize the patterns
contained within an image and are thus useful in character recognition. For instance,
Zernike moment invariants have been shown to be extremely effective in pattern
recognition applications (Belkasim, Shridhar, & Ahmadi, 1991).
Alongside Edge-Detection, Thresholding is one of the most commonly used
techniques in image processing, specifically segmentation. The reason for this
prevalence seems to be its simplicity of implementation as well as the intuitive
properties it exhibits (Rafael C. Gonzalez et al.). Thresholding is used for all sorts of
applications that require the extraction of information from a given image. One such
application is the detection of glioblastoma multiforme tumors from brain magnetic
resonance images (Banerjee, Mitra, & Uma Shankar, 2016). Global thresholding is
shown in this case to estimate the statistical parameters of the “object” and
“background” of an image. The literature in this area certainly supports the view that
thresholding is among the primary techniques used in image processing.
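As an illustration of global thresholding, the following sketch computes a threshold with Otsu's method (Otsu, 1979) using only base Matlab functions. The small synthetic image and all variable names are assumptions made for this example. The Image Processing Toolbox function graythresh, which is also based on Otsu's method, would normally be used instead of a hand-written version.

```matlab
% Synthetic 8-bit test image (an assumption for illustration): a dark
% background with grey levels near 50 and a bright 2x2 object near 200.
img = uint8([ 50  52  48  51;
              49 200 205  50;
              51 198 202  47;
              50  49  53  52]);

% Normalised histogram: p(k) is the fraction of pixels with grey level k-1.
levels = (0:255)';
counts = accumarray(double(img(:)) + 1, 1, [256 1]);
p = counts / sum(counts);

% Cumulative class probability and cumulative mean for every candidate threshold.
omega = cumsum(p);            % probability of the "background" class (levels <= t)
mu    = cumsum(p .* levels);  % first-order cumulative moment up to level t
muTotal = mu(end);            % mean grey level of the whole image

% Between-class variance for each threshold; undefined entries (0/0) are set to 0.
sigmaB2 = (muTotal * omega - mu).^2 ./ (omega .* (1 - omega));
sigmaB2(~isfinite(sigmaB2)) = 0;

% Otsu's threshold is the grey level that maximises the between-class variance.
[~, idx] = max(sigmaB2);
threshold = levels(idx);

% Pixels brighter than the threshold are labelled as "object".
binary = img > threshold;
fprintf('Threshold = %d, object pixels = %d\n', threshold, nnz(binary));
```

Because the two grey-level populations are well separated, any threshold between them gives the same segmentation. With noisier histograms the maximum of the between-class variance is less clear-cut, which is one reason pre-processing matters.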
Alongside the most common image processing techniques, the literature contains
some that are more specialized. One such technique is Nonnegative Matrix
Factorisation (NMF). Problems can occur with this method and several algorithms
have been proposed to solve these (Hu, Guo, & Ma, 2015). Although NMF is
purported to be an effective tool for large scale data processing it is not one that is
likely to be pursued for the requirements of this project.
Another less prominent but interesting method sometimes used for image
processing is Fuzzy Logic (Amza & Cicic, 2015). Among its current uses is
automated quality control in image processing systems. It works by extracting
geometrical characteristics of an object and then using this information with a fuzzy
pre-filtering unit to estimate the probability of a foreign body being present on the
object being analyzed. Although the use of fuzzy logic is extremely successful in
these types of applications it does not appear to be the logical approach to a text
recognition application.
Before the more technical aspects of the image processing algorithm are
activated, it may be necessary to implement some of the more basic image processing
techniques to prepare an image for this. These basic adjustments may come in the
form of an image resizing, rotation or cropping, depending on the particular
characteristics of the image and the data to be extracted. In an article on low-quality
underwater images (Abdul Ghani & Mat Isa, 2015), the authors reference Eustace et
al. by adapting a contrast-limited adaptive histogram specification (CLAHS) as a
pre-processing step.
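The basic adjustments mentioned above (cropping, rotation and resizing) can be sketched with plain matrix operations in base Matlab. The 6x8 test matrix and the region bounds are assumptions made for illustration. In practice the Image Processing Toolbox functions imcrop, imrotate and imresize offer interpolation and arbitrary angles that this sketch does not.

```matlab
% Hypothetical 6x8 "image" whose pixel values simply count up column by column.
img = reshape(1:48, 6, 8);

% Crop a region of interest by indexing: rows 2 to 5, columns 3 to 6.
roi = img(2:5, 3:6);

% Rotate the cropped region by 90 degrees counter-clockwise.
rotated = rot90(roi);

% Naive down-sampling: keep every second row and column (no interpolation).
halved = img(1:2:end, 1:2:end);

fprintf('roi is %dx%d, halved is %dx%d\n', size(roi, 1), size(roi, 2), ...
        size(halved, 1), size(halved, 2));
```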
In most cases, the literature presents a combination of techniques that have been
chosen because of a particular capability to carry out a specific function or as a means
of experimentation in order to improve existing techniques. With regard to any
nascent image processing project or assignment, it is quite clear that a pragmatic
approach should be taken from the outset so that a suitable technique or techniques
can be chosen.

2.3 Optical Character Recognition
One of the more dominant themes present in the literature surrounding image
processing techniques is that of Optical Character Recognition (OCR). OCR appears
as the final processing step in many of the research papers on image extraction and
recognition. There is clearly a wide range of applications and extraction methods that
OCR can be used in conjunction with. Among the potential applications of
OCR are keyword searches and document characterization in printed
documents (M. R. Gupta, Jacobson, & Garcia, 2007).
A summary of the theories underpinning the OCR function is provided in Optical
Character Recognition-Theory and Practice (Nagy, 1982). Among the topics
discussed in this book is the classical decision-theoretic formulation of the character
recognition problem. Statistical approximations, including dimensionality reduction,
feature extraction and feature detection are discussed with regard to the appropriate
statistical techniques.
Commercially available OCR algorithms are primarily designed to interpret
binary (black and white) images. However, more and more pre-processing techniques
are being developed as a means of preparing images for use with this function. An
example of this is the denoising and binarizing of historical documents as a
pre-processing step (M. R. Gupta et al., 2007). Many researchers have pursued methods
based on development of a new or unique method of extraction that can be used along
with existing OCR functions (Roy et al., 2015).
One of the limitations associated with OCR-based applications is that they may
not work well when properties of the captured character images are significantly
different from those in the training data set. A supervised adaptation strategy is one
that has been developed as a potential solution to this problem (Du & Huo, 2013).
Nagy et al. also demonstrated that a character classifier trained on many typefaces can
be adapted effectively to text in a single unknown typeface by using a self-adaptation
strategy.
A further problem which can sometimes be faced when using an OCR algorithm
for text recognition is the assumption that individual characters can be isolated
(Fernández-Caballero, López, & Castillo, 2012). Some traditional methods of OCR
implementation have less than ideal recognition performance because of the difficulty
in achieving clear binary character images.

The literature clearly indicates that OCR is a vital function in relation to image
processing and text recognition. However, due to some of the limitations stated above,
it is important that any image be properly processed and segmented before being put
through an OCR algorithm.
2.4 Software
The extensive literature on image processing and text recognition techniques
incorporates the use of several types of software for implementation. Whether it is due
to personal preference or application-specific criteria, it appears that there are a large
number of platforms available for consideration when undertaking an image
processing project.
Software developed with the specific intention of being used for
image processing applications is available, often originating from academic research. A
classic example of this is ImageJ, software written in Java and designed to run on any
operating system. ImageJ supports various functions and capabilities. For instance, it
is able to acquire images directly from scanners, cameras or video sources. The program
also supports all common image manipulations including reading and writing of
image files and operations on individual pixels (Abràmoff et al., 2004).
The use of LabVIEW as a tool for image acquisition and processing is an interesting
proposition and does have some presence in the literature. A program named
Image-Sensor Software (ISS) is one that is based on the LabVIEW programming
language (Jurjo, Magluta, Roitman, & Batista Gonçalves, 2015). Use of this type of
software enables image acquisition tools such as zoom, focus and capture. The
features required by the overall image recognition system must be defined by the user
when programming.
Matlab is a powerful piece of software with many uses in modelling,
experimentation and signal analysis. Its connectivity with many advanced
programming languages (like C, Java, VB) and availability of a wide range of
toolboxes make it popular among the scientific and research community (R. Gupta,
Bera, & Mitra, 2010). It possesses an extensive array of tools which can be harnessed
in the interests of image recognition. The use of the segmentation method is
particularly powerful within Matlab. Its use has been demonstrated by tracing yarn to
accurately compute useful parameters of fibre migration by statistically calculating
mean yarn axis and tracing out mean fibre axis (Khandual, Luximon, Rout, Grover, &
Kandi, 2015).
By employing Matlab as the means of processing an image for some form of
character recognition, the user has the ability to tailor code to develop algorithms with
specific image properties in mind. This may involve text or shape recognition, simple
colour recognition or perhaps properties contained within the image such as depth
perception.
Matlab has the additional advantage of being compatible for use in connection with
some form of hardware acquisition unit that may be implemented as part of an
embedded system. Its use in this context has been demonstrated successfully (R. Gupta et
al., 2010), as a method for controlling image acquisition as well as image processing.
There are some specialised software packages that have been designed to facilitate
a specific function. A prime example of one of these is Xmipp, software developed
primarily as a means of image processing in electron microscopy (de la Rosa-Trevín et
al., 2013). Graphical tools incorporated within this software include data visualisation
and particle picking which can allow visual selection of some of the key parameters of
an image. It can be seen from reviewing the literature that image processing software
is both prevalent and sophisticated. At times it can appear overwhelming from the
sheer density of techniques available; however, this does suggest that the type of
application being pursued in this project is very much achievable.
Although not always used exclusively, Matlab is very often used as a sub-section
in an overall processing technique. This seems to be due to the vast array of different
commands available within its image processing toolboxes. Images can be treated
using commands such as “fspecial” and “imfilter” in Matlab (HashemiSejzei &
Jamzad), before being processed elsewhere for different reasons. This is certainly a
consideration for the progress of the project being considered here, particularly in the
earlier stages of development when the use of some of these Matlab commands could
prove to be extremely informative.
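A minimal sketch of how the “fspecial” and “imfilter” commands are typically combined is given below. It requires the Image Processing Toolbox. The 5x5 test image, the choice of a 3x3 averaging kernel and the 'replicate' border option are assumptions made for illustration and are not taken from the cited work.

```matlab
% Requires the Image Processing Toolbox (fspecial, imfilter).
% Synthetic 5x5 image with a single bright pixel acting as impulse noise.
img = zeros(5, 5);
img(3, 3) = 9;

% 3x3 averaging (mean) kernel: every weight equals 1/9.
h = fspecial('average', 3);

% Filter the image; 'replicate' pads the border with the nearest edge values.
smoothed = imfilter(img, h, 'replicate');
disp(smoothed);
```

The averaging kernel spreads the isolated bright pixel over its 3x3 neighbourhood, which shows how smoothing suppresses noise at the cost of blurring detail.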
2.5 Hardware
As with software, hardware is an important factor that must be given careful
consideration when entering into an image processing project. The relative strengths
and weaknesses of a specific hardware platform must be carefully gauged with
reference to the processing requirements. Not only this, but compatibility with a
chosen piece of software must be given due consideration. The presence of discussion
and critique of specific hardware units is not as strong as in software. This is primarily
due to the fact that most of the experimental work in this area is focused on the
various image processing algorithms, which are generally cross-platform.
The use of embedded systems as a means of performing image processing is
fairly extensive in the literature. An ARM processor in conjunction with Matlab and a
Linux-based operating system has been used to automatically identify cracks in a wall
(Pereira & Pereira, 2015).
Some applications may require the use of high-speed image processing systems.
Due to demands that may include increasing the speed of a transform process or
decreasing overall processing time, it may be necessary to design a specific
architecture to support the function. This is often the case with complex algorithms
which can be implemented using an FPGA for prototyping and verification (Mondal,
Biswal, & Banerjee).
As commented upon at the beginning of this section, there is a comparative lack
of hardware-related literature. The obvious conclusion to draw from this fact is that
the choice of hardware is secondary to the choices of technique, algorithm and
software. However, one of the key hardware considerations is the processing
capability of any PC or laptop being used. A powerful CPU and specifically the
inclusion of a Graphics Processing Unit (GPU) can dramatically improve the
performance of any image processing application (Cugola & Margara, 2012).
2.6 Conclusions
There are several component factors to be investigated when considering a project
related to image processing. The relative importance of each of these factors is
reflected in their presence in the literature. Certainly the techniques or algorithms to
be implemented are critical factors which will determine the success or failure of a
given project. As has been documented previously in this report, there are many
potential techniques that can be useful in a variety of applications. This being the
case, it is always an important first step to define the functionality of an application
before determining the correct method for achieving this aim.

With one of the potential objectives of a project being text recognition from a scene
image, the use of segmentation, and particularly thresholding, techniques is very
likely to be required in some form. As well as these processing techniques, Optical
Character Recognition (OCR) in one form or other is almost ubiquitous across text
recognition applications. As there are many commercially available OCR engines, the
decision of which to use is almost entirely intertwined with the choice of software
platform. Matlab, for example, has an OCR algorithm associated with its own image
processing toolboxes.
With regards to software selection for image processing functions, it appears as if
this may come down to a personal preference for a particular interface in many cases.
However, an analytical approach should be taken to ensure that the chosen software
has the desired capabilities. A secondary, or perhaps even primary, factor worth
consideration is the relative expense of some of the software available for image
processing tasks. As noted in this literature review, there are free image processing
programs currently available and extensively developed, although it is possible that
they may come with certain compatibility issues. At the opposite end of the spectrum
software such as Matlab may only include its best image processing software at
additional expense, separate from the main program license.
One of the key decisions to be made is in the choice between the possible
implementation of an embedded system or developing the process on a PC or laptop.
Depending on the overall functionality of a system, it may be more desirable to have
an embedded image processing algorithm that acts as a device for detecting very
specific types of data. Alternatively, the use of a PC or laptop in this area allows for
continuing flexibility in the processing techniques even after completion of the final
design. As with every aspect related to this topic, decisions must be primarily based
upon the end-requirements of the application.
Overall impressions of the available literature on image processing techniques are
that the research and experimentation in this area is both extensive and expanding. It
is a field that is extremely relevant in the technology and communications sector
today and the work being undertaken reflects this status. Of course this means that its
pace of development is exceptionally fast but it also means that the potential
applications for its use will continue to grow.

3 Theory
3.1 Introduction
Text detection has of course been heavily researched with multiple methods being
suggested for application (cite). There are some differences in the literature as to how
these methods are categorised. Zhang, Zhao, Song, & Guo (2013), for example,
categorise these techniques into four groups: edge-based, texture-based,
connected-component (CC)-based and others. However, Chen et al. (2011) have
categorised these techniques into two primary classes: texture-based and CC-based.
Maximally Stable Extremal Regions (MSER) is the technique being employed in this
case. The use of an MSER approach to text detection is advocated for several reasons.
Among these are the observations that text regions tend to have quite high
colour-contrasts with their backgrounds and they also typically consist of homogeneous
colour formations (Liu et al., 2016).
The following sections introduce, stage by stage, the theory underpinning the
methodology implemented for this image processing algorithm. Each of the key
components of the algorithm is discussed individually and its anticipated effect on a
given input image stated. The theory in this section is interspersed with references to
Matlab and the functions available in this software for applying these techniques. The
section begins with a note on the image formats typically used in this type of
application. In many instances the image format itself is not a critical factor in image
processing but it is nevertheless worthy of consideration.
3.2 Image Formats
There are certain specifications that an input image must meet for use with the Matlab
OCR function. The image file format, i.e. ‘.png’, ‘.jpeg’, ‘.tiff’ etc., is not a critical
factor in this implementation but the image data must be real and non-sparse
(Mathworks.com, 2016a). This simply means that the image matrix must contain no
complex values and must be stored as a full, rather than sparse, matrix.
The OCR function accepts any of the following three input image types:
M-by-N-by-3 truecolour – A true colour image is a 24-bit image (8 bits for each of the
colours Red, Green and Blue (RGB)) such as a JPEG, capable of displaying millions of
colours (2^24) (Robbins, 2007). The quantity of possible colours is due to the fact that
each byte is able to represent 256 different shades.
M-by-N 2D grayscale – This is an image in which every pixel is a shade of grey. One of
the virtues of this format is that less information is required for each pixel. Each
pixel is stored as an 8-bit integer, allowing for 256 (2^8) shades of grey from white
to black (Fisher, Perkins, Walker, & Wolfart, 2003b). Grayscale is a common
format in image processing.
M-by-N binary – In binary images pixels have only two possible intensity values.
These values are typically displayed as black and white, with 0 used for black and 1
or 255 used for white (Fisher, Perkins, Walker, & Wolfart, 2003a). The binary format
is often used to distinguish between text and background in pattern recognition
algorithms.
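As a minimal sketch of how the three input types relate in Matlab, the following builds a small synthetic truecolour array and derives grayscale and binary versions from it. The array contents, the fixed threshold of 128 and all variable names are illustrative assumptions rather than part of the project code, and the rgb2gray function is assumed to be available.

```matlab
% Synthetic 4-by-6 truecolour image (random values, for illustration only)
rgbImage = uint8(randi([0 255], 4, 6, 3));

% M-by-N 2D grayscale: a single 8-bit value per pixel
grayImage = rgb2gray(rgbImage);

% M-by-N binary: fixed mid-level threshold, chosen purely for illustration
binaryImage = grayImage >= 128;

% Check the "real, non-sparse" requirement for a candidate input
isValidInput = @(img) isreal(img) && ~issparse(img);
fprintf('Truecolour size %s, valid input: %d\n', mat2str(size(rgbImage)), isValidInput(rgbImage));
fprintf('Grayscale size %s, valid input: %d\n', mat2str(size(grayImage)), isValidInput(grayImage));
fprintf('Binary size %s, valid input: %d\n', mat2str(size(binaryImage)), isValidInput(binaryImage));

% Number of colours representable with 8 bits per channel
numColours = 2^24;   % 16,777,216
```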
As stated above, the file format of the image is not a defining factor in the success
or failure of the recognition algorithm. For this reason two file formats have been
used throughout the testing and experimentation process: PNG and JPEG.
PNG is a relatively new image format and uses 24-bit true colour
(Willamette.edu, 2016). Although the files can be considerably larger than the JPEG
format this is not a major concern in this instance as all image files are to be deleted
immediately after use.
JPEG is said to be a ‘lossy format’ (Willamette.edu, 2016) as its compression
discards some image data. These losses result in slight degradation of the image
but have minimal impact on the visual perception of the image. JPEG is not limited in
colour and is a popular format for images containing natural scenes and vibrant
colours. However the vibrancy of the colour image is not a primary factor for
consideration in this case.
3.3 Maximally Stable Extremal Regions
The first detection method employed in the text recognition algorithm is known as
Maximally Stable Extremal Regions (MSER). MSER is a technique used extensively
in many image processing applications from text recognition (Chen et al., 2011) to
visual tracking (Gao et al.). The MSER approach has been described as a form of
“blob detection” (Matas, Chum, Urban, & Pajdla, 2004); accordingly, the MSER command
in Matlab returns information on the MSER features, or blobs, found in a given input
image.
Due to the fact that an input image will present significant variation in
granulation, resolution and grey-scale levels, amongst other features, the roughness or
smoothness of the edges within that image can vary also (Moreno-Díaz, Pichler, &
Quesada-Arencibia, 2012). For this reason blob detection is applied using an
MSER algorithm to detect regions of distinct intensity within an image. The
extremal region referred to in the MSER acronym is a connected component of the
image whose pixels all have intensity levels below (or all above) a given threshold. A
region is maximally stable when its area changes little as that threshold is varied.
Through this technique areas of interest can be filtered to allow an OCR
algorithm to attempt character recognition.
3.4 Removal Based on Geometric Properties
MSER algorithms in general, and particularly the Matlab implementation in use on this
project, are quite good at detecting most of the text regions within an image. However,
they are not immune to detecting other non-text stable regions present within an
image. Matlab facilitates a rule-based methodology for removal of these non-text
regions (Mathworks.com, 2015).
The principle behind this method is the removal of unwanted regions based on
a series of geometric properties that are ideal for distinguishing between text and non-
text areas of an image. The regionprops command is used to measure properties of an
image region. Several properties can be selected for measurement and their statistics
returned; ‘Orientation’ and ‘Area’ for example.
Thresholds are required to be set for each of the properties selected for
measurement. This may be considered one of the more dynamic sections of the
algorithm as these threshold values can be tuned to perform better with different
images. A logical index array, built by testing each of the selected geometric
properties against its threshold, is then used to delete the flagged entries from the
mserRegions variable so that those regions of the image are removed. This is
effectively working as a filter, eliminating those “blobs” within the image that do not
conform to certain characteristics of the image text.
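The rule-based filter described above can be sketched as follows. This is an illustration only: the synthetic binary image stands in for the MSER output, and the chosen properties and threshold values are assumptions that would need tuning for real number plate images.

```matlab
% Synthetic binary image with two blobs (illustrative stand-in for MSER output)
bw = false(100, 200);
bw(30:70, 20:45)  = true;    % compact, character-like block
bw(48:50, 70:190) = true;    % long thin line, unlikely to be text

% Measure geometric properties of each connected region
stats = regionprops(bw, 'BoundingBox', 'Eccentricity', 'Solidity');

% Aspect ratio (width / height) taken from the bounding boxes
bbox = vertcat(stats.BoundingBox);
aspectRatio = bbox(:, 3) ./ bbox(:, 4);

% Flag regions that violate any threshold (threshold values are assumptions)
removeIdx = aspectRatio' > 3;
removeIdx = removeIdx | [stats.Eccentricity] > 0.995;
removeIdx = removeIdx | [stats.Solidity] < 0.3;

% Delete the flagged regions, keeping only the likely text candidates
keptStats = stats;
keptStats(removeIdx) = [];
fprintf('%d of %d regions kept\n', numel(keptStats), numel(stats));
```

In this example the thin horizontal line is rejected on aspect ratio and eccentricity while the compact block is retained.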

3.5 Stroke-Width Thresholding
In an effort to obtain more consistent results a stroke width transform of the MSER
regions is generated and applied to perform filtering and pairing of the connected
components (Chen et al., 2011). The stroke width is computed with the bwdist
command which calculates the Euclidean distance transform of a binary image.
Epshtein, Ofek, & Wexler (2010) designed a method of stroke-width transformation
based on the premise that text characters could be detected from the regions where
stable stroke widths occurred.
The reason for including this approach within a character detection algorithm
is that it can be effectively implemented as a means of reducing background noise.
This is because regions contained within the image are grouped into blocks, having
been further verified as containing properties relating to likely text characters (Yi &
Tian, 2011). For example, the stroke-width of the letter ‘T’ should be identical to the
stroke-width of the letter ‘D’ assuming the text font is the same. However a non-text
region is not likely to share this stroke-width and can therefore be eliminated as a text
region.
Thinning is a method of reducing binary objects in an image to strokes which
are a single pixel wide (R.C. Gonzalez, Woods, & Eddins, 2010). The Matlab
command bwmorph implements this approach with a series of operations including
dilations and erosions. Matlab enables the programmer to set the number of iterations
for which the thinning operation occurs. In fact, the number of iterations can be set to
infinity (inf) indicating that the operation will continue until the image ceases to
change.
The results from the distance transform and the thinning operation are then
combined to provide the stroke width values contained within the image. A
measurement for stroke width is calculated by dividing the standard deviation of the
stroke width values by the mean of the same stroke width values:
Stroke Width Measurement = (Standard Deviation of Stroke Width) / (Mean of Stroke Width)
An index array is computed which comprises those regions of the image with
a greater stroke width measurement value than the value of the predefined stroke
width threshold. Because text characters tend to exhibit little stroke-width variation,
it is expected that those regions with a greater than threshold value will be the
non-text regions of the image. This index is then used to delete the corresponding
entries from the mserRegions variable so that the unwanted, i.e. non-text, regions
can be removed.
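A minimal sketch of the stroke width measurement for a single region is given below. The synthetic bar image and the threshold value of 0.4 are assumptions made for illustration; in the actual algorithm the same calculation would be repeated for each MSER region.

```matlab
% Synthetic binary region: a bar of uniform thickness (stand-in for one MSER region)
regionImage = false(40, 60);
regionImage(15:21, 8:52) = true;

% Euclidean distance from every foreground pixel to the nearest background pixel
distanceImage = bwdist(~regionImage);

% Thin the region to a one-pixel-wide skeleton (inf: iterate until no change)
skeletonImage = bwmorph(regionImage, 'thin', inf);

% Stroke width values are the distances sampled along the skeleton
strokeWidthValues = distanceImage(skeletonImage);

% Stroke width measurement: standard deviation divided by mean
strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);

% Regions with high stroke width variation are flagged for removal
strokeWidthThreshold = 0.4;   % assumed value, to be tuned
isNonText = strokeWidthMetric > strokeWidthThreshold;
fprintf('Stroke width metric: %.3f, flagged as non-text: %d\n', strokeWidthMetric, isNonText);
```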
3.6 Bounding Boxes
Bounding boxes are often employed within image processing applications as a
method of making some sense from the data obtained. Examples of the use of
bounding boxes include collision detection as applied to computer graphics and
animation (Yao Wang, Hu, Fan, Zhang, & Zhang, 2012), and the segmentation of
hand-written Chinese characters which are prone to overlap (Tseng & Chen, 1998). In
a typical text recognition system it is essential that the OCR engine is able to return
complete words or paragraphs, rather than a list of the individual characters acquired.
To help ensure that order is maintained so that the correct registration can be
obtained from an input image, bounding boxes are used to amalgamate the individual
character regions into lines of text (Mathworks.com, 2015). These bounding boxes
surround each of the individual text regions and can be expanded to overlap with each
other, thus forming a chain of overlapping boxes which are used to form complete
words, or in this case a vehicle registration number.
Bounding boxes are obtained for all the regions of interest remaining from the
image by concatenating the MSER properties for bounding boxes previously obtained
with the regionprops command. These bounding boxes are then expanded in line with
the theory of having characters overlap with their neighbours. A value for the level of
expansion, or Expansion Coefficient (E.C.) is entered into the algorithm and is used to
set new limits for the x and y axis of the bounding boxes. Minimum and maximum
axis values for the expanded bounding boxes are calculated in the following manner:
xmin = (1 - E.C.) × xmin
xmax = (1 + E.C.) × xmax
Prudence dictates the minor precaution of ensuring that the expanded bounding boxes
do not exceed the outer limits of the image. This is achieved by comparing the
maximum axis limits calculated from the expansion coefficient with the axis limits
defined by the size of the image. The new axis limits of the bounding boxes are then
taken as the minimum value computed from the previous comparison. This is
implemented in Matlab in the following fashion:
xmax = min(xmax, size(I,2))
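The expansion and clipping steps can be put together as in the sketch below. The image size, the two bounding boxes and the expansion coefficient of 0.1 are hypothetical values chosen so that one box would otherwise extend beyond the image.

```matlab
I = zeros(100, 200);                      % placeholder 100-by-200 image
bboxes = [10 20 30 40; 150 50 50 45];     % hypothetical boxes, [x y width height]
expansionCoeff = 0.1;                     % E.C., assumed value

% Convert [x y width height] into corner coordinates
xmin = bboxes(:, 1);
ymin = bboxes(:, 2);
xmax = xmin + bboxes(:, 3) - 1;
ymax = ymin + bboxes(:, 4) - 1;

% Expand the boxes by the expansion coefficient
xmin = (1 - expansionCoeff) * xmin;
ymin = (1 - expansionCoeff) * ymin;
xmax = (1 + expansionCoeff) * xmax;
ymax = (1 + expansionCoeff) * ymax;

% Clip the expanded boxes to the limits of the image
xmin = max(xmin, 1);
ymin = max(ymin, 1);
xmax = min(xmax, size(I, 2));
ymax = min(ymax, size(I, 1));

% Convert back to [x y width height]
expandedBBoxes = [xmin, ymin, xmax - xmin + 1, ymax - ymin + 1];
disp(expandedBBoxes);
```

The lower limits are clipped with max in the same way that the upper limits are clipped with min, so that an expanded box can never start before the first row or column of the image.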
Overlapping bounding boxes can be combined to form a single box around
multiple characters. This method of merging overlapping bounding box components
to make a single component has been used in the processing and segmentation of
ancient historical documents (Kavitha, Shivakumara, Kumar, & Lu). However it is
most often implemented to distinguish between separate words. In the case of a
vehicle registration there are two likely outcomes: Either the entire registration will be
surrounded by a single bounding box or the two distinct sections of the registration
will be surrounded by separate bounding boxes. The effect on the overall result of one
of these events occurring over the other is negligible.
An overlap-ratio is applied to quantify the distance between each of the text
regions detected in the image. Matlab provides a function for this purpose which is
activated with the bboxOverlapRatio command. The function returns the overlap
ratio between each pair of bounding boxes contained within the image. Those
characters with non-zero overlap ratios are considered to be connected in the context
of the bounding box and are therefore likely to exist as part of the same line of text.
Any characters with zero overlap ratios are not connected and are thus considered as
separate sections within the text image.
A graph of these overlap ratios is generated in Matlab in order to determine which
regions are connected for the purpose of merging them into a single text region.
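The merging procedure can be sketched as follows, assuming the Computer Vision System Toolbox is available for bboxOverlapRatio. The three bounding boxes are hypothetical: the first two overlap and the third stands alone.

```matlab
% Hypothetical expanded bounding boxes, [x y width height]
bboxes = [10 10 20 30; 25 10 20 30; 100 10 20 30];

% Pairwise overlap ratios; zero the diagonal so a box is not linked to itself
overlapRatio = bboxOverlapRatio(bboxes, bboxes);
n = size(overlapRatio, 1);
overlapRatio(1:n+1:n^2) = 0;

% Boxes with non-zero overlap are connected nodes in the graph
g = graph(overlapRatio);
componentIndices = conncomp(g);

% Merge each connected group into a single box spanning its members
xmin = bboxes(:, 1);
ymin = bboxes(:, 2);
xmax = xmin + bboxes(:, 3) - 1;
ymax = ymin + bboxes(:, 4) - 1;
xmin = accumarray(componentIndices', xmin, [], @min);
ymin = accumarray(componentIndices', ymin, [], @min);
xmax = accumarray(componentIndices', xmax, [], @max);
ymax = accumarray(componentIndices', ymax, [], @max);
mergedBBoxes = [xmin, ymin, xmax - xmin + 1, ymax - ymin + 1];
disp(mergedBBoxes);
```

For a registration plate, a single merged box corresponds to the whole registration being detected as one line of text, while two merged boxes correspond to its two sections being detected separately.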
3.7 Optical Character Recognition
The OCR function is applied to the area of the filtered version of the input image
encompassed by the merged bounding boxes. By defining the area of the image upon
which the OCR function is going to operate a more consistent detection performance
is expected.
Use of the Matlab command txt = ocr(I, roi) enables recognition of text in the
image (I) within a specified region of interest (roi). The region of interest is the area
defined by the bounding boxes generated in the detection algorithm and must take the
form of one or more rectangular regions defined by an m-by-4 matrix, with each row
given as [x y width height]. The position and size of each region of interest are
determined by the x and y coordinates established for the bounding boxes and these
must not extend beyond the area of the image.
Almost all of the commercially available OCR functions are designed to operate
on binary images and Matlab is one of these. Matlab’s OCR function uses Otsu’s
method of thresholding (Otsu, 1979) to convert an input image into a binary
equivalent before the recognition process is implemented. Otsu’s method has been
demonstrated to exhibit better overall performance in OCR than other techniques (M.
R. Gupta et al., 2007).
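The OCR function performs this binarisation internally, but the effect of Otsu's method can be illustrated separately with the sketch below. The synthetic image and its intensity values are assumptions; imbinarize is assumed to be available (older releases provide im2bw for the same purpose).

```matlab
% Synthetic grayscale image: two dark "characters" on a bright background
grayImage = uint8(200 * ones(20, 40));
grayImage(5:15, 5:10)  = 40;
grayImage(5:15, 20:25) = 60;

% Otsu's method returns a global threshold normalised to the range [0, 1]
level = graythresh(grayImage);

% Pixels brighter than the threshold become 1 (white), the rest 0 (black)
binaryImage = imbinarize(grayImage, level);
fprintf('Otsu threshold: %.3f (about %d on a 0-255 scale)\n', level, round(level * 255));
```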
Modern OCR algorithms like the one employed by Matlab add multiple
neural network algorithms to analyse the character stroke edge. This stroke
edge is effectively the boundary between the concentration of character pixels
and the background image. The algorithm takes averages of the black and white
values along the edge of each character. The result is then matched to the characters
contained in the dataset and the closest estimation is selected as the output character
(Potocnik & Zadnik, 2016).
When the OCR algorithm has completed the recognition process the results are
printed in the Matlab command line with the following entry: [txt.Text]. Should the
user require information on the properties of the OCR output, the ocrText object
returned by the function contains the recognised text and metadata collected during
optical character recognition (Mathworks.com, 2016b). However, some of these
features are not available with the student edition of Matlab used during this project.
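A minimal usage sketch of the OCR call and its ocrText output is given below. It assumes the Computer Vision System Toolbox and uses the sample image businessCard.png that ships with it in place of a number plate image; the region of interest simply covers the whole image.

```matlab
% Sample image supplied with the toolbox, used here in place of a plate image
I = imread('businessCard.png');

% Region of interest as [x y width height]; here the whole image
roi = [1 1 size(I, 2) size(I, 1)];

% Run recognition within the region of interest
txt = ocr(I, roi);

% Recognised text, as printed in the command window
disp(txt.Text);

% Word-level metadata held in the ocrText object
for k = 1:numel(txt.Words)
    fprintf('%s (confidence %.2f)\n', txt.Words{k}, txt.WordConfidences(k));
end
```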
3.8 Common Issues with Text Detection
There are many difficulties associated with character recognition in scene images.
Typically there is a significant amount of inter-character and intra-character confusion
leading to mistaken identification (Mishra, Alahari, & Jawahar, 2016). For instance,
partial capture of a character can result in it being recognised as a completely different
alpha-numeric digit.
Extraction of text regions from natural scene images is a challenging task due to
the many factors influencing the quality of detection. These factors include variation
in light intensity, alignment of text, colour, font-size and camera angles (Zhang et al.,
2013). Some text components do not display a high level of colour contrast and fail to
be detected with MSER (Liu et al., 2016). This is one of the weaknesses associated
with implementation of an MSER methodology. However, under favourable lighting
conditions this should not pose a substantial problem as vehicle registration plates have
reasonably distinct contrast between character and background regions.

4 Methodology
4.1 Introduction
Implementation of a system which will acquire an image, process it and provide
automated indication of the success or failure of the operation requires an elaborate
methodology to incorporate each of the individual components into one design. This
section documents how the complete system has been put together.
There are two distinct sections covered in this methodology section. The first is
the hardware element of the project containing all the communications involved,
including image acquisition and an automated response. The second is the procedure
implemented for processing of the acquired image and extraction of the desired text
region.
The overall system design is discussed, providing insight into how the various
components are expected to interact. The selection of the Raspberry Pi 2 Model B and
the specifications of this device which lend it to the application are
documented before the additional circuitry required of the system is discussed with
particular regard to the design of a PCB.
Following the hardware description of the project a detailed description of the
image processing algorithm is provided. This description discusses each of the major
sections of the algorithm individually, highlighting the effects of each technique on a
given input image. As stated previously, this image processing algorithm is the
technical focus of the project and the level of detail reflects this.
The concluding paragraphs of this methodology section are intended to provide
the reader with information on how the extracted number plate text is compared to an
existing text string and how this comparison is used to provide indication of
recognition.
4.2 System Design
Figure 2 is a flowchart depicting how the overall system to be implemented has been
conceived.

Figure 2 System Design Flowchart
Having framed the project in the context of a system for extracting the digits from a
vehicle registration plate and using these to produce an automated response, each of
the six steps in the flowchart is an essential element in this process.
Beginning with image acquisition, it is conceived that some form of wireless
sensor network will be used to trigger a camera to capture an image. This may be
something analogous to an infrared transmitter/receiver (IR) circuit. In this case an IR
sensor would be positioned to allow an incoming vehicle to break the beam and
consequently cause the camera module to acquire the image.
In order to facilitate image acquisition that would be automated in this way a
microcontroller will act as the central node in the system. Some of the
microcontrollers with potential for selection are documented in the literature review
section.
The second stage of the system design is entitled ‘Image transmission’. The
concept behind this title is that the acquired image will be transmitted wirelessly to a
laptop or PC for image processing and character recognition. The central
microcontroller must be equipped with a wireless protocol such as Wifi or Bluetooth
and be capable of transmitting the data in this way.
The third and fourth stages of the system design are Image Processing and
Optical Character Recognition. Matlab has been selected as the software platform for
implementing this process for several reasons, including its Image Processing
Toolbox and OCR engine. The image processing stage of the system will incorporate
a series of steps designed to provide the best possible image for the OCR function to
operate on. This will involve some of the methods mentioned in the introduction and
literature review of this document. The OCR function is used to produce a text string
output which is expected to match the characters present in the input image.
[Figure 2 stages, in order: Image Acquisition, Image Transmission, Pre-Processing,
OCR, Results Comparison, Automated Response]
The result from the OCR function will then be used alongside some existing
database of vehicle registration numbers in a comparison function which will
determine whether or not the character string obtained is one of the registrations
expected. Finally, the result from the comparison function, which will be a Boolean 1
or 0, will be used to initiate an automated response, tailored to each condition.
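A comparison function of this kind might be sketched as follows. The list of known registrations and the OCR output string are hypothetical; the clean-up step assumes that spaces, hyphens and line breaks in the OCR output should be ignored when matching.

```matlab
% Hypothetical database of expected registrations, stored without separators
knownPlates = {'161D12345', '12LH6789', '08MH4321'};

% Hypothetical OCR output, including separators and trailing line breaks
ocrOutput = sprintf('161-D-12345\n\n');

% Keep only letters and digits, and normalise to upper case
plate = upper(regexprep(ocrOutput, '[^A-Za-z0-9]', ''));

% Boolean result: 1 if the registration is in the database, 0 otherwise
isMatch = ismember(plate, knownPlates);
fprintf('Plate %s recognised: %d\n', plate, isMatch);
```

The Boolean result can then be used to select which automated response is triggered.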
4.3 Hardware Specification – Raspberry Pi 2 Model B
The Raspberry Pi 2 Model B is the second generation of the Raspberry Pi
microcontroller and has been selected as the central device for this project. The device
offers a flexible format for embedded projects, particularly those requiring low power
(Raspberrypi.org, 2016b). There are several features of the Raspberry Pi that make it
an ideal candidate for selection in this project. Central to this is the 900MHz quad-core
ARM Cortex-A7 CPU with 1GB of RAM. According to Arm.com (2016), the
Cortex-A7 is the “most power-efficient multi-core processor.” This becomes a
particularly important factor when considering the sustainability of a given system or
product.
The Cortex-A7 design can run at 1.2-1.6GHz while requiring less than 100mW of
total power in typical conditions (Arm.com, 2016); in the Raspberry Pi 2 it is clocked
at 900MHz. The low power and high performance of the Raspberry Pi have led to its
implementation in many power-critical projects. Tomar & Bhatia (2015) employ the Pi
as the central device in development of a Software Defined Radio (SDR) for use in
disaster affected
areas. Wireless sensor networks are a common application for this type of
microcontroller and the Raspberry Pi compares favourably with devices such as the
Arduino Uno (Ferdoush & Li, 2014).
Additional features of the Raspberry Pi making it a suitable device for the type of
application considered for this project include the following (Raspberrypi.org,
2016b):
• 4 USB ports
• 40 GPIO ports
• Full HDMI port
• Ethernet port
• Camera interface
• Display interface
• MicroSD card slot
• VideoCore IV 3D graphics core
The camera interface included in the list above enables the user to connect the
custom-designed camera add-on module for Raspberry Pi hardware (Mathworks.com,
2016d). This small and lightweight device supports both still capture and video mode,
making it ideal for mobile projects. The camera has a 5 megapixel native resolution
for still capture and supports 1080p30 and 720p60 video modes.
The Raspberry Pi camera module is popular in home security applications and
wildlife camera traps and is often used for time-lapse and slow motion imaging
(Raspberrypi.org, 2016a).
The GPIO pins on the Raspberry Pi model B are an essential element in its use
as the central node of a system as they facilitate connection with external electronic
circuitry and sensors (Vujović & Maksimović, 2015). These pins can accept input and
output commands which can be programmed to act as required. With particular
reference to this project these input pins can be used to monitor the status of switches
or sensors which can be implemented as triggers for other components of the system.
The pin layout is shown in Figure 3, taken from element14.com:
Figure 3 Raspberry Pi Pin Layout

As shown in the pin diagram, the Raspberry Pi model B is equipped with several
DC power lines which can be used as a power source for external circuitry. In terms
of portability and using the microcontroller remotely this is a powerful feature as it
eliminates the necessity for further external power supplies which may otherwise be
required.
The facility to integrate a wireless network, database server and web server into
a single compact, low-power computer, which can be configured to run without a
monitor, keyboard or mouse is a major advantage when working with the Raspberry
Pi (Ferdoush & Li, 2014). This became a particularly important feature for use in this
project as the Pi could be controlled remotely following initial setup. As the system
was developed and became more refined, the wireless element grew in importance,
not only as a means of data transmission but as a method for implementing overall
control. For this reason the selection of the Raspberry Pi for the hardware
requirements of the project proved correct.
There are several options for powering the Raspberry Pi with the condition
that the source is able to provide enough current to the device (Vujović &
Maksimović, 2015). The device is powered by 5V from a micro-USB connector;
however the current requirements differ for each model of the device and depend on
the number of connections drawing power from the microcontroller. For the model
being used in this case (2B), a PSU current capacity of 1.8Amps is recommended
(Raspberrypi.org, 2016c).
With a device such as the Raspberry Pi acting as the central node of a system
like this one, there is a possibility that an excessive number of parasitic devices may
be connected and drawing current that the Pi cannot supply. It is therefore essential
that the number of connected devices and components is kept to the minimum
required. Typical connections to the Raspberry Pi including HDMI cable, keyboard
and mouse require between 50mA and several hundred milliamps of current
(Raspberrypi.org, 2016c) and the camera module being used here requires a
significant draw of 250mA. Those external devices are required during the testing and
prototyping stages of this project. However, due to the specification of the system
some of these current drawing devices are not required for the final construction. With
remote connectivity there is no need for GUI-related connections to the Raspberry Pi,
thus relieving the power-burden on the device somewhat.

4.4 Wireless Network
The system design specifies that some form of wireless network is used for
communication between the microcontroller and the computer containing Matlab.
Wifi has been selected as the protocol for this purpose and there are several reasons
behind this decision.
The prevalence of Wifi in commercial and academic premises makes it an
easily accessible resource for implementation of this system. Wifi also enables
greater range than could be provided by a single Bluetooth device. The use of Wifi for
transmission of the acquired image is not a major concern as only one picture is being
sent at any one time.
The simple fact that Matlab is able to communicate directly with the
Raspberry Pi by forming a connection via the device’s IP address made the selection of
Wifi a certainty. An IP address along with a username and password for the
Raspberry Pi is all that is required to enable remote control of the device from Matlab.
The choice of Wifi as the network model may have been premature with regard
to experimental testing of the system due to the intermittent coverage in the lab
setting. This issue is discussed further in the section of this paper relating to testing.
4.5 PCB Design and Manufacture
As the primary area of investigation and experimentation undertaken for this project is
the image processing and character recognition elements, a model for some of the
hardware requirements is necessary to ensure effective use of the time allocated. The
use of modelling particularly relates to the inputs and output of the system i.e. the
initial triggering of acquisition and the automated response.
As the initial triggering of the camera module is premised on a traditional IR
sensor, a simple push-button switch can be used to model this action. In a real-world
scenario the automated response of the system may be used as a means of enabling or
restricting access to a parking facility or even alerting an operator. The Raspberry Pi
GPIO pins can be utilised to initiate an automated response and in this case the use of
two LEDs has been chosen as the method for affirming the results of the overall
system. A green LED will denote positive detection and recognition while a red LED
will signify the corresponding negative result.

Initial testing of the system in this configuration required that construction of a
circuit be carried out on a breadboard. Once successfully tested and a final design
settled upon, this circuit could be designed and constructed as a Printed Circuit Board
(PCB).
The circuit design incorporated two push-button switches: one to simulate the arrival
of a vehicle at the position where image acquisition takes place and a second to
simulate the end of the operation and system reset. These two switches are connected
to one of the Raspberry Pi GPIO pins which will be polling for a change in state.
The two LEDs being used to simulate the system’s output response are also
connected to GPIO pins on the Raspberry Pi. Due to a lack of intensity experienced
while testing with the LEDs two NPN transistors have been included in the circuit to
enable extra current to be driven to the LEDs.
The design of the circuit can be seen in the appendix and the PCB design in
Figure 4. Both the schematic and PCB layout have been drawn in Proteus.
Figure 4 PCB Design
4.6 Software Structure
To implement the system on Matlab several important requirements of the design
specification must be met. This requires a systematic approach to development of the
program to ensure that none of the critical stages are overlooked. Figure 5 is a
flowchart depicting the various stages of the software design as it has been
programmed in Matlab.

Figure 5 Software Design Flowchart

The first critical objective of the program is to connect with the Raspberry Pi device
and take control of the onboard camera module. At this point the external LEDs are
set to ‘0’ to ensure they are not considered as false positives.
The program then ‘polls’ the appropriate GPIO pin which is connected to the
switch being used to trigger image acquisition. This ‘polling’ effectively sees the
system wait for this switch to be pressed before any other action can begin. Figure 6
shows how this has been implemented in two simple lines of code.
Figure 6 Polling a Switch
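A minimal sketch of this kind of polling loop is given below. It is not the code from Figure 6. The switch is simulated so that the snippet runs without hardware, and the name readPin, the pin number and the 0.3 s delay are all assumptions made for illustration. On the Raspberry Pi itself the reader would typically wrap readDigitalPin from the Matlab support package, as noted in the comments.

```matlab
% Simulated switch: reads 0 (not pressed) at first and 1 (pressed) after 0.3 s.
% On hardware this stand-in could be replaced with something like
%   mypi = raspi();
%   readPin = @() readDigitalPin(mypi, 23);   % pin 23 is a hypothetical choice
tStart = tic;
readPin = @() toc(tStart) > 0.3;

% Poll the pin until a press is seen; nothing else runs in the meantime.
pollCount = 0;
while ~readPin()
    pollCount = pollCount + 1;
    pause(0.05);    % short delay between successive reads of the pin
end
disp('Switch pressed: image acquisition can begin');
```

The pause between reads is a design choice rather than a requirement; without it the loop simply reads the pin as fast as the connection allows.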
When the switch is finally pressed the Raspberry Pi camera module acquires an image,
which is transmitted to Matlab on a laptop. The image is saved into the associated Matlab
folder and passed to the image processing algorithm.
The image is converted to the grayscale format before it is processed through
each of the stages discussed in the theory section. The method employed for applying
these techniques in Matlab is documented in the following sections.
4.7 MSER Regions
As stated previously in the theory section, the initial phase of image processing is
application of the MSER technique. The Matlab command detectMSERFeatures,
seen in Figure 7, returns information on region pixel lists and is used to determine the
‘blob’ regions in an image.
Figure 7 Matlab MSER command
The lines of code in Figure 7 show how this command is implemented in the program.
There are a number of parameter values associated with the command, allowing the
user to determine certain ranges depending on their application. The
‘RegionAreaRange’ parameter sets the permitted size of the detected regions in pixels and
defaults to the range of 30 to 14,000. In Matlab’s user guide the ‘ThresholdDelta’
value is stated as a method for specifying the step between the threshold intensity levels used in
selecting extremal regions while testing for their stability. Put simply, a greater
parameter value will return fewer regions of interest.
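As an illustration of how these two parameters are passed to the command, a short sketch follows. It assumes the Computer Vision System Toolbox is installed. The synthetic test image and the parameter values are assumptions chosen for the example and are not the values shown in Figure 7.

```matlab
% Synthetic grayscale test image: three dark bars on a light background.
% A real run would use the grayscale version of the acquired photograph.
gray = 220 * ones(120, 300, 'uint8');
gray(30:90,  40:60)  = 30;
gray(30:90, 120:140) = 30;
gray(30:90, 200:220) = 30;

% Detect MSER regions (parameter values are illustrative only)
mserRegions = detectMSERFeatures(gray, ...
    'RegionAreaRange', [200 8000], ...   % accept regions of 200 to 8000 pixels
    'ThresholdDelta', 4);                % larger step, fewer regions returned

fprintf('%d candidate regions detected\n', mserRegions.Count);
```

Re-running the call with a larger ‘ThresholdDelta’ is a quick way to observe the reduction in the number of returned regions on a real photograph.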

The parameter values seen in Figure 7 have generally been used throughout
testing but they can be adjusted and additional parameters included if required.
The image in Figure 8 has been operated on by the MSER technique discussed
and exhibits all the potential text regions detected. Due to the relatively wide scope of
this image a large number of MSER regions have been returned. The weakness of
this technique is obvious from this image as the number of non-text regions
identified vastly outnumbers the text regions.
Figure 8 MSER Example Result
4.8 Regionprops
The Matlab command regionprops is used to remove MSER regions
based on geometric properties. Data from the MSER regions must be converted to
linear indices so that it can be operated on with regionprops. The regionprops
command then measures and returns statistics for the MSER regions
previously identified. There are numerous property types which can be used for
geometric thresholding and selection may depend on the requirements of an
application. Those properties selected in this instance are included in the section of
Matlab code in Figure 9.
Figure 9 Geometric Properties Thresholds
The ‘Extent’ property, for example, returns a value that specifies the ratio of pixels in
the region to pixels in the total bounding box (Mathworks.com, 2016c). A threshold
range for this property is set, as in Figure 9, removing MSER regions based on these
criteria. ‘Eccentricity’, ‘Solidity’ and ‘Euler Number’ are each calculated in a similar
manner using the regionprops command.
The ‘Aspect Ratio’ is calculated as the ratio of the width of a region’s bounding box to
its height. Information is extracted from the bounding box regions of the image using
regionprops and the ratio calculated as the width divided by the height. A threshold is
applied in the same way as with the other geometric properties.
MSER regions flagged by any of these thresholds are removed from the
image. It is anticipated that this would result in a significant
reduction in the number of those non-text regions present in an image like the one
seen in Figure 8.
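The thresholding step can be sketched in base Matlab as follows. The region measurements are invented for illustration and are laid out like the structure array that regionprops returns. The threshold values are the base values listed later in Table 1, with the ‘Extent’ condition read as below 0.2 or above 0.9, which is an interpretation. This is a sketch of the technique, not a reproduction of the code in Figure 9.

```matlab
% Hypothetical measurements for three regions, laid out like the struct
% array that regionprops returns (values invented for illustration).
stats = struct( ...
    'BoundingBox',  {[10 10 20 40], [5 60 200 12], [50 10 22 40]}, ...
    'Eccentricity', {0.85, 0.999, 0.90}, ...
    'Solidity',     {0.60, 0.95,  0.25}, ...
    'Extent',       {0.55, 0.97,  0.40}, ...
    'EulerNumber',  {1,    1,     0});

% Aspect ratio from the bounding boxes: width divided by height
bbox = vertcat(stats.BoundingBox);
aspectRatio = (bbox(:, 3) ./ bbox(:, 4))';

% A region is flagged for removal if it meets any of the threshold conditions
filterIdx = aspectRatio > 3;
filterIdx = filterIdx | [stats.Eccentricity] > 0.995;
filterIdx = filterIdx | [stats.Solidity] < 0.3;
filterIdx = filterIdx | [stats.Extent] < 0.2 | [stats.Extent] > 0.9;
filterIdx = filterIdx | [stats.EulerNumber] < -4;

% Remove the flagged regions
stats(filterIdx) = [];
fprintf('%d of 3 regions kept\n', numel(stats));
```

In this made-up data the second region is rejected for its wide aspect ratio and the third for its low solidity, leaving only the first.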
4.9 Stroke-Width Variation
A series of steps are required for analysis of the stroke-width of the region images
before a threshold can be used to eliminate the remaining non-text regions. The
padarray command is used to ‘zero pad’ the image region, effectively encasing it in a
number of zeroes along its edge. This is to avoid corruption due to boundary effects
which can occur as a result of filtering (stackexchange.com, 2016).
In Matlab the bwdist command is used to calculate the distance transform of a
binary image. For each pixel, this function calculates the distance between that
pixel and the nearest non-zero pixel. Morphological thinning is then applied to the
image to remove some foreground pixels from the image. This is commonly known as
skeletonisation and produces a drastically thinned image which retains the
connectivity and form of the original.
The results of the distance calculation and the thinning operation are combined
to determine the stroke-width values present in the image. The standard deviation and
the mean of the values are used in a calculation to determine a stroke-width
measurement. This measurement is then used along with a threshold value with the
intention of removing all remaining non-text regions. The section of Matlab code used
to determine the stroke-width measurement and threshold is included in Figure 10.
Figure 10 Stroke-Width Thresholding
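The sequence of steps described in this section can be sketched as follows, assuming the Image Processing Toolbox is available. The input is a synthetic bar standing in for a single character stroke, and the variable names are assumptions. The measurement is taken as the standard deviation divided by the mean, as described in the results section, and the 0.4 threshold is the base value from Table 1. Whether this matches Figure 10 line for line is not known.

```matlab
% Synthetic binary region: a vertical bar standing in for one character stroke
regionImage = false(24, 11);
regionImage(4:21, 4:8) = true;

% Zero-pad by one pixel on every side to avoid boundary effects
regionImage = padarray(regionImage, [1 1]);

% Distance from each foreground pixel to the nearest background pixel
distanceImage = bwdist(~regionImage);

% Thin the region down to its skeleton
skeletonImage = bwmorph(regionImage, 'thin', inf);

% Stroke-width values are the distances sampled along the skeleton
strokeWidthValues = distanceImage(skeletonImage);
strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);

% Regions whose variation exceeds the threshold are treated as non-text
strokeWidthThreshold = 0.4;
isNonText = strokeWidthMetric > strokeWidthThreshold;
fprintf('metric = %.3f, flagged as non-text = %d\n', strokeWidthMetric, isNonText);
```

A stroke of even thickness gives a small measurement, whereas an irregular blob gives widely varying distances along its skeleton and therefore a larger one.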

4.10 Bounding Boxes
As stated in the theory section, bounding boxes are used to bring form to the data
present in the image. Matlab is equipped with considerable functionality for applying
bounding boxes and the process is initiated by determining bounding boxes for each
of the remaining text regions. These bounding boxes can be expanded slightly to help
ensure overlap between connected components. This is achieved by applying a small
expansion amount to the bounding boxes and is an important feature in determining the
structure of a text string returned from the OCR function. The effect of varying the
expansion amount is discussed in greater detail in the results section. Figure 11 shows
the effect of applying bounding boxes to each of the character regions and an
overlap is clearly visible among the components.
Figure 11 Bounding Boxes
A bounding box overlap ratio is calculated and graphed so that connected
regions within the image can be identified. These connected components are then
merged together based on a non-zero overlap ratio to form a text string or word. In
Figure 12 the lines of code remove the bounding boxes that contain only single
components and the text region presented to the OCR function is displayed in Figure
13.
Figure 12 Merging of Bounding Boxes
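The expansion and merging steps can be illustrated with a small base-Matlab sketch. The box coordinates and the two expansion amounts are hypothetical, and the expansion is applied as a fixed number of pixels per side, which may differ from the method used in the program. The rectint function is swapped in for an overlap ratio calculation so that no toolbox is needed; a non-zero intersection area corresponds to a non-zero overlap ratio. Clipping of the expanded boxes to the image bounds is omitted.

```matlab
% Hypothetical character boxes [x y width height] for a plate such as
% 'LLZ 2268': a 4-pixel gap between characters, a 20-pixel gap in the middle.
x = [20 44 68 108 132 156 180]';
bboxes = [x, 20 * ones(7, 1), 20 * ones(7, 1), 40 * ones(7, 1)];

pads = [3 11];                 % two expansion amounts, in pixels per side
numLines = zeros(size(pads));
for k = 1:numel(pads)
    p = pads(k);

    % Expand every box by p pixels on each side
    expanded = [bboxes(:, 1:2) - p, bboxes(:, 3:4) + 2 * p];

    % Pairwise intersection areas; the diagonal (self-overlap) is cleared
    overlapArea = rectint(expanded, expanded);
    n = size(expanded, 1);
    overlapArea(1:n+1:n^2) = 0;

    % Boxes with any overlap are linked; linked groups form one text line
    g = graph(overlapArea > 0);
    labels = conncomp(g);
    numLines(k) = max(labels);
end

% Merge the boxes of the last run into one box per text line
xmin = accumarray(labels', expanded(:, 1), [], @min);
ymin = accumarray(labels', expanded(:, 2), [], @min);
xmax = accumarray(labels', expanded(:, 1) + expanded(:, 3), [], @max);
ymax = accumarray(labels', expanded(:, 2) + expanded(:, 4), [], @max);
textBBoxes = [xmin, ymin, xmax - xmin, ymax - ymin];

disp(numLines);    % number of text lines found for each expansion amount
```

With the smaller expansion the wide central gap is not bridged and two text lines result; the larger expansion links all seven boxes into one.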
The example in Figure 13 is an ideal scenario as the entire number plate has been
identified as a single text string, making future comparison with stored registrations
much simpler.
Figure 13 Merged Bounding Boxes
4.11 OCR Function
The OCR function used for this project is an existing function in Matlab, the
operation of which has been discussed in the theory section of this document. In terms
of the methodology employed for using this function the process is a simple matter of
passing the appropriately processed text image, together with the correctly merged
bounding boxes, to the OCR command. The section of code in Figure 14 depicts how this is
accomplished.
Figure 14 OCR function code
The result of the OCR function is a text string of the recognised alphanumeric digits
printed in the Matlab command line.
4.12 String Comparison
For the digits recovered from a vehicle registration plate to be relevant in any kind of
automated system a method is required for comparing them with what is expected or
required. In an operational, fully-automated system the alphanumeric digits extracted
from an image may be compared with an extensive database of all registration
numbers cleared for access. The result of the comparison would be a simple positive
or negative, resulting in action or inaction. This seems to be a relatively simple
procedure but due to the various data types and array structures present in Matlab, a
certain amount of manipulation is required to implement direct comparison.
Matlab is equipped with a function for comparing strings, called as
strcmp(A,B), which returns true (1) or false (0) depending on whether or not the
strings match. It is important to note that the data operated on within this function
must be of the same type. Therefore the 1x10 character array generated from the OCR
function cannot be directly compared with the string entered as the expected
registration digits.
The solution to the problem of comparing different data types is to convert
them both to a mutual type. This requires the use of the cellstr(S) function in Matlab
which facilitates the creation of a cell array of strings from any character array. A cell
array in Matlab is one whose elements are cells. Each cell in a cell array can hold any
Matlab data type including numerical arrays, character strings, symbolic objects and
structures (Hanselman & Littlefield, 2001).
Taking the example of a vehicle registration plate accurately detected by the
OCR engine as ‘XJZ 7743’, the answer returned is a 1x10 character array and is
stored in Matlab as such. The expected registration, entered as B = 'XJZ 7743', is stored
in Matlab simply as the value ‘XJZ 7743’. Comparison of these two values with the string
compare function returns ‘0’ as the values are in different formats.
This error is overcome by creating two cell arrays from the stated values. The
lines of code in Figure 15 below show the method for comparing these two cell
arrays:
Figure 15 Cell Arrays
B is the manually entered string to provide comparison with; A is the output of the
OCR function converted to a character array; D is this OCR output generated as a 1x1
cell array; E is the string generated as a 1x1 cell array. F is the result of comparison
between the cell arrays D and E using the string compare function.
In this example the output of the string comparison function, F, is equal to ‘1’.
This provides positive confirmation of a match which can then be implemented as a
condition for the execution of an automated response function.
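The comparison can be sketched with the same variable names as follows. The contents of the OCR output are an assumption: the plate text is taken to be followed by two newline characters, which would account for a 1x10 array holding an eight-character registration. The call to strtrim is an addition made in this sketch so that the result does not depend on how cellstr treats trailing newlines.

```matlab
% Assumed OCR output: the plate text followed by two newline characters,
% which would account for a 1x10 character array
A = sprintf('XJZ 7743\n\n');
B = 'XJZ 7743';                 % manually entered expected registration

directMatch = strcmp(A, B);     % false: the arrays differ in length

% Convert both to 1x1 cell arrays; strtrim is an addition in this sketch
% that strips the trailing newlines before the conversion
D = cellstr(strtrim(A));
E = cellstr(B);
F = strcmp(D, E);               % true: the cell contents now match

fprintf('direct = %d, cell arrays = %d\n', directMatch, F);
```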
4.13 Indication of Recognition
In order to indicate whether or not the system has produced a positive match and to
represent an automated response to this match, some form of output from the system
would be required. The basic premise of this function, as stated previously in this
report, is to enable or block entry to a parking facility and to alert an operator when
this is considered necessary.
The Raspberry Pi provides a suitable platform for this purpose as its GPIO
pins can be used to trigger an external response to the system inputs. In real-world
applications this output may be tailored to meet the specific requirements of a
given system. For example, a servo motor may be triggered to raise a barrier or an
alarm sounded to alert a system operator. In this case a simple LED can be used for
simulation and testing of the efficacy of the Matlab code and external circuitry.
In Matlab code an ‘if’ statement can be used to determine a response which is
dependent upon the presence of a specified input condition(s). For instance, it may be
used to implement a certain set of conditions when the ‘if’ statement is ‘true’,
otherwise the status-quo persists. Alternatively it could be used to determine an output
based on several potential input conditions, determining the required output upon the
presence of a given condition.
For testing the output of the system the input condition is provided for by the
result of the string compare function discussed in the previous section. Therefore the
code could be compiled to trigger some form of response when the output of the
string compare function (F) is equal to ‘1’. In cases where the OCR function is unable
to determine a positive match F is equal to ‘0’, in which case the system can be
configured to produce no response at all or an alternative response such as a red light
to indicate that the comparison is negative.
A section of code containing the ‘if’ statement is inserted in Figure 16 below.
Figure 16 'if' statement in Matlab
The initial condition to this section is ‘F == 1’. When this condition is met due to a
positive match from the OCR function and the string compare function, a digital
output pin on the Raspberry Pi is sent HIGH. An ‘else’ statement is included to ensure
that should this condition not be met the digital output pin will remain LOW.
To provide visual indication of a positive or negative match an external LED is
connected to the relevant output pin via a transistor. The transistor is required to
provide enough current for the LED to be easily visible, as the current available from
the GPIO pins on the Raspberry Pi is insufficient for this purpose.
In the event of a positive match, i.e. F==1, the green LED is switched on. When
the code is run and the result is a negative match, the red LED is switched
on.
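The branching logic can be sketched without hardware as shown below. The LED states are held in a structure named led, which is an assumption of this sketch; on the Raspberry Pi each assignment would instead be a call to writeDigitalPin on the relevant output pin, as indicated in the comment. This is an illustration of the ‘if’ and ‘else’ structure and not the code from Figure 16.

```matlab
F = true;                             % result of the string comparison
led = struct('green', 0, 'red', 0);   % both LEDs off at the start

% On hardware each assignment would be a call such as
%   writeDigitalPin(mypi, greenPin, 1);   % greenPin is a hypothetical pin number
if F == 1
    led.green = 1;                    % positive match: green LED on
    led.red   = 0;
else
    led.green = 0;
    led.red   = 1;                    % negative match: red LED on
end
disp(led);
```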
A single iteration of the system is completed when the second push-button switch
is pressed, as this switches off all external LEDs, closes all open figures, deletes the
input image and exits the while loop.
4.14 Graphical User Interface
For the purpose of improving the utility of the character recognition algorithm a
Graphical User Interface (GUI) has been developed in Matlab. The software provides
tools for creation of the GUI and facilitates inclusion of push-buttons, graphs and text
etc. Among the virtues of using a GUI in Matlab is that it can disguise a vast and
complex program code behind an easy to use interface. The GUI created for this
system can be viewed in Figure 17.
Figure 17 Graphical User Interface
The image in Figure 17 depicts the GUI following acquisition of an image. Text
regions and the detected text are displayed on the screen, which may be useful to a
system operator. The GUI also enables the user to establish connection with the
remote device and to manually override the system. Finally the Raspberry Pi can be
shut down remotely by pressing the associated push-button on the GUI.
5 Experimental Testing
5.1 Initial Testing
Testing has been carried out on the various components of this project throughout the
duration of the academic year. Having considerable elements of both hardware and
software, testing required a modular methodology to ensure that each part of the
system worked correctly before it could be integrated within the overall design.
Testing of the image processing and character recognition aspect of the project
required considerable experimentation to ascertain the effectiveness of various
components and to understand the reasons for disappointing results. Clearly not all of
these tests can be documented in the results section but a detailed overview and
analysis of the work undertaken is provided.
Many of the experiments employed previously acquired images of vehicles
taken from different angles and distances to provide an adequate range of complexity.
These can then be used to run experiments on the image processing algorithm without
having to include the communications element of the project. This testing
methodology led to several instances of successful recognition, enabling the project to
progress towards integration of a complete system.
As well as applying different images to the processing algorithm, tests included
making changes to certain elements of the program to observe the results and use the
information to refine the process. Results from this type of testing are provided in the
Complex Detection section.
Initial testing of the hardware elements of the system has been carried out by
interfacing the Raspberry Pi with a circuit constructed on a breadboard. These tests
were designed to determine the effectiveness of the switches and LEDs for modelling
the acquisition trigger and the automated response. These initial test results proved
successful, enabling work to proceed on the PCB design and manufacture.
Running concurrently with the breadboard testing was testing of the wireless
communication between the Raspberry Pi and a laptop computer. Simple testing such
as programming LEDs to flash progressed on to more complex tasks such as
transmitting an image from the Raspberry Pi to the laptop.
5.2 Complete System Test
Complete system testing combined each of the component elements of the project to
determine whether or not it would operate as expected. This process has not
proceeded as smoothly as anticipated although it has provided some positive results.
The hardware used for the complete system test, including the Raspberry Pi and PCB,
can be viewed in Figure 18.
Figure 18 Complete System Hardware
On a number of occasions the system has operated entirely as expected,
providing a lit green LED to signify correct recognition of the input characters.
However certain issues have arisen with the system that have limited the time spent on
refining the final product. For example, the system when left idle for a significant
period of time tends to lose connectivity with the wireless network in the lab. This can
lead to errors when attempting to reconnect, as the program running in Matlab
considers the device as being still connected but unable to respond to commands.
However this appears to be an issue with the network itself as testing in other venues
has not produced the same problem.
An additional feature of the complete system test was the discovery that the
Raspberry Pi camera module tended to deliver four snapshots to the Matlab program
when only one is expected. This would lead to a type of backlog in which triggering
of the camera module would lead to Matlab processing a leftover image that may have
been acquired several minutes earlier. This particular issue was solved by making
some minor adjustments to the Matlab code to ensure that only one image is acquired
with each iteration.
5.3 Limitations to Testing
Several limitations to testing of the system have been experienced, some of which
may be relevant to the results obtained. Perhaps the most debilitating of these has
been the difficulty in obtaining and maintaining adequate wireless connectivity in the
lab. Intermittent connectivity led to a significant amount of time being expended on
troubleshooting network problems. Occasionally it was not possible to establish any
connectivity between the Raspberry Pi and the college network, making testing of the
overall system very difficult.
On reflection, a more prudent approach may have been to perform all testing
with an Ethernet connection to avoid time wasted on wireless issues. Final completion
of the system could, in that case, have incorporated the wireless element.
Lighting proved to be something of a restriction to results obtained from live
input images. The less than adequate lighting in the lab setting combined with
intermittent changes in intensity due to sunlight made consistency of results extremely
difficult during testing. However this can also be interpreted as a positive aspect as
solutions to these problems are required in real-world scenarios.
Finally with regards to limitations, it is important to understand that all of the
testing completed for this project has been in relation to static text images. What is
meant by the word static is that the text content of the image is stationary at the
moment of image acquisition. This is in contrast to more advanced systems that use
sophisticated techniques to extract text from moving vehicles for example.
6 Results and Discussion
6.1 Introduction
The results obtained from testing of the image processing algorithm and the overall
system are numerous and generally successful in relation to prior expectations. The
Raspberry Pi is able to acquire an image when triggered. This image can be
transmitted to a laptop wirelessly via a Wifi network where it is applied to the image
processing algorithm. In many instances the correct characters are obtained and a
green LED switched on in response.
With specific regard to the image processing and character recognition element
of the project it is important to understand that the results demonstrated in the
following paragraphs have been obtained through many stages of experimentation
with the various components of the algorithm. It is not possible to discuss the result of
each test but a detailed overview is provided.
Not all tests have been successful in achieving the desired target of the system
i.e. to correctly identify the characters in a vehicle registration plate. However each
unsuccessful test has provided information on the effects of the various processing
techniques which has helped in refining elements of the program. Some of the more
interesting results, obtained from unsuccessful tests, are documented in the Complex
Detection section of this report.
The Basic Detection section will show how the algorithm has been successful in
identifying text regions and recognising them correctly as those in the input image.
The title of the section relates to the relative complexity of the input image which is a
primary reason for the positive results. The Complex Detection section employs a
series of examples to demonstrate how changes to thresholding parameters in the
algorithm affect its performance.
The results presented are supplemented by discussion and analysis of the overall
system and recommendations for further work.
6.2 Basic Detection
The image in Figure 19 is one used prominently in tests carried out throughout the
duration of this project and is a typical example. The particular features of this image
that make it conducive to character recognition are the clearly defined black character
regions against a yellow background, the lack of external image regions that may be
miscalculated as potential text regions and the close to ideal angle from which the
image has been acquired.
Figure 19 Basic Detection - Input Image
In Figure 20 the input image has been converted to grayscale and the MSER regions
technique applied. With a fairly basic image like this one it is anticipated that this
method should have no difficulty in detecting all of the character regions and should
only detect minimal non-text regions, or perhaps even zero non-text regions.
Figure 20 Basic Detection - MSER regions
As expected, the character regions have been detected with only two non-text regions
below the number plate being identified as potential text. With so few non-text
regions initially detected due to the lack of complexity in the image the next two
stages of the algorithm have a greater chance of identifying the text regions which are
to be operated on by the OCR function.
Figure 21 depicts the effect of applying the regionprops command and
statistical thresholding to the image post-MSER.
Figure 21 Basic Detection - Geometric Properties method
In this case, as in many of the experiments with this type of image, the non-text MSER
regions seen in Figure 20 have been removed, leaving only text regions.
The effects of applying stroke-width thresholding can be viewed in Figure 22.
Figure 22 Basic Detection - Stroke-width thresholding
The fact that the second stage in the pre-processing algorithm has successfully
identified all of the text regions makes the third stage somewhat redundant in this
instance. This can be a common occurrence in image processing algorithms when
well cropped images like this one are involved. However it is necessary to include the
stroke-width thresholding stage due to its effect on the consistency of results in more
complex situations. The importance of including each of the three pre-processing
stages will be made clear in the following section.
One of the problems encountered when attempting to generate a character
output was in returning the full registration in the correct order. Following the stroke-
width thresholding, bounding boxes are applied to the image in an attempt to form a
coherent structure from the data. As stated in the methodology section these bounding
boxes are calculated within Matlab but can be adjusted to suit specific applications.
Because the plate contains two distinct sections, “LLZ” and “2268”, and the
bounding boxes are used to establish text regions, the resultant output tended to
return the two sections in reverse order. A certain amount of trial and error can be
required to overcome an issue like this one but adjustments to the expansion amount
used for increasing the size of each box proved effective in overcoming the issue.
Figures 23 and 24 show how the bounding boxes have been applied
differently in two iterations of the same algorithm.
Figure 23 Basic Detection - Bounding Box comparison
In Figure 23 the bounding boxes have been applied using the associated Matlab
command. However two different expansion amounts have been used to extend the
reach of the boxes. In the top image in Figure 23 a relatively small expansion
amount has been applied. Although this has resulted in most of the character
components being connected, the central gap between ‘Z’ and ‘2’ has resulted in
these two not being identified as connected components.
With the expansion amount increased, the second image in Figure 23 shows
how these larger bounding boxes extend over a greater area and result in slight
overlap between the ‘Z’ and ‘2’. With the overlap ratio threshold set to zero, all connected
components are considered as part of a single line of text. The effect this process has
on the input to the OCR function can be seen in Figure 24.
Figure 24 Basic Detection - Bounding Box Comparison (1)
The top image in Figure 24 shows how a small expansion amount can result in a
vehicle registration plate being separated into two distinct lines of text. This is an
unwanted situation as it can lead to errors when comparing the text string with an
existing database of registration numbers.
In the second image the increased expansion amount has ensured that the OCR
function will consider the text regions on the image as a single string. This is the ideal
scenario when inputting the image to an OCR function as it eliminates alternative
interpretation of the order of the data.
The algorithm has been extremely successful in identifying and correctly
recognising the characters when operating on basic input images like the one in
Figure 19. The processed image, having passed through each of the stages
documented in this section, is applied to the Matlab OCR function, which provides a
result based on its interpretation of the image. Comparing the edges of the character
regions in the image it returns a text string based on correlation to existing templates.
Figures 25 and 26 show the result of the OCR operation on the processed
image, as printed on the Matlab command line.
Figure 25 OCR result (1)
Figure 26 OCR result (2)

In Figure 25 the result has been returned as two distinct text strings. Although it has
returned the correct characters and proven the effectiveness of the various pre-processing
stages as well as the OCR function it is preferable that the result be returned as a single
line of text.
6.3 Complex Detection
Results obtained from the image processing algorithm highlight the importance of
ambient conditions, the quality of the input image and the effectiveness of well-
refined thresholding properties. One of the more interesting aspects of the
experimentation process has been the fact that each iteration of the algorithm provides
information on the functional operation of the process, regardless of whether or not
positive recognition has been achieved. A typical example of this is the variation in
results obtained when applying different camera angles to the vehicle registration
plates.
In order to further demonstrate the results obtained from experimentation with
the algorithm in Matlab a single input image is being used to display the effects of
various adjustments to the image processing properties. The image is displayed in
Figure 27 below and the intention is to isolate the registration plate characters as the
only Regions of Interest to the OCR function. This particular image has been selected
due to certain properties it exhibits. These include the substantial light contrast from
the top to the bottom of the picture and the offset angle of the vehicle registration
plate.
Figure 27 Alfa Romeo Input Image
Tables 1 to 4 contain property types used in the image processing algorithm to
differentiate between RoIs in an image. Each property is associated with a threshold
value which can be adjusted to determine the effect of each property in distinguishing
between general colour concentrations and text regions. The first five properties in
each table are the geometric properties discussed in Section 3.4 and the sixth is the
Stroke-width threshold discussed in Section 3.5.
Table 1 contains the base values used to configure the region properties and
stroke-width thresholding levels. With these values in place several separate instances
of character recognition have been successful. In fact, with this configuration the
system has been able to produce the automated response to positive recognition
anticipated in the design. However, these successful cases have been achieved in ideal
conditions or with much less complex input images than the one seen in Figure 27.
Table 1 Parameter Values - First Iteration
Parameter                 Threshold Value
Aspect Ratio              >3
Eccentricity              >0.995
Solidity                  <0.3
Extent                    <0.2 OR >0.9
Euler Number              <-4
Stroke-width Threshold    0.4
From left to right the images in Figure 28, as well as in Figures 29, 30 and 31, depict
the three key stages of the image processing algorithm: 1) MSER region detection, 2)
removal of MSER regions based on geometric properties and 3) removal of remaining
non-text regions based on stroke-width detection.
Figure 28 Processing results - First Iteration
In the first image in Figure 28, the MSER technique demonstrates both its strengths
and weaknesses. The method has successfully identified the seven character regions in
the image but has also identified a very high number of additional regions which are
considered potential text regions. It is the sheer quantity of potential RoIs determined
using the MSER methodology that makes further pre-processing of the image
necessary. However it should be noted that the volume of MSER regions
detected in this image is a consequence of the inherent complexity presented. Much of
the experimentation carried out for this project has been undertaken with extremely
basic text images, often resulting in detection of text regions only, or very limited
non-text regions.
In the second image presented in Figure 28 the regionprops command has
been employed to measure the specified geometric properties with the intention of
eliminating non-text regions based on the threshold values seen in Table 1. In this
instance the technique has been fairly successful in removing many of those ‘blob’
regions detected using MSER. The areas surrounding the license plate have been
removed, as have many of those on the grill and window-wipers of the vehicle. This
stage of the process has also been successful in maintaining the character regions on
the number plate for further processing.
Despite many of the non-text regions being removed during this stage of the
process it can be deduced from those remaining regions that the parameters
documented in Table 1 are not ideally refined for this image.
The final image in Figure 28 depicts the result of applying stroke-width
analysis and a threshold of 0.4 to the picture. As stated previously, the stroke-width
measurement is calculated as the standard deviation of the stroke widths divided by
the mean of the stroke widths. Those areas of the image with a stroke-width measurement
greater than 0.4 are indexed as non-text regions and removed, leaving the likely text
regions.
Use of stroke-width analysis has been partially successful in removing some
of the remaining non-text regions, particularly the significant ‘blob’ of colour to the
top-left of the image. However, several non-text regions
have still not been eliminated. Perhaps of even greater significance,
application of the stroke-width threshold has resulted in the removal of one of the
license plate characters as a potential text region. In this instance the ‘W’ does not
meet the specified criterion.
There are a number of possible reasons for the disappointing results obtained
in this example. One is the likelihood that the number of non-text regions
remaining after the geometric property thresholding was applied has resulted in a
skewed calculation of the stroke-width average, which is not primarily based on actual
text character values.
Another potential reason for the removal of the ‘W’ character is that the
threshold setting may be too low when applying the stroke-width analysis. Although
increasing the threshold may result in this character being detected, it may also result
in additional non-text regions being identified.
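The trade-off described here can be made concrete with a small Python sketch, again purely illustrative and not the project's MATLAB code. The region labels and measurement values are hypothetical, chosen only so that one character falls just above 0.4. The sketch assumes regions are retained when their stroke-width measurement is at or below the threshold, and compares 0.4 with a higher value such as 0.5.

```python
def regions_kept(metrics, threshold):
    """Return the labels of regions whose measurement is at or below the threshold."""
    return sorted(label for label, value in metrics.items() if value <= threshold)


# Hypothetical stroke-width measurements; NOT values measured in the thesis.
metrics = {
    "char_1": 0.22,
    "char_W": 0.46,
    "grille_fragment": 0.48,
    "colour_blob": 0.80,
}

kept_at_04 = regions_kept(metrics, 0.4)
kept_at_05 = regions_kept(metrics, 0.5)

print(kept_at_04)
print(kept_at_05)
```

With these assumed values the higher threshold recovers the character that was lost, but it also admits a non-text fragment whose measurement lies in the same range.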
The results demonstrated in Figures 26, 27 and 28 show how
changes to parameter values in the image processing algorithm can improve or worsen
the overall performance in detection of RoIs.
In Table 2 the stroke-width threshold has been increased from 0.4 to 0.5, with
the other parameters remaining constant from the previous example. The expectation
is that the increased stroke-width threshold will result in inclusion of the ‘W’
character as a detected text region. However, this change is not expected to be a panacea
for the many non-text regions seen previously.
Table 2 Parameter Values - Second Iteration
Parameter                 Threshold Value
Aspect Ratio              >3
Eccentricity              >0.995
Solidity                  <0.3
Extent                    0.2<OR<0.9
Euler Number              -4
Stroke-width Threshold    0.5
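The way the geometric thresholds in Table 2 might be applied to each region can be sketched as follows. This is a Python illustration, not the MATLAB code used in the project, and the Region class and function name are hypothetical. Three readings of the table are assumptions: each threshold marks a region for removal, the Extent entry means below 0.2 or above 0.9, and the Euler Number entry means below -4. The stroke-width threshold is applied in a later stage and is not part of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Geometric properties of one candidate region (hypothetical container)."""
    aspect_ratio: float
    eccentricity: float
    solidity: float
    extent: float
    euler_number: int


def is_non_text(region: Region) -> bool:
    """Flag a region for removal if any Table 2 threshold is met.

    Assumptions: Extent is read as '< 0.2 or > 0.9' and Euler Number as '< -4'.
    """
    return (
        region.aspect_ratio > 3
        or region.eccentricity > 0.995
        or region.solidity < 0.3
        or region.extent < 0.2
        or region.extent > 0.9
        or region.euler_number < -4
    )


# Hypothetical property values; not measurements from the thesis.
character_like = Region(aspect_ratio=0.6, eccentricity=0.90,
                        solidity=0.60, extent=0.50, euler_number=0)
elongated_strip = Region(aspect_ratio=6.0, eccentricity=0.999,
                         solidity=0.80, extent=0.70, euler_number=1)

print(is_non_text(character_like))   # no threshold met, so the region is retained
print(is_non_text(elongated_strip))  # aspect ratio and eccentricity exceed their limits
```

Combining the tests with a logical OR means that a single extreme property is enough to discard a region, so loosening any one threshold can only increase the number of regions passed to the stroke-width stage.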
As in the previous example, Figure 26 depicts the effect of the three pre-processing
techniques (MSER detection, geometric property thresholding and stroke-width
analysis) on the input image.
