The objective of this work is to evaluate how well state-of-the-art methods for spotting text in natural scene images perform on Arabic text images.
2. CONTENTS
• EAST text detection
• Pipeline of text detection from natural scenes
• Results of text detection on our data set
• Optical character recognition using Tesseract
• Results of text recognition on our data set
• Comparison between Arabic & English recognition
• Character segmentation using MSER on the EASTR data
• Training Tesseract with Arabic data
3. EAST TEXT DETECTOR
• An Efficient and Accurate Scene Text Detector, by Zhou et al. (2017)
The core of text detection is the design of features that distinguish text from backgrounds.
• The method consists of two stages: a Fully Convolutional Network (FCN) and a Non-Maximum Suppression (NMS) merging stage.
• The pipeline is flexible to produce either word level or line level predictions.
5. STRUCTURE
The model is a fully-convolutional neural network adapted for text detection
that outputs dense per-pixel predictions of words or text lines.
This eliminates intermediate steps such as candidate proposal, text region
formation and word partition.
The post-processing steps include only thresholding and NMS on the predicted
geometric shapes.
The design adopts the DenseBox approach to object detection.
6. DENSEBOX
A single convolutional network simultaneously outputs multiple predicted bounding boxes and class
confidences.
All components of object detection in DenseBox are modeled as a Fully Convolutional Network (FCN) except
the non-maximum suppression step, so region proposal generation is unnecessary.
The system takes an image of size m × n as input and outputs an m/4 × n/4 feature map with 5 channels.
Finally, every pixel in the output map is converted to a bounding box with a score, and non-maximum suppression is
applied to those boxes whose scores pass the threshold.
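The per-pixel decoding step above can be sketched in a few lines of Python. This is an illustration only, not DenseBox itself: `feature_map` is a hypothetical m/4 × n/4 grid whose 5 channels we assume to be laid out as (score, distance-to-left, -top, -right, -bottom edge).

```python
def decode_densebox(feature_map, stride=4, score_threshold=0.6):
    """Convert a 5-channel per-pixel output map into scored boxes.

    feature_map[y][x] = (score, d_left, d_top, d_right, d_bottom): the
    predicted distances from this pixel to the four box edges (assumed layout).
    """
    boxes = []
    for y, row in enumerate(feature_map):
        for x, (score, dl, dt, dr, db) in enumerate(row):
            if score < score_threshold:
                continue
            # map the feature-map cell back to input-image coordinates
            cx, cy = x * stride, y * stride
            boxes.append((cx - dl, cy - dt, cx + dr, cy + db, score))
    return boxes
```

Low-scoring pixels are dropped here; the surviving boxes would then go to non-maximum suppression.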
8. PIPELINE
The model can be decomposed into three parts:
1. Feature extractor stem: a convolutional network pre-trained on ImageNet.
Four levels of feature maps, denoted fi, are extracted from the stem; their
sizes are 1/32, 1/16, 1/8 and 1/4 of the input image, respectively.
2. Feature-merging branch: each feature map is fed to an unpooling layer to
double its size and concatenated with the current feature map. A conv1×1
bottleneck cuts down the number of channels and reduces computation, then a
conv3×3 fuses the information to produce the output of this merging stage.
The stem and merging branch together form a U-shaped architecture.
3. Output layer: reduces the final 32-channel feature map to a one-channel
score map.
9. NON-MAXIMUM SUPPRESSION
Each output prediction has a confidence probability Pc.
1. Discard all boxes whose Pc < 0.6.
2. From the remaining boxes, pick the box with the largest Pc.
3. Discard every remaining box whose IoU (Intersection over Union) with the box chosen in the previous step is > 0.5, then repeat from step 2.
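The steps above can be implemented directly; this minimal sketch uses plain Python with axis-aligned `(x1, y1, x2, y2)` boxes and the 0.6 / 0.5 thresholds from the slide.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, score_thr=0.6, iou_thr=0.5):
    """Return the indices of the boxes kept after NMS."""
    # step 1: drop low-confidence boxes, then sort best first
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # step 2: pick the box with the largest Pc
        best = order.pop(0)
        keep.append(best)
        # step 3: discard boxes overlapping it too much, then repeat
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```

For example, two near-duplicate detections of the same word collapse to the higher-scoring one, while a distant box survives.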
15. TESSERACT OCR
Tesseract is an open source OCR engine originally developed by Hewlett-Packard Laboratories, Bristol and
Hewlett-Packard Co.
It is considered one of the most accurate OCR engines available. It can read a wide variety of image
formats and can convert text written in more than 60 languages.
16. CODE
Determine the ratio of the original image dimensions to the new image
dimensions, then resize the image.
The first output layer is the sigmoid activation score map, which
gives the probability of a region containing text.
The second output layer is the feature map that
represents the "geometry" of the image.
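The resize-ratio bookkeeping can be sketched as follows. EAST requires input dimensions that are multiples of 32; the 320×320 default here matches common EAST demos but is an assumption, not a requirement of the method.

```python
def resize_params(orig_w, orig_h, new_w=320, new_h=320):
    """Compute the EAST input size (must be multiples of 32) and the
    width/height ratios used later to rescale detected boxes back to the
    original image coordinates."""
    ratio_w = orig_w / float(new_w)
    ratio_h = orig_h / float(new_h)
    return (new_w, new_h), (ratio_w, ratio_h)
```

A box detected at (x, y) in the resized image maps back to (x * ratio_w, y * ratio_h) in the original.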
17. EXTRACT REGION OF INTEREST (ROI)
Text recognition using Tesseract:
python text_recognition.py --east frozen_east_text_detection.pb --image images/offer.jpg
python text_detection.py --image images/offer.jpg --east frozen_east_text_detection.pb
18. CONVERTING IMAGE INTO BLOB
Pass the path to the EAST detector.
• The output geometry map is used to derive the bounding box coordinates of
text in our input images.
• Similarly, the scores map contains the probability of a given region
containing text.
19. • rects: stores the bounding box (x, y)-coordinates for text
regions.
• confidences: stores the probability associated with each
of the bounding boxes in rects.
For each position in the score map, read the geometry and
probability, and add a bounding box whenever the score
passes the threshold.
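The loop that fills `rects` and `confidences` can be sketched as below. The geometry layout assumed here (four edge distances plus a rotation angle per cell) follows common EAST decoders; `scores` and `geometry` are plain nested lists standing in for the network's output maps.

```python
import math

def decode_east(scores, geometry, min_confidence=0.5, stride=4):
    """Decode EAST output into rects and confidences.

    scores[y][x]   = text probability at this cell
    geometry[y][x] = (d_top, d_right, d_bottom, d_left, angle): distances
                     from the cell to the rotated box edges (assumed layout)
    """
    rects, confidences = [], []
    for y, row in enumerate(scores):
        for x, score in enumerate(row):
            if score < min_confidence:
                continue
            d_top, d_right, d_bottom, d_left, angle = geometry[y][x]
            cos, sin = math.cos(angle), math.sin(angle)
            h = d_top + d_bottom
            w = d_right + d_left
            # offset of this cell in input-image coordinates (map is 1/4 size)
            ox, oy = x * stride, y * stride
            # bottom-right corner follows the rotated right/bottom distances
            end_x = int(ox + cos * d_right + sin * d_bottom)
            end_y = int(oy - sin * d_right + cos * d_bottom)
            rects.append((end_x - int(w), end_y - int(h), end_x, end_y))
            confidences.append(score)
    return rects, confidences
```

The resulting rects and confidences are then fed to non-maximum suppression, and surviving boxes are rescaled by the resize ratios.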
21. EXTREMAL REGIONS
Extremal regions are connected areas characterized by uniform intensity and surrounded by a
contrasting background.
The stability of a region can be measured by calculating how resistant the region is to changes in the
threshold.
This stability can be measured with a simple algorithm:
1. Applying a threshold generates an image A. Detect its connected pixel regions (extremal regions).
2. Increasing the threshold by a delta amount generates an image B. Detect its connected pixel regions
(extremal regions).
3. Compare image B with A. If a region in image A is similar to the same region in image B, add it to the
same branch in the tree. The criterion of similarity may vary from implementation to implementation, but it is
usually related to the region's area or general shape. If a region in image A appears to be split in image B,
create two new branches in the tree for the new regions and associate them with the previous branch.
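The three steps above can be sketched on a tiny grayscale grid. This is a toy illustration, not a real MSER implementation: dark regions are pixels below the threshold, connectivity is a plain 4-neighbour flood fill, and the similarity criterion of step 3 is relative area change.

```python
def connected_regions(image, threshold):
    """Return the connected dark regions (pixels < threshold) of a 2-D grid
    of intensities, as frozensets of (y, x) coordinates (4-connectivity)."""
    h, w = len(image), len(image[0])
    seen, regions = set(), []
    for sy in range(h):
        for sx in range(w):
            if (sy, sx) in seen or image[sy][sx] >= threshold:
                continue
            stack, region = [(sy, sx)], set()
            while stack:
                y, x = stack.pop()
                if (y, x) in seen:
                    continue
                seen.add((y, x))
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and image[ny][nx] < threshold:
                        stack.append((ny, nx))
            regions.append(frozenset(region))
    return regions

def stable_regions(image, threshold, delta, tolerance=0.2):
    """Step 1-3: threshold at t and t + delta, then keep the regions whose
    area changes by less than `tolerance` (area as similarity criterion)."""
    before = connected_regions(image, threshold)            # image A
    after = connected_regions(image, threshold + delta)     # image B
    stable = []
    for a in before:
        for b in after:
            if a & b and abs(len(b) - len(a)) / len(a) <= tolerance:
                stable.append(a)
                break
    return stable
```

A solid dark blob keeps its area when the threshold rises (stable), whereas a region fringed with mid-gray pixels grows sharply (unstable).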
22. MSER
The probability of each ER being a character is estimated using novel features calculated with O(1)
complexity, and only ERs with locally maximal probability are selected for the second stage.
(Figure: extremal region detector)