6. Academic Research
• Natural scene OCR ≠ traditional scanned OCR
– Camera captured
– Illumination variations
– Perspective distortion
– Short text
• Digital-born text
• Natural-scene text
Source: ICDAR Text Locating Competition
7. Product Images - Two Purposes
Text’s role is different:
1. Sales pitches
2. Product list
12. Current methods
1. Texture based (Classifier-based)
2. Region based (Connected components)
3. Hybrids
13. 1. Texture-based method
• Special texture
• Scan
• Classifier (SVM, AdaBoost or Neural network)
Problems:
• Scale/Rotation variant
• High computation
14. 2. Region-based method
• Local features (edges or color clustering)
• Connected component analysis
• Text line and word separation
[Figure: output of the stroke width transform]
Problem:
• False candidates
20. RIT’s Approach
1. Character/word annotation (time-consuming task)
– Text image classifier using image-wise annotation
2. Transparent text (hard to detect)
– Transparent text detection and background recovery
21. 1. Text image classifier using image-wise annotation
• Text image detection (not char/word)
– Image-wise annotation (less time)
– Clustering detected regions (measure text likeliness)
23. Clustering detected regions
[Figure: detected regions plotted in feature space (f1, f2) and grouped into clusters C1–C5; legend: region in text images, region in non-text images, cluster center. E.g., P(C1) = 3/4, P(C4) = 0/3.]
24. Comparison
Better than a typical method
[Chart: accuracy (0–90% scale) of the current vs. proposed method; the proposed method scores substantially higher]
• Rakuten 500 images
• Compared w/a traditional region-based method
25. RIT’s Approach
1. Character/word annotation (time-consuming task)
– Text image classifier using image-wise annotation
2. Transparent text (hard to detect)
– Transparent text detection and background recovery
26. 2. Transparent text detection and
background recovery
• Edge Detection with adaptive threshold
– Image content analysis
• Background recovery
– Text color/opacity estimation
27. Edge detection with adaptive thresholds
• Less noise
• Weak edges are better preserved
28. Texture strength
Measuring image complexity
Image patches:
Direction and energy: eigenvectors and eigenvalues [1]
Texture strength:
[1] Xiang Zhu and Peyman Milanfar, “Automatic parameter selection for denoising algorithms using a no-reference measure of image content,” IEEE Transactions on Image Processing, pp. 3116–3132, 2010.
29. Proposed text detection
1. Texture based (Classifier based)
SVM/Random Forest/AdaBoost
2. Region based (Connected components)
Edge/Color Clustering
3. Hybrids
Region (Edge Stroke Width)
+
Texture (AdaBoost)
33. Transparent Text
I = (1 − γ)O + γT
I: observed pixel value
O: original pixel value
T: text color
γ: opacity
• 2 unknowns (γ and T)
• ≥ 2 equations → least-squares solution
Hello, my name is Naoki Chiba. Today I am going to talk about text detection in product images.
These are examples of product images, which contain sales pitches such as price, store name and shipping information. Applications of text detection include content retrieval/filtering, character recognition and text translation into different languages for international sales.
Here is the outline of today’s talk. After an overview of text detection, I will review current methods, and then I will talk about Rakuten’s approach.
In academia, text detection has been an active area of research for a long time, starting from traditional scanned OCR, which scans documents with a flat-bed scanner. Current text detection is different: because of the popularity of imaging devices such as mobile cameras, images may contain illumination variations and perspective distortion, and the text is shorter than before. Text images can be categorized into two types: digital-born text, which is inserted by an editor, and natural-scene text, which is receiving a lot of attention in academia.
Product images have two different purposes. The first is to show sales pitches. The second is to show a product list representing product variations. Depending on the purpose, the role of text is different.
This is an example of a product list. If the image contains store-specific information such as the merchant’s name, price or shipping information, that might not be good.
Another example is what we call “Now printing” images. We use this type of image when product images are not available even though the product has been released or we are taking pre-orders. These images are updated when the product photo becomes available, but we need to detect them first. The problem is that they are provided by our merchants, not by Rakuten, due to the online marketplace model, so we do not know beforehand which images they are going to use.
In summary, product images can contain both natural-scene text and digital-born text, which are mixed together and difficult to detect.
Next, I am going to show some current methods in academia to detect text in images.
Current methods can be categorized into three types: texture-based, region-based, and hybrids of the two.
The texture-based method looks for the distinctive texture of text by scanning a window across the image, then classifies each window with a classifier such as a Support Vector Machine, AdaBoost or a neural network. But it has two problems: first, it is scale- and rotation-variant; second, its computational cost is high.
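As a rough sketch of the sliding-window idea (the window size, stride and the stand-in classifier below are illustrative placeholders; a real system would plug in a trained SVM, AdaBoost or neural-network model):

```python
import numpy as np

def sliding_window_scores(image, win=16, stride=8, classify=None):
    """Scan a grayscale image with a fixed window and score each patch.

    `classify` stands in for a trained model; the default is a toy
    score (the patch's standard deviation), since text regions tend
    to have high local contrast.
    """
    if classify is None:
        classify = lambda patch: float(patch.std())
    h, w = image.shape
    scores = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            scores.append(((y, x), classify(patch)))
    return scores

# Toy image: flat background with one high-contrast striped block.
img = np.zeros((32, 32))
img[0:8, 0:8] = np.tile([0.0, 1.0], (8, 4))  # "text-like" stripes
scores = sliding_window_scores(img, win=16, stride=8)
best = max(scores, key=lambda s: s[1])
print(best[0])  # (0, 0): the window covering the striped block
```

This also makes the two drawbacks visible: the scan must be repeated at multiple scales and rotations, and the number of windows (and classifier calls) grows quickly with image size.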
The second method is the region-based method. It examines local features, either edges or color clusters, followed by connected-component analysis, text-line grouping and word separation. The problem is that it may produce a lot of false candidates.
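A minimal sketch of this pipeline (flood-fill component labeling over a binary edge mask, with a simple area filter standing in for full text-line grouping and stroke-width checks):

```python
import numpy as np

def connected_components(mask):
    """Label 4-connected components of a boolean mask via flood fill."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                current += 1
                labels[sy, sx] = current
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            stack.append((ny, nx))
    return labels, current

def candidate_boxes(mask, min_area=4):
    """Bounding boxes of components big enough to be character candidates."""
    labels, n = connected_components(mask)
    boxes = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size >= min_area:  # drop tiny components (likely noise)
            boxes.append((int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1))
    return boxes

# Two blobs: a 12-pixel "character candidate" and a 1-pixel noise speck.
mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 2:6] = True
mask[8, 8] = True
print(candidate_boxes(mask))  # [(2, 2, 5, 6)] -- the noise pixel is filtered out
```

Even with the area filter, busy backgrounds can produce many surviving components, which is exactly the false-candidate problem the hybrid methods address.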
Therefore, the third type, the hybrid method, is getting a lot of attention these days. It starts from a region-based method, using either edges or color clustering, and then verifies whether the detected regions are text with a classifier trained by machine learning.
Still, there are some problems, and we would like to solve the following two. One is character/word annotation, which is a time-consuming task, especially when we have a lot of data. The other is transparent text, which is hard to detect.
For example, character annotation means drawing rectangles on top of text characters by hand. If an image contains many characters, annotation by a human operator is very time-consuming, especially when we have a lot of images.
Another problem we would like to solve is transparent text, which is difficult to detect because the edges are weak. But once we detect it, there is a possibility of recovering the background behind the text.
So, to solve these problems, I would like to show what RIT, the Rakuten Institute of Technology, is doing.
To avoid character/word annotation, we built a text image classifier by using only image-wise annotation, which is much more efficient. We are also working on transparent text detection and background recovery. I am going to show the details of the two.
Our text image detection is based on image-wise annotation, which takes much less time than character or word annotation. By clustering detected regions with a machine learning technique, we can get a measure of text likeliness.
When each detected region is represented by image features f1 and f2, we cluster the regions by these features. Based on the image-wise annotation, we can compute a probability of being text for each cluster. For example, red dots show regions that appeared in text images and blue dots regions that appeared in non-text images. Since cluster C4 contains only regions that appeared in non-text images, it is unlikely to be text.
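The text-likeliness estimate can be sketched as follows (the two features, the fixed cluster centers and the toy data are illustrative stand-ins; the real system learns the clusters from region features):

```python
import numpy as np

def cluster_text_likeliness(features, from_text_image, centers):
    """Assign each region to its nearest cluster center and estimate,
    per cluster, the fraction of member regions that came from images
    annotated (image-wise) as containing text.

    `centers` would normally come from clustering the region features;
    here they are fixed for illustration.
    """
    d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)          # nearest-center assignment
    probs = {}
    for c in range(len(centers)):
        members = from_text_image[assign == c]
        probs[c] = float(members.mean()) if members.size else 0.0
    return assign, probs

# Toy data mirroring the slide: cluster 0 is mostly text-image regions
# (P = 3/4), cluster 1 contains only non-text-image regions (P = 0).
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([1, 1, 1, 0, 0, 0, 0], dtype=float)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
_, probs = cluster_text_likeliness(feats, labels, centers)
print(probs)  # {0: 0.75, 1: 0.0}
```

A region detected in a new image then inherits the text probability of whichever cluster it falls into, with no character-level labels ever needed.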
We measured the performance against a typical previous method, and ours was significantly better: the accuracy increased by around 20%.
Another problem we are solving is transparent text and background recovery.
We propose adaptive edge detection based on analyzing the image content. To recover the background, we estimate the text color and the opacity, i.e., the transparency of the text.
These are examples of detected edges. Compared with traditional edge detectors such as Sobel or Canny, ours are better.
Let me introduce how we do our detection. We measure image complexity as texture strength by analyzing the image content with eigenspace analysis. Based on the texture strength, we can set the edge-detection thresholds adaptively.
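A simplified sketch of such a texture-strength measure (eigenvalues of the patch's gradient covariance, loosely following the cited Zhu–Milanfar measure; the exact formula in the paper differs):

```python
import numpy as np

def texture_strength(patch):
    """Texture strength from the eigenvalues of a patch's gradient
    covariance (structure tensor): larger means more textured content.
    """
    gy, gx = np.gradient(patch.astype(float))
    G = np.stack([gx.ravel(), gy.ravel()], axis=1)
    C = G.T @ G                                           # 2x2 gradient covariance
    s = np.sqrt(np.maximum(np.linalg.eigvalsh(C), 0.0))  # energies along the two principal directions
    return float(s.sum())

flat = np.ones((16, 16))                 # smooth region: strength ~0
stripes = np.tile([0.0, 1.0], (16, 8))   # busy region: strength large
print(texture_strength(flat) < texture_strength(stripes))  # True
```

An edge-detection threshold can then be scaled with this strength: raised in busy, high-strength regions to suppress noise, and lowered in smooth regions so weak edges, such as those of transparent text, survive.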
To detect text, we use a hybrid method: a region-based edge/stroke-width transform combined with a machine learning technique.
Here is a system flow. After adaptive edge detection, we work on component analysis and detect text. Once we detect text, we can recover background.
Here are examples. Our system was able to detect transparent text.
Transparent text can be represented by this formula. The observed pixel values I are a mixture of the background values O and the text color T, with the mixing ratio determined by the opacity gamma. Assuming that the text color and opacity are uniform within the text, we can solve for these parameters with a least-squares method given two or more data points, because there are two unknowns.
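Under these assumptions (uniform text color T and opacity γ, with the background value O known at two or more pixels, e.g., from a repeating background pattern), the estimation is a small linear least-squares problem. A sketch, with synthetic values standing in for real pixel data:

```python
import numpy as np

def estimate_text_params(O, I):
    """Estimate opacity gamma and text color T from pixel pairs where both
    the original background value O and the observed value I are known,
    using the model I = (1 - gamma) * O + gamma * T.

    Rearranged: I - O = -gamma * O + (gamma * T), which is linear in the
    unknowns (gamma, gamma * T), so we solve by least squares.
    """
    O = np.asarray(O, dtype=float)
    I = np.asarray(I, dtype=float)
    A = np.stack([-O, np.ones_like(O)], axis=1)
    x, *_ = np.linalg.lstsq(A, I - O, rcond=None)
    gamma, gT = x
    return gamma, gT / gamma

def recover_background(I, gamma, T):
    """Invert the blending model to recover the original pixel values."""
    return (np.asarray(I, dtype=float) - gamma * T) / (1.0 - gamma)

# Synthetic check: gamma = 0.6, T = 200, varied background values.
O = np.array([10.0, 80.0, 150.0])
I = (1 - 0.6) * O + 0.6 * 200
gamma, T = estimate_text_params(O, I)
print(round(gamma, 3), round(T, 1))  # 0.6 200.0
```

Once γ and T are estimated from a few such pixels, the same inversion recovers the background under the entire text region.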
This is an example of recovered image.
We also compared with a previous method called inpainting, which tries to fill the text region using the surrounding pixel pattern. While inpainting could not recover the original content, in this case a small hole, ours was able to recover it.
Thank you for your attention. The details will be presented at Asian Conference on Pattern Recognition next month.