This document summarizes a novel text detection algorithm based on character and link energies. The algorithm can detect text under various lighting conditions and against complex backgrounds. It analyzes candidate text objects by calculating a character energy from the similarity of parallel stroke edges and the fraction of non-noise pairs. A link energy is also computed to measure the probability that two connected candidate parts are both characters. A text unit energy is then calculated from the character and link energies to refine the detected text objects. Evaluation on ICDAR datasets shows that the algorithm achieves higher precision and recall than other text detection methods.
1. “A Novel Text Detection System Based on Character and Link Energies”
Presented by: Arun Patel
Roll No.: 15EC65R18
M.Tech 1st year VIPES, IIT Kharagpur
2. Algorithm
• This algorithm can detect most text objects under various conditions, including different lighting, different colors, complex backgrounds, and low-contrast text.
• The method is robust to the font, size, color, and orientation of text, and it discriminates text objects from other objects effectively.
Fig. 1 Algorithm
3. Initialization of Candidate Text Objects
Localize the candidate parts
Euler number
Let $v_i$ and $v_j$ be two candidate parts with widths $W_{v_i}$ and $W_{v_j}$, heights $H_{v_i}$ and $H_{v_j}$, and centroids $C_{v_i}$ and $C_{v_j}$. A candidate link connects the two parts when
$$\mathrm{dist}(C_{v_i}, C_{v_j}) \le w_d \cdot \min\big(\max(W_{v_i}, H_{v_i}),\ \max(W_{v_j}, H_{v_j})\big)$$
Finally, the candidate character parts that are reachable from one another via one or more links are grouped to form a candidate text.
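The linking rule and the grouping step on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary layout of a part, the function names, and the value `w_d = 2.0` are hypothetical, and a small union-find is used so that parts reachable through a chain of links end up in the same candidate text.

```python
import math

def is_linked(a, b, w_d=2.0):
    """Link test: centroid distance <= w_d * min(max(W, H) of each part).

    Each part is a dict with keys 'cx', 'cy', 'w', 'h' (assumed layout).
    w_d = 2.0 is an illustrative value, not taken from the slides.
    """
    dist = math.hypot(a["cx"] - b["cx"], a["cy"] - b["cy"])
    return dist <= w_d * min(max(a["w"], a["h"]), max(b["w"], b["h"]))

def group_parts(parts, w_d=2.0):
    """Group parts that are reachable from one another via one or more links."""
    parent = list(range(len(parts)))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            if is_linked(parts[i], parts[j], w_d):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(parts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Three 10x10 parts spaced 15 px apart, plus one far-away part.
parts = [
    {"cx": 0, "cy": 0, "w": 10, "h": 10},
    {"cx": 15, "cy": 0, "w": 10, "h": 10},
    {"cx": 30, "cy": 0, "w": 10, "h": 10},
    {"cx": 200, "cy": 0, "w": 10, "h": 10},
]
candidate_texts = group_parts(parts)
```

Parts 0 and 2 are not directly linked (their distance of 30 exceeds the limit of 20), but both link to part 1, so all three fall into one candidate text.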
5. Character Features
• One important characteristic that discriminates text objects from other objects is that characters are made up of strokes of approximately uniform thickness, which produces two nearly parallel edge sets in their boundaries.
• The two edge sets are highly similar in length, orientation, and curvature.
• The similarity of the two stroke edges is captured by the gradient vector at each point on the boundary.
Fig. 3 (a) Edge pairs of strokes (b) Gradient vectors of ‘R’
6. • A character has two nearly parallel edge sets, and the gradients at an edge point and at its corresponding point should have approximately opposite directions.
• The distances between the points and their corresponding points are similar, because the stroke width usually changes little.
Fig. 4 Corresponding pairs and links
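7. Average Angle Difference of Corresponding Pairs ($D_{\text{angle}}$)
• Let $N$ denote the number of edge points of a candidate part, and let $P^{(i)}$ ($1 \le i \le N$) be the $i$th edge point, with corresponding point $P^{(i)}_{\text{corr}}$. The difference between the gradient directions of the corresponding pair $(P^{(i)}, P^{(i)}_{\text{corr}})$ is defined as:
$$d_{\text{angle}}^{(i)} = \left| \theta_{p}^{(i)} - \theta_{p_{\text{corr}}}^{(i)} \right|$$
• $D_{\text{angle}}$ measures the average gradient direction difference over all corresponding pairs of a candidate part:
$$D_{\text{angle}} = \frac{1}{N \pi} \sum_{i=1}^{N} d_{\text{angle}}^{(i)}$$
• For an ideal character, every corresponding pair has exactly opposite gradient directions ($d_{\text{angle}}^{(i)} = \pi$), so $D_{\text{angle}}$ reaches its maximum value of 1.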
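A small sketch of the $D_{\text{angle}}$ computation may help. It assumes the gradient directions of each corresponding pair are already available in radians; the function names are hypothetical. The slide defines the difference as a plain absolute value, so wrapping the result into $[0, \pi]$ is an added assumption that keeps $D_{\text{angle}}$ within $[0, 1]$ regardless of how the angles are represented.

```python
import math

def angle_difference(theta_p, theta_corr):
    """Absolute gradient-direction difference of one corresponding pair.

    ASSUMPTION: the result is wrapped into [0, pi]; the slide only states
    abs(theta_p - theta_corr).
    """
    d = abs(theta_p - theta_corr) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def average_angle_difference(pairs):
    """D_angle = (1 / (N * pi)) * sum of d_angle over all N corresponding pairs."""
    if not pairs:
        raise ValueError("at least one corresponding pair is required")
    total = sum(angle_difference(a, b) for a, b in pairs)
    return total / (len(pairs) * math.pi)

# Two pairs with exactly opposite gradients: the ideal-character case.
ideal = average_angle_difference([(0.0, math.pi), (math.pi / 2, -math.pi / 2)])
# One opposite pair and one pair with identical directions.
mixed = average_angle_difference([(0.0, math.pi), (0.3, 0.3)])
```

Here `ideal` evaluates to 1 and `mixed` to 0.5, matching the statement that an ideal character reaches the maximum value.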
8. Fraction of Non-Noise Pairs ($F_{\text{non-noise}}$)
• In some cases, however, a character may have a smaller $d_{\text{angle}}^{(i)}$ due to noise or deformations. We compute $F_{\text{non-noise}}$ to measure the noise and deformation levels of a part based on $d_{\text{angle}}^{(i)}$:
$$F_{\text{non-noise}} = \frac{1}{N} \sum_{i=1}^{N} h\big(d_{\text{angle}}^{(i)}, \beta\big)$$
$$h\big(d_{\text{angle}}^{(i)}, \beta\big) = \begin{cases} 1, & \text{if } d_{\text{angle}}^{(i)} > \beta \\ 0, & \text{otherwise} \end{cases}$$
• $F_{\text{non-noise}}$ is the fraction of all pairs for which the angle difference $d_{\text{angle}}^{(i)}$ is greater than $\beta$.
Fig. 5 Noise connections and non-noise connections Ref. (1)
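The definition above reduces to counting the pairs whose angle difference exceeds $\beta$. The sketch below illustrates this; the function name is hypothetical, and the value of $\beta$ used in the example ($5\pi/6$) is an assumption chosen only for illustration, since the slides do not give one.

```python
import math

def fraction_non_noise(angle_diffs, beta):
    """F_non-noise: fraction of pairs whose angle difference is greater than beta."""
    if not angle_diffs:
        raise ValueError("at least one corresponding pair is required")
    non_noise = sum(1 for d in angle_diffs if d > beta)  # h(d, beta) = 1 if d > beta
    return non_noise / len(angle_diffs)

# Four pairs; beta = 5*pi/6 is an illustrative threshold, not from the slides.
diffs = [math.pi, 0.9 * math.pi, 0.5 * math.pi, 0.1 * math.pi]
f_non_noise = fraction_non_noise(diffs, beta=5 * math.pi / 6)
```

Two of the four differences exceed the threshold, so `f_non_noise` is 0.5.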
9.
Feature       R       A       C       E
Dangle        0.889   0.865   0.925   0.897
Fnon-noise    0.754   0.684   0.897   0.806
Fig. 6 Dangle and Fnon-noise Ref. (1)
10. • We divide the non-noise connections into two types: stroke-length connections and stroke-width connections.
• Doing so lets us separate circle-like objects and compute the feature vector of stroke width.
• Let $k^{(i)}$ ($1 \le i \le N$) be one of the $N$ non-noise connections of a part, and let $I_k^{(i)}$ be the number of its intersections with other non-noise connections. We define stroke-length and stroke-width connections as follows:
$$k^{(i)} \in \begin{cases} \text{stroke-length connection}, & \text{if } I_k^{(i)} / N > T_{IS} \\ \text{stroke-width connection}, & \text{otherwise} \end{cases}$$
• For a circle, every connection intersects all other connections at the center. Hence, all non-noise connections of a circle are stroke-length connections.
• Characters have many more stroke-width connections than non-characters.
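The classification rule can be illustrated with a short sketch. It assumes each non-noise connection is a straight segment given by its two endpoints; the function names, the proper-crossing intersection test, and the threshold value `t_is = 0.5` are all assumptions made for the example rather than details from the slides.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_intersect(s, t):
    """True if segments s and t properly cross (touching endpoints are ignored)."""
    a, b = s
    c, d = t
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def classify_connections(connections, t_is=0.5):
    """Label connection k as stroke-length if I_k / N > T_IS, else stroke-width."""
    n = len(connections)
    labels = []
    for i, s in enumerate(connections):
        hits = sum(1 for j, t in enumerate(connections)
                   if j != i and segments_intersect(s, t))
        labels.append("stroke-length" if hits / n > t_is else "stroke-width")
    return labels

# Circle-like part: four diameters that all cross at the center.
circle = [((-1, 0), (1, 0)), ((0, -1), (0, 1)),
          ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]
# Stroke-like part: three parallel connections across a bar.
bar = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((2, 0), (2, 1))]

circle_labels = classify_connections(circle)
bar_labels = classify_connections(bar)
```

Every diameter of the circle crosses the other three (ratio 3/4), so all are labeled stroke-length, while the parallel connections of the bar never cross and are labeled stroke-width.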
11. Fig 7. Percentages of stroke-width links of two example images. Ref.(1)
12. Vector of Stroke Width ($V_{\text{width}}$)
• The vector of stroke width $V_{\text{width}}$ is defined as $V_{\text{width}} = [w_d^{(1)}, w_d^{(2)}]$.
• Characters typically have one or two dominating stroke widths, depending on their fonts.
• We estimate each dominating stroke width $w_d^{(i)}$ through a weighted average of $w_p^{(i)}$ and its two immediately adjacent neighbors:
$$w_d^{(i)} = \frac{r_1 \big(w_p^{(i)} - 1\big) + w_p^{(i)} + r_2 \big(w_p^{(i)} + 1\big)}{r_1 + 1 + r_2}$$
Fig. 8 Histogram of the lengths of stroke-width connections Ref. (1)
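The weighted average can be sketched as below for a single dominating width. The slides do not define $w_p$, $r_1$, or $r_2$, so this example makes explicit assumptions: $w_p$ is taken to be the peak bin of the histogram of stroke-width connection lengths, and $r_1$, $r_2$ are taken to be the counts of the two neighboring bins divided by the peak count. The function name is hypothetical.

```python
from collections import Counter

def dominant_stroke_width(lengths):
    """Estimate one dominating stroke width from stroke-width connection lengths.

    ASSUMPTIONS (not stated on the slides): w_p is the histogram peak, and
    r1, r2 are the neighbor-bin counts relative to the peak count.
    """
    if not lengths:
        raise ValueError("at least one connection length is required")
    hist = Counter(round(x) for x in lengths)
    # Peak bin; ties are broken in favor of the smaller width.
    w_p, peak = max(hist.items(), key=lambda kv: (kv[1], -kv[0]))
    r1 = hist.get(w_p - 1, 0) / peak
    r2 = hist.get(w_p + 1, 0) / peak
    return (r1 * (w_p - 1) + w_p + r2 * (w_p + 1)) / (r1 + 1 + r2)

# Symmetric histogram around 5: the estimate stays at the peak.
symmetric = dominant_stroke_width([4, 4, 5, 5, 5, 5, 6, 6])
# More mass on the right neighbor pulls the estimate above the peak.
skewed = dominant_stroke_width([5, 5, 5, 5, 6, 6])
```

With these inputs `symmetric` is 5.0, and `skewed` is (5 + 0.5 × 6) / 1.5 ≈ 5.33. Filling both entries of $V_{\text{width}}$ would require a second peak, which this sketch does not attempt.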
13. Character Energy
• For a part $v_i$, we consider its $D_{\text{angle}}^{(i)}$ and $F_{\text{non-noise}}^{(i)}$ equally important for text detection and define the character energy $E_{\text{char}}^{(i)}$ of $v_i$ as follows:
$$E_{\text{char}}^{(i)} = \frac{D_{\text{angle}}^{(i)} + F_{\text{non-noise}}^{(i)}}{2}, \quad 0 \le E_{\text{char}}^{(i)} \le 1$$
• It can be treated as a measure of the probability that $v_i$ is a character.
• Characters have a larger $E_{\text{char}}$ than other objects, so $E_{\text{char}}$ can discriminate text objects from other objects, and it is robust to the font, size, color, and orientation of characters.
• $D_{\text{angle}}^{(i)}$ and $F_{\text{non-noise}}^{(i)}$ are correlated.
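As a quick worked example, the character energy is simply the mean of the two features. The function name below is hypothetical; the input values are those listed for character (b) in the Fig. 9 table.

```python
def character_energy(d_angle, f_non_noise):
    """E_char = (D_angle + F_non-noise) / 2, with both inputs in [0, 1]."""
    for name, value in (("d_angle", d_angle), ("f_non_noise", f_non_noise)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (d_angle + f_non_noise) / 2

# Character (b) from the Fig. 9 table: D_angle = 0.8847, F_non-noise = 0.5261.
e_char_b = character_energy(0.8847, 0.5261)
```

The result, (0.8847 + 0.5261) / 2 = 0.7054, agrees with the value reported for (b).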
14. (a) (b)
Fig. 9 Two characters with different noise/deformation levels Ref. (1)
Character   Dangle   Fnon-noise   Echar
(a)         0.8846   0.5950       0.5950
(b)         0.8847   0.5261       0.7054
16. Link Energy
• Link energy is computed for every candidate link to measure the probability that the two parts connected by the link are both characters.
• Link energy is computed by measuring two values:
1. Similarity in the properties of neighboring parts, such as color, stroke width, and size.
2. Spatial consistency in the direction and distance between neighboring parts in a string of parts.
• For two connected parts $v_i$ and $v_j$, we use color, stroke width ($V_{\text{width}}$), character width, and character height to capture the similarities between them.
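$$E_{\text{link}}^{(i,j)} = \sum_{k=1}^{4} w_k \, S_{i,j}^{(k)}, \quad w_k = 0.25$$
• With equal weights, $E_{\text{link}}^{(i,j)}$ is the mean of the four similarity scores, so it lies between 0 and 1.
• The higher $E_{\text{link}}^{(i,j)}$, the greater the similarity between the two parts.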
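A minimal sketch of the link energy as a weighted combination of the four similarity scores follows. It reads the formula as an equally weighted sum with $w_k = 0.25$, that is, the mean of the four scores; this reading is an assumption, chosen because it keeps the link energy in $[0, 1]$ and therefore comparable with the character energy. The function name and the example scores are hypothetical.

```python
def link_energy(similarities, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination of the four similarity scores of a link.

    similarities: (color, stroke width, character width, character height),
    each assumed to lie in [0, 1].
    """
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per similarity score")
    return sum(w * s for w, s in zip(weights, similarities))

# Illustrative scores only: similar color and height, less similar widths.
e_link = link_energy((0.9, 0.6, 0.5, 1.0))
```

For these scores the energy is 0.25 × (0.9 + 0.6 + 0.5 + 1.0) = 0.75.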
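17. Similarity Computation of Two Characters
Fig. 11 Link energy Ref. (1)
• Colour: $S_{i,j}^{(1)} = \frac{1}{3} \sum_{C \in \{R, G, B\}} \left(1 - \frac{|C_i - C_j|}{255}\right)$
• $V_{\text{width}}$: $S_{i,j}^{(2)} = \frac{1}{2} \sum_{k=1}^{2} \mathrm{Simi}\big(R_{i,j}^{v}(k)\big)$, where $R_{i,j}^{v}(k) = \frac{V_i(k)}{V_j(k)}$
• Character width: $S_{i,j}^{(3)} = \mathrm{Simi}\big(R_{i,j}^{W}\big)$, where $R_{i,j}^{W} = \frac{W_i}{W_j}$
• Character height: $S_{i,j}^{(4)} = \mathrm{Simi}\big(R_{i,j}^{H}\big)$, where $R_{i,j}^{H} = \frac{H_i}{H_j}$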
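The four similarity scores can be sketched as follows. The colour score follows the formula on this slide directly. The slides do not define the function Simi, so the version below, `min(R, 1/R)`, is purely an assumption: it maps a ratio of 1 to a similarity of 1 and treats `R` and `1/R` symmetrically. All function names are hypothetical.

```python
def simi(ratio):
    """ASSUMED form of Simi (not given on the slides): min(R, 1/R)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return min(ratio, 1.0 / ratio)

def color_similarity(rgb_i, rgb_j):
    """S^(1): mean over R, G, B of 1 - |C_i - C_j| / 255."""
    return sum(1 - abs(a - b) / 255 for a, b in zip(rgb_i, rgb_j)) / 3

def stroke_width_similarity(v_i, v_j):
    """S^(2): mean of Simi(V_i(k) / V_j(k)) over the two entries of V_width."""
    return sum(simi(a / b) for a, b in zip(v_i, v_j)) / 2

def size_similarity(x_i, x_j):
    """S^(3) or S^(4): Simi of the ratio of two widths or two heights."""
    return simi(x_i / x_j)

same_color = color_similarity((255, 0, 0), (255, 0, 0))
opposite_color = color_similarity((0, 0, 0), (255, 255, 255))
width_score = size_similarity(10, 20)
stroke_score = stroke_width_similarity((2, 4), (4, 4))
```

Under the assumed Simi, a part half as wide as its neighbor scores 0.5 regardless of which part is listed first.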
18. Text Unit Energy
• For the text unit containing two parts $v_i$ and $v_j$, the text unit energy $E_{\text{text}}^{(i,j)}$ is computed from the character energies $E_{\text{char}}^{(i)}$ and $E_{\text{char}}^{(j)}$ and the link energy $E_{\text{link}}^{(i,j)}$:
$$E_{\text{text}}^{(i,j)} = \frac{1}{2} \left[ \frac{E_{\text{char}}^{(i)} + E_{\text{char}}^{(j)}}{2} + E_{\text{link}}^{(i,j)} \right]$$
• To refine the detected text objects, text units whose text unit energies are smaller than a predefined threshold $T_{\text{text}}$ are removed from the text objects.
• The choice of this threshold depends on the characteristics of the dataset; a threshold of 0.7 worked well for several datasets used to test this algorithm.
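The refinement step can be sketched as below. The representation of a text unit as a tuple of three energies and the function names are assumptions made for the example; the default threshold of 0.7 is the value quoted on this slide.

```python
def text_unit_energy(e_char_i, e_char_j, e_link):
    """E_text = 1/2 * [ (E_char_i + E_char_j) / 2 + E_link ]."""
    return 0.5 * ((e_char_i + e_char_j) / 2 + e_link)

def refine_text_units(units, t_text=0.7):
    """Drop text units whose energy is smaller than the threshold T_text.

    units: list of (e_char_i, e_char_j, e_link) tuples (assumed layout).
    """
    return [u for u in units if text_unit_energy(*u) >= t_text]

units = [
    (0.8, 0.6, 0.9),   # energy 0.8: kept
    (0.4, 0.5, 0.6),   # energy 0.525: removed
]
kept = refine_text_units(units)
```

Only the first unit survives: its energy is 0.5 × (0.7 + 0.9) = 0.8, while the second reaches 0.5 × (0.45 + 0.6) = 0.525, below the threshold.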
23. References:
• [1] Jing Zhang and Rangachar Kasturi, “A novel text detection system based on character and link energies,” IEEE Trans. Image Process., vol. 23, no. 9, pp. 4187-4198, September 2014.
• [2] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, “ICDAR 2003 robust reading competitions,” in Proc. 7th Int. Conf. Document Anal. Recognit., vol. 2, pp. 682, 2003.
• [3] D. Marr and E. Hildreth, “Theory of edge detection,” Proc. Roy. Soc. London B, vol. 207, no. 1167, pp. 187-217, 1980.