SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 1
Scanned Document Compression Using a
Block-based Hybrid Video Codec
Alexandre Zaghetto, Member, IEEE, and Ricardo L. de Queiroz, Senior Member, IEEE
Abstract—This paper proposes a hybrid pattern
matching/transform-based compression method for scanned
documents. The idea is to use regular video interframe
prediction as a pattern matching algorithm that can be applied
to document coding. We show that this interpretation may
generate residual data that can be efficiently compressed by
a transform-based encoder. The efficiency of this approach
is demonstrated using H.264/AVC as a high quality single-
and multi-page document compressor. The proposed method,
called Advanced Document Coding (ADC), uses segments of
the originally independent scanned pages of a document to
create a video sequence, which is then encoded through regular
H.264/AVC. The encoding performance is unrivaled. Results
show that ADC outperforms AVC-I (H.264/AVC operating in
pure intra mode) and JPEG2000 by up to 2.7 dB and 6.2 dB,
respectively. Superior subjective quality is also achieved.
Index Terms—Scanned document compression, advanced doc-
ument coding, pattern matching, H.264/AVC.
I. INTRODUCTION
COMPRESSION of scanned documents can be tricky. The
scanned document is either compressed as a continuous-
tone picture, or it is binarized before compression. The binary
document can then be compressed using any available two-
level lossless compression algorithm (such as JBIG [1] and
JBIG2 [2]), or it may undergo character recognition [3].
Binarization may cause strong degradation to object contours
and textures, such that, whenever possible, continuous-tone
compression is preferred [4]. In single/multi-page document
compression, each page may be individually encoded by
some continuous-tone image compression algorithm, such as
JPEG [5] or JPEG2000 [6], [7]. Multi-layer approaches such
as the mixed raster content (MRC) imaging model [8]–[12]
are also challenged by soft edges in scanned documents, often
requiring pre- and post-processing [13].
Natural text in a document typically presents repetitive
symbols, such that dictionary-based compression methods become
very efficient. For continuous-tone imagery, the recurrence
of similar patterns is illustrated in Fig. 1. Nevertheless,
an efficient dictionary-based encoder relying on continuous-tone
pattern matching is not trivial. We propose an encoder
that exploits such recurrence through the use of pattern-matching
predictors and efficient transform encoding of the
residual data.
Copyright (c) 2013 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
The authors are with the Department of Computer Science,
Universidade de Brasilia, Brazil, e-mail: alexandre@cic.unb.br,
queiroz@ieee.org.
Fig. 1. Digitized books usually present recurrent patterns across different
pages and across regions of the same page.
It is important to place our proposal within the proper
scenario. Three premises are assumed. Firstly, we want to
avoid complex multi-coder schemes such as MRC. Secondly,
the decoder should be as standard as possible. Since we are
dealing with scanned compound documents (mixed pictures
and text), natural image encoders, such as JPEG2000, are the
most adequate. Non-standard encoders, based on fractals [14]–
[16], texture prediction [17], [18], template matching [19]
or multiscale pattern recurrence [20], [21], are good options
out of the scope of what is being proposed. Thirdly, one
should provide high quality reconstructed versions of scanned
documents. This is especially important if rare books of
historical value must be digitally stored, thus discarding optical
character recognition (OCR) and token-based methods [2],
[10]. In summary, we want a standard single coder approach
that operates on natural images and delivers high-quality
reconstructed compound documents.
The proposed coder makes heavy use of the H.264/AVC
standard video coder [22]. H.264/AVC has been well explained
in the literature [23]–[27]. H.264/AVC leads to substantial
performance improvement when compared to other existing
standards [25], [28], such as MPEG-2 [29] and H.263 [30].
Among such improvements we can mention [22], [31]: in-
terframe variable block size prediction; arbitrary reference
frames; quarter-pel motion estimation; intraframe macroblock
prediction; context-adaptive binary arithmetic coding; and in-
loop deblocking filter. Results point to at least a factor of
two improvement over previous standards. The many cod-
ing advances brought into H.264/AVC not only set a new
benchmark for video compression, but they also make it a
This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.
The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641
Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
formidable compressor for still images [32]. The intraframe
macroblock prediction, combined with the context-adaptive
binary arithmetic coding (CABAC) [33] turns the H.264/AVC
into a powerful still image compression method (i.e. working
on a video sequence composed of one single frame). We refer
to this coder as AVC-I. Gains of the AVC-I over JPEG2000
are typically in the order of 0.25 dB to 0.5 dB in PSNR
for pictorial images [31], [32], [34]. For compound images
(mixture of text and picture) [4], the PSNR gains are more
substantial, even surpassing the mark of 3 dB improvement
over JPEG2000, in some cases [31].
The hypothesis presented in this paper is that a scanned
document encoder that employs state-of-the-art video coding
techniques and generates an H.264/AVC-decodable bit-stream
yields the best rate-distortion performance compared to other
continuous-tone still image compressors.¹
II. THE PROPOSED METHOD AND ITS IMPLEMENTATION
USING AVC
The proposed document coder has a generic concept and an
implementation based on a stock H.264/AVC video coder. We
now describe the desired features and how one can implement
them using AVC. The generic description may help the reader
to adapt other video coders for that purpose or to develop
non-standard-based (proprietary) variations.
A. Block-based pattern matching
The encoder is based on pattern matching. The document
image is segmented into blocks of pixels. Each block is
matched to an existing pattern in a dictionary which is popu-
lated by the previous contents of the same document. In order
to do that, we partition the scanned document, which may be
made of one or more scanned pages of H × W pixels, so that each
page yields Np sub-pages, or frames, of (H/√Np) × (W/√Np) pixels. Hence, a
scanned book may be decomposed into many frames. Figure 2
illustrates the page pre-processing (partition) algorithm, while
Fig. 3 shows an example of a frame sequence built from a
3-page set.
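The partition step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes Np is a perfect square (as with the values 4 and 16 used later), that the page dimensions divide evenly, and that frames are ordered in raster order within each page. The function names `split_page`, `pages_to_frames` and `assemble_page` are hypothetical.

```python
import math
import numpy as np


def split_page(page, n_p):
    """Split one H x W page into n_p equal frames, in raster order."""
    k = math.isqrt(n_p)
    if k * k != n_p:
        raise ValueError("this sketch assumes n_p is a perfect square")
    h, w = page.shape
    if h % k or w % k:
        raise ValueError("page size must be divisible by sqrt(n_p)")
    fh, fw = h // k, w // k
    return [page[r * fh:(r + 1) * fh, c * fw:(c + 1) * fw]
            for r in range(k) for c in range(k)]


def pages_to_frames(pages, n_p):
    """Concatenate the frames of all pages into one frame sequence."""
    return [frame for page in pages for frame in split_page(page, n_p)]


def assemble_page(frames, n_p):
    """Inverse of split_page: tile n_p frames back into one page."""
    k = math.isqrt(n_p)
    return np.vstack([np.hstack(frames[r * k:(r + 1) * k]) for r in range(k)])


# Three synthetic 64 x 48 "pages" stand in for scanned images.
pages = [np.full((64, 48), i, dtype=np.uint8) for i in range(3)]
frames = pages_to_frames(pages, n_p=4)
print(len(frames), frames[0].shape)  # 12 frames of 32 x 24 pixels
```

The inverse function is what a decoder-side tool would need in order to rebuild the pages from the decoded frames.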
Blocks have, for example, 16×16 pixels, and each one is
matched to an existing pattern in a previous frame. In this way,
the previous frames make up a dynamic dictionary of patterns to
search when encoding the present frame, and this dictionary is
continuously updated as more frames are encoded. Once a match is
found, the matching pattern is used to predict the block and
the prediction error (residue) is encoded along with the frame
number and position where the match was found (reference
vector). A block can be partitioned into smaller blocks to ease
prediction at the cost of spending more bits to encode reference
vectors. Figure 4 illustrates the effect of using the pattern
matching prediction algorithm. Figures 4 (a) and (b) show
examples of a reference and a current text area, respectively.
Figures 4 (c), (e) and (g) represent the predictions of the
current text using 16×16, 8×8 and 4×4-pixel block partitions.
Figures 4 (d), (f) and (h) are the corresponding residual data.
¹ Preliminary results of the proposed method over multi-page text-only
documents have been presented at a conference [35].
[Figure 2 image: a page divided into a grid of segments labeled 0 to 11 in raster order, three per row.]
Fig. 2. A document page is partitioned into segments (labeled in sequence).
Each one is considered a frame and can be sequentially encoded.
Fig. 3. Example of a frame sequence built from a 3-page set, with Np = 4
frames/page. Frames 1 to 4, 5 to 8, and 9 to 12 are built from pages 1, 2 and
3, respectively.
Notice that the 4×4-pixel prediction generates a lower-energy
residual than the 16 × 16 and 8 × 8 predictions.
However, it requires encoding more reference vectors.
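As an illustration of the matching step, the sketch below runs an exhaustive search over one or more reference frames and returns the reference vector and the residual. It is not the search used inside any particular H.264/AVC encoder: it works at integer-pixel positions only, uses the sum of absolute differences (SAD) as the matching cost, and the names `best_match` and `predict` are hypothetical.

```python
import numpy as np


def best_match(block, references, top, left, search_range):
    """Full search: return (sad, ref_index, dy, dx) of the lowest-SAD candidate."""
    bh, bw = block.shape
    blk = block.astype(np.int64)
    best = None
    for ref_index, ref in enumerate(references):
        h, w = ref.shape
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + bh > h or x + bw > w:
                    continue  # candidate falls outside the reference frame
                cand = ref[y:y + bh, x:x + bw].astype(np.int64)
                sad = int(np.abs(blk - cand).sum())
                if best is None or sad < best[0]:
                    best = (sad, ref_index, dy, dx)
    return best


def predict(block, references, top, left, search_range):
    """Return the reference vector and the residual (block minus prediction)."""
    bh, bw = block.shape
    _, ref_index, dy, dx = best_match(block, references, top, left, search_range)
    y, x = top + dy, left + dx
    prediction = references[ref_index][y:y + bh, x:x + bw].astype(np.int64)
    return (ref_index, dy, dx), block.astype(np.int64) - prediction


# Synthetic data: the current block is a pattern that already occurs in the
# reference frame, displaced by (dy, dx) = (2, -3) from the block position.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(32, 32))
current_block = reference[10:18, 5:13]
vector, residual = predict(current_block, [reference], top=8, left=8, search_range=4)
print(vector, int(np.abs(residual).sum()))
```

Calling `predict` on sub-blocks (8×8 or 4×4) instead of the whole block gives the trade-off described above: smaller residuals, but one reference vector per sub-block.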
In this context, video coders often use motion estimation
techniques which are essentially the same as pattern matching.
The H.264/AVC is capable of partitioning macroblocks of 16×
16 pixels into any valid combination of blocks of 16 × 8,
8 × 16, 8 × 8, 4 × 8, 8 × 4, and 4 × 4 pixels. Our algorithm is
then to feed the document frames as video frames into AVC
since its motion estimation algorithm will take care of the
pattern matching search for us. However, motion estimation
algorithms typically take advantage of the fact that video content
at the same frame position in neighboring frames is usually
highly correlated. Since this is not our case, it is advisable to
make the search window cover as much as possible of the
reference frames, or the whole frame, in order to enrich the
dictionary and to remove the spatial dependency.
Fig. 4. Illustration of approximate pattern matching using interframe
prediction: (a) reference text; (b) current text; (c), (e) and (g) predicted text
(block size: 16×16, 8×8 and 4×4 pixels, respectively); and (d), (f) and (h)
prediction residue (block size: 16 × 16, 8 × 8 and 4 × 4 pixels, respectively).
Each zoomed image patch has 178× 178 pixels.
B. Inter- and intra-frame prediction
AVC also allows for intra-frame prediction, in which a block
(partitioned or not) can be predicted from neighboring blocks
by means of directional extrapolation of the border pixels.
The decision to use or not intra-frame prediction is typically
based on rate-distortion optimization (RDO) and we use RDO
in all our simulations. AVC does not allow for
in-frame motion vectors (IFMV), although many variations using
such a feature and other sophisticated methods of intra-frame
prediction do exist [19]. Apparently, HEVC will also support
IFMV [36]. Breaking up the pages into frames allows for some
intra-document prediction similar to IFMV, yet using a stock
video coder. Furthermore, the information derived from IFMV
is typically very small compared to all compressed data, such
that the advantage should not be very significant. Because of
that, we do not use IFMV.
Another issue is the random access to different book pages.
In order to get to a book page, we are forced to decompress
all the frames it uses as reference. So, if random access is an
issue, we suggest periodically using no-reference frames, i.e.
frames in which inter-frame prediction is not allowed, relying
on pure intra-frame prediction/extrapolation.
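The cost of random access can be estimated with a small helper. The sketch below is a worst-case bound under stated assumptions, not part of the paper: frames and pages are numbered from 0, no-reference frames occur at frame indices 0, P, 2P, ... for a hypothetical period P (`intra_period`), and no frame is predicted from a frame that precedes the most recent no-reference frame.

```python
def frames_for_page(page, n_p, intra_period):
    """Worst-case list of frame indices to decode in order to rebuild one page.

    page         -- 0-based page index
    n_p          -- frames per page
    intra_period -- distance between consecutive no-reference (I) frames
    """
    first = page * n_p                 # first frame belonging to the page
    last = first + n_p - 1             # last frame belonging to the page
    start = (first // intra_period) * intra_period  # most recent I-frame
    return list(range(start, last + 1))


# With 4 frames/page and an I-frame every 8 frames, page 2 starts on an
# I-frame, so only its own 4 frames are needed.
print(frames_for_page(page=2, n_p=4, intra_period=8))
```

Choosing the period as a multiple of Np aligns no-reference frames with page boundaries, at the cost of shrinking the dictionary available to the frames that follow each of them.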
In our encoder, using the AVC structure, each block-
partitioning combination and prediction mode is tested and the
best one is picked through RDO. With RDO within AVC, in
the k-th configuration test in a macroblock, AVC computes the
rate Rk (bits spent to encode the block) and distortion Dk (sum
of absolute differences - SAD) achieved by reconstructing the
block. One picks the block partition method that minimizes
Jk = Rk + λDk.
The process is then repeated for every macroblock. As usual,
λ controls compression ratios and is varied to find the RD
curves in our simulations.
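The selection rule can be written in a few lines. The sketch below follows the cost exactly as stated above, Jk = Rk + λDk; the candidate names and their rate and distortion values are made-up numbers for illustration, not measurements.

```python
def choose_mode(candidates, lam):
    """Pick the candidate minimizing J = R + lam * D.

    candidates -- iterable of (name, rate_bits, distortion_sad) tuples
    lam        -- Lagrangian weight applied to the distortion term
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]


# Hypothetical (rate, SAD) pairs for one macroblock.
candidates = [
    ("inter16x16", 40, 900),
    ("inter8x8", 95, 400),
    ("inter4x4", 210, 150),
    ("intra16x16", 60, 1200),
]

for lam in (0.01, 0.2, 1.0):
    print(lam, choose_mode(candidates, lam))
```

With the cost written this way, a small λ favors cheap modes with few reference vectors, while a large λ favors finer partitions that reduce the residual.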
C. Residual coding
The residual macroblock, i.e. the prediction error, is transformed
using four 8×8-pixel discrete cosine transforms (DCT) or
an integer approximation of them. The transformed blocks are
quantized and encoded using arithmetic coding. H.264/AVC
uses an integer transform with similar properties as the DCT
and the resulting transformed coefficients are quantized and
entropy encoded using CABAC.
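To make the transform step concrete, the sketch below applies a floating-point orthonormal 8×8 DCT to each quadrant of a 16×16 residual macroblock and quantizes the coefficients with a uniform step. This is only a stand-in: H.264/AVC uses an integer transform and its own quantizer, and entropy coding is omitted. The names `dct_matrix`, `transform_macroblock`, `inverse_macroblock` and the parameter `qstep` are hypothetical.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def transform_macroblock(residual, qstep):
    """Transform and quantize the four 8x8 quadrants of a 16x16 residual."""
    c = dct_matrix(8)
    levels = np.empty((16, 16), dtype=np.int64)
    for r in (0, 8):
        for col in (0, 8):
            blk = residual[r:r + 8, col:col + 8].astype(np.float64)
            coeffs = c @ blk @ c.T
            levels[r:r + 8, col:col + 8] = np.rint(coeffs / qstep).astype(np.int64)
    return levels


def inverse_macroblock(levels, qstep):
    """Dequantize and inverse-transform the four 8x8 quadrants."""
    c = dct_matrix(8)
    out = np.empty((16, 16), dtype=np.float64)
    for r in (0, 8):
        for col in (0, 8):
            coeffs = levels[r:r + 8, col:col + 8] * float(qstep)
            out[r:r + 8, col:col + 8] = c.T @ coeffs @ c
    return out


# A flat residual needs a single nonzero (DC) coefficient per 8x8 block.
flat = np.full((16, 16), 4)
levels = transform_macroblock(flat, qstep=1.0)
print(np.count_nonzero(levels), levels[0, 0])
```

A well-predicted block gives a low-energy residual, which is what leaves few nonzero coefficients for the entropy coder.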
D. Compound documents and region classification
Compound document compression usually segments the
image into regions and classifies each one as containing text
and graphics or images (or halftones, for instance). Once
a region is classified, it can be encoded using a proper
algorithm. This approach is driven by objects such as text
characters so that regions of the image are labeled based on
our estimate of its contents. Our method, however, is driven
by the compression itself. Rather than only testing pattern-
matching-based prediction for every block partition, we also
test prediction by extrapolating neighboring blocks, as in
"intra-prediction" in H.264/AVC. The RD-optimized selection
of the best prediction ensures that the best option is picked.
Text and graphics shall contain recurrent patterns and will
be often encoded using patterns from previous regions, while
pictorial regions may resort to intra-prediction. In this sense,
segmentation is embedded into the encoding process. In fact,
the block prediction and RDO may have the same effect of
Fig. 5. Configuration parameters that have greater influence on the encoder
performance: Rf (number of reference frames) and Sr (search range).
a segmentation map, even though benefiting the compression
process, and not the true identification of image contents.
E. Encoder and decoder summary
In our concept, the frames are fed into AVC in sequence
just like in a regular video coder. Because of that relation to
AVC, we refer to the proposed method as advanced document
coding, or, simply, ADC. In a nutshell, ADC operation can
be summarized as (i) break the book pages into frames; (ii)
feed all frames to H.264/AVC resulting in an AVC-compatible
stream; (iii) decode the bit stream; and (iv) assemble the
decoded frames into the final document book pages.
In order to work well, H.264/AVC should operate in the "High"
profile, following an IPPP... framework. The encoder should
periodically use no-reference I-frames in the case random
access is desired. RDO should be turned on. Motion estimation
should be set to full search over a window that is as large
as possible. Note that other video coders such as HEVC
and MPEG-2 will also work, even though achieving different
performance levels due to their different sophistication levels.
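One possible way to express these settings is as a command line for the x264 encoder, built here as a Python argument list. This is a sketch under assumptions, not the configuration used in the experiments: the option names reflect the x264 command-line tool as commonly documented and should be checked against `x264 --fullhelp` for the installed version, the input is assumed to be a raw YUV file holding the frame sequence, and the file names and QP value are placeholders. Mode decision (RDO) is left to the encoder defaults.

```python
def x264_command(yuv_path, out_path, width, height, qp,
                 ref_frames=5, search_range=32, intra_period=None):
    """Build an x264 argument list for an IPPP... encode of document frames.

    intra_period=None requests a single I-frame at the start; an integer
    requests a no-reference frame every intra_period frames (random access).
    """
    keyint = "infinite" if intra_period is None else str(intra_period)
    return [
        "x264",
        "--profile", "high",                  # "High" profile
        "--input-res", f"{width}x{height}",   # needed for raw YUV input
        "--ref", str(ref_frames),             # Rf, number of reference frames
        "--bframes", "0",                     # P-frames only (IPPP...)
        "--me", "esa",                        # exhaustive (full) search
        "--merange", str(search_range),       # Sr, search range in pixels
        "--scenecut", "0",                    # no automatic extra I-frames
        "--keyint", keyint,                   # spacing of I-frames
        "--qp", str(qp),                      # constant quantizer
        "-o", out_path,
        yuv_path,
    ]


# Hypothetical file names; 784 x 512 is one quarter of a 1568 x 1024 page.
print(" ".join(x264_command("frames.yuv", "book.264", 784, 512, qp=28)))
```

The list can be passed to `subprocess.run` once the frame sequence has been written to disk. Sweeping the QP value is one way to trace a rate-distortion curve.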
III. EXPERIMENTAL RESULTS
In our tests, different page sets are compressed using
JPEG2000, AVC-I (H.264/AVC operating in pure intra mode)
and the proposed ADC. The reason we chose JPEG2000 and
AVC-I for comparison is that these are the most suitable
standards that would meet the three premises presented in
Section I.
Distortion metrics based on visual models such as Structural
Similarity (SSIM) [37] and Video Quality Metric (VQM) [38]
have been extensively tested for pictorial content. However,
they are unproven for text and graphics, which rely more on
resolution than on number of gray levels. Readability is very
important and some alternative metrics such as OCR efficiency
are considered. A good objective metric to reflect subjective
perception of text has not been well explored yet. Hence, we
opted to stick to the traditional PSNR as a distortion metric.
In JPEG2000 and AVC-I compression, the pages are sepa-
rately encoded. As for ADC, the first frame of the sequence
is encoded as an I-frame (only intraframe prediction modes
are used) and all the remaining frames are encoded as P-
frames (in addition to intraframe prediction, only past frames
[Figure 6 plots: PSNR (dB) versus bitrate (bpp), 0.2 to 0.8 bpp, for sequence "guita" with Np = 4. Panel (a): Sr = 32 and Rf = 1, 3, 5. Panel (b): Sr = 8, 16, 32 and Rf = 5. Curves in both panels: ADC, AVC-I and JPEG2000.]
Fig. 6. Comparison between JPEG2000, AVC-I and the proposed ADC, for
different combinations of search ranges (Sr) and number of reference frames
(Rf ) for test sequence “guita”. The number of frames/page (Np) is 4.
are used as reference by the interframe prediction). We also
considered that each page may be segmented into Np = 4
frames, Np = 16 frames, or not segmented at all (Np = 1,
for multi-page documents only). Two configuration parameters
have greater influence on the encoder performance. One is
the number of reference frames, Rf , the other is the search
range, Sr, as illustrated in Fig. 5. Initially, we evaluated the
effect of choosing different values for Sr and Rf . Figure 6
shows PSNR plots comparing JPEG2000, AVC-I and ADC
(Np = 4 frames/page), for different combinations of Sr and
Rf . The PSNR was calculated using the global mean square
error (MSE). The higher the Sr and Rf values, the better the
rate-distortion performance. In particular, for Sr = 32 pixels
and Rf = 5 frames, ADC outperforms AVC-I by more than 2
dB and JPEG2000 by more than 5 dB, at 0.5 bit/pixel (bpp).
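A PSNR based on the global MSE pools the squared error over every pixel of every frame before taking the logarithm, rather than averaging per-frame PSNR values. The sketch below shows that computation under the assumption of 8-bit samples (peak value 255); the function name `global_psnr` is hypothetical.

```python
import numpy as np


def global_psnr(originals, decoded, peak=255.0):
    """PSNR in dB computed from the MSE pooled over all frames."""
    squared_error = 0.0
    pixel_count = 0
    for orig, dec in zip(originals, decoded):
        diff = orig.astype(np.float64) - dec.astype(np.float64)
        squared_error += float((diff ** 2).sum())
        pixel_count += diff.size
    mse = squared_error / pixel_count
    if mse == 0:
        return float("inf")  # identical sequences
    return float(10.0 * np.log10(peak ** 2 / mse))


# Two tiny frames: the first is reconstructed exactly, the second is off by 2
# at every pixel, so the pooled MSE is (0 + 4) / 2 = 2.
orig = [np.zeros((4, 4)), np.zeros((4, 4))]
dec = [np.zeros((4, 4)), np.full((4, 4), 2.0)]
print(round(global_psnr(orig, dec), 2))
```

Averaging per-frame PSNR would be undefined in this example, because the first frame has zero error. Pooling the error avoids that and weights every pixel equally.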
TABLE I
AVERAGE OBJECTIVE (PSNR IN DB) IMPROVEMENT OVER EXISTING
STANDARDS FOR THE 4 DOCUMENT TEST SETS.
Document Set    0       1       2       3
JPEG2000        6.26    5.61    4.47    2.41
AVC-I           2.75    2.58    1.56    0.89
Our test set is composed of 18 documents divided into the
following 4 classes:²
• Class 0: 6 multi-page text-only documents;
• Class 1: 6 single-page text-only documents;
• Class 2: 3 multi-page compound documents; and
• Class 3: 3 single-page compound documents.
Figure 7 illustrates one example of each class. Examples
of PSNR plots are shown in Fig. 8. Figure 9 shows the average
PSNR improvement of ADC over JPEG2000 and AVC-I for
each of the document class test sets. In all cases, JPEG2000
and AVC-I are objectively outperformed, considering Sr = 32,
Rf = 5 and Np = 16. Figure 10 (a) shows a zoomed
part of the original “cerrado” sequence. Its encoded and
reconstructed versions using AVC-I, JPEG2000 and ADC,
at approximately 0.25 bits/pixel, are shown in Figs. 10 (b),
(c) and (d), respectively. ADC also yields superior subjective
quality. As a reference, in Table I we present the gains
of the proposed method over AVC-I and JPEG2000 using
Bjontegaard's method [39] applied to the curves in Fig. 9. As
the table and the plots show, the gains are substantial for
every class, ranging from 0.89 dB to 6.26 dB.
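For readers unfamiliar with Bjontegaard's method, the sketch below shows its usual formulation: each rate-distortion curve is fitted with a third-order polynomial in the logarithm of the bitrate, and the average vertical gap between the two fits is taken over the overlapping rate interval. The rate and PSNR values are invented for illustration and the function name `bd_psnr` is hypothetical. This is not the script used to produce Table I.

```python
import numpy as np


def bd_psnr(rate_a, psnr_a, rate_b, psnr_b):
    """Average PSNR gain (dB) of curve B over curve A on their common rates."""
    log_a = np.log10(np.asarray(rate_a, dtype=np.float64))
    log_b = np.log10(np.asarray(rate_b, dtype=np.float64))
    # Third-order polynomial fit of PSNR as a function of log-rate.
    poly_a = np.polyfit(log_a, np.asarray(psnr_a, dtype=np.float64), 3)
    poly_b = np.polyfit(log_b, np.asarray(psnr_b, dtype=np.float64), 3)
    # Integrate both fits over the overlapping log-rate interval.
    lo = max(log_a.min(), log_b.min())
    hi = min(log_a.max(), log_b.max())
    int_a = np.polyint(poly_a)
    int_b = np.polyint(poly_b)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_b = np.polyval(int_b, hi) - np.polyval(int_b, lo)
    return float((area_b - area_a) / (hi - lo))


# Invented points: curve B sits exactly 2 dB above curve A at every rate.
rates = [0.2, 0.4, 0.6, 0.8]
psnr_a = [30.0, 34.0, 36.5, 38.0]
psnr_b = [p + 2.0 for p in psnr_a]
print(round(bd_psnr(rates, psnr_a, rates, psnr_b), 3))
```

Four rate points per curve are the customary minimum, since the cubic then interpolates the measured points exactly.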
Our software-based tests using the popular and efficient
x264 implementation of the H.264/AVC standard and the
Kakadu implementation of JPEG 2000 indicate that ADC
(AVC running with 5 reference frames, slowest search, 32×32
window, in IPPPP... mode) is about 10× slower than JPEG
2000. x264 on an Intel Core i7 platform has been shown to
encode RGB video at a rate of 3 to 30 million pixels per
second (Mps). The variation is due to the various x264 settings
that affect speed and quality. Of course, in order to do that,
the system must be dedicated to the task, as an appliance. A
scanned letter-sized (8.5× 11 in) page at 600 pixels per inch
(ppi) yields about 33 million pixels. Hence, we can expect a
page compression speed roughly in the order of 5 to 50 pages
per minute (ppm). This page rate may be acceptable for many
on-the-fly applications and is definitely reasonable for off-line
compression of books and such. A rigorous complexity study
of the encoding algorithms presented here is beyond the scope
of this paper.
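The page-rate estimate above follows from simple arithmetic, reproduced here as a short check. The only inputs are the figures quoted in the text (letter-sized page, 600 ppi, 3 to 30 million pixels per second); the function names are hypothetical.

```python
def page_pixels(width_in, height_in, ppi):
    """Number of pixels in a page scanned at the given resolution."""
    return (width_in * ppi) * (height_in * ppi)


def pages_per_minute(width_in, height_in, ppi, pixels_per_second):
    """Pages encoded per minute at a given encoder throughput."""
    return 60.0 * pixels_per_second / page_pixels(width_in, height_in, ppi)


pixels = page_pixels(8.5, 11, 600)                 # 5100 x 6600 pixels
slow = pages_per_minute(8.5, 11, 600, 3e6)         # slowest quoted throughput
fast = pages_per_minute(8.5, 11, 600, 30e6)        # fastest quoted throughput
print(f"{pixels / 1e6:.2f} Mpixels, {slow:.1f} to {fast:.1f} ppm")
```

The result, a little over 5 to a little over 50 pages per minute, matches the rounded range given in the text.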
IV. CONCLUSIONS
In this paper, we presented a pattern matching/transform-
based encoder for scanned documents named ADC. The reason
why we decided to use H.264/AVC tools to implement the
proposed method is because its interframe prediction scheme
allied with RDO yield an efficient pattern matching algorithm.
² The entire test set is available at http://image.unb.br/queiroz/testset
In addition, the intraframe prediction, the DCT-based trans-
form and the CABAC contribute to improve the encoding
efficiency.
In essence, our work can be summarized as splitting the
document pages into segments, forming frames, and feeding the
frames to AVC. Despite the simplicity of the idea, the performance
for scanned documents is unrivaled, to our knowledge.
Results show that ADC objectively outperforms AVC-I and
JPEG2000 by up to 2.7 dB and 6.2 dB, respectively, with more
significant gains observed for multi-page text-only documents.
Furthermore, the encoder outputs documents with superior
subjective quality. Replacing H.264/AVC by HEVC in ADC
is expected to yield even larger gains.
REFERENCES
[1] JBIG, “Information Technology - Coded Representation of Picture and
Audio Information - Progressive Bi-level Image Compression. ITU-T
Recommendation T.82,” Mar. 1993.
[2] JBIG2, “Information Technology - Coded Representation of Picture and
Audio Information - Lossy/Lossless Coding of Bi-level Images. ITU-T
Recommendation T.88,” Mar. 2000.
[3] S. Mori, C. Y. Suen, and K. Yamamoto, “Historical review of OCR
research and development,” Proceedings of the IEEE, vol. 80, no. 7, pp.
1029–1058, Jul. 1992.
[4] R. L. de Queiroz, Compressing Compound Documents, in the Document
and Image Compression Handbook, by M. Barni, Marcel-Dekker, USA,
2005.
[5] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compres-
sion Standard, Chapman and Hall, 1993.
[6] JPEG, “Information Technology - JPEG2000 Image Coding System -
Part 1: Core Coding System. ISO/IEC 15444-1,” 2000.
[7] D. S. Taubman and M. W. Marcellin, JPEG 2000: Image Compression
Fundamentals, Standards and Practice, Kluwer Academic, USA, 2002.
[8] MRC, “Mixed Raster Content (MRC). ITU-T Recommendation T.44,”
1999.
[9] R. L. de Queiroz, R. Buckley, and M. Xu, “Mixed Raster Content
(MRC) model for compound image compression,” Proc. of SPIE Visual
Communications and Image Processing, vol. 3653, pp. 1106–1117, Jan.
1999.
[10] P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. Lecun, “High
quality document image compression with DjVu,” Journal of Electronic
Imaging, vol. 7, pp. 410–425, 1998.
[11] G. Feng and C. A. Bouman, “High-quality MRC document coding,”
IEEE Trans. on Image Processing, vol. 15, no. 10, pp. 3152–3169, Oct.
2006.
[12] A. Zaghetto, R. L de Queiroz, and D. Mukherjee, “MRC compression
of compound documents using threshold segmentation, iterative data-
filling and H.264/AVC-INTRA,” Proc. Indian Conference on Computer
Vision, Graphics and Image Processing, Dec. 2008.
[13] A. Zaghetto and R. L de Queiroz, “Improved layer processing for MRC
compression of scanned documents,” Proc. of IEEE Intl. Conference on
Image Processing, pp. 1993 – 1996, Nov. 2009.
[14] E. Walach and E. Karnin, “A fractal-based approach to image com-
pression,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal
Processing, vol. 11, pp. 529 – 532, Apr. 1986.
[15] S.M. Kocsis, “Fractal-based image compression,” in Proc. 23rd.
Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 177
–181, 1989.
[16] A. Wakatani, “Improvement of adaptive fractal image coding on GPUs,”
in Proc. IEEE Intl. Conf. on Consumer Electronics, pp. 255 –256, Jan.
2012.
[17] D. C. Garcia and R. L. de Queiroz, “Least-squares directional intra
prediction in H.264/AVC,” IEEE Signal Processing Letters, vol. 17, no.
10, pp. 831–834, Oct. 2010.
[18] Y. Liu, L. Liu, and E. Delp, “Enhanced intra prediction using
context-adaptive linear prediction,” in Proc. of PCS 2007 - Picture Coding
Symp., Nov. 2007.
[19] C. Lan, J. Xu, F. Wu, and G. Shi, “Intra frame coding with template
matching prediction and adaptive transform,” in Proc. IEEE Intl. Conf.
Image Processing, pp. 1221–1224, Sep. 2010.
Fig. 7. Examples of documents used in our experiments: (a) class 0: multi-page text-only documents; (b) class 1: single-page text-only documents; (c) class
2: multi-page compound documents; and (d) class 3: single-page compound documents.
[20] E. B. Lima Filho, E. A. B. da Silva, M. B. de Carvalho, and F. S. Pinagé,
“Universal image compression using multiscale recurrent patterns with
adaptive probability model,” IEEE Trans. on Image Processing, vol. 17,
no. 4, pp. 512–527, Apr. 2008.
[21] N. Francisco, N. Rodrigues, E. da Silva, M. de Carvalho, S. de Faria, and
V. da Silva, “Scanned compound document encoding using multiscale
recurrent patterns,” IEEE Trans. on Image Processing, vol. 19, no. 10,
pp. 2712–2724, Apr. 2010.
[22] JVT, “Advanced Video Coding for Generic Audiovisual Services. ITU-T
Recommendation H.264,” Nov. 2007.
[23] I. E. G. Richardson, H.264 and MPEG-4 video compression, Wiley,
USA, 2003.
[24] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview
of the H.264/AVC video coding standard,” IEEE Trans. on Circuits and
Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[25] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan,
“Rate-constrained coder control and comparison of video coding stan-
dards,” IEEE Trans. on Circuits and Systems for Video Technology, vol.
13, no. 7, pp. 688–703, July 2003.
[26] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira,
T. Stockhammer, and T. Wedi, “Video Coding with H.264/AVC: Tools,
Performance, and Complexity,” IEEE Circuits and Systems Magazine,
vol. 4, no. 1, pp. 7–28, Mar. 2004.
[27] G. J. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC Advanced
Video Coding Standard: Overview and Introduction to the Fidelity Range
Extensions,” Proc. of SPIE Conference on Applications of Digital Image
Processing XXVII, Special Session on Advances in the New Emerging
Standard: H.264/AVC, vol. 5558, pp. 53–74, Aug. 2004.
[28] N. Kamaci and Y. Altunbasak, “Performance comparison of the
emerging H.264 video coding standard with the existing standards,”
Proc. Intl. Conf. on Multimedia and Expo, vol. 1, pp. 345–348, July
2003.
[29] B. G. Haskell, A. Puri, and A. N. Netravalli, Digital Video: An
Introduction to MPEG-2, Chapman and Hall, USA, 1997.
[30] ITU-T, “Video Coding for Low Bit Rate Communication. ITU-T
Recommendation H.263,” Version 1: Nov. 1995, Version 2: Jan. 1998,
Version 3: Nov. 2000.
[31] R. L. de Queiroz, R. S. Ortis, A. Zaghetto, and T. A. Fonseca, “Fringe
benefits of the H.264/AVC,” Proc. of International Telecommunications
Symposium, pp. 208–212, Sep. 2006.
[32] D. Marpe, V. George, and T. Wiegand, “Performance comparison of
intra-only H.264/AVC and JPEG2000 for a set of monochrome ISO/IEC
test images,” Contribution JVT ISO/IEC MPEG and ITU-T VCEG, Doc.
JVT M-014, Oct. 2004.
[33] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based adaptive binary
arithmetic coding in the H.264/AVC video compression standard,” IEEE
Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp.
620 – 636, July 2003.
[34] A. Al, B. P. Rao, S. S. Kudva, S. Babu, D. Sumam, and A. V.
Rao, “Quality and complexity comparison of H.264 intra mode with
JPEG2000 and JPEG,” Proc. of IEEE International Conference on Image
Processing, vol. 1, pp. 24–27, Oct. 2004.
[35] A. Zaghetto and R. L de Queiroz, “High-quality scanned book compres-
sion using pattern matching,” Proc. of IEEE International Conference
on Image Processing, pp. 26 – 29, Sep. 2010.
[36] K. Ugur et al., “High performance, low complexity video coding and
the emerging HEVC standard,” IEEE Trans. on Circuits and Systems
for Video Technology, vol. 20, no. 12, pp. 1688–1697, Dec. 2010.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image
quality assessment: From error visibility to structural similarity,” IEEE
Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[38] M. H. Pinson and S. Wolf, “A new standardized method for objectively
measuring video quality,” IEEE Transactions on Broadcasting, vol. 50,
no. 3, pp. 312–322, Sept. 2004.
[39] G. Bjontegaard, “Calculation of average PSNR differences between
RD-curves,” presented at the 13th VCEG-M33 Meeting, Austin, TX,
Apr. 2001.
Alexandre Zaghetto received the Engineer degree
in 2002, from the Federal University of Rio de
Janeiro, Rio de Janeiro, Brazil, the M.Sc. degree
in 2004, from the University of Brasilia, Brasilia,
Brazil, and the Ph.D. degree in 2009 also from
the University of Brasilia, all in Electrical Engi-
neering. In 2009, he became Associate Professor at
the Computer Science Department at University of
Brasilia. His main research interests are in image and
video processing, compound document coding, bio-
metrics, fuzzy logic and artificial neural networks.
[Figure 8 plots: PSNR (dB) versus bitrate (bpp), "Comparison between coders", for (a) "guita", (b) page 2 of "guita", (c) "paper" and (d) "carta". Curves: ADC with 16 frames/page, ADC with 4 frames/page, ADC with 1 frame/page (panel (a) only), AVC-I and JPEG2000.]
Fig. 8. Examples of PSNR plots for: (a) class 0 (multi-page, text-only), document “guita” (number of pages: 2, size: 1568 × 1024 pixels); (b) class 1
(single-page, text-only), page 2 of document “guita”(1568 × 1024 pixels); (c) class 2 (multi-page, compound), document “paper” (number of pages: 4, size:
2304 × 1632 pixels); and (d) class 3 (single-page, compound), document “carta” (2152 × 1632 pixels). Search range and number of reference frames are
Sr = 32 and Rf = 5, respectively.
Ricardo L. de Queiroz received the Engineer degree
from Universidade de Brasilia, Brazil, in 1987,
the M.Sc. degree from Universidade Estadual de
Campinas, Brazil, in 1990, and the Ph.D. degree
from The University of Texas at Arlington in 1994,
all in Electrical Engineering.
In 1990-1991, he was with the DSP research
group at Universidade de Brasilia as a research
associate. He joined Xerox Corp. in 1994, where
he was a member of the research staff until 2002.
In 2000-2001, he was also an Adjunct Faculty member at
the Rochester Institute of Technology. He joined the Electrical Engineering
Department at Universidade de Brasilia in 2003. In 2010, he became a Full
Professor at the Computer Science Department at Universidade de Brasilia.
Dr. de Queiroz has published over 150 articles in journals and conferences
and has contributed chapters to books. He also holds 46 issued patents.
He is an elected member of the IEEE Signal Processing Society’s Multimedia
Signal Processing (MMSP) Technical Committee and a former member of the
Image, Video and Multidimensional Signal Processing (IVMSP) Technical
Committee. He is a past editor for the EURASIP Journal on Image and
Video Processing, IEEE Signal Processing Letters, IEEE Transactions on
Image Processing, and IEEE Transactions on Circuits and Systems for Video
Technology. He has been appointed an IEEE Signal Processing Society
Distinguished Lecturer for the 2011-2012 term.
Dr. de Queiroz has been actively involved with the Rochester chapter of
the IEEE Signal Processing Society, where he served as Chair and organized
the Western New York Image Processing Workshop from its inception until
2001. He is now helping to organize IEEE SPS Chapters in Brazil and
recently founded the Brasilia IEEE SPS Chapter. He was the General Chair
of ISCAS’2011 and MMSP’2009, and is the General Chair of SBrT’2012.
He was also part of the organizing committee of ICIP’2002. His research
interests include image and video compression, multirate signal processing,
and color imaging. Dr. de Queiroz is a Senior Member of IEEE and a member
of the Brazilian Telecommunications Society and of the Brazilian Society of
Television Engineers.
[Figure 9: four plots of average PSNR (dB) versus bitrate (bpp). Panel titles: (a) Average over class 0; (b) Average over class 1; (c) Average over class 2; (d) Average over class 3. Curves in each panel: ADC at 16 frames/page, AVC-I, and JPEG2000.]
Fig. 9. Comparison of ADC against JPEG2000 and AVC-I in terms of PSNR averaged for documents in: (a) class 0; (b) class 1; (c) class 2; and (d) class 3.
[Figure 10: four image panels, (a) to (d).]
Fig. 10. Subjective comparison among coders: (a) zoomed part of the “cerrado” sequence; reconstructed versions using (b) AVC-I, (c) JPEG2000, and (d) ADC, at approximately 0.25 bits/pixel. ADC yields superior subjective quality.
EmpTech Lesson 18 - ICT Project for Website Traffic Statistics and Performanc...EmpTech Lesson 18 - ICT Project for Website Traffic Statistics and Performanc...
EmpTech Lesson 18 - ICT Project for Website Traffic Statistics and Performanc...
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 

Scanned document compression using block based hybrid video codec

  • 1. SUBMITTED TO IEEE TRANS. IMAGE PROCESSING, DEC. 2010

Scanned Document Compression Using a Block-based Hybrid Video Codec

Alexandre Zaghetto, Member, IEEE, and Ricardo L. de Queiroz, Senior Member, IEEE

Abstract—This paper proposes a hybrid pattern matching/transform-based compression method for scanned documents. The idea is to use regular video interframe prediction as a pattern matching algorithm that can be applied to document coding. We show that this interpretation may generate residual data that can be efficiently compressed by a transform-based encoder. The efficiency of this approach is demonstrated using H.264/AVC as a high quality single- and multi-page document compressor. The proposed method, called Advanced Document Coding (ADC), uses segments of the originally independent scanned pages of a document to create a video sequence, which is then encoded through regular H.264/AVC. The encoding performance is unrivaled. Results show that ADC outperforms AVC-I (H.264/AVC operating in pure intra mode) and JPEG2000 by up to 2.7 dB and 6.2 dB, respectively. Superior subjective quality is also achieved.

Index Terms—Scanned document compression, advanced document coding, pattern matching, H.264/AVC.

I. INTRODUCTION

Compression of scanned documents can be tricky. The scanned document is either compressed as a continuous-tone picture, or it is binarized before compression. The binary document can then be compressed using any available two-level lossless compression algorithm (such as JBIG [1] and JBIG2 [2]), or it may undergo character recognition [3]. Binarization may cause strong degradation to object contours and textures, such that, whenever possible, continuous-tone compression is preferred [4]. In single/multi-page document compression, each page may be individually encoded by some continuous-tone image compression algorithm, such as JPEG [5] or JPEG2000 [6], [7].
Multi-layer approaches such as the mixed raster content (MRC) imaging model [8]–[12] are also challenged by soft edges in scanned documents, often requiring pre- and post-processing [13].

Natural text in a document typically presents repetitive symbols, such that dictionary-based compression methods become very efficient. For continuous-tone imagery, the recurrence of similar patterns is illustrated in Fig. 1. Nevertheless, an efficient dictionary-based encoder relying on continuous-tone pattern matching is not trivial. We propose an encoder that exploits such a recurrence through the use of pattern-matching predictors and efficient transform encoding of the residual data.

Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. The authors are with the Department of Computer Science, Universidade de Brasilia, Brazil, e-mail: alexandre@cic.unb.br, queiroz@ieee.org.

Fig. 1. Digitized books usually present recurrent patterns across different pages and across regions of the same page.

It is important to place our proposal within the proper scenario. Three premises are assumed. Firstly, we want to avoid complex multi-coder schemes such as MRC. Secondly, the decoder should be as standard as possible. Since we are dealing with scanned compound documents (mixed pictures and text), natural image encoders, such as JPEG2000, are the most adequate. Non-standard encoders, based on fractals [14]–[16], texture prediction [17], [18], template matching [19] or multiscale pattern recurrence [20], [21], are good options but fall outside the scope of what is being proposed. Thirdly, one should provide high quality reconstructed versions of scanned documents.
This is especially important if rare books of historical value must be digitally stored, thus discarding optical character recognition (OCR) and token-based methods [2], [10]. In summary, we want a standard single coder approach that operates on natural images and delivers high-quality reconstructed compound documents.

The proposed coder makes heavy use of the H.264/AVC standard video coder [22]. H.264/AVC has been well explained in the literature [23]–[27]. H.264/AVC leads to substantial performance improvement when compared to other existing standards [25], [28], such as MPEG-2 [29] and H.263 [30]. Among such improvements we can mention [22], [31]: interframe variable block size prediction; arbitrary reference frames; quarter-pel motion estimation; intraframe macroblock prediction; context-adaptive binary arithmetic coding; and in-loop deblocking filter. Results point to at least a factor of two improvement over previous standards. The many coding advances brought into H.264/AVC not only set a new benchmark for video compression, but they also make it a

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
  • 2. formidable compressor for still images [32]. The intraframe macroblock prediction, combined with the context-adaptive binary arithmetic coding (CABAC) [33], turns H.264/AVC into a powerful still image compression method (i.e. working on a video sequence composed of one single frame). We refer to this coder as AVC-I. Gains of AVC-I over JPEG2000 are typically in the order of 0.25 dB to 0.5 dB in PSNR for pictorial images [31], [32], [34]. For compound images (mixture of text and picture) [4], the PSNR gains are more substantial, even surpassing the mark of 3 dB improvement over JPEG2000 in some cases [31]. The hypothesis presented in this paper is that a scanned document encoder that employs state-of-the-art video coding techniques and generates an H.264/AVC-decodable bit-stream yields the best rate-distortion performance compared to other continuous-tone still image compressors.¹

II. THE PROPOSED METHOD AND ITS IMPLEMENTATION USING AVC

The proposed document coder has a generic concept and an implementation based on a stock H.264/AVC video coder. We now describe the desired features and how one can implement them using AVC. The generic description may help the reader to adapt other video coders for that purpose or to develop non-standard-based (proprietary) variations.

A. Block-based pattern matching

The encoder is based on pattern matching. The document image is segmented into blocks of pixels. Each block is matched to an existing pattern in a dictionary which is populated by the previous contents of the same document. In order to do that, we partition the scanned document, which may be made of one or more scanned pages of H × W pixels, into Np sub-pages or frames of (H/√Np) × (W/√Np) pixels each. Hence, a scanned book may be decomposed into many frames. Figure 2 illustrates the page pre-processing (partition) algorithm, while Fig. 3 shows an example of a frame sequence built from a 3-page set.
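The page partition and its inverse can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the frames are taken in raster order (as the sequential labels of Fig. 2 suggest), and the page size is assumed to be divisible by the chosen grid. With a square grid of √Np rows and √Np columns this gives Np frames per page.

```python
import numpy as np

def split_page(page, rows, cols):
    """Split an H x W page into rows*cols equally sized frames, in raster order."""
    h, w = page.shape[:2]
    if h % rows or w % cols:
        raise ValueError("page size must be divisible by the grid (assumption of this sketch)")
    fh, fw = h // rows, w // cols
    return [page[r * fh:(r + 1) * fh, c * fw:(c + 1) * fw]
            for r in range(rows) for c in range(cols)]

def build_sequence(pages, rows, cols):
    """Concatenate the frames of all pages into one frame sequence."""
    return [frame for page in pages for frame in split_page(page, rows, cols)]

def merge_frames(frames, rows, cols):
    """Inverse of split_page: reassemble rows*cols frames into one page."""
    return np.vstack([np.hstack(frames[r * cols:(r + 1) * cols]) for r in range(rows)])
```

For a 3-page document and a 2 × 2 grid, `build_sequence` returns 12 frames, and `merge_frames` applied to frames 5 to 8 restores page 2, which matches the layout described for Fig. 3.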
Blocks have, for example, 16×16 pixels and each one is matched to an existing pattern in a previous frame. In this way, the previous frames make up a dynamic dictionary of patterns to search when encoding the present frame, which is continuously updated as more frames are encoded. Once a match is found, the matching pattern is used to predict the block and the prediction error (residue) is encoded along with the frame number and position where the match was found (reference vector). A block can be partitioned into smaller blocks to ease prediction at the cost of spending more bits to encode reference vectors.

Figure 4 illustrates the effect of using the pattern matching prediction algorithm. Figures 4 (a) and (b) show examples of a reference and a current text area, respectively. Figures 4 (c), (e) and (g) represent the predictions of the current text using 16×16, 8×8 and 4×4-pixel block partitions. Figures 4 (d), (f) and (h) are the corresponding residual data. Notice that the 4×4-pixel prediction generates a lower-energy residual than the 16×16 and 8×8 predictions; however, it requires encoding more reference vectors.

¹ Preliminary results of the proposed method over multi-page text-only documents have been presented at a conference [35].

Fig. 2. A document page is partitioned into segments (labeled in sequence, 0 to 11 in the example). Each one is considered a frame and can be sequentially encoded.

Fig. 3. Example of a frame sequence, built from a 3-page set, Np = 4 frames/page. Frames 1 to 4, 5 to 8, and 9 to 12 are built from pages 1, 2 and 3, respectively.

In this context, video coders often use motion estimation techniques which are essentially the same as pattern matching. H.264/AVC is capable of partitioning macroblocks of 16×16 pixels into any valid combination of blocks of 16×8, 8×16, 8×8, 4×8, 8×4, and 4×4 pixels.
Our algorithm is then to feed the document frames as video frames into AVC, since its motion estimation algorithm will take care of the pattern matching search for us. However, motion estimation algorithms always take advantage of the fact that video content at the same frame position in neighboring frames is typically highly correlated. Since this is not our case, it is advisable to make the search window cover as much as possible of the reference frames, or the whole frame, in order to enrich the dictionary and to remove the spatial dependency.
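The pattern-matching search described above can be sketched as an exhaustive (full-search) block matcher over several reference frames. This is a simplified illustration, not the H.264/AVC motion estimator: it works at integer-pixel positions only, uses a fixed block size, and scores candidates by the sum of absolute differences (SAD) alone. The function name and argument layout are hypothetical.

```python
import numpy as np

def best_match(block, refs, center, search_range):
    """Full search for `block` in every reference frame.

    Scans all integer positions within +/- search_range of `center`
    (top-left corner, as (y, x)) and returns (sad, ref_index, y, x)
    for the candidate with the smallest sum of absolute differences.
    """
    bh, bw = block.shape
    cy, cx = center
    blk = block.astype(np.int64)
    best = None
    for i, ref in enumerate(refs):
        h, w = ref.shape
        y0, y1 = max(0, cy - search_range), min(h - bh, cy + search_range)
        x0, x1 = max(0, cx - search_range), min(w - bw, cx + search_range)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                cand = ref[y:y + bh, x:x + bw].astype(np.int64)
                sad = int(np.abs(cand - blk).sum())
                if best is None or sad < best[0]:
                    best = (sad, i, y, x)
    return best
```

The returned frame index and position play the role of the reference vector, and the difference between the block and the matched patch is the residue that goes on to transform coding. Enlarging `search_range` until it covers the whole frame corresponds to the recommendation above.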
  • 3. Fig. 4. Illustration of approximate pattern matching using interframe prediction: (a) reference text; (b) current text; (c), (e) and (g) predicted text (block size: 16×16, 8×8 and 4×4 pixels, respectively); and (d), (f) and (h) prediction residue (block size: 16×16, 8×8 and 4×4 pixels, respectively). Each zoomed image patch has 178×178 pixels.

B. Inter- and intra-frame prediction

AVC also allows for intra-frame prediction, in which a block (partitioned or not) can be predicted from neighboring blocks by means of directional extrapolation of the border pixels. The decision whether or not to use intra-frame prediction is typically based on rate-distortion optimization (RDO), and we use RDO in all our simulations. AVC does not allow for in-frame motion vectors (IFMV), although many variations using such a feature and other sophisticated methods of intra-frame prediction do exist [19]. Apparently, HEVC will also support IFMV [36]. Breaking up the pages into frames allows for some intra-document prediction similar to IFMV, yet using a stock video coder. Furthermore, the information derived from IFMV is typically very small compared to all compressed data, such that the advantage should not be very significant. Because of that, we do not use IFMV.

Another issue is the random access to different book pages. In order to get to a book page, we are forced to decompress all the frames it uses as reference. So, if random access is an issue, we suggest periodically using no-reference frames, i.e. frames in which inter-frame prediction is not allowed, relying on pure intra-frame prediction/extrapolation.

In our encoder, using the AVC structure, each block-partitioning combination and prediction mode is tested and the best one is picked through RDO.
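The RDO-based selection can be sketched as picking, for each macroblock, the candidate with the lowest Lagrangian cost. The sketch uses the cost in the form this paper gives it, J = R + λD, with R the rate in bits and D the distortion. The function name, the candidate labels and the numbers in the usage note are made up for illustration and do not come from the paper's experiments.

```python
def rdo_select(candidates, lam):
    """Pick the candidate minimizing J = R + lam * D.

    `candidates` is an iterable of (label, rate_bits, distortion) tuples,
    one per tested partition/prediction mode. Returns (label, cost).
    """
    best_label, best_cost = None, float("inf")
    for label, rate, distortion in candidates:
        cost = rate + lam * distortion
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label, best_cost
```

For example, with hypothetical candidates `[("inter16x16", 40, 900), ("inter4x4", 160, 300), ("intra", 90, 700)]`, a large weight on distortion (λ = 0.5) selects the finely partitioned `inter4x4` mode, whereas a small one (λ = 0.125) selects `inter16x16`, which spends fewer bits on reference vectors.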
With RDO within AVC, in the k-th configuration test in a macroblock, AVC computes the rate Rk (bits spent to encode the block) and the distortion Dk (sum of absolute differences, SAD) achieved by reconstructing the block. One picks the block partition method that minimizes Jk = Rk + λDk. The process is then repeated for every macroblock. As usual, λ controls compression ratios and is varied to find the RD curves in our simulations.

C. Residual coding

The residual macroblock, i.e. the prediction error, is transformed using 4 8×8-pixel discrete cosine transform (DCT) or an integer approximation of it. The transformed blocks are quantized and encoded using arithmetic coding. H.264/AVC uses an integer transform with properties similar to those of the DCT, and the resulting transformed coefficients are quantized and entropy encoded using CABAC.

D. Compound documents and region classification

Compound document compression usually segments the image into regions and classifies each one as containing text and graphics or images (or halftones, for instance). Once a region is classified, it can be encoded using a proper algorithm. This approach is driven by objects such as text characters, so that regions of the image are labeled based on our estimate of their contents. Our method, however, is driven by the compression itself. Rather than only testing pattern-matching-based prediction for every block partition, we also test prediction by extrapolating neighboring blocks, as in “intra-prediction” in H.264/AVC. The RD-optimized selection of the best prediction assures that the best option is picked. Text and graphics contain recurrent patterns and will often be encoded using patterns from previous regions, while pictorial regions may resort to intra-prediction. In this sense, segmentation is embedded into the encoding process. In fact, the block prediction and RDO may have the same effect of
  • 4. a segmentation map, although one that serves the compression process rather than the true identification of image contents.

Fig. 5. Configuration parameters that have greater influence on the encoder performance: Rf (number of reference frames) and Sr (search range).

E. Encoder and decoder summary

In our concept, the frames are fed into AVC in sequence just like in a regular video coder. Because of that relation to AVC, we refer to the proposed method as advanced document coding, or, simply, ADC. In a nutshell, ADC operation can be summarized as (i) break the book pages into frames; (ii) feed all frames to H.264/AVC resulting in an AVC-compatible stream; (iii) decode the bit stream; and (iv) assemble the decoded frames into the final document book pages. In order to work well, H.264/AVC should operate in “High” profile, following an IPPP... framework. The encoder should periodically use no-reference I-frames in case random access is desired. RDO should be turned on. Motion estimation should be set to full search over a window that is as large as possible. Note that other video coders such as HEVC and MPEG-2 will also work, even though achieving different performance levels due to their different sophistication levels.

III. EXPERIMENTAL RESULTS

In our tests, different page sets are compressed using JPEG2000, AVC-I (H.264/AVC operating in pure intra mode) and the proposed ADC. The reason we chose JPEG2000 and AVC-I for comparison is that these are the most suitable standards that would meet the three premises presented in Section I. Distortion metrics based on visual models such as Structural Similarity (SSIM) [37] and Video Quality Metric (VQM) [38] have been extensively tested for pictorial content. However, they are unproven for text and graphics, which rely more on resolution than on number of gray levels. Readability is very important and some alternative metrics such as OCR efficiency are considered.
A good objective metric to reflect subjective perception of text has not been well explored yet. Hence, we opted to stick to the traditional PSNR as a distortion metric.

In JPEG2000 and AVC-I compression, the pages are separately encoded. As for ADC, the first frame of the sequence is encoded as an I-frame (only intraframe prediction modes are used) and all the remaining frames are encoded as P-frames (in addition to intraframe prediction, only past frames are used as reference by the interframe prediction). We also considered that each page may be segmented into Np = 4 frames, Np = 16 frames, or not segmented at all (Np = 1, for multi-page documents only).

Fig. 6. Comparison between JPEG2000, AVC-I and the proposed ADC, for different combinations of search ranges (Sr) and number of reference frames (Rf) for test sequence “guita”. The number of frames/page (Np) is 4. [Plots of PSNR (dB) versus bitrate (bpp): (a) Sr = 32 and Rf = 1, 3, 5; (b) Sr = 8, 16, 32 and Rf = 5.]

Two configuration parameters have greater influence on the encoder performance. One is the number of reference frames, Rf, the other is the search range, Sr, as illustrated in Fig. 5. Initially, we evaluated the effect of choosing different values for Sr and Rf. Figure 6 shows PSNR plots comparing JPEG2000, AVC-I and ADC (Np = 4 frames/page), for different combinations of Sr and Rf. The PSNR was calculated using the global mean square error (MSE). The higher the Sr and Rf values, the better the rate-distortion performance. In particular, for Sr = 32 pixels and Rf = 5 frames, ADC outperforms AVC-I by more than 2 dB and JPEG2000 by more than 5 dB, at 0.5 bit/pixel (bpp).
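A PSNR based on a global MSE can be sketched as follows. The sketch assumes that "global" means the squared error is pooled over all pixels of all frames before the logarithm is taken (rather than averaging per-frame PSNR values), and that samples are 8-bit with a peak of 255. Both are assumptions of this illustration, as is the function name.

```python
import numpy as np

def global_psnr(originals, decoded, peak=255.0):
    """PSNR in dB from the MSE pooled over every pixel of every frame."""
    squared_error = 0.0
    count = 0
    for orig, dec in zip(originals, decoded):
        # Convert before subtracting so unsigned 8-bit data cannot wrap around.
        diff = orig.astype(np.float64) - dec.astype(np.float64)
        squared_error += float((diff ** 2).sum())
        count += diff.size
    mse = squared_error / count
    if mse == 0:
        return float("inf")  # identical content
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Pooling the error this way weights every pixel equally, so frames of different sizes contribute in proportion to their area.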
Our test set is composed of 18 documents divided into the
  • 5. following 4 classes²:
    • Class 0: 6 multi-page text-only documents;
    • Class 1: 6 single-page text-only documents;
    • Class 2: 3 multi-page compound documents; and
    • Class 3: 3 single-page compound documents.

TABLE I
AVERAGE OBJECTIVE (PSNR IN DB) IMPROVEMENT OVER EXISTING STANDARDS FOR THE 4 DOCUMENT TEST SETS.

Document Set    0       1       2       3
JPEG2000        6.26    5.61    4.47    2.41
AVC-I           2.75    2.58    1.56    0.89

Figure 7 illustrates one example of each class. Examples of PSNR plots are shown in Fig. 8. Figure 9 shows the average PSNR improvement of ADC over JPEG2000 and AVC-I for each of the document class test sets. In all cases, JPEG2000 and AVC-I are objectively outperformed, considering Sr = 32, Rf = 5 and Np = 16. Figure 10 (a) shows a zoomed part of the original “cerrado” sequence. Its encoded and reconstructed versions using AVC-I, JPEG2000 and ADC, at approximately 0.25 bits/pixel, are shown in Figs. 10 (b), (c) and (d), respectively. ADC also yields superior subjective quality. As a reference, in Table I we present the gains of the proposed method over AVC-I and JPEG2000 using Bjontegaard’s method [39] applied to the curves in Fig. 9. As one can see from the table and from the plots, the gains are very substantial.

Our software-based tests, using the popular and efficient x264 implementation of the H.264/AVC standard and the Kakadu implementation of JPEG 2000, indicate that ADC (AVC running with 5 reference frames, slowest search, 32×32 window, in IPPPP... mode) is nearly 10× slower than JPEG 2000. x264 on an Intel Core i7 platform has been shown to encode RGB video at a rate of 3 to 30 million pixels per second (Mps). The variation is due to the various x264 settings that affect speed and quality. Of course, in order to do that, the system was dedicated to the task, as an appliance. A scanned letter-sized (8.5 × 11 in) page at 600 pixels per inch (ppi) yields about 33 million pixels.
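The pixel count quoted above, and the page rate it implies at the stated 3 to 30 Mps, can be checked with a short calculation. The helper names are hypothetical, and the sketch assumes throughput is limited only by the encoder's pixel rate, ignoring scanning, I/O and any per-page overhead.

```python
def page_pixels(width_in, height_in, ppi):
    """Number of pixels in a page scanned at `ppi` pixels per inch."""
    return (width_in * ppi) * (height_in * ppi)

def pages_per_minute(pixels_per_page, pixels_per_second):
    """Pages encoded per minute at a given encoder pixel rate."""
    return 60.0 * pixels_per_second / pixels_per_page

letter = page_pixels(8.5, 11, 600)      # 33,660,000 pixels
slow = pages_per_minute(letter, 3e6)    # a little over 5 pages per minute
fast = pages_per_minute(letter, 30e6)   # a little over 50 pages per minute
```

Because the page rate scales linearly with the encoder's pixel rate, the tenfold spread in x264 speed settings translates directly into a tenfold spread in pages per minute.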
Hence, we can expect a page compression speed roughly in the order of 5 to 50 pages per minute (ppm). This page rate may be acceptable for many on-the-fly applications and is definitely reasonable for off-line compression of books and such. A rigorous complexity study of the encoding algorithms presented here is beyond the scope of this paper.

IV. CONCLUSIONS

In this paper, we presented a pattern matching/transform-based encoder for scanned documents named ADC. We decided to use H.264/AVC tools to implement the proposed method because its interframe prediction scheme, allied with RDO, yields an efficient pattern matching algorithm. In addition, the intraframe prediction, the DCT-based transform and the CABAC contribute to improving the encoding efficiency. In essence, our work can be summarized as splitting the document into many pages, forming frames, and feeding the frames to AVC. Despite the simplicity of the idea, the performance for scanned documents is unrivaled, to our knowledge. Results show that ADC objectively outperforms AVC-I and JPEG2000 by up to 2.7 dB and 6.2 dB, respectively, with more significant gains observed for multi-page text-only documents. Furthermore, the encoder outputs documents with superior subjective quality. Replacing H.264/AVC by HEVC in ADC would yield even larger gains.

² The entire test set is available at http://image.unb.br/queiroz/testset

REFERENCES

[1] JBIG, “Information Technology - Coded Representation of Picture and Audio Information - Progressive Bi-level Image Compression. ITU-T Recommendation T.82,” Mar. 1993.
[2] JBIG2, “Information Technology - Coded Representation of Picture and Audio Information - Lossy/Lossless Coding of Bi-level Images. ITU-T Recommendation T.88,” Mar. 2000.
[3] S. Mori, C. Y. Suen, and K. Yamamoto, “Historical review of OCR research and development,” Proceedings of the IEEE, vol. 80, no. 7, pp. 1029–1058, Jul. 1992.
[4] R. L.
de Queiroz, Compressing Compound Documents, in the Document and Image Compression Handbook, by M. Barni, Marcel-Dekker, USA, 2005.
[5] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Chapman and Hall, 1993.
[6] JPEG, “Information Technology - JPEG2000 Image Coding System - Part 1: Core Coding System. ISO/IEC 15444-1,” 2000.
[7] D. S. Taubman and M. W. Marcellin, JPEG 2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic, USA, 2002.
[8] MRC, “Mixed Raster Content (MRC). ITU-T Recommendation T.44,” 1999.
[9] R. L. de Queiroz, R. Buckley, and M. Xu, “Mixed Raster Content (MRC) model for compound image compression,” Proc. of SPIE Visual Communications and Image Processing, vol. 3653, pp. 1106–1117, Jan. 1999.
[10] P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. Lecun, “High quality document image compression with DjVu,” Journal of Electronic Imaging, vol. 7, pp. 410–425, 1998.
[11] G. Feng and C. A. Bouman, “High-quality MRC document coding,” IEEE Trans. on Image Processing, vol. 15, no. 10, pp. 3152–3169, Oct. 2006.
[12] A. Zaghetto, R. L. de Queiroz, and D. Mukherjee, “MRC compression of compound documents using threshold segmentation, iterative data-filling and H.264/AVC-INTRA,” Proc. Indian Conference on Computer Vision, Graphics and Image Processing, Dec. 2008.
[13] A. Zaghetto and R. L. de Queiroz, “Improved layer processing for MRC compression of scanned documents,” Proc. of IEEE Intl. Conference on Image Processing, pp. 1993–1996, Nov. 2009.
[14] E. Walach and E. Karnin, “A fractal-based approach to image compression,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, vol. 11, pp. 529–532, Apr. 1986.
[15] S. M. Kocsis, “Fractal-based image compression,” in Proc. 23rd Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 177–181, 1989.
[16] A. Wakatani, “Improvement of adaptive fractal image coding on GPUs,” in Proc. IEEE Intl. Conf. on Consumer Electronics, pp.
255–256, Jan. 2012.
[17] D. C. Garcia and R. L. de Queiroz, “Least-squares directional intra prediction in H.264/AVC,” IEEE Signal Processing Letters, vol. 17, no. 10, pp. 831–834, Oct. 2010.
[18] Y. Liu, L. Liu, and E. Delp, “Enhanced intra prediction using context-adaptive linear prediction,” in Proc. of PCS 2007 - Picture Coding Symp., Nov. 2007.
[19] C. Lan, J. Xu, F. Wu, and G. Shi, “Intra frame coding with template matching prediction and adaptive transform,” in Proc. IEEE Intl. Conf. Image Processing, pp. 1221–1224, Sep. 2010.
  • 6. Fig. 7. Examples of documents used in our experiments: (a) class 0: multi-page text-only documents; (b) class 1: single-page text-only documents; (c) class 2: multi-page compound documents; and (d) class 3: single-page compound documents.

[20] E. B. Lima Filho, E. A. B. da Silva, M. B. de Carvalho, and F. S. Pinagé, “Universal image compression using multiscale recurrent patterns with adaptive probability model,” IEEE Trans. on Image Processing, vol. 17, no. 4, pp. 512–527, Apr. 2008.
[21] N. Francisco, N. Rodrigues, E. da Silva, M. de Carvalho, S. de Faria, and V. da Silva, “Scanned compound document encoding using multiscale recurrent patterns,” IEEE Trans. on Image Processing, vol. 19, no. 10, pp. 2712–2724, Apr. 2010.
[22] JVT, “Advanced Video Coding for Generic Audiovisual Services. ITU-T Recommendation H.264,” Nov. 2007.
[23] I. E. G. Richardson, H.264 and MPEG-4 video compression, Wiley, USA, 2003.
[24] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[25] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688–703, July 2003.
[26] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video Coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits and Systems Magazine, vol. 4, no. 1, pp. 7–28, Mar. 2004.
[27] G. J. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions,” Proc.
of SPIE Conference on Applications of Digital Image Processing XXVII, Special Session on Advances in the New Emerging Standard: H.264/AVC, vol. 5558, pp. 53–74, Aug. 2004.
[28] N. Kamaci and Y. Altunbasak, “Performance comparison of the emerging H.264 video coding standard with the existing standards,” Proc. Intl. Conf. on Multimedia and Expo, vol. 1, pp. 345–348, July 2003.
[29] B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman and Hall, USA, 1997.
[30] ITU-T, “Video Coding for Low Bit Rate Communication. ITU-T Recommendation H.263,” Version 1: Nov. 1995, Version 2: Jan. 1998, Version 3: Nov. 2000.
[31] R. L. de Queiroz, R. S. Ortis, A. Zaghetto, and T. A. Fonseca, “Fringe benefits of the H.264/AVC,” Proc. of International Telecommunications Symposium, pp. 208–212, Sep. 2006.
[32] D. Marpe, V. George, and T. Wiegand, “Performance comparison of intra-only H.264/AVC and JPEG2000 for a set of monochrome ISO/IEC test images,” Contribution JVT ISO/IEC MPEG and ITU-T VCEG, Doc. JVT M-014, Oct. 2004.
[33] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620–636, July 2003.
[34] A. Al, B. P. Rao, S. S. Kudva, S. Babu, D. Sumam, and A. V. Rao, “Quality and complexity comparison of H.264 intra mode with JPEG2000 and JPEG,” Proc. of IEEE International Conference on Image Processing, vol. 1, pp. 24–27, Oct. 2004.
[35] A. Zaghetto and R. L. de Queiroz, “High-quality scanned book compression using pattern matching,” Proc. of IEEE International Conference on Image Processing, pp. 26–29, Sep. 2010.
[36] K. Ugur et al., “High performance, low complexity video coding and the emerging HEVC standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1688–1697, Dec. 2010.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P.
Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactios on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004. [38] M.H. Pinson, and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Transactions on Broadcasting, vol.50, no.3, pp. 312- 322, Sept. 2004. [39] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” presented at the 13th VCEG-M33 Meeting, Austin, TX, Apr. 2001. Alexandre Zaghetto received the Engineer degree in 2002, from the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, the M.Sc. degree in 2004, from the University of Brasilia, Brasilia, Brazil, and and the Ph.D. degree in 2009 also from the University of Brasilia, all in Electrical Engi- neering. In 2009, he became Associate Professor at the Computer Science Department at University of Brasilia. His main research interests are in image and video processing, compound document coding, bio- metrics, fuzzy logic and artificial neural networks. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641 Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
[Plots omitted. Each panel, titled "Comparison between coders", shows PSNR (dB) versus bitrate (bpp) for ADC at 16 and 4 frames/page, AVC-I and JPEG2000; panel (a) also includes ADC at 1 frame/page.]

Fig. 8. Examples of PSNR plots for: (a) class 0 (multi-page, text-only), document “guita” (number of pages: 2, size: 1568 × 1024 pixels); (b) class 1 (single-page, text-only), page 2 of document “guita” (1568 × 1024 pixels); (c) class 2 (multi-page, compound), document “paper” (number of pages: 4, size: 2304 × 1632 pixels); and (d) class 3 (single-page, compound), document “carta” (2152 × 1632 pixels). Search range and number of reference frames are Sr = 32 and Rf = 5, respectively.

Ricardo L. de Queiroz received the Engineer degree from Universidade de Brasilia, Brazil, in 1987, the M.Sc. degree from Universidade Estadual de Campinas, Brazil, in 1990, and the Ph.D. degree from The University of Texas at Arlington, in 1994, all in Electrical Engineering. In 1990-1991, he was with the DSP research group at Universidade de Brasilia, as a research associate. He joined Xerox Corp. in 1994, where he was a member of the research staff until 2002. In 2000-2001 he was also an Adjunct Faculty at the Rochester Institute of Technology. He joined the Electrical Engineering Department at Universidade de Brasilia in 2003. In 2010, he became a Full Professor at the Computer Science Department at Universidade de Brasilia.

Dr. de Queiroz has published over 150 articles in journals and conferences and has contributed chapters to books as well. He also holds 46 issued patents. He is an elected member of the IEEE Signal Processing Society’s Multimedia Signal Processing (MMSP) Technical Committee and a former member of the Image, Video and Multidimensional Signal Processing (IVMSP) Technical Committee. He is a past editor for the EURASIP Journal on Image and Video Processing, IEEE Signal Processing Letters, IEEE Transactions on Image Processing, and IEEE Transactions on Circuits and Systems for Video Technology. He has been appointed an IEEE Signal Processing Society Distinguished Lecturer for the 2011-2012 term.

Dr. de Queiroz has been actively involved with the Rochester chapter of the IEEE Signal Processing Society, where he served as Chair and organized the Western New York Image Processing Workshop from its inception until 2001. He is now helping to organize IEEE SPS Chapters in Brazil and recently founded the Brasilia IEEE SPS Chapter. He was the General Chair of ISCAS’2011 and MMSP’2009, and is the General Chair of SBrT’2012. He was also part of the organizing committee of ICIP’2002. His research interests include image and video compression, multirate signal processing, and color imaging. Dr. de Queiroz is a Senior Member of IEEE, a member of the Brazilian Telecommunications Society and of the Brazilian Society of Television Engineers.
[Plots omitted. Panels titled "Average over class 0", "Average over class 1", "Average over class 2" and "Average over class 3" show average PSNR (dB) versus bitrate (bpp) for ADC at 16 frames/page, AVC-I and JPEG2000.]

Fig. 9. Comparison of ADC against JPEG2000 and AVC-I in terms of PSNR averaged for documents in: (a) class 0; (b) class 1; (c) class 2; and (d) class 3.
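The two axes used in the rate-distortion plots of Figs. 8 and 9, PSNR in dB and bitrate in bits per pixel (bpp), can be computed as in the following minimal sketch. It assumes 8-bit samples (peak value 255) and images given as flat sequences of pixel values. The function names and the 200 000-byte file size are hypothetical and chosen only for illustration; the exact measurement and averaging procedure used to produce the figures is not restated here.

```python
import math


def psnr(original, decoded, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def bits_per_pixel(compressed_bytes, width, height, pages=1):
    """Bitrate in bpp: total coded bits divided by total pixels in the document."""
    return 8.0 * compressed_bytes / (width * height * pages)


# Hypothetical example: a 2-page document with 1568 x 1024 pixels per page
# (the size quoted for "guita"), coded into an assumed 200 000 bytes.
rate = bits_per_pixel(200_000, 1568, 1024, pages=2)
```

Normalizing by the total pixel count of all pages lets single-page and multi-page documents share the same bitrate axis.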
Fig. 10. Subjective comparison among coders: (a) zoomed part of the “cerrado” sequence; reconstructed versions using (b) AVC-I, (c) JPEG2000 and (d) ADC, at approximately 0.25 bits/pixel. ADC yields superior subjective quality.