1. Data Compression, Data Security,
and Machine Learning
Prof. Ja-Ling Wu
Dept. of Computer Science and Information Engineering
and
Graduate Institute of Networking and Multimedia
National Taiwan University
2. Data Science: Challenges and Directions
Prof. Longbing Cao, Communications of the ACM, Aug. 2017
Data Science = {statistics ∩ informatics ∩ computing ∩ communication ∩ sociology ∩ management | data ∩ domain ∩ thinking}, where "|" means "conditional on."
3. Computer Science and Information Engineering
[Diagram: Data Science and Engineering / Security Science and Engineering — Data-Driven and Security-aware Information Processing, spanning Network Security, Data Security, DRM and Forensics, AI/ML (DNN) Algorithms/Architectures, Cloud Computing and Mobile Clients Platform, Communication/Computing-and-Storage/Bandwidth, and Others (Law and Regulations)]
5. This is a DNN model reduction (compression) method. The first stage removes (prunes) the connections in the original model whose weights fall below a certain threshold, and then retrains the network (to ensure the error rate does not increase). The second stage clusters the weights of each layer and uses each cluster's center (or mean) as a code book to represent that layer's weights (this step closely resembles vector quantization). The third stage compresses the code words in the code book with a Huffman code, according to their occurrence probabilities.
Most DNNs can be compressed by about 20x and also run faster! (35x to 49x compression ratios were reported in the literature; as expected, this approach is very time- and computing-resource-consuming in the training phase.)
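The three stages can be sketched end-to-end on synthetic weights. Everything below (the Gaussian "weights", the 0.5 threshold, the 8-entry code book) is an illustrative choice, not the cited work's actual settings, and the retraining step after pruning is omitted:

```python
import heapq
import random
from collections import Counter

random.seed(0)
# Synthetic stand-in for one trained layer's weights (illustrative only).
weights = [random.gauss(0.0, 1.0) for _ in range(200)]

# Stage 1: prune connections whose |weight| is below a threshold.
# (The real method then retrains the network to recover accuracy; omitted.)
THRESHOLD = 0.5            # illustrative value
pruned = [w for w in weights if abs(w) >= THRESHOLD]

# Stage 2: cluster the surviving weights (1-D k-means) and represent each
# weight by the index of its cluster center -- akin to vector quantization.
def kmeans_1d(xs, k, iters=20):
    centers = sorted(random.sample(xs, k))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - centers[j]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

K = 8                      # illustrative code-book size (3-bit indices)
codebook = kmeans_1d(pruned, K)
indices = [min(range(K), key=lambda j: abs(w - codebook[j])) for w in pruned]

# Stage 3: Huffman-code the indices by their occurrence probabilities.
def huffman_lengths(freqs):
    heap = [(f, i, (s,)) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick, lengths = len(heap), Counter()
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merge adds one bit to these codes
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tick, s1 + s2))
        tick += 1
    return lengths

lengths = huffman_lengths(Counter(indices))
bits = sum(lengths[i] for i in indices)
print(f"kept {len(pruned)}/{len(weights)} weights; "
      f"{bits} Huffman bits vs {3 * len(pruned)} fixed-length bits")
```

Because Huffman coding is optimal among prefix codes, the total for stage 3 can never exceed the 3-bit fixed-length baseline for an 8-entry code book.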
7. An illustration of our paradigm for using Compression to
accelerate Machine Learning algorithms.
8. • Intuitively, there are three aspects that should be considered for ML
over encoded data:
• (1) the structure of the ML model we want to learn (with its
associated loss function),
• (2) the optimization algorithm (for example, batch gradient descent
or stochastic gradient descent), and
• (3) the compression scheme, which creates opportunities for new
techniques.
9. • In the following paper, the authors take a first step towards examining
the interplay of all three aspects.
• Surprisingly and intriguingly, they find that a slight variant of the
classical Lempel-Ziv-Welch (LZW) coding scheme fits well for two
popular ML techniques: k-means clustering and generalized linear
models (GLMs).
arXiv:1702.06943v2 [cs.LG] 1 March 2017
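As background for the variant the paper builds on, a minimal textbook LZW codec (the classical scheme, not the authors' modified one) can be sketched as:

```python
def lzw_encode(data):
    """Classic LZW: emit dictionary indices while growing the dictionary."""
    table = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in data:
        if w + ch in table:
            w += ch
        else:
            out.append(table[w])
            table[w + ch] = len(table)   # new phrase gets the next index
            w = ch
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes):
    """Inverse of lzw_encode, rebuilding the same dictionary on the fly."""
    table = {i: chr(i) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = table[code] if code in table else w + w[0]  # KwKwK corner case
        out.append(entry)
        table[len(table)] = w + entry[0]
        w = entry
    return "".join(out)

message = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(message)
print(len(codes))                 # 16 codes for the 24-character input
assert lzw_decode(codes) == message
```

The dictionary of phrases that LZW builds as a side effect is exactly the kind of structure the cited work exploits for learning over compressed data.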
11. What are the major differences in image processing between the Human Visual System (HVS) and Deep Neural Networks (DNNs)?
• Our major observation is:
• DNNs can respond precisely to any important frequency component, but the human visual system focuses more on low-frequency information than on high-frequency components, indicating that "fewer features need to be learned by DNNs after the HVS-inspired compression."
12. Feature degradation will impact the classification.
The left figure demonstrates an example: the "junco" (燈心草雀) is mis-predicted as a "robin" (知更鳥) after removing the top six high-frequency components, even though the differences are almost indistinguishable to human eyes.
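The frequency-removal effect can be reproduced in miniature with a 1-D DCT on a synthetic signal (a toy stand-in for the image experiment; the signal and the choice of dropped bins are illustrative):

```python
import math

def dct(xs):
    """Naive O(N^2) DCT-II."""
    N = len(xs)
    return [sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                for n, x in enumerate(xs)) for k in range(N)]

def idct(cs):
    """Inverse transform (DCT-III with the usual 2/N scaling)."""
    N = len(cs)
    return [(cs[0] / 2 + sum(cs[k] * math.cos(math.pi * (n + 0.5) * k / N)
                             for k in range(1, N))) * 2 / N for n in range(N)]

N = 16
# A smooth component (DCT bin 2) plus a small high-frequency detail (bin 12).
signal = [math.cos(math.pi * (n + 0.5) * 2 / N)
          + 0.25 * math.cos(math.pi * (n + 0.5) * 12 / N) for n in range(N)]

coeffs = dct(signal)
lowpassed = coeffs[:-6] + [0.0] * 6   # drop the top six high-frequency bins
reconstructed = idct(lowpassed)

err = max(abs(a - b) for a, b in zip(signal, reconstructed))
print(f"max pointwise change after dropping 6 bins: {err:.4f}")  # ~0.23
```

The pointwise change is small and barely visible to a human observer, yet it is exactly the kind of high-frequency feature a DNN may rely on.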
14. End-to-End Secure Platform for Machine Learning
[Diagram: Training Data → (Partial/Fully Homomorphic Encryption) → Training Data in Ciphertext Domain → Machine Learning Algorithm in Ciphertext Domain → Classification Results in Ciphertext Domain → (key mapping) → Classification Results in Plaintext Domain]
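As a concrete instance of the ciphertext-domain idea, the sketch below implements a toy Paillier (partially homomorphic) scheme with tiny hardcoded primes. It demonstrates only the additive homomorphism such a platform relies on, not an actual training pipeline, and the parameters are illustrative:

```python
import math
import random

# Toy Paillier cryptosystem with tiny hardcoded primes (illustration only;
# real deployments use moduli of 2048 bits or more).
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    """Paillier's L function."""
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# The additive homomorphism: multiplying ciphertexts adds their plaintexts,
# which is what lets simple learning steps run in the ciphertext domain.
c_sum = (encrypt(5) * encrypt(7)) % n2
print(decrypt(c_sum))  # 12
```

A fully homomorphic scheme would additionally support multiplication of encrypted values, which is what general learning algorithms in the ciphertext domain require.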
16. • Interplay between Data security and AI/ Machine Learning
→ Cryptanalysis
• Interplay between Data security and Data compression
→ Distributed Video Coding
→ Joint Compression and Encryption schemes
→ VLC-based Authentication and Data Integrity checking
• Interplay between Forensics and AI/Machine Learning
→ Anti-spoofing
→ Fake News Detection
18. • In the past few years, combining coding and encryption in a single algorithm to reduce complexity has become a tempting new approach for securing data during transmission and storage.
• This new approach aims to extend the functionality of compression algorithms to achieve both compression and encryption simultaneously in a single process, without an additional encryption stage.
• It has been shown that the combined approach greatly reduces the resources required for encryption (computational and power resources).
19. • The new approach also preserves standard features that are lost when applying traditional encryption schemes, such as progressive transmission in JPEG2000 (also available in JPEG) and the random-access feature (also called compressed-domain processing) in JPEG2000.
• Furthermore, the new approach offers features and capabilities beyond traditional encryption schemes, such as multilevel security access.
• The most attractive target for this new approach is the arithmetic coder.
20. • The arithmetic coder is a lossless entropy coder used as the last compression stage in the most widespread multimedia coding standards, owing to its higher compression efficiency compared with the traditional Huffman coder.
• The arithmetic coder is included in the JPEG image codec and the H.263 video codec as an alternative to the Huffman coder.
• For more recent multimedia standards that require higher compression performance, such as the JPEG2000 and JBIG image codecs and the H.264 and H.265 (HEVC) video codecs, the arithmetic coder is mandatory.
21. Avalanche effect of the Arithmetic Coder
• The arithmetic coder is characterized by its high error sensitivity and error-propagation properties.
• The avalanche effect is an important criterion for using the arithmetic coder for security.
• It is proven in [1] that any arithmetic coder can be considered a chaotic random generator with proven cryptographic nonlinear properties.
• [1] N. Nagaraj, P. G. Vaidya, and K. G. Bhat, "Arithmetic coding as a nonlinear dynamical system," Communications in Nonlinear Science and Numerical Simulation, vol. 14, no. 4, pp. 1013–1020, 2009.
22. • Moreover, a practical experiment described in [2] uses the NIST statistical test suite [3] to support these cryptographic properties.
• Consequently, any change in the input bit-stream at the encoder/decoder side (even a single bit) leads to a huge avalanche effect throughout all of the following encoded/decoded output bit-stream.
• [2] M. Sinaie and V. T. Vakili, "Secure arithmetic coding with error detection capability," EURASIP J. on Information Security, vol. 2010, pp. 4:1–4:9, Sep. 2010. [Online]. Available: http://dx.doi.org/10.1155/2010/621521
• [3] National Institute of Standards and Technology. (2010, April). NIST Statistical Test Suite. [Online]. Available: http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
24. Example of AR’s Avalanche effect
• Assuming discrete-memoryless source with four symbols
{A,B,C,D} with probabilities {PA = 0.1, PB = 0.2, PC = 0.3, PD = 0.4} . Let
the input message is : {ABDCDCBCDD} , then the point 0.026189424
can be used as a result for AR-coding the message.
• The binary representation of the coded message 0.026189424 would
be : 000001101011010001011001101000100101.
• Now, changing it to be 000001111011010001011001101000100101,
that is, with a single bit error which leads to another coded point :
0.030095674635959.
• Here, the recovered message will be {ACAACADADC} with 80% errors.
Clearly, this AR’s characteristics can be applied to check data integrity.