Everyone has heard at least once about the magical powers of machine learning - who wouldn't want to be able to program a driverless car on their own? How often, however, have you heard about the "dark sides" of data-driven technology? In this talk, we will show live demos of five aspects of deep learning that any developer should be aware of: (i) algorithmic bias; (ii) adversarial attacks on trained systems; (iii) breaches of privacy; (iv) safety threats; and (v) hidden technical debt. The lesson is: beware of blind reliance on deep learning - unless you are looking for "deep" trouble!
3. The deep learning craze
[Business Insider, 2017]
IBM speech recognition is on the verge of super-human accuracy
[Time, 2017]
Are Computers Already Smarter Than Humans?
[LiveScience, 2016]
Artificial Intelligence Beats 'Most Complex Game Devised by
Humans'
[RECODE, 2017]
Intel is paying more than $400 million to buy deep-learning
startup Nervana Systems
6. I know there's a proverb which says 'To err is human,' but a
human error is nothing to what a computer can do if it tries.
--- Agatha Christie
People worry that computers will get too smart and take over the
world, but the real problem is that they're too stupid and they've
already taken over the world.
--- Pedro Domingos
7. What about the limitations of DL?
DL is not magic - it is an incredibly powerful tool for
extracting regularities from data according to a given
objective.
Corollary #1: A DL program will be just as smart as the
data it gets.
Corollary #2: A DL program will be just as smart as the
objective it optimizes.
9. Word embeddings
Can convert words to vectors of numbers - at the
heart of most NLP applications with deep learning
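As a rough illustration of what "words as vectors" buys you, the sketch below compares words by cosine similarity. The three-dimensional vectors are hand-picked and purely hypothetical; real embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical 3-d embeddings, hand-picked for illustration only.
EMBEDDINGS = {
    "king":  [0.9, 0.7, 0.1],
    "queen": [0.9, 0.1, 0.7],
    "apple": [-0.5, 0.4, 0.4],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector is closest in cosine similarity."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))
```

With these toy vectors, most_similar("king") picks "queen" rather than "apple", because the two vectors point in a similar direction.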
10. Embeddings are highly sexist!
Bolukbasi, T., Chang, K.W., Zou, J., Saligrama, V. and Kalai, A., 2016.
Quantifying and reducing stereotypes in word embeddings. arXiv preprint
11. Hundreds of papers were published before this was
openly discussed!
This is because gender biases probably account for an
increase in testing accuracy.
Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V. and Kalai, A.T., 2016.
Man is to computer programmer as woman is to homemaker? Debiasing word
embeddings. In Advances in Neural Information Processing Systems (pp. 4349-4357).
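One way to make this bias measurable, sketched here under strong simplifying assumptions, is to estimate a "gender direction" from a he/she pair and project occupation words onto it. The 2-d vectors below are hypothetical and hand-made, and using a single word pair is a simplification of what the cited paper does; the neutralize step only shows the basic idea of removing the component along that direction.

```python
import math

# Hypothetical 2-d vectors, hand-made for illustration: axis 0 plays the
# role of a "gender" axis and axis 1 of an "occupation" axis.
VECS = {
    "he":         [1.0, 0.0],
    "she":        [-1.0, 0.0],
    "programmer": [0.4, 0.9],
    "homemaker":  [-0.5, 0.8],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    """Scale a vector to length 1."""
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def gender_direction(vecs):
    """Simplification: a single he-she difference, normalised."""
    return unit([a - b for a, b in zip(vecs["he"], vecs["she"])])

def bias(word, vecs, g):
    """Signed projection of the unit word vector on the gender direction."""
    return dot(unit(vecs[word]), g)

def neutralize(word, vecs, g):
    """Remove the component of the word vector along the gender direction."""
    w = vecs[word]
    proj = dot(w, g)
    return [a - proj * b for a, b in zip(w, g)]

g = gender_direction(VECS)
```

In this toy setup bias("programmer", VECS, g) is positive and bias("homemaker", VECS, g) is negative, while the neutralized vectors have no component left along g.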
12. Recent years have brought extraordinary
advances in the technical domains of AI. Alongside such efforts,
designers and researchers from a range of disciplines need to
conduct what we call social-systems analyses of AI. They need to
assess the impact of technologies on their social, cultural and
political settings
--- There is a blind spot in AI research, Nature, 2016
13. Racism is definitely bad PR!
[New Statesman, 2016]
The rise of the racist robots
14. Not just an economic problem
[an investigation] found that the proprietary algorithms widely
used by judges to help determine the risk of reoffending are
almost twice as likely to mistakenly flag black defendants than
white defendants [There is a blind spot in AI research]
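"Mistakenly flag" refers to a false positive: someone flagged as high risk who did not reoffend. A minimal sketch of how one could compare that rate across groups follows; the records are made up for illustration and are not the figures from the investigation quoted above.

```python
from collections import defaultdict

# Made-up records: (group, flagged_high_risk, reoffended).
# These are NOT data from the investigation; they only illustrate the metric.
RECORDS = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", False, False), ("B", True, True),
]

def false_positive_rates(records):
    """Per group: share of people who did not reoffend but were flagged."""
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, was_flagged, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            flagged[group] += was_flagged
    return {group: flagged[group] / negatives[group] for group in negatives}

rates = false_positive_rates(RECORDS)
```

On this toy data the false positive rate is 0.5 for group A and 0.25 for group B, i.e. group A is twice as likely to be flagged by mistake.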
22. Anonymous data?
De Montjoye, Y.A., Radaelli, L. and Singh, V.K., 2015. Unique in the shopping mall: On the reidentifiability of credit card metadata.
Science, 347(6221), pp.536-539.
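The intuition behind reidentification can be sketched as follows: even without names, a handful of known (shop, day) points may match only one person's trace. The traces below are hypothetical, and the function enumerates every combination of k points instead of sampling them at random, purely to keep the example deterministic.

```python
from itertools import combinations

# Made-up "anonymous" traces: each user is a set of (shop, day) purchases.
TRACES = {
    "u1": {("bakery", 1), ("cafe", 2), ("gym", 3)},
    "u2": {("bakery", 1), ("cafe", 2), ("cinema", 4)},
    "u3": {("bakery", 1), ("bookshop", 5), ("gym", 3)},
}

def unicity(traces, k):
    """Fraction of (user, k known points) pairs that match exactly one trace."""
    unique = total = 0
    for user, trace in traces.items():
        for known in combinations(sorted(trace), k):
            matches = [u for u, t in traces.items() if set(known) <= t]
            total += 1
            unique += (matches == [user])
    return unique / total
```

In this toy dataset, knowing one purchase singles out a user in 2 of 9 cases, two purchases in 5 of 9, and three purchases always.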
23. Given access to a black-box classifier, can we infer
whether a specific example was part of the training
dataset?
We can with shadow training:
Shokri, R., Stronati, M., Song, C. and Shmatikov, V., 2017, May. Membership inference attacks against machine learning models. In
2017 IEEE Symposium on Security and Privacy (SP) (pp. 3-18). IEEE.
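The idea of shadow training can be sketched in miniature: train "shadow" models on data we control, where membership is known, record their output confidences for members and non-members, and fit an attack model on those records. Everything below is a toy assumption: the data distribution, the deliberately overfit "model", and the attack model, which is reduced to a single confidence threshold and is far simpler than a real attack classifier.

```python
import math
import random

def sample(n, rng):
    """Draw n points from a stand-in data distribution (2-d Gaussian)."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

def train(points):
    """Toy overfit 'model': confidence decays with the distance to the
    nearest training point, so memorised points score exactly 1.0."""
    def confidence(x):
        d2 = min((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2 for p in points)
        return math.exp(-d2)
    return confidence

def shadow_records(n_shadows, n, rng):
    """Train shadow models on our own data, where membership is known."""
    records = []
    for _ in range(n_shadows):
        members, outsiders = sample(n, rng), sample(n, rng)
        model = train(members)
        records += [(model(x), 1) for x in members]
        records += [(model(x), 0) for x in outsiders]
    return records

def fit_attack(records):
    """Attack 'model': the confidence threshold with best shadow accuracy."""
    def accuracy(t):
        return sum((c >= t) == bool(m) for c, m in records) / len(records)
    return max({c for c, _ in records}, key=accuracy)

rng = random.Random(0)
threshold = fit_attack(shadow_records(n_shadows=5, n=30, rng=rng))

# Attack the black-box target using only its output confidences.
target_members, target_outsiders = sample(30, rng), sample(30, rng)
target = train(target_members)
hits = sum(target(x) >= threshold for x in target_members)
hits += sum(target(x) < threshold for x in target_outsiders)
attack_accuracy = hits / 60
```

Because this toy model memorises its training set perfectly, the attack succeeds almost trivially; real models leak membership far less cleanly, which is why a learned attack classifier is needed in practice.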
24. Privacy in distributed environments
Hitaj, B., Ateniese, G. and Perez-Cruz, F., 2017. Deep Models Under the GAN: Information Leakage from Collaborative Deep
Learning. arXiv preprint arXiv:1702.07464.
28. DL is just a tiny component!
(NIPS 2015)
Hidden Technical Debt in Machine Learning Systems
29. Machine learning offers a fantastically powerful toolkit for
building useful complex prediction systems quickly. ... it is
dangerous to think of these quick wins as coming for free. ... it is
common to incur massive ongoing maintenance costs in
real-world ML systems. [Risk factors include] boundary erosion,
entanglement, hidden feedback loops, undeclared consumers,
data dependencies, configuration issues, changes in the external
world, and a variety of system-level anti-patterns.
--- Hidden Technical Debt in Machine Learning Systems (NIPS 2015)
30. If you are in Rome, check out our
Meetup:
And our new association:
Italian Association for Machine Learning