This document discusses how modern artificial intelligence relies on massive amounts of data rather than on complex algorithms. It gives examples of how companies such as Google, Facebook, and WeChat have improved their services by using long data (more records from many users) and wide data (merged information of different types about the same users). The author argues that algorithms are typically well known and hard to own, whereas data is easy to own, so data ownership lets companies gain market power.
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspective
1. 9/19/2019 Heiko Paulheim 1
Big Data, Smart Algorithms, and Market Power
A Computer Scientist’s Perspective
Heiko Paulheim
Chair for Data Science
University of Mannheim
Introductory Example: GPS vs. Smart Phones
• Tests show: smart phones do the job better
– with smart phones on the rise, GPS sales decline
[Chart: GPS sales vs. smartphone sales]
Source: Statista
Data for Germany;
US looks similar
Computer Science Interlude: Navigation
• Problem: find the shortest path through a network
• Solution: known since the 1950s
– can be written down in less than 20 lines
[Figure: road network from Start to End, edges labeled with lengths between 1 km and 3 km]
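A minimal sketch of such a shortest-path search, assuming the slide refers to Dijkstra's algorithm (one well-known solution from that era; the slide does not name it). The function name `shortest_path` and the road network `roads_km` are hypothetical and do not reproduce the figure.

```python
import heapq

def shortest_path(graph, start, end):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)  # cheapest unexplored entry first
        if node == end:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical road network (not the one on the slide); weights are km.
roads_km = {
    "Start": {"A": 2, "B": 1},
    "A": {"Start": 2, "C": 2, "End": 3},
    "B": {"Start": 1, "C": 1},
    "C": {"A": 2, "B": 1, "End": 1},
    "End": {"A": 3, "C": 1},
}

distance, route = shortest_path(roads_km, "Start", "End")
print(distance, route)  # 3 ['Start', 'B', 'C', 'End']
```

The same function finds the fastest route instead of the shortest one if the edge weights are travel times rather than distances.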
Computer Science Interlude: Navigation
• Usually, we do not want the shortest way
– but the fastest
• We need to estimate times
[Figure: road network from Start to End, edges labeled with travel times between 0:05 and 0:15]
Estimating Times for Edges
• Static: path length and speed limit
• Dynamic: live car movements
• Google Maps: owned by Google
– So is Android (market share US: 48%, Germany: 73%, China: 79%)
– i.e., about one Android phone in every other car
Source: https://gs.statcounter.com/os-market-share/mobile/
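The difference between static and dynamic estimates can be sketched as follows. This is an illustration under stated assumptions, not how Google Maps works: the function names, the `min_samples` threshold, and the use of a median over observed traversal times are all hypothetical choices.

```python
from statistics import median

def static_minutes(length_km, speed_limit_kmh):
    """Static estimate: path length and speed limit only."""
    return length_km * 60 / speed_limit_kmh

def dynamic_minutes(length_km, speed_limit_kmh, observed_minutes, min_samples=3):
    """Dynamic estimate: median of live traversal times reported by phones.

    Falls back to the static estimate when too few cars have been observed
    (min_samples is an assumed threshold).
    """
    if len(observed_minutes) < min_samples:
        return static_minutes(length_km, speed_limit_kmh)
    return median(observed_minutes)

# A 5 km edge with a 50 km/h limit takes 6 minutes on paper ...
print(static_minutes(5, 50))                             # 6.0
# ... but live data from four cars suggests congestion.
print(dynamic_minutes(5, 50, [9.0, 11.0, 10.0, 12.0]))   # 10.5
# With only one observation, the static estimate is used.
print(dynamic_minutes(5, 50, [9.0]))                     # 6.0
```

The more phones report traversal times for an edge, the more often the dynamic branch applies, which is where the data advantage comes from.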
Visual Depiction
• One Android phone in every other car
Image: Bing Maps
Improving Navigation
• Ingredients:
– A simple standard textbook algorithm from the 1950s
– A lot of data
• Better navigation
– Usually: not by smarter algorithms
– But by better (=bigger) data!
[Figure: road network from Start to End, edges labeled with travel times between 0:05 and 0:25]
Image: https://neo4j.com/blog/top-13-resources-graph-theory-algorithms/
A.I. Winters and A Paradigm Shift
• AI has seen a massive uptake since the 2010s
– But using very different paradigms
[Chart annotations: 1st AI Winter, 2nd AI Winter]
Fast & Horvitz (2016): Long-Term Trends in the Public Perception of Artificial Intelligence
An Example of AI: Go
• 1990s
– Using handcrafted rules
• i.e., smart algorithms
– Often defeated by children
• 2010s
– Using data from millions of games
• i.e., big data
– AlphaGo: beat some of the world’s best players in 2016
AI in the Big Data Age (1)
• Algorithms are fairly simple and well known
• Data matters
Banko & Brill (2001): Scaling to Very Very Large Corpora for Natural Language Disambiguation
[Chart annotations: smarter algorithm vs. more data]
AI in the Big Data Age (2)
• Algorithms are fairly simple and well known
• Data matters
Banko & Brill (2001): Scaling to Very Very Large Corpora for Natural Language Disambiguation
• More data: a trivial baseline beats smart algorithms
Big Data: Long vs. Wide Data
• Long data = more records of the same kind
– e.g., GPS data from more users
• Wide data = more information about the same records
– e.g., additional information about users
Lehmberg & Hassanzadeh (2018): Ontology Augmentation Through Matching with Web Tables
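The distinction can be illustrated with a small sketch in plain Python. All names and values here (`trips`, `make_longer`, `make_wider`, the user IDs and attributes) are hypothetical and serve only to show the two directions in which a dataset can grow.

```python
# Hypothetical GPS records: one dict per user.
trips = [
    {"user": "u1", "avg_speed_kmh": 42},
    {"user": "u2", "avg_speed_kmh": 55},
]

def make_longer(records, new_records):
    """Long data: more records of the same kind (same columns, more rows)."""
    return records + new_records

def make_wider(records, extra_by_user):
    """Wide data: more information about the same records (same rows, more columns)."""
    return [{**r, **extra_by_user.get(r["user"], {})} for r in records]

# Longer: GPS data from one more user.
longer = make_longer(trips, [{"user": "u3", "avg_speed_kmh": 38}])

# Wider: additional information about the existing users.
wider = make_wider(trips, {
    "u1": {"home_city": "Mannheim"},
    "u2": {"home_city": "Hamburg"},
})

print(len(longer), sorted(longer[0]))  # 3 ['avg_speed_kmh', 'user']
print(len(wider), sorted(wider[0]))    # 2 ['avg_speed_kmh', 'home_city', 'user']
```

Making data wider requires a shared key (here `user`) across the merged sources, which is why combining services under one owner matters.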
It’s All about Patterns in Data
• Examples
– Traffic movements
– Online user behavior
– Cliques in social networks
– …
• Methods:
– Data Mining
– Machine Learning
– …
→ Intensively researched since the 1980s
Image: https://factordaily.com/balaraman-ravindran-reinforcement-learning/
Big Data: Long vs. Wide Data
• Example: YouTube (owned by Google)
– Display videos to the user that are as interesting as possible
• Long data: users’ interaction histories
• Wide data: users’ interaction histories + Google Web searches + visited places + Google Play music preferences + ...
Big Data: Long vs. Wide Data
• Example: Facebook
– Display as much content of interest as possible
• Long data: user profile and interactions
• Wide data: user profile and interactions + WhatsApp chats
– In Germany, OVG Hamburg prohibits this combination!
Image: https://www.instagram.com/p/Bt3OG4DFOsK/
Big Data: Long vs. Wide Data
• Example: WeChat
• Started as chat application
– showing advertisements based on chats
– later added: apps-in-app (shopping, payment, …)
– CS perspective: rather an OS than an app
• Long data
– Many people’s chats
• Wide data
– Chats
– Shopping history (also includes: products viewed)
– Payment history
Image: Wikipedia
Takeaways
• Modern AI Systems
– Rely on massive amounts of data
– Processed with fairly simple algorithms
• Algorithms are often well known
– e.g., textbooks, research papers
– It is hard to own an algorithm
• Data is crucial
– Longer data (e.g., acquiring more customers)
– Wider data (e.g., merging businesses)
– It is easy to own data