This document discusses architectures for AI systems and big data engineering. It covers what constitutes data engineering; how to store and process event data using pipelines; when to use batch vs. stream processing; how AI models can be integrated into data pipelines; and considerations for engineering systems that combine big data and AI, such as non-functional requirements, reprocessing data as models improve, and storing model outputs. The overall aim is to give an overview of key concepts in engineering systems that handle large volumes of data and incorporate artificial intelligence.
10. How to query the news feed?
SELECT *
FROM posts
INNER JOIN friends
WHERE ...
ORDER BY posts.timestamp DESC
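A minimal, runnable version of the query above, sketched with Python's built-in sqlite3 module. The slide leaves the join condition and the WHERE clause unspecified, so the schema (the author_id, user_id and friend_id columns) and the join on posts.author_id = friends.friend_id are assumptions for illustration, not the original design.

```python
import sqlite3

# Hypothetical schema: the slide only names the tables `posts` and `friends`
# and the column `posts.timestamp`; every other column name is an assumption.
SCHEMA = """
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL,
    body      TEXT NOT NULL,
    timestamp INTEGER NOT NULL      -- Unix epoch seconds
);
CREATE TABLE friends (
    user_id   INTEGER NOT NULL,     -- the viewer
    friend_id INTEGER NOT NULL      -- a friend of the viewer
);
"""

# Feed for one viewer: posts written by the viewer's friends, newest first.
FEED_QUERY = """
SELECT posts.id, posts.author_id, posts.body, posts.timestamp
FROM posts
INNER JOIN friends ON posts.author_id = friends.friend_id
WHERE friends.user_id = ?
ORDER BY posts.timestamp DESC
LIMIT ?
"""


def fetch_feed(conn: sqlite3.Connection, user_id: int, limit: int = 20) -> list[tuple]:
    """Return up to `limit` feed rows for `user_id`, newest first."""
    return conn.execute(FEED_QUERY, (user_id, limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3)])
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?)",
        [(10, 2, "hello", 100), (11, 3, "hi", 200), (12, 4, "not a friend", 300)],
    )
    print(fetch_feed(conn, user_id=1))
```

With the sample rows, user 1 sees post 11 and then post 10; post 12 is excluded because its author is not a friend. At scale, running this join on every feed read is costly, which is one reason to ask the storage questions that follow.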
11. Questions raised by a single post:
● Notify? Web, mobile?
● Who can see this?
● Racist? Vulgar?
● Is this a face? Who’s this? Friend? Celebrity?
● Courtney likes. Is that good or bad?
● Paddy commented. Is that good or bad?
● Chris posted. Is that good or bad?
● Anybody tagged?
● What rank in feed?
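Each question on this slide is a decision the system may have to make when a post arrives. One way to picture this, offered only as a sketch, is a pipeline of enrichment stages that each add an annotation to the event, with ranking last. The stages are hypothetical placeholders: a keyword blocklist stands in for a moderation model, the notification list just echoes tagged users, and the ranking weights are made-up constants rather than a learned model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PostEvent:
    """A new post plus the engagement seen so far (hypothetical fields)."""
    post_id: int
    author_id: int
    text: str
    likes: int = 0
    comments: int = 0
    tags: list[int] = field(default_factory=list)
    annotations: dict = field(default_factory=dict)  # each stage writes its output here


Stage = Callable[[PostEvent], PostEvent]

# Placeholder for a trained moderation model ("Racist? Vulgar?").
BLOCKLIST = {"badword"}


def moderate(event: PostEvent) -> PostEvent:
    words = set(event.text.lower().split())
    event.annotations["flagged"] = bool(words & BLOCKLIST)
    return event


def notify_tagged(event: PostEvent) -> PostEvent:
    # "Anybody tagged?" -> users who should be notified.
    event.annotations["notify"] = list(event.tags)
    return event


def rank(event: PostEvent) -> PostEvent:
    # "What rank in feed?" -> made-up weights, not a learned model.
    score = 1.0 + 2.0 * event.likes + 3.0 * event.comments
    if event.annotations.get("flagged"):
        score = 0.0  # flagged posts sink to the bottom
    event.annotations["rank_score"] = score
    return event


PIPELINE: list[Stage] = [moderate, notify_tagged, rank]


def run_pipeline(event: PostEvent, stages: list[Stage] = PIPELINE) -> PostEvent:
    """Apply each stage in order; later stages can read earlier annotations."""
    for stage in stages:
        event = stage(event)
    return event
```

For example, `run_pipeline(PostEvent(1, 42, "hello world", likes=3, comments=1, tags=[7])).annotations` returns `{'flagged': False, 'notify': [7], 'rank_score': 10.0}`. Keeping each stage a plain function makes it easy to swap a placeholder for a real model later without changing the pipeline.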
22. How to store events?
● Volume?
● Velocity? QPS reads? QPS writes?
● Latency?
● Cost? Storage & R/W
● How to write?
  ○ Integrity?
  ○ Consistency?
  ○ Durability?
  ○ Version?
● How to read?
  ○ Random access or sequential?
  ○ Full text search?
  ○ Geo distance?
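The volume and velocity questions can be turned into a back-of-envelope storage estimate. A minimal sketch, assuming a fixed average event size and an optional replication factor; the numbers in the example are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def daily_ingest_bytes(writes_per_second: float, avg_event_bytes: float) -> float:
    """Raw bytes written per day at a steady write rate."""
    return writes_per_second * avg_event_bytes * SECONDS_PER_DAY


def retained_bytes(writes_per_second: float, avg_event_bytes: float,
                   retention_days: int, replication_factor: int = 1) -> float:
    """Bytes on disk after `retention_days`, counting every replica."""
    return (daily_ingest_bytes(writes_per_second, avg_event_bytes)
            * retention_days * replication_factor)


def human(n: float) -> str:
    """Format a byte count with decimal (SI) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1000:
            return f"{n:.1f} {unit}"
        n /= 1000
    return f"{n:.1f} PB"
```

For example, 1,000 writes per second of 500-byte events gives `human(daily_ingest_bytes(1000, 500))`, i.e. "43.2 GB" per day, and `human(retained_bytes(1000, 500, 30, 3))` gives "3.9 TB" for 30 days kept at 3x replication. Figures like these feed directly into the cost question.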
23. How to store events?
                   MySQL   MongoDB   JSON on S3 (or GCS)
30 GB              OK      Good      Very good
10K WPS            OK      Good      Very good
1K RPS             OK      Good      Very good
Range read         OK      Good      Very good
Cost               $$      $$$       $

                   MySQL   MongoDB
30 GB              OK      Good
10K WPS            OK      Good
1K RPS             OK      Good
Sequential read    OK      Good
Cost               $$      $$$
(WPS = writes per second; RPS = reads per second)
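One common way to make JSON on S3 (or GCS) cheap to range-read is to write newline-delimited JSON under date-partitioned keys, so that a date range maps to a small set of key prefixes. This is a sketch under assumptions: the events/dt=.../hour=... key layout and the part-numbering scheme are illustrative, and the code only builds keys and payloads. Uploading and listing would go through an object-store client (for S3 with boto3, `put_object` and `list_objects_v2` with a `Prefix`), which this sketch does not call.

```python
import json
from datetime import date, datetime, timedelta

PREFIX = "events"  # hypothetical key prefix inside the bucket


def object_key(ts: datetime, part: int) -> str:
    """Key for one batch file, partitioned by day and hour."""
    return f"{PREFIX}/dt={ts:%Y-%m-%d}/hour={ts:%H}/part-{part:05d}.jsonl"


def to_jsonl(events: list[dict]) -> bytes:
    """Serialize a batch as newline-delimited JSON (one event per line)."""
    lines = (json.dumps(e, separators=(",", ":"), sort_keys=True) + "\n" for e in events)
    return "".join(lines).encode("utf-8")


def prefixes_for_range(start: date, end: date) -> list[str]:
    """Day prefixes covering [start, end] inclusive; list these to range-read."""
    days = (end - start).days
    return [f"{PREFIX}/dt={(start + timedelta(days=i)).isoformat()}/" for i in range(days + 1)]
```

For example, `object_key(datetime(2019, 5, 1, 13), 7)` yields "events/dt=2019-05-01/hour=13/part-00007.jsonl". Each file is read sequentially, and a range read only touches the files under the matching day prefixes.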
43. What to do with the sink?
[Diagram: writes go into the sink; reads come out to data scientists and sales]
44. What are the read use cases?
● Aggregation: "Give me a summary report of last month’s activity."
● Full text search: "Give me posts that contain the words Donald Trump, Trump or President."
● Bulk data, filtered: "Give me all posts by female users, age 18-35."
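The three read use cases have very different access patterns, which a toy in-memory dataset makes visible. This is only an illustration: the post records and field names are hypothetical, and the search is a naive case-insensitive substring match. At scale, these three reads would typically be served by different systems (for example a warehouse for aggregation, a search index for full text, and a scan over files for filtered bulk export).

```python
from collections import Counter
from datetime import date

# Hypothetical post records; field names are illustrative only.
POSTS = [
    {"id": 1, "day": date(2019, 4, 3), "text": "President speaks today", "gender": "female", "age": 25},
    {"id": 2, "day": date(2019, 4, 3), "text": "Lunch with friends", "gender": "male", "age": 40},
    {"id": 3, "day": date(2019, 4, 17), "text": "Donald Trump tweets again", "gender": "female", "age": 50},
    {"id": 4, "day": date(2019, 5, 1), "text": "New month", "gender": "female", "age": 19},
]


def monthly_summary(posts: list[dict], year: int, month: int) -> Counter:
    """Aggregation: number of posts per day in the given month."""
    return Counter(p["day"] for p in posts if (p["day"].year, p["day"].month) == (year, month))


def search(posts: list[dict], terms: list[str]) -> list[int]:
    """Full text search (naive): ids of posts whose text contains any term as a substring."""
    lowered = [t.lower() for t in terms]
    return [p["id"] for p in posts if any(t in p["text"].lower() for t in lowered)]


def filtered_bulk(posts: list[dict], gender: str, min_age: int, max_age: int) -> list[int]:
    """Bulk data, filtered: ids of every post matching the demographic filter."""
    return [p["id"] for p in posts if p["gender"] == gender and min_age <= p["age"] <= max_age]
```

On this sample, `search(POSTS, ["Donald Trump", "Trump", "President"])` returns `[1, 3]` and `filtered_bulk(POSTS, "female", 18, 35)` returns `[1, 4]`. The aggregation returns a small summary, the search returns a few matching rows, and the bulk read may return a large fraction of the data.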
60. Want to learn more about AI & Big Data?
We’re hiring:
● Big Data Engineer, in training (Java)
● Big Data Engineer (Java)
● Data Scientist (Python)
http://bit.ly/quod-ai-join
Herve Roussel, herve@quod.ai