15. DATA ENGINEERING HAS COME
DATA ENGINEERS ARE GETTING TO PLAY A MORE ACTIVE ROLE
COMPANIES HIRE DATA ENGINEERS
MORE IMPACT FROM BIG DATA TECHNOLOGIES [KAFKA, SPARK, ETC.]
BLESSING FROM ST. MARTIN FOWLER
CROSS-FUNCTIONAL TEAMS PER SUBDOMAIN
20. MOOOARRR…
LOWER COST OF CHANGE
Changes are local and easier to make
Easy to deploy
LOWER MAINTENANCE COSTS
Lower OPEX
Less routine effort from developers
EASY TO STAFF AND REPLACE DEVS
Python or Node.js over Scala
34. BACK TO SQL (2/2)
Trend – develop a component where SQL would be a DSL
LEVERAGE CLOUD CAPABILITIES
ADVANTAGES
• Well-written SQL code (?!) is easier to maintain and reduces complexity
• Lower cost of change
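A minimal sketch of the "SQL as a DSL" idea, using Python's standard sqlite3 module as a stand-in for a real warehouse. The table names (orders, daily_revenue), the sample rows, and the run_sql_step helper are hypothetical and only illustrate the shape: the transformation logic lives in a SQL string, and the surrounding component does nothing but execute it.

```python
import sqlite3

# Hypothetical transformation step, expressed purely in SQL.
DAILY_REVENUE_SQL = """
CREATE TABLE daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM orders
GROUP BY order_date
"""


def run_sql_step(conn, sql):
    """Execute one SQL-defined transformation step; the component adds no logic."""
    conn.execute(sql)
    conn.commit()


def demo():
    # In-memory SQLite stands in for the real storage engine (an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2021-01-01", 10.0), ("2021-01-01", 5.5), ("2021-01-02", 7.0)],
    )
    run_sql_step(conn, DAILY_REVENUE_SQL)
    rows = conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()
    conn.close()
    return rows
```

Changing the business rule here means editing the SQL string only, which is one way to read the "lower cost of change" point.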
35. MORE SERVERLESS
ADVANTAGES
• Faster time to market (TTM)
• Less maintenance effort
• Cheaper solution [in some cases]
Transforming Storage Data
Governance
Data Analytics
44. DATA LAKE HOUSES (2/2)
Separating processing (stateless computational clusters) and persistence
Allows scalable ELT
Single infrastructure for DWH and data lake (hello Data Mesh)
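A rough sketch of separating processing from persistence, assuming a temporary directory of JSON files as a stand-in for shared object storage. The function names (load_raw, transform) and the sample records are hypothetical. The point is that the worker keeps no state of its own: raw data is landed first (the "EL" part), and any disposable worker can run the "T" part later against the same storage.

```python
import json
import tempfile
from pathlib import Path


def load_raw(storage: Path, name: str, records: list) -> None:
    """'EL' step: land raw records in persistent storage without transforming them."""
    (storage / f"{name}.json").write_text(json.dumps(records))


def transform(storage: Path, source: str, target: str) -> None:
    """'T' step: a stateless worker reads from storage and writes its result back."""
    records = json.loads((storage / f"{source}.json").read_text())
    totals = {}
    for record in records:
        totals[record["user"]] = totals.get(record["user"], 0) + record["amount"]
    (storage / f"{target}.json").write_text(json.dumps(totals))


def demo() -> dict:
    # The temporary directory plays the role of the persistence layer (an assumption).
    with tempfile.TemporaryDirectory() as tmp:
        storage = Path(tmp)
        load_raw(
            storage,
            "raw_events",
            [
                {"user": "a", "amount": 2},
                {"user": "b", "amount": 5},
                {"user": "a", "amount": 3},
            ],
        )
        # Any worker could run this step; nothing is kept between calls.
        transform(storage, "raw_events", "user_totals")
        return json.loads((storage / "user_totals.json").read_text())
```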
46. AI-DRIVEN DATA ANALYTICS (1/2)
INSIGHT GENERATION
AUTOMATION
NATURAL LANGUAGE INTERACTION
ADVANCED ANALYTICS
49. WHY BIG DATA MIGHT MATTER TO YOU
• How does big data influence ordinary engineering teams?
• More chances to see big data on a project
• Some big data technologies
• Martin Fowler? Cross-functional teams
50. • Target audience
• Managers, engineers
• ETA
• Why this talk might interest other engineers. No hooks for them
• How does big data influence engineering teams?
• Message
• The message is to give you a brief understanding of what big data looks like
and the latest trends there, and, who knows, maybe you will change your religion
• Give people outside the big data world an outlook on what is going on.
51. ideas
• Multiple tiers
• technology
• align things with experience from Sigma Software (demand from
clients) – simplicity of maintenance, ETA,
• Source
• Maturity of big data
• Challenges
• A way to split this talk into stories
• Kind of a great metaphor where all sections would be included
somehow
52. • - dbt has become very popular lately (https://www.getdbt.com/)
• - Data lake related: Dremio + LakeFS
• - data testing: https://greatexpectations.io/
• - trends such as Lakehouse + Data Mesh (see also
https://www.infoq.com/articles/ai-ml-data-engineering-trends-2021/)
• - data discovery: https://www.amundsen.io/ +
https://github.com/linkedin/datahub
• you may also find something useful here; I post there almost every
day (https://t.me/bigdatadriven)
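To make the data testing item concrete, here is a minimal plain-Python sketch of expectation-style checks. It deliberately does not use the Great Expectations API. The helper names (expect_not_null, expect_between, validate), the column names, and the sample rows are all hypothetical.

```python
def expect_not_null(rows, column):
    """Return the rows that violate a 'column must not be null' expectation."""
    return [row for row in rows if row.get(column) is None]


def expect_between(rows, column, low, high):
    """Return the rows whose non-null value falls outside [low, high]."""
    return [
        row
        for row in rows
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def validate(rows):
    """Run every expectation and keep only the ones that have failing rows."""
    checks = {
        "user_id not null": expect_not_null(rows, "user_id"),
        "age between 0 and 120": expect_between(rows, "age", 0, 120),
    }
    return {name: failing for name, failing in checks.items() if failing}


# Hypothetical sample data: the second and third rows each break one expectation.
SAMPLE = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 28},
    {"user_id": 3, "age": 150},
]
```

Calling validate(SAMPLE) returns a report keyed by expectation name, which a pipeline could use to fail a run before bad data reaches downstream consumers.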
55. trends
• Data Lake Houses
• 1st generation
• 2nd generation (Snowflake, Dremio)
• Back to SQL (dbt)
• Everywhere, on all tiers, from execution to storage
• ETL productization
• More cloud
• More PaaS
• Data mesh