Engineering Data Pipeline for Data-Driven Analytics
1. Engineering a Pipeline for Data-Driven Analytics
Data-Driven Engineering
Speaker: Segzpair
2. Introduction
We’ve understood the need for data in making
user-centered decisions through in-depth analysis.
Have you ever wondered how data processing platforms
are built to drive intelligence?
I will be taking you on a ride to further distill big data:
collection, transformation, modeling and analytics.
4. Ad Tracker
- Facebook Ad Engine, Google AdSense
Infrastructure Monitoring
- DataDog, NewRelic
Atlassian Products
- Confluence, Jira, Bitbucket
Collaboration Tools
- Slack, Skype
Your platform
- website, startup-tool etc.
Data Crunching platforms
Existing Platforms
Already crunching data (not considering the scale of data)
5. Sources of Data Generation
- First, Second and Third Party Data
Forms of Data Existence
- Text, Multimedia (Audio, Images and Video)
Speed of Data Generation
- batch and stream processes
Quick Insight
What is Big Data
Why, Where and Relevance of Big Data | The 3 Vs of Data
8. Sources of data
Data ingestion
Collection and Extraction...
Batch Data Collection
- The use of Airflow
Stream Data Ingestion
- custom plugins into systems
- Kafka as a streaming tool
Things to consider when choosing a technology
- Rate of data generation
- Processing rate
- Cost-effectiveness of the tool
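
The first two considerations can be checked with simple arithmetic before committing to a tool: if data is generated faster than it is processed, a backlog builds up. A minimal sketch, assuming steady rates measured in events per second; the function name and the numbers are hypothetical and not from the talk:

```python
def backlog_after(gen_rate: float, proc_rate: float, seconds: float) -> float:
    """Events left unprocessed after `seconds`, given steady rates in events/s."""
    # A negative difference means the processor keeps up, so no backlog forms.
    return max(0.0, (gen_rate - proc_rate) * seconds)


# Hypothetical load: 5,000 events/s generated, 4,000 events/s processed, one hour.
print(backlog_after(5_000, 4_000, 3_600))
```

A backlog that grows without bound suggests the processing side needs to scale out, or that a batch window is too small for the incoming volume.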
9. Processing
Transformation and data Standardization
Data Catalog
- on-boarding principles
- data label categorization and standardization
Data Privacy and Security
- obfuscating PII
- infrastructure security and restricted access
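
One common way to obfuscate PII is to replace each sensitive value with a keyed hash before the record leaves the processing step. A minimal sketch using Python's standard `hmac` and `hashlib` modules; the field names, the key and the `obfuscate` helper are assumptions for illustration, not the pipeline described in the talk:

```python
import hashlib
import hmac

# Assumption: in practice the key comes from a secrets manager, never source code.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical set of field names treated as PII.
PII_FIELDS = {"email", "phone"}


def obfuscate(record: dict) -> dict:
    """Return a copy of `record` with PII values replaced by keyed SHA-256 hashes."""
    out = {}
    for key, value in record.items():
        if key in PII_FIELDS and value is not None:
            digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
            out[key] = digest.hexdigest()
        else:
            out[key] = value
    return out


event = {"email": "user@example.com", "phone": "+2348000000000", "page": "/pricing"}
print(obfuscate(event))
```

Because the hash is deterministic for a given key, the same email always maps to the same token, so records can still be joined downstream without exposing the raw value.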
Identity Resolution
- real-time correlation of identity attributes
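
Correlating identity attributes can be pictured as grouping identifiers that show up together in the same event. A minimal sketch using a union-find structure; this is one possible technique, not necessarily the one used in the talk, and the class name and identifier formats are hypothetical:

```python
class IdentityGraph:
    """Groups identifiers (email, device ID, phone) seen together into one profile."""

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        # Unknown identifiers start out as their own single-member profile.
        self._parent.setdefault(item, item)
        while self._parent[item] != item:
            # Path halving keeps lookups fast as the graph grows.
            self._parent[item] = self._parent[self._parent[item]]
            item = self._parent[item]
        return item

    def observe(self, *identifiers):
        """Record that these identifiers appeared in the same event."""
        roots = [self._find(i) for i in identifiers]
        for root in roots[1:]:
            self._parent[root] = roots[0]

    def same_person(self, a, b):
        return self._find(a) == self._find(b)


graph = IdentityGraph()
graph.observe("email:a@example.com", "device:123")
graph.observe("device:123", "phone:555")
print(graph.same_person("email:a@example.com", "phone:555"))
```

Each new event only touches the identifiers it contains, which is what makes this shape of structure a reasonable fit for correlation as events stream in.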
Tracking
- process tracking and monitoring using CloudWatch
Executing to Scale
10. Storage Mechanism
Storage
Big data storage in the cloud, the cheapest approach
Data in Lake
- cloud infrastructure usage like S3
- optimized data storage format in Parquet
- security on network layer using private subnets and VPC
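
Parquet files in a lake are often laid out under date-partitioned keys so that query engines can skip irrelevant data. A minimal sketch of such a key scheme; the `key=value` partition layout, the dataset name and the `lake_key` helper are assumptions for illustration and are not taken from the talk:

```python
from datetime import datetime, timezone


def lake_key(dataset: str, event_time: datetime, part: int) -> str:
    """Build a date-partitioned object key for one Parquet file."""
    # Normalize to UTC so every writer agrees on which day a record belongs to.
    t = event_time.astimezone(timezone.utc)
    return (
        f"{dataset}/year={t:%Y}/month={t:%m}/day={t:%d}/"
        f"part-{part:05d}.parquet"
    )


print(lake_key("clicks", datetime(2024, 1, 5, 13, 30, tzinfo=timezone.utc), 0))
```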
Data Validation
- queryable data using Athena
- first-level visibility with QuickSight
Accessibility for ML and Other Processes
- using SageMaker
11. More Complex Approach
Data Presentation
Visualization and Presentation...
Optimized Access
- Data loaded on Elasticsearch and Aerospike
Visualization
- any graph visualization tool of choice
12. Intelligent Reach
Meeting needs of users with ease
Adrenaline and Marketing Console
- offline targeting
Adatrix
- non-programmatic online targeting
Demand Side Platform (DSP)
- programmatic online reach
Meeting the right audience
13. Conclusion
Data in its raw state isn’t of much use, until it is refined and
well arranged.
This is one of the many things we do at Terragon.
Making Data Meaningful in a MarTech space.