SlideShare una empresa de Scribd logo
1 de 58
Descargar para leer sin conexión
Architecting for change:
LinkedIn's new data ecosystem
Sept 28, 2016
Shirshanka Das, Principal Staff Engineer, LinkedIn
Yael Garten, Director of Data Science, LinkedIn
@shirshanka, @yaelgarten
Design for change. Expect it. Embrace it.
Product Change Technology Culture
&
Process
Learnings
The Product Change: 

Launch a completely rewritten LinkedIn mobile app
What does this impact?
Data driven
product
Tracking data records user activity
InvitationClickEvent()
Tracking data records user activity
InvitationClickEvent()
Scale fact:

~ 1000 tracking event types, 

~ Double-digit TB per day, 

hundreds of metrics & data
products
user
engagement
tracking data
metric scripts
production code
Tracking Data Lifecycle
TransportProduce Consume
Member facing
data products
Business facing
decision making
Tracking Data Lifecycle & Teams
TransportProduce Consume
Product or App teams:
PMs, Developers, TestEng
Infra teams:
Hadoop, Kafka, DWH, ...
Data teams: 

Analytics, Relevance Engineers,...
user
engagement
tracking data
metric scripts
production code
Member facing
data products
Business facing
decision making
How do we calculate a metric: ProfileViews
PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
Metric: 

ProfileViews = sum(PageViewEvent)

where pageKey = profile_page

PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"new_profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
or new_profile_page
PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
CASE	
		WHEN	trackingInfo["profileIds"]	...	
		WHEN	trackingInfo["profileid"]	...	
		WHEN	trackingInfo["profileId"]	...	
		WHEN	trackingInfo["url$profileIds"]	...	
		WHEN	trackingInfo["11"]	LIKE	'%profileIds=%'	THEN	SUBSTRING(trackingInfo["11"],9,60)	
		WHEN	trackingInfo["12"]	LIKE	'%priceIds=%'	THEN	SUBSTRING(trackingInfo["12"],9,60)	
		ELSE	NULL	
END	AS	profile_id
Evolution as we mature and grow...
Metric: ProfileViews = sum(PageViewEvent

where 

pagekey = profile_page and
memberID != trackinginfo[vieweeID] )
Eventually… unmaintainable
get_tracking_codes	=	foreach	get_domain_rolled_up	generate	
		..entry_domain_rollup,	
	 (		(tracking_code	matches	'eml-ced.*'	or	tracking_code	matches	'eml-b2_content_ecosystem
_digest.*'	
				or	(referer	is	not	null	and	(referer	matches	'.*touch.linkedin.com.*trk=eml-ced.*'		
				or	referer	matches	'.*touch.linkedin.com.*trk=eml-b2_content_ecosystem_digest.*'))	?	'Email	-	CED'	:						
(tracking_code	matches	'eml-.*'	or	(referer	is	not	null	and	referer	matches	'.*touch.linkedin.com.*trk=eml-.*')	
or	entry_domain_rollup	==	'Email'	?	'Email	-	Other'	:	
								(tracking_code	==	'hp-feed-article-title-hpm'	and	entry_domain_rollup	==	'Linkedin'	?	'Homepage	Pulse	
Module'	:	((tracking_code	matches	'hp-feed-.*'	and	entry_domain_rollup	==	'Linkedin')	or	(std_user_interface	
matches	'(phone	app|tablet	app|phone	browser|tablet	browser)'	and	tracking_code	==	'v-feed')			or	
(tracking_code	==	'Organic	Traffic'	and	entry_domain_rollup	==	'Linkedin'	and	(referer	==	'https://
www.linkedin.com/nhome'	or	referer	==	'http://www.linkedin.com/nhome'))	?	'Feed'	:	
												(tracking_code	matches	'hb_ntf_MEGAPHONE_.*'	and	entry_domain_rollup	==	'Linkedin'	?	'Desktop	
Notifications'	:			(tracking_code	==	'm_sim2_native_reader_swipe_right'	?	'Push	Notification'	:		(tracking_code	==	
'pulse_dexter_stream_scroll'		and	entry_domain_rollup	==	'Linkedin'	?	'Pulse	-	Infinite	Scroll	on	Dexter'	:	--infinite	
scroll	on	dexter	
			((tracking_code	==	'pulse_dexter_nav_click'	or	tracking_code	==	'pulse-det-nav_art')	and	entry_domain_rollup	==	
'Linkedin'	?	'Pulse	-	Left	Rail	Click	on	Dexter'	:	--left	rail	click	on	dexter	
																				(tracking_code	==	'Organic	Traffic'	and	referer	is	not	null	and	referer	matches	'.*linkedin.com/pulse
/article.*'	?	'Publishing	Platform'	:	
																						'None	Found	Yet'))))))))))	as	entry_point;	
Homepage
team
Push Notification
team
Email team
Long form post
team
PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
We wanted to move to better data models
LI_ProfileViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	4745292951145,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
"entityView"	:	{	
					"viewType"	:	"profile-view",	
					"viewerId"	:	“12345”,		
"vieweeId"	:	“23456”,		
		},	
}
Two options:

1. Keep the old tracking:
a. Cost: producers (try to) replicate it (write bad old code
from scratch),
b. Save: consumers avoid migrating.



2. Evolve.
a. Cost: time on data modeling, and on consumer
migration,
b. Save: pays down data modeling tech debt
How much work would it be?
How much work would it be?
Two options:
1. Keep the old tracking:
a. Cost: producers (try to) replicate it (write bad old code
from scratch),
b. Save: consumers avoid migrating.



2. Evolve.
a. Cost: time on data modeling, and on consumer
migration,
b. Save: pays down data modeling tech debt
2000 daysEstimated cost to update consumers to new tracking with clean, committee-approved data models
Estimated cost for producers to attempt to replicate old tracking
5000 days
#AnalyticsHappiness
The Task and Opportunity
Must do: So we will do the data modeling, and rewrite all the metrics to
account for the changes happening upstream… but…



Extra credit points: How do we make sure that the cost is not this high the
next time?




How do we handle evolution in a principled way?
Product Change Technology Culture
&
Process
Learnings
Metrics ecosystem at LinkedIn: 3 yrs ago
Operational Challenges
Diminished Trust due to multiple sources of truth
Data Stages
Ingest Process Serve VisualizeCreate
Ingest Process Serve VisualizeCreate
Tracking
Kafka
Espresso
…
Tracking Architecture
SDKs in different frameworks
(server, client)
Tracking front-end
Monitoring Tools
Components
KafkaClient-side
Tracking
Tracking
Frontend
Services
Tools
Create
Data Stages
Ingest Process Serve VisualizeCreate
Unified Ingestion with
Hundreds of TB / day
Thousands of datasets
80+% of data ingest
Ingest
In production @ LinkedIn, Intel, Swisscom, NerdWallet, PayPal
Ingest
Ingest Process Serve VisualizeCreate
Hadoop
Processing engines @ LinkedInProcess
Ingest Process Serve VisualizeCreate
Pinot
Pinot
Kafka Hadoop
Samza Jobs
Pinot
minutes
hour +
Distributed Multi-dimensional OLAP
Columnar + indexes
No joins
Latency: low ms to sub-second
Serve
Site-facing	Apps Reporting	dashboards Monitoring
In production @
LinkedIn, Uber
Serve
Ingest Process VisualizeCreate
Hadoop Pinot Raptor
Serve
Ingest Process VisualizeCreate
Unified Metrics Platform (UMP)
Hadoop Pinot Raptor
Serve
Unified Metrics Platform
Metrics Logic
Raw
Data
Pinot
UMP Harness
Incremental
Aggregate
Backfill
Auto-join
Raptor
dashboards
HDFS
Aggregated
Data
Experiment
Analysis
Relevance
...
HDFS
Ad-hoc
Ingest Process Serve VisualizeCreate
RaptorKafka

Espresso

…
Hadoop Pinot
Tracking Unified Metrics Platform (UMP)
How do we handle old and new?
PageViewEvent
ProfileViewEvent
Producers Consumers
old
new
Relevance
Analytics
The Big Challenge
load “/data/tracking/PageViewEvent” using AvroStorage()
(Pig scripts)
My Raw Data
Our scripts were doing ….
My Raw Data
My Data API
We need “microservices" for Data
The Database community solved this
decades ago...
Views!
We had been working on something that could help...
A Data Access Layer for Linkedin
Abstract away underlying physical details to allow users to
focus solely on the logical concerns
Logical Tables + Views
Logical FileSystem
Solving
With
Views
Producers
LinkedInProfileView
PageViewEvent
ProfileViewEvent
new
old
Consumers
pagekey==
profile
1:1
Relevance
Analytics
Views
ecosystem
41
Producers Consumers
LinkedInProfileView
JSAProfileView
Job Seeker App
(JSA)
LinkedIn App
UnifiedProfileView
Data Catalog +
Discovery
(DALI)
DaliFileSystem Client
Data Source
(HDFS)
Data Sink
(HDFS)
Processing Engine
(MapReduce, Spark)
DALI Datasets (Tables + Views)
Query Layers
(Hive, Pig, Spark)
View Defs +
UDFs
(Artifactory, Git)
Dataflow APIs
(MR, Spark,
Scalding)
DALI CLI
Dali: Implementation Details in Context
From
load ‘/data/tracking/PageViewEvent’
using AvroStorage();
To
load ‘tracking.UnifiedProfileView’ using
DaliStorage();
One small step for a script
A Few Hard Problems
Versioning
Views and UDFs
Mapping to Hive metastore entities
Development lifecycle
Git as source of truth
Gradle for build
LinkedIn tooling integration for deployment
Early experiences with Dali views
How we executed
Lots of work to get the infra ready
Closed beta model
Tons of training and education (hand holding) for all
Governance body
Feedback from analysts is overwhelmingly positive:
+ Much simpler to share and standardize data cleansing code with peers
+ Provides effective insulation to scripts from upstream changes
- Harder to debug where problems are due to additional layer
State of the world today
~100 producer views
~200 consumer views
~30% of UMP metrics use Dali data
sources
~80 unique tracking event data sources
ProfileViews
MessagesSent
Searches
InvitationsSent
ArticlesRead
JobApplications
...
What’s next for Dali?
Real-time Views on streaming data
Selective materialization
Hive is an implementation detail, not a long term bet
Open source
Data Quality Framework
Product Change Technology Culture
&
Process
Learnings
Infrastructure enables, but culture really preserves
get_tracking_codes	=	foreach	get_domain_rolled_up	generate	
		..entry_domain_rollup,	
	 (		(tracking_code	matches	'eml-ced.*'	or	tracking_code	matches	'eml-b2_content
_ecosystem_digest.*'	
				or	(referer	is	not	null	and	(referer	matches	'.*touch.linkedin.com.*trk=eml-ced.*'		
				or	referer	matches	'.*touch.linkedin.com.*trk=eml-b2_content_ecosystem_digest.*'))	?	'Email	
-	CED'	:						(tracking_code	matches	'eml-.*'	or	(referer	is	not	null	and	referer	matches	'.*touch.linkedin
.com.*trk=eml-.*')	or	entry_domain_rollup	==	'Email'	?	'Email	-	Other'	:	
								(tracking_code	==	'hp-feed-article-title-hpm'	and	entry_domain_rollup	==	'Linkedin'	?	'Homepage	
Pulse	Module'	:	((tracking_code	matches	'hp-feed-.*'	and	entry_domain_rollup	==	'Linkedin')	or	
(std_user_interface	matches	'(phone	app|tablet	app|phone	browser|tablet	browser)'	and	tracking_code	
==	'v-feed')			or	(tracking_code	==	'Organic	Traffic'	and	entry_domain_rollup	==	'Linkedin'	and	(referer	==	
'https://www.linkedin.com/nhome'	or	referer	==	'http://www.linkedin.com/nhome'))	?	'Feed'	:	
												(tracking_code	matches	'hb_ntf_MEGAPHONE_.*'	and	entry_domain_rollup	==	'Linkedin'	?	
'Desktop	Notifications'	:			(tracking_code	==	'm_sim2_native_reader_swipe_right'	?	'Push	Notification'	:		
(tracking_code	==	'pulse_dexter_stream_scroll'		and	entry_domain_rollup	==	'Linkedin'	?	'Pulse	-
For a great data ecosystem that can handle change:
1. Standardize core data entities

2. Create clear maintainable contracts between data producers
& consumers

3. Ensure dialogue between data producers & consumers

1. Standardize core data entities
• Event types and names: Page, Action, Impression
• Framework level client side tracking: views, clicks, flows
• For all else (custom) - guide when to create a new Event or Dali view

Navigation
Page View
Control Interaction
2. Create clear maintainable contracts
1
1. Tracking specification with monitoring: clear, visual, consistent contract
Need tooling to support culture shift

Tracking specification Tool
2
2. Dali dataset specification with data quality rules
3. Ensure dialogue between Producers & Consumers
• Awareness: Train about end-to-end data pipeline, data modeling
• Instill communication & collaborative ownership process between all: a step-by-step
playbook for who & how to develop and own tracking

PM → Analyst → Engineer → All3 → TestEng → Analyst
user engagement
tracking data
metric 

scripts
production

code
Member facing

data products
Business facing
decision making
Product Change Technology Culture
&
Process
Learnings
Our Learnings
Culture and Process
● Spend time to identify what needs culture & process, and 

what needs tools & tech
● Big changes can mean big opportunities
● Very hard to massively change things like data culture or data tech debt; never
a good time to invest in “invisible” behind-the-scenes change

→ Make it non-invisible -- try to clarify or size out the cost of NOT doing it

→ needed strong leaders, and a village

Tech
● Must build tooling to support that culture change otherwise culture will revert
● Work hard to make any new layer as frictionless as possible
● Virtual views on Hadoop data can work at scale! (Dali views)
For a great data ecosystem that can handle change:
1. Standardize core data entities

2. Create clear maintainable contracts between data producers & consumers

3. Ensure dialogue between data producers & consumers

Design for change. Expect it. Embrace it.
Did we succeed? We just handled another huge change!
#AnalyticsHappiness
Thank you.
@shirshanka, @yaelgarten

Más contenido relacionado

La actualidad más candente

MongoDB_Spark
MongoDB_SparkMongoDB_Spark
MongoDB_Spark
Mat Keep
 

La actualidad más candente (20)

Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data Platform
 
Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Scaling Your Data: Data Democratisation and DataOps
Scaling Your Data: Data Democratisation and DataOpsScaling Your Data: Data Democratisation and DataOps
Scaling Your Data: Data Democratisation and DataOps
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
ADV Slides: The World in 2045 – What Has Artificial Intelligence Created?
ADV Slides: The World in 2045 – What Has Artificial Intelligence Created?ADV Slides: The World in 2045 – What Has Artificial Intelligence Created?
ADV Slides: The World in 2045 – What Has Artificial Intelligence Created?
 
ATAAS2016 - Big data analytics – data visualization himanshu and santosh
ATAAS2016 - Big data analytics – data visualization   himanshu and santoshATAAS2016 - Big data analytics – data visualization   himanshu and santosh
ATAAS2016 - Big data analytics – data visualization himanshu and santosh
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017
 
Milkrun routing optimization
Milkrun routing optimizationMilkrun routing optimization
Milkrun routing optimization
 
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entitySpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
 
Shortest path routing
 Shortest path routing Shortest path routing
Shortest path routing
 
The Rise of the Citizen Data Scientist
The Rise of the Citizen Data ScientistThe Rise of the Citizen Data Scientist
The Rise of the Citizen Data Scientist
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
 
MongoDB_Spark
MongoDB_SparkMongoDB_Spark
MongoDB_Spark
 
Accelerating Digital Transformation With Microsoft Azure And Cognitive Services
Accelerating Digital Transformation With Microsoft Azure And Cognitive ServicesAccelerating Digital Transformation With Microsoft Azure And Cognitive Services
Accelerating Digital Transformation With Microsoft Azure And Cognitive Services
 
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 

Destacado

Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Yael Garten
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in Impala
Cloudera, Inc.
 

Destacado (20)

How to use your data science team: Becoming a data-driven organization
How to use your data science team: Becoming a data-driven organizationHow to use your data science team: Becoming a data-driven organization
How to use your data science team: Becoming a data-driven organization
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Présentation Micropole 3eme Forum MDM - 19 novembre 2014
Présentation Micropole 3eme Forum MDM - 19 novembre 2014Présentation Micropole 3eme Forum MDM - 19 novembre 2014
Présentation Micropole 3eme Forum MDM - 19 novembre 2014
 
DATA FORUM MICROPOLE - 2015
DATA FORUM MICROPOLE - 2015DATA FORUM MICROPOLE - 2015
DATA FORUM MICROPOLE - 2015
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuning
 
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
 
Impala SQL Support
Impala SQL SupportImpala SQL Support
Impala SQL Support
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in Impala
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and Analysis
 
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateApache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance Update
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
 
Fasten you seatbelt and listen to the Data Steward
Fasten you seatbelt and listen to the Data StewardFasten you seatbelt and listen to the Data Steward
Fasten you seatbelt and listen to the Data Steward
 
SecPod: A Framework for Virtualization-based Security Systems
SecPod: A Framework for Virtualization-based Security SystemsSecPod: A Framework for Virtualization-based Security Systems
SecPod: A Framework for Virtualization-based Security Systems
 
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
 
Nested Types in Impala
Nested Types in ImpalaNested Types in Impala
Nested Types in Impala
 
BIg Data Trends in 2016
BIg Data Trends in 2016BIg Data Trends in 2016
BIg Data Trends in 2016
 
Building a Data Driven Organization
Building a Data Driven OrganizationBuilding a Data Driven Organization
Building a Data Driven Organization
 
1DMP: Marketing Data Platform - the future of data-driven marketing
1DMP: Marketing Data Platform - the future of data-driven marketing1DMP: Marketing Data Platform - the future of data-driven marketing
1DMP: Marketing Data Platform - the future of data-driven marketing
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platform
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 

Similar a Architecting for change: LinkedIn's new data ecosystem

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Kai Wähner
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
VMware Tanzu
 

Similar a Architecting for change: LinkedIn's new data ecosystem (20)

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik FeldmanBackstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
 
Digital Twin: A radical new approach to IoT
Digital Twin: A radical new approach to IoTDigital Twin: A radical new approach to IoT
Digital Twin: A radical new approach to IoT
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
 
Scaling Legacy
Scaling LegacyScaling Legacy
Scaling Legacy
 
Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving ITDynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
 
Real-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case studyReal-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case study
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
DataAquitaine February 2022
DataAquitaine February 2022DataAquitaine February 2022
DataAquitaine February 2022
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo JapanAI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Applying linear regression and predictive analytics
Applying linear regression and predictive analyticsApplying linear regression and predictive analytics
Applying linear regression and predictive analytics
 
Microsoft Graph: The API for Microsoft 365
Microsoft Graph: The API for Microsoft 365Microsoft Graph: The API for Microsoft 365
Microsoft Graph: The API for Microsoft 365
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWS
 

Último

Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 

Último (20)

Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 

Architecting for change: LinkedIn's new data ecosystem