How Lucene Powers the LinkedIn Segmentation & Targeting Platform
Lucene/SOLR Revolution EU, November 2013
Hien Luu, Raj Rangaswamy
©2013 LinkedIn Corporation. All Rights Reserved.
About Us
Hien Luu
Rajasekaran Rangaswamy
  
Agenda
§  A little bit about LinkedIn
§  Segmentation & Targeting Platform Overview
§  How Lucene powers the Segmentation & Targeting Platform
§  Q&A

Our Mission
Connect the world’s professionals to make them
more productive and successful.

Our Vision
Create economic opportunity for every
professional in the world.

Members First!
The world’s largest professional network
Over 65% of members are now international

	
  
•  >30M
•  >90% of Fortune 100 companies use LinkedIn Talent Solutions to hire
•  >3M Company Pages
•  19 languages
•  >5.7B professional searches in 2012
Other Company Facts
•  Headquartered in Mountain View, Calif., with offices around the world
•  LinkedIn has ~4200 full-time employees located around the world

Source: http://press.linkedin.com/about
Segmentation & Targeting

Bhaskar Ghosh

Attribute types
Segmentation & Targeting
1. Create attributes
§  Name
§  Email
§  State
§  Occupation
§  Etc.

2. Attributes Added to Table
Name         Email             State        Occupation
John Smith   jsmith@blah.com   California   Engineer
Jane Smith   smithj@mail.com   Nevada       HR Manager
Jane Doe     jdoe@email.com    California   Engineer
…

3. Create Target Segment: California, Engineer
Name         Email             State        Occupation
John Smith   jsmith@blah.com   California   Engineer
Jane Doe     jdoe@email.com    California   Engineer

4. Export List & Send to Vendor
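The four steps above can be sketched in a few lines of Python. This is only an illustration built from the example rows on the slide; the `Member` class and the `build_segment` and `export_list` helpers are hypothetical names, not part of the platform described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    # Step 1: the attributes (hypothetical container for the slide's columns).
    name: str
    email: str
    state: str
    occupation: str


# Step 2: the attribute table, using the rows shown on the slide.
MEMBERS = [
    Member("John Smith", "jsmith@blah.com", "California", "Engineer"),
    Member("Jane Smith", "smithj@mail.com", "Nevada", "HR Manager"),
    Member("Jane Doe", "jdoe@email.com", "California", "Engineer"),
]


def build_segment(members, **criteria):
    """Step 3: keep members whose attributes equal every given criterion."""
    return [
        m for m in members
        if all(getattr(m, attr) == value for attr, value in criteria.items())
    ]


def export_list(segment):
    """Step 4: render the segment as CSV text that could be sent to a vendor."""
    lines = ["name,email"]
    lines += [f"{m.name},{m.email}" for m in segment]
    return "\n".join(lines)


segment = build_segment(MEMBERS, state="California", occupation="Engineer")
print(export_list(segment))
```

A linear scan like this is fine for three rows; the rest of the talk is about why a table with hundreds of millions of rows needs an index instead.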
  

  
Segmentation & Targeting

§  Business definition
–  The business would like to launch new campaigns often
–  The business would like to specify targeting criteria using an arbitrary set of attributes
–  Attributes need to be computed to fulfill the targeting criteria
–  The attribute data resides on Hadoop or TD
–  The business is most comfortable with a SQL-like language
Segmentation & Targeting
[Diagram: Attribute Computation Engine and Attribute Serving Engine]
Segmentation & Targeting
Attribute Computation Engine
•  Attribute consolidation
•  Self-service
•  Support various data sources
•  Attribute availability
Segmentation & Targeting
[Diagram: attribute computation over PB- and TB-scale data sources; figures shown: ~238M and ~440]
Segmentation & Targeting
Attribute Serving Engine
•  Build segments
•  Build lists
•  Self-service
•  Attribute predicate expression
Segmentation & Targeting
Serving Engine
•  count
•  filter
•  sum
•  complex expressions
LinkedIn Member Attribute table (~238M × ~440)
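As a rough illustration of the count, filter and sum operations named on this slide, the sketch below runs them over a tiny in-memory attribute table. The table contents, the attribute names (`region`, `job_seeker`, `propensity`) and the helper functions are all assumptions made for the example; the real engine serves these operations from Lucene indexes, not Python dictionaries.

```python
# Hypothetical miniature attribute table: member ID -> attribute values.
TABLE = {
    1: {"region": "Europe", "job_seeker": True, "propensity": 0.5},
    2: {"region": "Europe", "job_seeker": False, "propensity": 0.25},
    3: {"region": "North America", "job_seeker": True, "propensity": 0.75},
}


def filter_ids(table, predicate):
    """filter: IDs of the members whose attributes satisfy the predicate."""
    return [member_id for member_id, attrs in table.items() if predicate(attrs)]


def count(table, predicate):
    """count: how many members satisfy the predicate."""
    return len(filter_ids(table, predicate))


def total(table, predicate, attribute):
    """sum: add up one numeric attribute over the matching members."""
    return sum(table[member_id][attribute] for member_id in filter_ids(table, predicate))


def in_europe(attrs):
    return attrs["region"] == "Europe"


print(filter_ids(TABLE, lambda attrs: attrs["job_seeker"]))
print(count(TABLE, in_europe))
print(total(TABLE, in_europe, "propensity"))
```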

LinkedIn Segmentation & Targeting Platform
Who are the job seekers?

Who are the LinkedIn Talent Solution prospects
in Europe?

Who are the North American recruiters who don’t work for a competitor?

LinkedIn Segmentation & Targeting Platform

Complex tree-like attribute predicate expressions
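A tree-like predicate expression can be pictured as nested AND/OR/NOT nodes with attribute comparisons at the leaves. The sketch below evaluates such a tree against one member's attributes; the node layout (`op`, `children`, `child`, `attr`, `value`) and the sample attributes are hypothetical, chosen to mirror the "North American recruiters who don't work for a competitor" question, and are not the platform's actual expression format.

```python
def evaluate(node, attrs):
    """Recursively evaluate a predicate tree against one member's attributes."""
    op = node["op"]
    if op == "and":
        return all(evaluate(child, attrs) for child in node["children"])
    if op == "or":
        return any(evaluate(child, attrs) for child in node["children"])
    if op == "not":
        return not evaluate(node["child"], attrs)
    if op == "eq":
        return attrs.get(node["attr"]) == node["value"]
    raise ValueError(f"unknown operator: {op}")


# Hypothetical expression: North American recruiters not working for a competitor.
EXPRESSION = {
    "op": "and",
    "children": [
        {"op": "eq", "attr": "region", "value": "North America"},
        {"op": "eq", "attr": "occupation", "value": "Recruiter"},
        {"op": "not",
         "child": {"op": "eq", "attr": "works_for_competitor", "value": True}},
    ],
}

# Hypothetical members used only to exercise the expression.
SAMPLE_MEMBERS = {
    "alice": {"region": "North America", "occupation": "Recruiter",
              "works_for_competitor": False},
    "bob": {"region": "North America", "occupation": "Recruiter",
            "works_for_competitor": True},
    "carol": {"region": "Europe", "occupation": "Recruiter",
              "works_for_competitor": False},
}

matches = [name for name, attrs in SAMPLE_MEMBERS.items()
           if evaluate(EXPRESSION, attrs)]
print(matches)
```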

Agenda
§  Architecture
–  Indexer Architecture
–  Serving Architecture

§  Load Balanced Model
§  Next Steps - Distributed Model
§  DocValues
§  Lessons Learnt
§  Why not use an existing solution?

Architecture
[Diagram: Attribute Serving Engine, Attribute Computation Engine, Data Storage Layer]
[Diagram: Attribute Indexing, Attribute Creation Engine, Attribute Serving Engine, Attribute Materialization Engine]

Indexer
[Diagram: Attribute Metastore (mysql attribute store), Attribute Definitions, Avro data in HDFS, Hadoop Indexer MR, shard 1, shard 2, shard n in HDFS, Index Merger, Web Servers, Index]
•  Mapper: K => AvroKey<GenericRecord>, V => AvroValue<NullWritable>
•  Reducer: K => NullWritable, V => LuceneDocumentWrapper
•  LuceneOutputFormat: RecordWriter, LuceneDocumentWrapper, Document
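The indexer diagram shows a Hadoop job writing several shards that an Index Merger later combines. The sketch below imitates that shape in plain Python: records are partitioned into shards, each shard gets its own small inverted index, and a merge step unions them. It is a conceptual stand-in only. The function names, the CRC32 partitioning rule and the sample records are assumptions, and no Hadoop or Lucene API is used.

```python
import zlib
from collections import defaultdict


def shard_of(member_id, num_shards):
    """Pick a shard with a stable hash (an assumed partitioning rule)."""
    return zlib.crc32(str(member_id).encode("utf-8")) % num_shards


def build_shard_indexes(records, num_shards):
    """Build one inverted index per shard: (attribute, value) -> member IDs."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for member_id, attrs in records:
        index = shards[shard_of(member_id, num_shards)]
        for attr, value in attrs.items():
            index[(attr, value)].add(member_id)
    return shards


def merge_indexes(indexes):
    """Union the posting sets of several shard indexes into one index."""
    merged = defaultdict(set)
    for index in indexes:
        for term, postings in index.items():
            merged[term] |= postings
    return merged


# Hypothetical records standing in for the Avro data.
RECORDS = [
    (1, {"state": "California", "occupation": "Engineer"}),
    (2, {"state": "Nevada", "occupation": "HR Manager"}),
    (3, {"state": "California", "occupation": "Engineer"}),
]

shard_indexes = build_shard_indexes(RECORDS, num_shards=2)
merged = merge_indexes(shard_indexes)
print(sorted(merged[("state", "California")]))
```

Building many small partitions in parallel and merging them in a separate step matches the advice given later under Lessons Learnt.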
Serving
[Diagram: JSON Predicate Expression, JSON Lucene Query Parser, three Inverted Index instances, Segment & List]
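To make the serving path more concrete, the sketch below parses a JSON predicate expression and answers it with set operations over an inverted index, which is the general idea behind turning a predicate into an index query. The JSON layout, the field names and the posting lists are invented for illustration; the talk does not show the real expression format, and the actual parser produces Lucene queries, not Python sets.

```python
import json

# Hypothetical inverted index: (field, value) -> set of document IDs.
POSTINGS = {
    ("region", "Europe"): {1, 2},
    ("region", "North America"): {3, 4},
    ("occupation", "Recruiter"): {2, 3, 4},
    ("competitor", "yes"): {4},
}
ALL_DOCS = {1, 2, 3, 4}


def run_query(node):
    """Resolve a parsed predicate tree to the set of matching document IDs."""
    op = node["op"]
    if op == "term":
        return set(POSTINGS.get((node["field"], node["value"]), set()))
    if op == "and":
        result = set(ALL_DOCS)
        for child in node["children"]:
            result &= run_query(child)
        return result
    if op == "or":
        result = set()
        for child in node["children"]:
            result |= run_query(child)
        return result
    if op == "not":
        return ALL_DOCS - run_query(node["child"])
    raise ValueError(f"unknown operator: {op}")


# Hypothetical JSON predicate expression as it might arrive over HTTP.
EXPRESSION = """
{"op": "and", "children": [
    {"op": "term", "field": "region", "value": "North America"},
    {"op": "term", "field": "occupation", "value": "Recruiter"},
    {"op": "not", "child": {"op": "term", "field": "competitor", "value": "yes"}}
]}
"""

segment_ids = run_query(json.loads(EXPRESSION))
print(sorted(segment_ids))
```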
Serving – Load Balanced Model
[Diagram: HTTP Request, Load Balancer, Web Server 1 (Shard 1), Web Server 2 (Shard 2), Web Server n (Shard n), Shared Drive]
Serving – Load Balanced Model

But Wait…..
•  Is load balancing alone good enough?
•  What about distribution and failover?

Next Steps - Distributed Model

Apache Helix
•  A generic cluster management framework
•  Used to manage partitioned and replicated resources in distributed systems
•  Built on top of ZooKeeper; hides the complexity of ZK primitives
•  Provides distributed features such as leader election and two-phase commit via a state machine model
http://helix.incubator.apache.org/
Next Steps - Distributed Model
[Diagram: HTTP Request, Load Balancer, Scatter Gather, three web servers]
Web Server 1    Shard 1 active     Shard 2 standby
Web Server 2    Shard 2 active     Shard 3 standby
Web Server 3    Shard 3 active     Shard 1 standby
Next Steps - Distributed Model
[Diagram: HTTP Request, Load Balancer, Scatter Gather, three web servers; failure case]
Web Server 1    Shard 1 active     Shard 2 standby
Web Server 2    Shard 2 active     Shard 3 active
Web Server 3    Shard 3 failure    Shard 1 failure
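The failover shown in the diagram can be sketched as a small routing table: each shard has an active and a standby replica, and when a web server fails the scatter-gather layer routes that shard to its surviving replica. The `Cluster` class, the server names and the per-shard counts below are hypothetical, and this toy model is not the Helix API; Helix drives such transitions through its state machine model.

```python
class Cluster:
    """Toy placement map: shard -> [active server, standby server]."""

    def __init__(self, placement):
        self.placement = placement
        self.failed = set()

    def fail(self, server):
        """Mark a web server as down."""
        self.failed.add(server)

    def active_server(self, shard):
        """Return the first live replica; the standby takes over on failure."""
        for server in self.placement[shard]:
            if server not in self.failed:
                return server
        raise RuntimeError(f"no live replica for shard {shard}")


def scatter_gather(cluster, shard_counts):
    """Route a count query to one live replica per shard and add up the results.

    shard_counts stands in for the per-shard answers a real query would return.
    """
    routed = {shard: cluster.active_server(shard) for shard in shard_counts}
    return sum(shard_counts.values()), routed


# Placement matching the diagram: Shard 1 is active on Web Server 1, and so on.
cluster = Cluster({
    1: ["ws1", "ws3"],
    2: ["ws2", "ws1"],
    3: ["ws3", "ws2"],
})

total_before, routed_before = scatter_gather(cluster, {1: 10, 2: 20, 3: 30})
cluster.fail("ws3")
total_after, routed_after = scatter_gather(cluster, {1: 10, 2: 20, 3: 30})
print(routed_before, routed_after)
```

After Web Server 3 fails, Shard 3 is served by Web Server 2 and the total stays the same, which is the property the distributed model is after.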

DocValues – Use Case
•  Once segments are built, users want to forecast, that is, to see a target revenue projection for the campaigns they want to run
•  Campaigns can be run on various revenue models
•  This involves adding per-member propensity scores and dollar amounts

DocValues – Why not Stored Fields?
•  Stored fields have one indirection per document, resulting in two disk seeks per document
–  .fdx: given the document ID, fetch the file pointer to the field data
–  .fdt: scan by ID until the field is found
•  Performance cost quickly adds up when fetching millions of documents
DocValues – Why not Field Cache?
•  Is memory resident
•  Works fine when there is enough memory
•  But keeping millions of un-inverted values in memory is impossible
•  Additional cost to parse values (from String and to String)

DocValues
•  Dense column-based storage (1 value per document, and 1 column per field and segment)
•  Accepts primitives
•  No conversion from/to String needed
•  Loads 80x-100x faster than building a FieldCache
•  All the work is done during indexing
•  DocValues fields can be indexed and stored too
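The column-per-field layout can be pictured with typed arrays indexed by document ID: one dense column for each numeric field, read directly as primitives with no String parsing. The sketch below uses Python's `array` module for the columns and sums a projection over the documents in a segment. The column names, the values and the "propensity times dollar amount" formula are assumptions for illustration; this is not how Lucene encodes DocValues on disk.

```python
from array import array

# One dense column per field; position i holds the value for document ID i.
# Values are hypothetical and stored as primitive doubles.
propensity = array("d", [0.5, 0.25, 0.0, 1.0])
dollar_amount = array("d", [100.0, 200.0, 50.0, 40.0])


def projected_revenue(doc_ids):
    """Sum propensity * dollar amount over the matching documents.

    The formula is an assumed revenue model, used only to show a per-document
    column lookup followed by an aggregation.
    """
    return sum(propensity[doc_id] * dollar_amount[doc_id] for doc_id in doc_ids)


# Document IDs that a segment query is assumed to have matched.
segment_doc_ids = [0, 1, 3]
print(projected_revenue(segment_doc_ids))
```

Each lookup is a direct array access by document ID, in contrast to the two seeks per document that the stored-fields slide describes.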

Lessons Learnt
Indexing
•  Reuse index writer, field and document instances
•  Create many partitions and merge them in a different process
•  Rebuild (bootstrap) the entire index if possible
•  Use partial updates with caution
•  Analyze the index
Serving
•  Reuse a single instance of IndexSearcher
•  Limit usage of stored fields and term vectors
•  Plan for load balancing and failover
•  Cache term frequencies
•  Use different machines for serving and indexing

Why not use an existing solution?
•  Doesn’t allow dynamic schema
•  Difficult to bootstrap indexes built in Hadoop
•  Indexing elevates query latency

•  Doesn’t allow dynamic schema
•  Difficult to bootstrap indexes built in Hadoop
•  Larger memory overhead
•  Comparatively slow

Questions?
More info: data.linkedin.com

©2013 LinkedIn Corporation. All Rights Reserved.

Más contenido relacionado

La actualidad más candente

Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0SpringPeople
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopDataWorks Summit
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4Nigel Jones
 
Foreign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with PostgresForeign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with PostgresEDB
 
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneOpen-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneInnovative Management Services
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkAgnihotriGhosh2
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
OpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers OverviewOpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers OverviewKingsley Uyi Idehen
 
Virtuoso Universal Server Overview
Virtuoso Universal Server OverviewVirtuoso Universal Server Overview
Virtuoso Universal Server Overviewrumito
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIAndrew Brust
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...Alex Gorbachev
 
Fifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas TalkFifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas TalkVimal Sharma
 

La actualidad más candente (20)

Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise Hadoop
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4
 
Foreign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with PostgresForeign Data Wrappers and You with Postgres
Foreign Data Wrappers and You with Postgres
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneOpen-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
 
Big data course
Big data  courseBig data  course
Big data course
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
OpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers OverviewOpenLink Virtuoso - Management & Decision Makers Overview
OpenLink Virtuoso - Management & Decision Makers Overview
 
Virtuoso Universal Server Overview
Virtuoso Universal Server OverviewVirtuoso Universal Server Overview
Virtuoso Universal Server Overview
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BI
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
Under The Hood of Pluggable Databases by Alex Gorbachev, Pythian, Oracle OpeW...
 
Fifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas TalkFifth Elephant Apache Atlas Talk
Fifth Elephant Apache Atlas Talk
 

Similar a How Lucene Powers the LinkedIn Segmentation and Targeting Platform

How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHien Luu
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to PostgresEDB
 
The Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesThe Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesEDB
 
Key Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to PostgresKey Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to PostgresEDB
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationAmy W. Tang
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldMaria Colgan
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQServiceRocket
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaMarketingArrowECS_CZ
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationEmbarcadero Technologies
 
SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)Alan Eardley
 
SharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the CryptSharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the CryptJohn Mongell
 
LinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformLinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformHien Luu
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)Sid Anand
 
LinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationLinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationDataWorks Summit
 
Introduction to Active Directory
Introduction to Active DirectoryIntroduction to Active Directory
Introduction to Active DirectoryJalpesh Vadgama
 
(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020Markus Michalewicz
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?Nicolas Georgeault
 
#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?Tammy Bednar
 
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...Chris Muir
 

Similar a How Lucene Powers the LinkedIn Segmentation and Targeting Platform (20)

How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to Postgres
 
The Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesThe Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle Databases
 
Key Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to PostgresKey Methodologies for Migrating from Oracle to Postgres
Key Methodologies for Migrating from Oracle to Postgres
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data Application
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous World
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQ
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
 
SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)SharePoint Databases: What you need to know (201509)
SharePoint Databases: What you need to know (201509)
 
SharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the CryptSharePoint Migrations Pitfalls from the Crypt
SharePoint Migrations Pitfalls from the Crypt
 
LinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformLinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting Platform
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
 
LinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationLinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data Application
 
Introduction to Active Directory
Introduction to Active DirectoryIntroduction to Active Directory
Introduction to Active Directory
 
(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020(Oracle) DBA and Other Skills Needed in 2020
(Oracle) DBA and Other Skills Needed in 2020
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?
 
#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?
 
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...Oracle ADF Architecture TV -  Planning & Getting Started - Team, Skills and D...
Oracle ADF Architecture TV - Planning & Getting Started - Team, Skills and D...
 
Muruga logeswaran CV-Senior .Net Developer
Muruga logeswaran CV-Senior .Net DeveloperMuruga logeswaran CV-Senior .Net Developer
Muruga logeswaran CV-Senior .Net Developer
 

Más de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Más de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
How Lucene Powers the LinkedIn Segmentation and Targeting Platform

  • 1. How Lucene Powers LinkedIn Segmentation & Targeting Platform Lucene/SOLR Revolution EU, November 2013 Hien Luu, Raj Rangaswamy ©2013 LinkedIn Corporation. All Rights Reserved.
  • 2. About Us: Hien Luu, Rajasekaran Rangaswamy
  • 3. Agenda: a little bit about LinkedIn; Segmentation & Targeting Platform overview; how Lucene powers the Segmentation & Targeting Platform; Q&A.
  • 4. Our Mission: Connect the world’s professionals to make them more productive and successful. Our Vision: Create economic opportunity for every professional in the world. Members First!
  • 5. The world’s largest professional network. Over 65% of members are now international. >30M; >90% of Fortune 100 companies use LinkedIn Talent Solutions to hire; >3M Company Pages; 19 languages; >5.7B professional searches in 2012.
  • 6. Other Company Facts: Headquartered in Mountain View, Calif., with offices around the world. LinkedIn has ~4200 full-time employees located around the world. Source: http://press.linkedin.com/about
  • 7. Segmentation & Targeting
  • 9. Segmentation & Targeting Bhaskar Ghosh Attribute types
  • 10. Segmentation & Targeting
      1. Create attributes: Name, Email, State, Occupation, etc.
      2. Attributes added to table:
           Name        | Email            | State      | Occupation
           John Smith  | jsmith@blah.com  | California | Engineer
           Jane Smith  | smithj@mail.com  | Nevada     | HR Manager
           Jane Doe    | jdoe@email.com   | California | Engineer
           …
      3. Create target segment (California, Engineer):
           Name        | Email            | State      | Occupation
           John Smith  | jsmith@blah.com  | California | Engineer
           Jane Doe    | jdoe@email.com   | California | Engineer
      4. Export list and send to vendor.
  • 11. Segmentation & Targeting: business definition. The business would like to launch new campaigns often. The business would like to specify targeting criteria using an arbitrary set of attributes. Attributes need to be computed to fulfill the targeting criteria. The attribute data resides on Hadoop or TD. The business is most comfortable with a SQL-like language.
  • 12. Segmentation & Targeting: Attribute Computation Engine; Attribute Serving Engine.
  • 13. Segmentation & Targeting: Attribute Computation Engine. Attribute consolidation; self-service; support for various data sources; attribute availability.
  • 14. Segmentation & Targeting: attribute computation (diagram labels: PB, TB, TB, ~238M, ~440).
  • 15. Segmentation & Targeting: Attribute Serving Engine. Self-service; attribute predicate expressions; build segments; build lists.
  • 16. Segmentation & Targeting: Serving Engine over the LinkedIn Member Attribute table (diagram labels: ~238M, ~440). Operations: count, filter, sum, complex expressions.
  • 17. LinkedIn Segmentation & Targeting Platform. Who are the job seekers? Who are the LinkedIn Talent Solution prospects in Europe? Who are North American recruiters that don’t work for a competitor?
  • 18. LinkedIn Segmentation & Targeting Platform: complex tree-like attribute predicate expressions.
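
A tree-like attribute predicate expression of the kind slides 16 to 18 describe can be sketched in a few lines. This is only an illustration in plain Python, not the platform's implementation: the `Leaf` and `Node` classes, the operator names, the member records, and the company names are all hypothetical. It evaluates the question "Who are North American recruiters that don't work for a competitor?" and then applies count and sum to the resulting segment.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class Leaf:
    """A single attribute test, e.g. region == "North America"."""
    attribute: str
    op: str          # one of the keys of _OPS below
    value: Any


@dataclass
class Node:
    """An inner node combining child expressions with and / or / not."""
    operator: str
    children: List[Union["Node", Leaf]]


# Hypothetical comparison operators; the slides do not list the real ones.
_OPS = {
    "eq": lambda actual, expected: actual == expected,
    "in": lambda actual, expected: actual in expected,
    "gt": lambda actual, expected: actual is not None and actual > expected,
}


def matches(expr: Union[Node, Leaf], member: Dict[str, Any]) -> bool:
    """Recursively evaluate the expression tree against one member record."""
    if isinstance(expr, Leaf):
        return _OPS[expr.op](member.get(expr.attribute), expr.value)
    results = [matches(child, member) for child in expr.children]
    if expr.operator == "and":
        return all(results)
    if expr.operator == "or":
        return any(results)
    if expr.operator == "not":
        # "not" negates the disjunction of its children.
        return not any(results)
    raise ValueError(f"unknown operator: {expr.operator}")


# Made-up member attribute table (the real one has far more rows and columns).
members = [
    {"id": 1, "region": "North America", "occupation": "Recruiter", "company": "Acme", "dollars": 120.0},
    {"id": 2, "region": "North America", "occupation": "Recruiter", "company": "RivalCo", "dollars": 80.0},
    {"id": 3, "region": "Europe", "occupation": "Recruiter", "company": "Acme", "dollars": 50.0},
    {"id": 4, "region": "North America", "occupation": "Engineer", "company": "Acme", "dollars": 200.0},
    {"id": 5, "region": "North America", "occupation": "Recruiter", "company": "Initech", "dollars": 30.0},
]

expr = Node("and", [
    Leaf("region", "eq", "North America"),
    Leaf("occupation", "eq", "Recruiter"),
    Node("not", [Leaf("company", "in", {"RivalCo"})]),
])

segment = [m for m in members if matches(expr, m)]       # filter
segment_count = len(segment)                             # count
segment_dollars = sum(m["dollars"] for m in segment)     # sum
```

Scanning every record like this is only workable for a toy table; the deck's point is that an inverted index answers the same expressions without touching every row.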
  • 19. Agenda: Architecture (Indexer Architecture, Serving Architecture); Load Balanced Model; Next Steps - Distributed Model; DocValues; Lessons Learnt; Why not use an existing solution?
  • 20. Architecture (diagram labels): Attribute Computation Engine; Attribute Serving Engine; Data Storage Layer; Attribute Creation Engine; Attribute Materialization Engine; Attribute Metastore; Attribute Indexing; Attribute Serving Engine.
  • 21. Indexer (diagram labels). Inputs: attribute definitions (mysql attribute store); Avro data in HDFS. Hadoop Indexer MR: Mapper (K => AvroKey<GenericRecord>, V => AvroValue<NullWritable>); Reducer (K => NullWritable, V => LuceneDocumentWrapper); LuceneOutputFormat with a RecordWriter (LuceneDocumentWrapper, Document). Outputs: shard 1, shard 2, ..., shard n in HDFS; Index Merger; index on the web servers.
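
The partition-then-merge flow on the indexer slide can be sketched without Hadoop or Lucene. The sketch below is a plain-Python stand-in under several assumptions: records are partitioned by `member_id` modulo the shard count (the slide does not say how records are assigned to shards), each shard is a dictionary from `(attribute, value)` terms to posting sets, and `merge_indexes` plays the role of the Index Merger. All function names are hypothetical.

```python
from collections import defaultdict


def shard_of(member_id, num_shards):
    """Assumed partitioning rule: member id modulo the number of shards."""
    return member_id % num_shards


def build_shard_indexes(records, num_shards):
    """Build one small inverted index per shard (the 'reduce' side of the job)."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for record in records:
        index = shards[shard_of(record["member_id"], num_shards)]
        for attribute, value in record.items():
            if attribute != "member_id":
                # Term -> set of member ids that carry this attribute value.
                index[(attribute, value)].add(record["member_id"])
    return shards


def merge_indexes(shards):
    """Merge per-shard postings into a single index, as a separate step."""
    merged = defaultdict(set)
    for index in shards:
        for term, postings in index.items():
            merged[term] |= postings
    return merged


records = [
    {"member_id": 1, "state": "California", "occupation": "Engineer"},
    {"member_id": 2, "state": "Nevada", "occupation": "HR Manager"},
    {"member_id": 3, "state": "California", "occupation": "Engineer"},
    {"member_id": 4, "state": "Nevada", "occupation": "Engineer"},
]

shards = build_shard_indexes(records, num_shards=2)
merged = merge_indexes(shards)
```

Building many small partitions in parallel and merging them in a different process matches the advice on the Lessons Learnt slide; the real job writes Lucene index files rather than Python dictionaries.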
  • 22. Serving (diagram labels): JSON predicate expression; JSON Lucene query parser; inverted indexes; segment and list.
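
To make the serving path concrete, here is a minimal sketch of evaluating a JSON predicate expression against an inverted index using set operations. The JSON shape (`"and"`, `"or"`, `"not"`, `"term"`) is invented for illustration and is not the platform's actual format, and a dictionary of posting sets stands in for a Lucene index. The sample members come from the table on slide 10.

```python
import json


def build_index(members):
    """Map each (attribute, value) term to the set of member ids that have it."""
    index = {}
    for member in members:
        for attribute, value in member.items():
            if attribute != "id":
                index.setdefault((attribute, value), set()).add(member["id"])
    return index


def evaluate(expr, index, all_ids):
    """Turn a parsed JSON predicate into posting-set operations."""
    if "term" in expr:
        attribute, value = expr["term"]
        return set(index.get((attribute, value), set()))
    if "and" in expr:
        sets = [evaluate(child, index, all_ids) for child in expr["and"]]
        return set.intersection(*sets) if sets else set(all_ids)
    if "or" in expr:
        result = set()
        for child in expr["or"]:
            result |= evaluate(child, index, all_ids)
        return result
    if "not" in expr:
        return set(all_ids) - evaluate(expr["not"], index, all_ids)
    raise ValueError(f"unsupported expression: {expr}")


members = [
    {"id": 1, "name": "John Smith", "state": "California", "occupation": "Engineer"},
    {"id": 2, "name": "Jane Smith", "state": "Nevada", "occupation": "HR Manager"},
    {"id": 3, "name": "Jane Doe", "state": "California", "occupation": "Engineer"},
]
index = build_index(members)
all_ids = {m["id"] for m in members}

# Hypothetical JSON predicate for the "California, Engineer" segment.
query = json.loads(
    '{"and": [{"term": ["state", "California"]},'
    ' {"term": ["occupation", "Engineer"]}]}'
)
segment_ids = evaluate(query, index, all_ids)
```

Each leaf becomes a posting-list lookup and each inner node a set intersection, union, or difference, which is roughly what a boolean query over an inverted index does.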
  • 24. Serving – Load Balanced Model: HTTP request -> load balancer -> Web Server 1 (Shard 1), Web Server 2 (Shard 2), ..., Web Server n (Shard n); shared drive.
  • 25. Serving – Load Balanced Model. But wait... Is load balancing alone good enough? What about distribution and failover?
  • 27. Next Steps - Distributed Model: Apache Helix (http://helix.incubator.apache.org/). A generic cluster management framework. Used to manage partitioned and replicated resources in distributed systems. Built on top of ZooKeeper, hiding the complexity of ZK primitives. Provides distributed features such as leader election and two-phase commit via a state machine model.
  • 28. Next Steps - Distributed Model: HTTP request -> load balancer -> scatter-gather across Web Server 1 (Shard 1 active, Shard 2 standby), Web Server 2 (Shard 2 active, Shard 3 standby), and Web Server 3 (Shard 3 active, Shard 1 standby).
  • 29. Next Steps - Distributed Model, failure case: Web Server 1 (Shard 1 active, Shard 2 standby), Web Server 2 (Shard 2 active, Shard 3 active), Web Server 3 (Shard 3 failure, Shard 1 failure).
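
The scatter-gather and failover behaviour on slides 28 and 29 can be sketched as follows. This does not use Helix or ZooKeeper: the `Replica` class, the `build_cluster` helper, and the sample shard contents are hypothetical, and failover is reduced to "try the active replica, then the standby". The replica placement mirrors the slide (each web server holds one active shard and the standby copy of another).

```python
class Replica:
    """One copy of a shard hosted on a web server (toy stand-in for an index)."""

    def __init__(self, server, member_states, healthy=True):
        self.server = server
        self.member_states = member_states
        self.healthy = healthy

    def count(self, state):
        if not self.healthy:
            raise ConnectionError(f"{self.server} is down")
        return sum(1 for s in self.member_states if s == state)


def build_cluster(failed_servers=()):
    """Placement from slide 28: active replica first, standby second."""
    data = {
        1: ["California", "Nevada"],
        2: ["California"],
        3: ["California", "California", "Texas"],
    }
    placement = {1: ["web1", "web3"], 2: ["web2", "web1"], 3: ["web3", "web2"]}
    return {
        shard: [Replica(server, data[shard], server not in failed_servers)
                for server in servers]
        for shard, servers in placement.items()
    }


def scatter_gather_count(cluster, state):
    """Query every shard once, falling back to the standby if the active fails."""
    total = 0
    for shard, replicas in cluster.items():
        for replica in replicas:
            try:
                total += replica.count(state)
                break                      # this shard has answered
            except ConnectionError:
                continue                   # try the next replica
        else:
            raise RuntimeError(f"no live replica for shard {shard}")
    return total


healthy_total = scatter_gather_count(build_cluster(), "California")
failover_total = scatter_gather_count(build_cluster({"web3"}), "California")
```

With Web Server 3 down, Shard 3 is served by its standby on Web Server 2 and the total is unchanged, which is the situation slide 29 depicts. In the design the slides propose, a cluster manager would promote the standby rather than having each query probe replicas in order.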
  • 31. DocValues – Use Case. Once segments are built, users want to forecast a target revenue projection for the campaigns they want to run. Campaigns can be run on various revenue models. This involves adding per-member propensity scores and dollar amounts.
  • 32. DocValues – Why not Stored Fields? Stored fields have one indirection per document, resulting in two disk seeks per document: given a document ID, the .fdx file is read to fetch the file pointer to the field data, and the .fdt file is scanned by id until the field is found. The performance cost quickly adds up when fetching millions of documents.
  • 33. DocValues – Why not Field Cache? It is memory resident. It works fine when there is enough memory, but keeping millions of un-inverted values in memory is impossible. There is an additional cost to parse values (from String and to String).
  • 34. DocValues. Dense column-based storage (1 value per document and 1 column per field and segment). Accepts primitives. No conversion from/to String needed. Loads 80x-100x faster than building a FieldCache. All the work is done during indexing. DocValue fields can be indexed and stored too.
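
The difference between the row-oriented access pattern of stored fields and the column-oriented layout of DocValues can be illustrated with a small sketch. It is only an analogy in plain Python, not Lucene's file format: `stored_rows` holds one record per document with values kept as strings, while `doc_values` holds one typed array per field, indexed by document id. The field names and numbers are made up.

```python
from array import array

# Row-oriented layout, loosely analogous to stored fields: each lookup first
# locates the document's record, then finds the field and parses it from text.
stored_rows = [
    {"name": "John Smith", "propensity": "0.5", "dollars": "100.0"},
    {"name": "Jane Smith", "propensity": "0.75", "dollars": "200.0"},
    {"name": "Jane Doe", "propensity": "0.25", "dollars": "400.0"},
]


def sum_from_rows(rows, doc_ids, field):
    total = 0.0
    for doc_id in doc_ids:
        record = rows[doc_id]            # locate the document's record
        total += float(record[field])    # find the field, convert from string
    return total


# Column-oriented layout, loosely analogous to DocValues: one dense array of
# primitives per field, one value per document, no string conversion.
doc_values = {
    "propensity": array("d", [0.5, 0.75, 0.25]),
    "dollars": array("d", [100.0, 200.0, 400.0]),
}


def sum_from_column(column, doc_ids):
    return sum(column[doc_id] for doc_id in doc_ids)


segment = [0, 2]  # document ids that matched the segment query
dollars_from_rows = sum_from_rows(stored_rows, segment, "dollars")
dollars_from_column = sum_from_column(doc_values["dollars"], segment)
propensity_from_column = sum_from_column(doc_values["propensity"], segment)
```

Both paths give the same totals; the column path avoids the per-document record lookup and the string parsing, which is the cost the preceding slides attribute to stored fields and the field cache.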
  • 36. Lessons Learnt. Indexing: reuse index writers, field, and document instances; create many partitions and merge them in a different process; rebuild (bootstrap) the entire index if possible; use partial updates with caution; analyze the index. Serving: reuse a single instance of IndexSearcher; limit usage of stored fields and term vectors; plan for load balancing and failover; cache term frequencies; use different machines for serving and indexing.
  • 38. Why not use an existing solution? Doesn’t allow dynamic schema; difficult to bootstrap indexes built in Hadoop; indexing elevates query latency. Doesn’t allow dynamic schema; difficult to bootstrap indexes built in Hadoop; larger memory overhead; comparatively slow.
  • 39. Questions? More info: data.linkedin.com