SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
2018.05.17
Real-time Access-log
Analysis @ BlaBlaCar
@DataXDay
Paris
@BlaBlaCar since 2017
@ThomasLamirault
Thomas Lamirault
Software Architect Data
@DataXDay
@DataXDay
Over 60 million members and growing
1.5 million
joining each month
@DataXDay
We are in
22 countries
@DataXDay
Use cases
Global solution
Components details
Concrete example
Today’s
agenda
@DataXDay
Use cases
@DataXDay
Security
Enables us to identify
crawling of our web site
Attempted hack
Product simplification
Statistics of all our
endpoint usage
Simplify migration
API
Help our partner to use
our API in a good way.
Use cases
@DataXDay
Global solution
@DataXDay
Global solution
NGINX Flink
Kafka
Data Lake
@DataXDay
Components detail
@DataXDay
Components detail
NGINX
Hindsight - Lua
Flink
Kafka
Data Lake
Schema registry
Kafka connect
@DataXDay
Flink
Standalone mode
Apply regex on free
format text
Build dynamically
schema and push to
schema registry &
kafka
Containerized
(Rkt/Fleet, k8s
migration on going)
Serialize into avro
format
@DataXDay
Flink
// Consumer
FlinkKafkaConsumer010 <String> consumer =
new FlinkKafkaConsumer010 <>(
parameterTool.getRequired("topic") + topicVersion,
new SimpleStringSchema (), consProps);
DataStream<String> messageStream =
env.addSource(consumer).setParallelism (
Integer.parseInt(customParamTool .getRequired(
"consumer_parallelism" ), 10)).name("kafka_consumer" );
@DataXDay
Flink
// FlatMap
DataStream<byte[]> stream =
messageStream.flatMap(new Regex2AvroFunction()).setParallelism(
Integer.parseInt(customParamTool.getRequired("flatmap_parallelism"),
10)).name("avroserializer");
// Producer
FlinkKafkaProducer010<byte[]> producer = new FlinkKafkaProducer010<>(
parameterTool.getRequired("bootstrap.servers"),
parameterTool.getRequired("target_topic"),
new SerializationSchema<byte[]>() {
[...]
@DataXDay
Schema Registry
→ Kafka message represented by a key and a value
→ Schema registry will have two schemas for a topic
- One for the key
- One for the value
→ A schema is represented by a Json
@DataXDay
Schema Registry - API
GET : http://schema-registry/subjects
["accesslogs_avro-value","accesslogs_avro-key"]
GET :
http://schema-registry/subjects/accesslogs_avro-value/versions
[1,2,3,4]
@DataXDay
Schema Registry - API
GET : http://schema/subjects/accesslogs_avro-value/versions/4
{"subject":"accesslogs_avro-value","version":4,"id":181,"schema":"
{"type":"record","name":"accesslog","fields":
[
{"name":"fieldname1","type":"string"},
{"name":"fieldname2","type":"int"}}
]
}
"}
@DataXDay
Concrete example
@DataXDay
Concrete example
@DataXDay
Concrete example
@DataXDay
Concrete example
@DataXDay
Concrete example
Help identify crawlers
Fast identification of
bugs / bad behaviors
of a new release
Detect bad API
usages
Help us to integrate
new functionalities
@DataXDay
DataXDay - Real-Time Access log analysis

Más contenido relacionado

La actualidad más candente

Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices
confluent
 

La actualidad más candente (19)

Scalable Application Development @ Picnic
Scalable Application Development @ PicnicScalable Application Development @ Picnic
Scalable Application Development @ Picnic
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
The Future of Data Pipelines
The Future of Data PipelinesThe Future of Data Pipelines
The Future of Data Pipelines
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
 
Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
CCT Check and Calculate Transfer
CCT Check and Calculate TransferCCT Check and Calculate Transfer
CCT Check and Calculate Transfer
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
 
Visualizing large datasets with elasticsearch and kibana
Visualizing large datasets with elasticsearch and kibanaVisualizing large datasets with elasticsearch and kibana
Visualizing large datasets with elasticsearch and kibana
 
#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
 
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
 
Mongodb Spring
Mongodb SpringMongodb Spring
Mongodb Spring
 
Leader in Cloud and Object Storage for Service Providers
Leader in Cloud and Object Storage for Service ProvidersLeader in Cloud and Object Storage for Service Providers
Leader in Cloud and Object Storage for Service Providers
 
Elastic @ John Deere
Elastic @ John DeereElastic @ John Deere
Elastic @ John Deere
 
2017 Hackathon Scality & 42 School
2017 Hackathon Scality & 42 School2017 Hackathon Scality & 42 School
2017 Hackathon Scality & 42 School
 
Gain Deep Visibility into APIs and Integrations with Anypoint Monitoring
Gain Deep Visibility into APIs and Integrations with Anypoint MonitoringGain Deep Visibility into APIs and Integrations with Anypoint Monitoring
Gain Deep Visibility into APIs and Integrations with Anypoint Monitoring
 

Similar a DataXDay - Real-Time Access log analysis

28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
Timothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not YearsReplatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
VMware Tanzu
 

Similar a DataXDay - Real-Time Access log analysis (20)

Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Streaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache KafkaStreaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache Kafka
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
YugaByte + PKS CloudFoundry Meetup 10/15/2018
YugaByte + PKS CloudFoundry Meetup 10/15/2018YugaByte + PKS CloudFoundry Meetup 10/15/2018
YugaByte + PKS CloudFoundry Meetup 10/15/2018
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...
 
The Role of Blockchain in Enterprise Commerce and Product Content Management
The Role of Blockchain in Enterprise Commerce and Product Content ManagementThe Role of Blockchain in Enterprise Commerce and Product Content Management
The Role of Blockchain in Enterprise Commerce and Product Content Management
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
LIVE DEMO: Big Data Suite
LIVE DEMO: Big Data SuiteLIVE DEMO: Big Data Suite
LIVE DEMO: Big Data Suite
 
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companies
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
 
NetApp Cloud Data Services & AWS Empower Your Cloud Champions
NetApp Cloud Data Services & AWS Empower Your Cloud ChampionsNetApp Cloud Data Services & AWS Empower Your Cloud Champions
NetApp Cloud Data Services & AWS Empower Your Cloud Champions
 
Best Bigquery ETL Tool
Best Bigquery ETL ToolBest Bigquery ETL Tool
Best Bigquery ETL Tool
 
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not YearsReplatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
 
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
 
LIVE DEMO: Big Data Suite
LIVE DEMO: Big Data SuiteLIVE DEMO: Big Data Suite
LIVE DEMO: Big Data Suite
 

Más de DataXDay Conference by Xebia

Más de DataXDay Conference by Xebia (6)

DataXDay - Exploring graphs: looking for communities & leaders
DataXDay - Exploring graphs: looking for communities & leadersDataXDay - Exploring graphs: looking for communities & leaders
DataXDay - Exploring graphs: looking for communities & leaders
 
DataXDay - The wonders of deep learning: how to leverage it for natural langu...
DataXDay - The wonders of deep learning: how to leverage it for natural langu...DataXDay - The wonders of deep learning: how to leverage it for natural langu...
DataXDay - The wonders of deep learning: how to leverage it for natural langu...
 
DataXDay - A data scientist journey to industrialization of machine learning
DataXDay - A data scientist journey to industrialization of machine learning DataXDay - A data scientist journey to industrialization of machine learning
DataXDay - A data scientist journey to industrialization of machine learning
 
DataXDay - Tensors in the sky with CloudML
DataXDay - Tensors in the sky with CloudML DataXDay - Tensors in the sky with CloudML
DataXDay - Tensors in the sky with CloudML
 
DataXDay - Building a Real Time Analytics API at Scale
DataXDay - Building a Real Time Analytics API at ScaleDataXDay - Building a Real Time Analytics API at Scale
DataXDay - Building a Real Time Analytics API at Scale
 
DataXDay - Machine learning models at scale with Amazon SageMaker
DataXDay - Machine learning models at scale with Amazon SageMaker DataXDay - Machine learning models at scale with Amazon SageMaker
DataXDay - Machine learning models at scale with Amazon SageMaker
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

DataXDay - Real-Time Access log analysis