Fight against robots with
enbrite.ly data platform
Joe MÉSZÁROS
lead software engineer
@joemesz
joemeszaros
Who are we?
Our vision is to revolutionize the KPIs and metrics the online
advertising industry currently uses. With our products,
Antifraud, Brandsafety and Viewability, we provide actionable
data to our customers.
Ad display fraud
(ad stacking, pixel stuffing)
Ad viewability
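Viewability is commonly judged against the IAB/MRC guideline for standard display ads: at least 50% of the ad's pixels on screen for at least one continuous second. The sketch below checks that rule over time-ordered visibility samples. It is a hypothetical helper for illustration, not enbrite.ly's measurement code; `VisibilitySample` and `Viewability` are invented names, and real measurement also has to deal with sampling gaps and large-ad thresholds.

```java
import java.util.List;

/** One visibility measurement (hypothetical): when it was taken and the share of ad pixels on screen. */
final class VisibilitySample {
    final long timestampMs;
    final double visibleFraction; // 0.0 .. 1.0

    VisibilitySample(long timestampMs, double visibleFraction) {
        this.timestampMs = timestampMs;
        this.visibleFraction = visibleFraction;
    }
}

final class Viewability {
    static final double MIN_VISIBLE_FRACTION = 0.5; // 50% of pixels in view
    static final long MIN_CONTINUOUS_MS = 1_000;    // for one continuous second

    /**
     * Returns true if the time-ordered samples contain a continuous run in which
     * at least half of the ad stayed visible for at least one second.
     */
    static boolean isViewable(List<VisibilitySample> samples) {
        Long runStart = null; // start of the current in-view run, null while out of view
        for (VisibilitySample s : samples) {
            if (s.visibleFraction >= MIN_VISIBLE_FRACTION) {
                if (runStart == null) {
                    runStart = s.timestampMs;
                }
                if (s.timestampMs - runStart >= MIN_CONTINUOUS_MS) {
                    return true;
                }
            } else {
                runStart = null; // dropped below 50%: the continuous run is broken
            }
        }
        return false;
    }
}
```

A run that dips below 50% even briefly starts over, which is what "continuous" requires.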
Brand safety
Detecting traffic that comes from unwanted
categories (e.g. adult), countries or individual domains
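A brand-safety rule of this kind boils down to matching each piece of traffic against exclusion lists. The sketch below is a hypothetical illustration: `TrafficSource` and `BrandSafetyPolicy` are invented names, and it assumes the category and country of a request have already been resolved upstream.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** One ad request as seen by the check (hypothetical fields, not enbrite.ly's data model). */
final class TrafficSource {
    final String domain;
    final String category;
    final String countryCode;

    TrafficSource(String domain, String category, String countryCode) {
        this.domain = domain;
        this.category = category;
        this.countryCode = countryCode;
    }
}

/** Advertiser-specific exclusions: unwanted categories, countries and individual domains. */
final class BrandSafetyPolicy {
    private final Set<String> blockedCategories;
    private final Set<String> blockedCountries;
    private final Set<String> blockedDomains = new HashSet<>();

    BrandSafetyPolicy(Set<String> categories, Set<String> countries, Set<String> domains) {
        this.blockedCategories = new HashSet<>(categories);
        this.blockedCountries = new HashSet<>(countries);
        // domain names are case-insensitive, so store them in lower case
        for (String d : domains) {
            blockedDomains.add(d.toLowerCase(Locale.ROOT));
        }
    }

    /** True if the traffic hits any exclusion and should be reported as unsafe. */
    boolean isUnsafe(TrafficSource t) {
        return blockedCategories.contains(t.category)
                || blockedCountries.contains(t.countryCode)
                || blockedDomains.contains(t.domain.toLowerCase(Locale.ROOT));
    }
}
```

Hash sets keep each lookup constant-time, which matters when the check runs on every event.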
39% | Anti-fraud detection
What do we do?
DATA COLLECTION → DATA PROCESSING → ANALYZE: ANTI-FRAUD, VIEWABILITY, BRAND SAFETY → REPORT + API
How do we do it? DATA PLATFORM
...so we need to analyze vast amounts of data
Infrastructure + Big Data technologies = enbrite.ly data platform
Amazon Web Services (AWS)
● Most popular cloud service provider
● ~70 services, 13 geographical regions
● Big Data on AWS = Elastic MapReduce (EMR)
● BUT don't blindly trust the big guy (API problems)
https://aws.amazon.com/
Apache Hadoop
● the de facto standard Big Data technology
● open source software
● distributed storage (HDFS) + data processing (MapReduce)
● ecosystem: many additional tools built on top
http://hadoop.apache.org/ | https://github.com/apache/hadoop
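To make the MapReduce model concrete, here is a single-process sketch of its three phases (map, shuffle, reduce) counting clicks per session. It does not use the Hadoop API; the class name `MiniMapReduce` and the `sessionId,eventType` line format are hypothetical, chosen only to show how data flows between the phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MiniMapReduce {
    /** Map phase: emit (sessionId, 1) for every line of the form "sessionId,click". */
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length == 2 && parts[1].equals("click")) {
                out.add(new SimpleEntry<>(parts[0], 1));
            }
        }
        return out;
    }

    /** Shuffle phase: group emitted values by key; a sorted map mirrors Hadoop's sorted keys. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    /** Reduce phase: sum the values collected for each key. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) {
                sum += v;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }
}
```

On a cluster, the map and reduce calls run in parallel on many nodes and the shuffle moves data over the network between them.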
Apache Spark
● large-scale data processing engine
● open source software (popular)
● modules: core, SQL, streaming, graph, ML
● often faster than Hadoop MapReduce (keeps intermediate data in memory)
http://spark.apache.org/ | https://github.com/apache/spark
Data platform in numbers
● 20+ node cluster
● 16 services, 110 servers
● 0.5-4 TB/day
● 100+ TB on S3
How do we do it? DATA COLLECTION
How do we do it? DATA PROCESSING
Let me tell you a short story...
Real-world example
You have a simple idea for detecting bot traffic that will save
the world. Let's implement it!
THE IDEA: Analyze events that are too hasty and deviate from
regular, human-like profiles: too many clicks in a defined
timeframe.
INPUT: Collected events on Amazon S3
OUTPUT: Invalid sessions
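The idea speaks of too many clicks "in a defined timeframe". One way to express that is a sliding window over a session's sorted click timestamps. The sketch below is a hypothetical helper (`ClickRateCheck`, `windowMs` and `maxClicks` are illustrative names, not from the talk's repository).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ClickRateCheck {
    /**
     * Returns true if some time window shorter than windowMs contains more than
     * maxClicks clicks. Timestamps are in milliseconds; input order does not matter.
     */
    static boolean tooManyClicks(List<Long> clickTimestamps, long windowMs, int maxClicks) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        List<Long> sorted = new ArrayList<>(clickTimestamps);
        Collections.sort(sorted);
        int start = 0; // left edge of the sliding window
        for (int end = 0; end < sorted.size(); end++) {
            // advance the left edge until the window spans less than windowMs
            while (sorted.get(end) - sorted.get(start) >= windowMs) {
                start++;
            }
            if (end - start + 1 > maxClicks) {
                return true;
            }
        }
        return false;
    }
}
```

Each timestamp enters and leaves the window at most once, so after sorting the check is linear in the number of clicks.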
How to solve it?
Step 1: sessionize events
Step 2: detect too many clicks
Code: https://github.com/enbritely/startup-safary
Step 1: event to session
// configure Spark application (app name is illustrative)
SparkConf conf = new SparkConf().setAppName("bot-detection");
JavaSparkContext sc = new JavaSparkContext(conf);
// read events from HDFS: one JSON event per line (inputPath is the HDFS input directory)
JavaRDD<String> lines = sc.textFile(inputPath);
JavaRDD<Event> events = lines.map(Converter::jsonToEvent);
// keep only click events
JavaRDD<Event> clicks = events.filter(e -> "click".equals(e.type));
// group clicks by session id; groupBy yields Iterable values, not List
JavaPairRDD<String, Iterable<Event>> grouped = clicks
    .groupBy(Event::sessionId);
// build one Session per group, then drop the key
JavaRDD<Session> sessions = grouped.mapValues(sessionizer).values();
Application code: https://github.com/enbritely/startup-safary
Step 1: event to session
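// Sessionizer (Function = org.apache.spark.api.java.function.Function)
Function<Iterable<Event>, Session> sessionizer = unorderedEvents -> {
    List<Event> clickOrdered = sortByTimestamp(unorderedEvents);
    // all events in a group share the same session id
    Session session = new Session(clickOrdered.get(0).sessionId());
    for (Event event : clickOrdered) {
        session.addClick(event.getTimestamp());
    }
    return session;
};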
Application code : https://github.com/enbritely/startup-safary
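Step 1 can be tried without a cluster. The sketch below reproduces the same filter, group-by-session, sort-by-timestamp and build-a-Session pipeline with plain Java collections. The `Event` and `Session` classes here are minimal stand-ins whose fields are assumptions; the real classes live in the linked repository and may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal stand-in for the talk's Event (fields are assumptions). */
final class Event {
    final String sessionId;
    final String type;
    final long timestamp;

    Event(String sessionId, String type, long timestamp) {
        this.sessionId = sessionId;
        this.type = type;
        this.timestamp = timestamp;
    }
}

/** Minimal stand-in for the talk's Session: time-ordered click timestamps of one session. */
final class Session {
    final String sessionId;
    final List<Long> clicks = new ArrayList<>();

    Session(String sessionId) { this.sessionId = sessionId; }
    void addClick(long timestamp) { clicks.add(timestamp); }
    int getClickCount() { return clicks.size(); }
}

final class LocalSessionizer {
    /** Keep clicks, group them by session id, sort each group by time, build a Session. */
    static List<Session> sessionize(List<Event> events) {
        Map<String, List<Event>> grouped = new LinkedHashMap<>(); // keeps first-seen session order
        for (Event e : events) {
            if ("click".equals(e.type)) {
                grouped.computeIfAbsent(e.sessionId, k -> new ArrayList<>()).add(e);
            }
        }
        List<Session> sessions = new ArrayList<>();
        for (Map.Entry<String, List<Event>> entry : grouped.entrySet()) {
            List<Event> ordered = new ArrayList<>(entry.getValue());
            ordered.sort(Comparator.comparingLong((Event e) -> e.timestamp));
            Session session = new Session(entry.getKey());
            for (Event e : ordered) {
                session.addClick(e.timestamp);
            }
            sessions.add(session);
        }
        return sessions;
    }
}
```

The structure maps one-to-one onto the Spark version: `filter`, `groupBy`, then `mapValues(sessionizer)`; Spark simply runs each step in parallel across the cluster.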
Step 2: apply heuristic
JavaRDD<String> badSessions = sessions
    .filter(s -> s.getClickCount() > threshold)
    .map(s -> s.getSessionId() + ":" + s.getClickCount());
// save output to HDFS (outputPath is the HDFS output directory)
badSessions.saveAsTextFile(outputPath);
Application code: https://github.com/enbritely/startup-safary
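The heuristic itself is a plain filter-then-map. As a local sketch, assuming the click counts per session are already available in a map (`ClickHeuristic` is a hypothetical name), Java streams express the same chain the RDD code uses:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ClickHeuristic {
    /**
     * Keep sessions whose click count exceeds the threshold and format them
     * as "sessionId:clickCount", like the Spark job's output lines.
     */
    static List<String> badSessions(Map<String, Integer> clickCounts, int threshold) {
        return clickCounts.entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.toList());
    }
}
```

Keeping the threshold as a parameter makes it easy to tune against known human traffic before running the job on the full data set.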
Live demo!
● 4-node EMR (Hadoop) cluster
● Apache Spark 1.6.1
● 1 GB of input events
Steps: build app, create cluster, copy events from S3 to HDFS, submit app
Congratulations!
MISSION COMPLETED
YOU just saved the world with a
simple idea within ~10 minutes.
WE ARE HIRING!
working @exPrezi office, K9
check out the company in Forbes :-)
amazing company culture
BUT the real reason…
WE ARE HIRING!
… is our mood manager, Bigyó :)
BEYOND enbrite.ly
...our investor and event sponsor
is looking for talented people
Joe MÉSZÁROS
lead software engineer
joe@enbrite.ly
@joemesz
@enbritely
joemeszaros
enbritely
THANK YOU!
QUESTIONS?
Startup Safary | Fight against robots with enbrite.ly data platform