SlideShare una empresa de Scribd logo
1 de 16
Winning the Big Data Spam Challenge


 Erich Nachbar   Stefan Groschupf   Florian Leibert
Spam Types - Email Spam

• What do spammers do? 
  • Many domains
  • Cycle through IPs (TOR, bulk blocks)
  • Bulk account creation (increase IP reputation)
     • Break captchas (Mechanical Turk)
     • Common names
       (e.g. http://www.census.gov/genealogy/names/dist.male.first) 
Spam Types - Social Media

• Spam Carriers
  • Blog Postings
  • Comments
  • Friend Requests
  • ...
• Spam Generation through
  • Actual User Accounts (Hacked / User Virus)
  • Bot Accounts
Spam Types - Social Media

• Differences
   • Detection is the same
   • Account treatment is different (cancel vs. clean)
• 99% of all Spam contains URLs:
   • Ignore text-only messages.
   • Look at the URL not the text.
Spam Types - Web Spam

• Goal: influence search engine results
  • Link farms
  • Keyword, Meta tag stuffing
  • Hidden or invisible unrelated text
  • Scraper sites
  • Spam blogs
Why process Spam in Hadoop?

• Easy to parallelize  
  • Bucketization
     • User
     • Date
     • Source
     • etc.
  • Count models (probabilities) are very "hadoopy"
Why process Spam in Hadoop?

• Large data sets
   • More samples ~ better results
• Algorithms require preprocessing
• Existing code
   • e.g. url-parsers, bayes implementations
   •
Sample System Architecture
Heuristics for Spam Detection

• Easy to compute, Group-By & Count
• Captcha solving rates
• Source IP/Email
   • Historic vs. current volume
   • Reputation
• Link
   • Rrequency, position, ratio 
Heuristics for Spam Detection

• Content
   • Self similarity
   • Hash of media content
   • Keywords
Looking at arrival times


• Inter-arrival times
• Fast Fourier Transform 
   • Timestamps
    -> frequency space
Evaluating content
• Jaccard similarity
   • Bucketize by user / source email
      • s1 = S(x1,x3,x5), s2 = S(x2,x4,x6)
   • Easy with Hadoop
      • map: emit user_id (K), text (V)
      • reduce: 

• Even simpler: 
  • # links / user
  • # complaints, spam tags / user
  • etc.
Solutions - Pay Level Domain

• Requires payment at Top Level Domain
• Simple heuristic
• Much simpler than Trust-Rank, Page-Rank, etc.




  Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov
  IRLbot: Scaling to 6 Billion Pages and Beyond
Demo
Take Aways

• Spam Reports are important
• Rolling real-time Ham & Spam Samples
• Have Knobs to turn (e.g. over JMX)
• Simple solutions can get you pretty far, the easy 80% 
• Spammers adapt very fast, stay agile
• Try to break your own system
Thank you!

erich@quantifind.com, @enachb

sg@datameer.com, @datameer

  flo@leibert.de, @floleibert

Más contenido relacionado

La actualidad más candente

The state of the art in Linked Data
The state of the art in Linked DataThe state of the art in Linked Data
The state of the art in Linked DataJoshua Shinavier
 
South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...
South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...
South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...Anand Hemmige
 
A recommendation engine for your php application
A recommendation engine for your php applicationA recommendation engine for your php application
A recommendation engine for your php applicationMichele Orselli
 
Austin Day of Rest - Introduction
Austin Day of Rest - IntroductionAustin Day of Rest - Introduction
Austin Day of Rest - IntroductionHandsOnWP.com
 
Web Scraping With Python
Web Scraping With PythonWeb Scraping With Python
Web Scraping With PythonRobert Dempsey
 
Demand, Media, and Search Analytics at AOL
Demand, Media, and Search Analytics at AOLDemand, Media, and Search Analytics at AOL
Demand, Media, and Search Analytics at AOLSean Timm
 
Web Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen ScrapingWeb Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen ScrapingCynthiaCruz55
 
Intro to web scraping with Python
Intro to web scraping with PythonIntro to web scraping with Python
Intro to web scraping with PythonMaris Lemba
 
On Again; Off Again - Benjamin Young - ebookcraft 2017
On Again; Off Again - Benjamin Young - ebookcraft 2017On Again; Off Again - Benjamin Young - ebookcraft 2017
On Again; Off Again - Benjamin Young - ebookcraft 2017BookNet Canada
 
Mikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamMikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamLetsConnect
 
Getting started with Web Scraping in Python
Getting started with Web Scraping in PythonGetting started with Web Scraping in Python
Getting started with Web Scraping in PythonSatwik Kansal
 
What is a Robot txt file?
What is a Robot txt file?What is a Robot txt file?
What is a Robot txt file?Abhishek Mitra
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python Viren Rajput
 
Web crawler with seo analysis
Web crawler with seo analysis Web crawler with seo analysis
Web crawler with seo analysis Vikram Parmar
 
Almost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without ProgrammingAlmost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without ProgrammingMichelle Minkoff
 
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr..."PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...Stefan Adam
 

La actualidad más candente (19)

The state of the art in Linked Data
The state of the art in Linked DataThe state of the art in Linked Data
The state of the art in Linked Data
 
Web crawler
Web crawlerWeb crawler
Web crawler
 
South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...
South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...
South JVM Users Group Talk - Building Social Media Tools using JVM Supported ...
 
A recommendation engine for your php application
A recommendation engine for your php applicationA recommendation engine for your php application
A recommendation engine for your php application
 
Austin Day of Rest - Introduction
Austin Day of Rest - IntroductionAustin Day of Rest - Introduction
Austin Day of Rest - Introduction
 
Web Scraping With Python
Web Scraping With PythonWeb Scraping With Python
Web Scraping With Python
 
Demand, Media, and Search Analytics at AOL
Demand, Media, and Search Analytics at AOLDemand, Media, and Search Analytics at AOL
Demand, Media, and Search Analytics at AOL
 
Web Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen ScrapingWeb Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen Scraping
 
Web Scraping Basics
Web Scraping BasicsWeb Scraping Basics
Web Scraping Basics
 
Vít Listík - Email.cz workshop
Vít Listík - Email.cz workshopVít Listík - Email.cz workshop
Vít Listík - Email.cz workshop
 
Intro to web scraping with Python
Intro to web scraping with PythonIntro to web scraping with Python
Intro to web scraping with Python
 
On Again; Off Again - Benjamin Young - ebookcraft 2017
On Again; Off Again - Benjamin Young - ebookcraft 2017On Again; Off Again - Benjamin Young - ebookcraft 2017
On Again; Off Again - Benjamin Young - ebookcraft 2017
 
Mikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamMikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity Stream
 
Getting started with Web Scraping in Python
Getting started with Web Scraping in PythonGetting started with Web Scraping in Python
Getting started with Web Scraping in Python
 
What is a Robot txt file?
What is a Robot txt file?What is a Robot txt file?
What is a Robot txt file?
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python
 
Web crawler with seo analysis
Web crawler with seo analysis Web crawler with seo analysis
Web crawler with seo analysis
 
Almost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without ProgrammingAlmost Scraping: Web Scraping without Programming
Almost Scraping: Web Scraping without Programming
 
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr..."PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
 

Destacado

Final paper
Final paperFinal paper
Final paperJDonpfd3
 
Link Analysis for Web Information Retrieval
Link Analysis for Web Information RetrievalLink Analysis for Web Information Retrieval
Link Analysis for Web Information RetrievalCarlos Castillo (ChaTo)
 
Webspam (English Version)
Webspam (English Version)Webspam (English Version)
Webspam (English Version)Dirk Haun
 
ASSP: Extracting the Ham from Spam -- by David J. Young
ASSP: Extracting the Ham from Spam -- by David J. YoungASSP: Extracting the Ham from Spam -- by David J. Young
ASSP: Extracting the Ham from Spam -- by David J. Youngtotojepasttrain
 
Job seekers defense against spammers/spambots Sept 7, 2012
Job seekers defense against spammers/spambots Sept 7, 2012Job seekers defense against spammers/spambots Sept 7, 2012
Job seekers defense against spammers/spambots Sept 7, 2012chuckthomassql
 
Top Cyber Threats of 2009
Top Cyber Threats of 2009Top Cyber Threats of 2009
Top Cyber Threats of 2009Symantec
 
Le novità di Ubuntu 10.04
Le novità di Ubuntu 10.04Le novità di Ubuntu 10.04
Le novità di Ubuntu 10.04Paolo Sammicheli
 
You’re Not A Dog: How Lawyers Can Put Their Best Foot Forward Online
You’re Not A Dog: How Lawyers Can Put Their Best Foot Forward OnlineYou’re Not A Dog: How Lawyers Can Put Their Best Foot Forward Online
You’re Not A Dog: How Lawyers Can Put Their Best Foot Forward OnlineRocket Matter, LLC
 
How to Trace an E-mail Part 1
How to Trace an E-mail Part 1How to Trace an E-mail Part 1
How to Trace an E-mail Part 1Lebowitzcomics
 
Online Advertising Year Of Living Dangerously
Online Advertising Year Of Living DangerouslyOnline Advertising Year Of Living Dangerously
Online Advertising Year Of Living DangerouslyInternet Law Center
 
Online Advt: Year of Living. Dangerously
Online Advt: Year of Living. DangerouslyOnline Advt: Year of Living. Dangerously
Online Advt: Year of Living. DangerouslyBennet Kelley
 
Google Summer of Code™ (in German; neutral version)
Google Summer of Code™ (in German; neutral version)Google Summer of Code™ (in German; neutral version)
Google Summer of Code™ (in German; neutral version)Dirk Haun
 
Mobile WebKit Optimizations & Tools
Mobile WebKit Optimizations & ToolsMobile WebKit Optimizations & Tools
Mobile WebKit Optimizations & ToolsAndrew Hedges
 

Destacado (20)

Final paper
Final paperFinal paper
Final paper
 
Link Analysis for Web Information Retrieval
Link Analysis for Web Information RetrievalLink Analysis for Web Information Retrieval
Link Analysis for Web Information Retrieval
 
Webspam (English Version)
Webspam (English Version)Webspam (English Version)
Webspam (English Version)
 
ASSP: Extracting the Ham from Spam -- by David J. Young
ASSP: Extracting the Ham from Spam -- by David J. YoungASSP: Extracting the Ham from Spam -- by David J. Young
ASSP: Extracting the Ham from Spam -- by David J. Young
 
Job seekers defense against spammers/spambots Sept 7, 2012
Job seekers defense against spammers/spambots Sept 7, 2012Job seekers defense against spammers/spambots Sept 7, 2012
Job seekers defense against spammers/spambots Sept 7, 2012
 
Top Cyber Threats of 2009
Top Cyber Threats of 2009Top Cyber Threats of 2009
Top Cyber Threats of 2009
 
E spam
E spamE spam
E spam
 
Immigration Related Audits (DHN Presentation) Japan Tokyo - Revised
Immigration Related Audits (DHN Presentation) Japan Tokyo - RevisedImmigration Related Audits (DHN Presentation) Japan Tokyo - Revised
Immigration Related Audits (DHN Presentation) Japan Tokyo - Revised
 
Le novità di Ubuntu 10.04
Le novità di Ubuntu 10.04Le novità di Ubuntu 10.04
Le novità di Ubuntu 10.04
 
RIC
RICRIC
RIC
 
You’re not a dog
You’re not a dogYou’re not a dog
You’re not a dog
 
You’re Not A Dog: How Lawyers Can Put Their Best Foot Forward Online
You’re Not A Dog: How Lawyers Can Put Their Best Foot Forward OnlineYou’re Not A Dog: How Lawyers Can Put Their Best Foot Forward Online
You’re Not A Dog: How Lawyers Can Put Their Best Foot Forward Online
 
Cyber Harassment
Cyber HarassmentCyber Harassment
Cyber Harassment
 
How to Trace an E-mail Part 1
How to Trace an E-mail Part 1How to Trace an E-mail Part 1
How to Trace an E-mail Part 1
 
Net Neutrality Overview
Net Neutrality OverviewNet Neutrality Overview
Net Neutrality Overview
 
Online Advertising Year Of Living Dangerously
Online Advertising Year Of Living DangerouslyOnline Advertising Year Of Living Dangerously
Online Advertising Year Of Living Dangerously
 
Online Advt: Year of Living. Dangerously
Online Advt: Year of Living. DangerouslyOnline Advt: Year of Living. Dangerously
Online Advt: Year of Living. Dangerously
 
Can Spam At 5
Can Spam At 5Can Spam At 5
Can Spam At 5
 
Google Summer of Code™ (in German; neutral version)
Google Summer of Code™ (in German; neutral version)Google Summer of Code™ (in German; neutral version)
Google Summer of Code™ (in German; neutral version)
 
Mobile WebKit Optimizations & Tools
Mobile WebKit Optimizations & ToolsMobile WebKit Optimizations & Tools
Mobile WebKit Optimizations & Tools
 

Similar a Winning Big Data Spam with Hadoop

Email Address Harvesting
Email Address HarvestingEmail Address Harvesting
Email Address HarvestingMichael Lamont
 
OSINT for Attack and Defense
OSINT for Attack and DefenseOSINT for Attack and Defense
OSINT for Attack and DefenseAndrew McNicol
 
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...Joshua Kamdjou
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrJake Mannix
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkJake Mannix
 
Lesson 6 web based attacks
Lesson 6 web based attacksLesson 6 web based attacks
Lesson 6 web based attacksFrank Victory
 
Gates Toorcon X New School Information Gathering
Gates Toorcon X New School Information GatheringGates Toorcon X New School Information Gathering
Gates Toorcon X New School Information GatheringChris Gates
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
Internet and Social Media for Beginners
Internet and Social Media for BeginnersInternet and Social Media for Beginners
Internet and Social Media for Beginnersbecarreno
 
NotaCon 2011 - Networking for Pentesters
NotaCon 2011 - Networking for PentestersNotaCon 2011 - Networking for Pentesters
NotaCon 2011 - Networking for PentestersRob Fuller
 
Week 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and ProducingWeek 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and Producingkurtgessler
 
Halko_santafe_2015
Halko_santafe_2015Halko_santafe_2015
Halko_santafe_2015Nathan Halko
 
Intelligent Stream Filtering Using MongoDB
Intelligent Stream Filtering Using MongoDBIntelligent Stream Filtering Using MongoDB
Intelligent Stream Filtering Using MongoDBMihnea Giurgea
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsHisham Arafat
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopPeter Skomoroch
 
DataSploit - Tool Demo at Null Bangalore - March Meet.
DataSploit - Tool Demo at Null Bangalore - March Meet. DataSploit - Tool Demo at Null Bangalore - March Meet.
DataSploit - Tool Demo at Null Bangalore - March Meet. Shubham Mittal
 
Creating an Open Source Genealogical Search Engine with Apache Solr
Creating an Open Source Genealogical Search Engine with Apache SolrCreating an Open Source Genealogical Search Engine with Apache Solr
Creating an Open Source Genealogical Search Engine with Apache SolrBrooke Ganz
 

Similar a Winning Big Data Spam with Hadoop (20)

Email Address Harvesting
Email Address HarvestingEmail Address Harvesting
Email Address Harvesting
 
Geek basics
Geek basicsGeek basics
Geek basics
 
OSINT for Attack and Defense
OSINT for Attack and DefenseOSINT for Attack and Defense
OSINT for Attack and Defense
 
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+Solr
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
 
Lesson 6 web based attacks
Lesson 6 web based attacksLesson 6 web based attacks
Lesson 6 web based attacks
 
Gates Toorcon X New School Information Gathering
Gates Toorcon X New School Information GatheringGates Toorcon X New School Information Gathering
Gates Toorcon X New School Information Gathering
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Internet and Social Media for Beginners
Internet and Social Media for BeginnersInternet and Social Media for Beginners
Internet and Social Media for Beginners
 
NotaCon 2011 - Networking for Pentesters
NotaCon 2011 - Networking for PentestersNotaCon 2011 - Networking for Pentesters
NotaCon 2011 - Networking for Pentesters
 
Week 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and ProducingWeek 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and Producing
 
Halko_santafe_2015
Halko_santafe_2015Halko_santafe_2015
Halko_santafe_2015
 
Intelligent Stream Filtering Using MongoDB
Intelligent Stream Filtering Using MongoDBIntelligent Stream Filtering Using MongoDB
Intelligent Stream Filtering Using MongoDB
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
 
Web hacking
Web hackingWeb hacking
Web hacking
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With Hadoop
 
DataSploit - Tool Demo at Null Bangalore - March Meet.
DataSploit - Tool Demo at Null Bangalore - March Meet. DataSploit - Tool Demo at Null Bangalore - March Meet.
DataSploit - Tool Demo at Null Bangalore - March Meet.
 
Creating an Open Source Genealogical Search Engine with Apache Solr
Creating an Open Source Genealogical Search Engine with Apache SolrCreating an Open Source Genealogical Search Engine with Apache Solr
Creating an Open Source Genealogical Search Engine with Apache Solr
 

Más de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Más de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Último

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Winning Big Data Spam with Hadoop

  • 1. Winning the Big Data Spam Challenge Erich Nachbar Stefan Groschupf Florian Leibert
  • 2. Spam Types - Email Spam • What do spammers do?  • Many domains • Cycle through IPs (TOR, bulk blocks) • Bulk account creation (increase IP reputation) • Break captchas (Mechanical Turk) • Common names (e.g. http://www.census.gov/genealogy/names/dist.male.first) 
  • 3. Spam Types - Social Media • Spam Carriers • Blog Postings • Comments • Friend Requests • ... • Spam Generation through • Actual User Accounts (Hacked / User Virus) • Bot Accounts
  • 4. Spam Types - Social Media • Differences • Detection is the same • Account treatment is different (cancel vs. clean) • 99% of all Spam contains URLs: • Ignore text-only messages. • Look at the URL not the text.
  • 5. Spam Types - Web Spam • Goal: influence search engine results • Link farms • Keyword, Meta tag stuffing • Hidden or invisible unrelated text • Scraper sites • Spam blogs
  • 6. Why process Spam in Hadoop? • Easy to parallelize   • Bucketization • User • Date • Source • etc. • Count models (probabilities) are very "hadoopy"
  • 7. Why process Spam in Hadoop? • Large data sets • More samples ~ better results • Algorithms require preprocessing • Existing code • e.g. url-parsers, bayes implementations •
  • 9. Heuristics for Spam Detection • Easy to compute, Group-By & Count • Captcha solving rates • Source IP/Email • Historic vs. current volume • Reputation • Link • Rrequency, position, ratio 
  • 10. Heuristics for Spam Detection • Content • Self similarity • Hash of media content • Keywords
  • 11. Looking at arrival times • Inter-arrival times • Fast Fourier Transform  • Timestamps -> frequency space
  • 12. Evaluating content • Jaccard similarity • Bucketize by user / source email • s1 = S(x1,x3,x5), s2 = S(x2,x4,x6) • Easy with Hadoop • map: emit user_id (K), text (V) • reduce:  • Even simpler:  • # links / user • # complaints, spam tags / user • etc.
  • 13. Solutions - Pay Level Domain • Requires payment at Top Level Domain • Simple heuristic • Much simpler than Trust-Rank, Page-Rank, etc. Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov IRLbot: Scaling to 6 Billion Pages and Beyond
  • 14. Demo
  • 15. Take Aways • Spam Reports are important • Rolling real-time Ham & Spam Samples • Have Knobs to turn (e.g. over JMX) • Simple solutions can get you pretty far, the easy 80%  • Spammers adapt very fast, stay agile • Try to break your own system
  • 16. Thank you! erich@quantifind.com, @enachb sg@datameer.com, @datameer flo@leibert.de, @floleibert