This session was recorded in NYC on October 22nd, 2019 and can be viewed here: https://youtu.be/micyBEIoE0Q
Leveraging Data for Successful Ad Campaigns
Marketing dollars should be spent to reach real people and make digital campaigns successful. IAS leverages large amounts of data and machine learning software to measure, analyze, and make predictions on billions of digital advertisements every day. I’ll be discussing how we do this in the context of fraud detection and brand safety, helping to ensure marketing dollars are used to reach the right people.
Bio: With a desire for problem-solving and handling messy data, Amitpal Tagore completed a PhD and postdoc in astrophysics. Using the skills gained in academia, he became a data scientist at Vydia, working with rising artists on social media. Currently, Amit is a data scientist in the fraud detection lab at Integral Ad Science.
4. Brand safety
The Product: IAS Brand Safety
Dynamically score websites and control what content will appear with your advertisement.
[Figure: a "Your Ad Here" placement shown next to a tragic news story and censored profanity]
5. LocalChron.com -- NATURE
[Figure: mock-up of a news page with a headline, a section menu (Local, US, International, Sports, Tech, Entertainment, Nature), four internal news links, four external links, and three ad banners]
Similar layouts across subdomains
Repeated text across subdomains
Large number of subdomains
External links
Text from various fields
Metadata -- Keywords
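One of the signals listed above, repeated text across subdomains, can be illustrated with a small similarity measure. The sketch below is only an assumption about how such a signal might be computed, not the method used by IAS: it compares pages by the Jaccard similarity of their word shingles. The function names and the example hostnames are hypothetical.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def repeated_text_score(pages, k=3):
    """Mean pairwise Jaccard similarity over pages keyed by subdomain.

    `pages` maps a (hypothetical) subdomain name to the page text.
    Returns 0.0 when there are fewer than two pages to compare.
    """
    sets = [shingles(text, k) for text in pages.values()]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    pages = {
        "nature.example.com": "breaking news about the local river today",
        "sports.example.com": "breaking news about the local team today",
    }
    print(repeated_text_score(pages))
```

A score near 1 means the subdomains share almost all of their text, and a score near 0 means they share almost none. Any threshold for flagging a site would have to be chosen from real data.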
10. Why do ad fraud?
Source: Hewlett Packard Enterprise, “The Business of Hacking”, May 2016
[Figure: chart placing types of cybercrime on two axes, payout potential and effort & risk estimation, each running between high and low. Types shown: cyber warfare, identity theft, organized crime, IP theft, extortion, ad fraud, payment system fraud, bank fraud, medical records fraud, credential harvesting, credit card fraud, hacktivism]
11. The monetary impact of fraud
The IAB and Ernst & Young estimate that $4.6 billion is lost annually due to ad fraud and non-human traffic (NHT).
12. Detecting & preventing ad fraud: 3 pillars
Behavioral & network analysis
• Differentiate human from bot behavior
• Process vast amounts of data
Browser & device analysis
• Validate that the browser viewing the ad is a real, human web browser like Chrome or Mobile Safari
• Validate that the device viewing the ad is actually an iPhone or Windows 10 computer
Targeted reconnaissance & malware analysis
• Dissection of malware and infiltration of hacker communities
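To make the behavioral pillar more concrete, here is a toy sketch of one possible signal for differentiating human from bot behavior: how regular the timing of events from a single client is. This is an illustrative assumption, not the detection logic used by IAS. The function names and the 0.1 threshold are hypothetical, and a real system would combine many signals.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive events.

    Returns None when there are fewer than two gaps to compare.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        # All events share one timestamp: treat as perfectly regular.
        return 0.0
    return pstdev(gaps) / avg


def looks_automated(timestamps, cv_threshold=0.1):
    """Flag event streams whose timing is almost perfectly regular.

    The threshold is a hypothetical value chosen for illustration only.
    """
    cv = interval_cv(timestamps)
    return cv is not None and cv < cv_threshold


if __name__ == "__main__":
    print(looks_automated([0, 1, 2, 3, 4]))           # evenly spaced events
    print(looks_automated([0, 0.4, 3.1, 3.3, 9.0]))   # irregular events
```

The intuition behind this sketch is that scripted traffic may fire at fixed intervals while human activity tends to be bursty. That intuition is only a heuristic, since a bot can randomize its timing.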
17. Predicted Viewability: Product overview & Implementation
The Product: IAS Predicted Viewability
Provides advertisers with a probability that their ads will be seen by the end user.
[Figure: a web page with a "Your Ad Here" placement, showing the ad in view and the out-of-view portion of the web site]
18. Data Description
➢ Proprietary Data: Ad impression logs collected from advertising campaigns of our clients.
➢ Available data points:
○ URL properties
○ User’s device/environment (desktop web, mobile web, mobile app)
○ Impression Type (banner, video)
○ Viewability measurements
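As a rough illustration of how impression logs with these data points could be turned into a baseline probability of being seen, the sketch below aggregates a smoothed viewability rate per environment and impression type. The record field names (`environment`, `impression_type`, `viewed`) and the smoothing parameters are assumptions made for this example. The slides do not describe the actual log schema, and this is not the model used in the product.

```python
from collections import defaultdict


def viewability_rates(impressions, prior_rate=0.5, prior_weight=10):
    """Smoothed share of viewed impressions per (environment, type) segment.

    Each impression is a dict with hypothetical keys "environment",
    "impression_type", and "viewed" (bool). Segments with few impressions
    are pulled toward `prior_rate`; `prior_weight` acts like that many
    pseudo-impressions.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [viewed, total]
    for imp in impressions:
        key = (imp["environment"], imp["impression_type"])
        counts[key][0] += 1 if imp["viewed"] else 0
        counts[key][1] += 1
    return {
        key: (viewed + prior_rate * prior_weight) / (total + prior_weight)
        for key, (viewed, total) in counts.items()
    }


if __name__ == "__main__":
    logs = [
        {"environment": "mobile web", "impression_type": "banner", "viewed": True},
        {"environment": "mobile web", "impression_type": "banner", "viewed": True},
        {"environment": "mobile web", "impression_type": "banner", "viewed": False},
        {"environment": "desktop web", "impression_type": "video", "viewed": True},
    ]
    for segment, rate in viewability_rates(logs).items():
        print(segment, round(rate, 3))
```

A table like this is only a baseline. A learned model can use URL properties and other features that a simple group-by cannot capture.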
19. Tools Used
Data exploration / preparation / cleansing: Apache Hive
AI hosts: G3.16xlarge with 64 vCPUs and 4 NVIDIA Tesla GPUs
AI-driven feature engineering framework: H2O Driverless AI
AI-driven workflow for ML model optimization: H2O AutoML
Hierarchical post-modeling processing: Jupyter/Python
20. Results
Total number of features: 46 machine-engineered features from Driverless AI.
Preferred models: LightGBM, gradient boosted regression
Significant improvement: H2O Driverless AI improved accuracy and provides insights.
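For readers unfamiliar with gradient boosted regression, the sketch below shows the core idea in plain Python: start from the mean of the targets, then repeatedly fit a one-split tree (a stump) to the current residuals and add a scaled copy of it to the prediction. This is a teaching example under simplifying assumptions (one numeric feature, squared-error loss). It is not LightGBM and not the model built with Driverless AI, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single split (x <= t) on a 1-D feature by squared error.

    Returns (threshold, left_value, right_value).
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds between distinct values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:
        # All feature values are identical: predict the mean residual everywhere.
        m = sum(residuals) / len(residuals)
        return (float("inf"), m, m)
    return best[1:]


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit a boosted ensemble of stumps; returns (base, stumps, learning_rate)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    """Prediction for one feature value from a fitted model."""
    base, stumps, lr = model
    return base + lr * sum(lm if x <= t else rm for t, lm, rm in stumps)


if __name__ == "__main__":
    # Made-up toy data: a feature value and an observed viewability rate.
    xs = [0, 1, 2, 3, 4, 5]
    ys = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
    model = fit_boosted(xs, ys)
    print(round(predict(model, 0), 3), round(predict(model, 5), 3))
```

Libraries such as LightGBM apply the same principle with full decision trees, many features, and extensive performance optimizations.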
21. Motivation and Business Value
➢ Gain competitive edge with first-in-class ML-based brand safety and viewability predictions, aided by H2O Driverless AI
➢ Enable greater flexibility for our clients’ custom viewability standards
➢ Enable rapid model development, testing, and deployment
➢ Provide insights in an automated framework