Using ML to Protect Customer Privacy
by fmr Amazon Sr PM
www.productschool.com
Using ML to
protect customer
privacy
Pushpak Pujari
PM at Verkada | ex Sr. PM at Amazon
Bio
PM at Verkada for Security Cameras
Sr. PM at Amazon Alexa AI - Privacy
Sr. PM at Amazon Web Services IoT
Wharton MBA, EE from IIT Delhi
Hobbies: Tennis, Hiking, Beer Brewing
Takeaways from this Webinar
Privacy fundamentals
Privacy preservation techniques
Using ML for privacy – a walkthrough
Strategies for being an impactful Privacy PM
Why Privacy Matters
Companies collect and retain tons of
customer data for:
• Fulfilling a service request
• Meeting legal or regulatory requirements
• Better CX: recommendations, marketing etc.
• Reselling data to third parties
Collected data can contain sensitive
information
Such data landing in the wrong hands can be
devastating, both for the customer and the
organization
Why Privacy Matters
• Data breaches happen way more
frequently than you think
• Data is spread across different
organizations and media. Tracking data
lineage is almost impossible
• Rise of privacy laws (HIPAA, GDPR, CCPA,
COPPA etc.) with more coming soon
• Growing distrust of social media providers
• Customers want transparency on how their
data is being used
What constitutes Personal Data
• Direct identifiers
• E.g.: Full name, address, SSN, phone
number
• Indirect identifiers
• E.g.: location history, gender, demographic
information, salary
Data classification
• Identified: contains direct or
indirect identifiers
• Pseudonymous: direct identifiers
eliminated or transformed
• De-identified: direct and known
indirect identifiers removed
• Anonymous: mathematically
proven to prevent re-identification
[Figure: personal data ("John Doe") combined with a key becomes pseudonymized data ("eEf2gT_334"); personal data ("Mary Jane") combined with random noise becomes anonymous data ("********")]
Privacy vs Utility Tradeoff
Picture credit: Mostly AI
Stakeholders in Privacy
Enactment
• Compliance Team
• Information Security
• Legal
• Privacy Engineering
• Product
Benefits of being Privacy-first
• Avoid huge fines
• Prevent loss of business licenses
• Brand impact, trust
• Customer loyalty and retention
• Higher Customer Lifetime Value and conversion
• Competitive moat
Privacy-first positioning is table stakes
Sources of
Privacy Risk
Raw Customer Data and its
derivatives
Metadata and logs
ML Models
For attackers, raw data is the holy grail, but ML models should not be ignored
Privacy Risks from ML Models
[Figure: a test dataset of potential members is fed to the model; the output distributions of its predictions differ between members and non-members of the training dataset, and the delta between them denotes privacy risk]
Source: Privacy-Preserving Machine Learning: Threats and Solutions
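One simple way to read the "delta" in this slide is as a gap in model confidence between records that were in the training set and records that were not. The sketch below is an illustration under that assumption, not the method from the cited source: `confidence_gap`, `guess_membership`, and the threshold are hypothetical names and values, and the inputs are assumed to be the model's top-class confidence scores.

```python
from statistics import mean


def confidence_gap(member_confidences: list, non_member_confidences: list) -> float:
    """Mean top-class confidence on training members minus that on non-members.

    A larger gap suggests the model behaves differently on data it has seen,
    which is the signal a membership inference attack would exploit.
    """
    return mean(member_confidences) - mean(non_member_confidences)


def guess_membership(confidence: float, threshold: float) -> bool:
    """Naive attack (assumed for illustration): guess 'member' above a threshold."""
    return confidence >= threshold


# Hypothetical confidence scores, not measurements from any real model.
gap = confidence_gap([0.99, 0.97, 0.98], [0.70, 0.80, 0.75])
```

A gap near zero would suggest that predictions alone reveal little about who was in the training data.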
Don’t be alarmed!
• Locking customer data in a secure vault and
throwing away the key is not the answer
• The goal is to use customer data to deliver
great CX without sacrificing customer privacy
The rest of the presentation focuses on using ML
to mitigate privacy risks while leaving
enough utility in the data
Privacy Preservation Techniques
Data Sanitization
• Direct Identifier Detection and Filtering
• Pseudonymization
• K-anonymization
• Differential Privacy
Privacy-preserving Computation
• Homomorphic Encryption
• Secure Multi-Party Computation
• Trusted Execution Environments
• Federated Learning
Direct
Identifiers
Examples
• Name
• Address (all geographic subdivisions smaller than state)
• All dates related to an individual
• Telephone / Fax numbers
• Email address
• Social Security Number
• Medical record number
• Health plan beneficiary number
• Any account number
• Any certificate or license number
• Vehicle identifiers including license plate numbers
• Device identifiers and serial numbers
• Web URLs
• Internet Protocol (IP) Address
• Biometrics including finger or voice print
• Photographic image - not limited to images of the face
Direct Identifier Detection
and Filtering
Define a list of identifiers and scan
datasets for said identifiers
Easiest to implement
No measurable guarantees
Needs humans in the loop
Maintaining and improving models is hard
Pseudonymization
Map direct identifiers to unique tokens
Can be one-way or two-way
Easier to implement
Allows joins with other data tables
Re-identification impossible from tokens
Original data can be extracted
Needs consistent implementation
Example: 4145 4455 3489 9985 → 41ss utoh dkjbg 9985
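The one-way and two-way variants mentioned above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: `one_way_token` assumes a keyed hash (HMAC-SHA256) as the one-way mapping, and `TokenVault` is a hypothetical in-memory lookup table standing in for a secured token store.

```python
import hashlib
import hmac
import secrets


def one_way_token(value: str, secret_key: bytes) -> str:
    """Keyed hash: the same input always yields the same token, so joins still work."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


class TokenVault:
    """Two-way mapping: tokens are random, and only the vault can reverse them."""

    def __init__(self) -> None:
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the mapping stays consistent across tables.
        if value not in self._forward:
            token = secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        return self._reverse[token]
```

The consistency requirement on the slide shows up here as "same key, same token": two tables tokenized with different keys or different vaults can no longer be joined.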
K-anonymization
Generalize quasi-identifiers and make each
record indistinguishable from at least k-1
other records
Stronger anonymization
Reduces data utility
Choosing ideal k value is hard
Choosing generalization logic is hard
Example zip codes: 94401, 94454, 94432 → 944*
Example ages: 26, 24, 27, 29
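A small sketch of the idea, using the zip codes and ages from the slide. The pairing of zip codes with ages, the 3-digit zip prefix, and the decade-wide age bucket are assumptions made for illustration; `generalize` and `is_k_anonymous` are hypothetical helper names.

```python
from collections import Counter


def generalize(record: dict) -> tuple:
    """Coarsen the quasi-identifiers: keep 3 zip digits, bucket age by decade."""
    zip_prefix = record["zip"][:3] + "*"
    decade = (record["age"] // 10) * 10
    return (zip_prefix, f"{decade}-{decade + 9}")


def is_k_anonymous(records: list, k: int) -> bool:
    """True if every generalized quasi-identifier combination occurs at least k times."""
    groups = Counter(generalize(r) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical records built from the slide's example values.
records = [
    {"zip": "94401", "age": 26},
    {"zip": "94454", "age": 24},
    {"zip": "94432", "age": 27},
    {"zip": "94401", "age": 29},
]
```

Here all four records collapse into the single group `("944*", "20-29")`, so the table is 4-anonymous on these two columns, at the cost of losing exact zip codes and ages.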
Differential Privacy
Query outcome is not dependent
on any one record
Measurable privacy guarantees
Hard to choose right parameters
Not practical for a lot of use cases (yet)
Maintaining DP datasets over time is expensive
Picture credit: Winton Research
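One common way to get a measurable guarantee is the Laplace mechanism applied to a counting query. The sketch below assumes that mechanism (the slides do not name one); `private_count` and `laplace_noise` are hypothetical names, and `epsilon` is the privacy parameter that the slide describes as hard to choose.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values: list, predicate, epsilon: float, rng: random.Random) -> float:
    """Counting query with Laplace noise.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` means more noise and stronger privacy; a larger one means a more accurate answer and weaker privacy, which is the Privacy vs Utility tradeoff in parameter form.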
ML to detect direct identifiers: a walkthrough
• Use cases:
• [p0] Scan search phrases for direct identifiers; if found, delete immediately
• [p1] If an employee is trying to access customer data for customer analytics, ensure
that it contains no direct identifiers
• Functional requirements
• Detect 5 types of identifiers: full name, address, telephone number, email address, SSN
• en_US locale only
• Goal Success Criteria – precision 70%, recall 95%
• Non-functional requirements
• [p0] Scan 1 query (~5 search words) in 250ms
• [p1] Provide API for batch detection
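The success criteria above can be checked mechanically once labeled data exists. The sketch below computes precision and recall from per-phrase boolean labels and compares them with the 70% and 95% targets; `precision_recall` and `meets_goal` are hypothetical helper names, and the sample labels are made up.

```python
def precision_recall(predicted: list, actual: list) -> tuple:
    """predicted[i] / actual[i]: does phrase i contain a direct identifier?"""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged, and truly an identifier
    fp = sum(1 for p, a in pairs if p and not a)    # flagged, but harmless
    fn = sum(1 for p, a in pairs if not p and a)    # missed identifier
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_goal(precision: float, recall: float,
               min_precision: float = 0.70, min_recall: float = 0.95) -> bool:
    """Compare measured metrics with the targets from the requirements."""
    return precision >= min_precision and recall >= min_recall


# Hypothetical labels: 3 phrases flagged, 2 of them correctly, none missed.
p, r = precision_recall([True, True, True, False], [True, True, False, False])
```

In this made-up sample recall is 100% but precision is about 67%, so the goal is not met. One plausible reading of the asymmetric targets is that a missed identifier (a false negative) is costlier here than deleting a harmless phrase.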
Ingredients for a spicy ML model
• Training Data
• Success Metrics
• Model architecture
• ML Infrastructure
• Continuous improvement
Training Data
• Garbage-in, garbage-out: training data should be as close to your
runtime data as possible in syntax and semantics
• Human labeling challenges
• Identifying which search phrases contain PII so it can be annotated
• Ambiguity – high quality ground truth requires multiple passes
• Using actual customer data might lead to privacy exposure
• Track labeling metrics, as they directly impact model performance
• Ensure size and diversity in training data to minimize overfitting and underfitting
Metrics and Performance Evaluation
• Precision and recall: which one is more important?
• Sampling challenges with skewed identifier distribution
• Measurement can be expensive
• How frequently should you run the measurement workflow?
Model Architecture: Choose Your Weapon
• Logistic Regression based binary classifiers
• Easy to implement
• Hard to attribute what is working and what isn’t
• Regular Expression (Regex)
• Highly effective for direct identifiers which have consistent schema
• Dumb, hard to generalize, hard to expand and scale
• NER (Stanford NER, Stanza, FLAIR, spaCy, transformers like BERT)
• Ideal for names, addresses and context dependent identifiers
• Computationally expensive, requires large training data
• No one-size-fits-all solution
• Trial-and-error experimentation is key
Model Architecture: Choose Your Weapon
1. Name - NER
2. Address - NER
3. Telephone numbers - Regex
4. Email address - Regex
5. Social Security Number - Regex
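The three regex-based detectors in this mapping can be sketched as below. The patterns are simplified assumptions for the en_US locale, not production-grade validators, and `PATTERNS` and `detect_identifiers` are hypothetical names. Names and addresses are left out on purpose: per the list above they would go to an NER model, which is not shown here.

```python
import re

# Simplified en_US patterns; real-world formats vary more than this.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    ),
}


def detect_identifiers(text: str) -> dict:
    """Return the identifier types found in the text, with the matched strings."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


result = detect_identifiers("email jane.doe@example.com or call (415) 555-0132")
```

For the p0 use case, a search phrase would be deleted whenever `detect_identifiers` returns a non-empty result.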
Infrastructure
All public cloud providers have offerings
for training, testing, hosting and MLOps
Work with ML scientists to pick
framework of choice
Continuous Improvement Workflow
• Re-train your model periodically
• Track model performance metrics regularly
• Optimize training frequency
• Watch out for model drift over time
• Track labeling quality metrics regularly
• Optimize labeling workflow
Effective Privacy PM
Cheat Sheet
The most rewarding
PM opportunity
Can seem technically challenging and ambiguous
but
• True opportunity to lead and stand out
• Core Product Management
• Tremendous learning opportunity, build specific
skills for the data-first world
• Truly multidisciplinary, cutting across AI/ML/data,
security, legal, compliance, cloud
• Create positive impact and make the world a
better place
Strategies to Gain Leverage
• Partner: Identify who cares (CISO, senior leadership)
• Quantify: Quantify impact on brand and tie it to the organization’s business metrics
• Goals: Work backwards from customer promises
• Vision: Set an exciting and appealing North Star vision
Strategies to Gain Leverage
• Team: Put together a cross-team task force of curious people
• Incremental: Build an incremental roadmap with a few quick wins
• Visibility: Provide continuous visibility
• Incentivize: Create an adoption plan with the right incentives
Where to begin
• Follow the data: chart the customer data lifecycle (Ingestion, Storage, Usage, Deletion)
• Create threat map: where are humans in the loop, and what tools do they use to access the data?
• Identify use cases: Privacy vs Utility tradeoff
• Identify drivers and define success metrics
Best Practices
• Stay abreast of new technology
• Build a community
• Join conferences
• Experiment
Resources
• Visual guide to practical data de-identification: https://fpf.org/wp-content/uploads/2016/04/FPF_Visual-Guide-to-Practical-Data-DeID.pdf
• Google's Patent on PII detection: https://patents.google.com/patent/US8561185B1/en
• Microsoft Presidio: https://github.com/microsoft/presidio
• Use NER mode to detect person names in text: https://pii-tools.com/detect-person-names-in-text/
• Custom NLP approaches to data anonymization: https://towardsdatascience.com/nlp-approaches-to-data-anonymization-1fb5bde6b929
• Detecting and redacting PII using Amazon Comprehend: https://aws.amazon.com/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/
Thank you!
• https://www.linkedin.com/in/pushpakpujari/
• @pushpakpujari

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Your First 90 Days: Listen, Learn & Act by Tumblr CPO
Your First 90 Days: Listen, Learn & Act by Tumblr CPOYour First 90 Days: Listen, Learn & Act by Tumblr CPO
Your First 90 Days: Listen, Learn & Act by Tumblr CPO
 
Turbocharge Your PM Career: Unleashing 5 Game-Changing Tactics
Turbocharge Your PM Career: Unleashing 5 Game-Changing TacticsTurbocharge Your PM Career: Unleashing 5 Game-Changing Tactics
Turbocharge Your PM Career: Unleashing 5 Game-Changing Tactics
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
Mastering Ownership Mindset: Unlocking Your Potential as a Product Manager.pdf
Mastering Ownership Mindset: Unlocking Your Potential as a Product Manager.pdfMastering Ownership Mindset: Unlocking Your Potential as a Product Manager.pdf
Mastering Ownership Mindset: Unlocking Your Potential as a Product Manager.pdf
 
Product Discovery At Google
Product Discovery At GoogleProduct Discovery At Google
Product Discovery At Google
 
Mental Models to Guide Product Decisions by Google Product Manager
Mental Models to Guide Product Decisions by Google Product ManagerMental Models to Guide Product Decisions by Google Product Manager
Mental Models to Guide Product Decisions by Google Product Manager
 
Making Products in GenAI’s World by LinkedIn VP of Product.pdf
Making Products in GenAI’s World by LinkedIn VP of Product.pdfMaking Products in GenAI’s World by LinkedIn VP of Product.pdf
Making Products in GenAI’s World by LinkedIn VP of Product.pdf
 
Scrum Product Owner
Scrum Product OwnerScrum Product Owner
Scrum Product Owner
 
The Future of Product Management by Product School Founder & CEO.pdf
The Future of Product Management by Product School Founder & CEO.pdfThe Future of Product Management by Product School Founder & CEO.pdf
The Future of Product Management by Product School Founder & CEO.pdf
 
The Art of Building a Roadmap
The Art of Building a RoadmapThe Art of Building a Roadmap
The Art of Building a Roadmap
 
Inside The Tornado by Geoffrey Moore
Inside The Tornado by Geoffrey MooreInside The Tornado by Geoffrey Moore
Inside The Tornado by Geoffrey Moore
 
How Product Managers and Designers Work Together by XO Group PM
How Product Managers and Designers Work Together by XO Group PMHow Product Managers and Designers Work Together by XO Group PM
How Product Managers and Designers Work Together by XO Group PM
 
Navigating Polarities as a PM by Google Product Leader
Navigating Polarities as a PM by Google Product LeaderNavigating Polarities as a PM by Google Product Leader
Navigating Polarities as a PM by Google Product Leader
 
Building AI products by Google Group Product Manager.pdf
Building AI products by Google Group Product Manager.pdfBuilding AI products by Google Group Product Manager.pdf
Building AI products by Google Group Product Manager.pdf
 
The Types of Product Management Roles by PayPal Sr Product Manager
The Types of Product Management Roles by PayPal Sr Product ManagerThe Types of Product Management Roles by PayPal Sr Product Manager
The Types of Product Management Roles by PayPal Sr Product Manager
 
Agile Product Manager/Product Owner Dilemma (PMEC)
Agile Product Manager/Product Owner Dilemma (PMEC)Agile Product Manager/Product Owner Dilemma (PMEC)
Agile Product Manager/Product Owner Dilemma (PMEC)
 
Combating Burnout as a Product Manager by CNN Director of Product
Combating Burnout as a Product Manager by CNN Director of ProductCombating Burnout as a Product Manager by CNN Director of Product
Combating Burnout as a Product Manager by CNN Director of Product
 
Analysis In Agile: It's More than Just User Stories
Analysis In Agile: It's More than Just User StoriesAnalysis In Agile: It's More than Just User Stories
Analysis In Agile: It's More than Just User Stories
 
How to Use Data to Drive Product Decisions by PayPal PM
How to Use Data to Drive Product Decisions by PayPal PMHow to Use Data to Drive Product Decisions by PayPal PM
How to Use Data to Drive Product Decisions by PayPal PM
 
What is API Product Management by PayPal Director of Product
What is API Product Management by PayPal Director of ProductWhat is API Product Management by PayPal Director of Product
What is API Product Management by PayPal Director of Product
 

Similar a Using ML to Protect Customer Privacy by fmr Amazon Sr PM

It’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience PresentationIt’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience Presentation
Yao H. Morin, Ph.D.
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
Cybersecurity Education and Research Centre
 
Building a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICSBuilding a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICS
Shiv Bharti
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you think
Peter Skomoroch
 
AI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AIAI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AI
Skyl.ai
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
DATAVERSITY
 

Similar a Using ML to Protect Customer Privacy by fmr Amazon Sr PM (20)

ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
How to classify documents automatically using NLP
How to classify documents automatically using NLPHow to classify documents automatically using NLP
How to classify documents automatically using NLP
 
Delivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMDelivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PM
 
Bridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder SupportBridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder Support
 
How to analyze text data for AI and ML with Named Entity Recognition
How to analyze text data for AI and ML with Named Entity RecognitionHow to analyze text data for AI and ML with Named Entity Recognition
How to analyze text data for AI and ML with Named Entity Recognition
 
It’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience PresentationIt’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience Presentation
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
 
Dont let governance risk and compliance be a roll of the dice | ESPC22
Dont let governance risk and compliance be a roll of the dice |  ESPC22 Dont let governance risk and compliance be a roll of the dice |  ESPC22
Dont let governance risk and compliance be a roll of the dice | ESPC22
 
Building a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICSBuilding a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICS
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you think
 
How to achieve a single view of critical business data with MDM
How to achieve a single view of critical business data with MDMHow to achieve a single view of critical business data with MDM
How to achieve a single view of critical business data with MDM
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
How to Build an AI/ML Product and Sell it by SalesChoice CPO
How to Build an AI/ML Product and Sell it by SalesChoice CPOHow to Build an AI/ML Product and Sell it by SalesChoice CPO
How to Build an AI/ML Product and Sell it by SalesChoice CPO
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
How AI-Powered Search Drives Employee Experience
How AI-Powered Search Drives Employee ExperienceHow AI-Powered Search Drives Employee Experience
How AI-Powered Search Drives Employee Experience
 
Get your data analytics strategy right!
Get your data analytics strategy right!Get your data analytics strategy right!
Get your data analytics strategy right!
 
AI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AIAI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AI
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Data Science and Analytics
Data Science and Analytics Data Science and Analytics
Data Science and Analytics
 

Más de Product School

Más de Product School (20)

Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Webinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdf
Webinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdfWebinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdf
Webinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdf
 
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM LeaderWebinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
 
Unlocking High-Performance Product Teams by former Meta Global PMM
Unlocking High-Performance Product Teams by former Meta Global PMMUnlocking High-Performance Product Teams by former Meta Global PMM
Unlocking High-Performance Product Teams by former Meta Global PMM
 
The Types of TPM Content Roles by Facebook product Leader
The Types of TPM Content Roles by Facebook product LeaderThe Types of TPM Content Roles by Facebook product Leader
The Types of TPM Content Roles by Facebook product Leader
 
Match Is the New Sell in The Digital World by Amazon Product leader
Match Is the New Sell in The Digital World by Amazon Product leaderMatch Is the New Sell in The Digital World by Amazon Product leader
Match Is the New Sell in The Digital World by Amazon Product leader
 
Beyond the Cart: Unleashing AI Wonders with Instacart’s Shopping Revolution
Beyond the Cart: Unleashing AI Wonders with Instacart’s Shopping RevolutionBeyond the Cart: Unleashing AI Wonders with Instacart’s Shopping Revolution
Beyond the Cart: Unleashing AI Wonders with Instacart’s Shopping Revolution
 
Designing Great Products The Power of Design and Leadership
Designing Great Products The Power of Design and LeadershipDesigning Great Products The Power of Design and Leadership
Designing Great Products The Power of Design and Leadership
 
Command the Room: Empower Your Team of Product Managers with Effective Commun...
Command the Room: Empower Your Team of Product Managers with Effective Commun...Command the Room: Empower Your Team of Product Managers with Effective Commun...
Command the Room: Empower Your Team of Product Managers with Effective Commun...
 
Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...
Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...
Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...
 
Customer-Centric PM: Anticipating Needs Across the Product Life Cycle
Customer-Centric PM: Anticipating Needs Across the Product Life CycleCustomer-Centric PM: Anticipating Needs Across the Product Life Cycle
Customer-Centric PM: Anticipating Needs Across the Product Life Cycle
 
AI in Action The New Age of Intelligent Products and Sales Automation
AI in Action The New Age of Intelligent Products and Sales AutomationAI in Action The New Age of Intelligent Products and Sales Automation
AI in Action The New Age of Intelligent Products and Sales Automation
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Using ML to Protect Customer Privacy by fmr Amazon Sr PM

  • 1. Using ML to Protect Customer Privacy by fmr Amazon Sr PM www.productschool.com
  • 5. Using ML to protect customer privacy Pushpak Pujari PM at Verkada | ex Sr. PM at Amazon
  • 6. Bio PM at Verkada for Security Cameras Sr. PM at Amazon Alexa AI - Privacy Sr. PM at Amazon Web Services IoT Wharton MBA, EE from IIT Delhi Hobbies: Tennis, Hiking, Beer Brewing
  • 7. Takeaways from this Webinar Privacy fundamentals Privacy preservation techniques Using ML for privacy – a walkthrough Strategies for being an impactful Privacy PM
  • 8. Why Privacy Matters Companies collect and retain tons of customer data: • Fulfilling a service request • Legal or regulatory requirements • Better CX: recommendations, marketing etc. • Resell data to 3P Collected data can contain sensitive information Such data landing in the wrong hands can be devastating, both for the customer and the organization
  • 9. Why Privacy Matters • Data breaches happen far more frequently than you think • Data is spread across different organizations and media, making data lineage almost impossible to track • Rise of privacy laws (HIPAA, GDPR, CCPA, COPPA etc.) with more coming soon • Growing distrust of social media providers • Customers want transparency on how their data is being used
  • 10. What constitutes Personal Data • Direct identifiers • E.g.: Full name, address, SSN, phone number • Indirect identifiers • E.g.: location history, gender, demographic information, salary
  • 11. Data classification • Identified: contains direct or indirect identifiers • Pseudonymous: direct identifiers eliminated or transformed • De-identified: direct and known indirect identifiers removed • Anonymous: mathematically proven to prevent re-identification [Figure: personal data "John Doe" shown as pseudonymized data "eEf2gT_334"; personal data "Mary Jane" shown as anonymous data "********"; labels: Key, Random Noise]
  • 12. Privacy vs Utility Tradeoff Picture credit: Mostly AI
  • 13. Stakeholders in Privacy Enactment • Compliance Team • Information Security • Legal • Privacy Engineering • Product
  • 14. Benefits of being Privacy-first • Avoid huge fines • Prevent loss of business licenses • Brand impact, trust • Customer loyalty and retention • Increase Customer Lifetime Value and higher conversion • Competitive moat Privacy-first positioning is table stakes
  • 15. Sources of Privacy Risk • Raw customer data and its derivatives • Metadata and logs • ML models For attackers, raw data is the holy grail, but ML models should not be ignored
  • 16. Privacy Risks from ML Models [Figure: output distributions of model predictions for members vs. non-members of the training dataset, evaluated on a test dataset of potential members; the delta between the distributions denotes privacy risk] Source: Privacy-Preserving Machine Learning: Threats and Solutions
  • 17. Don’t be alarmed! • Locking customer data in a secure vault and throwing away the key is not the answer • Goal is to protect customer data while using it to deliver great CX without sacrificing customer privacy Rest of the presentation is focused on using ML to mitigate the privacy risks while leaving enough utility in the data
  • 18. Privacy Preservation Techniques • Data Sanitization: Direct Identifier Detection and Filtering, Pseudonymization, K-anonymization, Differential Privacy • Privacy-preserving Computation: Homomorphic Encryption, Secure Multi-Party Computation, Trusted Execution Environments, Federated Learning
  • 19. Direct Identifiers Examples • Name • Address (all geographic subdivisions smaller than state) • All dates related to an individual • Telephone / Fax numbers • Email address • Social Security Number • Medical record number • Health plan beneficiary number • Any account number • Any certificate or license number • Vehicle identifiers including license plate numbers • Device identifiers and serial numbers • Web URLs • Internet Protocol (IP) Address • Biometrics including finger or voice print • Photographic image - not limited to images of the face
  • 20. Direct Identifier Detection and Filtering Define a list of identifiers and scan datasets for said identifiers Easiest to implement No measurable guarantees Needs humans in the loop Maintaining and improving models is hard
  • 21. Pseudonymization Map direct identifiers to unique tokens • Can be one-way or two-way • Easier to implement • Allows joins with other data tables • Re-identification impossible from tokens • Original data can be extracted • Needs consistent implementation [Figure: the number 4145 4455 3489 9985 shown tokenized as 41ss utoh dkjbg 9985]
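The one-way variant of pseudonymization on slide 21 can be sketched with a keyed hash. This is a minimal illustration rather than the speaker's implementation: the key `SECRET_KEY`, the function name `pseudonymize`, and the 16-character token length are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would live in a key management system.
SECRET_KEY = b"demo-key-rotate-me"


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Map a direct identifier to a stable one-way token using HMAC-SHA256."""
    # Normalize first so the same identifier always yields the same token,
    # which is what keeps joins across data tables possible.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


token_a = pseudonymize("John Doe")
token_b = pseudonymize("  john doe ")
print(token_a == token_b)  # same person, same token
```

Because the token depends on the key, every system that needs to join on it must use the same key and the same normalization, which is one reading of "needs consistent implementation". A two-way scheme would instead need a protected lookup table or reversible encryption.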
  • 22. K-anonymization Generalize quasi-identifiers and make each record indistinguishable from at least k-1 other records • Stronger anonymization • Reduces data utility • Choosing ideal k value is hard • Choosing generalization logic is hard [Figure: zip codes 94401, 94454, 94432 generalized to 944*; ages 26, 24, 27, 29]
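A small sketch of the generalization step on slide 22, using the zip codes and ages shown there. The pairing of zip codes with ages, the 10-year age buckets, and the function names are assumptions made for illustration; the slide does not specify them.

```python
from collections import Counter


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` digits and mask the rest, e.g. 94401 -> 944*."""
    return zip_code[:keep] + "*"


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a bucket, e.g. 26 -> 20-29."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, k: int) -> bool:
    """True if every combination of quasi-identifiers appears at least k times."""
    counts = Counter(records)
    return all(count >= k for count in counts.values())


# Hypothetical records: (zip code, age). The pairing is assumed.
raw = [("94401", 26), ("94454", 24), ("94432", 27), ("94401", 29)]
generalized = [(generalize_zip(z), generalize_age(a)) for z, a in raw]

print(is_k_anonymous(raw, k=2))          # exact values are all unique
print(is_k_anonymous(generalized, k=4))  # all four rows now look identical
```

The utility cost is visible in the result: after generalization the four records can no longer be told apart by zip code or exact age.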
  • 23. Differential Privacy Query outcome is not dependent on any one record Measurable privacy guarantees Hard to choose right parameters Not practical for a lot of use cases (yet) Maintaining DP datasets over time is expensive Picture credit: Winton Research
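One common way to obtain the measurable guarantee mentioned on slide 23 is the Laplace mechanism, sketched below for a counting query. This is an illustrative example, not something the deck prescribes: the function names and the sample data are assumptions, and `epsilon` is the privacy parameter the slide describes as hard to choose.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon: float, rng=None) -> float:
    """Return a noisy count of records matching `predicate`.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


ages = [26, 24, 27, 29, 41, 35]  # hypothetical data
print(dp_count(ages, lambda age: age < 30, epsilon=1.0))
```

A smaller `epsilon` means more noise and stronger privacy, while a larger one means less noise and more utility, which is the privacy vs utility tradeoff in numeric form.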
  • 24. ML to detect direct identifiers: a walkthrough • Use cases: • [p0] Scan search phrases for direct identifiers, if found delete immediately • [p1] If an employee is trying to access customer data for customer analytics, ensure that it contains no direct identifiers • Functional requirements • Detect 5 types of identifiers: full name, address, telephone numbers, email id, SSN • en_US locale only • Goal Success Criteria – precision 70%, recall 95% • Non-functional requirements • [p0] Scan 1 query (~5 search words) in 250ms • [p1] Provide API for batch detection
  • 25. Ingredients for a spicy ML model Training Data Success Metrics Model architecture ML Infrastructure Continuous improvement
  • 26. Training Data • Garbage-in, garbage-out: training data should be as close to your runtime data as possible in syntax and semantics • Human labeling challenges • Identifying which search phrases contain PII so they can be annotated • Ambiguity: high-quality ground truth requires multiple passes • Using actual customer data might lead to privacy exposure • Track labeling metrics, as they directly impact model performance • Size and diversity in training data to minimize overfit and underfit
  • 27. Metrics and Performance Evaluation • Precision and recall: which one is more important? • Sampling challenges with skewed identifier distribution • Measurement can be expensive • How frequently should you run the measurement workflow?
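To make the precision and recall discussion concrete, the sketch below computes both from labeled examples and checks them against the goal success criteria on slide 24 (precision 70%, recall 95%). The label data and function names are hypothetical.

```python
TARGET_PRECISION = 0.70  # from the walkthrough's goal success criteria
TARGET_RECALL = 0.95


def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = contains an identifier)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_targets(precision: float, recall: float) -> bool:
    """True only if both success criteria are satisfied."""
    return precision >= TARGET_PRECISION and recall >= TARGET_RECALL


# Hypothetical ground truth and model predictions for eight search phrases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall, meets_targets(precision, recall))
```

In this example the model clears the precision bar but misses one identifier in four, so it fails the recall target. With a 95% recall goal, a missed identifier is treated as costlier than a false alarm.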
  • 28. Model Architecture: Choose Your Weapon • Logistic Regression based binary classifiers • Easy to implement • Hard to attribute what is working and what isn’t • Regular Expression (Regex) • Highly effective for direct identifiers which have consistent schema • Dumb, hard to generalize, hard to expand and scale • NER (Stanford NER, Stanza, FLAIR, spaCy, transformers like BERT) • Ideal for names, addresses and context dependent identifiers • Computationally expensive, requires large training data • No one size fits all solution • Trial and Error based experimentation is key
  • 29. Model Architecture: Choose Your Weapon 1. Name - NER 2. Address - NER 3. Telephone numbers - Regex 4. Email address - Regex 5. Social Security Number - Regex
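For the three identifier types that slide 29 assigns to regex, a minimal detector could look like the sketch below. The patterns are assumptions written for common en_US formats, not the ones used in the talk, and they will miss or over-match some real-world variants. Names and addresses are left to an NER model, as the slide suggests.

```python
import re

# Illustrative en_US patterns; assumptions for this sketch only.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\+1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    ),
}


def detect(text: str):
    """Return (label, matched_text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(hits)]


def contains_identifier(text: str) -> bool:
    """True if any pattern matches, e.g. to decide whether to delete a query."""
    return bool(detect(text))


query = "call 415-555-0132 or mail jane@example.com ssn 123-45-6789"
print(detect(query))
```

Compiled regexes like these are cheap to run, so they leave most of the 250 ms latency budget to the NER model.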
  • 30. Infrastructure All public cloud providers have offerings for training, testing, hosting and MLOps Work with ML scientists to pick framework of choice
  • 31. Continuous Improvement Workflow Re-train your model periodically Track model performance metrics regularly Optimize training frequency Watch out for model drift over time Track labeling quality metrics regularly Optimize labeling workflow
  • 33. The most rewarding PM opportunity Can seem technically challenging and ambiguous, but: • True opportunity to lead and stand out • Core Product Management • Tremendous learning opportunity, build specific skills for the data-first world • Truly multi-disciplinary, cutting across AI/ML/data, security, legal, compliance, cloud • Create positive impact and make the world a better place
  • 34. Strategies to Gain Leverage • Partner: Identify who cares (CISO, senior leadership) • Quantify: Quantify impact on brand and tie it to the organization’s business metrics • Goals: Work backwards from customer promises • Vision: Set an exciting and appealing North Star vision
  • 35. Strategies to Gain Leverage • Team: Put together a cross-team task force of curious people • Incremental: Build an incremental roadmap with a few quick wins • Visibility: Provide continuous visibility • Incentivize: Create an adoption plan with the right incentives
  • 36. Where to begin • Follow the data: chart the customer data lifecycle (Ingestion, Storage, Usage, Deletion) • Create threat map: where are humans in the loop, what tools do they use to access the data • Identify use cases: Privacy vs Utility tradeoff • Identify drivers and define success metrics
  • 37. Best Practices Stay abreast with new technology Build a community Join conferences Experiment
  • 38. Resources • Visual guide to practical data de-identification: https://fpf.org/wp-content/uploads/2016/04/FPF_Visual-Guide-to-Practical-Data-DeID.pdf • Google's Patent on PII detection: https://patents.google.com/patent/US8561185B1/en • Microsoft Presidio: https://github.com/microsoft/presidio • Use NER mode to detect person names in text: https://pii-tools.com/detect-person-names-in-text/ • Custom NLP approaches to data anonymization: https://towardsdatascience.com/nlp-approaches-to-data-anonymization-1fb5bde6b929 • Detecting and redacting PII using Amazon Comprehend: https://aws.amazon.com/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/