Privacy engineering is an emerging discipline within software and data engineering that aims to provide methodologies, tools, and techniques so that engineered systems deliver acceptable levels of privacy. In this talk, I will present our recent work on anonymization and privacy-preserving analytics on large-scale geolocation datasets. In particular, the focus is on how to scale anonymization and geospatial analytics workloads with Spark, maximizing performance by combining multi-dimensional spatial indexing with Spark's in-memory computation.
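The abstract credits much of the performance to multi-dimensional spatial indexing but does not say which index is used. As a minimal single-machine sketch of the idea, assume a uniform square grid (a simple stand-in, not necessarily the index Truata uses) over coordinates already projected to metres, such as British National Grid eastings/northings. Planar coordinates are assumed because the stopping rule below relies on Euclidean distance. `build_grid` and `k_nearest` are hypothetical helpers: the grid buckets points by cell, and a nearest-k query scans rings of cells outward from the query point instead of every point in the dataset.

```python
import math
from collections import defaultdict


def build_grid(points, cell_size):
    """Bucket (id, x, y) points by the square grid cell that contains them.

    Hypothetical helper; assumes x/y are planar coordinates (e.g. metres).
    """
    grid = defaultdict(list)
    for pid, x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[key].append((pid, x, y))
    return grid


def k_nearest(grid, cell_size, qx, qy, k):
    """Exact k-nearest neighbours of (qx, qy), searching grid rings outward."""
    cx, cy = math.floor(qx / cell_size), math.floor(qy / cell_size)
    # Ring at which every occupied cell has been visited (exhaustive fallback).
    last_ring = max((max(abs(i - cx), abs(j - cy)) for i, j in grid), default=0)
    candidates = []
    for r in range(last_ring + 1):
        # Visit only the cells on the border of the (2r+1) x (2r+1) square.
        for i in range(cx - r, cx + r + 1):
            for j in range(cy - r, cy + r + 1):
                if max(abs(i - cx), abs(j - cy)) != r:
                    continue
                for pid, x, y in grid.get((i, j), ()):
                    candidates.append((math.hypot(x - qx, y - qy), pid))
        candidates.sort()
        # Every unvisited point is more than r * cell_size away, so once the
        # k-th best distance is within that radius the answer cannot change.
        if len(candidates) >= k and candidates[k - 1][0] <= r * cell_size:
            break
    return [pid for _, pid in candidates[:k]]
```

For example, `k_nearest(build_grid(zips, 1000.0), 1000.0, x, y, 10)` would return the ids of the ten zips closest to `(x, y)`, assuming `zips` is a hypothetical list of `(id, easting, northing)` tuples. In a Spark setting the cell key would plausibly act as the partitioning or join key, so each executor compares only points in nearby cells; that mapping is an assumption, not something the talk spells out.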
1. The Beauty of (Big) Data
Privacy Engineering
Yangcheng Huang
Director of Software Engineering, Data & Analytics
Truata
2. Who we are
§ Truata was founded in 2018, with investment by Mastercard and IBM
§ Our goal is to be the world’s leading provider of privacy-enhanced data analytics and management solutions
§ Based in Dublin, we have a team of 70 people with an R&D focus on developing cutting-edge privacy-enhancing technologies (PETs)
§ International client base across major industry verticals
§ Multiple EU regulators consulted on the Truata solution
The Truata Anonymization Service is a cutting-edge solution for GDPR-grade data anonymization and analytics, allowing companies to analyse and monetize customer data in fully anonymized form.
§ Sophisticated and proprietary technologies for data anonymization and risk calibration
§ Able to generate fully anonymized data sets ideally suited for privacy-preserving analytics
§ Experienced in delivering large-scale anonymized data analytics projects
§ Able to drive significant value from data while maintaining customer trust
§ Delivered by our customer success team of data science and privacy experts
§ Fully focused on using privacy-centric techniques to generate value from data
§ Consulting solutions based on our proprietary methodologies, IP and expertise, delivered by industry-leading subject-matter experts.
3. A big-data privacy engineering problem
• Geo-privacy
  • Zip-level targeted advertising
  • Lat/long GPS coordinates
  • Shapefile of zip codes
  • Using neighbouring zips’ shopping behaviour
• Problems
  • Lat/long mapping (generalisation of GPS information) and finding the nearest 10 zips
  • Mapping the lat/long coordinates of 32m customers onto 1.7m UK zips
  • Finding, for each of the 1.7m zips, the nearest 10 of the 1.3m zips with customer transactions
• A ‘trillion’ problem
  • 1,000,000,000,000
  • Google processes 61.6 billion web pages today
  • Dublin population (2019): 1,214,666
  • Measuring the similarity of every pair of Dubliners
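To make the ‘trillion’ scale concrete, here is a back-of-envelope count of the comparisons a naive, index-free approach would need, using only the figures on this slide. The brute-force baselines (testing every customer against every zip polygon, and every zip against every candidate zip) are assumptions for illustration, and the variable names are hypothetical.

```python
# Figures taken from the slide; the brute-force baselines are illustrative.
CUSTOMERS = 32_000_000               # customers whose lat/long must be mapped
UK_ZIPS = 1_700_000                  # zip polygons in the shapefile
ZIPS_WITH_TRANSACTIONS = 1_300_000   # candidate neighbours for the nearest-10 search
DUBLIN_POPULATION_2019 = 1_214_666

# Naive point-in-polygon: test every customer against every zip polygon.
point_in_polygon_tests = CUSTOMERS * UK_ZIPS
# Naive nearest-10: distance from every zip to every zip with transactions.
nearest_10_distances = UK_ZIPS * ZIPS_WITH_TRANSACTIONS
# All unordered pairs of Dubliners: n * (n - 1) / 2.
dublin_pairs = DUBLIN_POPULATION_2019 * (DUBLIN_POPULATION_2019 - 1) // 2

for label, count in [
    ("point-in-polygon tests", point_in_polygon_tests),
    ("nearest-10 distance computations", nearest_10_distances),
    ("pairs of Dubliners", dublin_pairs),
]:
    print(f"{label}: {count:,} (~{count:.2e})")
```

The three counts come out at roughly 5.4 × 10^13, 2.2 × 10^12 and 7.4 × 10^11, which is why a spatial index that prunes most candidate pairs matters more than raw cluster size.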
4. Definition of beauty
: the quality of being physically attractive.
: the qualities in a person or a thing that give
pleasure to the senses or the mind.
…
Beauty | Definition of Beauty by Merriam-Webster
https://www.merriam-webster.com/dictionary/beauty