PUTTING DATA
SCIENCE INTO
PERSPECTIVE
PRIVACY &
ETHICS
SRAVAN ANKARAJU
FOUNDER & CEO
DIVERGENCE ACADEMY
June 29th, 2017
Is Fake News a Well-defined Machine Learning Problem?
SRAVAN ANKARAJU
• Technology Leader focused on Strategy & Innovation, Risk and
Decision Management
• 13.5 years with Microsoft in Technology Integration consulting,
Developer Support – focus on DevOps & Agile Development
• Big Picture educator – start with concepts & then play to learn
advanced areas. Iterate and iterate often.
• Implementer of Gamification systems in Learning/Training
• Experienced in High volume & high performance transactional
systems.
• Data Analytics for various Fortune 100 companies
FOUNDER & PRESIDENT
CLAIM TO
FAME
WHY THIS
WHY NOW?
3 TRENDS SHAPING MACHINE LEARNING IN 2017
1. Algorithm Economy is on the Rise
2. Expect more Interaction between Machine and Humans
3. Giant companies will develop ML based AI systems
Source: http://www.datasciencecentral.com/profiles/blogs/trends-shaping-machine-learning-in-2017
EMERGING
TECHNOLOGY
HYPE CYCLE
FOR 2016
AGENDA
• What’s the catch if there is a ton of goodness in AI-based systems
• When do you get a human involved
• Global companies & Importance of May 25th, 2018
• Privacy Paradox & Distinctive aspects of Big Data
• Data Science Ethics Framework
• Where do you go from here / What you can do today
WHAT IS
THE
CATCH?
To study possibly racist algorithms,
professors have to sue the US
http://arstechnica.com/tech-policy/2016/06/do-housing-jobs-sites-have-racist-algorithms-academics-sue-to-find-out/
http://moralmachine.mit.edu/
WHEN DO
YOU GET A
HUMAN
INVOLVED?
QUALITY ASSURANCE
When does securing AI against attacks or reverse-engineering
become more of an issue?
It’s an issue now. One of my biggest learnings from [chatbot] Tay was that
you need to build even AI that is resilient to attacks. It was fascinating to
see what happened on Twitter, but for instance we didn’t face the same
thing in China. Just the social conversation in China is different, and if you
put it in the US corpus it’s different. Then, of course, there was a concerted
attack. Just like you build software today that is resilient to a DDOS
(Distributed Denial of Service) attack, you need to be able to be resilient to
a corpus attack, that tries to pollute the corpus so that you pick up the
wrong thing in your AI learners.
- Satya Nadella, Microsoft CEO
HUMAN INTERVENTION
Whenever you have ambiguity and errors, you need to think
about how you put the human in the loop and escalate to the
human to make choices. That is the art form of an AI product. If
you have ambiguity and error rates, you have to be able to handle
exception. But first you have to detect that exception, and luckily
enough in AI you have confidence and probability and
distribution, so you have to use all of those to get the human in
the loop.
GOVERNANCE
- Satya Nadella, Microsoft CEO
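The escalation idea in the quote above can be sketched in a few lines of Python. This is only an illustration, not something from the talk: the `route_prediction` function, the `Decision` type, and the 0.80 threshold are all hypothetical, and a real system would tune the threshold on validation data.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical cut-off; an assumption for illustration only.
CONFIDENCE_THRESHOLD = 0.80


@dataclass
class Decision:
    label: str          # most probable class
    confidence: float   # probability assigned to that class
    needs_human: bool   # True when the case should be escalated


def route_prediction(probabilities: Dict[str, float],
                     threshold: float = CONFIDENCE_THRESHOLD) -> Decision:
    """Pick the most probable label and flag low-confidence cases for review."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return Decision(label=label, confidence=confidence,
                    needs_human=confidence < threshold)


# A confident prediction is handled automatically; an ambiguous one is escalated.
clear_case = route_prediction({"approve": 0.95, "reject": 0.05})
ambiguous_case = route_prediction({"approve": 0.55, "reject": 0.45})
print(clear_case)
print(ambiguous_case)
```

The design choice here is to detect the exception first (low confidence) and only then involve a person, which mirrors the order described in the quote.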
GENERAL
DATA
PROTECTION
REGULATION
GDPR
Stricter rules will apply to the
collection and use of personal data.
Will apply from May 25th, 2018
GENERAL DATA PROTECTION REGULATION
Operational Impacts
1. Mandatory Data-breach Protection
2. Privacy Impact Assessments
3. Right to be Forgotten
4. Privacy by Design and Default
5. Mandatory Data Protection Officers
MANDATORY DATA-BREACH
PROTECTION
• Companies that experience data breaches will need to notify
regulators and individuals whose personal data was
compromised.
• Companies will most likely want to avoid the negative
publicity of these disclosures. Multinationals will gradually
ramp up:
• Comprehensive risk assessments
• End-to-end security enhancements
• Outsourced managed security services.
General Data Protection Regulation Operational Impact
PRIVACY IMPACT ASSESSMENTS
• GDPR requires companies to conduct data protection impact
assessments (DPIAs) where their data processing operations
are highly invasive.
• This includes marketing activities based on advanced profiling
and analytics.
• Privacy operations may need to extend outside the legal
office, where they have traditionally resided, and into the
day-to-day processes of European businesses.
RIGHT TO BE FORGOTTEN
• Right to erasure could impose a significant burden on
companies with personal data stored across multiple
systems.
• Companies may need to:
• Maintain comprehensive data inventories
• Accelerate data-governance strategies
• Potentially re-architect key systems in order to process
right-to-erasure requests more efficiently.
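To make the "data inventory" point concrete, here is a minimal Python sketch of locating and deleting one person's records across several systems. Everything in it is an assumption for illustration: the system names, the record layout, and the `find_records` and `erase_subject` helpers are hypothetical, and real systems would also have to deal with backups, logs, and legal retention duties.

```python
from typing import Dict, List

# Hypothetical inventory: system name -> {record id -> data subject id}.
data_inventory: Dict[str, Dict[str, str]] = {
    "crm": {"r1": "alice", "r2": "bob"},
    "billing": {"r3": "alice"},
    "analytics": {"r4": "carol", "r5": "alice"},
}


def find_records(inventory: Dict[str, Dict[str, str]],
                 subject_id: str) -> Dict[str, List[str]]:
    """Return, per system, the ids of records that belong to the data subject."""
    located: Dict[str, List[str]] = {}
    for system, records in inventory.items():
        matches = sorted(rid for rid, owner in records.items()
                         if owner == subject_id)
        if matches:
            located[system] = matches
    return located


def erase_subject(inventory: Dict[str, Dict[str, str]],
                  subject_id: str) -> Dict[str, List[str]]:
    """Delete the subject's records everywhere and return what was removed."""
    located = find_records(inventory, subject_id)
    for system, record_ids in located.items():
        for rid in record_ids:
            del inventory[system][rid]
    return located


erased = erase_subject(data_inventory, "alice")
print(erased)
```

Without a single inventory like this, each erasure request turns into a manual search of every system, which is the burden the slide describes.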
PRIVACY BY DESIGN AND DEFAULT
• Privacy-friendly settings or postures, such as those governing
how personal information is collected, retained, and shared, will
be built into new products, devices, and business processes.
• The flip-side of DPIAs, privacy-design requirements may
give rise to a need for privacy engineers to embed privacy
features throughout the daily operations of their businesses.
MANDATORY DATA PROTECTION
OFFICERS
• Require large companies to appoint data protection officers
(DPOs), if their core activities consist of large-scale,
systematic monitoring of people.
• DPOs will have to exhibit expertise in technology and
business processes and project and program management,
such as risk assessment and compliance monitoring skills.
• Talent is in short supply.
PRIVACY
PARADOX
PRIVACY PARADOX
Price of using internet services
• People may express concerns about the impact on their privacy of ‘creepy’ uses of their
data, but in practice they contribute their data anyway via the online systems they use. In
other words they provide the data because it is the price of using internet services.
RIGHT TO MY IDENTITY
Microsoft’s Digital Trends report 2015 noted a trend called Right to My
Identity which means that, rather than simply wishing to preserve
privacy through anonymity, a significant percentage of global consumers
now want to be able to control how long information they have shared
stays online, and are also interested in services that help them manage
their digital identity. This suggests consumers have increasing
expectations of how organizations will use their data and want to be able
to influence it.
A Lawyer and A Data Scientist
Walk Into A Bar
An organization wants to use data
generated in different regulatory
environments to learn about its customers
or to predict their behavior. Some
customers are in Germany, some are in
Switzerland, and others are in the U.S. and
Canada. How can a data scientist get the
most out of this data without breaking the
law, when each country has its own
regulations on what he or she can do with
the data?
“Where the data subject has provided the personal data and the
processing is based on consent or on a contract, the data subject shall
have the right to transmit those personal data and any other
information provided by the data subject and retained by an automated
processing system, into another one, in an electronic format which is
commonly used, without hindrance from the controller from whom the
personal data are withdrawn.”
GDPR CONTROVERSY
Data Portability Legalese
DISTINCTIVE ASPECTS OF BIG DATA ANALYTICS
Potential implications for data protection
1. Use of algorithms
2. Opacity of processing
3. Tendency to collect all the data
4. Repurposing of data
5. Use of new types of data
#1 USE OF ALGORITHMS
Unpredictability by Design
• Thinking with data: the system learns by finding correlations
• Acting with data: what was learned is applied to a particular
case in the application phase
#2 OPACITY OF THE PROCESSING
The ‘Black Box’ effect
• Deep learning involves feeding vast quantities of data
through non-linear neural networks that classify the data
based on the outputs from each successive layer.
• The complexity of the processing of data through such
massive networks creates a ‘black box’ effect.
• Makes it very difficult to understand the reasons for
decisions made as a result of deep learning.
#3 USING ALL THE DATA
n=all
In a retail context it could mean analyzing all the purchases
made by shoppers using a loyalty card, and using this to find
correlations, rather than asking a sample of shoppers to take
part in a survey.
#4 REPURPOSING DATA
Different than the original intent
• Geolocated Twitter data to infer people’s residence and mobility
patterns, to supplement official population estimates.
• Geotagged photos on Flickr, together with the profiles of
contributors, have been used as a reliable proxy for estimating
visitor numbers at tourist sites and where the visitors have come
from.
• Mobile-phone presence data to analyze the foot traffic into the
retail centers.
• Data about where shoppers come from to plan advertising campaigns.
• Data about patterns of movement in an airport to set the rents for
shops and restaurants.
#5 NEW TYPES OF DATA
Tracking without permission
• Developments in technology such as IoT mean that the
traditional scenario in which people consciously provide their
personal data is no longer the only or main way in which
personal data is collected.
• Data may instead be observed, for example by tracking online
activity, rather than being consciously provided by individuals.
• One example: investigating the possibility of using data from
domestic smart meters to predict the number of people in a
household and whether they include children or older people.
DATA
SCIENCE
ETHICS
FRAMEWORK
ALGORITHMIC ACCOUNTABILITY
Five Principles
RESPONSIBILITY
There needs to be a person with the authority to deal with an algorithm's adverse individual or
societal effects in a timely fashion. This is not a statement about legal responsibility but, rather,
a focus on avenues for redress, public dialogue, and internal authority for change.
EXPLAINABILITY
Any decisions produced by an algorithmic system should be explainable to the people affected
by those decisions. These explanations must be accessible and understandable to the target
audience; purely technical descriptions are not appropriate for the general public.
ACCURACY
Algorithms make mistakes, whether because of data errors in their inputs (garbage in, garbage out)
or statistical uncertainty in their outputs. The principle of accuracy suggests that sources of error
and uncertainty throughout an algorithm and its data sources need to be identified, logged, and
benchmarked. Understanding the nature of errors produced by an algorithmic system can inform
mitigation procedures.
AUDITABILITY
The principle of auditability states that algorithms should be developed to enable third parties to
probe and review the behavior of an algorithm. Enabling algorithms to be monitored, checked, and
criticized would lead to more conscious design and course correction in the event of failure.
FAIRNESS
As algorithms increasingly make decisions based on historical and societal data, existing biases
and historically discriminatory human decisions risk being “baked in” to automated decisions. All
algorithms making decisions about individuals should be evaluated for discriminatory effects. The
results of the evaluation and the criteria used should be publicly released and explained.
Source: https://www.technologyreview.com/s/602933/how-to-hold-algorithms-accountable/
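One simple way to start evaluating a system for discriminatory effects, as the fairness principle asks, is to compare how often each group receives a favorable decision. The Python sketch below is an assumption for illustration, not a method from the cited article: the group labels, the sample decisions, and the `selection_rates` and `disparate_impact_ratio` helpers are hypothetical, and a ratio of selection rates is only one of many possible fairness criteria.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of favorable decisions per group, from (group, favorable) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, favorable in decisions:
        totals[group] += 1
        if favorable:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favorable decision
    return min(rates.values()) / highest


# Hypothetical decisions: group A is favored 8 times out of 10, group B 4 out of 10.
sample = ([("A", True)] * 8 + [("A", False)] * 2 +
          [("B", True)] * 4 + [("B", False)] * 6)
rates = selection_rates(sample)
ratio = disparate_impact_ratio(rates)
print(rates, ratio)
```

A low ratio does not prove discrimination on its own, but publishing the measure and the criteria used is in the spirit of the principle above.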
WHERE DO
YOU GO
FROM
HERE?
DATA SCIENCE ACTIVITIES & ORG
MATURITY
Source: Booz Allen Hamilton
IMPLEMENTATION CONSTRAINTS
Source: Booz Allen Hamilton
OPERATING MODELS
Source: Booz Allen Hamilton
GDPR PREPARATION
“No legislation rivals the potential global impact of the EU’s General Data Protection
Regulation (GDPR), going into effect in May 2018. The new law will usher in cascading
privacy demands that will require a renewed focus on data privacy for US companies
that offer goods and services to EU citizens,” said Jay Cline, PwC’s US Privacy Leader.
“Businesses that do not comply with GDPR face a potential 4% fine of global revenues,
increasing the need to successfully navigate how to plan for and implement the
necessary changes.”
Source - http://www.pwc.com/us/en/press-releases/2017/pwc-gdpr-compliance-press-release.html
TOP INITIATIVES
• Information Security
• Privacy Policies
• Gap Assessment
• Data Discovery
• Governance
FactGem is a platform that allows users to generate
their own visualization and analysis applications on top
of Neo4j, without the need to learn any other
programming language. FactGem makes data analysis
accessible to everyone, whether they’re a seasoned data
scientist or completely new to data science.
Through the integration of two platforms users can
access regulated data without worrying about the risk
of violating policies. This enables users to gain insight
into data without having to worry about writing code,
requesting data engineering support, or repercussions
for failing to add policies to data. This process
dramatically accelerates innovation across teams, as
the joint solution provides an end-to-end self-service
mechanism for analysts to exploit the most important
data within an organization.
BLOCKCHAIN IMPLEMENTATION
INTERNET OF EVERYTHING NEEDS LEDGER OF EVERYTHING
1. DECENTRALIZED (Shared
Control)
2. TRUSTED (Immutability /
Audit Trail)
3. PUBLIC (Tokens / Exchanges)
Algorithmic Law and Blockchain Enabled Automation
RESOURCES
- Machine Learning: The High-interest Credit Card of Technical Debt
- Attacking discrimination with smarter machine learning
- Rules of Machine Learning [43 rules]
THANK YOU
WHERE DATA SCIENCE MEETS CYBERSECURITY

More Related Content

What's hot

DBryant-Cybersecurity Challenge
DBryant-Cybersecurity ChallengeDBryant-Cybersecurity Challenge
DBryant-Cybersecurity Challenge
msdee3362
 
Information Leakage & DLP
Information Leakage & DLPInformation Leakage & DLP
Information Leakage & DLP
Yun Lu
 
wp-us-cities-exposed
wp-us-cities-exposedwp-us-cities-exposed
wp-us-cities-exposed
Numaan Huq
 
Opportunities and Challenges in Crisis Informatics
Opportunities and Challenges in Crisis InformaticsOpportunities and Challenges in Crisis Informatics
Opportunities and Challenges in Crisis Informatics
Lea Shanley
 
ZoomLens - Loveland, Subramanian -Tackling Info Risk
ZoomLens - Loveland, Subramanian -Tackling Info RiskZoomLens - Loveland, Subramanian -Tackling Info Risk
ZoomLens - Loveland, Subramanian -Tackling Info Risk
John Loveland
 

What's hot (20)

DBryant-Cybersecurity Challenge
DBryant-Cybersecurity ChallengeDBryant-Cybersecurity Challenge
DBryant-Cybersecurity Challenge
 
Data Breach Visualization
Data Breach VisualizationData Breach Visualization
Data Breach Visualization
 
Looking Forward - Regulators and Data Incidents
Looking Forward - Regulators and Data IncidentsLooking Forward - Regulators and Data Incidents
Looking Forward - Regulators and Data Incidents
 
Information Leakage & DLP
Information Leakage & DLPInformation Leakage & DLP
Information Leakage & DLP
 
Data Protection Maturity Survey Results 2013
Data Protection Maturity Survey Results 2013 Data Protection Maturity Survey Results 2013
Data Protection Maturity Survey Results 2013
 
wp-us-cities-exposed
wp-us-cities-exposedwp-us-cities-exposed
wp-us-cities-exposed
 
The Business(es) of Disinformation
The Business(es) of DisinformationThe Business(es) of Disinformation
The Business(es) of Disinformation
 
Distributed defense against disinformation: disinformation risk management an...
Distributed defense against disinformation: disinformation risk management an...Distributed defense against disinformation: disinformation risk management an...
Distributed defense against disinformation: disinformation risk management an...
 
Opportunities and Challenges in Crisis Informatics
Opportunities and Challenges in Crisis InformaticsOpportunities and Challenges in Crisis Informatics
Opportunities and Challenges in Crisis Informatics
 
Risk, SOCs, and mitigations: cognitive security is coming of age
Risk, SOCs, and mitigations: cognitive security is coming of ageRisk, SOCs, and mitigations: cognitive security is coming of age
Risk, SOCs, and mitigations: cognitive security is coming of age
 
2021 12 nyu-the_business_of_disinformation
2021 12 nyu-the_business_of_disinformation2021 12 nyu-the_business_of_disinformation
2021 12 nyu-the_business_of_disinformation
 
Article 1 currently, smartphone, web, and social networking techno
Article 1 currently, smartphone, web, and social networking technoArticle 1 currently, smartphone, web, and social networking techno
Article 1 currently, smartphone, web, and social networking techno
 
2021-05-SJTerp-AMITT_disinfoSoc-umaryland
2021-05-SJTerp-AMITT_disinfoSoc-umaryland2021-05-SJTerp-AMITT_disinfoSoc-umaryland
2021-05-SJTerp-AMITT_disinfoSoc-umaryland
 
Spo2 t17
Spo2 t17Spo2 t17
Spo2 t17
 
Global Technology Outlook 2012 Booklet
Global Technology Outlook 2012 BookletGlobal Technology Outlook 2012 Booklet
Global Technology Outlook 2012 Booklet
 
ZoomLens - Loveland, Subramanian -Tackling Info Risk
ZoomLens - Loveland, Subramanian -Tackling Info RiskZoomLens - Loveland, Subramanian -Tackling Info Risk
ZoomLens - Loveland, Subramanian -Tackling Info Risk
 
Sj terp emerging tech radar
Sj terp emerging tech radarSj terp emerging tech radar
Sj terp emerging tech radar
 
MCCA Global TEC Forum - Bug Bounties, Ransomware, and Other Cyber Hype for Le...
MCCA Global TEC Forum - Bug Bounties, Ransomware, and Other Cyber Hype for Le...MCCA Global TEC Forum - Bug Bounties, Ransomware, and Other Cyber Hype for Le...
MCCA Global TEC Forum - Bug Bounties, Ransomware, and Other Cyber Hype for Le...
 
Proven Practices to Protect Critical Data - DarkReading VTS Deck
Proven Practices to Protect Critical Data - DarkReading VTS DeckProven Practices to Protect Critical Data - DarkReading VTS Deck
Proven Practices to Protect Critical Data - DarkReading VTS Deck
 
disinformation risk management: leveraging cyber security best practices to s...
disinformation risk management: leveraging cyber security best practices to s...disinformation risk management: leveraging cyber security best practices to s...
disinformation risk management: leveraging cyber security best practices to s...
 

Similar to Putting data science into perspective

Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
stilliegeorgiana
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
Trillium Software
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data mining
Needa Multani
 
FINAL presentationMay2016
FINAL presentationMay2016FINAL presentationMay2016
FINAL presentationMay2016
Melissa Krasnow
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
Edge AI and Vision Alliance
 
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTION
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTIONETHICAL ISSUES WITH CUSTOMER DATA COLLECTION
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTION
Pranav Godse
 

Similar to Putting data science into perspective (20)

Data foundation for analytics excellence
Data foundation for analytics excellenceData foundation for analytics excellence
Data foundation for analytics excellence
 
From information to intelligence
From information to intelligence From information to intelligence
From information to intelligence
 
Big data security
Big data securityBig data security
Big data security
 
Big data security
Big data securityBig data security
Big data security
 
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
 
Global Data Management: Governance, Security and Usefulness in a Hybrid World
Global Data Management: Governance, Security and Usefulness in a Hybrid WorldGlobal Data Management: Governance, Security and Usefulness in a Hybrid World
Global Data Management: Governance, Security and Usefulness in a Hybrid World
 
3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences
3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences
3 Steps to Turning CCPA & Data Privacy into Personalized Customer Experiences
 
GDPR: Leverage the Power of Graphs
GDPR: Leverage the Power of GraphsGDPR: Leverage the Power of Graphs
GDPR: Leverage the Power of Graphs
 
Master Data in the Cloud: 5 Security Fundamentals
Master Data in the Cloud: 5 Security FundamentalsMaster Data in the Cloud: 5 Security Fundamentals
Master Data in the Cloud: 5 Security Fundamentals
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
 
Internet of Things With Privacy in Mind
Internet of Things With Privacy in MindInternet of Things With Privacy in Mind
Internet of Things With Privacy in Mind
 
Big Data, Analytics and Data Science
Big Data, Analytics and Data ScienceBig Data, Analytics and Data Science
Big Data, Analytics and Data Science
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data mining
 
The top trends changing the landscape of Information Management
The top trends changing the landscape of Information ManagementThe top trends changing the landscape of Information Management
The top trends changing the landscape of Information Management
 
Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101
 
Data Mining With Big Data
Data Mining With Big DataData Mining With Big Data
Data Mining With Big Data
 
Setting the right GDPR priorities
Setting the right GDPR prioritiesSetting the right GDPR priorities
Setting the right GDPR priorities
 
FINAL presentationMay2016
FINAL presentationMay2016FINAL presentationMay2016
FINAL presentationMay2016
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
 
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTION
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTIONETHICAL ISSUES WITH CUSTOMER DATA COLLECTION
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTION
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Putting data science into perspective

  • 1. PUTTING DATA SCIENCE INTO PERSPECTIVE PRIVACY & ETHICS SRAVAN ANKARAJU FOUNDER & CEO DIVERGENCE ACADEMY June 29th, 2017
  • 2. Is Fake News a Well-defined Machine Learning Problem? Source
  • 3. SRAVAN ANKARAJU • Technology Leader focused on Strategy & Innovation, Risk and Decision Management • 13.5 years with Microsoft in Technology Integration consulting, Developer Support – focus on DevOps & Agile Development • Big Picture educator – start with concepts & then play to learn advanced areas. Iterate and iterate often. • Implementer of Gamification systems in Learning/Training • Experienced in High volume & high performance transactional systems. • Data Analytics for various Fortune 100 companies FOUNDER & PRESIDENT
  • 6. 3 TRENDS SHAPING MACHINE LEARNING IN 2017 Algorithm Economy is on the Rise Expect more Interaction between Machine and Humans Giant companies will develop ML based AI systems Source: http://www.datasciencecentral.com/profiles/blogs/trends-shaping-machine-learning-in-2017 1 2 3
  • 8.
  • 9. AGENDA • What’s the catch if there is ton of goodness in AI-based systems • When do you get human involved • Global companies & Importance of May 25th, 2018 • Privacy Paradox & Distinctive aspects of Big Data • Data Science Ethics Framework • Where do you go from here / What you can do today PUTTING DATA SCIENCE INTO PERSPECTIVE
  • 11. To study possibly racist algorithms, professors have to sue the US http://arstechnica.com/tech-policy/2016/06/do-housing-jobs-sites-have-racist-algorithms-academics-sue-to-find-out/
  • 13. WHEN DO YOU GET A HUMAN INVOLVED?
  • 14.
  • 15. QUALITY ASSURANCE When does securing AI against attacks or reverse- engineering become more of an issue? It’s an issue now. One of my biggest learnings from [chatbot] Tay was that you need to build even AI that is resilient to attacks. It was fascinating to see what happened on Twitter, but for instance we didn’t face the same thing in China. Just the social conversation in China is different, and if you put it in the US corpus it’s different. Then, of course, there was a concerted attack. Just like you build software today that is resilient to a DDOS (Distributed Denial of Service) attack, you need to be able to be resilient to a corpus attack, that tries to pollute the corpus so that you pick up the wrong thing in your AI learners. - Satya Nadella, Microsoft CEO
  • 16. HUMAN INTERVENTION Whenever you have ambiguity and errors, you need to think about how you put the human in the loop and escalate to the human to make choices. That is the art form of an AI product. If you have ambiguity and error rates, you have to be able to handle exception. But first you have to detect that exception, and luckily enough in AI you have confidence and probability and distribution, so you have to use all of those to get the human in the loop. GOVERNANCE - Satya Nadella, Microsoft CEO
  • 18. GDPR Stricter rules will apply to the collection and use of personal data. Will apply for May 2018
  • 19. GLOBAL DATA PROTECTION REGULATIONOperational Impacts Mandatory Data- breach Protection Privacy Impact Assessments Right to be Forgotten Privacy by Design and Default Mandatory Data Protection Officers 1 2 3 4 5
  • 20. MANDATORY DATA-BREACH PROTECTION • Companies that experience data breaches will need to notify regulators and individuals whose personal data was compromised. • Companies will most likely want to avoid the negative publicity of these disclosures. Multinationals will gradually ramp up: • Comprehensive risk assessments • End-to-end security enhancements • Outsourced managed security services. Global Data Protection Regulation Operational Impact
  • 21. PRIVACY IMPACT ASSESSMENTS • Companies must conduct data protection impact assessments (DPIAs) where their data processing operations are highly invasive. • This includes marketing activities based on advanced profiling and analytics. • Privacy operations may need to extend outside the legal office, where they have traditionally resided, and into the day-to-day processes of European businesses.
  • 22. RIGHT TO BE FORGOTTEN • The right to erasure could impose a significant burden on companies with personal data stored across multiple systems. • Companies may need to: • Maintain comprehensive data inventories • Accelerate data-governance strategies • Potentially re-architect key systems in order to process right-to-erasure requests more efficiently.
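To illustrate why a data inventory matters for erasure requests, here is a rough sketch in which every system holding personal data is registered once, so that a single request can be fanned out to all of them and the outcome recorded. `InMemoryStore` and `erase_subject` are hypothetical names used only for this example; real systems would sit behind their own APIs.

```python
from typing import Dict, List


class InMemoryStore:
    """Stand-in for one system (CRM, billing, ...) that holds personal data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, dict] = {}

    def erase(self, subject_id: str) -> bool:
        # Returns True only if a record for this person existed and was removed.
        return self.records.pop(subject_id, None) is not None


def erase_subject(inventory: List[InMemoryStore], subject_id: str) -> Dict[str, bool]:
    """Apply one erasure request to every inventoried system.

    The returned mapping (system name -> whether data was removed) can be
    kept as evidence that the request was processed everywhere.
    """
    return {store.name: store.erase(subject_id) for store in inventory}
```

Any system missing from the inventory is never asked to erase anything, which is the practical reason the slide pairs erasure with comprehensive data inventories.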
  • 23. PRIVACY BY DESIGN AND DEFAULT • Privacy-friendly settings or postures, such as those that minimize how much personal information is collected, retained, and shared, will be built into new products, devices, and business processes. • The flip-side of DPIAs, privacy-design requirements may give rise to a need for privacy engineers to embed privacy features throughout the daily operations of their businesses.
  • 24. MANDATORY DATA PROTECTION OFFICERS • Require large companies to appoint data protection officers (DPOs), if their core activities consist of large-scale, systematic monitoring of people. • DPOs will have to exhibit expertise in technology and business processes and project and program management, such as risk assessment and compliance monitoring skills. • Talent is in short supply.
  • 26. PRIVACY PARADOX Price of using internet services • People may express concerns about the impact on their privacy of ‘creepy’ uses of their data, but in practice they contribute their data anyway via the online systems they use. In other words, they provide the data because it is the price of using internet services. RIGHT TO MY IDENTITY Microsoft’s Digital Trends report 2015 noted a trend called Right to My Identity: rather than simply wishing to preserve privacy through anonymity, a significant percentage of global consumers now want to be able to control how long information they have shared stays online, and are also interested in services that help them manage their digital identity. This suggests consumers have increasing expectations of how organizations will use their data and want to be able to influence it.
  • 27. A Lawyer and A Data Scientist Walk Into A Bar
  • 28. An organization wants to use data generated in different regulatory environments to learn about its customers or to predict their behavior. Some customers are in Germany, some are in Switzerland, and others are in the U.S. and Canada. How can a data scientist get the most out of this data without breaking the law, when each country has its own regulations on what he or she can do with the data?
  • 29. GDPR CONTROVERSY Data Portability Legalese “Where the data subject has provided the personal data and the processing is based on consent or on a contract, the data subject shall have the right to transmit those personal data and any other information provided by the data subject and retained by an automated processing system, into another one, in an electronic format which is commonly used, without hindrance from the controller from whom the personal data are withdrawn.”
  • 30. DISTINCTIVE ASPECTS OF BIG DATA ANALYTICS Potential implications for data protection 1. Use of algorithms 2. Opacity of processing 3. Tendency to collect all the data 4. Repurposing of data 5. Use of new types of data
  • 31. #1 USE OF ALGORITHMS • Thinking with data: Find correlations / system learns • Acting with data: Applied to particular case in the Application phase Unpredictability by Design
  • 32. #2 OPACITY OF THE PROCESSING The ‘Black Box’ effect • Deep learning involves feeding vast quantities of data through non-linear neural networks that classify the data based on the outputs from each successive layer. • The complexity of processing data through such massive networks creates a ‘black box’ effect. • This makes it very difficult to understand the reasons for decisions made as a result of deep learning.
  • 33. #3 USING ALL THE DATA n=all In a retail context it could mean analyzing all the purchases made by shoppers using a loyalty card, and using this to find correlations, rather than asking a sample of shoppers to take part in a survey.
  • 34. #4 REPURPOSING DATA Different than the original intent • Geolocated Twitter data to infer people’s residence and mobility patterns, to supplement official population estimates. • Geotagged photos on Flickr, together with the profiles of contributors, have been used as a reliable proxy for estimating visitor numbers at tourist sites and where the visitors have come from. • Mobile-phone presence data to analyze the foot traffic into retail centers. • Data about where shoppers have come from to plan advertising campaigns. • Data about patterns of movement in an airport to set the rents for shops and restaurants.
  • 35. #5 NEW TYPES OF DATA Tracking without permission • Developments in technology such as IoT mean that the traditional scenario in which people consciously provide their personal data is no longer the only or main way in which personal data is collected. • Data is often generated automatically, for example by tracking online activity, rather than being consciously provided by individuals. • Example: investigating the possibility of using data from domestic smart meters to predict the number of people in a household and whether they include children or older people.
  • 38. ALGORITHMIC ACCOUNTABILITY Five Principles RESPONSIBILITY: There needs to be a person with the authority to deal with the adverse individual or societal effects of an algorithmic system in a timely fashion. This is not a statement about legal responsibility but, rather, a focus on avenues for redress, public dialogue, and internal authority for change. EXPLAINABILITY: Any decisions produced by an algorithmic system should be explainable to the people affected by those decisions. These explanations must be accessible and understandable to the target audience; purely technical descriptions are not appropriate for the general public. ACCURACY: Algorithms make mistakes, whether because of data errors in their inputs (garbage in, garbage out) or statistical uncertainty in their outputs. The principle of accuracy suggests that sources of error and uncertainty throughout an algorithm and its data sources need to be identified, logged, and benchmarked. Understanding the nature of errors produced by an algorithmic system can inform mitigation procedures. AUDITABILITY: The principle of auditability states that algorithms should be developed to enable third parties to probe and review the behavior of an algorithm. Enabling algorithms to be monitored, checked, and criticized would lead to more conscious design and course correction in the event of failure. FAIRNESS: As algorithms increasingly make decisions based on historical and societal data, existing biases and historically discriminatory human decisions risk being “baked in” to automated decisions. All algorithms making decisions about individuals should be evaluated for discriminatory effects. The results of the evaluation and the criteria used should be publicly released and explained. Source: https://www.technologyreview.com/s/602933/how-to-hold-algorithms-accountable/
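One possible way to start evaluating a decision system for discriminatory effects, as the fairness principle asks, is to compare how often each group receives a favorable decision. The sketch below is an assumption of this example, not a method given in the slides; the function names and group labels are hypothetical, and a single ratio is only a first screen, not a complete fairness audit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of favorable decisions per group.

    `decisions` is a sequence of (group label, favorable decision?) pairs.
    """
    totals: Dict[str, int] = defaultdict(int)
    favorable: Dict[str, int] = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        favorable[group] += int(approved)
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0
```

A ratio well below 1.0 signals that one group is favored far more often than another and that the decision logic deserves a closer look. Publishing both the ratio and the way it was computed is in the spirit of releasing "the results of the evaluation and the criteria used".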
  • 40. DATA SCIENCE ACTIVITIES & ORG MATURITY Source: Booz Allen Hamilton
  • 43. GDPR PREPARATION “No legislation rivals the potential global impact of the EU’s General Data Protection Regulation (GDPR), going into effect in May 2018. The new law will usher in cascading privacy demands that will require a renewed focus on data privacy for US companies that offer goods and services to EU citizens,” said Jay Cline, PwC’s US Privacy Leader. “Businesses that do not comply with GDPR face a potential 4% fine of global revenues, increasing the need to successfully navigate how to plan for and implement the necessary changes.” Source - http://www.pwc.com/us/en/press-releases/2017/pwc-gdpr-compliance-press-release.html TOP INITIATIVES: Information Security, Privacy Policies, Gap Assessment, Data Discovery
  • 45. FactGem is a platform that allows users to generate their own visualization and analysis applications on top of Neo4j, without the need to learn any other programming language. FactGem makes data analysis accessible to everyone, whether they’re a seasoned data scientist or completely new to data science. Through the integration of two platforms users can access regulated data without worrying about the risk of violating policies. This enables users to gain insight into data without having to worry about writing code, requesting data engineering support, or repercussions for failing to add policies to data. This process dramatically accelerates innovation across teams, as the joint solution provides an end-to-end self-service mechanism for analysts to exploit the most important data within an organization.
  • 46. BLOCKCHAIN IMPLEMENTATION INTERNET OF EVERYTHING NEEDS LEDGER OF EVERYTHING 1. DECENTRALIZED (Shared Control) 2. TRUSTED (Immutability / Audit Trail) 3. PUBLIC (Tokens / Exchanges) Algorithmic Law and Blockchain Enabled Automation
  • 47. RESOURCES - Machine Learning: The High-interest Credit Card of Technical Debt - Attacking discrimination with smarter machine learning - Rules of Machine Learning [43 rules]
  • 48. THANK YOU WHERE DATA SCIENCE MEETS CYBERSECURITY