Complex Adaptive Systems 2012 – Washington DC USA, November 14-16




Towards A Differential Privacy and Utility Preserving
           Machine Learning Classifier

       Kato Mivule, Claude Turner, and Soo-Yeon Ji

             Computer Science Department
                Bowie State University

Outline




     Introduction
     Related work
     Essential Terms
     Methodology
     Results
     Conclusion




Introduction

                 Entities transact in ‘big data’ containing personally identifiable
                  information (PII).

                 Organizations are bound by federal and state law to ensure data privacy.

                 In the process of achieving privacy, the utility of privatized datasets
                  diminishes.

                 Achieving balance between privacy and utility is an ongoing problem.

                 Therefore, we investigate a differential privacy preserving machine
                  learning classification approach that seeks an acceptable level of
                  utility.


Related Work
   There is a growing interest in investigating privacy preserving data mining
   solutions that provide a balance between data privacy and utility.

            Kifer and Gehrke (2006) did a broad study of enhanced data utility in
             privacy preserving data publishing by using statistical approaches.

            Wong (2007) described how achieving global optimal privacy while
             maintaining utility is an NP-hard problem.

            Krause and Horvitz (2010) noted that finding trade-offs between
             privacy and utility is still an NP-hard problem.

            Muralidhar and Sarathy (2011) showed that differential privacy provides
             strong privacy guarantees but utility is still a problem due to noise levels.

            Finding the optimal balance between privacy and utility remains a
             challenge, even with differential privacy.
Data Utility versus Privacy

          Data utility is the extent to which a published dataset is useful to
           the consumers of that dataset.

          In the course of a data privacy process, original data will lose statistical
           value despite privacy guarantees.




                                                 Image Source: Kenneth Corbin/Internet News.




Objective

                  Achieving an optimal balance between data privacy and utility
                   remains an ongoing challenge.

                  Such optimality is highly desired and remains our investigation goal.




                                                 Image Source: Wikipedia, on Confidentiality.




Ensemble classification
          A machine learning process in which several independently trained
           classifiers are combined to achieve better prediction.




          Examples include individually trained decision trees combined to make
           more accurate predictions.
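
As a minimal sketch of the idea, the snippet below combines three classifiers by majority vote. The three threshold rules and the function name `majority_vote` are hypothetical and chosen only for illustration; they are not the classifiers used in the study.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label that most classifiers predict for input x."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical weak classifiers over a donation amount; each one
# uses a slightly different (illustrative) threshold.
classifiers = [
    lambda amount: "high" if amount > 70 else "low",
    lambda amount: "high" if amount > 80 else "low",
    lambda amount: "high" if amount > 90 else "low",
]

print(majority_vote(classifiers, 85))  # two of the three rules vote "high"
print(majority_vote(classifiers, 75))  # two of the three rules vote "low"
```

Majority voting is only one way to merge classifiers; boosting methods weight each member's vote instead of counting them equally.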
AdaBoost Ensemble – Adaptive Boosting
          Proposed by Freund and Schapire (1995), AdaBoost adds weak learners over several
           iterations to build a strong learner, adjusting weights to focus on data that
           was misclassified in earlier iterations.

          Classification Error in AdaBoost Ensemble is computed as below:
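
A common textbook form of the weighted classification error for the weak learner added in round t is sketched below. The notation is assumed here and may differ from the slide's: h_t is the weak learner, w_i^{(t)} is the weight of training example i in round t, y_i is its true label, and N is the number of training examples.

```latex
\mathrm{err}_t = \sum_{i=1}^{N} w_i^{(t)} \, \mathbb{1}\!\left[ h_t(x_i) \neq y_i \right],
\qquad \text{with} \quad \sum_{i=1}^{N} w_i^{(t)} = 1
```

With uniform initial weights w_i^{(1)} = 1/N, this reduces in the first round to the plain fraction of misclassified examples.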




AdaBoost Ensemble (Cont’d )
          AdaBoost Ensemble computes as follows:
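
The following is a minimal sketch of binary AdaBoost with one-dimensional decision stumps, written under simplifying assumptions: labels are in {-1, +1}, and the toy data is a small textbook-style example rather than the donations dataset. The study itself used three classes and MATLAB, so this illustrates the general procedure only, and all function names are hypothetical.

```python
import math

def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) weighted stumps."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest, renormalize.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in model)
    return 1 if score >= 0 else -1

def error_rate(model, xs, ys):
    return sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(xs)

# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
for rounds in (1, 2, 3):
    print(rounds, error_rate(adaboost(xs, ys, rounds), xs, ys))
```

On this toy data the training error is 0.3 with one stump and reaches 0 with three, which mirrors the general pattern of error falling as weak learners are added.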




Differential Privacy
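
For reference, the standard definition of ε-differential privacy can be stated as follows. The notation is assumed here: M is a randomized mechanism, D_1 and D_2 are any two datasets that differ in at most one record, and S is any set of possible outputs of M.

```latex
\Pr\left[ M(D_1) \in S \right] \;\le\; e^{\varepsilon} \, \Pr\left[ M(D_2) \in S \right]
```

A smaller ε forces the two output distributions closer together, which means stronger privacy and, typically, more noise.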




Differential Privacy (Cont’d)
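
A minimal sketch of the Laplace mechanism, a common way to achieve ε-differential privacy for numeric queries: noise is drawn from a Laplace distribution with scale Δf/ε, where Δf is the sensitivity of the query. The function names and the sensitivity and ε values below are hypothetical, and applying the noise to each value directly is an assumption made for illustration, not a description of the study's exact procedure.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to each value."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]

rng = random.Random(0)                 # fixed seed so the run is repeatable
donations = [10.0, 20.0, 30.0]         # made-up amounts for illustration
print(privatize(donations, sensitivity=1.0, epsilon=0.5, rng=rng))
```

Halving ε doubles the noise scale, which is the lever behind the privacy and utility trade-off discussed in the results.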




Methodology

          We used a publicly available dataset of donations to the Barack Obama 2008 campaign.

          The data set contained 17,695 records of original unperturbed data.

          Two attributes, the donation amount and income status, are utilized to classify data
           into three classes.

          The three classes are low income, middle income, and high income, for donations
           $1 to $49, $50 to $80, $81 and above respectively.

          To validate our approach, 50 percent of the data was used for training and the
           remainder for testing, on both the original and privatized datasets.

          The Oracle database is queried via the MATLAB ODBC connector. MATLAB is used for
           differential privacy and machine learning classification.
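
The labeling and the 50/50 split described above can be sketched as follows. This is an illustration in Python rather than the MATLAB code used in the study; the function names are hypothetical, and shuffling before the split is an assumption, since the slides do not say how records were assigned to each half.

```python
import random

def income_class(donation):
    """Map a donation amount in dollars to the class ranges listed above."""
    if donation <= 49:
        return "low"
    if donation <= 80:
        return "middle"
    return "high"

def split_half(records, seed=0):
    """Shuffle a copy of the records and split it into two halves."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Stand-in record ids; 17,695 is the record count reported for the dataset.
train, test = split_half(list(range(17695)))
print(len(train), len(test))
print(income_class(25), income_class(65), income_class(100))
```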


Results

          The essential statistical traits of the original dataset are preserved in the
           differentially private dataset, a necessary requirement for publishing privatized datasets.

          As depicted, the mean, standard deviation, and variance of the original
           and differential privacy datasets remained the same.
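
A comparison of this kind can be sketched with Python's standard `statistics` module. The two lists below are made-up values for illustration only and are not taken from the donations dataset; `summarize` is a hypothetical helper name.

```python
import statistics

def summarize(values):
    """Return the mean, sample standard deviation, and sample variance."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "var": statistics.variance(values),
    }

# Made-up values for illustration only; not taken from the donations dataset.
original = [25.0, 50.0, 75.0, 100.0]
privatized = [27.1, 46.3, 78.9, 97.7]

for name, data in (("original", original), ("privatized", privatized)):
    print(name, summarize(data))
```

How closely the two summaries agree in practice depends on the scale of the noise that was added.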




Results (Cont’d)
          There is a strong positive covariance of 1060.8 between the two datasets, which
           means that they tend to increase together, as illustrated below:




Results (Cont’d)
          There is almost no correlation (the correlation was 0.0054) between the
           original and differentially privatized datasets.

          This indicates some privacy assurance: an attacker who has only the
           privatized dataset would have difficulty correctly inferring the alterations.
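
Covariance and correlation are related: the Pearson correlation is the covariance divided by the product of the two standard deviations. A covariance that looks large in raw units can therefore coexist with a correlation near zero when the variances are large. If the reported figures of 1060.8 and 0.0054 refer to the same pair of columns, they would imply a product of standard deviations of roughly 196,000. The sketch below shows both computations; the function names and sample values are for illustration only.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Made-up values for illustration only.
original = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]
print(covariance(original, doubled), correlation(original, doubled))
```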




Results (Cont’d)
          After applying differential privacy, the AdaBoost ensemble classifier is
           run.

          The outcome of the donors’ dataset was Low, Middle, and High income,
           for donations 0 to 50, 51 to 80, and 81 to 100, respectively.

          This same classification outcome is used for the perturbed dataset to
           investigate whether the classifier would categorize the perturbed dataset
           correctly.




Results (Cont’d)

          On the training set drawn from the original data, the classification error
           dropped from 0.25 to 0 as the number of weak decision tree learners increased.

          On the training set drawn from the differentially private data, the
           classification error dropped only from 0.588 to 0.58.




Results (Cont’d)
   When the same procedure was applied to the test set of the original data, the
    classification error dropped from 0.03 to 0.

   However, when the procedure was performed on the differentially private data, the error rate
    did not change even as the number of weak decision trees increased.




Conclusion
   In this study, we found that while differential privacy might guarantee strong
    confidentiality, providing data utility still remains a challenge.

   However, this study is instructive in a variety of ways:

               The level of Laplace noise does affect the classification error.

               Increasing the number of weak learners has little effect on the classification error.

               Adjusting the Laplace noise parameter, ε, is essential for further study.

               However, accurate classification means loss of privacy.

               Tradeoffs must be made between privacy and utility.

               We plan on investigating optimization approaches for such tradeoffs.
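
To illustrate how ε could be adjusted in further study, the sketch below perturbs synthetic donation amounts with Laplace noise at several values of ε and reports how often the income class changes. It is not the study's experiment: the data is synthetic, the sensitivity of 1.0 is a hypothetical choice, and the label flip rate is only a rough proxy for classification error.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def income_class(amount):
    """Illustrative class boundaries, following the ranges in Methodology."""
    if amount <= 49:
        return "low"
    if amount <= 80:
        return "middle"
    return "high"

def label_flip_rate(epsilon, sensitivity=1.0, n=2000, seed=0):
    """Fraction of synthetic donations whose class changes after adding noise."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon          # smaller epsilon means larger noise
    flips = 0
    for _ in range(n):
        amount = rng.randint(1, 100)       # synthetic donation, not real data
        noisy = amount + laplace_noise(scale, rng)
        flips += income_class(noisy) != income_class(amount)
    return flips / n

for eps in (0.01, 0.1, 1.0, 10.0):
    print(f"epsilon={eps:<5} flip rate={label_flip_rate(eps):.3f}")
```

In this sketch a small ε (strong privacy) changes most labels, while a large ε (weak privacy) changes few, which is the trade-off the conclusion describes.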
Questions?




Contact:
Kato Mivule: kmivule@gmail.com



                            Thank You.




Cancer Diagnostic Prediction with Amazon ML – A Tutorial
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
 
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
 
Implementation of Data Privacy and Security in an Online Student Health Recor...
Implementation of Data Privacy and Security in an Online Student Health Recor...Implementation of Data Privacy and Security in an Online Student Health Recor...
Implementation of Data Privacy and Security in an Online Student Health Recor...
 
Kato Mivule - Towards Agent-based Data Privacy Engineering
Kato Mivule - Towards Agent-based Data Privacy EngineeringKato Mivule - Towards Agent-based Data Privacy Engineering
Kato Mivule - Towards Agent-based Data Privacy Engineering
 
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data PrivacyA Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
 
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a GaugeAn Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
 
Lit Review Talk by Kato Mivule: A Review of Genetic Algorithms
Lit Review Talk by Kato Mivule: A Review of Genetic AlgorithmsLit Review Talk by Kato Mivule: A Review of Genetic Algorithms
Lit Review Talk by Kato Mivule: A Review of Genetic Algorithms
 
Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...
Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...
Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...
 
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a GaugeAn Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
 
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a GaugeAn Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
 
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
 
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
 
Kato Mivule: An Overview of Adaptive Boosting – AdaBoost
Kato Mivule: An Overview of  Adaptive Boosting – AdaBoostKato Mivule: An Overview of  Adaptive Boosting – AdaBoost
Kato Mivule: An Overview of Adaptive Boosting – AdaBoost
 
Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...
Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...
Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...
 
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
 
A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...
A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...
A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...
 

Último

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Último (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Towards A Differential Privacy and Utility Preserving Machine Learning Classifier

  • 1. Towards A Differential Privacy and Utility Preserving Machine Learning Classifier. Kato Mivule, Claude Turner, and Soo-Yeon Ji, Computer Science Department, Bowie State University. Complex Adaptive Systems 2012 – Washington DC USA, November 14-16.
  • 2. Outline
      • Introduction
      • Related work
      • Essential Terms
      • Methodology
      • Results
      • Conclusion
  • 3. Introduction
      • Entities transact in ‘big data’ containing personally identifiable information (PII).
      • Organizations are bound by federal and state law to ensure data privacy.
      • In the process of achieving privacy, the utility of privatized datasets diminishes.
      • Achieving a balance between privacy and utility is an ongoing problem.
      • Therefore, we investigate a differential privacy preserving machine learning classification approach that seeks an acceptable level of utility.
  • 4. Related Work. There is growing interest in privacy preserving data mining solutions that balance data privacy and utility.
      • Kifer and Gehrke (2006) did a broad study of enhanced data utility in privacy preserving data publishing by using statistical approaches.
      • Wong (2007) described how achieving globally optimal privacy while maintaining utility is an NP-hard problem.
      • Krause and Horvitz (2010) noted that finding trade-offs between privacy and utility is still an NP-hard problem.
      • Muralidhar and Sarathy (2011) showed that differential privacy provides strong privacy guarantees but utility is still a problem due to noise levels.
      • Finding the optimal balance between privacy and utility remains a challenge, even with differential privacy.
  • 5. Data Utility versus Privacy
      • Data utility is the extent to which a published dataset is useful to the consumer of that dataset.
      • In the course of a data privacy process, the original data will lose statistical value despite privacy guarantees.
      • Image Source: Kenneth Corbin/Internet News.
  • 6. Objective
      • Achieving an optimal balance between data privacy and utility remains an ongoing challenge.
      • Such optimality is highly desired and remains our investigation goal.
      • Image Source: Wikipedia, on Confidentiality.
  • 7. Ensemble classification
      • A machine learning process in which a collection of several independently trained classifiers is merged to achieve better prediction.
      • Examples include individually trained decision trees that are combined to make more accurate predictions.
  • 8. AdaBoost Ensemble – Adaptive Boosting
      • Proposed by Freund and Schapire (1995), AdaBoost runs for several iterations, adding weak learners to build a powerful learner and adjusting the weights at each iteration to focus on the data misclassified in earlier iterations.
      • The classification error in the AdaBoost ensemble is computed as shown on the slide (equation not captured in the transcript).
  • 9. AdaBoost Ensemble (Cont’d)
      • The AdaBoost ensemble is computed as shown on the slide (equation not captured in the transcript).
  • 10. Differential Privacy
  • 11. Differential Privacy (Cont’d)
  • 12. Methodology (Cont’d)
      • We utilized a publicly available Barack Obama 2008 campaign donations dataset.
      • The dataset contained 17,695 records of original unperturbed data.
      • Two attributes, the donation amount and income status, are used to classify the data into three classes.
      • The three classes are low income, middle income, and high income, for donations of $1 to $49, $50 to $80, and $81 and above, respectively.
      • To validate our approach, 50 percent of the data was used for training and the remainder for testing, on both the original and privatized datasets.
      • The Oracle database is queried via the MATLAB ODBC connector. MATLAB is used for differential privacy and machine learning classification.
  • 13. Results
      • Essential statistical traits of the original dataset are kept in the differentially private dataset, a necessary requirement for publishing privatized datasets.
      • The mean, standard deviation, and variance of the original and differentially private datasets remained the same.
  • 14. Results (Cont’d)
      • There is a strong positive covariance of 1060.8 between the two datasets, which means that they tend to increase together.
  • 15. Results (Cont’d)
      • There is almost no correlation (0.0054) between the original and differentially privatized datasets.
      • This indicates some privacy assurance: an attacker working only with the privatized dataset would find it difficult to correctly infer the alterations.
  • 16. Results (Cont’d)
      • After applying differential privacy, the AdaBoost ensemble classifier is run.
      • The outcome for the donors’ dataset was Low, Middle, and High income, for donations 0 to 50, 51 to 80, and 81 to 100, respectively.
      • The same classification outcome is used for the perturbed dataset to investigate whether the classifier would categorize the perturbed dataset correctly.
  • 17. Results (Cont’d)
      • On the training set from the original data, the classification error dropped from 0.25 to 0 as more weak decision tree learners were added.
      • On the training set from the differentially private data, the result was different: the classification error dropped only from 0.588 to 0.58.
  • 18. Results (Cont’d)
      • When the same procedure is applied to the test set of the original data, the classification error dropped from 0.03 to 0.
      • However, when the procedure is performed on the differentially private data, the error rate did not change even with an increased number of weak decision tree learners.
  • 19. Conclusion
      • In this study, we found that while differential privacy might guarantee strong confidentiality, providing data utility remains a challenge.
      • However, this study is instructive in a variety of ways:
          • The level of Laplace noise does affect the classification error.
          • Increasing the number of weak learners has little effect on the classification error of the privatized data.
          • Adjusting the Laplace noise parameter, ε, is essential for further study.
      • However, accurate classification means loss of privacy.
      • Tradeoffs must be made between privacy and utility.
      • We plan on investigating optimization approaches for such tradeoffs.
  • 20. Questions? Contact: Kato Mivule, kmivule@gmail.com. Thank You.
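
The differential privacy slides (10 and 11) survive in this transcript only as titles, so the sketch below is not the authors' MATLAB code. It is a minimal Python illustration of the textbook Laplace mechanism, assuming noise with scale sensitivity / ε is added to each donation amount. The function names, the synthetic donations, the sensitivity value of 99, and the handling of fractional amounts at the class boundaries are all assumptions made for illustration. It also computes the covariance and correlation between original and perturbed values, the two statistics reported on slides 14 and 15, and the share of records whose income class changes after perturbation.

```python
import math
import random


def income_class(amount):
    """Label a donation with the ranges from the Methodology slide
    ($1-$49 low, $50-$80 middle, $81 and above high). Treating noisy,
    fractional amounts with these cut-offs is an assumption."""
    if amount < 50:
        return "low"
    if amount <= 80:
        return "middle"
    return "high"


def laplace_sample(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two
    exponential draws that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale = sensitivity / epsilon to each value."""
    scale = sensitivity / epsilon
    return [v + laplace_sample(scale, rng) for v in values]


def covariance_and_correlation(xs, ys):
    """Sample covariance and Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov, cov / (std_x * std_y)


def label_change_rate(original, noisy):
    """Fraction of records whose income class differs after perturbation."""
    changed = sum(income_class(a) != income_class(b)
                  for a, b in zip(original, noisy))
    return changed / len(original)


rng = random.Random(0)
# Synthetic stand-in for the donations column; NOT the campaign dataset.
donations = [float(rng.randint(1, 100)) for _ in range(1000)]

for epsilon in (0.1, 1.0, 10.0):
    # sensitivity=99.0 (max - min of the synthetic range) is an assumption.
    noisy = privatize(donations, sensitivity=99.0, epsilon=epsilon, rng=rng)
    cov, corr = covariance_and_correlation(donations, noisy)
    print(epsilon, round(cov, 1), round(corr, 4),
          round(label_change_rate(donations, noisy), 3))
```

Smaller values of ε give a larger noise scale, so the perturbed values track the originals less closely and more records change class. This is the privacy and utility trade-off that the conclusion slide describes.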
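
The AdaBoost equations on slides 8 and 9 were not captured in the transcript either. As a hedged illustration of the idea those slides describe (reweighting misclassified points and combining weak learners), the following is a minimal sketch of standard binary AdaBoost with one-dimensional threshold stumps. It simplifies the study's setup, which used decision tree learners in MATLAB on three classes. All function names and the toy data are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this learner
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


def error_rate(ensemble, xs, ys):
    """Fraction of points the ensemble misclassifies."""
    return sum(predict(ensemble, x) != y for x, y in zip(xs, ys)) / len(xs)


# Toy data (hypothetical): +1 marks the "middle" band 50-80, -1 everything else.
amounts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = [-1, -1, -1, -1, 1, 1, 1, 1, -1, -1]

for rounds in (1, 2, 3):
    print(rounds, error_rate(adaboost(amounts, labels, rounds), amounts, labels))
```

On this toy data a single stump cannot isolate the middle band, while three boosted stumps can. This mirrors the qualitative behaviour reported for the original data on slide 17, where the training error fell as weak learners were added.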