The x-CEG Conceptual Approach
ABSTRACT METHODOLOGY RESULTS DISCUSSION
REFERENCES
An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge
Kato Mivule
Doctoral Candidate, Computer Science Department
Bowie State University
Advisor: Claude Turner, Ph.D.
Associate Professor, Computer Science Department
Bowie State University
1. R. C.-W. Wong, A. W.-C. Fu, K. Wang, and J. Pei, “Minimality Attack in Privacy
Preserving Data Publishing,” Proceedings of the 33rd international conference on Very
large data bases, pp. 543–554, 2007.
2. A. Krause and E. Horvitz, “A Utility-Theoretic Approach to Privacy in Online Services,”
Journal of Artificial Intelligence Research, vol. 39, pp. 633–662, 2010.
3. J. Kim, “A Method for Limiting Disclosure in Microdata Based on Random Noise and
Transformation,” in Proceedings of the Survey Research Methods, American Statistical
Association, 1986, pp. 370–374.
4. M. Banerjee, “A utility-aware privacy preserving framework for distributed data mining
with worst case privacy guarantee,” University of Maryland, Baltimore County, 2011.
5. B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Data-Centric
Systems and Applications. Springer, 2011, pp. 124–125.
6. C. Dwork, “Differential Privacy,” in Automata, Languages and Programming, vol. 4052,
M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, Eds. Springer, 2006, pp. 1–12.
7. K. Mivule, C. Turner, and S.-Y. Ji, “Towards A Differential Privacy and Utility Preserving
Machine Learning Classifier,” in Procedia Computer Science, 2012, vol. 12, pp. 176–181.
8. K. Mivule, “Utilizing Noise Addition for Data Privacy, an Overview,” in Proceedings of
the International Conference on Information and Knowledge Engineering (IKE 2012),
2012, pp. 65–71.
9. A. Frank and A. Asuncion, Iris Data Set, UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml/datasets/Iris]. Department of Information and Computer
Science, University of California, Irvine, CA, 2010.
Acknowledgement
Special thanks to Dr. Soo-Yeon Ji, Dr. Hoda El-Sayed, Dr. Darsana Josyula, and the
Computer Science Department at Bowie State University.
THE EXPERIMENT
• Organizations are required by law to safeguard the privacy of
individuals when handling data containing personally
identifiable information (PII).
• During the process of data privatization, the utility or
usefulness of the privatized data diminishes.
Data Privacy versus Data Utility
• Achieving an optimal balance between data privacy
and utility is an intractable problem.
• “Perfect privacy can be achieved by publishing
nothing at all, but this has no utility; perfect utility can
be obtained by publishing the data exactly as received,
but this offers no privacy” (Cynthia Dwork, 2006)
PRELIMINARIES
KNN classification of the original Iris dataset with classification error at
0.0400 (4 percent misclassified data)
KNN classification of the privatized Iris dataset with noise addition between
the mean and standard deviation.
KNN classification of the privatized Iris dataset with reduced noise addition
(mean = 0, standard deviation = 0.1).
A second run of the KNN classification of the privatized Iris
dataset with reduced noise addition (mean = 0,
standard deviation = 0.1).
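The experiment captioned above can be approximated with a minimal sketch. It is not the authors' implementation: the poster does not state the value of k, the distance metric, or the evaluation protocol, so k = 5, Euclidean distance, and leave-one-out error are assumptions here. The function names are hypothetical, and make_toy_data generates three synthetic clusters as a stand-in for Iris rather than loading the real dataset.

```python
import math
import random
from collections import Counter


def add_gaussian_noise(rows, mean=0.0, std=0.1, seed=0):
    """Return a copy of rows with independent Gaussian noise added to every value."""
    rng = random.Random(seed)
    return [[x + rng.gauss(mean, std) for x in row] for row in rows]


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points nearest to query (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_error(xs, ys, k=5):
    """Leave-one-out classification error: the fraction of points misclassified."""
    wrong = 0
    for i in range(len(xs)):
        # Hold out point i and classify it using all remaining points.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
            wrong += 1
    return wrong / len(xs)


def make_toy_data(n_per_class=50, seed=1):
    """Hypothetical stand-in for Iris: three 4-D Gaussian clusters, NOT the real data."""
    rng = random.Random(seed)
    centers = [(5.0, 3.4, 1.5, 0.2), (5.9, 2.8, 4.3, 1.3), (6.6, 3.0, 5.6, 2.0)]
    xs, ys = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            xs.append([c + rng.gauss(0.0, 0.3) for c in center])
            ys.append(label)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_data()
    print("original error:", loo_error(xs, ys))
    # Compare a small and a large noise level; labels are left unperturbed.
    for std in (0.1, 1.0):
        noisy = add_gaussian_noise(xs, std=std)
        print(f"noise std={std}: error = {loo_error(noisy, ys):.4f}")
```

Running the script prints the error for the unperturbed data and for two noise levels, which allows the same comparison the captions describe. The exact figures depend on the synthetic data and the seed, so they should not be read against the 0.0400 reported for the original Iris dataset.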
• The initial results from our investigation show that a
reduction in noise levels does affect the classification
error rate.
• However, this reduction in noise levels could lead to
risky, low levels of privacy.
• Finding the optimal balance between data privacy and
utility needs remains problematic.
• The level of noise does affect the classification error.
• Adjusting noise parameters is essential for lower
classification error.
• Precise classification (better utility) might mean low
privacy.
• Tradeoffs must be made between privacy and
utility.
IKE 2013 Conference on Information
and Knowledge Engineering
Las Vegas, NV, USA
