A data view of the data science process

Presentation at NUI Galway Business Analytics seminar.

  1. A data-view of the data science process. Mathieu d’Aquin (@mdaquin), Data Science Institute, Insight Centre for Data Analytics, NUI Galway
  3. Why am I talking to you about [image]?
  4. Healthcare and medicine; IoT and smart cities; FinTech; Education and learning; Digital humanities; Media and social media; Agritech; Environment and sustainability; Government and public sector; Customer services; Entertainment / creative sector
  5. [image]?
  6. [image]? As in Biology? Simplifying: the observation of naturally occurring phenomena and principles in relation to data? As in Physics? Again simplifying: the theorisation and experimental verification of fundamental laws of data? As in the Social Sciences? Really simplifying: the investigation of the social, economic or cultural implications of data for individuals, groups and society?
  7. Hypothesis / Question → Plan → Collect data → Analyse data → Extract results → Exploit results
  8. Hypothesis / Question → Plan → Collect data (data) → Analyse data (models) → Extract results (new information) → Exploit results (whatever the goal was)
  9. Hypothesis / Question → Plan → Collect data (data) → Analyse data (models) → Extract results (new information) → Exploit results (whatever the goal was). The study of this process and its characteristics.
  10. Hypothesis / Question → Plan → Collect data (data) → Analyse data (models) → Extract results (new information) → Exploit results (whatever the goal was). The study of those things and their characteristics.
  11. Diagram: a Dataset.
  12. Diagram: a Dataset, with Characteristics, obtained from a Source and derived from another Dataset.
  13. Diagram (extends slide 12): the Dataset is also associated with a License and a Regulation.
  14. Diagram (extends slide 13): the Dataset is used for a Data Science Task.
  15. Diagram (extends slide 14): adds a Technique and its Parameters ..., with the relations "implemented by", "using" and "produced".
  16. Diagram (extends slide 15): adds two Models, with the relations "version of" and "produced".
  18. Example: describing a data process with ontologies (the Datanode ontology, E. Daga). A vocabulary to describe the relationships between input datasets, intermediary data assets and the outputs of a data process.
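  20. Diagram (smart-meter example): Smart meter data → Anonymisation → Anon data; Solar panel monitoring → Anonymisation → Anon data. The anonymised data, Weather data, Location data and Electricity tariff data feed a data analysis, which produces a Model used for prediction/recommendation, leading to Results.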
  21. The same diagram, annotated with the policies attached to its elements: Data protection, Corporate licence 1, Corporate licence 2, Data protection, Data protection, User T&C, OGL (Open Government Licence), Corporate licence 3.
  22. The same annotated diagram, with an open question: which policies apply to the Results?
  23. Example: machine-readable policies and inference rules for their propagation (E. Daga)
  25. Example: studying large data science platforms (ongoing work, M. Adel)
  26. Thousands of datasets used in thousands of data science processes. This allows us to better understand the tasks of data science, how they occur and in what contexts, as well as which characteristics of datasets lead to which uses in data science processes.
  27. Data ethics. Diagram: Hypothesis / Question → Plan → Collect data → Analyse data → Extract results → Exploit results, marking where ethical implications are (or might be) considered and where they are important.
  28. Towards a methodology for Ethics by Design in Data Science (with P. Troullinou). ‘Ethics by Design’ for data science. Dialectic: the process is based on a conversation between data scientists and critical social scientists throughout the project’s life-cycle. Reflective: ethical concerns are not fixed in advance; they may emanate from any stage of the project, so constant reflexivity about activities and researchers is needed. Creative, not disruptive: the objective is to have a positive impact on the research and to increase its value by addressing ethics throughout the project’s life-cycle. All-encompassing: ethical concerns appear as much in the research activities as in their outcomes, use and exploitation, so the process needs to extend to all stages.
  29. Using science fiction to guide ethical thinking. Figure: a two-by-two grid with the axes "used/controlled by a small number of individuals" vs. "used/controlled by all" and "used accurately according to intended purpose" vs. "hacked, biased, inaccurate", on which Black Mirror episodes are placed: S3E1 Nosedive, S3E5 Men Against Fire, S3E6 Hated in the Nation, S4E2 Arkangel, S4E3 Crocodile, S4E5 Metalhead, S3E2 Playtest, S2E1 Be Right Back, S1E3 The Entire History of You.
  30. Using science fiction to guide ethical thinking. Write scenarios or short stories based on the following four premises. In the near future, what I am developing (or the results I will obtain) will be: (1) used as intended by millions / most people / many people; (2) used as intended by a small group with control/power; (3) abused, hacked, inaccurate or biased, while used by millions / most people / many people; (4) abused, hacked, inaccurate or biased, while used by a small group with control/power. What could possibly go wrong? (See the Re-coding Black Mirror workshops.)
  31. Conclusion. Data science has grown very quickly as a discipline, with a huge economic and societal impact, and it is not stopping. This is leading to the creation of a very large number of datasets, techniques, tools, models, approaches and methods, driven by practices and applications in various domains. The study of those artefacts is becoming critical to extract the fundamental principles that guide data science as a discipline and as a process. Understanding those principles is essential to drive the impact of data science in an informed way. Data science practice can support data science theory, but this is not a job for the data/computer scientist alone: it needs to be a conversation with social scientists, business experts, legal experts...
  32. Mathieu d’Aquin, @mdaquin, mdaquin.net, mathieu.daquin@nuigalway.ie
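
The diagram built up on slides 11 to 16, and the Datanode example on slide 18, treat a data process as a graph of artefacts linked by typed relations. As a minimal sketch of what such a description makes possible, the Python below stores artefacts and relations in a small graph and answers a lineage question: which upstream artefacts a model depends on. The relation labels are borrowed from the slides' diagram, not from the Datanode ontology itself, and the class, method and artefact names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical relation labels borrowed from the slide diagram (NOT Datanode terms).
# Each edge points from an artefact to something it depends on ("upstream").
LINEAGE_RELATIONS = {"derived from", "obtained from", "version of", "produced by"}


class DataProcessGraph:
    """Toy graph of data artefacts (datasets, models, results) and their relations."""

    def __init__(self):
        # subject -> list of (relation, object) pairs
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def lineage(self, artefact):
        """All artefacts reachable from `artefact` via lineage relations (breadth-first)."""
        seen = set()
        queue = deque([artefact])
        while queue:
            node = queue.popleft()
            for relation, obj in self.edges.get(node, []):
                # Only lineage relations are followed; descriptive ones are skipped.
                if relation in LINEAGE_RELATIONS and obj not in seen:
                    seen.add(obj)
                    queue.append(obj)
        return seen


# Hypothetical artefacts, for illustration only.
g = DataProcessGraph()
g.add("Model v2", "version of", "Model v1")
g.add("Model v1", "derived from", "Training data")
g.add("Training data", "derived from", "Raw dataset")
g.add("Raw dataset", "obtained from", "Source")
g.add("Raw dataset", "associated with", "License")  # descriptive, not lineage

print(sorted(g.lineage("Model v2")))
# ['Model v1', 'Raw dataset', 'Source', 'Training data']
```

Descriptive relations such as "associated with" are stored but not traversed, so the same graph could be queried separately, for example for the licences attached to the datasets upstream of a model.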

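
Slides 20 to 23 ask which policies apply to the Results of the smart-meter process. The sketch below illustrates the general idea of propagating policies along a data flow with a simple rule-based fixpoint; it is not E. Daga's actual method, which relies on machine-readable policies and inference rules. Three things are assumptions made for illustration: which policy is attached to which input (the slide does not say), the node and step names, and the single rule that anonymisation stops data-protection obligations (in practice, whether anonymised data falls outside data-protection law depends on how robust the anonymisation is).

```python
from collections import defaultdict

# Hypothetical assignment of the slide's policies to the inputs
# (the slide does not say which policy belongs to which input).
INPUT_POLICIES = {
    "Smart meter data": {"Data protection", "Corp lic. 1"},
    "Solar panel monitoring": {"Data protection", "Corp lic. 2"},
    "Location data": {"Data protection", "User T&C"},
    "Weather data": {"OGL"},
    "Electricity tariff data": {"Corp lic. 3"},
}

# Data flow as (upstream, downstream, step) triples, following the slide diagram.
FLOW = [
    ("Smart meter data", "Anon smart meter data", "anonymisation"),
    ("Solar panel monitoring", "Anon solar data", "anonymisation"),
    ("Anon smart meter data", "Model", "analysis"),
    ("Anon solar data", "Model", "analysis"),
    ("Weather data", "Model", "analysis"),
    ("Location data", "Model", "analysis"),
    ("Electricity tariff data", "Model", "analysis"),
    ("Model", "Results", "prediction/recommendation"),
]

# Assumed propagation rule: a policy crosses every step except those listed here.
BLOCKED_BY = {"Data protection": {"anonymisation"}}


def propagate(flow, input_policies, blocked_by):
    """Push policies downstream until nothing changes (a simple fixpoint)."""
    policies = defaultdict(set)
    for node, attached in input_policies.items():
        policies[node] |= attached
    changed = True
    while changed:
        changed = False
        for upstream, downstream, step in flow:
            for policy in list(policies[upstream]):
                if step in blocked_by.get(policy, set()):
                    continue  # this step stops the policy
                if policy not in policies[downstream]:
                    policies[downstream].add(policy)
                    changed = True
    return policies


result = propagate(FLOW, INPUT_POLICIES, BLOCKED_BY)
print(sorted(result["Results"]))
# ['Corp lic. 1', 'Corp lic. 2', 'Corp lic. 3', 'Data protection', 'OGL', 'User T&C']
```

Under these assumptions, data protection still reaches the Results because, in the assumed flow, the location data enters the analysis without being anonymised, while all the licences propagate unchanged. Iterating to a fixpoint, rather than relying on the order of `FLOW`, keeps the result correct however the edges are listed.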