Adding Open Data Value to 'Closed Data' Problems

Drawing on cutting-edge examples from the University of Bristol and the City of Bristol, Simon discusses innovative applications of data science that derive business value from open data by enriching it and integrating it with confidential 'closed data'. He also highlights recent technological advances that enable open data science on highly sensitive closed data.

  1. Adding Open Data Value to 'Closed Data' Problems
     Dr Simon Price
     Research Fellow, University of Bristol
     Data Scientist, Capgemini Insights & Data
  2. Who am I?
     • 30 years in software development and leadership roles
     • Moved into data science via a PhD in Machine Learning (2014)
     • Research Fellow in the Machine Learning group (~20 machine learning researchers)
     • Led the project to establish Bristol’s open research data repository
     • One of the organisers of Open Data Institute (ODI) Bristol
     • Data Scientist in the Big Data Analytics team (~100 data scientists, big data engineers and data analysts)
     • Focus on open source and big data technologies to solve client problems
  3. Outline
     1. Case study: open data + ‘closed data’
     2. Deriving value from open data
     3. Data science with ‘closed data’
  4. Case study: SubSift
     Conferences using SubSift:
     • ECML-PKDD: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
     • KDD: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
     • PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining
     • SDM: SIAM International Conference on Data Mining
     Journals using SubSift:
     • Machine Learning
     • Data Mining and Knowledge Discovery
     https://doi.org/10.1145/2979672
  5. Initial problem addressed by SubSift: matching submitted conference papers to potential reviewers on the Programme Committee
  6. Confidential ‘closed data’ and open data
  7. Initial SubSift workflow
  8. Generic SubSift workflow
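SubSift matches documents to people; a common way to do this is the vector-space model, where every document becomes a weighted term vector and candidates are ranked by similarity. The sketch below illustrates that idea under stated assumptions: plain TF-IDF weighting, cosine similarity, submitted abstracts as the confidential side and reviewer profiles (e.g. text from their published work) as the open side. The function names and the toy corpus are hypothetical and are not SubSift's actual code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case and keep alphabetic runs only; a real system would also drop stop words and stem.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(corpus):
    """Map each document id in `corpus` to a sparse TF-IDF vector {term: weight}."""
    tokens = {doc_id: tokenize(text) for doc_id, text in corpus.items()}
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in tokens.values() for term in set(terms))
    return {
        doc_id: {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(terms).items()}
        for doc_id, terms in tokens.items()
    }


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_reviewers(papers, reviewers):
    """For each paper, return (reviewer, similarity) pairs, most similar first."""
    # Papers and reviewer profiles share one vocabulary and one set of IDF weights.
    corpus = {("paper", k): text for k, text in papers.items()}
    corpus.update({("reviewer", k): text for k, text in reviewers.items()})
    vectors = tfidf_vectors(corpus)
    return {
        p: sorted(
            ((r, cosine(vectors[("paper", p)], vectors[("reviewer", r)])) for r in reviewers),
            key=lambda pair: pair[1],
            reverse=True,
        )
        for p in papers
    }


# Hypothetical toy data: submitted abstracts (closed) and reviewer profiles (open).
papers = {"P1": "graph mining of social networks",
          "P2": "kernel methods for support vector machines"}
reviewers = {"Alice": "social network analysis and graph mining",
             "Bob": "support vector machines kernel learning theory"}

for paper, ranked in rank_reviewers(papers, reviewers).items():
    print(paper, ranked[0][0])
```

The same ranking function serves the generic workflow: swap the "papers" for any query documents and the "reviewers" for any profiled people or sessions.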
  9. Personalised session recommendations
  10. Expert finding
  11. Why did SubSift recommend this person?
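One way to answer "why this person?" under a vector-space model is to list the shared terms that contribute most to the similarity score. The following is a minimal sketch assuming both sides are already term-weight vectors; how SubSift itself generates its explanations is not stated here, and the function name and weights are hypothetical.

```python
def explain_match(doc_vec, person_vec, top_k=3):
    """Return the shared terms that contribute most to the similarity of two term-weight vectors.

    Each shared term adds doc_weight * person_weight to the dot product; dividing by the
    vector norms (as cosine similarity does) rescales every term equally, so the ranking
    of contributions is unchanged.
    """
    contributions = {
        term: weight * person_vec[term]
        for term, weight in doc_vec.items()
        if term in person_vec
    }
    return sorted(contributions.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical term weights for a query document and a recommended person.
query = {"graph": 0.7, "mining": 0.5, "social": 0.3, "privacy": 0.4}
person = {"graph": 0.6, "mining": 0.2, "social": 0.8, "theory": 0.5}

for term, contribution in explain_match(query, person):
    print(f"{term}: {contribution:.2f}")
```

Terms present on only one side ("privacy", "theory") contribute nothing, which is why they never appear in the explanation.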
  12. Profiling our organisation
  13. Profiling staff at meetings
  14. Open data opportunities?
  15. Open research data
     • data.bris.ac.uk
     • Research data storage facility
     • Each researcher gets 10TB "forever"
  16. Open government data
     • 140+ datasets live on opendata.bristol.gov.uk
     • Mostly static, but some real-time data
     • Examples:
        • Government: Elections since 2007
        • Community: Quality of Life survey
        • Education: School Results
        • Energy: Installed PV, Energy Use in Council Buildings
        • Environment: Real-time & Historic Air Quality, Flood Alerts (EA)
        • Land use: 2013 Planning applications
        • Health: Life expectancy/Mortality, Obesity, NHS Spend
  17. Deriving value from open data
     1. Data science
     2. Using open data to enrich and connect ‘closed data’
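The simplest way open data enriches closed data is a join on a shared key, for example an area code that appears both in confidential records and in an open, area-level dataset such as those listed for opendata.bristol.gov.uk. A minimal sketch, assuming a left join on a hypothetical `ward` field; the records, field names and values are invented for illustration.

```python
def enrich(closed_records, open_rows, key):
    """Left-join open-data attributes onto closed records that share the same `key` value.

    Closed records are never dropped, and their own fields take precedence over
    open-data fields with the same name.
    """
    lookup = {row[key]: row for row in open_rows}
    enriched = []
    for record in closed_records:
        extra = {k: v for k, v in lookup.get(record[key], {}).items() if k != key}
        enriched.append({**extra, **record})
    return enriched


# Hypothetical example: confidential customer records and an open, ward-level indicator.
closed = [{"customer_id": 1, "ward": "Ward A"}, {"customer_id": 2, "ward": "Ward Z"}]
open_data = [{"ward": "Ward A", "air_quality_index": 3}]

for row in enrich(closed, open_data, key="ward"):
    print(row)
```

A left join is chosen so that no confidential record is lost when the open dataset has no matching area.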
  18. [Diagram: statistics, software engineering, machine learning, data science]
  19. [Diagram: statistics, software engineering, machine learning, data science, application domains, research domains]
  20. Big Data Analytics, Capgemini Insights & Data (www.capgemini.com/insights-data)
  21. Example data science application: Assurance Scoring (http://ow.ly/4nbEUI)
     Using existing enterprise data plus any useful open data, detect potentially fraudulent transactions.
     (Copyright © Capgemini 2017. All Rights Reserved. June 2017)
  22. Example data science application: Assurance Scoring (http://ow.ly/4nbEUI)
  23. Machine learning framework (Python, Scala, Spark)
     • Input data: manipulate and explore data
     • Feature extraction and selection; vector build
     • Model building: transform selection, model training, validation, test
     • Outputs: a variety of files (logs, graphics, saved models, etc.)
     • Testing: unit tests, monitoring tests and integration tests
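The framework separates model training, validation and test. A minimal sketch of holding out validation and test data, assuming a simple seeded random split with hypothetical 60/20/20 proportions; the slide's framework uses Python, Scala and Spark, whereas this uses plain Python and the function name is illustrative only.

```python
import math
import random


def train_validation_test_split(rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle `rows` reproducibly and cut them into train, validation and test partitions."""
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    # Any rounding remainder goes to the test partition.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train, validation, test = train_validation_test_split(range(100))
print(len(train), len(validation), len(test))
```

Models are fitted on `train`, tuned against `validation`, and scored once on `test`, so the test figure is not biased by tuning.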
  24. Graph links: matching
     Key part of assurance scoring: bringing data together from disparate sources.
     Probability of match: 80%

     Attribute         Data Source 1     Data Source 2
     Name              Richard Smith     Rich Smith
     Phone Number      07123 456 789     07123 456 798
     Favourite Sport   Football          Cricket
  25. Advanced matching
     Related to:
     • record linkage
     • duplicate detection
     • reference resolution
     • object identity
     • entity matching
     Connect graph descriptions using background knowledge from open data sources, e.g. Linked Open Data.
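The matching example compares attribute values that are similar but not identical ("Richard Smith" vs "Rich Smith", transposed phone digits). A minimal record-linkage sketch, assuming per-attribute string similarity from Python's `difflib.SequenceMatcher` combined with hypothetical weights; the result is a similarity score, not a calibrated probability, and this is not the method behind the 80% figure on the slide.

```python
from difflib import SequenceMatcher

# Hypothetical weights: identifying attributes count far more than preferences.
WEIGHTS = {"name": 0.45, "phone": 0.45, "favourite_sport": 0.10}


def similarity(a, b):
    """Normalised string similarity in [0, 1] (1.0 means identical after lower-casing)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(record_1, record_2, weights=WEIGHTS):
    """Weighted average of attribute similarities between two records."""
    total = sum(weights.values())
    return sum(w * similarity(record_1[attr], record_2[attr])
               for attr, w in weights.items()) / total


# Records taken from the table above.
source_1 = {"name": "Richard Smith", "phone": "07123 456 789", "favourite_sport": "Football"}
source_2 = {"name": "Rich Smith", "phone": "07123 456 798", "favourite_sport": "Cricket"}

print(f"{match_score(source_1, source_2):.2f}")
```

In practice the weights would be learned from labelled match/non-match pairs, and open data (e.g. Linked Open Data name variants) could feed extra features into `similarity`.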
  26. Linked Open Data
  27. Data science with ‘closed data’
  28. Problems of opening up ‘closed data’
  29. Research data now open by default, including sensitive data
     • Funders
     • Journals
     • data.bris has 3 levels of access:
  30. Data science with ‘closed data’
  31. Data science with ‘closed data’
     • Custom R server running inside a secure data repository / warehouse
     • Enables non-disclosive, remote analysis of sensitive research data
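The core idea of non-disclosive remote analysis is that analysis functions run next to the data and only aggregates leave the server, with queries refused when they would describe too few individuals. DataSHIELD implements this with R on the server; the Python below only illustrates the principle, and the threshold value, exception and function names are hypothetical, not DataSHIELD's API or defaults.

```python
class DisclosureError(Exception):
    """Raised when releasing a result could reveal information about too few individuals."""


MIN_COUNT = 5  # hypothetical disclosure threshold, not a DataSHIELD default


def server_mean(records, field, where=lambda record: True, min_count=MIN_COUNT):
    """Runs inside the secure repository; only an aggregate ever leaves this function."""
    selected = [record[field] for record in records if where(record)]
    if len(selected) < min_count:
        raise DisclosureError(
            f"query matches {len(selected)} records; at least {min_count} are required"
        )
    return sum(selected) / len(selected)


# Hypothetical sensitive records held by the server.
records = [{"age": 30 + i, "smoker": i % 4 == 0} for i in range(20)]

print(server_mean(records, "age"))  # the analyst sees one number, not 20 rows
try:
    server_mean(records, "age", where=lambda r: r["smoker"] and r["age"] > 40)
except DisclosureError as err:
    print("refused:", err)
```

The second query is refused because its subset is small enough that a mean would effectively reveal individual values.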
  32. [Plot axes: number of letters, number of words; panels: non-disclosive, disclosive]
  33. Non-disclosive visualisation
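A raw scatter plot is disclosive because every point is one individual's exact values. One common non-disclosive alternative is to plot counts on a coarse grid and suppress sparsely populated cells. A minimal sketch, assuming (number of letters, number of words) pairs, hypothetical bin widths and a hypothetical minimum cell count; how DataSHIELD draws its own non-disclosive plots may differ.

```python
from collections import Counter


def binned_counts(points, bin_width_x, bin_width_y, min_count=5):
    """Aggregate (x, y) points into grid cells, dropping cells with fewer than `min_count` points."""
    counts = Counter((int(x // bin_width_x), int(y // bin_width_y)) for x, y in points)
    # Cells below the threshold are suppressed so no small group can be singled out.
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical (number of letters, number of words) pairs; one outlier sits alone.
points = [(12, 3)] * 6 + [(15, 4)] * 2 + [(100, 20)]
print(binned_counts(points, bin_width_x=10, bin_width_y=5))
```

The resulting cell counts can be drawn as a heat map; the isolated outlier, which a scatter plot would expose exactly, simply disappears.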
  34. Single-partition DataSHIELD
  35. Multiple-partition DataSHIELD
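With multiple partitions, each data holder answers with aggregates and the analyst's client combines them. For horizontally partitioned data (the same variables, different individuals at each site) a pooled mean needs only each site's count and sum. A minimal sketch of that combination step, with hypothetical function names and a hypothetical per-site minimum count; it does not cover vertically partitioned data and is not DataSHIELD's API.

```python
def site_summary(values, min_count=5):
    """Computed at each data-holding site: only the count and sum leave the site."""
    if len(values) < min_count:
        raise ValueError("too few records at this site to release a summary")
    return {"n": len(values), "sum": float(sum(values))}


def pooled_mean(summaries):
    """Computed on the analyst's client from per-site aggregates only."""
    total_n = sum(s["n"] for s in summaries)
    return sum(s["sum"] for s in summaries) / total_n


# Hypothetical horizontally partitioned data held at two sites.
site_a = [1, 2, 3, 4, 5]
site_b = [6, 7, 8, 9, 10]

print(pooled_mean([site_summary(site_a), site_summary(site_b)]))
```

Because sums and counts add across sites, the pooled result equals the mean of the combined data, even though no site ever reveals its rows.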
  36. DataSHIELD partition models: horizontal, vertical, ideal
  37. Contact: http://www.simonprice.info · simon.price@capgemini.com · @simonprice_info
