
Dark data

A Short Presentation on Dark Data containing an introduction on tools and frameworks one needs to utilize it.

Published in: Data & Analytics

  1. Dark Data: Risks and Opportunities. Amir Sedighi, February 2017 (@amirsedighi)
  2. Speaker: Amir Sedighi. Software Engineer, Data Solutions Architect, Founder at recommender.ir (Twitter: @amirsedighi)
  3. Data Era: By even the most conservative estimates, the amount of data in the world doubles every two years.
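As a rough illustration of what that estimate implies, assuming a constant two-year doubling period (the slide's conservative figure, not a measured rate), the growth factor after t years is 2^(t/2). A minimal sketch; the function name is hypothetical:

```python
def data_growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Growth factor after `years` if data volume doubles every `doubling_period` years.

    Assumes a constant doubling period, as in the slide's estimate.
    """
    return 2 ** (years / doubling_period)


if __name__ == "__main__":
    # Ten years at one doubling every two years is five doublings: 2**5.
    print(data_growth_factor(10))  # 32.0
```

So even the conservative estimate means roughly 32 times as much data a decade later.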
  4-8. Maybe a Venn diagram helps! [Venn diagram of three overlapping sets: Big Data (structured/unstructured), Tabular/Relational/RDBMS Data (structured), and Dark Data (almost unstructured), which can almost never be processed or analyzed.]
  9-12. Dark Data Definition by Gartner: "Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations' universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value."
  13-17. Dark Data: A More Sensible Definition. Organizations generate and gather data, yet a large portion of the collected data is never even analyzed: 90% of it is never analyzed. Typical dark data includes:
       • Customer information
       • Log files
       • Previous employee information
       • Previous webpages
       • Sensor data
       • Email correspondence
       • Account information
       • Notes or presentations
       • Old versions of relevant documents
  18. 80%-90% is dark data.
  19. Does your organization have any dark data? "I am just going to check whether we have any dark data in the cellar…"
  20. Bringing Dark Data into Light:
       1. Gathering
       2. Storing/Processing
       3. Analyzing and bringing it into decisions
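The three steps can be sketched as a tiny pipeline. This is a minimal illustration under stated assumptions, not the author's method: raw records are plain text lines (for example from a forgotten log file), storage is a JSON Lines file, and the "analysis" is simply a count of each record's first token. All function names are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def gather(raw_lines):
    """Step 1, gathering: keep the non-empty raw records from a source."""
    return [line.strip() for line in raw_lines if line.strip()]


def store(records, path):
    """Step 2, storing/processing: write each record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps({"text": record}) + "\n")


def analyze(path):
    """Step 3, analyzing: count stored records by their first token (e.g. a log level)."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            text = json.loads(line)["text"]
            counts[text.split()[0]] += 1
    return counts


if __name__ == "__main__":
    raw = ["ERROR disk full", "", "INFO service started", "ERROR timeout"]
    path = os.path.join(tempfile.mkdtemp(), "records.jsonl")
    store(gather(raw), path)
    print(analyze(path))  # Counter({'ERROR': 2, 'INFO': 1})
```

Keeping the stored form structured (one JSON object per line) is what lets the third step run without re-parsing the raw source.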
  27. Question: All companies know data is going to provide value. So why is there so much dark data?
  28. Why is there so much dark data?
       • Lack of insight about the data
       • Lack of ambition to improve
       • Disconnect among departments
       • Lopsided priorities
       • Lack of technologies to capture and store it
       • Lack of resources and infrastructure to make it available
       • Lack of computing power (CPU) and techniques to analyze it
  29. The issues you face with dark data:
       • Legal and regulatory issues
       • Loss of reputation
       • Intelligence risk
       • Operational costs
       • Opportunity costs
  30. Some essential questions:
       • What can we gather?
       • What might we extract from it?
       • How can we prune it?
       • How long should we keep it?
       • What are the storage options?
       • What are the processing options?
       • Approximately how much is each block of data worth?
       • Running scenarios within limited boundaries
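Two of these questions, pruning and how long to keep data, often come down to a retention rule. A minimal sketch, assuming each stored item carries a creation timestamp and a single hypothetical retention window in days (real policies are usually driven by legal and regulatory requirements and differ per data type):

```python
from datetime import datetime, timedelta


def split_by_retention(items, retention_days, now):
    """Partition (name, created_at) pairs into items to keep and items past retention.

    `retention_days` is a hypothetical single policy window; `now` is passed in
    explicitly so the result is deterministic and testable.
    """
    cutoff = now - timedelta(days=retention_days)
    keep = [name for name, created_at in items if created_at >= cutoff]
    expired = [name for name, created_at in items if created_at < cutoff]
    return keep, expired


if __name__ == "__main__":
    now = datetime(2017, 2, 1)
    items = [
        ("sensor-2016-12.csv", datetime(2016, 12, 15)),
        ("old-site-backup.tar", datetime(2014, 6, 1)),
    ]
    print(split_by_retention(items, retention_days=365, now=now))
    # (['sensor-2016-12.csv'], ['old-site-backup.tar'])
```

Listing what has expired, rather than deleting it directly, leaves room for the compliance review the Gartner definition alludes to.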
  31. Software Tools & Frameworks for Dark Data
  33. Software Tools & Frameworks for Dark Data: Log Management
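Log data often stays dark because it is unstructured text. The tools shown on this slide were not recoverable, so rather than guess at any product's API, here is a minimal standard-library sketch of the first thing log-management tooling does: parse raw lines into structured fields. The line format is a hypothetical assumption.

```python
import re
from collections import Counter

# Hypothetical line format: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_log(lines):
    """Turn raw log lines into dicts of named fields; non-matching lines are skipped."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


def level_counts(records):
    """Count parsed records per log level."""
    return Counter(record["level"] for record in records)


if __name__ == "__main__":
    raw = [
        "2017-02-01 12:00:00 ERROR disk full",
        "not a log line",
        "2017-02-01 12:00:05 INFO backup finished",
    ]
    records = parse_log(raw)
    print(level_counts(records))  # Counter({'ERROR': 1, 'INFO': 1})
```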
  34. Software Tools & Frameworks for Dark Data: Indexing and Search
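Indexing-and-search tools are typically built around an inverted index, which maps each term to the documents that contain it so that old notes, emails, or documents become findable. A toy in-memory sketch of the idea (not any particular product's API; the tokenizer is a deliberately simple assumption):

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "Old contract with ACME", 2: "ACME sensor logs", 3: "Former employee notes"}
    index = build_index(docs)
    print(search(index, "acme logs"))  # {2}
```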
  35. Software Tools & Frameworks for Dark Data: Data Streaming
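Streaming frameworks analyze data as it arrives instead of letting it pile up unread. A common building block is the tumbling window: fixed, non-overlapping time buckets with an aggregate per bucket. A minimal pure-Python sketch, assuming events arrive as (timestamp_seconds, key) pairs already in timestamp order (real stream processors also handle late and out-of-order events, which this does not):

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs in timestamp order.
    Yields (window_start, Counter) whenever a window closes, and once more for
    the last window when the input ends.
    """
    current_start = None
    counts = Counter()
    for ts, key in events:
        start = ts - (ts % window_seconds)
        if current_start is not None and start != current_start:
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts


if __name__ == "__main__":
    events = [(0, "login"), (3, "error"), (5, "login"), (11, "login")]
    for start, counts in tumbling_window_counts(events, window_seconds=10):
        print(start, dict(counts))
    # 0 {'login': 2, 'error': 1}
    # 10 {'login': 1}
```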
  38. Software Tools & Frameworks for Dark Data: Machine Learning and Graph Processing
       • Mahout
       • MLlib
       • FlinkML
       • Theano
       • Torch
       • TensorFlow
       • GraphX
       • Gelly
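PageRank is a standard example of the kind of graph algorithm that graph-processing frameworks run at scale. The sketch below is a small single-machine power-iteration version in plain Python, not the API of GraphX, Gelly, or any listed library; the damping factor and iteration count are conventional assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph given as (src, dst) edges.

    Nodes without out-links (dangling nodes) spread their rank evenly over
    all nodes, so the scores keep summing to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Teleport share plus the redistributed rank of dangling nodes.
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical "which document references which" edges.
    edges = [("memo", "report"), ("email", "report"), ("report", "memo")]
    for node, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```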
  39-40. A Common Pipeline and a New Pipeline [Diagram components: Real World, RT (real-time) Events, Stream Processing, Machine Learning, Already Processed Data, Query. The second slide shows the same components labeled "New Pipeline".]
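The diagram's labels suggest answering queries from already processed data together with the results of processing real-time events. A toy sketch of that idea, assuming the data is simple per-key event counts; the class and method names are hypothetical, and the slides do not say how their pipelines combine the two paths.

```python
from collections import Counter


class CountPipeline:
    """Toy pipeline: a view of already processed counts plus a real-time view
    updated per event; queries combine the two."""

    def __init__(self, batch_view=None):
        self.batch_view = Counter(batch_view or {})  # already processed data
        self.realtime_view = Counter()  # results of stream processing

    def on_event(self, key):
        # Stream-processing step: update the real-time view incrementally.
        self.realtime_view[key] += 1

    def compact(self):
        # Periodic step: fold real-time counts into the processed view.
        self.batch_view.update(self.realtime_view)
        self.realtime_view.clear()

    def query(self, key):
        # Query step: answer from both views so fresh events are visible.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    pipeline = CountPipeline({"login": 10})
    pipeline.on_event("login")
    pipeline.on_event("error")
    print(pipeline.query("login"), pipeline.query("error"))  # 11 1
    pipeline.compact()
    print(pipeline.query("login"), pipeline.query("error"))  # 11 1
```

Queries return the same answer before and after compaction; only where the counts live changes.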
  41. Questions? Keep in touch: Twitter @amirsedighi
  42. References
       1. http://www.gartner.com/it-glossary/dark-data/
       2. http://www.itproportal.com/2016/03/07/5-benefits-of-putting-dark-data-to-work/
       3. http://www.kdnuggets.com/2015/11/importance-dark-data-big-data-world.html
       4. https://www.youtube.com/watch?v=_fBMmQo-Z4E
       5. http://confluent.io
       6. https://www.ecmconnection.com/doc/the-various-shades-of-dark-data-0001
       7. https://www.datanami.com/2015/11/30/spark-streaming-what-is-it-and-whos-using-it/
