
Are we reaching a data science singularity?


This year I have been kindly invited to give a keynote talk at Big Data Spain, which will be held in Madrid on 17-18 November. This time, rather than diving into a specific technology or tool, I am reflecting on the state of data analytics, and on how cloud, technology and data science are brewing a possible recipe for analytics at scale, towards AI, prescriptive analytics and cognitive processing.

Although it might be a bit ahead of the current state of development in analytical solutions and databases, I am starting to see clear early signals that something amazing is hatching in the realm of data processing, and I would like to share some of these facts and elements with the Big Data Spain audience. I would like to stay grounded in current technology developments, but also let the imagination soar by showing that in today's data analytics the whole is much more than the sum of its parts.
Are we reaching a Data Science Singularity? - How Cognitive Computing is emerging from Machine Learning Algorithms, Big Data Tools, and Cloud Services

Prescriptive analytics is the ultimate analytical step, going beyond predictions into the realm of goal-oriented recommendations. As such, we could consider prescriptive analytics a particular sort of cognitive computing. In 2016, how far are we actually from cognitive computing? In this talk, I will describe the latest advances in machine learning algorithms, big data tools and cloud engineering practices.

These are the ingredients that are blended together to brew modern AI, prescriptive analytics and cognitive processing solutions. As data and algorithms are made available on large cloud computing clusters, higher-level, cognitive-like services will solve real-world, complex and often ambiguous cases.

Finally, I will touch on the topic of meta-data science and how automated data science could (re)define the role of the data scientist in the coming years.

Applications

http://www.wsj.com/articles/googles-self-driving-car-program-odometer-reaches-2-million-miles-1475683321
http://www.nature.com/articles/srep26286

Why is AI so difficult?

http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html
http://www.forbes.com/sites/gilpress/2016/10/31/12-observations-about-artificial-intelligence-from-the-oreilly-ai-conference/
http://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/
https://www.safaribooksonline.com/library/view/oreilly-ai-conference/9781491973912/video260721.html

Videos on AI

Yann LeCun: https://youtu.be/_1Cyyt-4-n8
Andrej Karpathy: https://youtu.be/u6aEYuemt0M
Nando de Freitas: https://youtu.be/bEUX_56Lojc
Richard Socher: https://youtu.be/oGk1v1jQITw

For more info, see:
https://www.linkedin.com/pulse/data-science-singularity-natalino-busa

Published in: Data & Analytics

  1. Are we reaching a data science singularity? Natalino Busa (@natbusa), Head of Data Science, Teradata
  6. What about (data) science? Technologies and tools are driving innovation in data analytics.
  7. Man-machine as integrated cognitive systems
  8. Learning: the scientific method. Ørsted's "First Introduction to General Physics" (1811), Hans Christian Ørsted. Diagram: observation, hypothesis, deduction, synthesis, experiment. https://en.m.wikipedia.org/wiki/History_of_scientific_method Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC BY 3.0.
  9. Innovation in data analytics: Cloud, Community, AI & ML
  10. Cloud
  11. “we live in an age of open source datacenters, so we can stack all these things together and we have open source from the ground to ceiling.” Sam Ramji, CEO of Cloud Foundry https://www.youtube.com/watch?v=7oCSFcUW-Qk
  12. Analytics in the cloud: bare metal (physical machines); IaaS (virtual resources); CaaS (containers); dPaaS (datastores, data engines); iPaaS (tool integration, flows and processes); DAaaS (data analytics as a service)
  13. DAaaS: AI and ML APIs. Cloud computing for deep neural networks: models, compute (train, score), and data. AI and ML models for speech (audio), language (text), vision (images/video), and data (classification, regression, clustering, anomaly detection).
  14. Ephemeral computing clusters on a cloud. Timeline: create, load data, compute, store, destroy.
  15. dPaaS: analytical clusters, ephemeral vs. permanent. Ephemeral: short-lived, data exploration, isolated and personal, simple access management. Permanent: long-lived, production/operations, coordinated, complex access management.
  16. GPUs and distributed computing: GPU support is coming in Kubernetes, Mesos and Spark. https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark (Diagram: scaling out and up; CPU, R/Python, Spark, TensorFrames.)
  17. Community
  18. Community: develop, use, share
  19. Sharing is caring … speed. github.com + Jupyter notebooks: share ideas, code, and data. arxiv.org: share innovation and scientific results.
  20. Artificial Intelligence & Machine Learning
  21. Google open-sources an NLP parser scoring 95% in grammar accuracy https://github.com/tensorflow/models/tree/master/syntaxnet
  22. Deep learning in language parsing https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png
  23. Semantic search: TDA + NNs. Word2Vec, Par2Vec, Doc2Vec https://arxiv.org/pdf/1405.4053v2.pdf https://arxiv.org/pdf/1301.3781v3.pdf
  24. Lip reading: LipNet achieves 93.4% accuracy on the GRID corpus. https://arxiv.org/pdf/1611.01599v1.pdf
  25. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Caiming Xiong, Stephen Merity, Richard Socher. https://arxiv.org/pdf/1603.01417v1.pdf https://youtu.be/oGk1v1jQITw
  26. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. https://arxiv.org/pdf/1603.01417v1.pdf http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial http://www.socher.org/ Diagram: local context, wider context; NLP, attention masks; semantic embeddings from text and images.
  27. Network traffic patterns classification
  28. Network intrusion detection (http://billsdata.net/?p=105). The data set contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release). Each record consists of: time (to the nearest second), duration, source and destination computer IDs, source and destination ports, protocol, number of packets and number of bytes. Techniques: TDA, dimensionality reduction https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
  29. Approaching (Almost) Any Machine Learning Problem, Abhishek Thakur, Kaggle Grandmaster. Diagram: raw data (tables, files) and data labels → data munging → useful data → feature engineering → tabular data ready for ML.
  30. AutoML challenge, based on scikit-learn: 15 classifiers, 14 feature preprocessing methods, 4 data preprocessing methods, 110 hyperparameters. Supervised classification challenge: 100 different datasets. https://arxiv.org/abs/1611.03824v1
  31. Artificial + Human Intelligence
  32. Human cognitive biases: Too much information. Not enough meaning. What should we remember? Need to act fast. https://en.wikipedia.org/wiki/List_of_cognitive_biases
  33. Man vs. machine cognitive limits: model generation, explanation, unsupervised, planning; too much information, not enough meaning, need to act quickly, memory limits
  34. “Theorems often tell us complex truths about the simple things, but only rarely tell us simple truths about the complex ones.” Marvin Minsky, K-Lines: A Theory of Memory (1980)
  35. Data science: wear the AI/ML lenses. We are entering a new era of intelligent machines: boost our understanding of data, focus on higher-level analyses.
  36. Intelligent data systems: long live the “database”. Wikipedia: “A database is an organized collection of data.” Diagram: DATA, New-SQL, ML, AI, SQL, Python, Scala, R, NLP, UX, Speech, COG.
  37. The database is never going to be the same.
  38. Thank you. @natbusa
  39. Credits. Cover: courtesy of Big Data Spain, https://www.bigdataspain.org/ Pictures: https://commons.wikimedia.org/wiki/File:PurportedUFO2.jpg https://commons.wikimedia.org/wiki/File:Amazing_Stories_October_1957.jpg https://commons.wikimedia.org/wiki/File:DJI_Phantom_2_Vision%2B_V3_hovering_over_Weissfluhjoch_(cropped).jpg https://commons.wikimedia.org/wiki/File:Leonard_Nimoy_as_Spock_1967.jpg https://en.wikipedia.org/wiki/File:STUltimate_Cp.jpg https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png http://billsdata.net/wordpress/wp-content/uploads/2015/11/wikimap2.jpg http://billsdata.net/wordpress/wp-content/uploads/2015/11/netflow.png https://commons.wikimedia.org/wiki/File:Girls_learning_sign_language.jpg https://arxiv.org/pdf/1603.01417v1.pdf http://www.socher.org/uploads/Main/RichardSocher.jpg https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf https://commons.wikimedia.org/wiki/File:Cognitive_Bias_Codex_-_180%2B_biases,_designed_by_John_Manoogian_III_(jm3).jpg Visualizations: https://github.com/caffeinalab/siriwavejs https://gist.github.com/AnanthaRajuC/91beee3eb04d11cb3af5 https://dribbble.com/shots/1714369-Cortana-Animation Icons: made by Gregor Cresnar from www.flaticon.com, licensed under CC BY 3.0
  40. Bonus slides
  41. AI & ML: curated list of links. Applications: http://www.wsj.com/articles/googles-self-driving-car-program-odometer-reaches-2-million-miles-1475683321 http://www.nature.com/articles/srep26286 Why is AI so difficult? http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html http://www.forbes.com/sites/gilpress/2016/10/31/12-observations-about-artificial-intelligence-from-the-oreilly-ai-conference/ http://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/ https://www.safaribooksonline.com/library/view/oreilly-ai-conference/9781491973912/video260721.html Great YouTube videos on AI: Yann LeCun: https://youtu.be/_1Cyyt-4-n8 Andrej Karpathy: https://youtu.be/u6aEYuemt0M Nando de Freitas: https://youtu.be/bEUX_56Lojc Richard Socher: https://youtu.be/oGk1v1jQITw
  42. AI & ML: curated list of links. NLP: https://github.com/tensorflow/models/tree/master/syntaxnet https://arxiv.org/abs/1405.4053v2 https://arxiv.org/abs/1603.06042 https://arxiv.org/abs/1301.3781v3 Video, images, hybrid deep learning networks: https://arxiv.org/abs/1611.01599v1 https://arxiv.org/abs/1603.01417v1 Topological Data Analysis (TDA), dimensionality reduction: https://en.wikipedia.org/wiki/Topological_data_analysis https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction Meta-learning: http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/ https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf https://arxiv.org/abs/1611.03824v1
  43. Curated list of links. Cognitive sciences: https://en.wikipedia.org/wiki/History_of_scientific_method https://en.wikipedia.org/wiki/List_of_cognitive_biases Cloud: The Making of a Cloud Native Application Platform, Sam Ramji https://www.youtube.com/watch?v=7oCSFcUW-Qk https://en.wikipedia.org/wiki/Ephemerality http://conferences.oreilly.com/oscon/oscon2011/public/schedule/detail/19812 GPU and distributed computing: https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark Collaborative coding and research: https://github.com/tensorflow/models https://github.com/jupyter http://www.arxiv-sanity.com/
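
Slide 14's timeline (create, load data, compute, store, destroy) and slide 15's "ephemeral, short-lived" cluster map naturally onto a context manager that guarantees the destroy step runs even if the job fails. The sketch below is illustrative only: `EphemeralCluster`, `ephemeral_cluster` and `run_job` are hypothetical names, and no real cloud provisioning API is called.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


class EphemeralCluster:
    """Hypothetical in-memory stand-in for a short-lived analytical cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: List[float] = []
        self.log: List[str] = []
        self.alive = False

    def create(self) -> None:
        self.alive = True
        self.log.append("create")

    def load(self, records: List[float]) -> None:
        self.data = list(records)
        self.log.append("load")

    def compute(self, job: Callable[[List[float]], float]) -> float:
        self.log.append("compute")
        return job(self.data)

    def destroy(self) -> None:
        # Nothing survives the cluster: data lives only for its lifetime.
        self.data = []
        self.alive = False
        self.log.append("destroy")


@contextmanager
def ephemeral_cluster(name: str) -> Iterator[EphemeralCluster]:
    """Create on entry; destroy on exit, even when the computation raises."""
    cluster = EphemeralCluster(name)
    cluster.create()
    try:
        yield cluster
    finally:
        cluster.destroy()


def run_job(records: List[float], store: Dict[str, float]) -> EphemeralCluster:
    with ephemeral_cluster("exploration") as cluster:
        cluster.load(records)
        result = cluster.compute(lambda xs: sum(xs) / len(xs))
        # Store: persist the result outside the cluster before it is destroyed.
        store["mean"] = result
        cluster.log.append("store")
    return cluster
```

Calling `run_job([1.0, 2.0, 3.0], store)` leaves the mean in `store` while the cluster itself ends up destroyed and empty. The `store` dictionary plays the role of durable storage that outlives the cluster, which is the design point behind the ephemeral column of slide 15.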
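
Slide 23 pairs embeddings (Word2Vec, Doc2Vec) with semantic search. Once documents are mapped to vectors, retrieval can be reduced to ranking by cosine similarity. A minimal sketch, assuming the embeddings are already computed; the three-dimensional vectors and document IDs below are made up for illustration and are not the output of a trained model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k documents whose embeddings are most similar to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (not produced by a real Doc2Vec model).
index: Dict[str, Sequence[float]] = {
    "gpu-spark": [0.9, 0.1, 0.0],
    "lip-reading": [0.0, 0.8, 0.6],
    "cloud-clusters": [0.7, 0.3, 0.1],
}
```

A brute-force scan like this is fine for a handful of documents; at the scale the talk has in mind, an approximate nearest-neighbour index would normally replace the linear loop.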
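
The "attention masks" on slide 26 refer to letting a model weight parts of its memory by their relevance to a question. The sketch below shows generic dot-product soft attention in plain Python, a simplification chosen for illustration: the Dynamic Memory Network cited on slides 25 and 26 uses its own gated attention and episodic memory updates, which are not reproduced here. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Sequence[float], memory: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Dot-product soft attention over memory slots.

    Returns (attention weights, context vector), where the context vector
    is the attention-weighted sum of the memory slots.
    """
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]
    return weights, context
```

For example, with `memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` and query `[2.0, 0.0]`, the first and third slots receive equal, larger weights because they align with the query, while the second slot is down-weighted.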
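
Slides 28 and 29 together describe turning raw flow records into tabular data ready for ML. A minimal sketch, assuming each record carries the fields listed on slide 28 in that order as comma-separated text; `FlowRecord`, `parse_line` and `host_features` are hypothetical names, and the per-host features (flow count, distinct destinations, total bytes, mean duration) are illustrative choices, not the features used in the talk or the cited blog post.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FlowRecord:
    # Fields as listed on slide 28 (assumed column order).
    time: int
    duration: int
    src_computer: str
    src_port: str
    dst_computer: str
    dst_port: str
    protocol: int
    n_packets: int
    n_bytes: int


def parse_line(line: str) -> FlowRecord:
    """Data munging: parse one comma-separated flow record."""
    t, dur, sc, sp, dc, dp, proto, pkts, nbytes = line.strip().split(",")
    return FlowRecord(int(t), int(dur), sc, sp, dc, dp, int(proto), int(pkts), int(nbytes))


def host_features(records: List[FlowRecord]) -> Dict[str, Dict[str, float]]:
    """Feature engineering: one row of illustrative features per source computer."""
    grouped: Dict[str, List[FlowRecord]] = {}
    for r in records:
        grouped.setdefault(r.src_computer, []).append(r)
    table: Dict[str, Dict[str, float]] = {}
    for host, rs in grouped.items():
        table[host] = {
            "flows": float(len(rs)),
            "distinct_dsts": float(len({r.dst_computer for r in rs})),
            "total_bytes": float(sum(r.n_bytes for r in rs)),
            "mean_duration": sum(r.duration for r in rs) / len(rs),
        }
    return table
```

Each row of the resulting table is a fixed-length numeric vector per computer, which is the kind of input that dimensionality reduction or TDA techniques (slide 28) expect.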
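
Slide 30 describes AutoML as a search over classifiers, preprocessing methods and hyperparameters. The system behind those figures ("Efficient and Robust Automated Machine Learning", the NIPS paper linked on slides 39 and 42) uses Bayesian optimization with meta-learning; the sketch below swaps in plain random search for brevity. The search space, `random_search` and the evaluation function are hypothetical: in practice the evaluation would train the configured pipeline and return a cross-validated score.

```python
import random
from typing import Callable, Dict, List, Tuple, Union

Config = Dict[str, Union[str, float]]

# Hypothetical search space: categorical choices plus one continuous hyperparameter.
SPACE: Dict[str, Union[List[str], Tuple[float, float]]] = {
    "classifier": ["knn", "random_forest", "logistic_regression"],
    "preprocessing": ["none", "pca", "standardize"],
    "regularization": (0.001, 10.0),
}


def sample_config(rng: random.Random) -> Config:
    """Draw one configuration uniformly from the search space."""
    config: Config = {}
    for name, domain in SPACE.items():
        if isinstance(domain, list):
            config[name] = rng.choice(domain)
        else:
            low, high = domain
            config[name] = rng.uniform(low, high)
    return config


def random_search(
    evaluate: Callable[[Config], float], n_trials: int, seed: int = 0
) -> Tuple[Config, float]:
    """Return the best configuration found and its score (higher is better)."""
    rng = random.Random(seed)
    best_config: Config = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Random search treats every trial independently. Model-based methods such as the Bayesian optimization used in the cited work instead use past scores to choose the next configuration, which matters when each evaluation means training a real model.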
