
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals by Aad Versteden, TenForce


Talk given at the third Big Data Europe SC6 workshop, held on 11 September 2017 in Amsterdam and co-located with the SEMANTiCS 2017 conference: "The Big Data Europe Platform: Apps, challenges, goals" by Aad Versteden, TenForce.

Published in: Data & Analytics


  1. Big Data Europe: Apps, challenges, goals. Ir. Aad Versteden, TenForce. SC6 workshop
  2. Platform Goals: “Your data has value, why don’t you unlock it?”
  3. Platform Goals
      ◎ What is Big Data?
         o Volume
         o Velocity
         o Variety
         o Veracity
  4. Key actors
  5. Platform Goals
      ◎ Easy to
         o Install
         o Develop
         o Deploy
         o Integrate
  6. Societal Challenges: Different domains with pilot cases validating the platform
  7. Societal Challenges
      ◎ Health
      ◎ Food
      ◎ Energy
      ◎ Transport
      ◎ Climate
      ◎ Social Sciences
      ◎ Security
  8. SC4: Transport
      ◎ Show and predict traffic jams
      ◎ ~ taxi fleet shares GPS data
      ◎ Big Data?
         o Velocity
         o [Volume]
  9. SC4: Transport
  10. SC3: Energy
      ◎ Preventative maintenance by vibration analysis
      ◎ Big Data?
         o High Volume (batch)
         o High Velocity (live)
  11. SC7: Security
      ◎ Detect change in human constructions, link to news events
      ◎ Big Data?
         o Volume
  12. SC7: Security
  13. SC1: Health
      ◎ Can we use open source to answer Pharma questions?
      ◎ Large semantic graph, complex questions
      ◎ Big Data?
         o Variety
  14. SC2: Food
      ◎ Mine viticulture research & share semantic information
      ◎ Big Data?
         o Variety
  15. SC2: Food
  16. SC5: Climate
      ◎ Where did an airborne risk come from?
      ◎ Precalculate emission spots with common weather patterns
      ◎ Big Data?
         o Volume
  17. SC5: Climate
  18. SC6: Social Sciences: Martin will tell you later :-)
  19. Platform architecture
  20. Key actors
  21. Platform Architecture
  22. Platform Architecture
  23. Platform Architecture
  24. Platform Architecture
      ◎ Support Layer: Init Daemon, GUIs, Monitor
      ◎ App Layer: Traffic Forecast, Satellite Image Analysis, Real-time Stream Monitoring, ...
      ◎ Platform Layer: Spark, Flink, Kafka, Semantic Layer (Ontario, SANSA, Semagrow), ...
      ◎ Data Layer: Hadoop, NOSQL Store, Cassandra, Elasticsearch, RDF Store, ...
      ◎ Resource Management Layer (Swarm)
      ◎ Hardware Layer: Premises, Cloud (AWS, GCE, MS Azure, …)
  25. Supported Frameworks
      ◎ Search/indexing
         o Apache Solr
      ◎ Data acquisition
         o Apache Flume
      ◎ Message passing
         o Apache Kafka
      ◎ Data storage
         o Hue
         o Apache Cassandra
         o ScyllaDB
         o Apache Hive
         o Postgis
      ◎ Data processing
         o Apache Spark
         o Apache Flink
      ◎ Semantic Components
         o Strabon
         o Sextant
         o GeoTriples
         o Silk
         o SEMAGROW
         o LIMES
         o 4Store
         o OpenLink Virtuoso
  26. Platform Architecture
  27. Making Big Data Accessible: How do we make it easy?
  28. Platform Goals
      ◎ Easy to
         o Install
         o Develop
         o Deploy
         o Integrate
  29. Actors
      ◎ Install stack
      ◎ Develop
      ◎ Deploy
      ◎ Monitor results
  30. Platform installation
      ◎ Manual installation guide
      ◎ Using Docker Machine
         o On local machine (VirtualBox)
         o In cloud (AWS, DigitalOcean, Azure)
         o Bare metal
      ◎ Screencasts
  31. BDI Stack Lifecycle
  32. BDI Stack Lifecycle: Developing Custom Applications
  33. Platform development
      ◎ High level picture
         o docker-compose.yml describes pipeline topology
      ◎ BDE provided components
         o extend template image with your code
      ◎ New components
         o build a Docker image for your component
         o this is your own little Virtual Machine for your component
      ◎ Sharing
         o publish topology as git repository
         o publish new components on Docker Hub
  34. Development
      ◎ Base Docker images
         o Serve as a template for a (Big Data) technology
         o Easily extendable with a custom algorithm/data
      ◎ Published components
         o Image repositories on GitHub
         o Automated builds on Docker Hub
         o Documentation on BDE Wiki
  35. BDI Stack Lifecycle: Docker Images
  36. BDI Stack Lifecycle: BDI Stack (workflow) builder
  37. BDI Stack Lifecycle: Custom Components (*Init Daemon, *Integrator UI)
  38. Enhancing the Component
      ◎ Orchestrator required for initialization process (init_daemon)
         o Components may depend on each other
         o Components may require manual intervention
      ◎ User Interface Integration
         o Standard Interfaces from components
         o Combine and align the interfaces
  39. BDI Stack Lifecycle: Deploy BDE Platform/Stack to the Cluster
  40. Deploying a Big Data Stack
      ◎ Stack
         o collection of communicating components
         o to solve a specific problem
      ◎ Described in Docker Compose
         o Component configuration
         o Application topology
  41. BDI Stack Lifecycle: Stack/Cluster Monitor
  42. User Interfaces
      ◎ Make it easy to use
      ◎ Available interfaces
         o Stack Builder
         o Swarm UI
         o Workflow Builder
         o BDI Integrator
  43. BDE Workflow Builder
  44. BDE Workflow Monitor
  45. Swarm UI
  46. Swarm UI
  47. Integrator UI
  48. Beyond the state of the art ... Smart Big Data: Increase the value of Big Data by adding meaning to it!
  49. Semantic Data Lake (Ontario)
      ◎ Data Swamp
         o Repository of data in its raw format
         o Structured, semi-structured, unstructured
         o Schema-less
      ◎ Data Lake
         o Add a Semantic layer on top of the source datasets
         o The data is semantically lifted using existing ontology terms
  50. SANSA Stack
  51. Check it out: @impulsater
  53. BDE vs Hadoop distributions

      | Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE |
      |---|---|---|---|---|---|
      | File System | HDFS | HDFS | NFS | HDFS | HDFS |
      | Installation | Native | Native | Native | Native | lightweight virtualization |
      | Plug & play components (no rigid schema) | no | no | no | no | yes |
      | High Availability | Single failure recovery (yarn) | Single failure recovery (yarn) | Self healing, mult. failure rec. | Single failure recovery (yarn) | Multiple Failure recovery |
      | Cost | Commercial | Commercial | Commercial | Free | Free |
      | Scaling | Freemium | Freemium | Freemium | Free | Free |
      | Addition of custom components | Not easy | No | No | No | Yes |
      | Integration testing | yes | yes | yes | yes | -- |
      | Operating systems | Linux | Linux | Linux | Linux | All |
      | Management tool | Ambari | Cloudera manager | MapR Control system | - | Docker swarm UI + Custom |
  54. BDE vs Hadoop distributions
      ◎ BDE is not built on top of existing distributions
      ◎ Targets
         o Communities
         o Research institutions
      ◎ Bridges scientists and open data
      ◎ Multi Tier research efforts towards Smart Data
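
The slides on enhancing the component and on deploying a stack note that a stack is a collection of communicating components, that components may depend on each other, and that an orchestrator (the init daemon) is needed for the initialization process. The sketch below illustrates only the ordering problem behind that idea, using a topological sort from the Python standard library. It is not the BDE init daemon and does not use its API; the component names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stack: each component maps to the components it must wait for.
# The names and dependencies are illustrative, not taken from a BDE pipeline.
STACK = {
    "hdfs-namenode": set(),
    "hdfs-datanode": {"hdfs-namenode"},
    "spark-master": set(),
    "spark-worker": {"spark-master"},
    "traffic-app": {"hdfs-datanode", "spark-worker"},
}


def startup_order(stack):
    """Return the components in an order that starts every dependency first."""
    return list(TopologicalSorter(stack).static_order())


def startup_waves(stack):
    """Group components into waves; members of one wave can start in parallel."""
    sorter = TopologicalSorter(stack)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(STACK), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Grouping into waves rather than a single linear order leaves room for the "manual intervention" case mentioned on the slide: an orchestrator could pause between waves until an operator confirms that a step has finished.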