Beyond Hadoop and MapReduce

A talk about evolving analytical systems: Hadoop, Spark, IaaS and PaaS.

Published in: Technology

  1. BEYOND HADOOP AND MAPREDUCE. Alexander Alten-Lorenz, mapredit.blogspot.com
  2. WHY WE NEED TO LOOK BEYOND
     • The HDFS NameNode (NN) copes poorly with small files and large namespaces
     • Hadoop as a platform is not ready for secure multi-tenancy
     • Traditional Hadoop (MapReduce) is falling behind what the world needs
     • Big data apps rely more and more on predictive as well as reactive platforms
     • PaaS and IaaS are not compatible with the concept of Hadoop
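The small-files point on slide 2 can be made concrete with a back-of-the-envelope estimate. The sketch below assumes the commonly quoted rule of thumb that every file and block object costs the NameNode roughly 150 bytes of heap. That figure, the function name, and the sample numbers are assumptions for illustration, not values from the talk.

```python
# Rough NameNode heap estimate for file and block objects.
# BYTES_PER_OBJECT is a widely quoted rule of thumb, not an exact figure.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate heap used by file objects plus their block objects.

    Directories and replication bookkeeping are ignored to keep the
    arithmetic simple.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


MIB = 1024 ** 2
TIB = 1024 ** 4

# The same 1 TiB of data, stored as 1 MiB files versus 128 MiB files
# (one block per file in both cases).
small = namenode_heap_bytes(TIB // MIB)
large = namenode_heap_bytes(TIB // (128 * MIB))
print(small, large, small // large)
```

Under these assumptions the same volume of data needs 128 times more NameNode heap when it arrives as 1 MiB files, which is why a single in-memory namespace becomes the bottleneck long before the disks fill up.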
  3. AVAILABLE DISTRIBUTED FILE SYSTEMS
  4. Tachyon: in-memory distributed shared file system
     • Written in Java
     • In-memory checkpointing and caching
     • Uses the underlying FS to provide fault tolerance
     • Pluggable FS layer
     • Spark compatible
     • HDFS / MR / Hive compatible
     • Native support for raw tables
     • Supports multi-column tables
     • Memory pinning for certain hot tables
     • Fast facts: Web: http://tachyon-project.org, Source: https://github.com/amplab/tachyon
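Slide 4 combines three ideas: an in-memory cache, an underlying file system for durability, and pinning of hot data. The toy class below sketches how those ideas fit together. It is not Tachyon's API or its actual recovery mechanism; `CachedStore`, its methods, and the plain `dict` standing in for the underlying file system are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class CachedStore:
    """Toy write-through / read-through memory tier over a backing store."""

    def __init__(self, backing: dict, capacity: int):
        self.backing = backing        # stands in for the underlying FS
        self.capacity = capacity      # max number of cached entries
        self.cache = OrderedDict()    # least recently used entry first
        self.pinned = set()           # keys that must never be evicted

    def pin(self, key):
        self.pinned.add(key)

    def write(self, key, value):
        self.backing[key] = value     # persist first, then cache
        self._admit(key, value)

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as recently used
            return self.cache[key]
        value = self.backing[key]         # cache miss: go to the backing store
        self._admit(key, value)
        return value

    def drop_memory(self):
        """Simulate losing the memory tier; the backing store survives."""
        self.cache.clear()

    def _admit(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            victim = next((k for k in self.cache if k not in self.pinned), None)
            if victim is None:
                break                     # everything is pinned; allow overflow
            del self.cache[victim]


store = CachedStore(backing={}, capacity=2)
store.write("hot_table", 1)
store.pin("hot_table")
store.write("a", 2)
store.write("b", 3)                       # evicts "a", never "hot_table"
```

Because every write reaches the backing store before it is cached, calling `drop_memory()` loses no data: subsequent reads simply repopulate the cache.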
  5. Ceph: distributed file system
     • Written in C++
     • Provides object, file and block storage as well as kernel support (experimental)
     • Metadata principle, offering high availability and replication
     • Self-healing and "no downtime" mechanisms
     • Supports a B-tree (incl. k-value) model
     • Hadoop compatible
     • Native Hadoop integration (hadoop-cephfs.jar)
     • Red Hat Gluster compatible (OpenStack)
     • S3 compatible
     • Fast facts: Web: http://ceph.com/docs/next/, Source: https://github.com/ceph/ceph
  6. GlusterFS: distributed file system
     • Scale-out network-attached storage file system
     • Write once, read everywhere
     • RDMA interconnect as one large parallel network file system
     • Server and client components (glusterfsd / glusterfs)
     • Relies on an elastic hashing algorithm rather than a centralised or distributed metadata model
     • Hadoop compatible: a Hadoop FileSystem plugin for GlusterFS (glusterfs-hadoop) is available
     • Fast facts: Web: http://www.gluster.org/, Source: http://download.gluster.org/pub/gluster/glusterfs/3.5/
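The elastic hashing point on slide 6 means that a client can compute where a file lives instead of asking a metadata server. The sketch below shows the general idea only: it uses CRC32 as a stand-in hash and splits the hash space evenly, both of which are simplifying assumptions and not GlusterFS's actual algorithm. The function and brick names are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # CRC32 yields an unsigned 32-bit value in Python 3


def brick_ranges(bricks):
    """Split the hash space into one contiguous range per brick."""
    step = HASH_SPACE // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs the remainder so the ranges cover everything.
        end = HASH_SPACE if i == len(bricks) - 1 else start + step
        ranges.append((start, end, brick))
    return ranges


def locate(filename, ranges):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode("utf-8"))
    for start, end, brick in ranges:
        if start <= h < end:
            return brick
    raise AssertionError("hash fell outside the hash space")


ranges = brick_ranges(["brick-a", "brick-b", "brick-c"])
print(locate("part-00000", ranges))
```

Every client that knows the brick list computes the same answer independently, so no lookup service sits on the data path. The trade-off is that adding or removing bricks changes the ranges and forces some files to move.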
  7. PAAS / SAAS SOLUTIONS
  8. Mesos: cluster manager and PaaS layer
     • Provides resource isolation
     • Provides resource sharing between applications or frameworks
     • Runs many different applications on a dynamic shared pool of nodes
     • DCOS (Data Center Operating System)
     • Supports HDFS (DFS layer)
     • Ceph will be supported soon
     • Tachyon support is experimental
     • Runs on the local file system too
     • Spark, Cassandra, Storm, Docker, HDFS, Hadoop support
     • Fast facts: Web: http://mesos.apache.org/, DCOS: http://mesosphere.com/, Source: https://github.com/mesos/mesos, Tachyon: https://github.com/mesosphere/tachyon-mesos
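Slide 8 describes several frameworks sharing one pool of nodes. Mesos does this by offering available resources to frameworks, which then decide what to launch. The toy below illustrates that idea with a simple first-fit placement. It is not the Mesos API, and real Mesos allocation is considerably more involved. The `Offer` and `Task` classes, the `place` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def place(tasks, offers):
    """First-fit: put each task on the first node with enough free resources."""
    remaining = {o.node: [o.cpus, o.mem_mb] for o in offers}
    placement = {}
    for task in tasks:
        for node, free in remaining.items():
            if task.cpus <= free[0] and task.mem_mb <= free[1]:
                free[0] -= task.cpus      # reserve the resources on that node
                free[1] -= task.mem_mb
                placement[task.name] = node
                break                     # unplaced tasks are simply skipped
    return placement


offers = [Offer("node-1", 4, 8192), Offer("node-2", 2, 4096)]
tasks = [
    Task("spark-worker", 3, 6144),
    Task("storm-supervisor", 2, 2048),
    Task("cassandra", 2, 4096),
]
placement = place(tasks, offers)
print(placement)
```

In this run the third task stays unplaced because neither node has two free CPUs left. A real framework would wait for a later offer rather than fail.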
  9. OpenStack: public / private cloud and IaaS provider
     • VM-based cloud computing
     • Enables horizontal scaling by using unused resources or spinning up new services
     • Broad base of industry committers
     • Abstracts the hardware layer
     • Supported by Red Hat
     • Fast facts: Web: http://www.openstack.org/, OpenStack Foundation: http://www.openstack.org/foundation, Source: https://github.com/openstack/openstack
  10. HADOOP PAAS (EXPERIMENTAL)
  11. Diagram (layer labels): Datacenter; Mesos; HDFS (Gluster); Docker; Spark, NoSQL, MR, Search, Ingest, ETL, SQL.
      This approach uses Mesos as the PaaS layer for the DFS infrastructure. Mesos supports HDFS as well as GlusterFS; HDFS seems to be more robust in that approach. Hadoop and Spark are maintained in Docker containers. This solution is not as powerful as JBOD hardware for traditional MapReduce, but it plays well within a Spark / in-memory environment. Hadoop is used as a transition layer to move more and more applications to Spark.
  12. HADOOP IAAS (EXPERIMENTAL)
  13. Diagram (component labels): Datacenter; OpenStack; Sahara; GlusterFS; Tachyon; Apache Hadoop; Apache Spark; Apache Flink; Storm; Kafka; Hive; Spark SQL; Spark Stream; Spark ML; SQL, NoSQL, MR, Search, ETL.
      The hardware layer is completely abstracted by OpenStack. Tachyon closes the gap between I/O and virtualisation (cloud) and works as a data-layer bridge between Hadoop projects, ingest and Spark. Sahara manages Spark workers as well as Hadoop nodes.
  14. THANK YOU! Alexander Alten-Lorenz, mapredit.blogspot.com, @mapredit
