
Hadoop & distributed cloud computing


  2. CLOUD COMPUTING
     Cloud computing is a virtual setup that includes the following:
     • Delivery of computing as a service rather than a product
     • Shared resources (software, utilities, and hardware) provided over a network (typically the Internet)
  3. DISTRIBUTED CLOUD COMPUTING
     As the name suggests: distributed computing in the cloud.
     Examples:
     • Distributed computing is nothing more than using many networked computers to partition a question or problem (split it into many smaller pieces) and let the network solve the issue piecemeal.
     • Software like Hadoop. Written in Java, Hadoop is a scalable, efficient, distributed software platform designed to process enormous amounts of data. Hadoop can scale to thousands of computers across many clusters.
     • Another instance of distributed computing, for storage instead of processing power, is BitTorrent. A torrent is a file that is split into many pieces and stored on many computers around the Internet. When a local machine wants to access that file, the small pieces are retrieved and reassembled.
     • P2P networks, which split communication/data packets into multiple pieces sent across multiple network routes, then reassemble them at the receiver's end.
     Distributed computing in the cloud is simply the next-generation framework for extracting maximum value from resources over a distributed architecture.
  4. WHAT IS HADOOP?
     A flexible infrastructure for large-scale computation and data processing on a network of commodity hardware.
     Why Hadoop? A common infrastructure pattern extracted from building distributed systems:
     • Scale
     • Incremental growth
     • Cost
     • Flexibility
     • Distributed file system
     • Distributed processing framework
     An open-source project, widely adopted: Yahoo!, Facebook, Google, Fox, Amazon, IBM, and the NY Times use it for their core infrastructure.
     A valuable and reusable skill set: taught at major universities, easier to hire for, easier to train on, and portable across projects and groups.
  5. HOW IT WORKS
     HDFS: Hadoop Distributed File System, a distributed file system for large data.
     • Your data in triplicate (one local and two remote copies)
     • Built-in redundancy and resiliency to large-scale failures (automated restart and re-allocation)
     • Intelligent distribution, striping across racks
     • Accommodates very large data sizes on commodity hardware
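The block-splitting and triplicate-replication idea above can be sketched in a few lines. This is a toy single-process model for illustration only, not HDFS code: the node names, block size, and round-robin placement are invented, and real HDFS placement is rack-aware and far more sophisticated.

```python
BLOCK_SIZE = 4          # bytes per block here; the real HDFS default is much larger (e.g. 128 MB)
REPLICATION = 3         # one local copy plus two remote copies, as on the slide

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks, as an HDFS client would."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, local_node):
    """Assign each block to REPLICATION distinct nodes, keeping one copy local."""
    placement = {}
    remote = [n for n in nodes if n != local_node]
    for idx, _ in enumerate(blocks):
        # Rotate through the remote nodes so copies spread across the cluster.
        picks = [local_node] + [remote[(idx + k) % len(remote)]
                                for k in range(REPLICATION - 1)]
        placement[idx] = picks
    return placement

blocks = split_into_blocks(b"hello hadoop!")
placement = place_replicas(blocks, nodes=["n1", "n2", "n3", "n4"], local_node="n1")
```

With 13 bytes and 4-byte blocks this yields four blocks, each stored on three distinct nodes, which is the redundancy property the slide describes.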
  6. PROGRAMMING MODEL
     There are various programming models for Hadoop development. I personally like, and have experience with, Map/Reduce.
     Why Map/Reduce?
     • Simple programming technique:
       • Map(anything) -> (key, value)
       • Sort and partition on key
       • Reduce(key, values) -> (key, value)
     • No parallel-processing / message-passing semantics to manage
     • Programmable in Java or any other language
     Continued …
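The map -> sort/partition -> reduce steps above can be simulated in one process with the classic word-count example. This is a sketch of the model, not the Hadoop Java API: on a real cluster the framework distributes these phases across many nodes.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    """Map(anything) -> (key, value): emit (word, 1) for each word."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(key, values):
    """Reduce(key, values) -> (key, value): sum the counts for one word."""
    return (key, sum(values))

lines = ["Hadoop is a distributed platform", "Hadoop is scalable"]

# Map every input record.
pairs = [kv for line in lines for kv in map_phase(line)]
# Sort and partition on key, as the framework does between the phases.
pairs.sort(key=itemgetter(0))
# Reduce each key's group of values.
counts = dict(
    reduce_phase(key, (v for _, v in group))
    for key, group in groupby(pairs, key=itemgetter(0))
)
```

Because the pairs are sorted on key before reducing, each reduce call sees all values for one key at once, which is exactly why the framework inserts the sort/partition step between map and reduce.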
  7. PROGRAMMING MODEL
     • Create/allocate a cluster
     • Put data into the file system: data is split into blocks, stored in triplicate across your cluster
     • Run your job: your Map code is copied to the allocated nodes, preferring nodes that contain copies of your data (moving computation to the data)
     • Gather the output of map; sort and partition on key
     • Run the reduce task
     • Results of the job are stored on HDFS
  8. PRACTICES
     • Put the large data source into HDFS
     • Perform aggregations, transformations, and normalizations on the data
     • Load the results into an RDBMS
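The middle step (aggregate, then normalize) can be illustrated on a hypothetical page-view log. The record layout and site names here are invented for the example; on a real cluster this logic would run as a MapReduce job over HDFS data before the result is loaded into an RDBMS.

```python
from collections import defaultdict

# Hypothetical (site, views) records, standing in for a large HDFS data source.
records = [
    ("site-a", 40), ("site-b", 10),
    ("site-a", 20), ("site-c", 30),
]

# Aggregation: total views per site.
totals = defaultdict(int)
for site, views in records:
    totals[site] += views

# Normalization: each site's share of all traffic.
grand_total = sum(totals.values())
shares = {site: views / grand_total for site, views in totals.items()}
```

The small aggregated table (one row per site) is what gets loaded into the RDBMS, rather than the raw event log, which is the point of doing the heavy lifting in Hadoop first.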
  9. THANK YOU
     Thank you for reading this. I hope you find it useful. Please contact me if you have any queries or feedback. My name is Rajan Kumar Upadhyay, and I have more than 10 years of collective IT experience as a techie. If you have anything to share, or are looking for consulting, etc., please feel free to contact me.