
Unit 3 intro.pptx


  1. HADOOP
     • Using the solution published by Google, Doug Cutting and his team developed an open-source project called Hadoop.
     • Hadoop runs applications using the MapReduce algorithm, in which the data is split up and the pieces are processed in parallel with one another.
     • In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
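The MapReduce idea on this slide can be illustrated with a small word count. This is a minimal single-machine sketch in plain Python, not Hadoop's own API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool stands in for the cluster nodes that would each map one chunk of the input.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Each chunk is mapped independently of the others, so the map
    # calls can run in parallel (threads stand in for cluster nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node data"]))
```

Because no map call depends on another chunk, adding more chunks only adds more independent work, which is the property that lets the real framework spread the map step over many machines.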
  2. HADOOP
     • Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
     • An application built on the Hadoop framework runs in an environment that provides distributed storage and computation across clusters of computers.
     • Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
  3. History of Hadoop
  4. History of Hadoop
     • Hadoop is an open-source software framework for storing and processing large datasets ranging in size from gigabytes to petabytes.
     • Hadoop was developed at the Apache Software Foundation.
     • In 2008, Hadoop defeated the supercomputers and became the fastest system on the planet for sorting terabytes of data.
     • There are basically two components in Hadoop:
       1. Hadoop Distributed File System (HDFS)
          - It allows you to store data of various formats across a cluster.
       2. YARN
          - It handles resource management in Hadoop.
          - It allows parallel processing over the data that is stored across HDFS.
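To make "store data across a cluster" concrete, the sketch below splits a file into fixed-size blocks and assigns copies of each block to several nodes. It rests on assumptions that are not in the slides: the 128 MB block size and the replication factor of 3 are the usual HDFS defaults, the function `place_blocks` is hypothetical, and the round-robin placement is a deliberate simplification, not the placement policy HDFS actually uses.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: usual HDFS default block size
REPLICATION = 3      # assumption: usual HDFS default replication factor


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into blocks and pick `replication` distinct nodes per block.

    Placement is simple round-robin (illustration only).
    Returns {block_index: [node, ...]}.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(nodes) are distinct as long as
        # replication <= len(nodes), so no node holds a block twice.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


if __name__ == "__main__":
    for block, holders in place_blocks(300, ["n1", "n2", "n3", "n4"]).items():
        print(block, holders)
```

A 300 MB file needs three blocks under these assumptions, and every block ends up on three different nodes, so losing one node still leaves two copies of each block.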
  5. Basics of Hadoop
     • Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.
     • It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
     • On a personal computer, data resides in a local file system; in Hadoop, data resides in a distributed file system called the Hadoop Distributed File System (HDFS).
     • The processing model is based on the "data locality" concept, in which computational logic is sent to the cluster nodes (servers) that contain the data.
     • This computational logic is nothing but a compiled version of a program written in a high-level language such as Java.
     • Such a program processes data stored in HDFS.
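The data locality idea can be sketched as a small scheduling routine: for each block of data, prefer a node that already stores that block, and fall back to any free node only when none of the holders has capacity. This is an illustrative greedy sketch, not how Hadoop's scheduler is implemented; the function `schedule` and its input shapes are hypothetical.

```python
def schedule(block_locations, free_slots):
    """Assign one task per block, preferring a node that stores the block.

    block_locations: {block_id: [nodes holding a copy of the block]}
    free_slots:      {node: number of tasks the node can still accept}
    Returns {block_id: (node, is_local)}.
    """
    slots = dict(free_slots)  # work on a copy; the caller's dict is untouched
    assignment = {}
    for block, holders in block_locations.items():
        # First choice: a node that already has the data (logic moves to data).
        node = next((n for n in holders if slots.get(n, 0) > 0), None)
        is_local = node is not None
        if node is None:
            # Fallback: any free node, which must read the block over the network.
            node = next((n for n, s in slots.items() if s > 0), None)
            if node is None:
                raise RuntimeError("no free slots left in the cluster")
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b0": ["n1", "n2"], "b1": ["n1"], "b2": ["n1"]}
    print(schedule(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

Every task marked local ships only the program to the node, while the tasks that fall back to a remote node have to move the data instead, which is the network traffic the data locality model is meant to avoid.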
  6. Advantages and Disadvantages of Hadoop
     Advantages:
     • Varied data sources
     • Cost-effective
     • Performance
     • Fault-tolerant
     • Highly available
     • Low network traffic
     • High throughput
     • Open source
     • Scalable
     • Ease of use
     • Compatibility
     • Multiple languages supported
     Disadvantages:
     • Issues with small files
     • Vulnerable by nature
     • Processing overhead
     • Supports only batch processing
     • Iterative processing
     • Security
