
Big Data Analysis on a Cloud Ecosystem-PATW 2013

My presentation on Big Data for PATW 2013, IET (UK)



  1. Big Data Analysis using Hadoop on a Eucalyptus Cloud. How secure is our cloud? PRESENTED BY: ABHISHEK DE, STUDENT, CSE 2ND YEAR, BPPIMT
  2. Contents:
     • The Big Data Crisis
     • Let's embrace Cloud Computing
     • Benefits of cloud
     • Establishing an IaaS using Eucalyptus
     • A word on Virtualization
     • Hadoop as a Platform
     • MapReduce and HDFS
     • Typical algorithms
     • Benefits we achieve
     • How secure is the system?
     PREPARED BY: ABHISHEK DE, 06-Apr-13
  3. The drifting era: BIG DATA and crisis
     • YouTube users upload 48 hours of new video every minute of the day.
     • 100 terabytes of data are uploaded daily to Facebook.
     • Twitter sees roughly 175 million tweets every day, and has more than 465 million accounts.
     • Walmart handles more than 1 million customer transactions every hour, and its databases hold more than 2.5 petabytes of data.
  4. DATA is precious, too precious. We need infrastructure, which comes easily as a Service.
  5. Solution: Cloud Computing
     • Conventional computing: your data gets processed in your own computer.
     • Cloud computing: you send your data to some other computer. It gets processed there and it comes back to you.
     • "Cloud Computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet)" (Wikipedia)
  6. Benefits of Cloud Computing:
     • High reliability.
     • Highly scalable and fault tolerant.
     • Reduced cost: only pay for what you need.
     • Efficient management of resources.
     • Improved security.
     • Achieved out of commodity hardware.
  7. Why Eucalyptus? "Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems"
     • Eucalyptus is the world's most widely deployed software platform for on-premise (private) Infrastructure as a Service (IaaS) clouds.
     • It uses existing infrastructure to create a scalable, secure web services layer that abstracts compute, network and storage to offer IaaS.
     • Eucalyptus can be dynamically scaled up or down depending on application workloads.
  8. Architecture of Eucalyptus:
     • FRONT END:
         • Users log in to the cloud using credentials.
         • The user is redirected to the back end of the cloud, i.e., the storage and the resource pool.
     • BACK END:
         • Runs the Node Controller.
         • Mounts images as virtual machines or instances using Xen or KVM.
         • Hosts the resource pool.
     • [Diagram labels: user1, user1@nc1:, FRONT END, BACK END]
  9. XEN: Virtualize your resources
     • Xen is the underlying technology used by Eucalyptus. The Xen hypervisor allows several guest operating systems to be executed on the same computer hardware concurrently.
     • Xen partitions a single physical machine into multiple virtual machines, to provide server consolidation and utility computing. Existing applications and binaries run unmodified.
     • The hypervisor controls the MMU, CPU scheduling, and interrupt controller, presenting a virtual machine to guests.
  10. HADOOP: Solution to BIG DATA
      • Roughly how long does it take to read 1 TB from a commodity hard disk? Roughly 4 hours.
      • With HADOOP it takes around:
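The 4-hour figure corresponds to a sustained read rate of roughly 70 MB/s. The sketch below reproduces that arithmetic and shows how the scan time shrinks when the same data is spread over many disks read in parallel, which is the idea Hadoop builds on. The 70 MB/s throughput and the 100-disk cluster are assumptions chosen for illustration, not values from the slide, and the model ignores network, scheduling and replication overheads.

```python
def read_time_hours(size_bytes: float, throughput_bytes_per_s: float, disks: int = 1) -> float:
    """Hours needed to scan size_bytes spread evenly over `disks` drives read in parallel."""
    return size_bytes / (throughput_bytes_per_s * disks) / 3600


TB = 10**12
MB = 10**6

# Assumption: sustained sequential read rate of one commodity disk.
ASSUMED_THROUGHPUT = 70 * MB

# Assumption: hypothetical cluster size, chosen only to illustrate the scaling.
ASSUMED_DISKS = 100

single = read_time_hours(TB, ASSUMED_THROUGHPUT)
parallel = read_time_hours(TB, ASSUMED_THROUGHPUT, disks=ASSUMED_DISKS)

print(f"1 disk:    {single:.1f} hours")
print(f"{ASSUMED_DISKS} disks: {parallel * 60:.1f} minutes")
```

Under this idealised model the scan time falls linearly with the number of disks; a real cluster will be slower than the ideal figure.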
  11. Birth of HADOOP: open source alternative to GFS
      • Pre-2004: Cutting and Cafarella develop open source projects for web-scale indexing, crawling and search.
      • 2004: Jeffrey Dean and Sanjay Ghemawat introduce the MapReduce model used internally at Google.
      • 2006: Hadoop becomes an official Apache project, Cutting joins Yahoo!, and Yahoo adopts Hadoop.
  12. HDFS: Hadoop Distributed File System
      • Files are split into 128 MB (or 64 MB) blocks.
      • Blocks are replicated across several datanodes (usually 3).
      • A single namenode stores metadata (file names, block locations, etc.).
      • Optimized for large files and sequential reads.
      • Clients read from the closest replica available (note: locality of reference).
      • If the replication for a block drops below target, it is automatically re-replicated.
      • [Diagram: a Namenode and several Datanodes, each holding copies of blocks 1 to 4]
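The block and replication rules above can be sketched in a few lines. This is a toy model under stated assumptions, not HDFS code: `split_into_blocks`, `place_replicas` and `under_replicated` are hypothetical helper names, and the round-robin placement stands in for the real, rack-aware placement policy of HDFS.

```python
BLOCK_SIZE = 128 * 1024**2  # 128 MB block size, as on the slide
REPLICATION = 3             # usual replication factor, as on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length of every block a file of file_size bytes is split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement (assumption): block b goes to `replication` consecutive nodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def under_replicated(placement, live_nodes, replication=REPLICATION):
    """Blocks whose number of live replicas has dropped below the target."""
    return [
        b for b, nodes in placement.items()
        if sum(node in live_nodes for node in nodes) < replication
    ]


# Hypothetical cluster of four datanodes storing one 1 GiB file.
datanodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = split_into_blocks(1024**3)
placement = place_replicas(len(blocks), datanodes)

# Simulate the loss of dn4: the namenode would schedule these blocks for re-replication.
to_repair = under_replicated(placement, live_nodes={"dn1", "dn2", "dn3"})
print(len(blocks), "blocks; under-replicated after losing dn4:", to_repair)
```

The namenode's role in this picture is the `placement` dictionary: it holds only metadata, while the block contents live on the datanodes.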
  13. Data Flow [diagram labels: Web Servers, Scribe Servers, Network Storage, Hadoop Cluster, Oracle RAC, MySQL]
  14. HADOOP and MapReduce: Input → Map → Shuffle/Sort → Reduce → Output
  15. Word Count: A typical example
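The slide's own word-count figure is not preserved in this transcript, so the following is a minimal single-process sketch of the Map, Shuffle/Sort and Reduce stages rather than the author's example. It does not use the Hadoop API; the function names and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/Sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(word, counts):
    """Reduce: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run the three stages in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical input: on a cluster each document would be an input split.
docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(docs)
print(counts)
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves each key's values to the reducer responsible for it.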
  16. Implementation: Hardware
      • Move code to data (local computation).
      • Allow programs to scale transparently w.r.t. size of input.
      • Abstract away fault tolerance, synchronization, etc.
  18. Social Networking Analysis:
      • Problem: recommend new friends (friend-of-a-friend, FOAF)
      • Map task:
          • U (target user) is fixed and its friends list is copied to all cluster nodes ("copy join"); each cluster node stores part of the social graph
          • In: (X, <friendsX>), i.e. the local data for the cluster node
          • Out: if (U, X) are friends => (U, <friendsX \ friendsU>), i.e. the users who are friends of X but not already friends of U; nil otherwise
      • Reduce task:
          • In: (U, <<friendsA \ friendsU>, <friendsB \ friendsU>, … >), i.e. the FOAF lists for all users A, B, etc. who are friends with U
          • Out: (U, <(X1, N1), (X2, N2), …>), where each X is a FOAF for U, and N is its total number of occurrences in all FOAF lists (sort/rank the result!)
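The map and reduce tasks above can be sketched as two plain Python functions. This is an in-memory illustration, not Hadoop code: the function names and the small sample graph are assumptions. The map step also removes U itself from the emitted set, which the slide's formula does not state explicitly but is needed so that U is not recommended to itself.

```python
from collections import Counter


def map_foaf(u, friends_u, x, friends_x):
    """Map: for a local record (X, friendsX), emit (U, friendsX minus friendsU) if U and X are friends."""
    if x in friends_u:
        # Assumption beyond the slide: also drop U, who appears in every friend's list.
        return u, friends_x - friends_u - {u}
    return None  # "nil otherwise"


def reduce_foaf(u, foaf_lists):
    """Reduce: count how often each candidate occurs across all FOAF lists and rank the result."""
    counts = Counter()
    for candidates in foaf_lists:
        counts.update(candidates)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return u, ranked


# Hypothetical social graph: user -> set of friends (friendship is symmetric).
graph = {
    "U": {"A", "B"},
    "A": {"U", "C", "D"},
    "B": {"U", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

target = "U"
mapped = [map_foaf(target, graph[target], x, friends) for x, friends in graph.items()]
foaf_lists = [candidates for result in mapped if result is not None for candidates in [result[1]]]
user, recommendations = reduce_foaf(target, foaf_lists)
print(user, recommendations)
```

A candidate's count is the number of mutual friends it shares with U, so the ranked list puts the strongest recommendations first.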
  19. Pros and Cons
      • Batch, offline jobs
      • Write-once, read-many across full data set
      • Usually, though not always, simple computations
      • I/O bound by disk/network bandwidth
      • What it's not:
          • High-performance parallel computing, e.g. MPI
          • Low-latency random access relational database
          • Always the right solution
  20. Cloud Security: Threats unveiled
      • XML SIGNATURE ATTACK: The original SOAP body element is moved to a newly added bogus wrapper element in the SOAP security header. Note that the moved body is still referenced by the signature using its identifier attribute Id="body". The signature is still cryptographically valid, as the body element in question has not been modified (but simply relocated). Subsequently, in order to make the SOAP message XML schema compliant, the attacker changes the identifier of the SOAP body that remains in the regular position (in this example he uses Id="attack"). The filling of the empty SOAP body with bogus content can now begin, as any of the operations defined by the attacker can be effectively executed due to the successful signature verification.
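The weakness described above is that the signature check and the application logic look at two different elements. The toy model below illustrates this and one possible defence. It is a simplified sketch under several assumptions: the element names carry no SOAP namespaces, a plain SHA-256 digest stands in for a real XML Signature, and the operation names and all function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest(elem):
    """Stand-in for a signature value: hash of the element's serialization (assumption)."""
    return hashlib.sha256(ET.tostring(elem)).hexdigest()


def build_signed_message(operation):
    """Build Envelope/Header/Body and 'sign' the body by reference to Id="body"."""
    envelope = ET.Element("Envelope")
    header = ET.SubElement(envelope, "Header")
    body = ET.SubElement(envelope, "Body", {"Id": "body"})
    ET.SubElement(body, "Operation").text = operation
    signature = ET.SubElement(header, "Signature", {"Ref": "body"})
    signature.text = digest(body)
    return envelope


def naive_verify(envelope):
    """Vulnerable check: verifies whichever element carries the referenced Id, wherever it sits."""
    signature = envelope.find("Header/Signature")
    signed = next((e for e in envelope.iter() if e.get("Id") == signature.get("Ref")), None)
    return signed is not None and digest(signed) == signature.text


def strict_verify(envelope):
    """Defence: the element that was signed must be the Body that will actually be processed."""
    signature = envelope.find("Header/Signature")
    body = envelope.find("Body")
    return (body is not None
            and body.get("Id") == signature.get("Ref")
            and digest(body) == signature.text)


def executed_operation(envelope):
    """Application logic: always processes the Body that is a direct child of the Envelope."""
    return envelope.find("Body/Operation").text


def wrap(envelope, bogus_operation):
    """Relocate the signed body into a header wrapper and add a new body with Id="attack"."""
    original = envelope.find("Body")
    envelope.remove(original)
    ET.SubElement(envelope.find("Header"), "Wrapper").append(original)
    new_body = ET.SubElement(envelope, "Body", {"Id": "attack"})
    ET.SubElement(new_body, "Operation").text = bogus_operation
    return envelope


message = build_signed_message("ListInstances")       # hypothetical operation name
forged = wrap(message, "DeleteInstances")             # hypothetical operation name
print(naive_verify(forged), strict_verify(forged), executed_operation(forged))
```

In this model the naive check accepts the forged message while the application executes the attacker's operation; the strict check rejects it because the signed element is no longer the body being processed.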
  21. Script Injection Attack
      • Targets only the AWS management console users.
      • Exploits the shared credentials between the Amazon shop interface and AWS.
      • The first vulnerability exploits the GET parameters in the download link users utilize for downloading their X.509 certificates issued by Amazon. However, the preconditions for the attack are rather high, including use of UTF-7 encoding for the injected script to bypass server logic to encode standard HTML characters, as well as the exploitation of features in specific IE versions.
      • The second script injection attack uses a persistent cross-site scripting attack by exploiting the login session that is initiated with AWS the first time a user logs into the Amazon shop interface.
  22. Who uses it? Applications and Innovations
      • Projects under Hadoop: HBase, ZooKeeper, Pig, Zombie, Hive, Sqoop
  23. References:
  24. That's the end... but the beginning of a new horizon. Special thanks to the entire team that helped me in this endeavor. ALL QUERIES, PLEASE CONTACT ME AT: abhishekde@hotmail.com. QUESTIONS?