This document summarizes a cloud computing workshop presented by Students@doslab from IIT Madras. The workshop covered techniques for big data analytics, including Hadoop MapReduce, Pig, Spark, and processing big graphs. It addressed challenges in big data such as processing large amounts of data across thousands of servers. The goals were to provide an overview of different big data technologies and enable students to select project topics in areas of interest related to working with large datasets in cloud computing.
2. Big Data
• “100 hours of video are uploaded to YouTube every minute”
• “Average daily Facebook likes: 4.5 billion”
– “Size of user data stored by Facebook: more than 300 petabytes”
• That is 307,200 terabytes (300 × 1024)
• Data analytics brings huge profits
DOS lab, IIT Madras
3. Public clouds
• On-demand and elastic
– Using 1000 machines for 1 hour costs the same as using 1 machine for 1000 hours
• Enables low-budget, large-scale computing
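
The cost claim on this slide follows from billing per machine-hour. Below is a minimal sketch of that arithmetic. It assumes a hypothetical flat rate of 10 cents per machine-hour and a job that parallelizes perfectly; neither figure comes from the workshop, and real cloud pricing varies.

```python
# Hypothetical flat price in cents per machine-hour (an assumption, not a real quote).
RATE_CENTS_PER_MACHINE_HOUR = 10

def cost_cents(machines: int, hours: int, rate: int = RATE_CENTS_PER_MACHINE_HOUR) -> int:
    """Total bill in cents under simple per-machine-hour pricing."""
    return machines * hours * rate

# Same total machine-hours, very different waiting time.
wide = cost_cents(machines=1000, hours=1)    # answer after 1 hour
narrow = cost_cents(machines=1, hours=1000)  # answer after roughly 42 days

print("1000 machines x 1 hour: ", wide, "cents")
print("1 machine x 1000 hours: ", narrow, "cents")
print("Same cost:", wide == narrow)
```

Under these assumptions the bill depends only on total machine-hours, so renting many machines for a short time buys a faster answer at no extra cost.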
4. Challenges in Big Data analytics
• Processing on 1 machine is insufficient
– Need 1000s of servers for quick answers
• Distributed computing!
• Handling server failures is an issue for long-running tasks
• Programming model should be simple
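
The challenges on this slide can be illustrated with a small sketch: the work is split into chunks, each chunk is processed independently, a failed chunk is retried, and the partial results are merged. This is only an illustration, assuming local threads stand in for servers. The function names (`count_words`, `run_with_retry`, `word_count`) are hypothetical and are not the API of Hadoop, Spark, or any other framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Map-style step: count the words in one chunk of lines."""
    return Counter(word for line in chunk for word in line.split())

def run_with_retry(task, chunk, attempts=3):
    """Re-run a failed task, loosely mimicking rescheduling after a server failure."""
    for attempt in range(attempts):
        try:
            return task(chunk)
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the last attempt

def word_count(lines, workers=4, chunk_size=2):
    """Split the input, process chunks in parallel, then merge the partial counts."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: run_with_retry(count_words, c), chunks)
        for partial in partials:  # Reduce-style step: combine the results
            total.update(partial)
    return total

lines = ["big data", "big graphs", "data analytics"]
totals = word_count(lines)
print(totals)
```

The caller writes only `count_words`; the splitting, parallel execution, and retry logic are hidden behind `word_count`, which is the kind of simplicity the last bullet asks of a programming model.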
5. Topics covered
• Techniques for Big Data analytics
1. Hadoop MapReduce and Pig
2. In-memory processing using Spark
3. Processing big graphs
4. Addressing remaining challenges
30 minutes each
6. Desired outcome
• Give a flavour of different Big Data technologies
– Enable students to identify areas of interest
• Enable selection of project topics
– Working with students having similar interests
7. Companion website
• Slides and additional materials will be uploaded to the companion website
• Please give your feedback on the website
http://cloudtechiitm2014.wordpress.com