5. What the hype is ..
Cheap commodity hardware with amazing computing and storage capacity
... but this time software has also caught up with hardware
6. Hype Ingredient list is ..
Cheap commodity hardware
Good network capacity
Software based on the principle of "Divide and Conquer"
..thus scale out horizontally
8. Unstructured Storage
Store data reliably, cheaply and scalably
Hadoop Distributed File System (HDFS)
Divide data into smaller chunks
Heterogeneous storage media support
Similar DFS e.g. Lustre, IBM GPFS, Ceph, MooseFS
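The "divide data into smaller chunks" idea can be sketched in a few lines. This is a minimal illustration only, not how HDFS is implemented: the function names `split_into_blocks` and `place_replicas`, the tiny block size, the node names, and the round-robin placement policy are all assumptions made for the example.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign every block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy for illustration; a real DFS also
    considers racks, free space and node health.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
    for index, block in enumerate(blocks):
        print(index, block, placement[index])
```

Because each block lives on more than one node, losing a single node does not lose data, and the blocks can be read in parallel from different machines.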
9. Structured Storage
Store structured data reliably, scalably and indexed
NoSQL databases to store structured data
HBase and Accumulo store their underlying data in HDFS
Many more in the big data zoo: Cassandra, VoltDB, NuoDB...
BlinkDB offers a trade-off between accuracy and response time
Full-text search offered by Elasticsearch, Solr
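The accuracy versus response time trade-off can be illustrated with plain sampling: answer an aggregate query from a random sample and report an error estimate alongside it. This is a generic sketch of the idea, not BlinkDB's API or algorithm; `approximate_mean` and its parameters are hypothetical names chosen for the example.

```python
import random
import statistics
from typing import List, Tuple


def approximate_mean(values: List[float], sample_size: int, seed: int = 0) -> Tuple[float, float]:
    """Estimate the mean from a uniform random sample.

    Returns (estimate, standard_error). A smaller sample is faster to scan
    but gives a larger standard error.
    """
    if not 2 <= sample_size <= len(values):
        raise ValueError("sample_size must be between 2 and len(values)")
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, std_error


if __name__ == "__main__":
    values = [float(i) for i in range(10_000)]
    for size in (100, 1_000, 10_000):
        estimate, error = approximate_mean(values, size)
        print(f"sample={size:>6}  mean~{estimate:.1f}  +/-{error:.1f}")
```

Scanning 100 rows instead of 10,000 reads 1% of the data, at the cost of a wider error bar on the answer.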
10. Processing
MapReduce methodology to process data in a distributed fashion
Data locality with Hadoop MapReduce and HDFS
Spark supports MapReduce and utilizes the system's and cluster's RAM
Supports machine learning algorithms
Supports Python, Scala, Java
Supports R, a framework for data scientists
Hive, Shark, Pig to process structured data in a distributed way
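The MapReduce methodology can be sketched as three steps: map each chunk to key/value pairs, group the pairs by key, then reduce each group. The word count below runs in a single process and only illustrates the data flow; it does not use Hadoop or Spark, and the function names are chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over a list of text chunks."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big", "data zoo"]))
```

In a real cluster each chunk would be mapped on the node that already stores it (data locality), and only the much smaller intermediate pairs would travel over the network.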
11. Some performance numbers to guide..
L1 cache reference 0.5 ns
L2 cache reference 7 ns
RAM reference 100 ns (Queen)
Flash IO card reference 75,000 ns (Princess)
RTT within same datacenter 500,000 ns
Disk reference 10,000,000 ns
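To make the numbers above easier to compare, the short sketch below expresses each latency as a multiple of a chosen baseline. The figures are copied from the list above; the dictionary and function names are just for the example.

```python
from typing import Dict

# Latencies in nanoseconds, taken from the list above.
LATENCY_NS: Dict[str, float] = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "RAM reference": 100,
    "Flash IO card reference": 75_000,
    "RTT within same datacenter": 500_000,
    "Disk reference": 10_000_000,
}


def relative_to(baseline: str) -> Dict[str, float]:
    """Return every latency as a multiple of the baseline's latency."""
    base = LATENCY_NS[baseline]
    return {name: ns / base for name, ns in LATENCY_NS.items()}


if __name__ == "__main__":
    for name, factor in relative_to("RAM reference").items():
        print(f"{name:<28} {factor:>12,.3f} x RAM")
```

With RAM as the baseline, a flash reference costs 750 RAM references, a round trip within the datacenter costs 5,000, and a disk reference costs 100,000.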