Apache Hadoop & MapReduce
1. Apache Hadoop, Big Data & MapReduce
WHY BIG DATA:
“More data usually beats better algorithms.”
GOOD NEWS:
“Big data is here.”
BAD NEWS:
We are struggling to store and analyze it.
KEY PROBLEM:
“Storage capacity has increased, but read speed has not kept pace.”
SOLUTION:
Parallelism
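To put rough numbers on why read speed is the bottleneck and why parallelism helps, here is a small piece of arithmetic. It assumes a 1 TB dataset and a read rate of 100 MB/s per disk (illustrative figures, not benchmarks) and ignores coordination overhead between disks.

```java
// Illustrative arithmetic only: the 1 TB dataset size and the 100 MB/s
// per-disk read rate are assumed figures, not measurements.
public class TransferTime {

    /** Seconds needed to read sizeMb megabytes from one disk at rateMbPerSec. */
    static double serialSeconds(double sizeMb, double rateMbPerSec) {
        return sizeMb / rateMbPerSec;
    }

    /**
     * Seconds needed when the data is split evenly across `disks` drives
     * that are all read at the same time (coordination overhead ignored).
     */
    static double parallelSeconds(double sizeMb, double rateMbPerSec, int disks) {
        return serialSeconds(sizeMb / disks, rateMbPerSec);
    }

    public static void main(String[] args) {
        double oneTerabyteMb = 1_000_000;   // 1 TB expressed in MB (decimal units)
        double rate = 100;                  // assumed read rate per disk, MB/s

        double serial = serialSeconds(oneTerabyteMb, rate);
        double parallel = parallelSeconds(oneTerabyteMb, rate, 100);

        System.out.printf("1 disk:    %.0f s (about %.1f hours)%n", serial, serial / 3600);
        System.out.printf("100 disks: %.0f s (about %.1f minutes)%n", parallel, parallel / 60);
    }
}
```

Under these assumptions one disk needs 10,000 s (about 2.8 hours), while 100 disks read in parallel need about 100 s.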
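However, parallelism brings noteworthy problems of its own:
Hardware failure (the more machines in use, the more likely one of them fails)
Combining data (partial results read from many disks must be merged correctly)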
Hadoop overcomes these problems by using:
HDFS (Hadoop Distributed File System)
MapReduce (a programming model based on keys and values)
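As a rough illustration of “keys and values”, the following is a single-process simulation of the MapReduce model using the classic word-count job. It does not use the Hadoop API; the class and method names are hypothetical. Map turns each input line into (word, 1) pairs, shuffle groups the values by key, and reduce sums the values for each key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce model (not the Hadoop API).
// The word-count job and all names here are illustrative.
public class WordCountSketch {

    /** Map phase: turn one input line into (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: group all emitted values by key, sorted by key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce phase: combine all values for one key into a single result. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map over every line, then shuffle, then reduce once per key. */
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));   // on a cluster, lines could be mapped on different nodes
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(intermediate).forEach((k, vs) -> result.put(k, reduce(k, vs)));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data is here", "more data beats better algorithms");
        System.out.println(run(input));
    }
}
```

Because map works on each line independently and reduce works on each key independently, both phases can be spread across many machines; only the shuffle step needs to move data between them. (`List.of` requires Java 9 or later.)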
2. In a nutshell,
Hadoop provides:
- A reliable shared storage system (HDFS)
- A reliable analysis system (MapReduce)
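One way HDFS makes shared storage reliable is by keeping several copies of each data block on different machines (the default replication factor is 3). The following is a hypothetical simplification, not HDFS's real rack-aware placement policy: replicas go on consecutive nodes, and the check shows that losing one node leaves every block readable, while losing all nodes that hold a given block does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simplification of block replication. Real HDFS placement is
// rack-aware; here each block's replicas simply go on consecutive nodes.
public class ReplicationSketch {

    /** Returns, for each block, the set of node ids holding a replica. */
    static List<Set<Integer>> place(int blocks, int nodes, int replication) {
        if (replication > nodes) {
            throw new IllegalArgumentException("replication factor exceeds node count");
        }
        List<Set<Integer>> placement = new ArrayList<>();
        for (int b = 0; b < blocks; b++) {
            Set<Integer> holders = new HashSet<>();
            for (int r = 0; r < replication; r++) {
                holders.add((b + r) % nodes);   // distinct nodes because replication <= nodes
            }
            placement.add(holders);
        }
        return placement;
    }

    /** True if every block still has at least one replica on a live node. */
    static boolean allBlocksReadable(List<Set<Integer>> placement, Set<Integer> failedNodes) {
        for (Set<Integer> holders : placement) {
            Set<Integer> alive = new HashSet<>(holders);
            alive.removeAll(failedNodes);
            if (alive.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<Integer>> placement = place(10, 5, 3);   // 10 blocks, 5 nodes, 3 replicas each
        System.out.println("one node down:    " + allBlocksReadable(placement, Set.of(2)));
        System.out.println("three nodes down: " + allBlocksReadable(placement, Set.of(0, 1, 2)));
    }
}
```

With 3 replicas per block, any single node failure leaves at least two copies of every block; data is lost only if all nodes holding a given block fail together.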
MAPREDUCE:
The entire dataset, or a large portion of it, is processed for each query.
MapReduce is a batch query processor.
Already used by Mailtrust, Rackspace’s mail division, to handle big data.
MAPREDUCE VS RDBMS:
CONCLUSION:
This overview is not exhaustive. Further research will clarify these topics and sharpen the distinctions between them, and more detail will be added in the future.