No centralized control
Data redundancy
Data inconsistency
Data cannot be shared
Standards cannot be enforced
Security issues
Integrity cannot be maintained
Data dependence

Centralized control
No data redundancy
Data consistency
Data can be shared
Standards can be enforced
Security can be enforced
Integrity can be maintained
Data independence
Can all the data be structured?
Will we be able to store all the data in tables, i.e., can we model all the data?
Should we discard the log files after extracting the required structured data from them, or should we archive them?
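The questions above can be made concrete with a small sketch. It assumes a hypothetical log format ("date time LEVEL message") and uses illustrative names (LINE_PATTERN, split_log) that are not from the original material. Lines that match the pattern become table-like rows; lines that do not match are kept aside, which is exactly the data one must decide to discard or archive.

```python
import re

# Hypothetical log format, assumed for illustration only:
# "<YYYY-MM-DD> <HH:MM:SS> <LEVEL> <free-text message>"
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def split_log(lines):
    """Return (rows, unparsed): rows fit a table, unparsed lines do not."""
    rows, unparsed = [], []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            # Each matching line becomes one structured record (a dict per row).
            rows.append(match.groupdict())
        else:
            # Lines that do not fit the model are kept, not silently dropped.
            unparsed.append(line)
    return rows, unparsed


sample = [
    "2024-01-05 10:15:00 INFO user=42 logged in",
    "2024-01-05 10:15:02 ERROR payment failed",
    "Traceback (most recent call last):",
]
rows, unparsed = split_log(sample)
print(len(rows), "structured rows;", len(unparsed), "line(s) left unstructured")
```

Keeping the unparsed lines in a separate list, rather than dropping them, leaves the discard-or-archive decision open instead of making it implicitly in the parser.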
Take the example of students using the facilities provided by college.
Two core components: HDFS and MapReduce
Machines are unreliable.
Separates distributed, fault-tolerant computing code from application logic.
No need to worry about the identity of a machine.
Lets you interact with a cluster, not a bunch of machines.
Analysis workloads span multiple machines.
Runs as a cloud (cluster) and possibly on a cloud (EC2).
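To illustrate how the MapReduce model keeps application logic separate from the distribution machinery, here is a minimal single-process sketch of a word count. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the shuffle step that a real cluster performs across machines is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(document):
    """Application logic: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Framework's job on a real cluster: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Application logic: combine all values seen for one key."""
    return key, sum(values)


def word_count(documents):
    # Map every document, group the emitted pairs by word, then reduce each group.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Only map_phase and reduce_phase describe the problem being solved. In this sketch everything else stands in for what the framework would handle, such as spreading the work over machines and recovering from failures.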
Consumer interested in:
Social networking
Online purchasing/booking

Service provider interested in data for:
Advertisements or revenue generation
Reporting, for internal housekeeping

Challenges:
Recommendation: publishing the advertisements that the consumer regards as information or is interested in.