Big data and Hadoop are growing rapidly because the scaling laws of computing have changed: analytics can now scale linearly on commodity hardware. This tipping point has led to widespread adoption of big data technologies by large enterprises and startups alike, across many applications and data scales. Large companies must address compatibility with existing systems to integrate big data; startups face far fewer such constraints.
Why is big data so fashionable with big and small companies across such different industries? What has suddenly changed?
Google searches for big data are up 10x over just four years ago.
Hadoop use is exploding as well. This example shows job trends for Hadoop, which is further evidence that you should pay attention during this talk.
But we have seen steady growth for a long time, and simple growth would only explain some kinds of companies (probably the big ones) starting with big data, followed by slow adoption elsewhere. Databases started with big companies and took 20 years or more to reach everywhere, because need exceeded cost at different times for different companies. The internet, on the other hand, happened to everybody at essentially the same time, so it changed nearly all industries at all scales nearly simultaneously. Why is big data exploding right now, and why is it exploding at all?
The different kinds of scaling laws have different shapes, and I think that shape is the key.
The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial steep climb.
In classical analytics, the cost of doing analytics increases sharply as data grows.
The result is a net value curve with a sharp optimum, located where value is still rising rapidly and cost is not yet rising rapidly.
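To make the shape argument concrete, here is a minimal numerical sketch. The functional forms are assumptions for illustration only: value that grows like the square root of data size, and a classical cost that grows quadratically.

```python
import numpy as np

# Toy model of classical analytics economics (the functional forms are
# illustrative assumptions, not measurements).
scale = np.linspace(0.01, 10, 1000)      # data scale, arbitrary units

value = np.sqrt(scale)                   # diminishing returns on more data
classical_cost = 0.05 * scale ** 2       # cost rises sharply with scale
net = value - classical_cost

best = scale[np.argmax(net)]
print(f"classical optimum at scale {best:.2f}, net value {net.max():.2f}")
```

With these assumed curves, the net value peaks sharply at a modest scale and turns negative not far beyond it.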
New techniques such as Hadoop result in linear scaling of cost. That is a change in shape, and it causes a qualitative change in the way costs trade off against value to give net value. As technology improves, the slope of this cost line is also dropping rapidly over time.
This next sequence shows how the net value changes with linear cost models of different slopes.
Notice how the best net value has jumped up significantly.
And as the line approaches horizontal, the highest net value occurs at a dramatically larger data scale.
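Running the same toy model with linear cost lines of decreasing slope shows both effects from this sequence: the peak net value jumps up, and the optimum moves out to dramatically larger scale (the curves are still assumed forms, not measurements).

```python
import numpy as np

scale = np.linspace(0.01, 1000, 100_000)  # much wider range of data scale
value = np.sqrt(scale)                    # same assumed value curve as before

# Linear cost models with progressively gentler slopes, as the
# technology improves over time.
for slope in (0.2, 0.1, 0.05, 0.02):
    net = value - slope * scale
    best = scale[np.argmax(net)]
    print(f"slope {slope:.2f}: optimum at scale {best:7.1f}, "
          f"net value {net.max():6.2f}")
```

In this sketch, cutting the slope by 10x moves the optimum out by 100x while the best net value climbs steadily.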
Exponential growth means growth by a constant factor in any constant time interval. With growth that fast (anything more than 3x per 10 time units), the accumulation of all history before 10 time units ago is less than half of the accumulation in the last 10 units alone. This is true at every point in time.
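Here is a quick arithmetic check of that claim as a small Python sketch (the 10x-per-10-units growth factor is an assumption, chosen to match the search-trend figure earlier):

```python
import math

# Exponential accumulation check. Assume data volume grows by a factor
# g over each 10-unit period (g = 10 matches the 10x figure earlier;
# the claim holds for any g > 3).
g = 10.0
k = math.log(g) / 10.0        # continuous growth rate

# With volume ~ exp(k*t), the total accumulated up to time t is
# exp(k*t)/k. Normalize so the total accumulated by "now" (t = 0) is 1.
before = math.exp(-10 * k)    # everything older than 10 time units
last_10 = 1.0 - before        # accumulated in the last 10 units

print(f"history before: {before:.2f}, last 10 units: {last_10:.2f}")
print(f"old history is {before / last_10:.0%} of the recent accumulation")
```

At 10x growth, everything older than 10 units is only about 11% of what accumulated in the last 10 units.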
Startups use this fact to their advantage and restructure everything around it, optimizing for development time initially and converting to computation-efficient systems later.
Here the later history is shown after the initial exponential growth phase. This changes the economics of the company dramatically.
The startup can throw away its history because that history is so small. That means the startup has almost no compatibility requirement: whatever data is lost due to lack of compatibility is a small fraction of the total.
A large enterprise cannot do that. It has to retain access to the old data and has to share between the old data and the Hadoop-accessible data. This sharing doesn't have to happen at the proof-of-concept stage, but it really must happen when Hadoop first goes to production.