Data Curry
Higher Accuracy at What Cost?
These stories are written by Dr. Dakshinamurthy V Kolluru, Chief Advisor – Data Science,
Usha Martin Education: President, International School of Engineering (http://www.insofe.edu.in)
The best place in the world to learn Applied Engineering.
Just how much accuracy do you need in a predictive model? The answer is, interestingly, not as much as you can get!
Essentially, any model is at some level a curve fit. So, if you are willing to add complexity, you can fit the data a bit better. But while a complex model
is usually more accurate than a simpler one, it is also harder to understand.
Consider a feed-forward neural net trained on customer data to predict whether a customer buys a product or not.
The same prediction can be presented as a set of rules, such as:
If (income is very high) and (family size is either 1 or 2 members)
and (education is high school), they will not buy the product.
Now, even if the rules are less accurate, they are much easier
for the business user to understand and implement. Get the point?
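
To make the point concrete, the rule above can be written as a few lines of code that a business user could read and check. This is only a minimal sketch: the `Customer` class, the function name `predicts_no_purchase`, the field names, and category labels such as `"very_high"` are illustrative assumptions, not part of the original model or its data.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Field names and category labels are assumptions made for illustration.
    income: str        # e.g. "low", "medium", "high", "very_high"
    family_size: int   # number of family members
    education: str     # e.g. "high_school", "graduate"


def predicts_no_purchase(customer: Customer) -> bool:
    """Return True when the rule from the text fires (customer will not buy).

    The rule says nothing about customers it does not match, so False here
    means "this rule does not apply", not "the customer will buy".
    """
    return (
        customer.income == "very_high"
        and customer.family_size in (1, 2)
        and customer.education == "high_school"
    )


if __name__ == "__main__":
    example = Customer(income="very_high", family_size=2, education="high_school")
    print(predicts_no_purchase(example))
```

Every condition here is visible and can be questioned by the person using it, which is not true of the weights inside a neural net.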
Netflix is another excellent example. When people rent a video
on its site, Netflix wants to recommend other movies that they
might like. It is essential for Netflix to make the best
possible recommendations, as every additional rental adds to
the top line.
Netflix announced a competition challenging data scientists to
beat its recommendation engine by 10% in accuracy. The prize
money was a million dollars. It was huge and attracted great
minds.
After the first phase (approximately 12 months in), the
leading algorithm gave around 8% better accuracy and was
fairly easy to engineer. However, the contest went on as the
magic 10% mark was still untouched. It ran for about two
more years until, finally, BellKor's Pragmatic Chaos (a combined
team that included researchers from AT&T Labs) won
the contest with an algorithm that gave a 10.06% improvement.
It had several hundred algorithms working together to create
the desired improvement, which is indeed an intellectual
marvel.
However, ironically, Netflix never used that algorithm in
production. Here is how the company explained it:
“We evaluated some of the new methods offline but the
additional accuracy gains that we measured did not seem to
justify the engineering effort needed to bring them into a
production environment.”
So, as a manager interested in data science, you always have to
consider what the additional accuracy is costing you in terms of
development effort and usability. The trade-offs have to
be carefully evaluated.
In most cases, your business users are interested in insights.
So, an if-then rule or a simple equation gives them a better
understanding than a complex black box that somehow spits out
the correct answer. In such situations, you are better off
compromising a bit on accuracy for simplicity.
www.datacurry.com