A lot of people think that to do big data you have to use rocket-science equipment all the way through, and that Excel, because it's installed on seemingly every computer in the world, certainly isn't special enough. That's not true at all. There are a few things to know about Excel. First off, as a general principle, you want to go to where the people are. The analysis is there to serve a purpose: it's there to inform other people.
Excel is still going to be a really good way to share it, because it's where people know how to work. It is far and away the most common data tool. There are hundreds of millions, perhaps billions, of copies of Excel floating around in the world. Now, big data and data science have an interesting connection with Excel.
For one thing, Excel, entirely on its own, just the application, is able to do real data science. The best presentation of this is in the book Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman. He goes all the way through the really advanced capabilities of Excel that make it possible to explore and manipulate data in ways that you probably never thought were possible. But more interestingly, using what are called Open Database Connectivity interfaces, or ODBC interfaces, you can hook Excel directly to Hadoop and run queries and analyses from the Excel interface.
Let me just go to Excel for a moment. If you come over here to Data and bring up that menu, and then go to From Other Sources, you'll see that the very first option is From SQL Server, which is a relational database, and a lot of information is going to be there. But you can also go down to the Windows Azure Marketplace, which can connect you up with Hadoop, and to the Data Connection Wizard, the query options, and OData. These are all methods for connecting to big data. Microsoft has its own solutions, and other vendors have other ways of hooking Excel up to big data and to Hadoop. They make it possible to control the analysis, or at least do the queries and the sorting, right here from the single most familiar interface for working with data.
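To make the ODBC idea a little more concrete, here is a minimal sketch of the same kind of connection made from a script instead of from the Excel menu. It assumes an ODBC data source name (DSN) for Hive on Hadoop has already been set up on the machine; the DSN name `HiveDSN`, the user `analyst`, and the table and column names are all hypothetical. The `run_query` function relies on the third-party `pyodbc` package and a live cluster, so it is defined here but not called.

```python
def build_connection_string(dsn, user=None):
    """Build an ODBC connection string for a pre-configured DSN."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    return ";".join(parts)


def build_summary_query(table, group_col, value_col, limit=1000):
    """Build a small aggregate query so only a summary comes back."""
    return (
        f"SELECT {group_col}, SUM({value_col}) AS total "
        f"FROM {table} GROUP BY {group_col} LIMIT {limit}"
    )


def run_query(conn_str, query):
    """Run the query over ODBC; needs pyodbc and a reachable data source."""
    import pyodbc  # third-party package, imported only when actually used

    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical names, for illustration only.
conn_str = build_connection_string("HiveDSN", user="analyst")
query = build_summary_query("web_logs", "country", "bytes_sent", limit=50)
print(conn_str)
print(query)
```

The point of the aggregate query is the same as in Excel: the heavy lifting stays on the cluster, and only a small, summarized result comes back to the familiar interface.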
Finally, I want to mention that Excel is also a great way of sharing the results of the analysis. You can make interactive PivotTables, which are a great way of exploring the complexities of the data, and people know how to work with them. Sortable worksheets, graphics, and charts are familiar, they communicate, and they're clear. It's a great way to go. In fact, I would say that putting the final results into Excel, which gives your viewers a degree of exploration and manipulation, is probably the most democratic way of sharing the results of a big data analysis.
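As a rough illustration of handing results over in an Excel-friendly form, the sketch below sums a few records into a PivotTable-style grid and writes it to a CSV file, which Excel opens directly. It uses only the Python standard library. The sample records, the column names, and the file name `summary.csv` are made up for the example; a real workflow would start from the output of the big data job.

```python
import csv
from collections import defaultdict


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key into a grid with row_key down the side and col_key across."""
    table = defaultdict(lambda: defaultdict(float))
    columns = set()
    for r in rows:
        table[r[row_key]][r[col_key]] += float(r[value_key])
        columns.add(r[col_key])
    columns = sorted(columns)
    header = [row_key] + columns
    # Missing row/column combinations are filled with 0.0.
    body = [[rk] + [table[rk].get(c, 0.0) for c in columns]
            for rk in sorted(table)]
    return header, body


def write_csv(path, header, body):
    """Write the grid as a CSV file that Excel can open and sort."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(body)


# Made-up sample records standing in for the output of a larger analysis.
sample = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q1", "sales": 50},
    {"region": "East", "quarter": "Q2", "sales": 80},
    {"region": "West", "quarter": "Q2", "sales": 40},
]

header, body = pivot_sum(sample, "region", "quarter", "sales")

if __name__ == "__main__":
    write_csv("summary.csv", header, body)
```

A plain CSV gives viewers a sortable worksheet; building a true interactive PivotTable on top of it is then a few clicks inside Excel itself.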
And again, the point of any analysis is to provide insight that people can work on, that is actionable, that they can use to improve their own businesses and their own projects.
Learn more: http://www.lynda.com/Hadoop-tutorials/role-Excel-big-data/158656/190805-4.html