This presentation is an introduction to Big Data and analytics. It covers how to go about enabling big data and analytics in our company, the main differences between big data analytics and traditional analytics, and how to get started.
This material was used at the SAS Big Data Analytics event held in Helsinki on 19th of April 2011.
The slides are copyright of Accenture.
The bad news? It’s not going to stop. Large amounts of data bring a whole set of new challenges. How should we go about them?
It’s not just growing volumes of existing data, it’s also:
- The recognition of value in previously throw-away data
- New kinds of “data exhaust”: by-product data generated as part of other processes, currently ignored or thrown away
- New kinds of “intentional” data
- The combination of previously separate data
Big Data is not so much about the “big” as about finding new ways to handle and analyze data, ways that were not possible before. There are a whole lot of new technologies that can be used to deal with big data. Are you familiar with all of them? Which one is most suitable for your case?
Let’s stop for a second to look at the key enabling technologies in Big Data.
MapReduce
- Originally designed and first developed at Google as part of their efforts to index the web more efficiently
- MapReduce splits input data into smaller chunks that can be processed in parallel
- Scales roughly linearly with the number of nodes
Hadoop
- Open source implementation of MapReduce, based on Google’s white paper. Started at Yahoo, now a top-level project of the Apache Software Foundation
- Runs on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage)
- Rather straightforward to install and administer
- Large ecosystem of additional open source components: Pig, Hive, Oozie, Flume
- Large ecosystem of commercial offerings (both closed and open source)
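To make the map/shuffle/reduce split concrete, here is a minimal single-machine sketch in Python of the classic word-count job. It is not Hadoop’s API: the function names (map_phase, shuffle, reduce_phase, word_count) and the chunk size are hypothetical, and a thread pool stands in for the cluster nodes. It therefore shows the structure of the computation, not the distributed speed-up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: List[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: List[str], chunk_size: int = 2) -> Dict[str, int]:
    # Split the input into independent chunks (HDFS blocks play this role in Hadoop).
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Map every chunk in parallel; threads only stand in for cluster nodes here.
    with ThreadPoolExecutor() as pool:
        mapped = pool.map(map_phase, chunks)
        all_pairs = [pair for chunk_pairs in mapped for pair in chunk_pairs]
    grouped = shuffle(all_pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data exhaust is data", "hadoop runs mapreduce"]
    print(word_count(docs))
```

Because each chunk is mapped without looking at any other chunk, adding nodes lets more chunks be processed at once, which is where the near-linear scaling comes from.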
Big Data Analytics
Technology
- Multiple tools and technologies, sometimes for the same purpose (Hadoop, NoSQL databases, in-memory analytics)
- Time to information is critical to extract value from data sources that include mobile devices, RFID, the web and a growing list of automated sensory technologies
- Traditional data warehousing processes are too slow and limited in scalability
- The ability to converge data from multiple data sources, both structured and unstructured, reduces the time to information
Skills
- There’s only so much we can do with exploratory processes; the only ways to effectively analyze big data require mathematical and statistical concepts with which more traditional analysts are not familiar
- Business analysts used to be able to manage with Excel and basic SQL knowledge. Now, with data that does not follow any particular model (it’s unstructured, after all), there is a need for analysts who are comfortable with statistical and mathematical concepts and who are able to devise their own models to find patterns and insights where there apparently were none.
Processes & Organization
- Data must be open and shared across the enterprise, supported by organizations that “own” it
- Data must be made available across the enterprise (i.e. we can’t find trends in data that we do not have)
Source: “Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO”, IDC 2012 (http://www.sas.com/resources/asset/BigDataAnalytics-FutureArchitectures-Skills-RomapsfortheCIO.pdf)
The three Vs: Velocity, Volume and Variety
Everything will be analyzed, but how much do we have, how soon do we need it and how fast can we do it?
MapReduce and Hadoop are currently seen as a low-level paradigm on top of which high-level tools must be built that are more intuitive and easier to use for non-programmers (business analysts, data scientists).
Big Data technologies have not reached maturity yet and will continue to evolve over the coming years. IT decision makers must still be realistic about the limits of what can be achieved with these technologies, and will sometimes be better off waiting for the next generation of data technologies.
There is also a lot of start-up activity happening (Scalar, MapR). Meanwhile, the large “traditional” vendors do not want to be left behind: Microsoft SQL Server 2012 will be able to read and write data from Hadoop and HDFS or run Hadoop on Microsoft’s Azure PaaS, IBM has a version of InfoSphere BigInsights ready to run on its SmartCloud solution, and Oracle has recently introduced its own appliance, combining software and hardware, with Hadoop and in-memory capabilities for handling large amounts of data.
Big Data Analytics is an augmentation of the existing analytical infrastructure that will allow us to scale and drive insights beyond “current capabilities”.
So the question becomes: how do we add these capabilities so that they interoperate with traditional tools?
The worlds of structured and unstructured data are rapidly converging. Architects and CIOs must find ways to manage this convergence and enable all forms of data management to coexist, sometimes with bridge technologies, such as using Hadoop to process and import data into traditional systems in ways that wouldn’t be possible with the RDBMS approach alone. “Hybrid” landscapes are just that: Hadoop is integrated with existing data warehouses, traditional relational databases and applications in a way that minimizes the impact on the enterprise.
The reality is that the enterprise data warehouse (EDW) is evolving into a virtualized cloud ecosystem in which all of these database architectures can and will coexist in a pluggable “Big Data” storage layer alongside HDFS, HBase (Hadoop’s columnar database), Cassandra (a sibling Apache project that supports peer-to-peer persistence for complex event processing and other real-time applications), graph databases and other “NoSQL” platforms, behind an abstraction layer with MapReduce as its focus.
Big Data is not necessarily about its “bigness.” Very few organizations are going to need the type of scale that often makes the Big Data headlines. So, far from rendering the relational database obsolete, the new advances will be incorporated into traditional databases over time, extending their performance.
Adding Hadoop to the enterprise provides a cost-effective place to store vast quantities of structured data from operational systems and combine it with both internally and externally sourced unstructured and semi-structured data. Advanced MapReduce analytical methods can also be used directly against that store, or more traditional BI tools can be used to analyze the data through Hive or HBase.
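A minimal sketch of the “bridge” pattern described above, assuming that JSON event logs stand in for semi-structured data sitting in HDFS and that an in-memory SQLite database stands in for the existing data warehouse. The names aggregate_clicks, load_into_warehouse and the page_views table are hypothetical. In a real hybrid landscape the aggregation step would run as a Hadoop job (or through Hive) rather than in a single Python process.

```python
import json
import sqlite3
from collections import Counter
from typing import Iterable


def aggregate_clicks(raw_events: Iterable[str]) -> Counter:
    """Batch step (the part Hadoop would run): parse semi-structured JSON events
    and count page views per page, skipping malformed records."""
    counts: Counter = Counter()
    for line in raw_events:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # "data exhaust" is messy: drop lines that are not valid JSON
        if isinstance(event, dict) and event.get("type") == "page_view" and "page" in event:
            counts[event["page"]] += 1
    return counts


def load_into_warehouse(conn: sqlite3.Connection, counts: Counter) -> None:
    """Bridge step: import the condensed, structured result into a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS page_views (page TEXT PRIMARY KEY, views INTEGER)")
    conn.executemany(
        "INSERT OR REPLACE INTO page_views (page, views) VALUES (?, ?)",
        counts.items(),
    )
    conn.commit()


if __name__ == "__main__":
    raw = [
        '{"type": "page_view", "page": "/home"}',
        '{"type": "page_view", "page": "/pricing"}',
        "not json at all",
        '{"type": "click", "page": "/home"}',
        '{"type": "page_view", "page": "/home"}',
    ]
    conn = sqlite3.connect(":memory:")  # stands in for the existing data warehouse
    load_into_warehouse(conn, aggregate_clicks(raw))
    # From here on, a traditional BI tool only needs plain SQL.
    for page, views in conn.execute("SELECT page, views FROM page_views ORDER BY views DESC"):
        print(page, views)
```

The design point is that the heavy, schema-less work happens before the load, so the relational side only ever sees a small, well-typed table.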
We’ve seen the tools, but who is going to build, run and maintain all this?
Technology skills
- The emergence of big data is based on new technologies that require either training or sourcing additional expertise
Data science
- Traditional analytical models do not generally scale well to typical “big data-like” volumes; new ways of thinking are needed, ways that help us find what we wanted to find as well as what we did not know we could find
- Data scientists are the next generation of business analysts, with strong statistical skills and the ability to think “outside the box” when looking for new analytical models.
Agile software development methodologies are one of the potential answers to this. A data strategy is required, but with an approach that is about modeling less and iterating more (just like agile).
- They require new tools and technology
- Big Data doesn’t always get it right, with or without analytics (wacky iTunes and Spotify recommendations, weird LinkedIn suggestions)
- They require new skills in your workforce
Resistance is futile: Big Data and analytics are inescapable
- They create business value for the bottom line
- They are the path to competitive advantage
- Big Data is not only transforming IT, it is also transforming businesses and industries: retail recommendations, smart meter/grid analytics
How do we get started with all this?
- Identify which business processes could benefit the most from improved handling and processing of large amounts of data: what are the business decisions that we make every day and would like to make more efficiently and more effectively?
- Productize data across the company: make it a “first-class citizen” and provide some kind of data service layer so that data is accessible throughout the enterprise
- Identify the skill and technology gaps and decide whether to grow or acquire new talent and technology for the company (with or without the cloud)
Clearly this requires an investment. It is the path forward, but it requires that you, as decision-makers, make a commitment to grow big data in your company.
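A data service layer can take many forms (web services, a data virtualization product, Hive tables behind a catalog). As one hypothetical illustration, the sketch below models it as a small in-process catalog where each dataset is published together with the organization that owns it and is read through a single uniform interface. DataServiceLayer, Dataset and their methods are invented names for illustration, not an existing product’s API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Dataset:
    name: str
    owner: str  # the organization that "owns" and maintains the data
    loader: Callable[[], List[Record]]  # hides where the data actually lives (DW, Hadoop, ...)


@dataclass
class DataServiceLayer:
    """Hypothetical minimal catalog: teams publish datasets here and
    every consumer reads them through the same interface."""
    _datasets: Dict[str, Dataset] = field(default_factory=dict)

    def publish(self, dataset: Dataset) -> None:
        if dataset.name in self._datasets:
            raise ValueError(f"dataset {dataset.name!r} is already published")
        self._datasets[dataset.name] = dataset

    def catalog(self) -> List[str]:
        """List the names of all published datasets."""
        return sorted(self._datasets)

    def read(self, name: str, **filters: object) -> List[Record]:
        """Load a dataset and keep only the records matching every filter."""
        rows = self._datasets[name].loader()
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


if __name__ == "__main__":
    dsl = DataServiceLayer()
    dsl.publish(Dataset("customers", "CRM team",
                        lambda: [{"id": 1, "country": "FI"}, {"id": 2, "country": "SE"}]))
    print(dsl.catalog())
    print(dsl.read("customers", country="FI"))
```

Consumers only call read("customers", country="FI") and never need to know whether the loader hits a warehouse table or a Hadoop output file, so the storage behind a dataset can change without changing their code.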
Source: http://www.accenture.com/us-en/technology/technology-labs/Pages/insight-accenture-technology-vision-2012.aspx (http://bit.ly/accenturetechvision2012 and http://bit.ly/accenturetechnologyvision2012)