For Hadoop to “cross the chasm”, it will need widespread adoption by organizations; but the keystone to all of this isn’t the widely talked-about social media like Facebook, Twitter and LinkedIn. It is the seemingly mundane “dark data” in business, captured but left unutilized or under-utilized, that will start the transformation away from the standard architectures of old and into the brave new world generally associated with “Big Data”.
As members of the Hadoop community, it is our challenge to bring about that change rapidly and responsibly, bringing order to the “wild west” of today’s disruptive business intelligence landscape. BI is the foothold from which to bring Hadoop into the mainstream. Success requires linking new technologies with the mature ones in use today to enable the search for value.
Beyond the racks and clusters, we need to bring the science and understanding that enable organizations to leave the past behind and move to the brave new world. This requires bringing along applications, processes, and groups of users, and intelligently combining NoSQL, relational, predictive, and advanced analytics technologies to make them easily consumable, even to the business user.
5. What is Business Intelligence?
Numbers
Tables
Charts
Indicators
Time
- History
- Lag
Access
- to view (portal)
- to data
- to depth
- Control/Secure
Consumption
- digestion
…with ease and simplicity
8. Got mobile?
200 million employees bring their own device to work
Nearly half of the workforce will be made up of millennials by 2020
50% of BYOD organizations have had a security breach
1/3 have broken or would break corporate policy on BYOD
12. BI tools have plateaued…again
Decision Support (Reporting) in late 90’s
Business Intelligence of 00’s
…led to data mining
…leading to analytics and data science
15. create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr( # Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE,row.names
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW),
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
select Trans_Year, Num_Trans,
count(distinct Account_ID) Num_Accts,
sum(count( distinct Account_ID)) over (partition by Trans_Year
cast(sum(total_spend)/1000 as int) Total_Spend,
cast(sum(total_spend)/1000 as int) / count(distinct Account_ID
rank() over (partition by Trans_Year order by count(distinct A
rank() over (partition by Trans_Year order by sum(total_spend)
from( select Account_ID,
Extract(Year from Effective_Date) Trans_Year,
count(Transaction_ID) Num_Trans,
select dept, sum(sales)
from sales_fact
where period between date '01-05-2006' and date '31-05-2006'
group by dept
having sum(sales) > 50000;
select sum(sales)
from sales_history
where year = 2006 and month = 5 and region=1;
select total_sales
from summary
where year = 2006 and month = 5 and region=1;
Behind the numbers
16. It’s all about getting work done
Tasks evolving:
Used to be simple fetch of value
Then was compute dynamic aggregate
Now complex algorithms!
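The progression on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names echo the queries on slide 15, but the schema, the sample rows and the helper `linear_fit` are hypothetical and invented for the example.

```python
import sqlite3


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table summary (year int, month int, region int, total_sales real);
    create table sales_history (year int, month int, region int, day int, sales real);
    insert into summary values (2006, 5, 1, 60.0);
    insert into sales_history values
        (2006, 5, 1, 1, 10.0), (2006, 5, 1, 2, 20.0), (2006, 5, 1, 3, 30.0);
""")

# Stage 1: simple fetch of a precomputed value.
fetched = conn.execute(
    "select total_sales from summary where year = 2006 and month = 5 and region = 1"
).fetchone()[0]

# Stage 2: the same figure computed as a dynamic aggregate over detail rows.
aggregated = conn.execute(
    "select sum(sales) from sales_history where year = 2006 and month = 5 and region = 1"
).fetchone()[0]

# Stage 3: an algorithm over the detail rows (here a simple linear trend).
rows = conn.execute("select day, sales from sales_history order by day").fetchall()
slope, intercept = linear_fit([r[0] for r in rows], [r[1] for r in rows])

print(fetched, aggregated, slope, intercept)
```

Each stage touches more rows and does more computation per answer than the one before, which is why the workload, rather than the presentation layer, ends up being the constraint.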
17. Time to influence
Reaction – what? – potential value
Action – opportunity - interaction
BI is becoming democratized
19. Business [Intelligence] Desires
in relation to Big Data
More timely
Lower latency
More granularity
More user interactions
Richer data model
Self service
39. MPP everything – get more work done
“No SQL” graduates to “not-only-SQL”
SQL remains preferred data access
language … for business community
SQL can encapsulate other processing
- in-line Python, R, Java etc.
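As a rough illustration of SQL encapsulating other processing, Python's standard sqlite3 module lets a query call a Python function in-line via `create_function`. This is only an analogy for the idea, not the external-script mechanism shown on slide 15; the table contents and the function name `uplift` are hypothetical.

```python
import sqlite3


def uplift(sales, pct):
    """Hypothetical business rule: apply a percentage uplift to a sales figure."""
    return round(sales * (1 + pct / 100.0), 2)


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL with two arguments.
conn.create_function("uplift", 2, uplift)

conn.execute("create table sales_fact (dept text, sales real)")
conn.executemany(
    "insert into sales_fact values (?, ?)",
    [("toys", 100.0), ("toys", 50.0), ("books", 200.0)],
)

# The business user still writes SQL; the Python logic runs inside the query.
result = conn.execute(
    "select dept, uplift(sum(sales), 10) from sales_fact group by dept order by dept"
).fetchall()

print(result)
```

The design point is that the query remains the interface: the person writing it does not need to know which language implements `uplift`.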
41. Big Data + Hadoop + in-memory for BI
42. Wild West 1865 to 1890
"The Significance of the Frontier in
American History" (1893) a thesis by
Frederick Jackson Turner.
The West not as a particular geographic
place, but a frontier process - as a series
of Wests on a receding frontier line - the
point where savagery meets civilization.
For Turner, American history was largely
a tale of people leaving settled areas for
the frontier, and their struggle to survive
in new lands.
A brain: we all depend on it. We spend the early parts of our lives developing it, then a few years pickling it with alcohol (not sure it helps preserve it), and then actually using it.
Corporations have to build and develop the corporate brain: learn, adapt, develop or die!
Business Intelligence is a key part of that learning process.
BI is the digital brain of business, the corporate brain. It’s a collection of tools, processes and objectives. Ideally an ethos!
Like humans, it needs learning, information and experimentation.
In all the sea of technology, the values and reasoning get lost.
7-Click build – step through text, then arrow, then spin
Same as human learning: occurs within a group and the context of a community
Requires acquisition of facts – get the data
Ability to view and manipulate – get to see and interact with data
Ability to discuss, absorb and review
Then take action – in business, pull levers to change
And of course action changes things, which requires iteration and feedback
Very crudely
It’s rarely about more charts, more colours, more report styles
Lower latency – speed of access to new data – real-time access
More timely, also ‘faster’
Where’s the value? In the data and in the access
Build and they will come – it’s more about interactions per user than raw users (concurrency debate)
Note: no click - progressive build from start!
Mobile access is coming along
Application space broadening, BYOD
Can supply access to BI
But also furiously generates data for BI
Access to dynamic information, but every access generates data and possible inferences
Self-service access
Note: 1-Click progressive build
Purely as an aside - if anyone doubts the rise of mobile…
In the meantime – data does not stop flowing
Reality check! The big data fire hose is now full on!
Note: no build
Visokio – Omniscope
Also MicroStrategy Insight, SAP Analytics workbench
New players like Domo
Changing players like Alteryx
2-click build – extend title, then progressive text
Plateaued – what a great word for a run of vowels!
Losing momentum – could almost say flat-lining
The enterprise tools
Only so many variations of charts, tables, colours, layouts etc.
Standard fabric
2-Click Build – ‘more’ then R logo
The progression every time from simple fetch and calc
To complex calculation
Mining aided discovery
New world is about dynamic, real-time analytics
R being the torch bearer – cost effective! Tool of choice for millennials coming out of university
No Build
Bottlenecks caused by platforms and tools unable to cope with demands of complexity, disparity and volume
Complex analytics
Machine learning – fraud detection/gaming
Web Analytics – dynamic content/bid management
Modelling – traditional clustering/behavioural for marketing/product development/resource optimisation
Investigative Reporting (dashboards and reports with granular data access)
Data Model
Note: 1-Click Build
BI mostly focuses (sells) on presentation – graphics, pictures, visualisation
BUT behind the scenes a lot of heavy lifting has to be done
This workload has changed over time from the simple to the complex
2-Click build – text added, then diagram added
What the business cares about is getting work done
DW is now a bottleneck – its rigour and model get in the way!
They really don’t care about how it is stored or where it is stored!
Some tasks are just plain too big to run!
It’s not about raw individual speed, it’s about throughput
Address the bottlenecks
Too many vendors play games that just shift the bottleneck
Tension – nearly high noon!
Two interpretations:
- time ‘needed’ to influence – reaction – what
- the time ‘now’ to influence – action – opportunity
Two contexts:
- time to influence peers and managers
- time to influence customers
Fastest draw now counts for a lot!
Lots more debate and arguments which, like everything today, need to be settled quickly
Dangerous but exciting times
However, loss of control and governance – too much going on around the EDW
Business and IT in a gun fight – Wild West
1-Click Build
So a quick checkpoint – where are we?
More timely? No – too much effort to work out what to do
Batch processing gets in the way of interactive access
Self-serve if you are knowledgeable enough
Winning in some areas but not in all
No build into swipe transition
OK, let’s not forget the data warehouse!
Who could?
A previous presentation drew an analogy with castles
(Bodiam Castle – from Eric Star Picture)
Consolidate power, protect, stand the test of time, somewhere safe in difficult times
The DW was built to protect the corporate knowledge
Law and discipline – structure, trust, safe haven – control
1-Click build
Lots of investment and permanence
Controlled access – tour access, not full open access
DW starts to overload, starts to be selective
DW is inflexible – its controls get in the way of new data and big data – kills the three ‘V’s
Who’s allowed in, what are they allowed to do and access – like visitors to a modern castle, but not necessarily with a nice guidebook
Ultimately, with its queues and delays, it cannot cope – users are initially patient, later impatient
Business wants more and faster
IT sees pressure from a different perspective – trouble and pain
Main inhibitor is complexity and cost
A quick USA – Wild West perspective on castles
More like marts – less edifice, more practical function
Wild West castle – rapidly constructed from local materials - few long-term examples
Time to build – effort expended and time spent – more Agile
Rapidly moving new frontier, just like modern BI – keep moving
Disney recreation - Fanghoot
1-Click build – extend with boring
DW is policed: it controls what you can have and, in some cases, when you can have it
How many people get excited about their DW or access to a DW?
Yes, it gets the job done
Well, this little guy certainly woke a few people up! As if a yellow elephant could creep up on you!
Hadoop will solve all my BI problems… RIGHT?
Many business users are still not fully aware of what Hadoop is
1-Click Build
Hadoop is not a “universal solution”!
Way too much hype and hyperbole - great for innovators and start-ups, not so good for plain old business
No click – progressive build from start
Can debate ‘free’, but substantially reduced $$$
3-click build – text, then two post-its
DW demanded ETL to map data into the model and ensure logical consistency - an upfront prerequisite
Structure is strangling the DW – it was its primary strength, now a weakness
Hadoop is making people lazy – it cuts out thought but leaves future decisions wide open – no lock-in, cuts risks of bad decisions
Simplified decisions of what to keep – keep it all
BUT hey, BI needs structure and discipline!!!!
2-click build – Sqoop, then elephant photo
Integration between business infrastructure and systems and Hadoop is still limited
ETL vendors not sure whether to love or hate Hadoop – it will eat their lunch
Sqoop is great for moving models
Not so great for moving big data (or big elephants)
Not exactly easy to move elephants on creaky railroads!
3-click build – wanted, scribble, new players
Ah yes, plugging into Hadoop
So much for the NoSQL revolution
Universal integration needed – protect the BI investment
Lost the gun fight – like all revolutions, the upstarts died down and got absorbed (subsumed)
Business and BI investment demands SQL!
Hive; now we have Drill, Impala, Pivotal
Tough game – yes, it’s SQL access, but not low latency
1-Click Build – insert ‘still’, pause, then loss…
Remember the rise of data discovery
Fine for big trawls
Not good for low-latency iterations, high-frequency access
There, I have dared to say it!
Does not accelerate BI quite in the way business was sold by the EDW
Loss of “interactivity”
A decade of being sold train-of-thought
Hadoop - not hands-on, not desktop, not agile
1-click build - RAM
Balance – full spectrum power available
Excellent computing power
Unlimited storage
Fast networks
No need for single platforms like the traditional DW – stores and analyses
This is why data science rises
We did not get this in the rise of data mining in the 90’s
We’ll come onto RAM shortly
2-click build
Hadoop is disk centric – storage - just like the EDW: more parallelism, yes, lots more, but still batch and disk I/O centric
Schedulers not designed for rapid response
Essentially a batch queue – BI applications and business users have significantly evolved from batch reporting
Hadoop infrastructure evolution will drive more CPUs as they get work done!
1-Click Build
Flash is not in-memory
Vendors flash-washing products – boosts I/O
Limitations – cost high, capacity low
Big vendors of EDW systems just offer switching spinning drives for flash drives!
EDW appliance vendors offer this at a premium cost – only makes sense if the majority is flash
Really it’s about nanoseconds, not milliseconds
Traditional EDW software is architected for lots of disk and relatively small amounts of CPU
Flash helps – a band-aid on the problem – buys a little time for the EDW if you can afford it – digital jolt
2-click build
To be quick on the draw
Lots of access to data - iterations
Analytics is about work done – more work needs to be done
So don’t hold CPUs back! – Highlight the cores – many more to come
Cores help open up the bottleneck we saw earlier
In-memory is not cache!
Memory is underplayed in Hadoop - it’s cheap, use it!
Processors and RAM are the true measure of work that can be done – disks just fetch
Keep data in memory!!! Don’t swap, don’t wait on disk, don’t pick through indexes then data, just access what is needed.
Economics of RAM have changed: much lower cost, large volumes readily available
No Build
Real world view
With better performance than DW
And considerably better standards support for SQL – like the 2011 standard!
And full OLAP support, both ODBO and XMLA
Kognitio runs on the same technology as Hadoop – work in the same farm
Kognitio Hadoop connector
Non-invasive, uses standard HDFS/Map-Reduce access methods
Fast to deploy – no coding needed
Active selection is Kognitio machine code
Multi-threaded delivery back
Kognitio can retrieve terabytes – a terabyte in 10 mins – that’s a lot of M&Ms
NoSQL revolution dissipated/absorbed – business won
Hadoop will be the disk drive of the future
Hadoop will be the data OS of the future - data processing ecosystem
Platform for data science
SQL will be the primary access method
Parallel execution and low latency will be demanded
Support for running any math or complex process
2-Click Build
Graduate analysis to production
Key future ability is to move rapidly from discovery to production
Taking findings from data scientists and, within hours or days, productionize!
Discovery has a shelf-life – time to influence is now
Cloud computing flexibility, PaaS, SaaS, rapid deployment make this possible (enabler)
Hadoop provides the consistent central store
Can either scale up and dedicate
Or spawn a new logical-model-based system, populate at scale and start production
Adaptable
1-Click Build
Logical Data Warehouse components just need processes and SLA
Followed the California gold rush of 1848/49
marking the completion of the Transcontinental Railroad.
Wild West was tamed by infrastructure, by the engineers and navvies
So that the shopkeepers, bankers and workers could easily follow
Business infrastructure will only move on when BI, Hadoop and the supporting ecosystem come together – create an information network