Big Data and the BI Wild West
Don’t Bring an Elephant
to a Gun Fight!
Paul Groom
Tools
Processes
Objectives
Why Business Intelligence?
View
Learn
Action
Community
Acquire
What is Business Intelligence?
Numbers
Tables
Charts
Indicators
Time
- History
- Lag
Access
- to view (portal)
- to data
- to depth
- Control/Secure
Consumption
- digestion
…with ease and simplicity
Business [Intelligence] Desires
More timely
Lower latency
More granularity
More user interactions
Richer data model
Self service
View and generate
Got mobile?
200 million
Employees bring their own
device to work
Nearly half
Of the workforce will be made
up of millennials by 2020
50%
Of BYOD organizations have had
a security breach
1/3
Have broken or would break
corporate policy on BYOD
Data flow
Dynamic access
Drill unlimited
Disruption: Data Discovery tools
BI tools have plateaued…again
Decision Support (Reporting) in late 90’s
Business Intelligence of 00’s
…led to data mining
…leading to analytics and data science
More math
…a lot more math
Machine learning algorithms
Dynamic Simulation
Statistical Analysis
Clustering
Behaviour modelling
The drive for deeper understanding
Axes: Technology/Automation vs. Analytical Complexity
Reporting & BPM
Campaign Management
Fraud detection
Dynamic Interaction
create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr( # Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE,row.names
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW),
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
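The R script above is cut off at the slide edge, so its exact arguments are not recoverable. As a rough sketch of the same idea in Python: divide each day's sales by that day-of-week's share of total sales, then fit a straight line of the resulting base sales against the row ID. The function name fit_base_sales_trend, the input layout, and the choice of sum as the day-of-week aggregate are assumptions made for illustration, not taken from the slide.

```python
from collections import defaultdict


def fit_base_sales_trend(rows):
    """Fit base sales against row ID for one product.

    rows: list of (dow, row_id, daily_sales) tuples (hypothetical layout).
    Returns (intercept, slope) of an ordinary least-squares line.
    """
    # Share of total sales per day of week (assumption: aggregate is a sum).
    by_dow = defaultdict(float)
    for dow, _, sales in rows:
        by_dow[dow] += sales
    total = sum(by_dow.values())
    share = {dow: s / total for dow, s in by_dow.items()}

    # Base sales: daily sales with the day-of-week share divided out.
    xs = [row_id for _, row_id, _ in rows]
    ys = [sales / share[dow] for dow, _, sales in rows]

    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up sample: two days of week, four observations.
sample = [(0, 1, 10.0), (1, 2, 20.0), (0, 3, 30.0), (1, 4, 20.0)]
intercept, slope = fit_base_sales_trend(sample)
```

In the slide's version this logic runs once per PRODNO partition inside the database; here it would be called once per product.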
select Trans_Year, Num_Trans,
count(distinct Account_ID) Num_Accts,
sum(count( distinct Account_ID)) over (partition by Trans_Year
cast(sum(total_spend)/1000 as int) Total_Spend,
cast(sum(total_spend)/1000 as int) / count(distinct Account_ID
rank() over (partition by Trans_Year order by count(distinct A
rank() over (partition by Trans_Year order by sum(total_spend)
from( select Account_ID,
Extract(Year from Effective_Date) Trans_Year,
count(Transaction_ID) Num_Trans,
select dept, sum(sales)
from sales_fact
where period between date '2006-05-01' and date '2006-05-31'
group by dept
having sum(sales) > 50000;
select sum(sales)
from sales_history
where year = 2006 and month = 5 and region=1;
select total_sales
from summary
where year = 2006 and month = 5 and region=1;
Behind the
numbers
It’s all about getting work done
Tasks evolving:
Used to be simple fetch of value
Then was compute dynamic aggregate
Now complex algorithms!
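The first two stages of that evolution can be sketched with Python's built-in sqlite3 module: a simple fetch from a pre-computed summary table next to a dynamic aggregate over the detail rows. The table and column names follow the queries on the earlier slide; the sample rows are made up for illustration, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales_history (year int, month int, region int, sales real);
    insert into sales_history values
        (2006, 5, 1, 120.0), (2006, 5, 1, 80.0),
        (2006, 5, 2, 50.0),  (2006, 4, 1, 30.0);
    -- Pre-computed summary: one row per (year, month, region).
    create table summary as
        select year, month, region, sum(sales) as total_sales
        from sales_history
        group by year, month, region;
""")

# Simple fetch of a value that was aggregated ahead of time.
fetched = conn.execute(
    "select total_sales from summary "
    "where year = 2006 and month = 5 and region = 1"
).fetchone()[0]

# Dynamic aggregate computed from the detail rows at query time.
computed = conn.execute(
    "select sum(sales) from sales_history "
    "where year = 2006 and month = 5 and region = 1"
).fetchone()[0]
```

Both queries return the same number; the difference is where the work happens. The summary table answers only the questions it was built for, while the dynamic aggregate can take any filter at the cost of scanning the detail.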
Time to influence
Reaction – what? – potential value
Action – opportunity - interaction
BI is becoming democratized
BI Wild West
Business [Intelligence] Desires
in relation to Big Data
More timely
Lower latency
More granularity
More user interactions
Richer data model
Self service
The Data Warehouse?
Realities
Reports against the DW are just plain dull, boring even!
And then came…
Hadoop ticks many but not all the boxes
Stomped on costs
Made economies of scale practical
No need to pre-process before storage
i.e. no need to align to storage
No need to triage before storage
Early bridge Building
Early Hadoop integration tools
The new bounty hunters:
Drill
Impala
Pivotal
Stinger
The No SQL Posse
Wanted
Dead or Alive
SQL
…but Hadoop too slow
for interactive BI
…loss of train-of-thought
still
For once technology is on our side
…oh and BTW RAM is cheap!
CPU
Network
Storage
Lots of these
Not so many of these
Hadoop is…
Hadoop inherently disk oriented
Typically low ratio of CPU to Disk
‘Flash’ washing is
not the solution
Analytics needs
low latency, no I/O wait
Analytical Platform Reference Architecture
Application & Client Layer: All BI Tools, All OLAP Clients, Excel, Reporting, Cognos
Interfaces: SQL, MDX
Analytical Platform Layer: Kognitio
Near-line Storage (optional): Kognitio Storage
Persistence Layer: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage
Reach out, actively select and pull back
to consume
MPP everything – get more work done
“No SQL” graduates to “not-only-SQL”
SQL remains preferred data access
language … for business community
SQL can encapsulate other processing
- in-line Python, R, Java etc.
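A small illustration of SQL encapsulating other processing, using Python's built-in sqlite3 module: a Python class is registered as a SQL aggregate and then called from an ordinary group-by query. This uses SQLite's user-defined aggregate mechanism, not the external script syntax shown on the earlier slide. The aggregate name slope, the table daily_sales, and the sample rows are hypothetical.

```python
import sqlite3


class Slope:
    """SQL aggregate: least-squares slope of y against x."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x, y):
        # Called once per row in the group; accumulate running sums.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):
        # Called once per group; return NULL if the slope is undefined.
        denom = self.n * self.sxx - self.sx * self.sx
        if denom == 0:
            return None
        return (self.n * self.sxy - self.sx * self.sy) / denom


conn = sqlite3.connect(":memory:")
conn.create_aggregate("slope", 2, Slope)
conn.executescript("""
    create table daily_sales (prodno int, row_id int, dailysales real);
    insert into daily_sales values
        (1, 1, 10), (1, 2, 12), (1, 3, 14),
        (2, 1, 9),  (2, 2, 6),  (2, 3, 3);
""")

# The Python code runs inside the SQL query, once per product group.
trends = conn.execute(
    "select prodno, slope(row_id, dailysales) "
    "from daily_sales group by prodno order by prodno"
).fetchall()
```

The business user still writes plain SQL; the statistical step is hidden behind a function name.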
Discovery
Production
Big Data + Hadoop + in-memory for BI
Wild West 1865 to 1890
"The Significance of the Frontier in
American History" (1893) a thesis by
Frederick Jackson Turner.
The West not as a particular geographic
place, but a frontier process - as a series
of Wests on a receding frontier line - the
point where savagery meets civilization.
For Turner, American history was largely
a tale of people leaving settled areas for
the frontier, and their struggle to survive
in new lands.
Driving the golden spike for Hadoop and BI
connect
kognitio.com
kognitio.tel
kognitio.com/blog
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
contact
Michael Hiskey
VP, Marketing & Business Development
michael.hiskey@kognitio.com
Paul Groom
Chief Innovation Officer
paul.groom@kognitio.com
Steve Friedberg - press contact
MMI Communications
steve@mmicomm.com
Kognitio is a Platinum Sponsor of the Hadoop Summit – see us at booth #31 – center!

Más contenido relacionado

La actualidad más candente

PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLJen Stirrup
 
CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...
CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...
CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...Ryan Le
 
Loras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteLoras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteRich Clayton
 
Predictive modelling with azure ml
Predictive modelling with azure mlPredictive modelling with azure ml
Predictive modelling with azure mlKoray Kocabas
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...Dataconomy Media
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesTony Pearson
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Dataconomy Media
 
wasup-robert-hall-presentation-2
wasup-robert-hall-presentation-2wasup-robert-hall-presentation-2
wasup-robert-hall-presentation-2Rob Hall
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
 
Finance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsFinance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsBob Samuels
 
Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360
Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360
Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360Databricks
 
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse..."Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...Dataconomy Media
 
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...Dataconomy Media
 
Big Data Analytics in Government
Big Data Analytics in GovernmentBig Data Analytics in Government
Big Data Analytics in GovernmentDeepak Ramanathan
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION Elvis Muyanja
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareMapR Technologies
 

La actualidad más candente (20)

PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...
CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...
CISC 525 - Big Data Architecture - Tran (Ryan) Le - Real-time Portfolio and R...
 
Loras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteLoras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium Keynote
 
Predictive modelling with azure ml
Predictive modelling with azure mlPredictive modelling with azure ml
Predictive modelling with azure ml
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
 
ESGYN Overview
ESGYN OverviewESGYN Overview
ESGYN Overview
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use Cases
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
wasup-robert-hall-presentation-2
wasup-robert-hall-presentation-2wasup-robert-hall-presentation-2
wasup-robert-hall-presentation-2
 
Using Hadoop for Cognitive Analytics
Using Hadoop for Cognitive AnalyticsUsing Hadoop for Cognitive Analytics
Using Hadoop for Cognitive Analytics
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Finance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsFinance and Audit Predictive Analytics
Finance and Audit Predictive Analytics
 
Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360
Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360
Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
 
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse..."Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
 
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
 
Big Data Analytics in Government
Big Data Analytics in GovernmentBig Data Analytics in Government
Big Data Analytics in Government
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShare
 

Similar a Big Data and the BI Wild West

Big Data
Big DataBig Data
Big DataNGDATA
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveHyderabad Scalability Meetup
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big DataJames Serra
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwarePanorama Software
 
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)Lucas Jellema
 
Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Denodo
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data SolutionJames Serra
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data scienceMahesh Kumar CV
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data BSP Media Group
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big DataJean-Marc Desvaux
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case studySudhi Seshachala
 
Latest corp big data and acme
Latest corp   big data and acmeLatest corp   big data and acme
Latest corp big data and acmehooduku
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 

Similar a Big Data and the BI Wild West (20)

Big Data
Big DataBig Data
Big Data
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama Software
 
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
 
Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data science
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big Data
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case study
 
Latest corp big data and acme
Latest corp   big data and acmeLatest corp   big data and acme
Latest corp big data and acme
 
KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Big Data and the BI Wild West

  • 1. Big Data and the BI Wild West Don’t Bring an Elephant to a Gun Fight! Paul Groom
  • 5. What is Business Intelligence? Numbers, tables, charts, indicators. Time: history, lag. Access: to view (portal), to data, to depth, control/secure. Consumption: digestion. …with ease and simplicity
  • 6. Business [Intelligence] Desires: more timely; lower latency; more granularity; more user interactions; richer data model; self service
  • 8. Got mobile? 200 million: employees bring their own device to work. Nearly half: of the workforce will be made up of millennials by 2020. 50%: of companies/BYOD orgs have had a security breach. 1/3: have broken or would break corporate policy on BYOD
  • 12. BI tools have plateaued…again. Decision Support (Reporting) in late 90’s; Business Intelligence of 00’s; …led to data mining; …leading to analytics and data science
  • 13. More math …a lot more math
  • 14. The drive for deeper understanding. Chart axes: Technology/Automation, Analytical Complexity. Chart labels: Machine learning algorithms, Dynamic Simulation, Statistical Analysis, Clustering, Behaviour modelling, Reporting & BPM, Fraud detection, Dynamic Interaction, Campaign Management
  • 15. Behind the numbers
      create external script LM_PRODUCT_FORECAST environment rsint
      receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
      partition by PRODNO order by PRODNO, ROW_ID
      sends ( R_OUTPUT varchar )
      isolate partitions
      script S'endofr( # Simple R script to run a linear fit on daily sales
      prod1<-read.csv(file=file("stdin"), header=FALSE,row.names
      colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
      dim1<-dim(prod1)
      daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW),
      daily1[,2]<-daily1[,2]/sum(daily1[,2])
      basesales<-array(0,c(dim1[1],2))
      basesales[,1]<-prod1$ID
      basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
      colnames(basesales)<-c("ID","BASESALES")
      fit1=lm(BASESALES ~ ID,as.data.frame(basesales))

      select Trans_Year, Num_Trans,
      count(distinct Account_ID) Num_Accts,
      sum(count( distinct Account_ID)) over (partition by Trans_Year
      cast(sum(total_spend)/1000 as int) Total_Spend,
      cast(sum(total_spend)/1000 as int) / count(distinct Account_ID
      rank() over (partition by Trans_Year order by count(distinct A
      rank() over (partition by Trans_Year order by sum(total_spend)
      from( select Account_ID,
      Extract(Year from Effective_Date) Trans_Year,
      count(Transaction_ID) Num_Trans,

      select dept, sum(sales) from sales_fact where period between date '01-05-2006' and date '31-05-2006' group by dept having sum(sales) > 50000;

      select sum(sales) from sales_history where year = 2006 and month = 5 and region=1;

      select total_sales from summary where year = 2006 and month = 5 and region=1;
  • 16. It’s all about getting work done. Tasks evolving: used to be simple fetch of value; then was compute dynamic aggregate; now complex algorithms!
  • 17. Time to influence. Reaction – what? – potential value. Action – opportunity – interaction. BI is becoming democratized
  • 19. Business [Intelligence] Desires in relation to Big Data: more timely; lower latency; more granularity; more user interactions; richer data model; self service
  • 24. Reports against the DW are just plain dull, boring even!
  • 26. Hadoop ticks many but not all the boxes
  • 27. Stomped on costs. Made economics of scale practical
  • 28. No need to pre-process before storage, i.e. no need to align to storage. No need to triage before storage
  • 29. Early bridge building. Early Hadoop integration tools
  • 30. The new bounty hunters: Drill, Impala, Pivotal, Stinger. The No SQL Posse. Wanted Dead or Alive: SQL
  • 31. …but Hadoop too slow for interactive BI …loss of train-of-thought still
  • 32. For once technology is on our side: CPU, Network, Storage …oh and BTW RAM is cheap!
  • 33. Hadoop is… inherently disk oriented. Typically low ratio of CPU to disk (image labels: “Lots of these”, “Not so many of these”)
  • 36. Analytical Platform Reference Architecture Analytical Platform Layer Near-line Storage (optional) Application & Client Layer All BI Tools All OLAP Clients Excel Persistence Layer Hadoop Clusters Enterprise Data Warehouses Legacy Systems Kognitio Storage Reporting Cloud Storage
  • 38. Reach out, actively select and pull back to consume
  • 39. MPP everything – get more work done. “No SQL” graduates to “not-only-SQL”. SQL remains preferred data access language … for business community. SQL can encapsulate other processing – in-line Python, R, Java etc.
  • 41. Big Data + Hadoop + in-memory for BI
  • 42. Wild West 1865 to 1890. "The Significance of the Frontier in American History" (1893), a thesis by Frederick Jackson Turner. The West not as a particular geographic place, but a frontier process - as a series of Wests on a receding frontier line - the point where savagery meets civilization. For Turner, American history was largely a tale of people leaving settled areas for the frontier, and their struggle to survive in new lands.
  • 43. Driving the golden spike for Hadoop and BI
  • 44. connect: kognitio.com, kognitio.tel, kognitio.com/blog, twitter.com/kognitio, linkedin.com/companies/kognitio, tinyurl.com/kognitio, youtube.com/kognitio. contact: Michael Hiskey, VP, Marketing & Business Development, michael.hiskey@kognitio.com; Paul Groom, Chief Innovation Officer, paul.groom@kognitio.com; Steve Friedberg (press contact), MMI Communications, steve@mmicomm.com. Kognitio is a Platinum Sponsor of the Hadoop Summit – see us at booth #31 – center!
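
The progression named on slides 15 and 16 (a simple fetch of a stored value, then a dynamic aggregate, then an algorithm run over the detail rows) can be sketched in a few lines. This is a minimal illustration and not the code from the slides: it assumes Python with the standard-library sqlite3 module rather than an MPP platform, the `summary` and `sales_fact` tables, their columns and the sample rows are hypothetical stand-ins loosely modelled on the slide 15 queries, and the `linear_slope` helper is an ordinary least-squares fit written for this example.

```python
import sqlite3

# Hypothetical in-memory tables standing in for those named on slide 15.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE summary (year INTEGER, month INTEGER, region INTEGER, total_sales REAL);
    CREATE TABLE sales_fact (dept TEXT, period TEXT, row_id INTEGER, sales REAL);
    INSERT INTO summary VALUES (2006, 5, 1, 600.0);
    """
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        ("toys", "2006-05-01", 1, 100.0),
        ("toys", "2006-05-02", 2, 200.0),
        ("toys", "2006-05-03", 3, 300.0),
        ("books", "2006-05-01", 1, 50.0),
    ],
)

# Stage 1: simple fetch of a precomputed value.
fetched = conn.execute(
    "SELECT total_sales FROM summary WHERE year = 2006 AND month = 5 AND region = 1"
).fetchone()[0]

# Stage 2: dynamic aggregate computed from the detail rows at query time.
aggregated = conn.execute(
    """
    SELECT dept, SUM(sales) FROM sales_fact
    WHERE period BETWEEN '2006-05-01' AND '2006-05-31'
    GROUP BY dept HAVING SUM(sales) > 500
    ORDER BY dept
    """
).fetchall()


def linear_slope(points):
    """Least-squares slope of y against x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Stage 3: an algorithm (here a linear fit) applied to one department's rows.
toy_points = conn.execute(
    "SELECT row_id, sales FROM sales_fact WHERE dept = 'toys' ORDER BY row_id"
).fetchall()
slope = linear_slope(toy_points)

print(fetched, aggregated, slope)
```

Each stage reads more rows and does more computation per answer than the one before it, which is the workload shift the slides describe.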

Editor's notes

  1. A brain: we all depend on it. We spend the early parts of our lives developing it, then a few years pickling it with alcohol (not sure it helps preserve it), and then actually using it. Corporations have to build and develop the corporate brain: learn, adapt, develop or die! Business Intelligence is a key part of that learning process.
  2. BI is the digital brain of business, the corporate brain. It’s a collection of tools, processes and objectives. Ideally an ethos! Like humans, it needs learning, information and experimentation. In all the sea of technology the values and reasoning get lost.
  3. 7-click build – step through text, then arrow, then spin. Same as human learning: it occurs within a group and the context of a community. Requires acquisition of facts – get the data. Ability to view and manipulate – get to see and interact with data. Ability to discuss, absorb and review. Then take action – in business, pull levers to change. And of course action changes things, which requires iteration and feedback.
  4. Very crudely
  5. It’s rarely about more charts, more colours, more report styles. Lower latency – speed of access to new data – real-time access. More timely, also ‘faster’. Where’s the value? In the data and in the access. Build and they will come – it’s more about interactions per user than raw users (concurrency debate).
  6. Note: no click – progressive build from start! Mobile access is coming along. Application space broadening. BYOD. Can supply access to BI, but also furiously generate data for BI. Access to dynamic information, but every access generates data and possible inferences. Self-service access.
  7. Note: 1-click progressive build. Purely as an aside – if anyone doubts the rise of mobile…
  8. In the meantime, data does not stop flowing. Reality check! The big data fire hose is now full on!
  9. Note: no build. Visokio – Omniscope. Also MicroStrategy Insight, SAP Analytics workbench. New players like Domo, changing players like Alteryx.
  10. 2-click build – extend title, then progressive text. Plateaued – what a great word for a run of vowels! Losing momentum – could almost say flat-lining. The enterprise tools: only so many variations of charts, tables, colours, layouts etc. Standard fabric.
  11. 2-click build – ‘more’, then R logo. The progression every time: from simple fetch and calc to complex calculation. Mining-aided discovery. New world is about dynamic, real-time analytics. R being the torch bearer – cost effective! Tool of choice for millennials coming out of university.
  12. No build. Bottlenecks caused by platforms and tools unable to cope with demands of complexity, disparity and volume. Complex analytics. Machine learning – fraud detection/gaming. Web analytics – dynamic content/bid management. Modelling – traditional clustering/behavioural for marketing/product development/resource optimisation. Investigative reporting (dashboards and reports with granular data access). Data model.
  13. Note: 1-click build. BI mostly focuses (sells) on presentation – graphics, pictures, visualisation. BUT behind the scenes a lot of heavy lifting has to be done. This workload has changed over time from the simple to the complex.
  14. 2-click build – text added, then diagram added. What the business cares about is getting work done. The DW is now a bottleneck – its rigour and model get in the way! They really don’t care about how it is stored or where it is stored! Some tasks are just plain too big to run! It’s not about raw individual speed, it’s about throughput. Address the bottlenecks. Too many vendors play games that just shift the bottleneck.
  15. Tension – nearly high noon! Two interpretations: the time ‘needed’ to influence (reaction – what?), and the time ‘now’ to influence (action – opportunity). Two contexts: time to influence peers and managers, and time to influence customers. Fastest draw now counts for a lot!
  16. Lots more debate and arguments; like everything today, they need to be settled quickly. Dangerous but exciting times. However, loss of control and governance – too much going on around the EDW. Business and IT in a gun fight – Wild West.
  17. 1-click build. So a quick checkpoint – where are we? More timely – no – too much effort to work out what to do? Batch processing gets in the way of interactive access. Self-serve, if you are knowledgeable enough. Winning in some areas but not in all.
  18. No build into swipe transition. OK, let’s not forget the data warehouse! Who could? In a previous presentation, drew an analogy with castles.
  19. (Bodiam Castle – from Eric Star Picture) Consolidate power, protect, stand the test of time, somewhere safe in difficult times. The DW was built to protect the corporate knowledge. Law and discipline – structure, trust, safe haven – control.
  20. 1-click build. Lots of investment and permanence. Controlled access – tour access, not full open access. DW starts to overload, starts to be selective. DW is inflexible – its controls get in the way of new data and big data – kills the three ‘V’s. Who’s allowed in, what are they allowed to do and access – like visitors to a modern castle, but not necessarily with a nice guidebook. Ultimately it’s queues and delays, cannot cope – users initially patient, later impatient. Business wants more and faster. IT sees pressure from a different perspective – trouble and pain. Main inhibitor is complexity and cost.
  21. A quick USA – Wild West perspective on castles. More like marts – less edifice, more practical function. Wild West castle – rapidly constructed from local materials – few long-term examples. Time to build – effort expended and time spent – more agile. Rapidly moving new frontier, just like modern BI – keep moving. Disney recreation – Fanghoot.
  22. 1-click build – extend with ‘boring’. DW is policed; it controls what you can have and in some cases when you can have it. How many people get excited about their DW or access to a DW? Yes, it gets the job done.
  23. Well, this little guy certainly woke a few people up! As if a yellow elephant could creep up on you! Hadoop will solve all my BI problems… RIGHT? Many business users are still not fully aware of what Hadoop is.
  24. 1-click build. Hadoop is not a “universal solution”! Way too much hype and hyperbole – great for innovators and start-ups, not so good for plain old business.
  25. No click – progressive build from start. Can debate ‘free’, but substantially reduced $$$.
  26. 3-click build – text, then two Post-its. DW demanded ETL to map data into model and ensure logical consistency – upfront prerequisite. Structure is strangling the DW – it was its primary strength, now a weakness. Hadoop is making people lazy – it cuts out thought but leaves future decisions wide open – no lock-in, cuts risks of bad decisions. Simplified decisions of what to keep – keep it all. BUT hey, BI needs structure and discipline!!!!
  27. 2-click build – Sqoop, then elephant photo. Integration between business infrastructure and systems and Hadoop still limited. ETL vendors not sure whether to love or hate Hadoop – will eat their lunch. Sqoop great for moving models. Not so great for moving big data (or big elephants). Not exactly easy to move elephants on creaky railroads!
  28. 3-click build – wanted, scribble, new players. Ah yes, plugging into Hadoop. So much for the NoSQL revolution. Universal integration needed – protect the BI investment. Lost the gun fight: like all revolutions, the upstarts died down and got absorbed (subsumed). Business and BI investment demands SQL! Hive; now we have Drill, Impala, Pivotal. Tough game – yes, it’s SQL access, but not low latency.
  29. 1-click build – insert ‘still’, pause, then ‘loss…’. Remember the rise of data discovery. Fine for big trawls. Not good for low-latency iterations, high-frequency access. There, I have dared to say it! Does not accelerate BI quite in the way business was sold by the EDW. Loss of “interactivity”. A decade of being sold train-of-thought. Hadoop – not hands-on, not desktop, not agile.
  30. 1-click build – RAM. Balance – full-spectrum power available. Excellent computing power. Unlimited storage. Fast networks. No need for single platforms like the traditional DW – stores and analyses. This is why data science rises. We did not get this in the rise of data mining in the 90’s. We’ll come on to RAM shortly.
  31. 2-click build. Hadoop is disk-centric – storage – just like the EDW. More parallelism, yes, lots more, but still batch, disk I/O centric. Schedulers not designed for rapid response. Essentially a batch queue – BI applications and business users have significantly evolved from batch reporting. Hadoop infrastructure evolution will drive more CPUs, as they get work done!
  32. 1-click build. Flash is not in-memory. Vendors flash-washing products – boosts I/O. Limitations – cost high, capacity low. Big vendors of EDW systems just offer switching spinning drives for flash drives! EDW appliance vendors offer this at a premium cost – only makes sense if majority is flash. Really it’s about nanoseconds, not milliseconds. Traditional EDW software is architected for lots of disk and relatively small amounts of CPU. Flash helps – band-aid on the problem – buys a little time for the EDW if you can afford it – digital jolt.
  33. 2-click build. To be quick on the draw. Lots of access to data – iterations. Analytics is about work done – more work needs to be done. So don’t hold CPUs back! Highlight the cores – many more to come. Cores help open up the bottleneck we saw earlier. In-memory is not cache! Memory is underplayed in Hadoop – it’s cheap, use it! Processors and RAM are the true measure of work that can be done – disks just fetch. Keep data in memory!!! Don’t swap, don’t wait on disk, don’t pick through indexes then data, just access what is needed. Economics of RAM have changed: much lower cost, large volumes readily available.
  34. No build. Real-world view. With better performance than DW. And considerably better standards support for SQL – like the 2011 standard! And full OLAP support, both ODBO and XMLA. Kognitio runs on the same technology as Hadoop – work in same farm.
  35. Kognitio Hadoop connector. Non-invasive, uses standard HDFS/MapReduce access methods. Fast to deploy – no coding needed. Active selection is Kognitio machine code. Multi-threaded delivery back. Kognitio can retrieve terabytes – a terabyte in 10 mins – that’s a lot of M&Ms.
  36. NoSQL revolution dissipated/absorbed – business won. Hadoop will be the disk drive of the future. Hadoop will be the data OS of the future – data processing ecosystem. Platform for data science. SQL will be the primary access method. Parallel execution and low latency will be demanded. Support for running any math or complex process.
  37. 2-click build. Graduate analysis to production. Key future ability is to move rapidly from discovery to production. Taking findings from data scientists and within hours or days productionize! Discovery has a shelf-life – time to influence is now. Cloud computing flexibility, PaaS, SaaS, rapid deployment make this possible (enabler). Hadoop provides the consistent central store. Can either scale up and dedicate, or spawn a new logical-model-based system, populate at scale and start production. Adaptable.
  38. 1-click build. Logical Data Warehouse components just need processes and SLA.
  39. Followed the California gold-rush of 1848/49
  40. Marking the completion of the Transcontinental Railroad. Wild West was tamed by infrastructure, by the engineers and navvies, so that the shopkeepers, bankers and workers could easily follow. Business infrastructure will only move on when BI and Hadoop and the supporting ecosystem come together – create an information network.
  41. Kognitio
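
Note 35 quotes a retrieval rate of a terabyte in 10 minutes. As a rough worked figure, and assuming a decimal terabyte (10^12 bytes) moved at a steady rate, that corresponds to a sustained throughput of about 1.67 GB/s. The function name and the unit convention below are assumptions made for this illustration, not figures from the presentation.

```python
def sustained_throughput_gb_per_s(total_bytes: float, seconds: float) -> float:
    """Average transfer rate in decimal gigabytes per second."""
    return total_bytes / seconds / 1e9


TERABYTE = 10**12  # assumption: decimal terabyte, not a tebibyte (2**40 bytes)

# One terabyte delivered in ten minutes (600 seconds).
rate = sustained_throughput_gb_per_s(TERABYTE, 10 * 60)
print(f"{rate:.2f} GB/s")
```

A rate of this size is well beyond what a single spinning disk delivers, which fits the note's emphasis on multi-threaded, parallel delivery.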