Big Data Applied
David Strom
314 277 7832
STL TDWI December 2013
Agenda
• Vendor landscape, users, skills
• Mapping apps
• Real world examples
• Some lessons learned
Three skills for big data analysts
• Strategic data planning. Understand how data is the new raw material for any modern business.
• Analytical skills. Reporters have always been smart about asking the right questions, but now they have to dig through the data too.
• Technology skills. Embrace the technology and make it a key part of your reporting skill set.
Riot Games Current BI Stack
• Honu: Streaming log collection and event processing pipeline
• Platfora: BI analysis and visualization
• Oozie: Workflow job scheduler
• Hive: Data warehouse and queries
• Chef: Code deployment and configuration management
• GitHub: Versioning and tracking of programs
• Jenkins: Build system management
• Eureka: Service discovery process
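As a rough illustration of the first and fourth layers of a stack like this (log collection feeding warehouse-style queries), here is a minimal Python sketch. It is not Riot Games' code and does not use Honu or Hive. The event fields (`event`, `region`) and the sample log lines are hypothetical, chosen only to show the shape of the flow: parse raw log lines, drop malformed ones, then run the equivalent of a GROUP BY count.

```python
import json
from collections import Counter

# Hypothetical raw log lines; the field names are illustrative assumptions,
# not an actual schema from the talk.
RAW_LOG_LINES = [
    '{"event": "match_start", "region": "NA"}',
    '{"event": "match_end", "region": "NA"}',
    '{"event": "match_start", "region": "EU"}',
    'not valid json',
]


def parse_events(lines):
    """Collection step: yield parsed events, skipping malformed lines."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def count_by(events, key):
    """Query step: roughly SELECT key, COUNT(*) ... GROUP BY key."""
    return Counter(
        e[key] for e in events if isinstance(e, dict) and key in e
    )


region_counts = count_by(parse_events(RAW_LOG_LINES), "region")
print(dict(region_counts))
```

In a production stack the parsing would run continuously in the collection pipeline and the aggregation would be a scheduled warehouse query; the sketch collapses both into one in-memory pass.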
Lessons learned
Don’t be religious!
Use maps!
Local Big Data Meetups
Stay in touch
• Copies of this presentation: http://slideshare.net/davidstrom
• My blog: http://strominator.com
• Follow me on Twitter: @dstrom
• Old school: david@strom.com


Más contenido relacionado

La actualidad más candente

Autodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataAutodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataConnected Data World
 
The New Basics of Business Intelligence Lesson 1: Big Data Exploration
The New Basics of Business Intelligence Lesson 1: Big Data ExplorationThe New Basics of Business Intelligence Lesson 1: Big Data Exploration
The New Basics of Business Intelligence Lesson 1: Big Data ExplorationZoomdata
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...BigDataEverywhere
 
Time series analysis of stock
Time series analysis of stockTime series analysis of stock
Time series analysis of stockTuhin Mahmud
 
Case Studies on Big-Data Processing and Streaming - Iranian Java User Group
Case Studies on Big-Data Processing and Streaming - Iranian Java User GroupCase Studies on Big-Data Processing and Streaming - Iranian Java User Group
Case Studies on Big-Data Processing and Streaming - Iranian Java User GroupAmir Sedighi
 
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...
Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...Dataiku
 
Big Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivotBig Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivotJen Stirrup
 
Big Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesBig Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesSwiss Big Data User Group
 
introduction to hadoop
introduction to hadoopintroduction to hadoop
introduction to hadoopASIT
 
RubiX ID - Big Data - Ruben Middeljans, Stephan Vos
RubiX ID - Big Data - Ruben Middeljans, Stephan VosRubiX ID - Big Data - Ruben Middeljans, Stephan Vos
RubiX ID - Big Data - Ruben Middeljans, Stephan VosRubiX BV
 
Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Peter Morgan
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku
 
Let's analyze how world reacts to road traffic by sentiment analysis final
Let's analyze how world reacts to road traffic by sentiment analysis finalLet's analyze how world reacts to road traffic by sentiment analysis final
Let's analyze how world reacts to road traffic by sentiment analysis finalSajeetharan
 
Bigdata Analytics using Hadoop
Bigdata Analytics using HadoopBigdata Analytics using Hadoop
Bigdata Analytics using HadoopNagamani Gurram
 
Too Big to Ignore
Too Big to IgnoreToo Big to Ignore
Too Big to IgnoreStephan Vos
 

La actualidad más candente (20)

Autodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataAutodiscovery or The long tail of open data
Autodiscovery or The long tail of open data
 
The New Basics of Business Intelligence Lesson 1: Big Data Exploration
The New Basics of Business Intelligence Lesson 1: Big Data ExplorationThe New Basics of Business Intelligence Lesson 1: Big Data Exploration
The New Basics of Business Intelligence Lesson 1: Big Data Exploration
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
 
Time series analysis of stock
Time series analysis of stockTime series analysis of stock
Time series analysis of stock
 
Case Studies on Big-Data Processing and Streaming - Iranian Java User Group
Case Studies on Big-Data Processing and Streaming - Iranian Java User GroupCase Studies on Big-Data Processing and Streaming - Iranian Java User Group
Case Studies on Big-Data Processing and Streaming - Iranian Java User Group
 
Big Data
Big DataBig Data
Big Data
 
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...
Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...
 
Big Data
Big DataBig Data
Big Data
 
Big Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivotBig Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivot
 
Big Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesBig Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companies
 
introduction to hadoop
introduction to hadoopintroduction to hadoop
introduction to hadoop
 
RubiX ID - Big Data - Ruben Middeljans, Stephan Vos
RubiX ID - Big Data - Ruben Middeljans, Stephan VosRubiX ID - Big Data - Ruben Middeljans, Stephan Vos
RubiX ID - Big Data - Ruben Middeljans, Stephan Vos
 
Overview of Bigdata Analytics
Overview of Bigdata Analytics Overview of Bigdata Analytics
Overview of Bigdata Analytics
 
View on big data technologies
View on big data technologiesView on big data technologies
View on big data technologies
 
Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Big data – An Introduction, July 2013
Big data – An Introduction, July 2013
 
Books neended
Books neendedBooks neended
Books neended
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
 
Let's analyze how world reacts to road traffic by sentiment analysis final
Let's analyze how world reacts to road traffic by sentiment analysis finalLet's analyze how world reacts to road traffic by sentiment analysis final
Let's analyze how world reacts to road traffic by sentiment analysis final
 
Bigdata Analytics using Hadoop
Bigdata Analytics using HadoopBigdata Analytics using Hadoop
Bigdata Analytics using Hadoop
 
Too Big to Ignore
Too Big to IgnoreToo Big to Ignore
Too Big to Ignore
 

Destacado

2014 Big_Data_Forum_Intel
2014 Big_Data_Forum_Intel2014 Big_Data_Forum_Intel
2014 Big_Data_Forum_IntelCOMPUTEX TAIPEI
 
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Matthieu Schapranow
 
Don't Become eCommerce Roadkill: How to Survive Your Next eCommerce Project
Don't Become eCommerce Roadkill: How to Survive Your Next eCommerce ProjectDon't Become eCommerce Roadkill: How to Survive Your Next eCommerce Project
Don't Become eCommerce Roadkill: How to Survive Your Next eCommerce ProjectPolished Geek LLC
 
Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013Intel IT Center
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case studySudhi Seshachala
 
Unlock Hidden Potential through Big Data and Analytics
Unlock Hidden Potential through Big Data and AnalyticsUnlock Hidden Potential through Big Data and Analytics
Unlock Hidden Potential through Big Data and AnalyticsIT@Intel
 

Destacado (8)

Intel and Big Data
Intel and Big DataIntel and Big Data
Intel and Big Data
 
2014 Big_Data_Forum_Intel
2014 Big_Data_Forum_Intel2014 Big_Data_Forum_Intel
2014 Big_Data_Forum_Intel
 
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
 
Don't Become eCommerce Roadkill: How to Survive Your Next eCommerce Project
Don't Become eCommerce Roadkill: How to Survive Your Next eCommerce ProjectDon't Become eCommerce Roadkill: How to Survive Your Next eCommerce Project
Don't Become eCommerce Roadkill: How to Survive Your Next eCommerce Project
 
SNEAPA 2013 Thursday b1 10_30_tomorrows climate
SNEAPA 2013 Thursday b1 10_30_tomorrows climateSNEAPA 2013 Thursday b1 10_30_tomorrows climate
SNEAPA 2013 Thursday b1 10_30_tomorrows climate
 
Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case study
 
Unlock Hidden Potential through Big Data and Analytics
Unlock Hidden Potential through Big Data and AnalyticsUnlock Hidden Potential through Big Data and Analytics
Unlock Hidden Potential through Big Data and Analytics
 

Similar a Big Data Applied, Data Warehouse Institute St. Louis December 2013 speech

Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Mark Tabladillo
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdfAyele40
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersRevolution Analytics
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
A FAIR Approach to Publishing and Sharing Machine Learning Models
A FAIR Approach to Publishing and Sharing Machine Learning ModelsA FAIR Approach to Publishing and Sharing Machine Learning Models
A FAIR Approach to Publishing and Sharing Machine Learning ModelsBen Blaiszik
 
Technology Planning for River Groups
Technology Planning for River GroupsTechnology Planning for River Groups
Technology Planning for River GroupsSean Larkin
 
UI Dev in Big data world using open source
UI Dev in Big data world using open sourceUI Dev in Big data world using open source
UI Dev in Big data world using open sourceTech Triveni
 
1355 appliedsciencestrack dershewitz
1355 appliedsciencestrack dershewitz1355 appliedsciencestrack dershewitz
1355 appliedsciencestrack dershewitzRising Media, Inc.
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigManish Chopra
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Debraj GuhaThakurta
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4jNeo4j
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseJesus Rodriguez
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...
Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...
Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...Ohud Saud
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Productioniguazio
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 
Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion Inside Analysis
 

Similar a Big Data Applied, Data Warehouse Institute St. Louis December 2013 speech (20)

Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
A FAIR Approach to Publishing and Sharing Machine Learning Models
A FAIR Approach to Publishing and Sharing Machine Learning ModelsA FAIR Approach to Publishing and Sharing Machine Learning Models
A FAIR Approach to Publishing and Sharing Machine Learning Models
 
Technology Planning for River Groups
Technology Planning for River GroupsTechnology Planning for River Groups
Technology Planning for River Groups
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
UI Dev in Big data world using open source
UI Dev in Big data world using open sourceUI Dev in Big data world using open source
UI Dev in Big data world using open source
 
1355 appliedsciencestrack dershewitz
1355 appliedsciencestrack dershewitz1355 appliedsciencestrack dershewitz
1355 appliedsciencestrack dershewitz
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...
Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...
Scaling big data mining infrastructure thetwitte experience - Jimmy Lin and D...
 
Neo4j in Depth
Neo4j in DepthNeo4j in Depth
Neo4j in Depth
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion
 

Más de David Strom

Spark Twitter fails Mar2023
Spark Twitter fails Mar2023Spark Twitter fails Mar2023
Spark Twitter fails Mar2023David Strom
 
Getting Your First Cybersecurity Job
Getting Your First Cybersecurity JobGetting Your First Cybersecurity Job
Getting Your First Cybersecurity JobDavid Strom
 
Understanding passwordless technologies
Understanding passwordless technologiesUnderstanding passwordless technologies
Understanding passwordless technologiesDavid Strom
 
What endpoint protection solutions are available on the market today?
What endpoint protection solutions are available on the market today?What endpoint protection solutions are available on the market today?
What endpoint protection solutions are available on the market today?David Strom
 
Fears and fulfillment with IT security
Fears and fulfillment with IT securityFears and fulfillment with IT security
Fears and fulfillment with IT securityDavid Strom
 
Protecting your digital and online privacy
Protecting your digital and online privacyProtecting your digital and online privacy
Protecting your digital and online privacyDavid Strom
 
AI and cyber security: new directions, old fears
AI and cyber security: new directions, old fearsAI and cyber security: new directions, old fears
AI and cyber security: new directions, old fearsDavid Strom
 
The legalities of hacking back
The legalities of  hacking backThe legalities of  hacking back
The legalities of hacking backDavid Strom
 
How to market your book in today's social media world
How to market your book in today's social media worldHow to market your book in today's social media world
How to market your book in today's social media worldDavid Strom
 
​Understanding the Internet of Things
​Understanding the Internet of Things​Understanding the Internet of Things
​Understanding the Internet of ThingsDavid Strom
 
How to make your mobile phone safe from hackers
How to make your mobile phone safe from hackersHow to make your mobile phone safe from hackers
How to make your mobile phone safe from hackersDavid Strom
 
Implications and response to large security breaches
Implications and response to large security breaches Implications and response to large security breaches
Implications and response to large security breaches David Strom
 
Using social networks to find your next job (2017)
Using social networks to find your next job (2017)Using social networks to find your next job (2017)
Using social networks to find your next job (2017)David Strom
 
Security v. Privacy: the great debate
Security v. Privacy: the great debateSecurity v. Privacy: the great debate
Security v. Privacy: the great debateDavid Strom
 
Using OpenStack to Control VM Chaos
Using OpenStack to Control VM ChaosUsing OpenStack to Control VM Chaos
Using OpenStack to Control VM ChaosDavid Strom
 
Notable Twitter fails
Notable Twitter failsNotable Twitter fails
Notable Twitter failsDavid Strom
 
How to make the move towards hybrid cloud computing
How to make the move towards hybrid cloud computingHow to make the move towards hybrid cloud computing
How to make the move towards hybrid cloud computingDavid Strom
 
Listen to Your Customers: How IT Can Provide Better Support
Listen to Your Customers: How IT Can Provide Better SupportListen to Your Customers: How IT Can Provide Better Support
Listen to Your Customers: How IT Can Provide Better SupportDavid Strom
 
Network security practice: then and now
Network security practice: then and nowNetwork security practice: then and now
Network security practice: then and nowDavid Strom
 
Biggest startup mistakes
Biggest startup mistakesBiggest startup mistakes
Biggest startup mistakesDavid Strom
 

Más de David Strom (20)

Spark Twitter fails Mar2023
Spark Twitter fails Mar2023Spark Twitter fails Mar2023
Spark Twitter fails Mar2023
 
Getting Your First Cybersecurity Job
Getting Your First Cybersecurity JobGetting Your First Cybersecurity Job
Getting Your First Cybersecurity Job
 
Understanding passwordless technologies
Understanding passwordless technologiesUnderstanding passwordless technologies
Understanding passwordless technologies
 
What endpoint protection solutions are available on the market today?
What endpoint protection solutions are available on the market today?What endpoint protection solutions are available on the market today?
What endpoint protection solutions are available on the market today?
 
Fears and fulfillment with IT security
Fears and fulfillment with IT securityFears and fulfillment with IT security
Fears and fulfillment with IT security
 
Protecting your digital and online privacy
Protecting your digital and online privacyProtecting your digital and online privacy
Protecting your digital and online privacy
 
AI and cyber security: new directions, old fears
AI and cyber security: new directions, old fearsAI and cyber security: new directions, old fears
AI and cyber security: new directions, old fears
 
The legalities of hacking back
The legalities of  hacking backThe legalities of  hacking back
The legalities of hacking back
 
How to market your book in today's social media world
How to market your book in today's social media worldHow to market your book in today's social media world
How to market your book in today's social media world
 
​Understanding the Internet of Things
​Understanding the Internet of Things​Understanding the Internet of Things
​Understanding the Internet of Things
 
How to make your mobile phone safe from hackers
How to make your mobile phone safe from hackersHow to make your mobile phone safe from hackers
How to make your mobile phone safe from hackers
 
Implications and response to large security breaches
Implications and response to large security breaches Implications and response to large security breaches
Implications and response to large security breaches
 
Using social networks to find your next job (2017)
Using social networks to find your next job (2017)Using social networks to find your next job (2017)
Using social networks to find your next job (2017)
 
Security v. Privacy: the great debate
Security v. Privacy: the great debateSecurity v. Privacy: the great debate
Security v. Privacy: the great debate
 
Using OpenStack to Control VM Chaos
Using OpenStack to Control VM ChaosUsing OpenStack to Control VM Chaos
Using OpenStack to Control VM Chaos
 
Notable Twitter fails
Notable Twitter failsNotable Twitter fails
Notable Twitter fails
 
How to make the move towards hybrid cloud computing
How to make the move towards hybrid cloud computingHow to make the move towards hybrid cloud computing
How to make the move towards hybrid cloud computing
 
Listen to Your Customers: How IT Can Provide Better Support
Listen to Your Customers: How IT Can Provide Better SupportListen to Your Customers: How IT Can Provide Better Support
Listen to Your Customers: How IT Can Provide Better Support
 
Network security practice: then and now
Network security practice: then and nowNetwork security practice: then and now
Network security practice: then and now
 
Biggest startup mistakes
Biggest startup mistakesBiggest startup mistakes
Biggest startup mistakes
 

Último

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Big Data Applied, Data Warehouse Institute St. Louis December 2013 speech

  • 1. Big Data Applied David Strom 314 277 7832 STL TDWI December 2013
  • 2. Agenda • Vendor landscape, users, skills • Mapping apps • Real world examples • Some lessons learned
  • 6. Three skills for big data analysts • Strategic data planning. Understand how data is the new raw material for any modern business. • Analytical skills. Reporters have always been smart about asking the right questions, but now they have to dig through the data too. • Technology skills. Embrace the technology and make it a key part of your reporting skill set.
  • 20. Riot Games Current BI Stack • Honu: Streaming log collection and event processing pipeline • Platfora: BI analysis and visualization • Oozie: Workflow job scheduler • Hive: Data warehouse and queries • Chef: Code deployment and configuration management • GitHub: Versioning and tracking of programs • Jenkins: Build system management • Eureka: Service discovery process
  • 29. Local Big Data Meetups
  • 30. Stay in touch • Copies of this presentation: http://slideshare.net/davidstrom • My blog: http://strominator.com • Follow me on Twitter: @dstrom • Old school: david@strom.com

Notas del editor

  1. I will have a link at the end of the presentation if you want to download these slides.
  2. The Big Data tent is getting bigger, and this is just a small snapshot of the hundreds of vendors who are involved.
  3. As an example of how big data is moving into the mainstream, take a look at this conference earlier this year put on by the Regional Arts Commission and several community groups on how the arts can tap into better analytic tools.
  4. Your car has become a data hub, with USB ports, an SD card reader, Bluetooth connections to your phone and even a mobile Wi-Fi hotspot. This next picture is a shot of the latest MyFord Touch dashboard that can be found in many Ford cars. It provides all sorts of controls over what music you listen to and the climate inside your car, plus a connection to your phone so you can dial from your address book. Currently, Ford collects and aggregates data from the 4 million vehicles that use in-car sensing and remote app management software to create a virtuous cycle of information. The data allows Ford engineers to glean information on a range of issues, from how drivers are using their vehicles, to the driving environment, to electromagnetic forces affecting the vehicle, and feedback on other road conditions that could help them improve the quality, safety, fuel economy and emissions of the vehicle. Drivers willing to share how many miles they’ve traveled could get discounts between 10 and 40 percent in exchange for providing State Farm with a more accurate picture of their vehicle-use habits, which the insurer obtains electronically by directly accessing the Sync telematics systems in the cars.
  5. As Paul posted in one of his blog entries earlier this year, it is time each of us developed all three of these sides and filled out our skills so we can become more valuable to our organizations. Paul posted: “Why don't we put stronger emphasis on one person having the breadth of skills to play multiple roles on a given project?” Sources: http://walkingoncoals.blogspot.com/2013/08/data-modelers-model-what-do-data.html and http://www.readwriteweb.com/cloud/2012/02/strata-2012-3-essential-skills.php, plus Diego Saenz of Data Driven CEO.
  6. Let’s move on to talking about maps. Maps can be extremely useful analysis tools: they can spot corporate trends ahead of other methods, and they can be part of a broader data analysis project that wins over your management for new business investments. This is a historic case: a doctor used a map of cholera outbreaks in central London in the 1850s to identify infected water pumps as the source, and the map appears in Tufte’s book on visualizing data. You would think that mapping disease transmission would have taken off after this example, but it has only recently emerged as something that epidemiologists use for their own analyses. There isn’t much understanding of the spatial factors for disease risk today, and it is a rich field of study.
  7. And while Google Maps is certainly popular, there are other sites making it even more powerful that combine the wisdom of the crowds. These efforts include Crowdmap and OpenStreetMap. Here is a crowdsourced map of a neighborhood outside of Nairobi, Kenya, which until this effort was pretty much uncharted territory, what mappers call outdoor white spaces. Thanks to this citizen effort, the community put together a map showing the locations of all sorts of resources, such as water pumps and grocery stores. Other humanitarian efforts have been aided by open maps, using crowds to help people get more control over their local government and make their politicians more accountable. This illustrates a big trend in online mapping where we are getting better and higher definition maps all the time. For example, mapping specialists once didn't care about where abandoned car tires were sitting on the ground by the sides of roads or in otherwise vacant lots. However, in certain parts of the world, these tires collect standing water and are places where insects can breed and carry disease. Now they are included in some maps.
  8. StreetRx.com can be used to find the least expensive medications in your local area.
  9. Let me put up the next slide showing you something a bit more palatable. David Smith put this map together from about 400 wineries in the Napa Valley area. Not only can you scroll and zoom the map, but clicking on one of the winery markers will tell you its address and whether an appointment is required for tastings. He worked with Barry Rowlingson, who used OpenStreetMap and his own R package to build this. And while 400 data points doesn't sound like a very big collection of data, what these guys did is noteworthy since they used a collection of APIs and open source code to produce the final product.
  10. Some of the firewall vendors have taken mapping a step further. When you set up their firewall rules, you can exclude or monitor traffic based on the country of origin. This can be helpful if you examine your firewall logs and see unexpected and unwanted traffic, such as exploits, coming from these countries. For example, let's say you are prohibited by law from doing business in certain export-controlled countries such as Cuba or North Korea. Wouldn't you like to know if your staff is handling support requests from Cubans? This could be a good indication that your products are entering those countries through grey markets. They have also integrated geofencing with their own reputation management systems so they can tie in their protection and identify particular domains that are known to send malware, or locate where lots of exploits originate. Here is an example using the McAfee Firewall and its TrustedSource.org reputation management service. You can select particular countries to deny or allow traffic, using a simple series of menus. McAfee comes with some preset groups, such as countries with US export controls.
  11. But that is just the great outdoors. The firm Aisle411.com is working with major retailers to produce custom indoor maps, to make it easier for shoppers to track down that odd piece of hardware at Lowe's or find the half-price jar of olives at the local supermarket. And others are in the process of creating inexpensive portable indoor sensors that can be distributed to building owners and occupants to collect information that could ultimately be used to improve business processes that happen in their buildings, such as changing production lines or environmental factors. What used to be done manually and took a lot of time and effort can be done digitally and can provide more insights and take less time.
  12. This article ran in Restaurant News earlier this year and spoke about how several chains, including Boston Market, are using Big Data techniques to focus on particular store promotions to offer repeat customers prepaid debit cards as incentives to return.
  13. Big Data is also being used in some of the world's largest corporations. We are looking at Procter & Gamble’s Business Sphere big data situation room in their Cincinnati HQ. A big data analyst drives these large screens that display data visualizations on sales, market share, ad spending and the like, so everyone in the meeting is seeing the same information based on 4 billion daily transactions of P&G products. P&G isn’t after new data types; it still wants to share and analyze point-of-sale, inventory, ad spending, and shipment data. What’s new is the higher frequency and speed at which P&G gets that data, and the finer granularity. Even with all this gear, P&G has about two-thirds of the real-time data it needs.
  14. This is an article in Forbes published last summer about Farmeron, a Web data service that farmers can use to aggregate the troves of information produced about their animals. It was started by two Croatian computer scientists. You can track animal physical characteristics along with milk production, medical treatments, and even particular feeding group schedules. You can view how the weight of your animals has changed based on certain feedlot procedures or keep up with the particulars of your animals' breeding schedules. So as soon as your animal is born or enters your farm, you can track all of these details in their database. And John Deere, the leading tractor company, isn’t just operating on idle either. Today’s tractors are pretty high tech affairs: a farmer can operate the machine without having hands on the steering wheel because the product is driven according to GPS coordinates. This improves precision in seeding and fertilizing, and allows for improved harvesting. Deere's tractors can collect significant data on what crops are planted, how they are fertilized and how much yield any portion of the field produces. You can even input curfew hours when you don't expect your machinery to be operating. There is even a web portal to monitor all this data.
  15. Traditional dairy operations are fairly labor intensive, requiring consistent milking regimens several times a day and seven days a week. That may be a thing of the past, thanks to a number of automatic milking machines that are available, such as this one from a Swedish company, DeLaval. The machines have various arms that handle different tasks, such as sterilization, the actual milking process, and tracking the RFID tags that are placed in each cow's ear. They come with optical sensors to place their milking collectors at just the right place on the cow (we'll let you imagine the anatomical details on your own). And given that there are more than eight million Holstein dairy cows in the United States, the potential Big Data uses are huge. You can see the small computer control station on the right, and there is even an Internet connection so that farmers can monitor the milk collection remotely and run their herd from a laptop. They can also milk their cows 24x7, which helps to increase production and is less stressful on both the farmer and the cows!
  16. We will be hearing from Jeff Melching first hand, but here is a little preview. Monsanto is using Hadoop in many Big Data efforts besides keeping track of their crop genomes and other biological plant properties. They also have photographic imagery of crop fields. All told, there are several tens of petabytes that need storage and analysis, a number that’s doubling roughly every 16 months. They also have invested in FarmCare, which sends mobile phone alerts about real-time weather threats to farmers; North Star, a global supply chain transportation management system that has saved millions of dollars in overhead costs; and Precision Planting, which uses software to support farming techniques.
  17. FieldScripts is the first in an evolution of farming software tools that will provide a lot more intelligence and real-time information to farmers. In the past, farmers stored their agronomic data on USB sticks that they mailed to Monsanto for analysis—a cumbersome process, to say the least. Now the cloud has become part of the equation, with Monsanto considering how to best leverage the growth in mobile connections to send data from a farm for analysis.
  18. Riot Games began its operations with a monolithic SQL platform for its data warehouse. It required a great deal of manual, custom-coded processes. Queries were written in MySQL and most of the reporting was done in Excel. As you can imagine, this was causing them issues. The daily data extract update was approaching 24 hours to complete. Plus, debugging software errors meant digging deep into log dumps to figure out what went wrong.
  19. They replaced their system with Hadoop along with a cloud-based data warehouse and an end-to-end automated software development pipeline, using some of these tools shown here. They now have 7 PB of data!
  20. Germany’s largest online retailer, the Otto Group, gets about a million daily visitors to its fifty different Web storefronts. They set out last year on a project to better track their customers. Through a combination of tools including Hadoop and a massive Teradata data warehouse connected to their Intershop ecommerce system, they were able to sift through terabytes of website log files. They came up with what they call “Customer DNA” to identify how their customers come and go on their sites. Otto Group was able to mix SQL and NoSQL data collections effectively to focus their websites and boost traffic and sales. They pull more than 20 different databases into Hadoop for this analysis.
  21. Hallmark Cards introduces 10,000 new greeting cards each year, and their BI team is trying to become more data-driven. They say data is something that marketing needs to use in its business processes.
  22. PKO, Poland’s largest bank, was looking to roll out a new epayment app for its smartphone users and needed to identify those customers that were Internet-savvy and had the appropriate smartphones and were also comfortable with downloading apps. They used a combination of tools to comb their data warehouse and target the first 37,000 customers that fit their profile. But more importantly, they were able to measure the number of activations of their app by particular marketing campaigns to see which ones brought in the largest number of customers.
  23. Williams Sonoma is a classy retailer that has tried to make their online presence just as commanding and satisfying for their customers. Their site has various triggers that have been programmed to respond to particular customer actions. For example, when an item is put on sale in its stores, people who recently browsed that item are notified via email of the sales by geography. It could be borderline creepy, but it works. Their goal is to match great looking Web pages with top-shelf analytics to keep track of customers. “Data science is brand building here,” said one IT manager. The more online visitors buy a particular item, the more the company stocks it at retail outlets. The BI team analyzes these purchases over time to help improve each store’s inventory moving forward.
  24. So what are some important lessons to be learned from some of these examples that I have shown you today? Let’s cover a few recommendations on how you can improve your Big Data use. First, keep your Hadoop and related stacks current. As you can see from the slide with all the software that Riot Games uses, there is a lot of new software to deal with. The community is constantly making updates and you don’t want to be the one asking about a bug in the forums that has already been fixed in a later update.
  25. Second, don’t be afraid to mix SQL and NoSQL data. No need to be religious about it. I think John from SpliceMachine will have something to say along these lines a bit later. Also automate your software development pipeline. Complexity only introduces error, so eliminate manual methods wherever you can.
  26. The customer is always king, and data only serves to improve customer satisfaction. Many of the Big Data projects that I mentioned here were done for this purpose, rather than some rogue IT project.
  27. When in doubt, use a map. Here is one from Healthmap.org that tracks modern-day disease outbreaks.
  28. Finally, don’t be afraid to get help (meetups, various wiki documentation, and GitHub too).
  29. Thanks everyone for listening to me and good luck with your own Big Data explorations.
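
Note 16 mentions a data store of several tens of petabytes that doubles roughly every 16 months. To get a feel for what that growth rate implies, here is a minimal Python sketch. It assumes steady exponential growth; the 30 PB starting figure and the function name projected_size_pb are hypothetical stand-ins for illustration and are not numbers from the talk.

```python
def projected_size_pb(start_pb: float, months: float, doubling_months: float = 16.0) -> float:
    """Projected size in PB after `months`, assuming the size doubles every `doubling_months`."""
    return start_pb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical starting point standing in for "several tens of petabytes".
    start_pb = 30.0
    for years in (1, 2, 4):
        size = projected_size_pb(start_pb, 12 * years)
        print(f"after {years} year(s): about {size:.0f} PB")
```

At a 16-month doubling time, four years is three doublings (48 / 16 = 3), so whatever the starting size is, it grows eightfold over that period.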