Getting to Insights Faster:
A Framework for Agile Big Data
@TimGasper
Director of Product
Infochimps, a CSC Big Data Business
Agenda
(1) IT’S ALL ABOUT THE APP
(2) WHAT IS A BIG DATA APP
(3) TRADITIONAL VS AGILE APPROACH
(4) ENABLERS OF AGILE BIG DATA
(5) DEMONSTRATION
What problem are
you trying to solve?
It’s all about the apps.
Poll Question 1
What is a Big Data app?

? + Critical Business Problems = Impactful Analytic Applications
Smart Meter Monitoring for Customer Value Add

Predictive Inventory Levels to Minimize Warehousing Costs

Personalized Medicine Treatment Programs

Trade Options and Futures Pricing Platform

Source: PARC

Customer Churn Analysis for Increased Customer Lifetime Value
Poll Question 2
It’s all about the apps.
Source: Tableau
Predictive Manufacturing + Smart Manufacturing & Energy

Ad Publisher Campaign Analytics

360 Customer Experience Management

Social Media Monitoring & Analytics
The Traditional Way

Business Discovery
Info Discovery
Logical Data Model
Physical Data Model
System Staging
Data Ingestion, Transformation, ETL
Application Development
Production Staging
Analytics

Data Warehouse Project: 12-24 Months to Reach Production
Big Data: A New Hope

Data Warehouse Project: 12-24 Months to Reach Production
Stages: Business Discovery, Info Discovery, Logical Data Model, Physical Data Model, System Staging, Data Ingestion/Transformation/ETL, Application Development, Production Staging, Analytics

Big Data Project: 3-6 Months to Reach Production
Stages: Business Discovery, Info Discovery, System Staging, Initial Data Ingest, Production Staging, plus repeated cycles of Schema on Read, App Dev, and Analytics
Application Development Timelines
6 Months, 2 Developers
5 Months, 2 Developers
3 Months, 1 Developer
4 Months, 2 Developers
Speed to Value: A Case Study
HGST, a Western Digital company, is improving customer support and product quality by collecting, analyzing, and acting on massive quantities of machine and sensor data.
• Greatly diminished operational burden with ability to focus on analysis and driving business action
• Fast project delivery and success
• Expertise with Big Data technologies like Hadoop

KEY STATS
Industry: Storage Technology
Solution: Machine Data Analysis Engine
Channel: B2B
Cloud Services: Cloud::Queries, Cloud::Hadoop
Users: Application Developers, Data Scientists, Analysts
Deployment: Amazon Web Services
Poll Question 3
Enablers of Agile Big Data
1. Managed infrastructure means focusing on Big Data apps
2. The community tech itself and what it enables
3. Our customer engagement framework for choosing use cases that have impact and designing successful solutions
4. Agile, iterative analytics app dev lifecycle
5. Our application reference design framework for kick-starting application development
A Managed Platform
Technologies Under the Hood
PART 1

HADOOP
• Java MapReduce
• Streaming MapReduce
• SQL on Hadoop, Pig, Hive
NOSQL DATABASES
• HBase/Accumulo
• Elasticsearch
• Cassandra, MongoDB
STREAM PROCESSING, MESSAGE QUEUES
• Storm
• Kafka
Technologies Under the Hood
PART 2

HADOOP INTERFACES
• Hue
• Command Line
STATISTICAL TOOLS
• R, SAS, SPSS
BUSINESS INTELLIGENCE AND DATA VIZ
• Legacy: Cognos, Biz Objects, OBIEE, Microsoft BI
• New Gen: Tableau, Qlikview, SiSense, Kibana
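
To make the "Streaming MapReduce" item from Part 1 concrete: Hadoop Streaming runs any executable that reads lines on standard input and writes tab-separated key/value lines on standard output. The following is a minimal local sketch of that contract, not code from the presentation. The names `mapper`, `reducer`, and `run_local` are illustrative, and the `sorted()` call only stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def run_local(lines):
    """Simulate map -> sort -> reduce on a single machine (assumption:
    sorted() approximates the framework's shuffle for this small input)."""
    return list(reducer(sorted(mapper(lines))))


result = run_local(["Big data apps", "big data"])
for row in result:
    print(row)
```

On a cluster, the mapper and reducer would typically live in separate scripts that loop over `sys.stdin`; keeping them as plain functions here makes the logic easy to test locally before deploying.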
Our Unique Toolset Addition
SaaS
• Develop & Test Locally with App/Analytics Scripting & “Deploy Pack” Orchestration

PaaS
• Real-time Analytics with Cloud::Streams
• Interactive Analytics with Cloud::Queries
• Batch Analytics with Cloud::Hadoop
• Abstract to any cloud with Orchestration DSL

IaaS
• Public Cloud
• Virtual Private Cloud
• Private Cloud
Customer Engagement Framework
Service Requirements
Design & Build
Production

Week 1-2: Discovery
Week 3-4: Technical Design
Week 5-8+: Platform Rollout
Ongoing: Iterative App Development

Build Data Flows
Interview Key Business Stakeholders
Define Business Benefits
Design Data Flows
Interview Key Technical Stakeholders
Define Target Use Case
Define Architecture
Define Objectives & Challenges
Develop High-Level Approach & Costs
Identify Data Sources
Agree to Project Plan/Rollout
Real-Time Data Flow
Architecture Validation
Standup / Connect Environment
Tuning
Solution
Historical Data

MAJOR ACTIVITIES
• Run 2-4 hour Design Thinking Workshop
• Review current state metrics
• Review business pain points & opportunities
• Review application & infrastructure environment
• Define target use case

• Identify data sources for target use case
• Develop high level tech approach and costs
• Define high level benefits
• Develop initial case for action
• Develop go forward plan

• Develop Data Model
• Technical architecture & integration design
• Stand up environment
• Dashboard design workshops
• Data mapping

• Build prototype dashboard
• Configure prototype application
• Data load
• Run solution iterations
• Analytical modeling
Agile Iteration for App Dev

App Reference Design Framework

• A use-case-driven reference design
• A code repository with:
o Domain-specific sample data sets/sources
o Sample data flows
o Sample data processors/analytics
o Simple data visualization
App Reference Designs
Predictive Manufacturing + Smart Manufacturing & Energy

Ad Publisher Campaign Analytics

360 Customer Experience Management

Social Media Monitoring & Analytics
Social Media App Reference Design
Demonstration
Big Data Benefits
ENABLED BY
• Unstructured data and semi-structured data allow for a faster path to data integration
• Real-time analysis and batch analysis with scripting tools
• Schema on read for app-driven data models and data structures
• Local to cloud, small data to big data… tools can talk to each other
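
As a rough illustration of the "schema on read" bullet above: raw records are stored as they arrive, and each application applies its own schema only when it reads them. The sketch below is an assumption-laden example rather than anything from the presentation. The sample records, the field names, and the helper `read_with_schema` are all hypothetical.

```python
import json

# Raw events kept exactly as received; no schema is enforced at write time.
RAW_EVENTS = [
    '{"device_id": "d1", "temp_c": "41.5", "ts": "2014-01-01T00:00:00"}',
    '{"device_id": "d2", "temp_c": 39, "firmware": "1.2"}',
    "not json at all",
]


def read_with_schema(lines, schema):
    """Yield dicts holding only the schema's fields, cast at read time.

    `schema` maps field name -> casting function. Missing or uncastable
    values become None; unparseable lines are skipped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # tolerate bad input instead of failing the whole read
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        yield row


# Two apps read the same raw data through different schemas.
TEMPERATURE_SCHEMA = {"device_id": str, "temp_c": float}
FIRMWARE_SCHEMA = {"device_id": str, "firmware": str}

temperature_rows = list(read_with_schema(RAW_EVENTS, TEMPERATURE_SCHEMA))
firmware_rows = list(read_with_schema(RAW_EVENTS, FIRMWARE_SCHEMA))
```

The design point is that adding a new application means adding a new schema dictionary, not re-modeling or re-loading the stored data.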

New Use Cases
New Analytics and Analytical Techniques
More Data
Time to Value
Faster Iteration
Faster Data
Increased Flexibility
What is Your First Big Data App?
Learn More »
sales@infochimps.com
1-855-328-2386
Request a Demo:
http://infochimps.com/demo

Q&A

Genislab builds better products and faster go-to-market with Lean project man...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

[Webinar] Getting to Insights Faster: A Framework for Agile Big Data

  • 1. Getting to Insights Faster: A Framework for Agile Big Data @TimGasper Director of Product Infochimps, a CSC Big Data Business
  • 2. Agenda (1) IT’S ALL ABOUT THE APP (2) WHAT IS A BIG DATA APP (3) TRADITIONAL VS AGILE APPROACH (4) ENABLERS OF AGILE BIG DATA (5) DEMONSTRATION
  • 3. What problem are you trying to solve?
  • 4. It’s all about the apps.
  • 6.
  • 7. What is a Big Data app? ? + Critical Business Problems = Impactful Analytic Applications
  • 8. Smart Meter Monitoring for Customer Value Add Predictive Inventory Levels to Minimize Warehousing Costs Personalized Medicine Treatment Programs Trade Options and Futures Pricing Platform Source: PARC Customer Churn Analysis for Increased Customer Lifetime Value
  • 10.
  • 11. It’s all about the apps.
  • 13.
  • 14. Predictive Manufacturing + Smart Manufacturing & Energy Ad Publisher Campaign Analytics 360 Customer Experience Management Social Media Monitoring & Analytics
  • 15. The Traditional Way Business Discovery Info Discovery Logical Data Model Physical Data Model System Staging Data Ingestion, Transformation, ETL Application Development Analytics Data Warehouse Project 12-24 Months to Reach Production Production Staging
  • 16. Big Data: A New Hope Business Discovery Info Discovery Logical Data Model Physical Data Model System Staging Data Ingestion, Transformation, ETL Application Development Production Staging Analytics Data Warehouse Project 12-24 Months to Reach Production App Dev Business Discovery Info Discovery Sys. Stag. Initial Data Ingest Analytics Schema on Read App Dev Prod. Stag. App Dev App Dev App Dev Analytics Analytics Analytics Analytics Schema on Read Schema on Read Schema on Read Schema on Read Big Data Project 3-6 Months to Reach Production
  • 17. Application Development Timelines: 6 Months, 2 Developers; 5 Months, 2 Developers; 3 Months, 1 Developer; 4 Months, 2 Developers
  • 18. Speed to Value: A Case Study. HGST, a Western Digital company, is improving customer support and product quality by collecting, analyzing, and acting on massive quantities of machine and sensor data. • Greatly diminished operational burden with ability to focus on analysis and driving business action • Fast project delivery and success • Expertise with Big Data technologies like Hadoop. KEY STATS: Industry: Storage Technology; Solution: Machine Data Analysis Engine; Channel: B2B; Cloud Services: Cloud::Queries, Cloud::Hadoop; Users: Application Developers, Data Scientists, Analysts; Deployment: Amazon Web Services
  • 20.
  • 21. Enablers of Agile Big Data: (1) Managed infrastructure means focusing on Big Data apps (2) The community tech itself and what it enables (3) Our customer engagement framework for choosing use cases that have impact and designing successful solutions (4) Agile, iterative analytics app dev lifecycle (5) Our application reference design framework for kick-starting application development
  • 23. Technologies Under the Hood, PART 1. HADOOP: Java MapReduce; Streaming MapReduce; SQL on Hadoop, Pig, Hive. NOSQL DATABASES: HBase/Accumulo; Elasticsearch; Cassandra, MongoDB. STREAM PROCESSING, MESSAGE QUEUES: Storm; Kafka
  • 24. Technologies Under the Hood, PART 2. HADOOP INTERFACES: Hue; Command Line. STATISTICAL TOOLS: R, SAS, SPSS. BUSINESS INTELLIGENCE AND DATA VIZ: Legacy: Cognos, Biz Objects, OBIEE, Microsoft BI; New Gen: Tableau, Qlikview, SiSense, Kibana
  • 25. Our Unique Toolset Addition. SaaS: Develop & Test Locally with App/Analytics Scripting & “Deploy Pack” Orchestration. PaaS: Real-time Analytics with Cloud::Streams; Interactive Analytics with Cloud::Queries; Batch Analytics with Cloud::Hadoop. Abstract to any cloud with Orchestration DSL. IaaS: Public Cloud; Virtual Private Cloud; Private Cloud
  • 26. Customer Engagement Framework Service Requirements Week 1-2 Discovery Design & Build Week 3-4 Technical Design Production Ongoing Iterative App Development Week 5-8+ Platform Rollout Build Data Flows Interview Key Business Stakeholders Define Business Benefits Design Data Flows Interview Key Technical Stakeholders Define Target Use Case Define Architecture Define Objectives & Challenges Develop High-Level Approach & Costs Identify Data Sources Agree to Project Plan/Rollout Real-Time Data Flow Architecture Validation Standup / Connect Environment Tuning Solution Historical Data. MAJOR ACTIVITIES: Run 2-4 hour Design Thinking Workshop; Review current state metrics; Review business pain points & opportunities; Review application & infrastructure environment; Define target use case; Identify data sources for target use case; Develop high level tech approach and costs; Define high level benefits; Develop initial case for action; Develop go forward plan; Develop Data Model; Technical architecture & integration design; Stand up environment; Dashboard design workshops; Data mapping; Build prototype dashboard; Configure prototype application; Data load; Run solution iterations; Analytical modeling
  • 27. Agile Iteration for App Dev
  • 28. App Reference Design Framework • A use-case-driven reference design • A code repository with: domain-specific sample data sets/sources; sample data flows; sample data processors/analytics; simple data visualization
  • 29. App Reference Designs Predictive Manufacturing + Smart Manufacturing & Energy Ad Publisher Campaign Analytics 360 Customer Experience Management Social Media Monitoring & Analytics
  • 30. Social Media App Reference Design
  • 32. Big Data Benefits: New Use Cases; New Analytics and Analytical Techniques; More Data; Time to Value; Faster Iteration; Faster Data; Increased Flexibility. ENABLED BY: • Unstructured data and semi-structured data allow for faster path to data integration • Real-time analysis and batch analysis with scripting tools • Schema on read for app-driven data models and data structures • Local to cloud, small data to big data… tools can talk to each other
  • 33. What is Your First Big Data App?
  • 34. Learn More » sales@infochimps.com 1-855-328-2386 Request a Demo: http://infochimps.com/demo Q&A
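
Slides 16 and 32 name "schema on read" as a reason Big Data projects reach production sooner: raw data is ingested as-is, and each application applies its own data model when it reads. The following is only a minimal Python sketch of that idea, not code from the webinar. The sample events, the `USAGE_SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names chosen for illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw events are stored exactly as they arrived; nothing is validated at write time.
# (Sample data is made up for this sketch.)
RAW_EVENTS = [
    '{"device": "meter-1", "kwh": "3.5", "ts": "2013-10-01T00:00:00"}',
    '{"device": "meter-2", "kwh": 4, "extra": "ignored"}',
    'not json at all',
]

# Hypothetical per-application schema: field name -> converter applied at read time.
USAGE_SCHEMA: Dict[str, Callable[[Any], Any]] = {"device": str, "kwh": float}


def read_with_schema(
    lines: Iterable[str], schema: Dict[str, Callable[[Any], Any]]
) -> Iterator[Dict[str, Any]]:
    """Project each raw line onto the schema, skipping lines that do not fit."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable line: skipped by this reader, still kept in storage
        if not isinstance(record, dict):
            continue
        try:
            yield {field: convert(record[field]) for field, convert in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # record lacks a required field or has an unconvertible value


rows = list(read_with_schema(RAW_EVENTS, USAGE_SCHEMA))
print(rows)
```

A second application could read the same `RAW_EVENTS` with a different schema (for example, one that keeps `ts`) without any re-ingestion, which is the flexibility the slides attribute to this approach.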

Editor's Notes

  1. quick stories about the transformative effect big data can have on a business... or the world. an app is a use case! big data is not a toy. exploration is great... but to what end? focus leads to value faster. diagram of all the different use cases and industries that big data affects. THIS WAS A LITTLE LONG, KEEP IT SHORT AND SWEET
  2. quick stories about the transformative effect big data can have on a business... or the world. an app is a use case! big data is not a toy. exploration is great... but to what end? focus leads to value faster. diagram of all the different use cases and industries that big data affects
  3. where are you in terms of adoption of big data applications? options: already applications in production; apps currently under development; planning and evaluation phase; researching / early exploration; I don’t know / No current plans
  4. where are you in terms of adoption of big data applications? options: already applications in production; apps currently under development; planning and evaluation phase; researching / early exploration; I don’t know / No current plans
  5. our use of the terms analytics and analysis is extremely broad. i would consider it both simple statistics as well as more advanced modeling. when i want to call out modeling, i usually use the term "modeling" specifically ... or i will use the phrase "advanced analytics" to differentiate it from simpler analytics. the phrase "analytic application" is essentially meant to mean data-oriented, use-case-driven applications.
  6. Have you identified your first big data application use case (or next one)? Yes; No; I don’t know
  7. quick stories about the transformative effect big data can have on a business... or the world. an app is a use case! big data is not a toy. exploration is great... but to what end? focus leads to value faster. diagram of all the different use cases and industries that big data affects
  8. re-emphasize iterative design here… it’s an organization change and a technology change. one of Jim’s diagrams that has traditional data analysis application cycles, including the long time spent upfront doing data modeling and ETL transformation. build the diagram from one step to the next via animations. this is problematic for three reasons: (1) time to value is slower; (2) takes longer to determine first checkpoint of success or failure of the project; (3) difficult to iterate
  9. re-emphasize iterative design here… it’s an organization change and a technology change. one of Jim’s diagrams that has traditional data analysis application cycles, including the long time spent upfront doing data modeling and ETL transformation. build the diagram from one step to the next via animations. this is problematic for three reasons: (1) time to value is slower; (2) takes longer to determine first checkpoint of success or failure of the project; (3) difficult to iterate
  10. diagram of the four customers and how fast they developed apps and how few developers it took to create them. DON’T DWELL HERE, DON’T TALK TO EVERY SINGLE USE CASE… STAY HIGH LEVEL
  11. what does HGST do? saying they are a part of western digital isn’t enough.
  12. Poll 3: What is your biggest challenge to realizing the value of Big Data applications? Talent gap/experience; Cost of capital investment; Big Data technology risk; Failed prior projects; Other/N/A
  13. java compiling etc. versus scripting approaches… we really like using scripting tools
  14. wukong and ironfan are both open source, and we’ve contributed them back!
      - - - -
      similar to the slide that shows how Wukong is the DSL for big data app dev, and Ironfan is the DSL for big data infrastructure dev, except incorporate the broader picture of Tachyon the orchestrator and the Deploy Pack application code vessel.
      So now let’s drill in and look at how we actually deliver a solution.

      The Problem
      There are two complementary ways to process Big Data: batch processing and real-time (or stream) processing. These are traditionally viewed as very different approaches to solving problems, especially in a Big Data context, where the toolsets for each kind of processing differ greatly. Typically, for cross-platform development there are several issues that slow down analytic development:
      • You Need to Run the Whole Thing – that means that the entire infrastructure has to be running in order to test small changes.
      • You Will Wait 10 Minutes Every Time You Make a Mistake – Compiling Jar files, transferring code, launching jobs, and finding log files is time consuming.
      • You Will Disrupt Production Traffic – If you are doing any testing at scale.
      • Hadoop Does Not Understand Storm and Storm Does Not Understand Hadoop – Same language, different paradigms, different base classes.

      The Solution
      Wukong is a Domain Specific Language (DSL) designed specifically for data analytics, processing, and flow. It abstracts the platform that the analytics are running on (like Hadoop, Storm, or your local command line) and allows you to focus on writing analytics. A simple wukong script could easily be written in a few lines in a plain text file on your hard drive. It can then be run as a simple command line application, or used as a large Hadoop job or as part of a real-time Storm topology. The same analytics can be leveraged over and over again across your enterprise.
      Wukong enables its users to:
      • Write and test code locally – from the command line
      • Avoid Disrupting Others – your deploy pack
      • Debug Rapidly – see results in real-time
      • Seamlessly move between contexts – like real-time (with Storm) and batch (with Hadoop)
      This allows for very rapid iteration of analytics, and allows your data scientists to be as agile as your business demands.

      Questions
      • If you develop real-time analytics, how would you run those against historical data?
      • Does every developer in your organization have their own Hadoop cluster?

      References
      http://www.infochimps.com/infochimps-cloud/tools/wukong/
      https://github.com/infochimps-labs/wukong/tree/3.0.0/
  15. need to make shape be all the way up through project plan. platform rollout has a lot more to it, including QA/testing, analytics development, production rollout of first application, training, acceptance/success testing. production should have infrastructure support, application/analytics support, SLA, managed services (training and acceptance testing could be part of the bottom part)
  16. probably the diagram that shows the loop from local to cloud... except updated and made more powerful... maybe have the animations build as well
  17. Call on the audience to figure out their first application and begin the path toward success by following the framework in this webinar. If big data projects are already underway, are you finding business value? do you feel like you are iterating through use cases? are your personnel utilizing their existing talents and strengths?
  18. I invite you to let us know what your use case is, and we can help you evaluate which tools and architecture are appropriate to solve it. Now we are open to questions!
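
Note 14 describes writing an analytic once and running it both as a batch job and in a real-time flow, and asks how real-time analytics could be run against historical data. The Python sketch below illustrates that general idea only; it does not use or imitate Wukong's actual API. The log format, the `extract_error_device` function, and the two runner functions are hypothetical and exist purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional


def extract_error_device(line: str) -> Optional[str]:
    """The shared analytic: return the device id for ERROR lines, else None.

    Assumes a made-up log format of "<device> <STATUS> <detail>".
    """
    parts = line.split()
    if len(parts) >= 2 and parts[1] == "ERROR":
        return parts[0]
    return None


def run_batch(lines: Iterable[str]) -> Dict[str, int]:
    """Batch context: consume all historical lines, return final error counts."""
    counts: Counter = Counter()
    for line in lines:
        device = extract_error_device(line)
        if device is not None:
            counts[device] += 1
    return dict(counts)


def run_stream(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming context: emit a snapshot of the counts after every error seen."""
    counts: Counter = Counter()
    for line in lines:
        device = extract_error_device(line)
        if device is not None:
            counts[device] += 1
            yield dict(counts)


# Made-up sample data standing in for historical machine logs.
HISTORY = [
    "drive-7 ERROR seek",
    "drive-2 OK",
    "drive-7 ERROR read",
    "drive-9 ERROR spinup",
]

batch_result = run_batch(HISTORY)
stream_snapshots = list(run_stream(iter(HISTORY)))
print(batch_result)
print(stream_snapshots[-1])
```

Because both runners call the same `extract_error_device` function, the per-record logic can be tested locally on a handful of lines, and replaying historical data through the streaming runner ends at the same totals the batch runner produces.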