making enterprise elephants dance
(gangnam style)
Andy Palmer, Co-Founder & CEO Tamr
Career is a mash-up of:
start-ups + enterprise
customer + vendor
data + application
technical + business
The View from 30,000 Feet … ok - from low earth orbit
The time has come to manage information across the enterprise for strategic benefit.
Be the “Googler” of your enterprise
Simply put: manage a company’s information as an asset - at least as well as
Google tries to manage the world’s information as an asset
Assume your information assets are as diverse as the modern web - but not the
same - data matters more than documents.
What does this mean?
However…
MOST OF US ARE NOT GOOGLE in
the level of quality and quantity of
engineering resources
Google makes it look easy sometimes
because they have much of the best
talent in the world
Data Silos are a primary bottleneck
Viz tools are democratizing analysis - D3.js, Tableau, Spotfire, etc.
“Big Data Mania” represents an opportunity to re-architect for flexibility + agility
Monolithic, hard-coded warehouses & ETL constrain experimentation, collaboration and agility
Entities do not have perfect definitions - don’t try to force it...
Static schemas/data structures are great for collection but have a “drag coefficient” for analytics
Embrace data variety as a reality - leave the monolithic vendors pretending they can lock us in
Semantic approaches allow access to diverse data and agile integration to solve specific questions
Data marts should be available “on demand” using tech “@ target” that suits the analytic
“You can’t get there from here”: NOT Enterprise Data “Business as Usual”
Part of the Answer:
The 3 V’s?
important but...
…not enough...
Try this …
Start with the Questions, not the Answer - “Analytic context will set you free”
● Ask aspirational/transformational analytic questions
● Use them as context for defining all the work you do
● Build your infrastructure to answer the analytical questions
In the process….
● Get a broad and dynamic inventory of all your data
● Match workload to appropriate engine/tech
● Use Distributed Systems - radically lower cost vs. traditional
● Expect modern and dynamic visualization - iterative vs. reporting
● Treat Cloud as a first-order resource - not just ancillary
● Modern DevOps - core capability
● JSON sources will proliferate...embrace it
● Bottom-up data/metadata management
● Internal and external data - both valuable but not the same
Start with the Questions, not the Answer….
….but sometimes it’s not simple….
...embrace the ambiguity...
Same but Different - Identity depends on the question:
● Gleevec, Glivec and Imatinib
● Same InChI Key
● Formulation vs. Substance
● Product versus compound
● Regional naming difference
● Canonicalization depends on context
InChI=1S/C29H31N7O/c1-21-5-10-25(18-27(21)34-29-31-13-11-26(33-29)24-4-3-12-30-19-24)32-28(37)23-8-6-22(7-9-23)20-36-16-14-35(2)15-17-36/h3-13,18-19H,14-17,20H2,1-2H3,(H,32,37)(H,31,33,34)
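A minimal sketch of "canonicalization depends on context", using the three names from the slide. The record fields, the region labels and the two context names ("substance" and "product") are assumptions made for illustration, not part of the original deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugRecord:
    name: str       # label as it appears in a source system
    substance: str  # active compound behind the label
    kind: str       # "product" (branded formulation) or "compound"
    region: str     # illustrative market label (assumption)


# Hypothetical records built from the names on the slide.
RECORDS = [
    DrugRecord("Gleevec", "imatinib", "product", "US"),
    DrugRecord("Glivec", "imatinib", "product", "non-US"),
    DrugRecord("Imatinib", "imatinib", "compound", "global"),
]


def canonical_key(record, context):
    """Return the grouping key for a record under a given analytic context."""
    if context == "substance":
        # Chemistry question: everything with the same compound is one entity.
        return record.substance
    if context == "product":
        # Commercial question: each branded name or compound stays distinct.
        return (record.kind, record.name.lower())
    raise ValueError("unknown context: " + context)


def group(records, context):
    """Group record names by the canonical key chosen for this context."""
    groups = {}
    for record in records:
        groups.setdefault(canonical_key(record, context), []).append(record.name)
    return groups


by_substance = group(RECORDS, "substance")  # one group holding all three names
by_product = group(RECORDS, "product")      # three separate groups
```

The same three records collapse into one entity for a substance-level question and stay separate for a product-level question, which is the sense in which identity depends on the question.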
Pick a problem that is:
- greenfield
- well-defined
- valuable
DO NOT BOIL THE OCEAN
Great Viz has never been more accessible
Distributed Systems
For data science at scale, we can’t afford to pay the
“enterprise IT tax”
Need to build an enterprise infrastructure as
inexpensive, scalable and persistent as that of modern
web companies
Mindset: Put tight spending limits on storage and
systems infrastructure … and it will take you toward a
place similar to the modern internet consumer
companies - this is a good place :)
Facebook CIO talking about Vertica
The Cloud
A first-class citizen in the enterprise infrastructure
Fact...not opinion: The world’s largest high-
performance computing and persistence infrastructure
is available for you to rent on-demand
Let’s drop the hubris of on-prem enterprise data
centers much like we don’t generate our own electricity
anymore….
DevOps
DevOps matters as much for data as for software
DevOps is to the Cloud as Systems Management was
to Client-Server computing
● Couldn’t live without Systems Management then
● Can’t live without DevOps now
Getting to scale (managing hundreds/thousands of
machines) on demand requires automated tools and a
modern DevOps infrastructure.
JSON
JSON is now a primary tool to access data
Ultimate evolution of relational and object-oriented
technologies coming together
Provides a loose, flexible coupling between data access
and applications
Definition of flexibility: As long as it’s JSON, we don’t
need to care what’s behind it
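A small sketch of the loose coupling described above: two different backends emit JSON, and the consuming code never learns which one produced it. The function names, fields and sample data are hypothetical.

```python
import json


def from_relational_rows():
    """Stand-in for a SQL result set serialized to JSON (hypothetical data)."""
    rows = [("C001", "Acme Corp"), ("C002", "Globex")]
    return json.dumps([{"id": cid, "name": name} for cid, name in rows])


class Customer:
    """Stand-in for an object held by an object-oriented application."""

    def __init__(self, cid, name):
        self.id = cid
        self.name = name


def from_object_model():
    """The same customers, this time serialized from in-memory objects."""
    customers = [Customer("C002", "Globex"), Customer("C001", "Acme Corp")]
    return json.dumps([vars(c) for c in customers])


def customer_names(payload):
    """Consumer: depends only on the JSON shape, not on the backend."""
    return sorted(item["name"] for item in json.loads(payload))


names_a = customer_names(from_relational_rows())
names_b = customer_names(from_object_model())
```

Swapping one backend for the other requires no change to `customer_names`, as long as the JSON shape is preserved.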
Variety - how to tackle the enterprise data silo problem
Standardization and Aggregation are necessary but not
sufficient to solve the challenges of Enterprise
Analytics 3.0
Bottom-Up + Top Down Data Modeling & “Collaborative Curation”
Time to embrace the reality of extreme data variety across
the entire enterprise - “Unified Data”
Requires a bottom-up, probabilistic approach to data
curation and integration (complement deterministic)
● mix of 80% probabilistic & 20% deterministic
● Tamr’s primary design pattern
Back to the future:
● 1990s web: probabilistic search and website connection
● 2020s enterprise: probabilistic data source connection &
curation
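One way to picture deterministic rules complemented by probabilistic matching is the sketch below. It is not Tamr's implementation: the `tax_id` rule, the string-similarity score from Python's `difflib` and the 0.8 threshold are all assumptions chosen to keep the example small.

```python
from difflib import SequenceMatcher


def deterministic_match(a, b):
    """Hard rule: identical, non-empty tax IDs mean the same entity."""
    return bool(a.get("tax_id")) and a.get("tax_id") == b.get("tax_id")


def match_probability(a, b):
    """Crude similarity score in [0, 1] from the name strings.

    A stand-in for a trained model; real curation would use more signals.
    """
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def same_entity(a, b, threshold=0.8):
    """Apply the deterministic rule first, then fall back to the score."""
    if deterministic_match(a, b):
        return True
    return match_probability(a, b) >= threshold


# Hypothetical records from three different source systems.
crm = {"name": "Acme Corp", "tax_id": "12-345"}
erp = {"name": "ACME Corporation", "tax_id": "12-345"}
spreadsheet = {"name": "Acme Corp.", "tax_id": None}
other = {"name": "Globex", "tax_id": None}

rule_hit = same_entity(crm, erp)           # matched by the tax ID rule
fuzzy_hit = same_entity(crm, spreadsheet)  # matched by name similarity
no_hit = same_entity(crm, other)           # neither rule nor score matches
```

The deterministic rule handles the cases where a trusted key exists; the score covers sources that lack one, which is where most of the variety in enterprise data tends to sit.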
Internal and External Data
Internally and externally generated data are
now BOTH important
If our orgs are going to become truly data-
driven, we have to embrace external data
We need to get to the point that, a la Google,
we don’t care where it comes from
Google Maps, for example
● Seamless integration of internal Google
and external data
● And Google just doesn’t care
In Summary
● Manage your information as an asset
● Start with a broad inventory of all your data
● Embrace ambiguity/variety of enterprise data
● Throw the “one schema to rule them all” into the
fires of Mordor…
● Embrace modern viz & iterative analytics
● Don’t ignore the Cloud - it’s inevitable
● DevOps is cool - and fun :)
● JSON is the future of data access - it’s ok
● True shared-nothing distributed systems are the
only way out of the “Enterprise IT Tax”
Discussion

Tamr | Making enterprise elephants dance @ Boston Data Festival

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Tamr | Making enterprise elephants dance @ boston data festival

  • 1. making enterprise elephants dance (gangnam style) Andy Palmer, Co-Founder & CEO Tamr
  • 2. Career is a mash-up of: start-ups + enterprise customer + vendor data + application technical + business
  • 3. The View from 30,000 Feet … ok - from low earth orbit The time has come to manage information across the enterprise for strategic benefit. Be the “Googler” of your enterprise
  • 4. Simply put : manage a company’s information as an asset - at least as well as Google tries to manage the world’s information as an asset Assume your information assets are as diverse as the modern web - but not the same - data matters more than documents. What does this mean? However… MOST OF US ARE NOT GOOGLE in the level of quality and quantity of engineering resources Google makes it look easy sometimes because they have much of the best talent in the world Data Silos are a primary bottleneck
  • 5. Viz tools are democratizing analysis - D3.js, Tableau, Spotfire, etc. “Big Data Mania” represents an opportunity to re-architect for flexibility + agility Monolithic, hard-coded warehouses & ETL constrain experimentation, collaboration and agility Entities do not have perfect definitions - don’t try to force it... Static schemas/data structures are great for collection but have “drag coefficient” for analytics Embrace data variety as a reality - leave the monolithic vendors pretending they can lock us in Semantic approaches allow access to diverse data and agile integration to solve specific questions Data marts should be available “on demand” using tech “@ target” that suits the analytic “You can’t get there from here” : NOT Enterprise Data “Business as Usual”
  • 6. Part of the Answer: The 3 V’s? important but... …not enough...
  • 7. Try this … Start with the Questions, not the Answer - “Analytic context will set you free” ● Ask aspirational/transformational analytic questions ● Use them as context for defining all the work you do ● Build your infrastructure to answer the analytical questions In the process…. ● Get a broad and dynamic inventory of all your data ● Match workload to appropriate engine/tech ● Use Distributed Systems - radically lower cost vs. traditional ● Expect modern and dynamic visualization - iterative vs. reporting ● Treat Cloud as a first-order resource - not just ancillary ● Modern DevOps - core capability ● JSON sources will proliferate...embrace it ● Bottom-up data/metadata management ● Internal and external data - both valuable but not same
  • 8. Start with the Questions, not the Answer…. ….but sometimes it’s not simple…. ...embrace the ambiguity... Same but Different - Identity depends on the question: ● Gleevec, Glivec and Imatinib ● Same InChIKey ● Formulation vs. Substance ● Product versus compound ● Regional naming difference ● Canonicalization depends on context InChI=1S/C29H31N7O/c1-21-5-10-25(18-27(21)34-29-31-13-11-26(33-29)24-4-3-12-30-19-24)32-28(37)23-8-6-22(7-9-23)20-36-16-14-35(2)15-17-36/h3-13,18-19H,14-17,20H2,1-2H3,(H,32,37)(H,31,33,34)
  • 9. Pick a problem that is: - greenfield - well-defined - valuable DO NOT BOIL THE OCEAN
  • 10. Great Viz has never been more accessible
  • 11. Distributed Systems For data science at scale, we can’t afford to pay the “enterprise IT tax” Need to build an enterprise infrastructure as inexpensive, scalable and persistent as that of modern web companies Mindset: Put tight spending limits on storage and systems infrastructure … and it will take you toward a place similar to the modern internet consumer companies - this is a good place :) Facebook CIO talking about Vertica
  • 12. The Cloud A first-level citizen in the enterprise infrastructure Fact...not opinion: The world’s largest high-performance computing and persistence infrastructure is available for you to rent on-demand Let’s drop the hubris of on-prem enterprise data centers much like we don’t generate our own electricity anymore….
  • 13. DevOps DevOps matters as much for data as for software DevOps is to the Cloud as Systems Management was to Client-Server computing ● Couldn’t live without Systems Management then ● Can’t live without DevOps now Getting to scale (managing hundreds/thousands of machines) on demand requires automated tools and a modern DevOps infrastructure.
  • 14. JSON JSON is now a primary tool to access data Ultimate evolution of relational and object-oriented technologies coming together Provides a loose, flexible coupling between data access and applications Definition of flexibility: As long as it’s JSON, we don’t need to care what’s behind it
  • 15. Variety - how to tackle the enterprise data silo problem Standardization and Aggregation are necessary but not sufficient to solve the challenges of Enterprise Analytics 3.0
  • 16. Bottom-Up + Top Down Data Modeling & “Collaborative Curation” Time to embrace the reality of extreme data variety across the entire enterprise - “Unified Data” Requires a bottom-up, probabilistic approach to data curation and integration (complementing the deterministic approach) ● mix of 80% probabilistic & 20% deterministic ● Tamr’s primary design pattern Back to the future: ● 1990s web: probabilistic search and website connection ● 2020s enterprise: probabilistic data source connection & curation
  • 17. Internal and External Data Internally and externally generated data are now BOTH important If our orgs are going to become truly data-driven, we have to embrace external data We need to get to the point that, a la Google, we don’t care where it comes from Google Maps, for example ● Seamless integration of internal Google and external data ● And Google just doesn’t care
  • 18. In Summary ● Manage your information as an asset ● Start with a broad inventory of all your data ● Embrace ambiguity/variety of enterprise data ● Throw the “one schema to rule them all” into the fires of Mordor… ● Embrace modern viz & iterative analytics ● Don’t ignore the Cloud - it’s inevitable ● DevOps is cool - and fun :) ● JSON is the future of data access - it’s ok ● True shared nothing distributed systems are the only way out of the “Enterprise IT Tax”
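
Slide 8's point that identity depends on the question can be illustrated with a minimal Python sketch. It is not Tamr's implementation. The `DrugRecord` fields, the region values, the `"generic"` product label and the two context names are all assumptions made up for illustration; only the three drug names come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugRecord:
    """One row as it might appear in a source system (hypothetical schema)."""
    name: str       # name as recorded in the source
    substance: str  # active compound (illustrative field)
    product: str    # marketed product or brand (illustrative field)
    region: str     # market where the name is used (illustrative value)


# Illustrative records for the three names on slide 8.
RECORDS = [
    DrugRecord("Gleevec", "imatinib", "Gleevec", "US"),
    DrugRecord("Glivec", "imatinib", "Glivec", "EU"),
    DrugRecord("Imatinib", "imatinib", "generic", "any"),
]


def canonical_key(record: DrugRecord, context: str) -> str:
    """Return the key that defines 'same entity' for a given analytic question."""
    if context == "substance":
        # Question about the compound: all three names collapse together.
        return record.substance.lower()
    if context == "product":
        # Question about marketed products: brands stay distinct.
        return record.product.lower()
    raise ValueError(f"unknown context: {context!r}")


def group_by_context(records, context):
    """Group source names under the canonical key chosen by the context."""
    groups = {}
    for record in records:
        groups.setdefault(canonical_key(record, context), []).append(record.name)
    return groups


if __name__ == "__main__":
    print(group_by_context(RECORDS, "substance"))
    print(group_by_context(RECORDS, "product"))
```

Grouping by `"substance"` yields a single entity, while grouping by `"product"` keeps three, which is the sense in which canonicalization depends on context.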