SlideShare una empresa de Scribd logo
1 de 121
Descargar para leer sin conexión
Foundations for Successful Data
Projects
Strata Data Conference, San Francisco 2019
Ted Malaska | @ted_malaska
Jonathan Seidman | @jseidman
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
About the presenters
▪ Capital One: Director of Enterprise Architecture
▪ Blizzard Ent: Director of Engineering of Global Insights
▪ Cloudera: Principal Solution Architect
▪ FINRA: Lead Architect
▪ Contributor: Apache Spark, Hadoop, Hive, Sqoop, Yarn, Flume, others
Ted Malaska
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
About the presenters
▪ Software Engineer at Cloudera
▪ Previously Technical Lead on the big data team at Orbitz
▪ Co-founder of the Chicago Hadoop User Group and Chicago Big Data
Jonathan Seidman
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Ted Malaska & Jonathan Seidman
Foundations
forArchitecting
Data Solutions
MANAGING SUCCESSFUL DATA PROJECTS
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Foundations of Successful Data Projects
Understand the problem Select software Manage risk Build effective teams Build maintainable architectures
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Agenda
▪ Understanding the key data project types
▪ Selecting data management solutions
▪ Building effective teams
▪ Managing risk in projects
▪ Ensuring data integrity
▪ Metadata management
▪ Using abstractions
Understanding the Key Data Project Types
Understanding the key data project types
● - Major Data Project Types
● - Primary Considerations & Risk Management
● - Team Makeup
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Major Data Project Types
▪ Data Pipelines and Data Staging
▪ Data Processing and Analysis
▪ Application Development
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Pipelines and Data Staging
▪ Sourcing Data
▪ Transmitting Data
▪ Staging Data
▪ Accessibility Options
▪ Discovery
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Processing and Analysis
▪ Curating Data
▪ Cultivating Ideas
▪ Data Product Generation
- Reports, Models, Insight, Charts, ...
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Application Development
▪ Traditional or Model Serving
▪ Inner Loop
▪ Outer Loop
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Primary Considerations
▪ Data Pipelines and Data Staging
▪ Data Processing and Analysis
▪ Application Development
Primary Considerations
Data Pipelines and Data Staging
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Pipelines and Data Staging – Considerations
▪ On boarding paths for Data Suppliers
- Files
- Embedded code
- APIs (Rest, WebSocket, GRPC, Syslog, ...)
- Agents
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Pipelines and Data Staging – Considerations
▪ Transmission
- At Least Once, Duplication, Latency, and Ordering
▪ Tokenization & Auditing & Governance
- GDPR, CA Protection Laws, Misuse, Data Breach
▪ Quality
- Schema Validation, Rules Validation, Carnality Verance
▪ Access
- Security, Matching the use case to the storage system
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Pipelines and Data Staging – Considerations
▪ Meta Management
- New and mutated Datasets
- Security
▪ Access
- Matching the use case to the storage system
- SQL is King
- No one tool
- Trade Offs
- Cost vs Time to Value vs Value of Data
Primary Considerations
Data Processing and Analysis
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Processing and Analysis – Considerations
▪ Curating Data
- Working with Producers
- Joining
- Time series
- CDC
▪ Undering Quality of Data
- SLAs
- Correctness of the Data
- Stability of the Data
- Coupling
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Processing and Analysis – Considerations
▪ Cultivating Ideas
- Defining Real Goals
- Evaluating ROI
▪ Productionization of Pipelines
- Service Reliability Engineering
▪ Culture
- ML vs AI vs Engineer
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Processing and Analysis – Considerations
▪ Understanding
- Explainable Outcomes
- Defendable Solutions
▪ Promotion Paths
- Deploying Products
- Historical Evaluation
- Up to Date Auditing
Primary Considerations
Application Development
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Application Development – Considerations
▪ Availability and Failure
- How will it fail
- How will failure impact customers
- What level of fail should be tested for
- Levels of failure design
▪ State Locality and Consistency
- What are the requirements
- Speed, cost, or truth
- Transactions and Locking
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Application Development
▪ Latency and Throughput
- Expectations and Throughput
- Is it really big data?
- Inner and Outer Looping
▪ Granularity of Deployments
- Monolith single deployment
- Monolith microservices
▪ Culture
- Development Towers
- Over the wall
- Development Granularity
Team Makeup
Team Makeup
● - Data Pipelines and Data Staging
● - Data Processing and Analysis
● - Application Development
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Pipelines and Data Staging – Team Makeup
▪ Data Engineers
▪ Site Reliability Engineers (SRE)
▪ Application Engineers
▪ Data Architects
▪ Governance
▪ Solution Engineers/Architects
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Processing and Analysis – Team Makeup
▪ Visionaries
▪ The Brains
▪ Problem Seekers
▪ Engineers
▪ Duct Tapers
▪ Tech Debt Payers
▪ Site Reliability Engineers (SRE)
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Application Development – Team Makeup
▪ Web Developers
▪ Front end Developers
▪ Data Engineers (DBAs)
▪ Performance Focused Engineers
▪ SOA / Queue Engineers
▪ Site Reliability Engineers (SRE)
Evaluating and Selecting Data Solutions
Evaluating and Selecting Data Solutions
● - Solution Life Cycles
● - Tipping Point Considerations
● - Considerations for Technology Selection
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Solution Life Cycles
▪ Private Incubation Stage
▪ Release Stage
▪ “Curing Cancer” Stage
▪ Broken Promises Stage
▪ Hardening Stage
▪ Enterprise Stage
▪ Decline and Slow Death Stage
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Private Incubation Stage
▪ Technology Trigger
▪ Vision
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Release Stage
▪ Changes
- Inviting People In
- Documentation
- Marketing
▪ Reasons for Releasing
- Money
- Hiring
- Culture
- Future Building
▪ Big Promises
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
“Curing Cancer” Stage
▪ Big Promise
▪ Maybe outside area of expertise
- Promise to push internally
- Promises to gain influence
- Promises to get attriations
▪ Promises can be good and bad
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Broken Promises Stage
▪ Cracks in the Dream
- Scale
- Usability
- Use Case
- Security
- Practicality
- Skill Requirements
- Auditability
- Maintainability
- Integration
- Quality
- Lies
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Hardening
▪ Balance Features
▪ Technical Debt
▪ Partnering
▪ Corp Partnerships
▪ Leadership Stories
▪ Easy Success Paths
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Enterprise Stage
▪ Stable
▪ Predictable
▪ Easy to hire for
▪ Supportable / Maintainable
▪ Pragmatists outnumber innovators
▪ No longer cool, but still very lucrative
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Slow Decline Stage
▪ Not Worth Retiring
▪ Not worth Investing In
▪ Good Enough
Tipping Point Considerations
- Mavericks
- Connectors
- Salesman
- Stickiness
- Context
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Mavericks
▪ Passion Driven
▪ Helpful
▪ Bottom Up Power
▪ They see the future or may see shadows
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Connectors
▪ High triangles
▪ Trusted weak ties
▪ Gateways for pain, needs, and opportunities
▪ Considering the towered companies
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Sales Man
▪ Make the Deal Happen
▪ Right or wrong doesn’t matter as much as action
▪ Momentum starters
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Stickiness
▪ Think about gravity
- Data
- Code
- User’s Favor
- Results
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Context
▪ Where is the company
▪ Looking for Opportunities
- Holding down the fort
- Lower cost
- Play around
▪ The Swing Pendulum Effort
- Where is the ball now and where is it heading
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Tipping Point Considerations
- Mavericks
- Connectors
- Salesman
- Stickiness
- Context
Considerations for Technology Selection
● - Demand
● - Fit
● - Visibility
● - Risks
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Evaluating the Demand
▪ Business Needs
▪ Internal Demand
▪ Desire to live on the edge
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Evaluating the Fit
▪ Primary Capability
▪ Skill Sets
▪ Level of Commitments
▪ Level of Alignment
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Evaluating the Visibility
▪ Benchmarks
- Hidden biases, Motivated Biases, Unfair Comparisons
▪ Fundamentals
- There is no magic
▪ Leaders Success
▪ Market Trends
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Reviewing Fundamentals
▪ Relative Location of Data to Readers
▪ Compression formats and rates
▪ Data Structures
▪ Partitioning, Replication, and Failure
▪ API and Interfaces
▪ Resource Allocations and Tuning
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Reviewing Market Trends
Google Trends
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Reviewing Market Trends
Github activity
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Reviewing Market Trends
Jira Counts and Charts
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Reviewing Market Trends
Conferences and meetups
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Reviewing Market Trends
▪ Also:
- Community Interest
- Email Lists and Forums
- Contributors
- Follow the Money $$$
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Evaluating the Risks
▪ Risk Tolerance
▪ Stress Tolerance
▪ Leader vs follower
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Future Proofing
▪ Assume Change
▪ Interface Design
▪ Producer & Consumer Experience
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Assume Change
▪ Remember the Logic and Physical
▪ Think Logical and Implementation
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Interface Design
▪ Standards
▪ SQL
▪ DataFrames / DataSets
▪ REST, GRPC
▪ AVRO, Parquet, Protobuf, Thrift, JSON, CSV
Building Successful Teams
Lessons Learned Building Big Data Teams
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Build well rounded teams
Sysadmins Developers Analysts Data Scientists
Other roles:
Data Protection Officer Network/Systems
Engineers
Product Managers SRE
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
How to find people?
Start with people you already have, but make sure you invest in
training…
▪Linux, network, DBAs –> sysadmins
▪Developers –> developers
- Easy if you’re at a company like Orbitz, otherwise maybe not so much
▪Analysts –> analysts
▪It’s not an easy path though
- Set goals instead of micro-managing development
- Be prepared to iterate, don’t be afraid to fail
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Also don’t forget other teams
Communication is key
DBAs Other Project Teams
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Also, don’t do this:
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data
Scientists Admins
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Differing Skill Sets
Detail-
Oriented
Experimental The
Communicator
…
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Think beyond just skills
▪ Also look for complementary personalities
▪ And avoid toxic personalities
- But what if they’re really talented?
- See above.
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Customer Engagement
▪ Your teams should work closely with your customers, whether they’re external or
internal
Managing Project Risk
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Managing Risk
Shark photo: http://www.travelbag.co.uk/
1 in 11.5 million 1 in 4292
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Managing Risk – Risk Categories
Technology Risk Team Risk Requirements Risk
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Managing Risk – Categorizing Risk
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Managing Risk – Categorizing Risk
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Managing Risk – Categorizing Risk
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Risk Weighting
▪ Technology Risk
- How much experience do we have with this technology?
- Do we have production experience with the technology?
- We know SQL, but what about Cassandra CQL?
- …
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Risk Weighting
▪ Team Risk
- Experience level of team members
- Team skill sets
- Size of team
- …
▪ Don’t forget about other teams
- System dependencies
- …
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Risk Weighting
▪ Requirements Risk
- Vaguely defined requirements
- Novel requirements (e.g. stringent latency requirements)
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Managing Risk – Categorizing Risk
▪ Cassandra
- Limited technical experience (team risk)
- Need to validate data model (reqs risk)
- Stringent uptime requirements (tech risk)
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Mitigating Risk
▪ Requirements Risk
- Ensure good functional requirements
- Break requirements up – don’t boil the ocean
- Share requirements and get buy-in from all stakeholders
- Get agreement on scope
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Mitigating Risk
▪ Technology Risk
- Tackle important/complex components first
- Use external resources to help fill knowledge gaps
- Consider replacing riskier technologies with more familiar ones
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Mitigating Risk
▪ Technology Risk
- Use proofs of concept
- Than throw them away
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Mitigating Risk
▪ Technology Risk
- Use abstractions to minimize dependencies
- Ensure repeatable build, deployment, monitoring processes
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Mitigating Risk
▪ Technology Risk
- Start building early
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Mitigating Risk
▪ Team Risk
- Build well rounded teams
- Ensure communication with other teams
- But work to reduce coupling
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Communicating Risk
▪ Make sure stakeholders are aware of risks
- But remember there can be risks to overstating risk
▪ Collaborate and get buy-in
▪ Share risk
▪ Risk can be a negotiation tool
Ensuring Data Integrity
Ensuring Data Integrity
● - Pre-defined vs Derived via Discovery
● - Path of Fidelity
● - Validation of Quality
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Pre-Defined vs Derived via Discovery
▪ Producer - Productivity vs Audit
▪ Consumer - Consistency
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Producer - Productivity vs Audit
Red Tape Predefined
Flexible/Limited Predefined
After the Fact Discovery
EasytoAudit
Short-TermProductivity
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Predefined Traps
▪ Centralized Reviewing Org
▪ High bar to on board
▪ Unclear schema evolution paths
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Discovery Traps
▪ Uncommon output
▪ Data quality standards
▪ Uncommon SLAs
▪ The balloon problem
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Consumers Point of View
▪ Consistency is Key
▪ Access to Powerful Tools
▪ Multiple Landing Areas is Key
- Long Term
- Indexed
- Lucene Indexed
- Streams
▪ Future Proofing
Path of Fidelity
● - What is Fidelity
● - What can we mutate
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
What is Full Fidelity
▪ The cells and their values are preserved
▪ Field names and definitions are preserved
▪ No matter where or how you access the data
▪ No Filtering
▪ No Irreversible Mutations
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
What can we mutate
▪ Tokenization
▪ Underlining files structions
▪ Storage system
▪ Access Path
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Validate Quality
▪ Validation of Fidelity
▪ Validation of Quality
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Validation of Fidelity
▪ Row Counts
▪ Check Sums
▪ Reversible byte by byte check
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Validation of Quality
▪ Column level rules
▪ Null counts
▪ Field carnality
▪ Record counts
Metadata Management
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Metadata Management
▪ What do we mean?
- Understanding what data you have
- Knowing what the data is
- Knowing where the data is
▪ This is complex
- Large number of data sources, storage systems, processing…
- Ease of data access and creation of new data sets
- Start planning at the beginning of your project!
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Why Do We Care?
▪ Visibility – know what data you collect and how to access it
- Faster time to market
- Avoid duplication of work
- Derive more value from data
- Identify gaps
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Why Do We Care?
▪ Relationships
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Why Do We Care?
Regulations
GDPR, etc.
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Types of Metadata
▪ Data at rest
▪ Data in motion
▪ Source data
▪ Data processing
▪ Reports, dashboards, etc.
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data At Rest
▪ Files, database tables, Lucene indexes, etc.
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data At Rest – Database Table Example
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data At Rest – Other Metadata Types
Audit Logs
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data At Rest – Other Metadata Types
Comments
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data At Rest – Other Metadata Types
Tags
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data At Rest – Other Metadata Types
Lineage
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data In Motion
▪ This is data that’s moving through the system
- Batch or streaming ingestion
- Data processing
- Derived data
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data in Motion – What to Capture
▪ Paths
▪ Sources
▪ Transformations
▪ Destinations
▪ Reports/Dashboards
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data in Motion – Paths
▪ How does the data move through the system?
- Source systems
- Data collection systems
- Routing
- Transformations
- Etc.
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Source Data
▪ External systems
▪ Internal systems
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data In Motion – Transformations
▪ Data format changes, for example JSON to protocol buffers
▪ Data fidelity – is the data filtered or changed?
▪ Metadata about processing – job names, technologies, inputs, outputs, etc.
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Data Processing – Machine Learning
▪ More complex algorithms can require special considerations
- Purpose of a model
- Technologies, algorithms, etc.
- Features
- Datasets – training, test, etc.
- Goals of the model
- Who owns the model?
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Reports and Dashboards
▪ Data sources
▪ Any data transformations
▪ Information on the report’s creator
▪ Log of modifications
▪ Purpose of report
▪ Tags
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
Approaches to Metadata Collection
▪ Declarative
- Require and enable metadata to be created as data is added to the system
▪ Discovery
- After the fact cataloging of data
tiny.cloudera.com/sf2019questions
tiny.cloudera.com/sf2019slides
How?
▪ Create your own solution
▪ Use tools provided by your vendor
▪ Use third party tools
Thank you!
Ted Malaska | @ted_malaska
Jonathan Seidman | @jseidman

Más contenido relacionado

La actualidad más candente

Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
shuwutong
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24
Martin Bém
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
Inside Analysis
 

La actualidad más candente (15)

KScope14 - Real-Time Data Warehouse Upgrade - Success Stories
KScope14 - Real-Time Data Warehouse Upgrade - Success StoriesKScope14 - Real-Time Data Warehouse Upgrade - Success Stories
KScope14 - Real-Time Data Warehouse Upgrade - Success Stories
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
DATA FORUM MICROPOLE 2015 - Forrester - Data Gouvernance Valuation
 DATA FORUM MICROPOLE 2015 - Forrester - Data Gouvernance Valuation DATA FORUM MICROPOLE 2015 - Forrester - Data Gouvernance Valuation
DATA FORUM MICROPOLE 2015 - Forrester - Data Gouvernance Valuation
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
 
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle Cloud
 
DesignMind Data Analytics Consulting
DesignMind Data Analytics Consulting DesignMind Data Analytics Consulting
DesignMind Data Analytics Consulting
 
Solving Content Chaos
Solving Content ChaosSolving Content Chaos
Solving Content Chaos
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Best Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse QuicklyBest Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse Quickly
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data Stores
 

Similar a Foundations strata sf-2019_final

Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
confluent
 
Linthicum state of-the-art-cloud-platforms
Linthicum state of-the-art-cloud-platformsLinthicum state of-the-art-cloud-platforms
Linthicum state of-the-art-cloud-platforms
David Linthicum
 
Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010
ERwin Modeling
 

Similar a Foundations strata sf-2019_final (20)

Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
 
Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018
 
How to Empower a Platform With a Data Pipeline At a Scale
How to Empower a Platform With a Data Pipeline At a ScaleHow to Empower a Platform With a Data Pipeline At a Scale
How to Empower a Platform With a Data Pipeline At a Scale
 
ADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise AnalyticsADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise Analytics
 
Hadoop Migration Partner Videov2.pptx
Hadoop Migration Partner Videov2.pptxHadoop Migration Partner Videov2.pptx
Hadoop Migration Partner Videov2.pptx
 
Building a Cross Cloud Data Protection Engine
Building a Cross Cloud Data Protection EngineBuilding a Cross Cloud Data Protection Engine
Building a Cross Cloud Data Protection Engine
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced Analytics
 
Dale M. Forrester's Resume... Jan 2016
Dale M. Forrester's Resume... Jan 2016Dale M. Forrester's Resume... Jan 2016
Dale M. Forrester's Resume... Jan 2016
 
Dale M. Forrester's Resume... Jan 2016
Dale M. Forrester's Resume... Jan 2016Dale M. Forrester's Resume... Jan 2016
Dale M. Forrester's Resume... Jan 2016
 
Linthicum state of-the-art-cloud-platforms
Linthicum state of-the-art-cloud-platformsLinthicum state of-the-art-cloud-platforms
Linthicum state of-the-art-cloud-platforms
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Ready, Set, SD-WAN: Best Practices for Assuring Branch Readiness
Ready, Set, SD-WAN: Best Practices for Assuring Branch ReadinessReady, Set, SD-WAN: Best Practices for Assuring Branch Readiness
Ready, Set, SD-WAN: Best Practices for Assuring Branch Readiness
 
Building Business Success from Buzz Words
Building Business Success from Buzz WordsBuilding Business Success from Buzz Words
Building Business Success from Buzz Words
 
IDERA Live | Databases Don't Build and Populate Themselves
IDERA Live | Databases Don't Build and Populate ThemselvesIDERA Live | Databases Don't Build and Populate Themselves
IDERA Live | Databases Don't Build and Populate Themselves
 
Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010
 
Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
 

Más de Jonathan Seidman

Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Jonathan Seidman
 
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Jonathan Seidman
 
Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010
Jonathan Seidman
 

Más de Jonathan Seidman (13)

Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
 
Extending the Data Warehouse with Hadoop - Hadoop world 2011
Extending the Data Warehouse with Hadoop - Hadoop world 2011Extending the Data Warehouse with Hadoop - Hadoop world 2011
Extending the Data Warehouse with Hadoop - Hadoop world 2011
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
 
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
 
Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011
 
Extending the EDW with Hadoop - Chicago Data Summit 2011
Extending the EDW with Hadoop - Chicago Data Summit 2011Extending the EDW with Hadoop - Chicago Data Summit 2011
Extending the EDW with Hadoop - Chicago Data Summit 2011
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011
 
Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 

Último

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
anilsa9823
 

Último (20)

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Foundations strata sf-2019_final