TECHNICAL OVERVIEW:
Pivotal Big Data Suite
Les Klein
Field CTO Data
Pivotal
@LesKlein #PivotalForum #Dubai #DigitalTransformation
Forward Looking Statements
This presentation contains “forward-looking statements” as defined under the Federal Securities Laws. Actual results could differ materially
from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in
general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of
product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including
but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in
VMware, Inc.’s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of
customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth
of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and
achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory;
(xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers and networks; (xiii) our ability to protect
our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed previously and
from time to time in the filings of EMC Corporation, the parent company of Pivotal, with the U.S. Securities and Exchange Commission. EMC
and Pivotal disclaim any obligation to update any such forward-looking statements after the date of this release.
© 2016 Pivotal Software, Inc. All rights reserved.
Pivotal Big Data Suite
Complete platform
Hadoop Native SQL
Deployment options
Based on open source
Flexible licensing
Advanced data services
PIVOTAL GREENPLUM DATABASE: Data warehouse database based on open source Greenplum Database
PIVOTAL HDB: Open source analytical database for Apache Hadoop based on Apache HAWQ
PIVOTAL GEMFIRE: Open source application and transaction data grid based on Apache Geode
Pivotal Big Data Suite
Open source data management portfolio
Great software companies leverage Big Data
to fundamentally change the consumer
experience and pioneer entirely new business
models
Financial Services: $4BN
Hospitality: $26BN
Transportation: $50BN
Entertainment: $54BN
Automotive: $30BN
Industrial Products: $3.2BN
CLOUD NATIVE SOFTWARE IS CHANGING INDUSTRIES
Data is Fueling Software
© Copyright 2015 Pivotal. All rights reserved.
Hundreds of thousands of “trip” events each day
400+ billion viewing-related events per day
Five billion training data points for Price Tip feature
Disruptors Use a LOT of Data
“We’ve found that when a host selects a price that’s within 5% of their tip, they’re nearly 4 times more likely to get booked”
“The importance of accuracy and efficiency […], will continue to rise as we expand and improve products like uberPOOL and beyond.”
“Over 75% of what people watch come from our recommendations”
Data manifests as features in an app
(Data) Microservices: Loosely coupled services architecture, bounded by context
Cloud-Native Platforms: Enabling continuous delivery & automated operations
Open Source Database Innovation: Extreme scale & performance advantages, built for the cloud
Machine Learning: Use of predictive analytics to build smart apps
How are they accomplishing this?
These companies…
Release new features in minutes, multiple times a day
Support a micro-services architecture
Consume a wide range of data sources and protocols
Store and Analyze all their data
Update algorithms and predictive models daily
Continuously ask lots of questions of their data
Modify data pipelines and add processing steps daily
…but most enterprises are not quite there yet
• Applications scalability limited by databases
• Real-time data insights limited by disconnected OLTP and OLAP systems
• Data services are not ready for cloud platforms
[Diagram: App 1, App 2 and App 3 bottleneck on a transactional database; ETL / ELT batches introduce a delay Δt between the transactional database (TRANSACTIONS) and the analytic database (ANALYTICS); continuous delivery]
Stream + Batch Processing
Programming + Operating Model
Cloud-Native Platform
Microservices Framework, Platform Runtime
Hadoop
DW
Spark
Microservices and Polyglot Persistence
IMDG
K/V Store
Relational DB
Big Data &
Machine Learning
Modern Cloud-Native Data Architecture
Cloud Infrastructure
New pressures are breaking fragile systems
Apps scalability limited by scalability of databases
DB scalability limitations are aggravated by additional devices, clients and apps
[Diagram: existing applications, new devices and clients, and new cloud native scalable data apps (App 1, App 2, App 3) all bottleneck on one transactional database]
Scale-out applications vs Scale-up databases
GemFire: Cloud-scale high performance transactional data
• Horizontally scalable
• Ultra fast, low-latency in-memory
transactions
• Fully configurable data consistency
• Reliable eventing and notification model
• Highly Available, auto-healing
• Inter-cluster WAN replication
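The horizontal partitioning and eventing bullets above can be illustrated with a small Python sketch. This is a toy model only, not the GemFire or Apache Geode API: the `Region` class, its `subscribe` method, and the bucket count are hypothetical names chosen for illustration.

```python
import zlib
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Any, Any], None]


class Region:
    """Toy in-memory key-value region (hypothetical, not the GemFire API)."""

    def __init__(self, buckets: int = 4) -> None:
        # Each bucket stands in for a partition hosted on a different node.
        self._buckets: List[Dict[str, Any]] = [{} for _ in range(buckets)]
        self._listeners: List[Listener] = []

    def _bucket(self, key: str) -> Dict[str, Any]:
        # A stable hash decides which partition owns the key.
        return self._buckets[zlib.crc32(key.encode()) % len(self._buckets)]

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, key: str, value: Any) -> None:
        bucket = self._bucket(key)
        old = bucket.get(key)
        bucket[key] = value
        # Push the change to every subscriber instead of waiting for a poll.
        for listener in self._listeners:
            listener(key, old, value)

    def get(self, key: str) -> Any:
        return self._bucket(key).get(key)


events = []
region = Region()
region.subscribe(lambda key, old, new: events.append((key, old, new)))
region.put("trade:1", 100)
region.put("trade:1", 120)
```

After the two writes, `events` holds one notification per update, which is the push-style behaviour the slide contrasts with request-response access.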
[Diagram: custom apps (App 1, App 2) access Pivotal GemFire through the transactional native API or REST / HTTP; GemFire pushes updates back to the apps]
Batch-mode latency prevents real-time analysis
Data Temperature: Hot
Real-time data analytics is limited by data integration batches
Overnight ETL / ELT jobs expose data that is already outdated
[Diagram: App 1, App 2 and App 3 write to a transactional database (TRANSACTIONS); ETL / ELT batches introduce a delay Δt before the data reaches ANALYTICS]
• Analytical processes don’t have access to the latest data
• ETL/ELT processes are expensive and hard to maintain
• Batch process windows limit data scalability
MPP
Cold
Operationalized data insights need an event-driven architecture
Combination of SQL Analytics and NoSQL event-driven transactions is needed
App 1 App 3
App 2
Transactional
Database
TRANSACTIONS ANALYTICS
• Data Insights must be
immediately pushed to
applications
• Apps should be able to react
in real-time to analytical
findings
MPP
Machine Learning
Advanced Analytics
ANSI SQL
APIs /
NoSQL
Data Insights
Data Temperature: Warm, Hot
GemFire and GPDB - Big Data meets Fast Data
[Diagram: custom apps (App 1, App 2) use Pivotal GemFire through the transactional native API or REST / HTTP and receive push updates; data science, analytics & ML tools use Pivotal Greenplum through analytical ANSI SQL; transactional data moves from GemFire to Greenplum by write behind and parallel configurable data load; analytical data moves from Greenplum to the GemFire cache]
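The "write behind" path between the in-memory grid and the analytic database can be sketched as follows. This is a minimal Python illustration under stated assumptions: `WriteBehindCache`, `sink`, and `batch_size` are hypothetical names, and the list `warehouse` merely stands in for the analytic database.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Row = Tuple[str, Any]


class WriteBehindCache:
    """Toy cache: writes land in memory first and reach the sink in batches."""

    def __init__(self, sink: Callable[[List[Row]], None], batch_size: int = 3) -> None:
        self.data: Dict[str, Any] = {}
        self.pending: Deque[Row] = deque()
        self.sink = sink
        self.batch_size = batch_size

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value             # fast in-memory write seen by the apps
        self.pending.append((key, value))  # queued for the analytic store
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        batch = list(self.pending)
        self.pending.clear()
        if batch:
            self.sink(batch)               # one bulk load instead of many single writes


warehouse: List[List[Row]] = []            # stand-in for the analytic database
cache = WriteBehindCache(sink=warehouse.append, batch_size=3)
for i in range(7):
    cache.put(f"order:{i}", i * 10)
cache.flush()                              # drain the remainder
```

The application never waits on the analytic store; it only pays for the in-memory write, while the warehouse receives batched loads.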
Example Use-Case - Predictive Maintenance
• Evaluates live data: “According to historical trends, there’s an 80% chance this equipment would fail in the next 12 hours”
• Learns with historical trends: "How were the temperature and vibration sensors reading when the latest failures happened?"
• Live data becomes historical over time
[Diagram: sensor data feeds a smart system that combines historical and real-time data to take action]
Reference Architecture
[Diagram: sensors send real-time data streams through Spring Cloud Data Flow; Pivotal GemFire serves the client application and evaluates the machine learning model; Pivotal Greenplum Database trains the model and updates GemFire with findings]
• Extensible
• Open-Source
• Fault-Tolerant
• Horizontally Scalable
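The train and evaluate loop in this architecture can be sketched in a few lines of Python. The sketch is illustrative only: the sensor readings are made-up sample data, and the nearest-centroid rule is an assumption chosen for brevity, not the model used in any Pivotal reference implementation.

```python
from statistics import mean
from typing import Dict, List, Tuple

Reading = Tuple[float, float, bool]  # (temperature, vibration, failed)

# Hypothetical historical readings, as they might sit in the analytic database.
history: List[Reading] = [
    (70.0, 0.20, False), (72.0, 0.30, False), (71.0, 0.25, False),
    (95.0, 0.90, True), (98.0, 1.10, True), (93.0, 0.80, True),
]


def train(rows: List[Reading]) -> Dict[bool, Tuple[float, float]]:
    """Batch step: compute one centroid for failed and one for healthy readings."""
    model = {}
    for label in (True, False):
        points = [(t, v) for t, v, failed in rows if failed == label]
        model[label] = (mean(p[0] for p in points), mean(p[1] for p in points))
    return model


def evaluate(model: Dict[bool, Tuple[float, float]], temperature: float, vibration: float) -> bool:
    """Real-time step: flag a live reading that sits closer to the failure centroid."""
    def squared_distance(centroid: Tuple[float, float]) -> float:
        return (temperature - centroid[0]) ** 2 + (vibration - centroid[1]) ** 2
    return squared_distance(model[True]) < squared_distance(model[False])


model = train(history)                      # trained on historical data
alert = evaluate(model, 96.0, 1.0)          # scored against a live reading
healthy = evaluate(model, 70.5, 0.2)
```

Retraining on a schedule and pushing the refreshed `model` to the real-time tier corresponds to the "Update with findings" arrow in the diagram.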
The Cloud Native Transformation
Enabling continuous delivery on the cloud
Cloud Native Platforms
Microservices, DevOps
Developers, Operators
Continuous Delivery
Cloud Native apps are better suited to NoSQL
Enabling fast and scalable event-driven data services
• Unidirectional, request-response SQL: monolithic apps needed complex schema-based SQL databases
• Bidirectional, event-driven APIs: microservices need much simpler schemas, but much better scalability
Data Services must be integrated with your cloud platform
Enabling Agile Ops and DevOps
Applications, Services
Pivotal Cloud Foundry
GemFire for Pivotal Cloud Foundry
Lightning fast in-memory persistence for cloud native apps
• One-click provisioning
• Pre-packaged configuration
• Embedded monitoring by Pulse
• Auto application binding
• Multi-cloud support
• Reliable data replication between PCF
sites
Pivotal GemFire: Click to Deploy
Next-generation databases must keep up with cloud native apps
Can your database do all of this? GemFire IMDG DOES.
• Cloud-ready, infrastructure agnostic
• Horizontal Scalability
• Automatic fail-over
• Reliable eventing model
• Multi-site High Availability
• Seamless integration with analytical databases
Commercial distribution
• Enterprise Support Subscription
• Managed service on Pivotal Cloud Foundry
• Indemnification
Apache License
• Replication and partitioning
• Shared-nothing persistence
• High performance transactions
• OQL and Indexes
• Distributed functions
• Continuous query and event subscription
• Clustering and high availability
• Configurable WAN gateway
Based on open source: Apache Geode (incubating)
GemFire
Pivotal Greenplum
World’s First Open Source Massively Parallel Data Warehouse
• Relational database system for big data and data warehousing
• Mission critical & system of record product with supporting tools and ecosystem
• Fully open source with a global community of developers and users
• Large industrial focused system
• PostgreSQL based
• Multi-platform technology
Greenplum Database Mission & Strategy
•Government
– Tax & benefits fraud detection
– Economic statistics research
•Financial Services
– Wealth management data science and product
development for Commercial Banking
– Risk and trade repositories reporting
– 401K providers analytics on investment choices
•Pharmaceutical
– Vaccine potency prediction based on
manufacturing sensors
•IoT
– Predictive maintenance for auto manufacturer,
industrial equipment and government agencies
– Semiconductor Fab sensor analytics and reporting
Highlighted Greenplum Successes
•Cyber Security & Surveillance
– Internal email and communication surveillance and
reporting
– Corporate network anomalous behavior and
intrusion detections
•Oil & Gas
– Drilling equipment predictive maintenance
•Communications
– Mobile telephone company enterprise data
warehouse
– Network performance and availability analytics
•Retail
– Customer purchases analytics
•Transportation
– Airlines loyalty program analytics
Greenplum Overview
SYSTEM ACCESS
• CLIENT ACCESS: PSQL, ODBC, JDBC
• BULK LOAD/UNLOAD: GPLoad, GPFdist, External Tables, GPHDFS
• ADMIN TOOLS: GP Perfmon, GP Support
• 3rd PARTY TOOLS: Compatible with Industry Standard BI & ETL Tools
DATA PROCESSING
• SQL STANDARD COMPLIANCE
• MASSIVELY PARALLEL PROCESSING (MPP)
• IN-DATABASE PROGRAMMING LANGUAGES: PL/pgSQL, PL/Python, PL/R, PL/Perl, PL/Java, PL/C
• IN-DATABASE ANALYTICS & EXTENSIONS: MADlib, PostGIS, PGCrypto
• BIG DATA QUERY OPTIMIZER
DATA STORAGE
• POLYMORPHIC STORAGE: HEAP, Append Only, Columnar, External, Compression
• MULTI-VERSION CONCURRENCY CONTROL (MVCC)
• FULLY ACID COMPLIANT TRANSACTIONAL DATABASE
• INDEXES: B-Tree, Bitmap, GiST
• An ambitious project
• 10+ years in the making
• Investment of hundreds of millions of dollars
• Potential to define a new market and disrupt traditional EDW vendors
• www.greenplum.org
• GitHub code
• mailing lists / community engagement
• Global project w/ external contributors
• Pivotal Greenplum
• Enterprise software distribution & release management
• Pivotal expertise
Greenplum Open Source
PostgreSQL Heritage
Greenplum Open
Source Launch
• Widely used
• Open Source
• PostgreSQL License
• Enterprise class open source relational engine
MPP Shared Nothing Architecture
Flexible framework for processing large datasets
…
• Master Host and Standby Master Host: the master coordinates work with the Segment Hosts
• Segment Host with one or more Segment Instances: Segment Instances process queries in parallel
• Segment Hosts have their own CPU, disk and memory (shared nothing)
• High speed interconnect for continuous pipelining of data processing
[Diagram: SQL enters the Master Host; the Interconnect links it to Segment Hosts node1, node2, node3 ... nodeN, each running four Segment Instances]
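The master and segment roles described on this slide follow a scatter-gather pattern, which the Python sketch below imitates. It is a toy under stated assumptions: threads stand in for segment instances, the in-memory lists stand in for each segment's private storage, and none of the function names come from Greenplum.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, amount)

# Each inner list is the private data of one segment (shared nothing).
segments: List[List[Row]] = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("north", 1)],
]


def segment_scan(rows: List[Row]) -> Dict[str, int]:
    """Runs on a segment: aggregate only the locally stored rows."""
    partial: Dict[str, int] = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def master_query(segs: List[List[Row]]) -> Dict[str, int]:
    """Runs on the master: dispatch to all segments, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(segs)) as pool:
        partials = list(pool.map(segment_scan, segs))
    total: Dict[str, int] = {}
    for partial in partials:
        for region, amount in partial.items():
            total[region] = total.get(region, 0) + amount
    return total


result = master_query(segments)  # like SELECT region, SUM(amount) ... GROUP BY region
```

Only the small partial aggregates travel back to the master, which is why adding segments adds scan capacity without making the master a bottleneck for this kind of query.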
[Diagram: external sources (SQL, ETL, file systems) load and stream data over the network interconnect; master servers handle query planning & dispatch; segment servers handle query processing & data storage]
Fast Parallel Load & Unload
• No Master Node bottleneck
• 10+ TB/Hour per Rack
• Linear scalability
Low Latency
• Data immediately available
• No intermediate stores
• No data “reorganization”
Load/Unload To & From:
• File Systems
• Any ETL Product
• Hadoop & Amazon S3
Loading: Massively-Parallel Ingest
Extreme speed and immediate usability from files, ETL, Hadoop & S3
Pivotal Query Optimizer
Turns a SQL query into an execution plan
• First Cost Based Optimizer for BIG data
• Applies broad set of optimization strategies at once
– Considers many more plan alternatives
– Optimizes a wider range of queries
– Optimizes memory usage
• New Extensible Code Base
– Rapid adoption of emerging technologies
• Significant improvements for demanding queries
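What "cost-based" means can be shown with a deliberately tiny Python sketch: enumerate candidate plans, estimate a cost for each, and keep the cheapest. Everything here is an assumption for illustration (the table sizes, the fixed `SELECTIVITY`, and the cost formula); the real Pivotal Query Optimizer uses far richer statistics and search strategies.

```python
from itertools import permutations
from typing import Tuple

# Hypothetical table statistics (row counts).
table_rows = {"orders": 1_000_000, "customers": 50_000, "regions": 20}
SELECTIVITY = 0.001  # assumed fraction of row pairs that survive each join


def plan_cost(order: Tuple[str, ...]) -> float:
    """Cost of a left-deep join plan: the sum of its intermediate result sizes."""
    size = float(table_rows[order[0]])
    cost = 0.0
    for name in order[1:]:
        size = size * table_rows[name] * SELECTIVITY
        cost += size
    return cost


# Consider every join order and keep the cheapest one.
best = min(permutations(table_rows), key=plan_cost)
naive_cost = plan_cost(("orders", "customers", "regions"))
best_cost = plan_cost(best)
```

Under these assumed numbers the cheapest plans join the two small tables first and bring in `orders` last, which keeps the intermediate result small.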
Rule based query management to monitor and manage queries and resource queues
•Monitors Greenplum Database queries and host utilization statistics
•Logs when a query exceeds a threshold
•Throttles the CPU usage of a query when it exceeds a threshold
•Terminates a query
•Detects memory, CPU, or disk I/O skew occurring during the execution of a query
•Creates detailed rules to manage queries
•Adds, modifies, or deletes Greenplum Database resource queues
Greenplum Workload Manager
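Rule-based query management of the kind listed above can be pictured as a set of condition and action pairs evaluated against query statistics. The Python sketch below is illustrative only: the `Rule` class, the metric names, and the thresholds are hypothetical and do not reflect the Workload Manager's actual rule syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

QueryStats = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[QueryStats], bool]
    action: str  # "log", "throttle", or "terminate"


# Hypothetical rules with made-up thresholds.
rules: List[Rule] = [
    Rule("long_running", lambda q: q["runtime_s"] > 3600, "terminate"),
    Rule("cpu_hog", lambda q: q["cpu_pct"] > 80, "throttle"),
    Rule("slow", lambda q: q["runtime_s"] > 600, "log"),
]


def matching_actions(query: QueryStats) -> List[str]:
    """Return the action of every rule whose threshold the query exceeds."""
    return [rule.action for rule in rules if rule.condition(query)]


busy = matching_actions({"runtime_s": 700, "cpu_pct": 90})
quiet = matching_actions({"runtime_s": 10, "cpu_pct": 5})
```

A monitoring loop would call `matching_actions` on each sampling interval and apply the returned actions to the running query.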
Polymorphic Storage™
User Definable Storage Layout
• Columnar storage compresses better
• Optimized for retrieving a subset of the columns when querying
• Compression can be set differently per column: gzip (1-9), quicklz, delta, RLE
• Row oriented faster when returning all columns
• HEAP for many updates and deletes
• Use indexes for drill through queries
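Why a columnar layout compresses better and serves column subsets cheaply can be seen in a short Python sketch. The sample rows are made up, and the `rle` helper is a textbook run-length encoder written for illustration, not Greenplum's storage code.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Row-oriented layout: each tuple holds every column of one record.
rows = [
    ("2016-01", "east", 10),
    ("2016-01", "east", 12),
    ("2016-01", "west", 7),
    ("2016-02", "west", 9),
]

# Column-oriented layout: one contiguous list per column.
names = ("month", "region", "amount")
columns: Dict[str, List[Any]] = {
    name: [row[i] for row in rows] for i, name in enumerate(names)
}


def rle(values: List[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode a column: repeated neighbours collapse to (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


encoded_month = rle(columns["month"])    # long runs compress well
encoded_region = rle(columns["region"])
total_amount = sum(columns["amount"])    # touches one column, not whole rows
```

Values within one column tend to repeat, so run-length encoding shrinks them, while a query such as a sum over `amount` reads a single list. Returning whole records is where the row layout wins.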
[Diagram: TABLE ‘SALES’ split into monthly partitions (Jun, Oct) and yearly partitions (Year -1, Year -2), stored row-oriented or column-oriented]
External HDFS or S3
• Keep less accessed partitions on external storage and seamlessly query all data
• All major Hadoop distributions
• Amazon S3 storage
• Others in development
Jul Aug Sep Nov Dec
Partitions and External Partitions
[Diagram: a parent table with monthly partitions from Jan 2013 to Dec 2014, some of them external]
• Hash Distribution to evenly spread data across all segment instances
• Range Partition within a segment instance to minimize scan work
• Partitioned Tables Support for External Tables as a Partition
– Readable external table
– Host file system, NFS mount, HDFS or Amazon S3
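The combination of hash distribution across segments and range partitioning within a segment can be sketched in Python as follows. This is a toy model: the CRC32 hash, the segment count, and the month-per-partition scheme are assumptions for illustration, not Greenplum's actual distribution function or DDL.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Row = Tuple[str, str, int]  # (customer, month, amount)
NUM_SEGMENTS = 4


def segment_for(key: str) -> int:
    """Hash distribution: the distribution key decides which segment owns a row."""
    return zlib.crc32(key.encode()) % NUM_SEGMENTS


# storage[segment][partition] -> rows; partitions are one calendar month each.
storage: DefaultDict[int, DefaultDict[str, List[Row]]] = defaultdict(
    lambda: defaultdict(list)
)

rows: List[Row] = [(f"cust{i}", f"2014-{(i % 12) + 1:02d}", i) for i in range(120)]
for customer, month, amount in rows:
    storage[segment_for(customer)][month].append((customer, month, amount))


def scan(month: str) -> List[Row]:
    """Partition pruning: every segment reads only the partition for `month`."""
    found: List[Row] = []
    for partitions in storage.values():
        found.extend(partitions.get(month, []))
    return found


feb = scan("2014-02")
stored = sum(len(p) for partitions in storage.values() for p in partitions.values())
```

Distribution spreads the work across segments, while partitioning lets each segment skip the eleven months the query does not ask for.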
Hybrid Queries: Pivotal External Tables
4.3
• Readable Ext-Table MVP
• Readable Gzip Files
• Writable Ext-Table
• Investigation: Enhanced Security/Roles
• Investigation: Additional File Formats
S3 External Tables
GemFire External Tables
• Hi Speed Ingestion
• Hi Concurrency Query Cache
GPHDFS
Roadmap
Greenplum Database Features for Data Scientists
• Window functions: Perform calculations across a set of table rows that are somehow related to the current row
• Analytics extensions: In-database machine learning at scale using MADlib
• Procedural language extensions: Extended functionality using non-SQL programming languages and packages (e.g. Python and R)
• Client Access: ODBC and JDBC access to support connections to 3rd party tools
* Only a subset of Greenplum Database features
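To make the window function bullet concrete, the Python sketch below computes a running total per region, which is similar in spirit to `SUM(amount) OVER (PARTITION BY region ORDER BY month)` in SQL. The sample data and the `running_total` helper are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Sale = Tuple[str, str, int]  # (region, month, amount)

sales: List[Sale] = [
    ("east", "2016-01", 10),
    ("east", "2016-02", 15),
    ("west", "2016-01", 7),
    ("east", "2016-03", 5),
    ("west", "2016-02", 3),
]


def running_total(rows: List[Sale]) -> List[Tuple[str, str, int, int]]:
    """For each row, add the sum of amounts so far within the same region."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))            # PARTITION BY, ORDER BY
    for region, group in groupby(ordered, key=itemgetter(0)):
        total = 0
        for _, month, amount in group:
            total += amount                                  # window frame grows row by row
            out.append((region, month, amount, total))
    return out


result = running_total(sales)
```

Unlike a `GROUP BY`, every input row survives in the output; the window only adds a value computed from its related rows.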
Procedural Languages
• User Defined Types
• User Defined Functions
• User Defined Aggregates
• Import of libraries from open source
Scalable, In-Database
Machine Learning
• Open source https://github.com/apache/incubator-madlib
• Downloads and docs http://madlib.incubator.apache.org/
• Wiki https://cwiki.apache.org/confluence/display/MADLIB/
Functions
Linear Systems
• Sparse and Dense Solvers
• Linear Algebra
Matrix Factorization
• Singular Value Decomposition (SVD)
• Low Rank
Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Ordinal Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Robust Variance (Huber-White),
Clustered Variance, Marginal Effects
Other Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Apriori)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Random Forest
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
• Naïve Bayes
• Support Vector Machines (SVM)
Descriptive Statistics
Sketch-Based Estimators
• CountMin (Cormode-Muth.)
• FM (Flajolet-Martin)
• MFV (Most Frequent Values)
Correlation and Covariance
Summary
Utility Modules
Array and Matrix Operations
Sparse Vectors
Random Sampling
Probability Functions
Data Preparation
PMML Export
Conjugate Gradient
Stemming
Inferential Statistics
Hypothesis Tests
Time Series
• ARIMA
April 2016
Path Functions
• Operations on Pattern Matches
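Of the functions listed above, the CountMin sketch-based estimator is compact enough to illustrate directly. The Python class below is a textbook Count-Min sketch written for this overview; it is not MADlib's implementation, and the `width`, `depth`, and SHA-256 based hashing are assumptions chosen for simplicity.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Approximate frequency counter using a fixed-size table of counters."""

    def __init__(self, width: int = 64, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # A different hash per row, derived by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch()
for word in ["a"] * 5 + ["b"] * 2 + ["c"]:
    cms.add(word)
estimate_a = cms.estimate("a")
```

The estimate never falls below the true count and uses memory that is independent of the number of distinct items, which is what makes sketches attractive for very large tables.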
GPDB Geospatial
Current Key Features:
• Points, Lines, Polygons,
Perimeter, Area, Intersection,
Contains, Distance, Long/Lat,
Spatial Indexes & Bounding Boxes
Round earth calculations
Ability to store geospatial data and query with joins and operators
Raster Image
Processing
Pivotal HDB
Hadoop Native SQL Database
Enabling data science and machine learning at scale
Making the Hadoop Data Lake More Consumable
1) Important people and tools are cut off because of SQL completeness or performance.
2) Data scientists still have to resort to sampling if they can't run analytics in-database at scale
3) There are multiple data sets and formats within Hadoop
[Diagram: business analysts (SQL app) and data scientists on top of a data lake holding Hive, HBase, etc.]
As the lingua franca of analytics, SQL can't be ignored. Neither can performance.
Lack of interactive, ANSI SQL capabilities inhibits adoption and value
Hadoop data lakes sit underutilized
Producing complex queries, large
joins, interactive queries
Existing investments in
visualization and BI tools
Large population of users
with SQL skills
DATA LAKE
DATA SCIENTISTS
BUSINESS ANALYSTS
SQL App
High performance, interactive SQL queries on Hadoop
HDB: The Hadoop Native SQL Database
● Highly efficient MPP
(massively parallel processing)
● Low-latency
● Petabyte scalability
● ACID transaction support
● SQL-92, 99, 2003 compatibility
● Advanced cost-based optimizer
DATA LAKE
SQL App
BUSINESS ANALYSTS
DATA SCIENTISTS
Integrate SQL and data science tools into an interactive, operationalized environment
Using traditional, single-node Python or R for analytics means using subsets because of the lack of parallelization
Predictive analytics not scaling with Python or R
<...>
Implications
• Time-consuming data movement
• Working with small sample sizes
requires extra testing cycles
against larger data sets
• Slow feature generation limits
algorithm development
DATA LAKE
DATA LAKE
DATA LAKE
SAMPLE 1
SAMPLE 2
SAMPLE n
Apache™ MADlib® (incubating) is an open-source library for scalable in-database analytics
In-database analytics speeds predictive modeling
Scale-out mathematical, statistical and
machine learning methods for structured
and unstructured data
• SQL-based
• Analyze without sampling
• Open source
• Runs on HDB, Greenplum, and
Postgres
• Complements support for procedural languages: PL/R, PL/Python, PL/Java
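"Analyze without sampling" works because many models can be fitted from small per-segment summaries that are merged afterwards. The Python sketch below fits a one-variable linear regression that way. It is an illustration of the general aggregate pattern only: the function names, the sample data, and the three-step split are assumptions, not MADlib's code.

```python
from functools import reduce
from typing import List, Tuple

Point = Tuple[float, float]
State = Tuple[float, float, float, float, float]  # n, sum x, sum y, sum x*x, sum x*y


def transition(rows: List[Point]) -> State:
    """Per-segment step: reduce local rows to five running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge(a: State, b: State) -> State:
    """Combine two partial states; only these small tuples cross the network."""
    return tuple(p + q for p, q in zip(a, b))  # type: ignore[return-value]


def finalize(state: State) -> Tuple[float, float]:
    """Master step: ordinary least squares from the merged sums."""
    n, sx, sy, sxx, sxy = state
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Made-up data spread over three segments; every point lies on y = 2x + 1.
segments: List[List[Point]] = [[(1, 3), (2, 5)], [(3, 7)], [(4, 9), (5, 11)]]
slope, intercept = finalize(reduce(merge, (transition(s) for s in segments)))


def predict(x: float) -> float:
    return slope * x + intercept
```

Training reads every row exactly once where it is stored, and prediction for new data is a cheap function of the fitted coefficients.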
Train a
model...
Predict for new data...
DATA LAKE
Overcome complexity
[Diagram: Schema Read and Data Read paths]
HDB’s Pivotal eXtension Framework (PXF) and HCatalog integration
Simplifying the data lake with data federation
• Enables connectivity between
Pivotal HDB and other stores
(Hive, HBase, HDFS files).
• Provides an extensible
framework to add support for
custom services
• Low latency on large data sets
• Considers cost model of
federated sources
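An extensible federation layer of this kind usually boils down to a registry of format-specific readers behind one scan interface. The Python sketch below shows that idea with two toy connectors. All names (`CONNECTORS`, `connector`, `external_scan`) are hypothetical and unrelated to the real PXF API, which is a Java framework.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Reader = Callable[[str], List[dict]]
CONNECTORS: Dict[str, Reader] = {}


def connector(fmt: str) -> Callable[[Reader], Reader]:
    """Register a reader for one external format; new formats plug in the same way."""
    def register(fn: Reader) -> Reader:
        CONNECTORS[fmt] = fn
        return fn
    return register


@connector("csv")
def read_csv(text: str) -> List[dict]:
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@connector("jsonl")
def read_jsonl(text: str) -> List[dict]:
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def external_scan(fmt: str, text: str) -> List[dict]:
    """Single entry point the query layer calls, whatever the source format."""
    return CONNECTORS[fmt](text)


rows = external_scan("csv", "id,name\n1,ann\n2,bob\n") + external_scan(
    "jsonl", '{"id": "3", "name": "cy"}\n'
)
```

The query layer sees uniform rows, so supporting a custom store means registering one more reader rather than changing the engine.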
HDFS DATA LAKE
HCatalog
CSV TXT Avro
Custom
Extensions
SQL Queries
INFORMATION
IN CONTEXT
List of customers at high risk of
churn, ranked by dollars at risk
Recommend savings goals with
portfolio projections
Optimize routine maintenance
schedule, based on predicted failures
Shopping recommendation,
based on recent purchases
Making the Hadoop Data Lake More Consumable
Delivering information in context
CUSTOMER
APP
Providing information in context with the right architecture and the right algorithms
HDB as part of an architecture: Next Likely Purchase
INTERNAL
APP
PURCHASE
NEXT OFFER
REAL-TIME VIEW OF
TRANSACTIONS AND OFFERS
TRANSACTIONS
PMML
Model Creation &
Training
HDB Tables
HDFS Staging
1. Ingest, transform, and land data into HDFS
2. Score streaming data and serve to
application
DATA SCIENCE &
AD HOC QUERIES
REPORTS
Advanced Analytics
Performance
Exceptional MPP performance, low latency,
petabyte scalability, ACID reliability, fault tolerance
Most Complete
Language Compliance
Higher degree of SQL compatibility, SQL-92, 99,
2003, OLAP, leverage existing SQL skills
Advanced Query
Optimizer
Maximize performance and
do advanced queries with confidence
Elastic Architecture for
Scalability
Scale-up/down or scale-in/out, expand/shrink
clusters on the fly
Integrated w/MADlib
Machine Learning
Advanced MPP analytics, data science at scale,
directly on Hadoop data
MAD
Pivotal HDB Advantages
“Companies need to learn how to catch people or things in the act of doing something and affect the outcome”
PAUL MARITZ
Executive Chairman, Pivotal
Real-time and
Personalised Information
in Context is what Wins!
Pivotal Big Data Suite Technical Overview

Más contenido relacionado

La actualidad más candente

Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalDiego Alberto Tamayo
 
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...Yahoo Developer Network
 
The Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackThe Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackSnapLogic
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020Adam Doyle
 
Oracle Solaris Build and Run Applications Better on 11.3
Oracle Solaris  Build and Run Applications Better on 11.3Oracle Solaris  Build and Run Applications Better on 11.3
Oracle Solaris Build and Run Applications Better on 11.3OTN Systems Hub
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksHortonworks
 
How Apache Spark and Apache Hadoop are being used to keep banking regulators ...
Último (20)

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

Pivotal Big Data Suite Technical Overview

  • 1.
  • 2. TECHNICAL OVERVIEW: Pivotal Big Data Suite Les Klein Field CTO Data Pivotal @LesKlein #PivotalForum #Dubai #DigitalTransformation
  • 3. Forward Looking Statements This presentation contains “forward-looking statements” as defined under the Federal Securities Laws. Actual results could differ materially from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in VMware, Inc.’s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory; (xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers and networks; (xiii) our ability to protect our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed previously and from time to time in the filings of EMC Corporation, the parent company of Pivotal, with the U.S. Securities and Exchange Commission. EMC and Pivotal disclaim any obligation to update any such forward-looking statements after the date of this release.
  • 4. Pivotal Big Data Suite: open source data management portfolio. Complete platform; Hadoop Native SQL; deployment options; based on open source; flexible licensing; advanced data services. PIVOTAL GREENPLUM DATABASE: data warehouse database based on open source Greenplum Database. PIVOTAL HDB: open source analytical database for Apache Hadoop, based on Apache HAWQ. PIVOTAL GEMFIRE: open source application and transaction data grid, based on Apache Geode. © 2016 Pivotal Software, Inc. All rights reserved.
  • 5. Great software companies leverage Big Data to fundamentally change the consumer experience and pioneer entirely new business models
  • 6. CLOUD NATIVE SOFTWARE IS CHANGING INDUSTRIES: Data is Fueling Software. $4BN Financial Services; $26BN Hospitality; $50BN Transportation; $54BN Entertainment; $30BN Automotive; $3.2BN Industrial Products.
  • 7. Disruptors Use a LOT of Data: hundreds of thousands of “trip” events each day; 400+ billion viewing-related events per day; five billion training data points for the Price Tip feature. © Copyright 2015 Pivotal. All rights reserved.
  • 8. Data manifests as features in an app. “We’ve found that when a host selects a price that’s within 5% of their tip, they’re nearly 4 times more likely to get booked.” “The importance of accuracy and efficiency […], will continue to rise as we expand and improve products like uberPOOL and beyond.” “Over 75% of what people watch come from our recommendations.”
  • 9. How are they accomplishing this? (Data) Microservices: loosely coupled services architecture, bounded by context. Cloud-Native Platforms: enabling continuous delivery & automated operations. Open Source Database Innovation: extreme scale & performance advantages, built for the cloud. Machine Learning: use of predictive analytics to build smart apps.
  • 10. These companies… • Release new features in minutes, multiple times a day • Support a microservices architecture • Consume a wide range of data sources and protocols • Store and analyze all their data • Update algorithms and predictive models daily • Continuously ask lots of questions of their data • Modify data pipelines and add processing steps daily
  • 11. …but most enterprises are not quite there yet. • Application scalability is limited by databases • Real-time data insights are limited by disconnected OLTP and OLAP systems • Data services are not ready for cloud platforms. [Diagram labels: App 1, App 2, App 3; Bottleneck; Transactional Database; ETL / ELT Batches (Δt); TRANSACTIONS; ANALYTICS; Analytic Database; Continuous Delivery.]
  • 12. Modern Cloud-Native Data Architecture. [Diagram labels: Cloud Infrastructure; Cloud-Native Platform; Platform Runtime; Microservices Framework; Programming + Operating Model; Microservices and Polyglot Persistence; IMDG; K/V Store; Relational DB; Big Data & Machine Learning; Hadoop; DW; Spark; Stream + Batch Processing.]
  • 13. New pressures are breaking fragile systems. • Application scalability is limited by databases • Real-time data insights are limited by disconnected OLTP and OLAP systems • Data services are not ready for cloud platforms. [Diagram labels: App 1, App 2, App 3; Bottleneck; Transactional Database; ETL / ELT Batches (Δt); TRANSACTIONS; ANALYTICS; Analytic Database; Continuous Delivery.]
  • 14. Application scalability is limited by the scalability of databases: scale-out applications vs. scale-up databases. DB scalability limitations are aggravated by additional devices, clients and apps. [Diagram labels: Existing Applications (App 1, App 2, App 3); New devices and clients; New cloud native scalable data apps; Bottleneck; Transactional Database.]
  • 15. GemFire: cloud-scale, high-performance transactional data. • Horizontally scalable • Ultra-fast, low-latency in-memory transactions • Fully configurable data consistency • Reliable eventing and notification model • Highly available, auto-healing • Inter-cluster WAN replication. [Diagram labels: Custom Apps (App 1, App 2); Transactional Native API; REST / HTTP; Push Updates; Pivotal GemFire.]
  • 16. Batch-mode latency prevents real-time analysis. • Application scalability is limited by databases • Real-time data insights are limited by disconnected OLTP and OLAP systems • Data services are not ready for cloud platforms. [Diagram labels: App 1, App 2, App 3; Bottleneck; Transactional Database; ETL / ELT Batches (Δt); TRANSACTIONS; ANALYTICS; Analytic Database; Continuous Delivery.]
  • 17. Real-time data analytics is limited by data integration batches. Overnight ETL / ELT jobs expose data that is already outdated. • Analytical processes don’t have access to the latest data • ETL/ELT processes are expensive and hard to maintain • Batch process windows limit data scalability. [Diagram labels: App 1, App 2, App 3; Transactional Database (TRANSACTIONS); ETL / ELT Batches (Δt); MPP (ANALYTICS); Data Temperature: Hot to Cold.]
  • 18. Operationalized data insights need an event-driven architecture. A combination of SQL analytics and NoSQL event-driven transactions is needed. • Data insights must be immediately pushed to applications • Apps should be able to react in real time to analytical findings. [Diagram labels: App 1, App 2, App 3; Transactional Database (TRANSACTIONS); APIs / NoSQL; Data Insights; MPP (ANALYTICS); ANSI SQL; Machine Learning; Advanced Analytics.]
  • 19. GemFire and GPDB: Big Data meets Fast Data. [Diagram labels: Custom Apps (App 1, App 2); Transactional Native API; REST / HTTP; Push Updates; Pivotal GemFire; Data science, analytics & ML; Analytical ANSI SQL; Pivotal Greenplum; Parallel Configurable Data Load; Transactional data: write behind; Analytical data: to cache; Data Temperature: Warm, Hot.]
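The "write behind" label on this slide can be illustrated with a minimal, generic sketch. It does not use the GemFire or Greenplum APIs: the `WriteBehindCache` class, the `batch_size` parameter and the plain dictionary standing in for the analytical database are all hypothetical names chosen for illustration.

```python
from collections import deque


class WriteBehindCache:
    """In-memory store that acknowledges writes immediately and
    applies them to a slower backing store later, in batches."""

    def __init__(self, backing_store, batch_size=3):
        self.data = {}                      # fast in-memory view (stands in for the data grid)
        self.pending = deque()              # writes not yet applied to the backing store
        self.backing_store = backing_store  # stands in for the analytical database
        self.batch_size = batch_size

    def put(self, key, value):
        # The write is visible to readers as soon as memory is updated.
        self.data[key] = value
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def get(self, key):
        return self.data.get(key)

    def flush(self):
        # Drain queued writes to the backing store in arrival order.
        while self.pending:
            key, value = self.pending.popleft()
            self.backing_store[key] = value


warehouse = {}
cache = WriteBehindCache(warehouse, batch_size=3)
cache.put("order:1", 10)
cache.put("order:2", 20)
reads_before_flush = (cache.get("order:1"), len(warehouse))
cache.put("order:3", 30)  # third write reaches the batch size and triggers a flush
```

In this sketch the application reads its own writes from memory right away, while the backing store only catches up when a batch is flushed. A real deployment would also need durability and failure handling for the pending queue, which this example leaves out.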
  • 20. Example Use Case: Predictive Maintenance. Sensor data feeds a smart system that takes action. Real-time: evaluates live data (“According to historical trends, there’s an 80% chance this equipment would fail in the next 12 hours”). Historical: learns with historical trends ("How were the temperature and vibration sensors reading when the latest failures happened?"). Live data becomes historical over time.
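The train-on-history, evaluate-on-live-data loop described on this slide can be sketched with a deliberately simple model. The nearest-centroid rule, the `train` and `evaluate` functions and the sample readings below are assumptions made for illustration; the slide does not say which algorithm is used.

```python
from statistics import mean


def centroid(rows):
    """Average (temperature, vibration) of a list of readings."""
    return (mean(r[0] for r in rows), mean(r[1] for r in rows))


def train(history):
    """history: list of (temperature, vibration, failed) tuples."""
    failed = [(t, v) for t, v, f in history if f]
    healthy = [(t, v) for t, v, f in history if not f]
    return {"failed": centroid(failed), "healthy": centroid(healthy)}


def evaluate(model, reading):
    """Return True when a live reading is closer to the failure centroid."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return dist2(reading, model["failed"]) < dist2(reading, model["healthy"])


# Made-up historical readings: (temperature, vibration, failed).
history = [(70, 0.2, False), (72, 0.3, False), (95, 0.9, True), (98, 1.1, True)]
model = train(history)

at_risk = evaluate(model, (93, 0.8))      # live reading near past failures
looks_fine = evaluate(model, (73, 0.25))  # live reading near healthy operation

# Live data becomes historical over time: once the outcome is known,
# append it and retrain.
history.append((93, 0.8, True))
model = train(history)
```

The split mirrors the slide: `train` would run where the historical data lives, and `evaluate` would run against each incoming reading.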
  • 21. Reference Architecture. • Extensible • Open-Source • Fault-Tolerant • Horizontally Scalable. [Diagram labels: Sensors; Real-time data streams; Spring Cloud Data Flow; Pivotal GemFire; Pivotal Greenplum Database; Machine learning model; Evaluate; Train; Update with findings; Client application.]
  • 22. …but most enterprises are not quite there yet. • Application scalability is limited by databases • Real-time data insights are limited by disconnected OLTP and OLAP systems • Data services are not ready for cloud platforms. [Diagram labels: App 1, App 2, App 3; Bottleneck; Transactional Database; ETL / ELT Batches (Δt); TRANSACTIONS; ANALYTICS; Analytic Database; Continuous Delivery.]
  • 23. The Cloud Native Transformation: enabling continuous delivery on the cloud. [Diagram labels: Cloud Native Platforms; Microservices; DevOps; Developers; Operators; Continuous Delivery.]
  • 24. Cloud Native apps are better suited to NoSQL: enabling fast and scalable event-driven data services. Unidirectional, request-response SQL: monolithic apps needed complex, schema-based SQL databases. Bidirectional, event-driven APIs: microservices need much simpler schemas, but much better scalability.
  • 25. Data Services must be integrated with your cloud platform: enabling Agile Ops and DevOps. [Diagram labels: Applications; Services.]
  • 26. GemFire for Pivotal Cloud Foundry: lightning-fast in-memory persistence for cloud native apps. • One-click provisioning • Pre-packaged configuration • Embedded monitoring by Pulse • Auto application binding • Multi-cloud support • Reliable data replication between PCF sites. [Diagram labels: Pivotal GemFire; Click to Deploy; Pivotal Cloud Foundry.]
  • 27. Next-generation databases must keep up with cloud native apps. Can your database do all of this? GemFire IMDG DOES. • Cloud-ready, infrastructure agnostic • Horizontal scalability • Automatic fail-over • Reliable eventing model • Multi-site high availability • Seamless integration with analytical databases
  • 28. GemFire, based on open source: Apache Geode (incubating). Apache License: • Replication and partitioning • Shared-nothing persistence • High performance transactions • OQL and Indexes • Distributed functions • Continuous query and event subscription • Clustering and high availability • Configurable WAN gateway. Commercial distribution: • Enterprise Support Subscription • Managed service on Pivotal Cloud Foundry • Indemnification
  • 29. Pivotal Greenplum: World’s First Open Source Massively Parallel Data Warehouse
  • 30. Greenplum Database Mission & Strategy • Relational database system for big data and data warehousing • Mission critical & system of record product with supporting tools and ecosystem • Fully open source with a global community of developers and users • Large industrial focused system • PostgreSQL based • Multi-platform technology
  • 31. Highlighted Greenplum Successes • Government – Tax & benefits fraud detection – Economic statistics research • Financial Services – Wealth management data science and product development for Commercial Banking – Risk and trade repositories reporting – 401K providers analytics on investment choices • Pharmaceutical – Vaccine potency prediction based on manufacturing sensors • IoT – Predictive maintenance for auto manufacturer, industrial equipment and government agencies – Semiconductor Fab sensor analytics and reporting • Cyber Security & Surveillance – Internal email and communication surveillance and reporting – Corporate network anomalous behavior and intrusion detection • Oil & Gas – Drilling equipment predictive maintenance • Communications – Mobile telephone company enterprise data warehouse – Network performance and availability analytics • Retail – Customer purchases analytics • Transportation – Airlines loyalty program analytics
  • 32. Greenplum Overview. Greenplum DB layers: SYSTEM ACCESS; DATA PROCESSING; DATA STORAGE. Components: CLIENT ACCESS: PSQL, ODBC, JDBC. BULK LOAD/UNLOAD: GPLoad, GPFdist, External Tables, GPHDFS. ADMIN TOOLS: GP Perfmon, GP Support. 3rd PARTY TOOLS: compatible with industry standard BI & ETL tools. SQL STANDARD COMPLIANCE. MASSIVELY PARALLEL PROCESSING (MPP). BIG DATA QUERY OPTIMIZER. IN-DATABASE PROGRAMMING LANGUAGES: PL/pgSQL, PL/Python, PL/R, PL/Perl, PL/Java, PL/C. IN-DATABASE ANALYTICS & EXTENSIONS: MADlib, PostGIS, PGCrypto. POLYMORPHIC STORAGE: HEAP, Append Only, Columnar, External, Compression. MULTI-VERSION CONCURRENCY CONTROL (MVCC). FULLY ACID COMPLIANT TRANSACTIONAL DATABASE. INDEXES: B-Tree, Bitmap, GiST.
  • 33. Greenplum Open Source • An ambitious project • 10+ years in the making • Investment of hundreds of millions of dollars • Potential to define a new market and disrupt traditional EDW vendors • www.greenplum.org • GitHub code • Mailing lists / community engagement • Global project w/ external contributors • Pivotal Greenplum • Enterprise software distribution & release management • Pivotal expertise
  • 34. Greenplum Open Source Launch: PostgreSQL Heritage • Widely used • Open Source • PostgreSQL License • Enterprise class open source relational engine
  • 35. MPP Shared Nothing Architecture: flexible framework for processing large datasets. • Master Host and Standby Master Host: the master coordinates work with the Segment Hosts • Segment Host with one or more Segment Instances: Segment Instances process queries in parallel • Segment Hosts have their own CPU, disk and memory (shared nothing) • High speed interconnect for continuous pipelining of data processing. [Diagram labels: SQL; Master Host; Interconnect; Segment Hosts node1, node2, node3, … nodeN, each with four Segment Instances; Greenplum DB.]
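The master/segment split described on this slide can be sketched as a scatter-gather aggregate. This is a toy illustration, not Greenplum code: `NUM_SEGMENTS`, `segment_for`, `distribute`, `segment_partial` and `master_average` are hypothetical names, CRC32 stands in for whatever hash function the database uses, and threads stand in for separate hosts.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # assumed number of segment instances


def segment_for(key):
    # Deterministic hash of the distribution key picks the owning segment.
    return zlib.crc32(str(key).encode()) % NUM_SEGMENTS


def distribute(rows, key_index=0):
    """Spread rows across segments by hashing the distribution key."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for row in rows:
        segments[segment_for(row[key_index])].append(row)
    return segments


def segment_partial(rows):
    # Each segment scans only its own rows and returns a partial SUM and COUNT.
    return sum(amount for _, amount in rows), len(rows)


def master_average(segments):
    # The master dispatches the same work to every segment in parallel,
    # then combines the partial results into the final AVG.
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        partials = list(pool.map(segment_partial, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


rows = [(customer_id, float(customer_id % 7)) for customer_id in range(1000)]
segments = distribute(rows)
avg = master_average(segments)
```

No segment reads another segment's rows, which is the "shared nothing" property; only the small partial results travel back to the master.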
  • 36. Loading: Massively-Parallel Ingest. Extreme speed and immediate usability from files, ETL, Hadoop & S3. Fast Parallel Load & Unload: no Master Node bottleneck; 10+ TB/hour per rack; linear scalability. Low Latency: data immediately available; no intermediate stores; no data “reorganization”. Load/Unload To & From: file systems; any ETL product; Hadoop & Amazon S3. [Diagram labels: External Sources (loading, streaming, etc.): SQL, ETL, File Systems; Network Interconnect; Master Servers (query planning & dispatch); Segment Servers (query processing & data storage); Greenplum DB.]
  • 37. Pivotal Query Optimizer: turns a SQL query into an execution plan • First Cost Based Optimizer for BIG data • Applies broad set of optimization strategies at once – Considers many more plan alternatives – Optimizes a wider range of queries – Optimizes memory usage • New Extensible Code Base – Rapid adoption of emerging technologies • Significant improvements for demanding queries
  • 38. Greenplum Workload Manager: rule-based query management to monitor and manage queries and resource queues • Monitors Greenplum Database queries and host utilization statistics • Logs when a query exceeds a threshold • Throttles the CPU usage of a query when it exceeds a threshold • Terminates a query • Detects memory, CPU, or disk I/O skew occurring during the execution of a query • Creates detailed rules to manage queries • Adds, modifies, or deletes Greenplum Database resource queues
  • 39. Polymorphic Storage™: User Definable Storage Layout. Column-oriented: • Columnar storage compresses better • Optimized for retrieving a subset of the columns when querying • Compression can be set differently per column: gzip (1-9), quicklz, delta, RLE. Row-oriented: • Row-oriented is faster when returning all columns • HEAP for many updates and deletes • Use indexes for drill-through queries. External HDFS or S3: • Keep less accessed partitions on external storage and seamlessly query all data • All major Hadoop distributions • Amazon S3 storage • Others in development. [Diagram labels: TABLE ‘SALES’ with partitions Year -2, Year -1, Jun, Jul, Aug, Sep, Oct, Nov, Dec.]
  • 40. Partitions and External Partitions • Hash Distribution to evenly spread data across all segment instances • Range Partition within a segment instance to minimize scan work • Partitioned tables support external tables as a partition – Readable external table – Host file system, NFS mount, HDFS or Amazon S3. [Diagram labels: Parent table; Jan 2013; Jan 2014; Feb 2014; ...; Dec 2014; RET; External; Greenplum DB.]
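How range partitioning "minimizes scan work" can be sketched with a toy table that keeps one bucket per month and skips buckets outside the queried range. The `RangePartitionedTable` class and its methods are hypothetical and only illustrate the idea of partition pruning; they are not how Greenplum stores data.

```python
from datetime import date


class RangePartitionedTable:
    """Toy table partitioned by (year, month) of the sale date."""

    def __init__(self):
        self.partitions = {}  # (year, month) -> list of (sale_date, amount)

    def insert(self, sale_date, amount):
        key = (sale_date.year, sale_date.month)
        self.partitions.setdefault(key, []).append((sale_date, amount))

    def total_between(self, start, end):
        """Sum amounts with start <= sale_date <= end.

        Returns (total, partitions_scanned) so the pruning is visible.
        """
        low = (start.year, start.month)
        high = (end.year, end.month)
        total = 0.0
        scanned = 0
        for key, rows in self.partitions.items():
            if key < low or key > high:
                continue  # pruned: this partition cannot contain matching rows
            scanned += 1
            total += sum(amount for day, amount in rows if start <= day <= end)
        return total, scanned


sales = RangePartitionedTable()
for month in range(1, 13):
    sales.insert(date(2014, month, 15), 100.0)

total, scanned = sales.total_between(date(2014, 2, 1), date(2014, 3, 31))
```

Here a two-month query touches two of the twelve partitions. Hash distribution is a separate, complementary step: it decides which segment instance holds a row, while range partitioning limits what each segment has to scan.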
  • 41. Hybrid Queries: Pivotal External Tables 4.3 • Readable Ext-Table MVP • Readable Gzip Files • Writable Ext-Table • Investigation: Enhanced Security/Roles • Investigation: Additional File Formats S3 External Tables GemFire External Tables • Hi Speed Ingestion • Hi Concurrency Query Cache GPHDFS Roadmap
  • 42. Greenplum Database Features for Data Scientists* • Window functions: perform calculations across a set of table rows that are somehow related to the current row • Analytics extensions: in-database machine learning at scale using MADlib • Procedural language extensions: extended functionality using non-SQL programming languages and packages (e.g. Python and R) • Client access: ODBC and JDBC access to support connections to 3rd party tools. * Only a subset of Greenplum Database features
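As a rough illustration of what a window function computes, the sketch below reproduces in plain Python the result of a running total in the style of `SUM(amount) OVER (PARTITION BY region ORDER BY day)`. The `running_total` function and the sample rows are made up for this example; in the database the same result would come from a single SQL query.

```python
from itertools import groupby
from operator import itemgetter


def running_total(rows):
    """rows: (region, day, amount) tuples.

    Returns every input row with an extra column holding the running
    total of amount within its region, ordered by day.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # PARTITION BY region ORDER BY day
    for region, group in groupby(ordered, key=itemgetter(0)):
        total = 0
        for _, day, amount in group:
            total += amount  # accumulates over rows up to the current one
            out.append((region, day, amount, total))
    return out


sales = [("east", 2, 5), ("west", 1, 7), ("east", 1, 10), ("west", 2, 3)]
result = running_total(sales)
```

Unlike a `GROUP BY` aggregate, every input row is kept; each one simply gains a value computed from the related rows in its partition.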
  • 43. Procedural Languages • User Defined Types • User Defined Functions • User Defined Aggregates • Import of libraries from open source
  • 44. Scalable, In-Database Machine Learning • Open source https://github.com/apache/incubator-madlib • Downloads and docs http://madlib.incubator.apache.org/ • Wiki https://cwiki.apache.org/confluence/display/MADLIB/
  • 45. Functions (April 2016). Linear Systems: • Sparse and Dense Solvers • Linear Algebra. Matrix Factorization: • Singular Value Decomposition (SVD) • Low Rank. Generalized Linear Models: • Linear Regression • Logistic Regression • Multinomial Logistic Regression • Ordinal Regression • Cox Proportional Hazards Regression • Elastic Net Regularization • Robust Variance (Huber-White), Clustered Variance, Marginal Effects. Other Machine Learning Algorithms: • Principal Component Analysis (PCA) • Association Rules (Apriori) • Topic Modeling (Parallel LDA) • Decision Trees • Random Forest • Support Vector Machines (SVM) • Conditional Random Field (CRF) • Clustering (K-means) • Cross Validation • Naïve Bayes. Descriptive Statistics: Sketch-Based Estimators • CountMin (Cormode-Muth.) • FM (Flajolet-Martin) • MFV (Most Frequent Values); Correlation and Covariance; Summary. Utility Modules: Array and Matrix Operations; Sparse Vectors; Random Sampling; Probability Functions; Data Preparation; PMML Export; Conjugate Gradient; Stemming. Inferential Statistics: Hypothesis Tests. Time Series: • ARIMA. Path Functions: • Operations on Pattern Matches
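The CountMin entry under sketch-based estimators refers to a structure that estimates item frequencies in fixed memory. The following is a minimal, generic Count-Min sketch written for illustration; it is not MADlib's implementation, and the `CountMinSketch` class, its `width` and `depth` defaults and the SHA-256 based hashing are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator that may overcount but never undercounts."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a hash salted with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][col] for row, col in self._columns(item))


sketch = CountMinSketch()
for _ in range(5):
    sketch.add("sensor-a")
for _ in range(2):
    sketch.add("sensor-b")

estimate_a = sketch.estimate("sensor-a")
estimate_unseen = sketch.estimate("sensor-z")
```

The estimate for an item is always at least its true count and at most the total number of additions; widening the table lowers the chance of inflated counts.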
  • 46. GPDB Geospatial. Current Key Features: • Points, Lines, Polygons, Perimeter, Area, Intersection, Contains, Distance, Long/Lat, Spatial Indexes & Bounding Boxes • Round earth calculations • Ability to store geospatial data and query with joins and operators • Raster Image Processing
  • 47. Pivotal HDB: Hadoop Native SQL Database
  • 48.
  • 49. Making the Hadoop Data Lake More Consumable: enabling data science and machine learning at scale. 1) Important people and tools are cut off because of SQL completeness or performance. 2) Data scientists still have to resort to sampling if they can't run analytics in-database at scale. 3) There are multiple data sets and formats within Hadoop (Hive, HBase, etc.). [Diagram labels: SQL; App; BUSINESS ANALYSTS; DATA SCIENTISTS; DATA LAKE.]
  • 50. Making the Hadoop Data Lake More Consumable: as the lingua franca of analytics, SQL can't be ignored. Neither can performance. 1) Important people and tools are cut off because of SQL completeness or performance. 2) Data scientists still have to resort to sampling if they can't run analytics in-database at scale. 3) There are multiple data sets and formats within Hadoop (Hive, HBase, etc.). [Diagram labels: SQL; App; BUSINESS ANALYSTS; DATA SCIENTISTS; DATA LAKE.]
  • 51. Hadoop data lakes sit underutilized: lack of interactive, ANSI SQL capabilities inhibits adoption and value. • Producing complex queries, large joins, interactive queries • Existing investments in visualization and BI tools • Large population of users with SQL skills. [Diagram labels: DATA LAKE; DATA SCIENTISTS; BUSINESS ANALYSTS; SQL; App.]
  • 52. HDB: The Hadoop Native SQL Database. High performance, interactive SQL queries on Hadoop. • Highly efficient MPP (massively parallel processing) • Low-latency • Petabyte scalability • ACID transaction support • SQL-92, 99, 2003 compatibility • Advanced cost-based optimizer. [Diagram labels: DATA LAKE; SQL; App; BUSINESS ANALYSTS; DATA SCIENTISTS.]
  • 53. Making the Hadoop Data Lake More Consumable: integrate SQL and data science tools into an interactive, operationalized environment. 1) Important people and tools are cut off because of SQL completeness or performance. 2) Data scientists still have to resort to sampling if they can't run analytics in-database at scale. 3) There are multiple data sets and formats within Hadoop (Hive, HBase, etc.). [Diagram labels: SQL; App; BUSINESS ANALYSTS; DATA SCIENTISTS; DATA LAKE.]
  • 54. Predictive analytics not scaling with Python or R: using traditional, single-node Python or R for analytics means using subsets because of the lack of parallelization. <...> Implications: • Time-consuming data movement • Working with small sample sizes requires extra testing cycles against larger data sets • Slow feature generation limits algorithm development. [Diagram labels: DATA LAKE; SAMPLE 1; SAMPLE 2; SAMPLE n.]
  • 55. In-database analytics speeds predictive modeling. Apache™ MADlib® (incubating) is an open-source library for scalable in-database analytics: scale-out mathematical, statistical and machine learning methods for structured and unstructured data. • SQL-based • Analyze without sampling • Open source • Runs on HDB, Greenplum, and Postgres • Complements support for procedural languages: PL/R, PL/Python, PL/Java. [Diagram labels: Train a model...; Predict for new data...; DATA LAKE.]
  • 56. Making the Hadoop Data Lake More Consumable: overcome complexity. 1) Important people and tools are cut off because of SQL completeness or performance. 2) Data scientists still have to resort to sampling if they can't run analytics in-database at scale. 3) There are multiple data sets and formats within Hadoop (Hive, HBase, etc.). [Diagram labels: SQL; App; BUSINESS ANALYSTS; DATA SCIENTISTS; DATA LAKE.]
  • 57. Simplifying the data lake with data federation: HDB’s Pivotal eXtension Framework (PXF) and HCatalog integration. • Enables connectivity between Pivotal HDB and other stores (Hive, HBase, HDFS files) • Provides an extensible framework to add support for custom services • Low latency on large data sets • Considers the cost model of federated sources [Diagram labels: SQL Queries, HCatalog (Schema Read), Data Read, HDFS Data Lake, CSV, TXT, Avro, Custom Extensions]
  • 58. Making the Hadoop Data Lake More Consumable: Delivering information in context. INFORMATION IN CONTEXT: • List of customers at high risk of churn, ranked by dollars at risk • Recommend savings goals with portfolio projections • Optimize routine maintenance schedule, based on predicted failures • Shopping recommendation, based on recent purchases
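As a small illustration of the first example ("customers at high risk of churn, ranked by dollars at risk"), the sketch below assumes each customer already has a model-produced churn probability and a known annual revenue. It takes "dollars at risk" to mean probability times revenue, which is one plausible reading and not a definition given in the slides. All field names, the 0.5 threshold, and the sample records are hypothetical.

```python
from typing import Dict, List


def dollars_at_risk(customer: Dict) -> float:
    """Expected revenue lost if the customer churns (assumed definition)."""
    return customer["churn_probability"] * customer["annual_revenue"]


def at_risk_ranking(customers: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep customers at or above the churn threshold, largest dollars at risk first."""
    high_risk = [c for c in customers if c["churn_probability"] >= threshold]
    return sorted(high_risk, key=dollars_at_risk, reverse=True)


# Hypothetical scored customers.
customers = [
    {"name": "A", "churn_probability": 0.9, "annual_revenue": 1000.0},
    {"name": "B", "churn_probability": 0.6, "annual_revenue": 5000.0},
    {"name": "C", "churn_probability": 0.2, "annual_revenue": 20000.0},
    {"name": "D", "churn_probability": 0.7, "annual_revenue": 2000.0},
]
ranked_names = [c["name"] for c in at_risk_ranking(customers)]
```

Customer C has the highest revenue but falls below the risk threshold, so it does not appear in the list at all; the ranking orders only the customers already flagged as high risk.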
  • 59. HDB as part of an architecture: Next Likely Purchase. Providing information in context with the right architecture and the right algorithms. [Diagram labels: Customer App, Internal App, Purchase, Next Offer, Real-Time View of Transactions and Offers, Reports]
  • 60. HDB as part of an architecture: Next Likely Purchase. Providing information in context with the right architecture and the right algorithms. 1. Ingest, transform, and land data into HDFS 2. Score streaming data and serve to application [Diagram labels: Customer App, Internal App, Purchase, Next Offer, Real-Time View of Transactions and Offers, Transactions, PMML, Model Creation & Training, HDB Tables, HDFS Staging, Data Science & Ad Hoc Queries, Reports]
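The two numbered steps on this slide can be sketched as a batch training pass followed by per-event scoring. The slide mentions PMML as the model exchange format but does not show the model itself, so the sketch below substitutes a deliberately simple stand-in: a lookup table of the most frequent "next item" for each purchased item. The function names, the purchase histories, and the customer IDs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def train_next_purchase(histories: List[List[str]]) -> Dict[str, str]:
    """Batch step: count item -> following-item transitions over landed purchase histories."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for sequence in histories:
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1
    # Keep only the most frequent follower per item as the exported "model".
    return {item: counts.most_common(1)[0][0] for item, counts in transitions.items()}


def score_stream(
    model: Dict[str, str], transactions: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Streaming step: yield (customer_id, next_offer) for each scorable purchase event."""
    for customer_id, item in transactions:
        offer = model.get(item)
        if offer is not None:  # skip items the model has never seen
            yield customer_id, offer


# Hypothetical purchase histories, standing in for data landed in HDFS.
histories = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]
model = train_next_purchase(histories)

# Hypothetical live transactions arriving from the customer app.
incoming = [("c1", "phone"), ("c2", "laptop"), ("c3", "tablet")]
offers = list(score_stream(model, incoming))
```

Keeping training and scoring as separate functions mirrors the split in the diagram: the model is built where the full history lives, and only the compact model is consulted on the low-latency path that serves the application.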
  • 61. Pivotal HDB Advantages. • Advanced Analytics Performance: exceptional MPP performance, low latency, petabyte scalability, ACID reliability, fault tolerance • Most Complete Language Compliance: higher degree of SQL compatibility (SQL-92, 99, 2003, OLAP); leverage existing SQL skills • Advanced Query Optimizer: maximize performance and run advanced queries with confidence • Elastic Architecture for Scalability: scale up/down or scale in/out; expand or shrink clusters on the fly • Integrated w/MADlib Machine Learning: advanced MPP analytics, data science at scale, directly on Hadoop data
  • 62. “Companies need to learn how to catch people or things in the act of doing something and affect the outcome.” PAUL MARITZ, Executive Chairman, Pivotal. Real-time and Personalised Information in Context is what Wins!