SnappyData Roadmap
In-Memory Data Platform based on Spark
for Interactive Analytics on LIVE Data
© Snappydata Inc 2017
www.Snappydata.io
SnappyData Team
Disclaimer – Dates can change, content can be reprioritized
ACCESS using Spark programming model, Core functions from Snappy
SnappyData – Unified Analytics Platform On Spark
Mutable In-memory database, HA,
High concurrency, Persist/recover, WAN…
Low latency predictions, Row-Column tables,
Approximate Query processing, Transactions ….
600% faster out of the box (OOTB) than the latest Spark version
Spark API
- Streaming
- Graph
- Map-reduce
- ML
- Spark DL pipelines
SQL: JDBC, ODBC
REST
Spark connectors
- NoSQL (Cassandra, HBase, Redis, Elastic)
- RDBMS
- CSV, S3
- HDFS
- Mainframes
SnappyData: An In-Memory Virtual Cloud Warehouse
CDC
On-Prem
Cloud
Enterprise
Security
Streaming
& ML
Querying
Subsystem
Intelligent Data Management
Inventory
Finance
On-prem mfg
m/c
Digital Click
Streams
Legacy
Operational
Systems
CDC
IoT streams
Cloud Native
Data Sources
Cloud native
streaming sources
SnappyData
2019 Plan
SnappyData Real Time Analytics Product Suite
A Spark Based Big Data Analytics Platform
Spark API
(Streaming, ML, Graph)
Transactions, Indexing
Full SQL HA
DataFrame,
RDD, DataSets
Rows Columnar
IN-MEMORY
Spark Cache
Synopses
(Samples)
Unified Data Access
(Virtual Tables)
Unified Catalog Native Store
SNAPPYDATA
HDFS/HBase
S3
JSON, CSV,
XML
SQL db Cassandra MPP DB
Stream
sources
Spark Jobs, Scala/Java/Python/R API, JDBC/ODBC, Object API (RDD, DataSets)
GemFire
We transform Spark from this…
Deep Scale,
High Volume
MPP DB
USER 1 / APP 1
SPARK
MASTER
Spark Execution (Worker)
Framework for
streaming
SQL, ML…
Immutable
CACHE
USER 2 / APP 2
SPARK
MASTER
Spark Execution (Worker)
Framework for
streaming
SQL, ML…
Immutable
CACHE
HDFS
SQL
NoSQL
• Cannot update
• Repeated for each User/APP
Bottleneck
… into an “always-on” hybrid database!
Deep Scale,
High Volume
MPP DB
HDFS
SQL
NoSQL
HISTORY
Spark Execution (Worker) JVM
- Long running
Framework for
streaming
SQL, ML…
Spark
Driver
IN-Memory
ROW + COLUMN
Start with
Indexing
Store
- Mutable,
- Transactional
SPARK Cluster
JDBC
ODBC
Spark Job
Shared Nothing
Persistence
Architecture
Cluster Manager
& Scheduler
SnappyData Server (Spark Executor + Store)
Parser
OLAP
TXN
Synopsis Data Engine
Distributed Membership
Service
HA
Stream Processing
Data Frame
RDD
Low
Latency
High
Latency
HYBRID Store
Probabilistic Rows Columns
Index
Query
Optimizer
Add / Remove
Server
Tables ODBC/JDBC
Continuous
replication
Join with
Hadoop
NoSQL
Rich SPARK APIs
Stream window
Spark
Transform
(Data Prep)
- Apps/BI Clients execute ad-hoc Join/aggregation queries on multiple NoSQL stores
Live Analytics Without the Need for Pipelines
In-memory
Row-Column
Tables
Virtual Tables
NoSQL Connectors SQL
Pull history
on Demand
Continuously
summarize
- No need to do expensive pre-aggregations on large data sets
- Analytics on current, moving data
- Built-in Spark ETL to enrich data
20X faster than Spark, 100-1000X faster than Spark-Cassandra
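The "stream window", "continuously summarize" and "join with history" labels on this slide can be illustrated with a small, library-free sketch. This is not SnappyData or Spark code: the function name `tumbling_window_summary`, the event tuples, and the `history` lookup table are all hypothetical and exist only to show the shape of the computation, assuming fixed (tumbling) windows.

```python
from collections import defaultdict


def tumbling_window_summary(events, window_seconds, history):
    """Sum values per (window, key) and enrich each row from a history table.

    events: iterable of (timestamp_seconds, key, value) tuples (hypothetical).
    history: dict mapping key -> reference value kept from older data.
    """
    summary = defaultdict(float)
    for ts, key, value in events:
        # Align each event to the start of its fixed-size window.
        window_start = ts - (ts % window_seconds)
        summary[(window_start, key)] += value

    # "Join with history": attach the stored reference value for each key.
    return [
        {"window_start": w, "key": k, "total": total, "baseline": history.get(k)}
        for (w, k), total in sorted(summary.items())
    ]


# Hypothetical sensor readings and reference values.
events = [(0, "sensor-a", 2.0), (3, "sensor-a", 1.0),
          (7, "sensor-b", 4.0), (12, "sensor-a", 5.0)]
history = {"sensor-a": 2.5, "sensor-b": 3.0}

rows = tumbling_window_summary(events, window_seconds=10, history=history)
for row in rows:
    print(row)
```

Only the per-window totals are kept, so the raw events do not need to be pre-aggregated elsewhere or stored in full to answer this kind of query.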
Micro
Service 1
Micro
Service 2
Micro
Service 3
Session state
Profiles
Orders
Use-case Patterns
• Real-time analytics operational DB
  • Move from traditional cubes to distributed in-memory for real-time
• Streaming with interactive analytics
  • Stream joins with history/context
  • Tableau/Spotfire/Zeppelin based interactive analytics
• Interactive exploratory analytics
  • Patterns, Top-K, Trends at Google-like speed
Snappy on PKS – Cloud Neutral Containerized Analytics Platform
• In-memory redundancy and HA provided by SnappyData
• Pod redundancy and restarts provided by Kubernetes
• VM redundancy and restarts provided by PKS
Steps To Launch A Snappy Cluster On PKS
# Connect to PKS cluster
• pks login -a https://api.pks.snappydata.io -u <uname> -p test123 -k
• pks get-credentials pks-cluster-01
• kubectl config use-context pks-cluster-01
# Update to the latest snappydata chart
• cd <spark-on-k8s-checkout>
• git fetch
• git checkout enable-hive-server
# Start SnappyData cluster and note the external IP addresses of lead and locator
• helm install --name snappydata --namespace snappy ./charts/snappydata/
• kubectl get services -n snappy | grep public
# Load data into the cluster
• <snappydata-product-dir>/bin/snappy
• snappy> connect client '<locator-public-ip>:1527';
• snappy> run '<path/to/attached>/load_CFPB_CC_Data.sql';
# Access SnappyData dashboard at <lead-public-ip>:5050
# Tableau workbook
• Point the workbook to the lead node.
• Launch the workbook by double-clicking it.
How We Beat The Competition
• Unified analytics through deep integration into Apache Spark and its ecosystem
• High performance through in-memory design center
• Support for ETL-free live data through CDC integration
• Scale and performance using our Synopsis Data Engine
• Cloud neutral, lower TCO analytics platform based on Kubernetes
• Standards-based approach with support for SQL, ML, & Streaming
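The slides name a Synopsis Data Engine and "Synopses (Samples)" but do not describe how the synopses are built. As a rough illustration of the general idea behind sample-based approximate queries, the sketch below answers a SUM from every k-th row and scales the result up. The function `approximate_sum` and the every-k-th-row sampling scheme are assumptions chosen for simplicity and determinism, not SnappyData's actual method.

```python
def approximate_sum(values, step):
    """Estimate sum(values) from a systematic sample (every `step`-th row)."""
    sample = values[::step]
    if not sample:
        return 0.0
    # Scale the sample total by the inverse of the sampling fraction.
    scale = len(values) / len(sample)
    return sum(sample) * scale


# Hypothetical column of 1,000 values; only 100 of them are read.
values = list(range(1, 1001))
exact = sum(values)
estimate = approximate_sum(values, step=10)
relative_error = abs(estimate - exact) / exact

print(f"exact={exact} estimate={estimate} relative_error={relative_error:.4f}")
```

The trade-off is the usual one: the query touches a tenth of the rows and accepts a small error in return. A production engine would also need to report error bounds and handle skewed data, which this sketch does not attempt.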
Apache Spark compatible data platform = Unified analytics in the cloud
Analytics on any on-prem or cloud-native data source = No expensive cloud data migration!!
Cloud-neutral, portable real-time analytics = No Amazon lock-in!!
High concurrency and BI tool support = Analytics for everyone!!
Intelligent CDC integration = Unified analytics on live data!!
In-memory scale-out virtual cloud warehouse
What Makes SnappyData Compelling
• Multi-cloud certification
Kubernetes for Multi-cloud support
• Cloud neutral managed cloud offering
DevOps Simplification
• RLS, persistence to cloud, backup/restore using Parquet, dashboard enhancements, improved performance using SIMD
Enterprise Readiness
• Support for Debezium, certified on major Spark distributions
Ecosystem support
2019 Themes
Sampling of Customer Use Cases today
CDC
Streams
NoSQL
window
Spark Transform
(Data Prep)
In-memory
Row-Column
Tables
Spotfire & Tableau
Raw Data
Ingestion
& Prep
Rich SPARK APIs NoSQL Connectors SQL
SnappyData
Analytics Backbone for a Large Fortune 30 Company
Smart City – Parking, Congestion Management
● Sensors power lamp posts
● Optimize parking services
● Optimize energy consumption
● Congestion control
Challenge:
• Hundreds of thousands of sensor streams generating too much data
• Actionable intelligence requires analysis of streams with history
• Ad-hoc Interactive analytics on all this data
Smart City – Parking, Congestion Management
• Application built using SnappyData’s Unified Analytics API
• Reduced complexity due to fewer moving parts
• 20X better performance and far fewer resources
Performance Benchmark
600% faster than Apache Spark in TPC-H (Complex Analytical queries)
Up to 20X faster than Spark on complex joins, aggregations
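The two figures above use different units ("600% faster" versus "20X faster"), so it can help to put them on one scale. The slides do not say how "600% faster" is meant; the sketch below assumes the common reading that "P% faster" means a speed of (1 + P/100) times the baseline. The 70-second baseline query time is a made-up number for the arithmetic, not a benchmark result.

```python
def speedup_from_percent_faster(percent):
    """Read 'P% faster' as: new speed = (1 + P/100) x baseline speed."""
    return 1 + percent / 100


def runtime_after_speedup(baseline_seconds, speedup):
    """Runtime shrinks in proportion to the speedup factor."""
    return baseline_seconds / speedup


baseline = 70.0  # hypothetical query time on the baseline system, in seconds

factor_600_percent = speedup_from_percent_faster(600)
print(factor_600_percent)                                   # speedup factor
print(runtime_after_speedup(baseline, factor_600_percent))  # seconds
print(runtime_after_speedup(baseline, 20))                  # seconds at 20X
```

Under this reading, "600% faster" corresponds to a 7X speedup. If the slide instead means "6 times as fast", the factor would be 6.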
Continue at www.snappydata.io/resources

Data Cloud, More than a CDP by Matt Robison
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

High performance Spark distribution on PKS by SnappyData

  • 1. SnappyData Roadmap: In-Memory Data Platform based on Spark for Interactive Analytics on LIVE Data. © SnappyData Inc 2017, www.Snappydata.io. SnappyData Team. Disclaimer: dates can change, content can be reprioritized.
  • 2. SnappyData: Unified Analytics Platform On Spark
      • Access using the Spark programming model, core functions from Snappy
      • Mutable in-memory database, HA, high concurrency, persist/recover, WAN…
      • Low-latency predictions, row-column tables, approximate query processing, transactions…
      • 600% faster OOTB than latest Spark version
      • Spark API: Streaming, Graph, Map-reduce, ML, Spark DL pipelines
      • SQL: JDBC, ODBC
      • REST
      • Spark connectors: NoSQL (Cassandra, HBase, Redis, Elastic), RDBMS, CSV, S3, HDFS, Mainframes
  • 3. SnappyData: An In-Memory Virtual Cloud Warehouse
      • Diagram labels: CDC; On-Prem; Cloud; Enterprise Security; Streaming & ML; Querying Subsystem; Intelligent Data Management; Inventory; Finance; On-prem mfg m/c; Digital Click Streams; Legacy Operational Systems; CDC; IoT streams; Cloud Native Data Sources; Cloud native streaming sources; SnappyData
  • 4. 2019 Plan: SnappyData Real Time Analytics Product Suite
  • 5. A Spark Based Big Data Analytics Platform
      • Spark API (Streaming, ML, Graph); Transactions, Indexing; Full SQL; HA; DataFrame, RDD, DataSets
      • IN-MEMORY: Rows; Columnar; Spark Cache; Synopses (Samples)
      • Unified Data Access (Virtual Tables); Unified Catalog; Native Store
      • SNAPPYDATA
      • HDFS/HBase; S3; JSON, CSV, XML; SQL db; Cassandra; MPP DB; Stream sources; GemFire
      • Spark Jobs, Scala/Java/Python/R API, JDBC/ODBC, Object API (RDD, DataSets)
  • 6. We transform Spark from this…
      • Deep Scale, High Volume MPP DB; HDFS; SQL; NoSQL
      • USER 1 / APP 1: SPARK MASTER; Spark Execution (Worker); Framework for streaming SQL, ML…; Immutable CACHE
      • USER 2 / APP 2: SPARK MASTER; Spark Execution (Worker); Framework for streaming SQL, ML…; Immutable CACHE
      • Cannot update
      • Repeated for each User/APP
      • Bottleneck
  • 7. … into an "always-on" hybrid database!
      • Deep Scale, High Volume MPP DB; HDFS; SQL; NoSQL; HISTORY
      • Spark Execution (Worker); JVM - Long running; Framework for streaming SQL, ML…; Spark Driver
      • IN-Memory ROW + COLUMN; Start with Indexing; Store - Mutable, Transactional
      • SPARK Cluster; JDBC; ODBC; Spark Job; Shared Nothing Persistence
  • 8. Architecture
      • Cluster Manager & Scheduler; Snappy Data Server (Spark Executor + Store)
      • Parser; OLAP; TXN; Synopsis Data Engine; Distributed Membership Service; HA
      • Stream Processing; Data Frame; RDD; Low Latency; High Latency
      • HYBRID Store: Probabilistic; Rows; Columns; Index
      • Query Optimizer; Add / Remove Server; Tables; ODBC/JDBC
  • 9. Live Analytics Without The Need For Pipelines
      • Continuous replication; Join with Hadoop; NoSQL; Rich SPARK APIs; Stream window; Spark Transform (Data Prep); In-memory Row-Column Tables; Virtual Tables; NoSQL Connectors; SQL; Pull history on Demand; Continuously summarize
      • Apps/BI Clients execute ad-hoc Join/aggregation queries on multiple NoSQL stores
      • No need to do expensive pre-aggregations on large data sets
      • Analytics on current, moving data
      • Built-in Spark ETL to enrich data
      • 20X faster than Spark, 100-1000X faster than Spark-Cassandra
      • Micro Service 1; Micro Service 2; Micro Service 3; Session state; Profiles; Orders
  • 10. Use-case Patterns
      • Real-time Analytics operational DB
          • Move from traditional cubes to distributed in-memory for real-time
      • Streaming with Interactive Analytics
          • Stream joins with history/context
          • Tableau/SpotFire/Zeppelin based interactive analytics
      • Interactive exploratory analytics
          • Patterns, Top-K, Trends at Google-like speed
  • 11. Snappy on PKS: Cloud Neutral Containerized Analytics Platform
      • In-memory redundancy and HA provided by SnappyData
      • Pod redundancy and restarts provided by Kubernetes
      • VM redundancy and restarts provided by PKS
  • 12. Steps To Launch A Snappy Cluster On PKS
      # Connect to PKS cluster
      • pks login -a https://api.pks.snappydata.io -u <uname> -p test123 -k
      • pks get-credentials pks-cluster-01
      • kubectl config use-context pks-cluster-01
      # Update to the latest snappydata chart
      • cd <spark-on-k8s-checkout>
      • git fetch
      • git checkout enable-hive-server
      # Start SnappyData cluster and note the external IP addresses of lead and locator
      • helm install --name snappydata --namespace snappy ./charts/snappydata/
      • kubectl get services -n snappy | grep public
  • 13. Steps To Launch A Snappy Cluster On PKS
      # Load data into the cluster
      • <snappydata-product-dir>/bin/snappy
      • snappy> connect client '<locator-public-ip>:1527';
      • snappy> run '<path/to/attached>/load_CFPB_CC_Data.sql';
      # Access SnappyData dashboard at <lead-public-ip>:5050
      # Tableau workbook
      • Point the workbook to the lead node.
      • Launch the workbook by double-clicking it.
  • 14. How We Beat The Competition
      • Unified Analytics through deep integration into Apache Spark and its ecosystem
      • High performance through in-memory design center
      • Support for ETL-free live data through CDC integration
      • Scale and Performance using our Synopsis Data Engine
      • Cloud neutral, lower TCO analytics platform based on Kubernetes
      • Standards-based approach with support for SQL, ML, & Streaming
  • 15. What Makes SnappyData Compelling: In-memory scale-out virtual cloud warehouse
      • Apache Spark compatible data platform = Unified analytics in the cloud
      • Analytics on any on-prem or cloud-native data source = No expensive cloud data migration!!
      • Cloud neutral, portable real-time analytics = No Amazon lock-in!!
      • High concurrency and BI tool support = Analytics for everyone!!
      • Intelligent CDC integration = Unified analytics on live data!!
  • 16. 2019 Themes
      • Kubernetes for Multi-cloud support: Multi-cloud certification
      • DevOps Simplification: Cloud neutral managed cloud offering
      • Enterprise Readiness: RLS, persistence to cloud, backup/restore using parquet, Dashboard enhancements, improved performance using SIMD
      • Ecosystem support: Support for Debezium, certified on major Spark distributions
  • 17. Sampling of Customer Use Cases today
  • 18. SnappyData Analytics Backbone For A Large Fortune 30 Company
      • Diagram labels: CDC; Streams; NoSQL; window; Spark Transform (Data Prep); In-memory Row-Column Tables; SpotFire & Tableau; Raw Data Ingestion & Prep; Rich SPARK APIs; NoSQL Connectors; SQL
  • 19. Smart City: Parking, congestion management
      • Sensors power lamp posts
      • Optimize parking services
      • Optimize energy consumption
      • Congestion control
      Challenge:
      • Hundreds of thousands of sensor streams generating too much data
      • Actionable intelligence requires analysis of streams with history
      • Ad-hoc interactive analytics on all this data
  • 20. Smart City: Parking, Congestion Management
      • Application built using SnappyData's Unified Analytics API
      • Reduced complexity due to fewer moving parts
      • 20X better performance and far fewer resources
  • 21. Performance Benchmark
      • 600% faster than Apache Spark in TPC-H (complex analytical queries)
      • Up to 20X faster than Spark on complex joins, aggregations
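
The slides mention "Synopses (Samples)" and a Synopsis Data Engine for approximate query processing but do not show how a sample can stand in for the full data. The sketch below illustrates only the general idea, answering an aggregate from a uniform random sample and scaling the result up. It is not SnappyData's implementation, and the function name `approximate_sum`, the 1% sampling fraction, and the synthetic data are all assumptions made for illustration.

```python
import random


def approximate_sum(values, fraction, seed=0):
    """Estimate sum(values) from a uniform random sample.

    Hypothetical helper for illustration only; it does not reflect
    how SnappyData's Synopsis Data Engine works internally.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    sample_size = max(1, int(len(values) * fraction))
    sample = rng.sample(values, sample_size)
    # Each sampled row stands in for (N / n) rows of the full data set.
    scale = len(values) / sample_size
    return sum(sample) * scale


# Synthetic data: the integers 1..100000.
values = list(range(1, 100_001))
exact = sum(values)

# Answer the same aggregate from a 1% sample.
estimate = approximate_sum(values, fraction=0.01)
relative_error = abs(estimate - exact) / exact

print(f"exact={exact} estimate={estimate:.0f} relative_error={relative_error:.4f}")
```

The trade-off is that the query touches about 1% of the rows in exchange for a small, quantifiable error. A production engine would typically use stratified samples and report error bounds alongside the answer.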

Editor's notes

  1. This slide needs work. The CDC lines should be animated to show stuff constantly coming from the sources into Snappy. And on top of these boxes, there needs to be the following: some visual dashboards, and an arrow sending Spark jobs into the cluster.
  2. Snappy store and Spark Executor share the same process space and JVM memory. Reference-based access: zero copy.
  3. Examples from current customers