SlideShare a Scribd company logo
1 of 26
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager - SAM
Sriharsha Chintalapani Arun Mahadevan
Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
History of Streaming at Hortonworks
 Introduced Storm as Stream Processing Engine in HDP-2.1 (Late 2013)
 First to ship Apache Kafka as Enterprise Messaging Queue ( Early 2014)
 Added several improvements & features into Apache Storm. Yahoo! Running 2400 nodes of
Storm
 Added Security and critical features/improvements to Apache Kafka
 Lot of learnings from shipping Storm & Kafka from past 3 years
 Vision & Implementation of Registry & Streaming Analytics Manager based on our learnings from
shipping Storm & Kafka for past 3 years.
Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Why Schema Registry
 Streaming Applications usually fronts with a queue such as Kafka, Kinesis, EventHub etc..
 Data in Messaging Queues are Byte payloads and there is no schema associated with it.
 Streaming applications developers usually looks at the data flowing and defines their
processing of that data
 Any change to this data, schema wise, means developers have to update their code to
process the new format
 Support both programmatic schema creation and managed schemas
Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Why Schema Registry
Kafka
Kinesis
EventHub
ConsumerBytes
Payload
Bytes
Payload
Storm
Spark Streaming
Others…
Producer
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager
 What is it?
• A platform to design, develop, deploy and manage streaming analytics applications using a drag
drop visualize paradigm in minutes
• Allows you to do event correlation, context enrichment, complex pattern matching, analytical
aggregations and alerts/notifications when insights are discovered.
• Agnostic to the underlying streaming engine and can support multiple streaming engines (e.g:
Storm, Spark Streaming, Flink)
• Extensibility is a first class citizen (add sinks, processors, sources as needed)
 Guiding Principle
– Build complex streaming applications easily with minimum code
Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Complexities in building streaming applications
 New streaming engines and APIs
 Implementing windowing, joins, and state management is hard
 Interaction with external services such as HBase, Hive, HDFS etc
 Deploying with all the necessary configuration files
 Operations around the streaming application including monitoring and metrics
 Debugging streaming application
 Securing a streaming application cluster with the right configurations is a pain
Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Key challenges that SAM is trying to solve
 Building streaming applications requires specialized skillsets that most enterprise
organizations don’t have today
 Streaming applications require considerable amount of programming, testing and tuning
before deploying to production which takes a significant amount of time
 Key streaming primitives such as joining/splitting streams, aggregations over a window of
time and pattern matching are difficult to implement
 People don’t prefer to code to build complex streaming applications
 No true open source project today solves all of the above challenges
 People don’t care about the streaming engine that powers streaming applications so much as
long challenges above are addressed and doesn’t force them into vendor lock in.
Page10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager Components and User
Personas
Distributed Streaming
Computation Engine
(Different Streaming Engines that powers higher level services to build stream application. )
App Developer
Business Analyst
Operations
Page17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics powered by Druid and Superset
 What is Stream Insight?
 Provides a tool for business analysts to do descriptive analytics of the streaming data and
insights using a sophisticated UI provided by Superset
 Tooling to create time-series and real-time analytics dashboards, charts and graphs and
create rich customizable visualization of data
Page18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Demo
Page20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Architecture
Page21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SAM Architecture
Web server
(Jetty)
DB
SAM UI
Storage
Manager
Topology
actions
service
Topology DAG Builder
Topology Lifecycle
Manager
Storm
Runners
(translate SAM DAG
to Streaming Engine
topology)
Flink Spark
Flux
Deploy
DAG
Ambari
(cluster manager)
Streaming computation Engines
(Storm)
Service
Pools
REST
API
Environ
Service
Schema
Registry
SR
Client
Page22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Topology lifecycle
Initial
DAG
Constructed
Extra artifacts
set up
Deployed
Suspended
Deployment
Failed
Deploy
Kill
Suspend
Kill
Resume
Re-deploy
Page23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Topology DAG
Source
Processor 1
Processor 2
Sink 1
Stream 2
Edge
Stream 1
Stream 1
Stream 1
Sink 2
Fields: [
“a”: Int,
“b”:String
…
]
Page24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Runner implements - Topology Actions
Page25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Runner implements - TopologyDAGVisitor
Page26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Storm runner example
Page27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SDK
Page28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Extensibility with SAM SDK
 Custom Processor - allows users to write their own business logic
Page29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Extensibility with SAM SDK
 Multi-lang support (upcoming)
Page30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Extensibility with SAM SDK
 UADFs - compute aggregates within a window
Built in functions
 STDDEV
 STDDEVP
 VARIANCE
 VARIANCEP
 MEAN
 MIN
 MAX
 SUM
 COUNT
 UPPER
 LOWER
 INITCAP
 SUBSTRING
 CHAR_LENGTH
 CONCAT
Page31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Extensibility with SAM SDK
 UDFs - does simple transformations
Built in functions
 STDDEV
 STDDEVP
 VARIANCE
 VARIANCEP
 MEAN
 MIN
 MAX
 SUM
 COUNT
 UPPER
 LOWER
 INITCAP
 SUBSTRING
 CHAR_LENGTH
 CONCAT
Page32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Extensibility with SAM SDK
 Notifier - sends notifications such as Email, SMS or more complex ones that can
invoke external APIs
Built in notifiers
 Email
 More in future…
Page33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The current release – 0.5
 Manual service pool registration not requiring Ambari
 Test mode to easily test out the streaming app
 Kerberos and delegation token based Authentication
 Authorization support with RBAC + permissions
 New sources, processors and sinks
Upcoming…
 Extending token based authentication for other components
 Support for state management in SAM
 Support for other streaming engines – Flink, Spark streaming
Page34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Try it out!
 Its open source under Apache License
 https://github.com/hortonworks/streamline
 Apache incubation soon
 SAM 0.5 is out!
 https://groups.google.com/forum/#!forum/streamline-users
 Contributions are welcome!

More Related Content

What's hot

Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
 
Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopDataWorks Summit
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4DataWorks Summit
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseGregory Keys
 
ODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doHortonworks
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiIsheeta Sanghi
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupSaptak Sen
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash CourseDataWorks Summit
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationIsheeta Sanghi
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 

What's hot (20)

Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in Hadoop
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the Enterprise
 
ODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doODPi 101: Who we are, What we do
ODPi 101: Who we are, What we do
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Apache Atlas: Governance for your Data
Apache Atlas: Governance for your DataApache Atlas: Governance for your Data
Apache Atlas: Governance for your Data
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 

Similar to Streaming analytics manager

Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming dataCarolyn Duby
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...DataWorks Summit
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easyDataWorks Summit
 
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerAbdelkrim Hadjidj
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Mac Moore
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appgvetticaden
 
Streams GitHub Products Overview for IBM InfoSphere Streams V4.0
Streams GitHub Products Overview for IBM InfoSphere Streams V4.0Streams GitHub Products Overview for IBM InfoSphere Streams V4.0
Streams GitHub Products Overview for IBM InfoSphere Streams V4.0lisanl
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies DataWorks Summit/Hadoop Summit
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
Pivotal cf for_devops_mkim_20141209
Pivotal cf for_devops_mkim_20141209Pivotal cf for_devops_mkim_20141209
Pivotal cf for_devops_mkim_20141209minseok kim
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
 
SAP HANA Native Application Development
SAP HANA Native Application DevelopmentSAP HANA Native Application Development
SAP HANA Native Application DevelopmentSAP Technology
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018alanfgates
 

Similar to Streaming analytics manager (20)

Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging Manager
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
Next Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics appNext Generation Tooling for building streaming analytics app
Next Generation Tooling for building streaming analytics app
 
Streams GitHub Products Overview for IBM InfoSphere Streams V4.0
Streams GitHub Products Overview for IBM InfoSphere Streams V4.0Streams GitHub Products Overview for IBM InfoSphere Streams V4.0
Streams GitHub Products Overview for IBM InfoSphere Streams V4.0
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Pivotal cf for_devops_mkim_20141209
Pivotal cf for_devops_mkim_20141209Pivotal cf for_devops_mkim_20141209
Pivotal cf for_devops_mkim_20141209
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
 
SAP HANA Native Application Development
SAP HANA Native Application DevelopmentSAP HANA Native Application Development
SAP HANA Native Application Development
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Streaming analytics manager

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager - SAM Sriharsha Chintalapani Arun Mahadevan
  • 2. Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved History of Streaming at Hortonworks  Introduced Storm as Stream Processing Engine in HDP-2.1 (Late 2013)  First to ship Apache Kafka as Enterprise Messaging Queue ( Early 2014)  Added several improvements & features into Apache Storm. Yahoo! Running 2400 nodes of Storm  Added Security and critical features/improvements to Apache Kafka  Lot of learnings from shipping Storm & Kafka from past 3 years  Vision & Implementation of Registry & Streaming Analytics Manager based on our learnings from shipping Storm & Kafka for past 3 years.
  • 3. Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why Schema Registry  Streaming Applications usually fronts with a queue such as Kafka, Kinesis, EventHub etc..  Data in Messaging Queues are Byte payloads and there is no schema associated with it.  Streaming applications developers usually looks at the data flowing and defines their processing of that data  Any change to this data, schema wise, means developers have to update their code to process the new format  Support both programmatic schema creation and managed schemas
  • 4. Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why Schema Registry Kafka Kinesis EventHub ConsumerBytes Payload Bytes Payload Storm Spark Streaming Others… Producer
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager  What is it? • A platform to design, develop, deploy and manage streaming analytics applications using a drag drop visualize paradigm in minutes • Allows you to do event correlation, context enrichment, complex pattern matching, analytical aggregations and alerts/notifications when insights are discovered. • Agnostic to the underlying streaming engine and can support multiple streaming engines (e.g: Storm, Spark Streaming, Flink) • Extensibility is a first class citizen (add sinks, processors, sources as needed)  Guiding Principle – Build complex streaming applications easily with minimum code
  • 6. Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Complexities in building streaming applications  New streaming engines and APIs  Implementing windowing, joins, and state management is hard  Interaction with external services such as HBase, Hive, HDFS etc  Deploying with all the necessary configuration files  Operations around the streaming application including monitoring and metrics  Debugging streaming application  Securing a streaming application cluster with the right configurations is a pain
  • 7. Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Key challenges that SAM is trying to solve  Building streaming applications requires specialized skillsets that most enterprise organizations don’t have today  Streaming applications require considerable amount of programming, testing and tuning before deploying to production which takes a significant amount of time  Key streaming primitives such as joining/splitting streams, aggregations over a window of time and pattern matching are difficult to implement  People don’t prefer to code to build complex streaming applications  No true open source project today solves all of the above challenges  People don’t care about the streaming engine that powers streaming applications so much as long challenges above are addressed and doesn’t force them into vendor lock in.
  • 8. Page10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager Components and User Personas Distributed Streaming Computation Engine (Different Streaming Engines that powers higher level services to build stream application. ) App Developer Business Analyst Operations
  • 9. Page17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics powered by Druid and Superset  What is Stream Insight?  Provides a tool for business analysts to do descriptive analytics of the streaming data and insights using a sophisticated UI provided by Superset  Tooling to create time-series and real-time analytics dashboards, charts and graphs and create rich customizable visualization of data
  • 10. Page18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 11. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo
  • 12. Page20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Architecture
  • 13. Page21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SAM Architecture Web server (Jetty) DB SAM UI Storage Manager Topology actions service Topology DAG Builder Topology Lifecycle Manager Storm Runners (translate SAM DAG to Streaming Engine topology) Flink Spark Flux Deploy DAG Ambari (cluster manager) Streaming computation Engines (Storm) Service Pools REST API Environ Service Schema Registry SR Client
  • 14. Page22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Topology lifecycle Initial DAG Constructed Extra artifacts set up Deployed Suspended Deployment Failed Deploy Kill Suspend Kill Resume Re-deploy
  • 15. Page23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Topology DAG Source Processor 1 Processor 2 Sink 1 Stream 2 Edge Stream 1 Stream 1 Stream 1 Sink 2 Fields: [ “a”: Int, “b”:String … ]
  • 16. Page24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Runner implements - Topology Actions
  • 17. Page25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Runner implements - TopologyDAGVisitor
  • 18. Page26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Storm runner example
  • 19. Page27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SDK
  • 20. Page28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Extensibility with SAM SDK  Custom Processor - allows users to write their own business logic
  • 21. Page29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Extensibility with SAM SDK  Multi-lang support (upcoming)
  • 22. Page30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Extensibility with SAM SDK  UADFs - compute aggregates within a window Built in functions  STDDEV  STDDEVP  VARIANCE  VARIANCEP  MEAN  MIN  MAX  SUM  COUNT  UPPER  LOWER  INITCAP  SUBSTRING  CHAR_LENGTH  CONCAT
  • 23. Page31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Extensibility with SAM SDK  UDFs - does simple transformations Built in functions  STDDEV  STDDEVP  VARIANCE  VARIANCEP  MEAN  MIN  MAX  SUM  COUNT  UPPER  LOWER  INITCAP  SUBSTRING  CHAR_LENGTH  CONCAT
  • 24. Page32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Extensibility with SAM SDK  Notifier - sends notifications such as Email, SMS or more complex ones that can invoke external APIs Built in notifiers  Email  More in future…
  • 25. Page33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The current release – 0.5  Manual service pool registration not requiring Ambari  Test mode to easily test out the streaming app  Kerberos and delegation token based Authentication  Authorization support with RBAC + permissions  New sources, processors and sinks Upcoming…  Extending token based authentication for other components  Support for state management in SAM  Support for other streaming engines – Flink, Spark streaming
  • 26. Page34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Try it out!  Its open source under Apache License  https://github.com/hortonworks/streamline  Apache incubation soon  SAM 0.5 is out!  https://groups.google.com/forum/#!forum/streamline-users  Contributions are welcome!