2. • Volume of network data into terabytes
• Siloed data limits ability to perform
correlation and causal analysis
• Relational databases limit the ability to
• Application of big data analytics to the
network dataset is key to providing both real-
time and historical insights
• Data science is driving the bifurcation of the
OSS stack
Network data is becoming a big data problem
3-fold increase
in total IP Traffic
>60% increase
in devices and
connections
Telemetry data
streamed in near
real-time
Source: Cisco VNI/GCI Global IP Traffic Forecast
3. What changes?
PMO FMO
Orientation Single domain Cross domain
Realisation Small data, tool driven Big data, data driven
Data collection Polled Streamed
Data aggregation and
analysis
Coupled Decoupled
Domain data schema Schema-on-write Schema-on-read
Analysis Prescriptive Prescriptive + Stochastic +
ML
Customisation Design time Run time
4. • Tight coupling of data
aggregation/store/
analysis
• Multiple analytics
pipelines implemented
from open source
components
• Common design
patterns ~75% of effort
wasted / duplicated
• Siloes limit the potential
of big data analytics and
lead to industry
divergence
Today’s siloed analytics pipelines
Telemetry
Metrics
Data sources
HDFS
Data store
Spark
Streaming
MapR
Data analysis
Hbase
Storm
Kafka
Streamsets
Data aggregation
Kafka
Impala
Query
Outputs
Dashboard &
ReportingNiFi
Logs
5. What is PNDA?
PNDA brings together a number of open source technologies to
provide a simple, scalable open big data analytics Platform for
Network Data Analytics
Linux Foundation Collaborative Project based on the Apache
ecosystem
6. • Simple, scalable open data
platform
• Provides a common set of
services for developing
analytics applications
• Accelerates the process of
developing big data analytics
applications whilst significantly
reducing the TCO
• PNDAprovides a platform for
convergence of network data
analytics
PNDA
PNDA
Plugins
ODL
Logstash
OpenBPM
pmacct
XR Telemetry
Real
-time
DataDistribution
File
Store
Platform Services: Installation, Mgmt,
Security, Data Privacy
App Packaging
and Mgmt
Stream
Batch
Processing
SQL
Query
OLAP
Cube
Search/
Lucene
NoSQL Time
Series
Data
Exploration
Metric
Visualisation
Event
Visualisation PNDA
Mnged App
PNDA
Mnged App
Unmnged
App
Unmnged
App
Query
Visualisation
and Exploration
PNDA
Applications
PNDA
Producer API
PNDA
Consumer API
7. • Horizontally scalable platform for
analytics and data processing
applications
• Support for near-real-time stream
processing and in-depth batch analysis
on massive datasets
• PNDA decouples data aggregation from
data analysis
• Consuming applications can be either
platform apps developed for PNDA or
client apps integrated with PNDA
• Client apps can use one of several
structured query interfaces or consume
streams directly.
• Leverages best current practise in big
data analytics
PNDA
PNDA
Plugins
ODL
Logstash
OpenBPM
pmacct
XR Telemetry
Real
-time
DataDistribution
File
Store
Platform Services: Installation, Mgmt,
Security, Data Privacy
App Packaging
and Mgmt
Stream
Batch
Processing
SQL
Query
OLAP
Cube
Search/
Lucene
NoSQL Time
Series
Data
Exploration
Metric
Visualisation
Event
Visualisation PNDA
Mnged App
PNDA
Mnged App
Unmnged
App
Unmnged
App
Query
Visualisation
and Exploration
PNDA
Applications
PNDA
Producer API
PNDA
Consumer API
8. Why PNDA?
There are a bewildering number of big data technologies out
there, so how do you decide what to use?
We've evaluated and chosen the best tools, based on technical
capability and community support.
PNDA combines them to streamline the process of developing
data processing applications.
10. Why PNDA?
Innovation in the big data space is extremely rapid, but combining
multiple technologies into an end-to-end solution can be
extremely complex and time-consuming
PNDA removes this complexity and allows you to focus on
developing the analytics applications, not on developing the
pipeline – significantly reducing the effort required and time-to-
insight
12. • Platform for data aggregation,
distribution, processing and storage
• Automated installation, creation, and
configuration
• Openstack, AWS and baremetal
• Typical install ~1hr
• Modular install
• Open producer and consumer APIs
• Avro platform schema
• Plugins for Logstash, pmacct,
OpenBMP, OpenDaylight, Cisco XR-
telemetry, bulk ingest …
• Data distribution – Apache Kafka
• Data store:
• Automated data partitioning and storage
(HDFS)
• OpenTSDB – time series analysis
• Hbase - NoSQL
• Support for batch and stream processing:
• Apache Spark and Spark Streaming
• Jupyter notebook server for app
prototyping and data exploration
• Impala-based SQL query support
• Grafana for time series visualisation
• PNDAapplication packaging
• PNDAmanagement and dashboard
PNDA 3.4 Capabilities
13. • The PNDA console
provides a dashboard
across all components
in a cluster
• Inbuilt platform test
agents verify the
operation of all
components
• Active platform testing
verifies the end-to-end
data pipeline
PNDA Console
14. • Ingested data should be
encapsulated in PNDA Avro
schema and published on a
pre-defined Kafka topic or set
of topics
Publishing Data to PNDA
17. BGP Analytics Pipeline
Open BPM
Collector
BGPBGP
BMP
Logstash
PNDA Cluster
Gobblin
HDFS
Kafka
Spark
OpenTSDB
BGP Data
Service
UI
Impala
18. PNDA Applied to NFV
Infrastruct
ure
Analytics
Data Aggregators
Open Data Platform (PNDA)
Analytics Applications
Open
Source
Custom Licensed
Alerts
Metrics
Telemetry
Logs
Data Sources
Inventory
Orchestration
NFVO
VNFM
VIM
NFVI
VNF
Data CenterCoreUser
State
Data
Access Aggregation
Related as loosely coupled
systems
Context
Network
Control
19. Convergence of network data analytics
Operational
Intelligence
Planning
Intelligence
Security
Intelligence