2. NETWORK ANALYTICS IS A BIG DEAL
Business Agility
Virtualization
Cloud Adoption
Operational Simplicity
Application Performance
3. THE OLD WAY LACKS TRANSPARENCY
User requests data from device
User driven, per-device
Low frequency and capacity data extraction
You need to know what you want to know
Limited visibility into virtual tunnels and paths
Network-centric approach to data collection
4. INTRODUCING THE CLOUD ANALYTICS ENGINE
Open, standards based solution
Network tells you what you need to know
Automated, proactive, end-to-end
Visualize and correlate physical and virtual
Data collected streamed at wire rate
Cloud Analytics Engine (CAE)
Enables application-centric view of intelligent network
5. CLOUD ANALYTICS ENGINE IS NOT JUST ABOUT THE NETWORK
Analytics
DevOps: App/Network readiness
Operations: User Experience
Apps Developer: App performance
Network Admin: Network performance
Co-ordinated troubleshooting and root cause analysis
Reduce IT service delivery time and costs
Improve efficiency of IT operations
7. CLOUD ANALYTICS ENGINE – SAMPLE USE CASES
Application Visibility & Performance Management
Capacity Planning & Optimization
Troubleshooting and Root Cause Analysis
8. CLOUD ANALYTICS ENGINE – DIVE INTO USE CASES
Application Visibility & Performance Management
Control Application flows and Workload placement
Capacity Planning & Optimization
Detect Hotspots, monitor Latency and Microbursts
Troubleshooting and Root Cause Analysis
Correlate Overlay and Underlay Network
9. JUNIPER CLOUD ANALYTICS ARCHITECTURE COMPONENTS
Compute Agent (CA)
Application intelligence at the network edge
Visibility into underlay/overlay
REST API and Schema*
REST API – Open API for 3rd party integration
Schema – Generic representation of data to be collected
Network Device Agent* (NDA)
Present on network device
Process Schema and install probes
Data Learning Engine* (DLE)
Plug-in module interface to Network Director, or 3rd party tools (roadmap)
Aggregates and correlates collected data
* Available in Phase 2
10. CAE DEPLOYMENT AND HOSTING ENVIRONMENT
CA + NDA + DLE (with Network Director)
CA only with REST/Web API
CA and DLE
NDA only with Schema API
NDA and DLE
11. BENEFITS OF CLOUD ANALYTICS ENGINE
Correlate end to end network performance with application requirements
Transparency into physical and virtual layers for simpler operations
Improve co-ordination between teams for better application delivery and experience
Open
Scalable
Partner Ecosystem
13. HOW CLOUD ANALYTICS ENGINE COMPONENTS WORK TOGETHER
Open API, Open Schema
ORCHESTRATION: Network Director, DLE
Data Center Network Infrastructure
QFX / EX Switches: JUNOS NDA
Physical Host with Hypervisor: CA
14. USE CASE: PATH DETECTION & VISIBILITY
Provide integrated visibility into the actual physical network in use
NETWORK DIRECTOR
Compute Node A Compute Node B
Flow Paths
Red App: S1
S1
S2
S3
S4
S2 S4
Green App: S1 S3 S4
Blue App: S1
S3
S4
S2
REST Call to
Compute Agent
CA-B
CA-B
CA-B
Flow Latency
Red App: S1 (time stamp T+1), S2 (time stamp T+2), S4 (time stamp T+3), CA-B (time stamp T+4)
End To End Latency: 4
15. USE CASE: PATH ATTRIBUTES
S1 S2 S3 S4
Data Recorded by Device Agent in OAM Reply
Compute Node A Compute Node B
Flow Paths
Red App: S1
S2
S3
S4
S2 S4
Green App: S1 S3 S4
Blue App: S1
S3
S4
S2
CA-B
CA-B
CA-B
Network Statistics
• Timestamp of probe ingress and egress
• Per Hop Latency
• Ingress Interface
• Hash Computed Egress Interface
• Buffer and Queue Statistics
• Interface Error Statistics
• Bandwidth Utilization at Ingress and Egress
• ECMP Bucket Utilization
Host Statistics
• CPU Utilization
• Memory Utilization
Analytics and Orchestration Layer*
* Openstack and Cloudstack integration - Roadmap item
16. USE CASE: LATENCY CALCULATIONS
Analytics and Orchestration Layer*
Provide Per Hop and End-to-End Latency per traffic Flow
Compute Node A Compute Node B
S1
S2
S3
S4
REST Call to
Compute Agent
S2
Flow Latency
Red App: S1 (time stamp T+1), S2 (time stamp T+2), S4 (time stamp T+3), CA-B (time stamp T+4)
End To End Latency: 4
* Openstack and Cloudstack integration - Roadmap item
17. USE CASE: FLOW TO MICROBURST DETECTION
Correlate Microburst Detection with Flows that are affected
Compute Node A Compute Node B
Flow Paths
Red App: S1
S1
S2
S3
S4
S2 S4
Green App: S1 S3 S4
Blue App: S1
S3
S4
S2
CA-B
CA-B
CA-B
Analytics and Orchestration Layer*
Burst
REST Call to
Compute
Agent
AnalyticsD / Insight
Alert
Flow
Mappings
Microburst Alert
Burst Detected on S2 towards S4.
Apps Affected:
Red
Blue
* Openstack and Cloudstack integration - Roadmap item
18. USE CASE: OVERLAY / UNDERLAY CORRELATION
Request OVERLAY_INFO Probe
Provide integrated visibility into the actual physical network in use
Compute Node A Compute Node B
S1
S2
S3
S4
Analytics and Orchestration Layer*
S1
S2
S3
S4
VNI: Red
VNI: Blue
VNI: Green
VM 1 VM 2
VM 3 VM 4
VM 5
VM 6 VM 7
VM 8 VM 9
VM 10
Overlay Awareness
S1> show overlay tunnel vtep summary
VNI Red: VM1, VM2, VM6, VM7
VNI Blue: VM3, VM4, VM8, VM9
VNI Green: VM5, VM10
Overlay Awareness
S2> show overlay tunnel vtep summary
VNI Red: VM1, VM2, VM6, VM7
VNI Blue: VM3, VM4, VM8, VM9
Overlay Awareness
S3> show overlay tunnel vtep summary
VNI Blue: VM3, VM4, VM8, VM9
VNI Green: VM5, VM10
* Openstack and Cloudstack integration - Roadmap item
Editor's Notes
Virtualization and cloud adoption are pushing the need for business agility further. According to Gartner, more than 70% of all server workloads are now virtualized. The wave of virtualization is gradually sweeping across storage and networks and is also giving rise to new technologies such as application containers.
Amid all this, the network's role is becoming more crucial to application and service delivery and to business operations. A 2013 IT survey from IDC shows that 78% of respondents feel their networks are more critical to delivering applications than they were a year ago.
Agility depends on simplicity of operations and optimal application experience.
But the problem is that as the network itself becomes virtualized, you now have overlays, or layers of abstraction. In many cases, these layers are added over other logical network partitions, and all of this is carved out of, and co-exists on, a single shared physical infrastructure. Such layering and abstraction can result in a lack of transparency into the many entities of the network and lead to operational complexity.
Increased virtualization places stringent performance demands on the underlying network infrastructure.
So, knowing what is happening in all those layers of the network is important – you need to be able to troubleshoot, maintain and operate your network based on the data that you get from it.
The problem, though, is that data collection has traditionally been user driven, with data collected from each individual device.
Collecting data from each device separately usually means that you do not get a network-wide view of what is happening and miss performance hits or failures that are happening on a different device. Add to this the complexity of having to monitor or troubleshoot multiple layers, physical and virtual, on each device.
When you use SNMP, for example, you need to know ahead of time what data you need to collect and from where; it assumes that you already know what to ask for. Such an approach was barely adequate when only network admins dealt with the network. It is also difficult to keep up with monitoring given the limited visibility into virtual paths and overlay tunnels across the network.
With the rise of applications, DevOps and automation driven by virtualization, many more teams want to extract an optimal experience from the network. They want to get projects and processes rolled out quickly. The network needs to perform optimally to enable them, but the data needed to tune it to these performance levels is still very network centric.
The Cloud Analytics Engine (CAE) is a new solution that uses network data analysis to improve application performance and availability. It includes data collection, analysis, correlation and visualization, helping different operations teams better understand the behavior of workloads and applications across their physical and virtual infrastructure. A CAE agent sits on the bare metal server or virtual machine, so the data can be collected end to end (Note: the network needs to be made up entirely of Juniper devices). CAE is based on Junos functionality.
The driver for CAE is the ability to reduce the time and expense associated with IT operations. This is for resolving and troubleshooting problems, as well as for proactive planning.
CAE provides an aggregated and detailed level of visibility, tying applications and the network together to quickly identify root cause.
With CAE, customers get an application-centric view of their network status, improving their ability to quickly roll out new applications and troubleshoot problems. This helps the business by reducing costs associated with manpower and downtime, as well as improving the end user application experience.
CAE is a single integrated analytics solution, providing an application-centric view of network statistics, compared to other solutions that require third-party functionality to be pieced together.
End-to-end visibility of the physical underlay together with the virtual overlay.
Simpler operations as a result of that end-to-end visibility.
Provides network context to applications, making it relevant to teams other than just networking teams. The solution is relevant to non-networking teams such as DevOps and app developers because it offers an end-to-end, application-centric VM-to-VM or server-to-server view. Teams now have the flexibility to work either together or independently of each other to accomplish assigned tasks.
It takes the frustration and lack of visibility away from network management and operations across the various layers. End-to-end visibility of the overlay and underlay network, and of physical and virtual workloads (Juniper-based network). The result is simpler, smoother operations.
Increased business agility: developers now have an opportunity to tune applications based on a network that they understand but do not necessarily have to master.
Openstack and Cloudstack integration roadmap item
Juniper provides the ability to visualize the data with Network Director; customers will have the flexibility to use other third-party visualization tools as well. Communication between these components is enabled through open, REST APIs and a published, extensible schema
These use cases rely on the Compute Agent functionality that will be available in Phase1
Collect network stats from an application point of view. This makes it easier to triage the problem: is it application or network related?
Build a data vault that consists of a structured database of all the real-time data that is collected. This is useful for troubleshooting.
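As a rough illustration of the data vault idea, the sketch below stores collected samples in a structured table and queries it while troubleshooting. It assumes an in-memory SQLite database; the table layout, the helper names `build_vault` and `worst_hop`, and the sample values are all made up for illustration and are not part of CAE.

```python
import sqlite3


def build_vault(samples):
    """Store (app, hop, metric, value) samples in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (app TEXT, hop TEXT, metric TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", samples)
    conn.commit()
    return conn


def worst_hop(conn, app, metric):
    """Return the (hop, value) row with the highest `metric` for `app`, or None."""
    return conn.execute(
        "SELECT hop, value FROM samples WHERE app = ? AND metric = ? "
        "ORDER BY value DESC LIMIT 1",
        (app, metric),
    ).fetchone()


# Made-up sample values, for illustration only.
SAMPLES = [
    ("Red", "S1", "latency", 1.0),
    ("Red", "S2", "latency", 3.5),
    ("Red", "S4", "latency", 1.2),
    ("Green", "S3", "latency", 0.9),
]

vault = build_vault(SAMPLES)
print(worst_hop(vault, "Red", "latency"))
```

A structured store like this lets an operator ask application-level questions (which hop is slowest for the Red app?) instead of reading counters device by device.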
NOTE: This slide goes a level deeper than the previous slide, uses animation
You can monitor attributes associated with the application path at the edge (using the Compute Agent on a virtual machine).
You can get the application path and path attributes on the physical network: parameters such as ingress and egress interfaces, hop statistics and time stamps.
This helps you get visibility into the application flows and control how they flow and where the workloads are placed.
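The path attributes described above could be modeled as one record per hop. This is a minimal sketch only: the `HopRecord` class, its field names, the interface names and the timestamp values are assumptions chosen for illustration, not the format CAE actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HopRecord:
    """Per-hop attributes for one probe (field names are illustrative)."""
    hop: str
    ingress_ts: int          # timestamp of probe ingress
    egress_ts: int           # timestamp of probe egress
    ingress_interface: str
    egress_interface: str    # hash-computed egress interface

    @property
    def hop_latency(self) -> int:
        """Time the probe spent inside this hop."""
        return self.egress_ts - self.ingress_ts


def describe_path(records: List[HopRecord]) -> str:
    """Render the application path as a chain of hop names."""
    return " -> ".join(r.hop for r in records)


def slowest_hop(records: List[HopRecord]) -> HopRecord:
    """Return the hop with the largest per-hop latency."""
    return max(records, key=lambda r: r.hop_latency)


# Made-up values for the Red App path S1, S2, S4.
RED_PATH = [
    HopRecord("S1", 10, 11, "if-a", "if-b"),
    HopRecord("S2", 12, 15, "if-c", "if-d"),
    HopRecord("S4", 16, 17, "if-e", "if-f"),
]

print(describe_path(RED_PATH))
print(slowest_hop(RED_PATH).hop)
```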
Detect microbursts and hotspots, monitor latency – Get a correlation between flows that have taken a performance hit with microbursts that are detected.
You can detect congestion events with the help of high-frequency measurements.
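The correlation between a detected microburst and the affected flows can be sketched as a lookup of the burst link in each flow's path. The Red and Green paths come from the slides; the Blue path (S1, S2, S4) is an assumption inferred from the slide 17 alert, which lists Blue as affected by a burst on S2 towards S4. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Flow paths per application. Blue's path is an assumption (see lead-in).
FLOW_PATHS: Dict[str, List[str]] = {
    "Red": ["S1", "S2", "S4"],
    "Green": ["S1", "S3", "S4"],
    "Blue": ["S1", "S2", "S4"],
}


def links(path: List[str]) -> List[Tuple[str, str]]:
    """Split a hop list into directed (from, to) links."""
    return list(zip(path, path[1:]))


def affected_apps(
    flow_paths: Dict[str, List[str]], burst_link: Tuple[str, str]
) -> List[str]:
    """Return the applications whose path crosses the link where a burst was seen."""
    return sorted(
        app for app, path in flow_paths.items() if burst_link in links(path)
    )


# Burst detected on S2 towards S4, as in the slide 17 alert.
print(affected_apps(FLOW_PATHS, ("S2", "S4")))
```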
You can monitor end-to-end, per-hop, and switch latencies.
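Per-hop and end-to-end latency follow directly from the time stamps recorded along the path. The sketch below uses the Red App example from the slides (S1 at T+1, S2 at T+2, S4 at T+3, CA-B at T+4) and assumes the probe was sent at time T, represented here as 0; the function names are hypothetical.

```python
from typing import List, Tuple


def hop_latencies(
    send_ts: int, hop_timestamps: List[Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Latency to reach each hop, measured from the previous time stamp."""
    latencies = []
    previous = send_ts
    for hop, ts in hop_timestamps:
        latencies.append((hop, ts - previous))
        previous = ts
    return latencies


def end_to_end_latency(send_ts: int, hop_timestamps: List[Tuple[str, int]]) -> int:
    """Latency from the send time to the last recorded time stamp."""
    return hop_timestamps[-1][1] - send_ts


# Red App time stamps relative to the send time T (assumed to be 0).
RED_APP = [("S1", 1), ("S2", 2), ("S4", 3), ("CA-B", 4)]

print(hop_latencies(0, RED_APP))
print(end_to_end_latency(0, RED_APP))
```

With these inputs every hop contributes one time unit and the end-to-end figure is 4, matching the value shown on the slide.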
Correlate virtual overlay and physical underlay – You get end to end integrated visibility into the actual physical network paths along with overlay awareness. This helps you to troubleshoot and trace routes through both the physical and virtual layers of the network. So now you can see the physical paths taken by the virtual overlay tunnels between virtual machines.
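The overlay awareness shown on slide 18 can be pictured as a per-switch map from VNI to virtual machines. The data below is copied from that slide; the dictionary layout and the helper names `switches_for_vni` and `vni_for_vm` are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Per-switch overlay view, as listed on slide 18.
OVERLAY: Dict[str, Dict[str, List[str]]] = {
    "S1": {
        "Red": ["VM1", "VM2", "VM6", "VM7"],
        "Blue": ["VM3", "VM4", "VM8", "VM9"],
        "Green": ["VM5", "VM10"],
    },
    "S2": {
        "Red": ["VM1", "VM2", "VM6", "VM7"],
        "Blue": ["VM3", "VM4", "VM8", "VM9"],
    },
    "S3": {
        "Blue": ["VM3", "VM4", "VM8", "VM9"],
        "Green": ["VM5", "VM10"],
    },
}


def switches_for_vni(overlay: Dict[str, Dict[str, List[str]]], vni: str) -> List[str]:
    """Physical switches that are aware of the given overlay segment."""
    return sorted(switch for switch, vnis in overlay.items() if vni in vnis)


def vni_for_vm(overlay: Dict[str, Dict[str, List[str]]], vm: str) -> Optional[str]:
    """Overlay segment a VM belongs to, or None if no switch reports it."""
    for vnis in overlay.values():
        for vni, vms in vnis.items():
            if vm in vms:
                return vni
    return None


print(switches_for_vni(OVERLAY, "Red"))
print(vni_for_vm(OVERLAY, "VM5"))
```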
CAE comprises a distributed set of components across the data center infrastructure, including compute and network:
• Compute Agent: A software agent that resides on the compute hosts (physical and hypervisor).
• Schema: A generic representation of what data is to be collected. It is structured as a URI.
• Network Device Agent: An agent that resides on Junos-enabled devices but is independent of the Junos release cycles. Initially, this will be supported on the QFX5100, with other platforms planned and availability TBD. It processes the Schema.
• Data Learning Engine (DLE): A centralized controller and aggregation point for analytics data. Initially, DLE functionality will be integrated with Junos Space Network Director.
Open REST APIs that can interoperate with any database
Visualization: Juniper provides the ability to visualize the data with Network Director; customers will have the flexibility to use other third-party visualization tools as well.
Communication between these components is enabled through open, REST APIs and a published, extensible schema.
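To make the idea of a URI-structured schema concrete, here is a sketch of how a collection request expressed as a URI might be broken down. The URI layout (`/flows/<app>/hops/<switch>/<metric>?param=value`) and the function `parse_schema_uri` are entirely hypothetical; the actual CAE schema format is not described in these notes.

```python
from urllib.parse import urlsplit, parse_qs


def parse_schema_uri(uri: str) -> dict:
    """Split a hypothetical schema URI into selectors, a metric and parameters.

    Path segments are read as key/value pairs followed by one metric name.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) % 2 != 1:
        raise ValueError("expected key/value pairs followed by a metric name")
    *pairs, metric = segments
    selectors = dict(zip(pairs[0::2], pairs[1::2]))
    # Keep only the first value of each query parameter.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"selectors": selectors, "metric": metric, "params": params}


# Hypothetical request: latency of the Red app at switch S2, sampled every 100 ms.
print(parse_schema_uri("/flows/Red/hops/S2/latency?interval_ms=100"))
```

Expressing what to collect as a URI keeps the request independent of any device CLI, which fits the stated goal of an open, extensible schema.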
Decoupling from CLI/Hypervisor/Junos & HW
Industry Standardization
Network performance needs to become more transparent not just to the network admins and team but also to other teams such as DevOps and app developers (in other words, LOB owners).
CAE is a simple-to-deploy, single integrated analytics solution that can provide an application-centric view of network-wide statistics.
With CAE, customers can improve their ability to quickly roll out new applications and troubleshoot problems. CAE provides an aggregated and detailed level of visibility, tying applications and the network together to quickly identify root cause. Rather than leaving teams to blame each other, CAE brings clarity by providing data collection, analysis, correlation and visualization, helping different operations teams better understand the behavior of workloads and applications across their physical and virtual infrastructure. This helps the business by reducing costs associated with manpower and downtime, as well as improving the end user application experience.
Schema compiler and application logic reside on the Compute Agent on the VM.
DLE: the Data Learning Engine interfaces with Network Director and maintains communication with the VM/CA.
The Compute Agent streams out data based on the Schema and meta information.
DLE pushes the Schema to the network device.
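The interaction described in these notes can be sketched as a toy simulation: the DLE pushes a schema to network device agents, and a compute agent streams only the data the schema asks for. All class names, method names and values below are assumptions for illustration; they do not reflect the real CAE interfaces.

```python
class NetworkDeviceAgent:
    """Stands in for the agent on a network device; it stores the pushed schema."""

    def __init__(self, name):
        self.name = name
        self.schema = None

    def install(self, schema):
        self.schema = schema


class ComputeAgent:
    """Stands in for the agent on a compute host."""

    def __init__(self, name):
        self.name = name

    def stream(self, schema, readings):
        """Yield only the readings whose metric is named in the schema."""
        for metric, value in readings:
            if metric in schema:
                yield (self.name, metric, value)


class DataLearningEngine:
    """Central aggregation point: pushes the schema and collects streamed data."""

    def __init__(self):
        self.collected = []

    def push_schema(self, schema, devices):
        for device in devices:
            device.install(schema)

    def collect(self, stream):
        self.collected.extend(stream)


schema = {"latency"}  # hypothetical schema: collect latency only
nda = NetworkDeviceAgent("S1")
ca = ComputeAgent("CA-B")
dle = DataLearningEngine()

dle.push_schema(schema, [nda])
dle.collect(ca.stream(schema, [("latency", 4), ("cpu", 35)]))
print(dle.collected)
```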