SlideShare una empresa de Scribd logo
1 de 33
Managing Open vSwitch 
Across a large heterogeneous fleet 
Andy Hill @andyhky 
Systems Engineer, Rackspace 
Joel Preas @joelintheory 
Systems Engineer, Rackspace
Some Definitions 
Large Fleet Heterogenous 
• Several different hardware 
manufacturers 
• Several XenServer major versions 
(sometimes on varying kernels) 
• Five hardware profiles 
• Six production public clouds 
• Six internal private clouds 
• Various non production environments 
• Tens of thousands of hosts 
• Hundreds of thousands of instances
Quick OVS Introduction
History 
• Rackspace used Open vSwitch since the pre 1.0 days 
• Behind most of First Generation Cloud Servers (Slicehost) 
• Powers 100% of Next Generation Cloud Servers 
• Upgraded OVS on Next Gen hypervisors 9 times over 2 
years
Upgrade Open vSwitch 
If you get nothing else from this talk, upgrade OVS!
Why upgrade? 
Reasons we upgraded: 
• Performance 
• Less impacting upgrades 
• NSX Controller version requirements 
• Nasty regression in 2.1 [96be8de] 
http://bit.do/OVS21Regression 
• Performance
Performance 
• Broadcast domain sizing 
• Special care in ingress broadcast flows 
• Craft flows to explicitly allow destined broadcast traffic
Performance 
• The Dark Ages (< 1.11) 
• Megaflows (>= 1.11) 
• Ludicrous Speed (>= 2.1)
The Dark Ages (< 1.11) 
• Flow-eviction-threshold = 2000 
• Single threaded 
• 12 point match for datapath flow 
• 8 upcall paths for datapath misses 
• Userspace hit per bridge (2x the lookups)
Megaflows (1.11+) 
• Wildcard matching on datapath 
• Less likely to hit flow-eviction-threshold 
• Some workloads still had issues 
• Most cases datapath flows cut in half or better
Ludicrous speed (2.1+) 
• RIP flow-eviction-threshold 
• 200000 datapath flows (configurable) 
• In the wild, we have seen over 72K datapath flows / 
260K pps
OVS 1.4 -> OVS 2.1 
Broadcast flows
Mission Accomplished! 
We moved the bottleneck! 
New bottlenecks: 
● Guest OS kernel configuration 
● Xen Netback/Netfront Driver
Upgrade, Upgrade, Upgrade 
If you package Open vSwitch, don’t leave 
your customers in The Dark Ages 
Open vSwitch 2.3 is LTS
Upgrade process 
• Ansible Driven (async - watch your SSH timeouts) 
• /etc/init.d/openvswitch force-reload-kmod 
• bonded <= 30 sec of data plane impact 
• non-bonded <=5 sec of data plane impact 
http://bit.do/ovsupgrade
Bridge Fail Modes 
Secure vs. Normal bridge fail mode 
Learning L2 switch, overriding default 
• Critical XenServer bug with Windows causing full host 
reboots (CTX140814) 
• Bridge fail mode change is a datapath impacting event 
• Fail modes do not persist across reboots in XenServer 
unless in bridge other-config
Patch Ports and Bridge Fail Modes 
• Misconfigured patch ports + ‘Normal’ Bridge Fail mode 
• Patches do not persist across reboots, cron.reboot to 
set up- no hypervisor hook available
Bridge Migration 
OVS Upgrades required all bridges to be secured 
1. Create new bridge 
2. Move VIFs from old bridge to new bridge (loss of a 
couple of packets) 
3. Upgrade OVS 
4. Ensure bridge fail mode change persists across reboot 
5. Clean up 
Entire process orchestrated with Ansible
Kernel Modules 
Running Kernel OVS Kernel Module Staged Kernel Reboot Outcome 
vABC OVSvABC None Everything’s Fine 
vABC OVSvABC vDEF No Networking 
vABC OVSvABC, OVSvDEF vDEF Everything’s Fine
Kernel Modules 
• Ensure proper OVS kernel modules are in place 
• Kernel Upgrade = OVS kernel module upgrade 
• More packaging work to do for heterogenous 
environment 
• Failure to do so can force a trip to a Java console
Other Challenges with OVS 
• Tied to old version because $REASON 
• VLAN Splinters/ovs-vlan-bug-workaround 
• Hypervisor Integration 
• Platforms: LXC, KVM, XenServer 5.5 and beyond
Measuring OVS 
PavlOVS sends these metrics to StatsD/graphite: 
• Per bridge byte_count, packet_count, flow_count 
• Instance count 
• ovs CPU utilization 
• Aggregate datapath flow_count, missed, hit, lost rates 
These are aggregated per region->cell->host 
Useful for DDoS detection (Graphite highestCurrent()) 
Scaling issues with Graphite/StatsD
OVS in Compute Host Lifecycle 
Ovsulate - Ansible Module that checks host into NVP/NSX controllers. Can 
fail if routes bad or the host certificate changes, i.e. a host is re-kicked. First 
made sure it failed explicitly, later added logic to delete existing on 
provisioning.
Monitoring OVS 
Connectivity to SDN controller 
• ovs-vsctl find manager is_connected=false 
• ovs-vsctl find controller is_connected=false 
SDN integration process (ovs-xapi-sync) 
• pgrep -f ovs-xapi-sync 
Routes
Monitoring OVS
Reboots 
Will the host networking survive a reboot? (kernel modules) 
http://bit.do/iwillsurvive
Monitoring OVS
Monitoring OVS 
XSA-108 - AKA Rebootpocalypse 2014 
• Incorrect kmods on reboot may require OOB access to fix! 
• Had monitoring in place 
• Pre-flight check for KMods just in case
Questions? 
THANK YOU 
RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM 
© RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

Más contenido relacionado

La actualidad más candente

OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep diveTrinath Somanchi
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchTe-Yen Liu
 
LF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitchLF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitchLF_OpenvSwitch
 
Docker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksAdrien Blind
 
LF_OVS_17_State of the OVN
LF_OVS_17_State of the OVNLF_OVS_17_State of the OVN
LF_OVS_17_State of the OVNLF_OpenvSwitch
 
Open stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_diveOpen stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_diveyfauser
 
Automating linux network performance testing
Automating linux network performance testingAutomating linux network performance testing
Automating linux network performance testingAntonio Ojea Garcia
 
Virtualized network with openvswitch
Virtualized network with openvswitchVirtualized network with openvswitch
Virtualized network with openvswitchSim Janghoon
 
OpenStack networking
OpenStack networkingOpenStack networking
OpenStack networkingSim Janghoon
 
Osdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauserOsdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauseryfauser
 
Open stack networking vlan, gre
Open stack networking   vlan, greOpen stack networking   vlan, gre
Open stack networking vlan, greSim Janghoon
 
Troubleshooting Tracebacks
Troubleshooting TracebacksTroubleshooting Tracebacks
Troubleshooting TracebacksJames Denton
 
Open vSwitch Introduction
Open vSwitch IntroductionOpen vSwitch Introduction
Open vSwitch IntroductionHungWei Chiu
 
Open Networking for Your OpenStack
Open Networking for Your OpenStackOpen Networking for Your OpenStack
Open Networking for Your OpenStackCumulus Networks
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersSadique Puthen
 
Pipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerPipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerJérôme Petazzoni
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 

La actualidad más candente (20)

OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitch
 
LF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitchLF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitch
 
OVS-NFV Tutorial
OVS-NFV TutorialOVS-NFV Tutorial
OVS-NFV Tutorial
 
Docker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined Networks
 
LF_OVS_17_State of the OVN
LF_OVS_17_State of the OVNLF_OVS_17_State of the OVN
LF_OVS_17_State of the OVN
 
Open stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_diveOpen stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_dive
 
Automating linux network performance testing
Automating linux network performance testingAutomating linux network performance testing
Automating linux network performance testing
 
Virtualized network with openvswitch
Virtualized network with openvswitchVirtualized network with openvswitch
Virtualized network with openvswitch
 
OpenStack networking
OpenStack networkingOpenStack networking
OpenStack networking
 
Geneve
GeneveGeneve
Geneve
 
Osdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauserOsdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauser
 
Open stack networking vlan, gre
Open stack networking   vlan, greOpen stack networking   vlan, gre
Open stack networking vlan, gre
 
Troubleshooting Tracebacks
Troubleshooting TracebacksTroubleshooting Tracebacks
Troubleshooting Tracebacks
 
Open vSwitch Introduction
Open vSwitch IntroductionOpen vSwitch Introduction
Open vSwitch Introduction
 
Open Networking for Your OpenStack
Open Networking for Your OpenStackOpen Networking for Your OpenStack
Open Networking for Your OpenStack
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoorters
 
Kubernetes Intro
Kubernetes IntroKubernetes Intro
Kubernetes Intro
 
Pipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerPipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and Docker
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 

Destacado

DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...Jim St. Leger
 
opendayight loadBalancer
opendayight loadBalancer opendayight loadBalancer
opendayight loadBalancer Khubaib Mahar
 
Upcoming internet challenges
Upcoming internet challengesUpcoming internet challenges
Upcoming internet challengesIvan Pepelnjak
 
OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07Nicolas (Nick) Barcet
 
Module 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANGModule 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANGTail-f Systems
 
Open daylight and Openstack
Open daylight and OpenstackOpen daylight and Openstack
Open daylight and OpenstackDave Neary
 
Module 1: ConfD Technical Introduction
Module 1: ConfD Technical IntroductionModule 1: ConfD Technical Introduction
Module 1: ConfD Technical IntroductionTail-f Systems
 
SDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininetSDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininetSAMeh Zaghloul
 
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServerUnder the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServerThe Linux Foundation
 
Openstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNsOpenstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNsThomas Morin
 
NAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutesNAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutesIvan Pepelnjak
 
DEVNET-1006 Getting Started with OpenDayLight
DEVNET-1006	Getting Started with OpenDayLightDEVNET-1006	Getting Started with OpenDayLight
DEVNET-1006 Getting Started with OpenDayLightCisco DevNet
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch YongKi Kim
 

Destacado (14)

DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
 
opendayight loadBalancer
opendayight loadBalancer opendayight loadBalancer
opendayight loadBalancer
 
Upcoming internet challenges
Upcoming internet challengesUpcoming internet challenges
Upcoming internet challenges
 
OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07
 
Module 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANGModule 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANG
 
Open daylight and Openstack
Open daylight and OpenstackOpen daylight and Openstack
Open daylight and Openstack
 
Module 1: ConfD Technical Introduction
Module 1: ConfD Technical IntroductionModule 1: ConfD Technical Introduction
Module 1: ConfD Technical Introduction
 
SDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininetSDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininet
 
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServerUnder the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
 
Openstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNsOpenstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNs
 
NAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutesNAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutes
 
DEVNET-1006 Getting Started with OpenDayLight
DEVNET-1006	Getting Started with OpenDayLightDEVNET-1006	Getting Started with OpenDayLight
DEVNET-1006 Getting Started with OpenDayLight
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch
 
NETCONF YANG tutorial
NETCONF YANG tutorialNETCONF YANG tutorial
NETCONF YANG tutorial
 

Similar a Managing Open vSwitch Across a Large Heterogenous Fleet

Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)andyhky
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013Jun Park
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentOpenStack Foundation
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Community
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentOpenStack Foundation
 
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...VMworld
 
OVN DBs HA with scale test
OVN DBs HA with scale testOVN DBs HA with scale test
OVN DBs HA with scale testAliasgar Ginwala
 
VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server VMworld
 
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIOLF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIOLF_OpenvSwitch
 
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge MigrationJames Denton
 
VIO on Cisco UCS and Network
VIO on Cisco UCS and NetworkVIO on Cisco UCS and Network
VIO on Cisco UCS and NetworkYousef Morcos
 
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and PerformanceVMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and PerformanceVMworld
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Vietnam Open Infrastructure User Group
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kiloSteven Li
 
Presentation oracle rac on vsphere 5
Presentation   oracle rac on vsphere 5Presentation   oracle rac on vsphere 5
Presentation oracle rac on vsphere 5solarisyourep
 
VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld
 

Similar a Managing Open vSwitch Across a Large Heterogenous Fleet (20)

Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environment
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed Switch
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting Environment
 
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
 
Neutron scaling
Neutron scalingNeutron scaling
Neutron scaling
 
OVN DBs HA with scale test
OVN DBs HA with scale testOVN DBs HA with scale test
OVN DBs HA with scale test
 
VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server
 
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIOLF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
 
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
 
VIO on Cisco UCS and Network
VIO on Cisco UCS and NetworkVIO on Cisco UCS and Network
VIO on Cisco UCS and Network
 
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and PerformanceVMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kilo
 
Presentation oracle rac on vsphere 5
Presentation   oracle rac on vsphere 5Presentation   oracle rac on vsphere 5
Presentation oracle rac on vsphere 5
 
OpenStack and Windows
OpenStack and WindowsOpenStack and Windows
OpenStack and Windows
 
VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Último (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Managing Open vSwitch Across a Large Heterogenous Fleet

  • 1. Managing Open vSwitch Across a large heterogeneous fleet Andy Hill @andyhky Systems Engineer, Rackspace Joel Preas @joelintheory Systems Engineer, Rackspace
  • 2. Some Definitions Large Fleet Heterogenous • Several different hardware manufacturers • Several XenServer major versions (sometimes on varying kernels) • Five hardware profiles • Six production public clouds • Six internal private clouds • Various non production environments • Tens of thousands of hosts • Hundreds of thousands of instances
  • 4. History • Rackspace used Open vSwitch since the pre 1.0 days • Behind most of First Generation Cloud Servers (Slicehost) • Powers 100% of Next Generation Cloud Servers • Upgraded OVS on Next Gen hypervisors 9 times over 2 years
  • 5. Upgrade Open vSwitch If you get nothing else from this talk, upgrade OVS!
  • 6. Why upgrade? Reasons we upgraded: • Performance • Less impacting upgrades • NSX Controller version requirements • Nasty regression in 2.1 [96be8de] http://bit.do/OVS21Regression • Performance
  • 7. Performance • Broadcast domain sizing • Special care in ingress broadcast flows • Craft flows to explicitly allow destined broadcast traffic
  • 8.
  • 9.
  • 10. Performance • The Dark Ages (< 1.11) • Megaflows (>= 1.11) • Ludicrous Speed (>= 2.1)
  • 11. The Dark Ages (< 1.11) • Flow-eviction-threshold = 2000 • Single threaded • 12 point match for datapath flow • 8 upcall paths for datapath misses • Userspace hit per bridge (2x the lookups)
  • 12. Megaflows (1.11+) • Wildcard matching on datapath • Less likely to hit flow-eviction-threshold • Some workloads still had issues • Most cases datapath flows cut in half or better
  • 13.
  • 14. Ludicrous speed (2.1+) • RIP flow-eviction-threshold • 200000 datapath flows (configurable) • In the wild, we have seen over 72K datapath flows / 260K pps
  • 15.
  • 16. OVS 1.4 -> OVS 2.1 Broadcast flows
  • 17. Mission Accomplished! We moved the bottleneck! New bottlenecks: ● Guest OS kernel configuration ● Xen Netback/Netfront Driver
  • 18. Upgrade, Upgrade, Upgrade If you package Open vSwitch, don’t leave your customers in The Dark Ages Open vSwitch 2.3 is LTS
  • 19. Upgrade process • Ansible Driven (async - watch your SSH timeouts) • /etc/init.d/openvswitch force-reload-kmod • bonded <= 30 sec of data plane impact • non-bonded <=5 sec of data plane impact http://bit.do/ovsupgrade
  • 20. Bridge Fail Modes Secure vs. Normal bridge fail mode Learning L2 switch, overriding default • Critical XenServer bug with Windows causing full host reboots (CTX140814) • Bridge fail mode change is a datapath impacting event • Fail modes do not persist across reboots in XenServer unless in bridge other-config
  • 21. Patch Ports and Bridge Fail Modes • Misconfigured patch ports + ‘Normal’ Bridge Fail mode • Patches do not persist across reboots, cron.reboot to set up- no hypervisor hook available
  • 22. Bridge Migration OVS Upgrades required all bridges to be secured 1. Create new bridge 2. Move VIFs from old bridge to new bridge (loss of a couple of packets) 3. Upgrade OVS 4. Ensure bridge fail mode change persists across reboot 5. Clean up Entire process orchestrated with Ansible
  • 23. Kernel Modules Running Kernel OVS Kernel Module Staged Kernel Reboot Outcome vABC OVSvABC None Everything’s Fine vABC OVSvABC vDEF No Networking vABC OVSvABC, OVSvDEF vDEF Everything’s Fine
  • 24. Kernel Modules • Ensure proper OVS kernel modules are in place • Kernel Upgrade = OVS kernel module upgrade • More packaging work to do for heterogenous environment • Failure to do so can force a trip to a Java console
  • 25. Other Challenges with OVS • Tied to old version because $REASON • VLAN Splinters/ovs-vlan-bug-workaround • Hypervisor Integration • Platforms: LXC, KVM, XenServer 5.5 and beyond
  • 26. Measuring OVS PavlOVS sends these metrics to StatsD/graphite: • Per bridge byte_count, packet_count, flow_count • Instance count • ovs CPU utilization • Aggregate datapath flow_count, missed, hit, lost rates These are aggregated per region->cell->host Useful for DDoS detection (Graphite highestCurrent()) Scaling issues with Graphite/StatsD
  • 27. OVS in Compute Host Lifecycle Ovsulate - Ansible Module that checks host into NVP/NSX controllers. Can fail if routes bad or the host certificate changes, i.e. a host is re-kicked. First made sure it failed explicitly, later added logic to delete existing on provisioning.
  • 28. Monitoring OVS Connectivity to SDN controller • ovs-vsctl find manager is_connected=false • ovs-vsctl find controller is_connected=false SDN integration process (ovs-xapi-sync) • pgrep -f ovs-xapi-sync Routes
  • 30. Reboots Will the host networking survive a reboot? (kernel modules) http://bit.do/iwillsurvive
  • 32. Monitoring OVS XSA-108 - AKA Rebootpocalypse 2014 • Incorrect kmods on reboot may require OOB access to fix! • Had monitoring in place • Pre-flight check for KMods just in case
  • 33. Questions? THANK YOU RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM © RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

Notas del editor

  1. Worth mentioning the # of kernel versions?