The BonFIRE architecture was presented at the TridentCom conference. These are the slides for the paper, which describe the key components and principles of the architecture, as well as some specific features offered to experimenters that are not available elsewhere.
BonFIRE TridentCom presentation
1. Building service testbeds on FIRE
BonFIRE: A Multi-cloud Test Facility for
Internet of Services Experimentation
Ally Hume, EPCC, University of Edinburgh
Tridentcom 2012
2. BonFIRE Project
The BonFIRE project is designing, building and operating
a multi-cloud facility to support research across
applications, services and systems, targeting the Future
Internet services research community.
2 Tridentcom 2012 BonFIRE 2
3. Three Scenarios
• Multi-site clouds connected through normal internet
• Cloud scenario with emulated network
• Extended Cloud scenario with controlled network (implies
federation with network facility)
5. Example experiment categories
1. Service applications experiments (non-cloud)
– Including distributed peer to peer applications
2. Cloud (or multi-cloud) applications
– e.g. cloud bursting scenarios with private and public clouds.
3. Cloud infrastructure experiments
– e.g. schedulers on top of BonFIRE, new live-migration strategies,
contention minimisation
4. Application-and-network experiments
– Experimenters configuring network to support applications
– Bandwidth on demand
– Application specific routing
7. Observability
• To understand the behaviour of the system you need to
see inside the abstractions normally provided by clouds.
8. Application, VM and physical machine metrics
[Diagram: a physical machine hosting three virtual machines, with metrics collected at the application, VM and physical-machine levels]
11. Experiments supported
• Passive use of infrastructure metrics
– Understanding of contention and its impact on:
• Application performance
• General VM performance
• Active use of infrastructure metrics:
– by a cloud scheduler to co-locate VMs to minimise contention
• e.g. by collecting statistics of the previous usage patterns of an
image
– by a cloud application to react to the contention currently in the
system
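The "active use of infrastructure metrics" idea above can be sketched as a simple contention detector; the metric names, hosts and threshold here are illustrative assumptions, not BonFIRE's actual monitoring schema.

```python
def detect_contention(cpu_samples, threshold=0.9):
    """Flag physical hosts whose CPU utilisation exceeds a threshold.

    A sketch of reacting to contention from infrastructure metrics;
    the 0.9 threshold and host names are assumptions for illustration.
    """
    return [host for host, cpu in cpu_samples.items() if cpu > threshold]

# Hypothetical physical-host CPU utilisation readings
busy_hosts = detect_contention({"p23": 0.97, "p40": 0.40, "p92": 0.95})
```

A scheduler could use such a list to avoid co-locating new VMs on the flagged hosts, while an application could react by shifting load away from them.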
12. Control
• Observing is good but control is even better.
13. Exclusive physical machines and controlled placement
[Diagram: an experimenter asks the Resource Manager "Give me 3 physical machines from 26/4/12 10:00 to 28/4/12 20:00"; the Reservation System grants hosts p23, p40 and p92 in cluster 34563 at location inria, and OpenNebula then places VMs on a named host, e.g. location: inria, cluster: 34563, host: p40]
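The reservation request on this slide can be sketched as a payload builder; the field names and schema below are hypothetical, not the real BonFIRE reservation API.

```python
from datetime import datetime

def build_reservation(count, cluster, start, end):
    """Build a physical-machine reservation request.

    Field names are assumptions for illustration; BonFIRE's
    reservation system defines its own schema.
    """
    if end <= start:
        raise ValueError("reservation must end after it starts")
    return {
        "resource": "physical_machine",
        "count": count,
        "cluster": cluster,
        "start": start.isoformat(),
        "end": end.isoformat(),
    }

# "Give me 3 physical machines from 26/4/12 10:00 to 28/4/12 20:00"
req = build_reservation(3, 34563,
                        datetime(2012, 4, 26, 10, 0),
                        datetime(2012, 4, 28, 20, 0))
```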
14. Supported Experiments
• Exclusive use of physical machine and controlled
placement
– Supports elimination of unwanted contention
– Increases experiment repeatability
– Supports the implementation of controlled contention
• possible via common contention templates
[Diagram: a physical machine running the VM under test alongside contention-generator VMs driven by contention templates 1 and 2]
15. Supported Experiments
• Exclusive use of physical machine and controlled
placement
– Experiments using external schedulers
[Diagram: an application scheduler external to BonFIRE drives the BonFIRE Resource Manager to place "lite" instances on specific physical machines (PM1, PM2, PM3) of an exclusive cluster "MyExclusiveCluster", e.g. instance type: lite, location: inria, cluster: MyExclusiveCluster, host: PM2]
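Controlled placement as in the slide above can be sketched as an OCCI-style compute description that pins a VM to a reserved host; the element names are a simplification, not BonFIRE's exact OCCI schema.

```python
import xml.etree.ElementTree as ET

def compute_request(name, instance_type, location, host=None):
    """Build an OCCI-style compute description.

    Element names are illustrative assumptions; the real BonFIRE
    OCCI extension defines its own vocabulary.
    """
    compute = ET.Element("compute")
    ET.SubElement(compute, "name").text = name
    ET.SubElement(compute, "instance_type").text = instance_type
    ET.SubElement(compute, "location").text = location
    if host is not None:
        # Controlled placement: pin the VM to a reserved physical host
        ET.SubElement(compute, "host").text = host
    return ET.tostring(compute, encoding="unicode")

doc = compute_request("vm1", "lite", "inria", host="PM2")
```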
16. Controlled (EMULAB) networks with Virtual Wall
[Diagram: a client reaches two servers across an emulated internet built from software routers, with a high-bandwidth server connection and a constrained last-mile client connection. The emulated networks are:]
– Network A (to Server 2): bandwidth 1 Gbps, latency 0, loss rate 0, traffic 250 Mbps
– Network B (to Server 1): bandwidth 1 Gbps, latency 0, loss rate 0, traffic 100 Mbps
– Network C (client last mile): bandwidth 100 Mbps, latency 10 ms, loss rate 1%, traffic 10 Mbps
– Network D ("the internet"): bandwidth 1 Gbps, latency 100 ms, loss rate 20%, traffic 200 Mbps
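The emulated topology above can be sketched as a set of link specifications; the dict layout is an assumption for illustration, not the real Virtual Wall/Emulab configuration format.

```python
def network_spec(name, bandwidth_mbps, latency_ms, loss_rate):
    """Describe one emulated link of the Virtual Wall topology.

    Loss rate is a fraction in [0, 1]; the field layout is a
    sketch, not Emulab's actual experiment description language.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss rate must be a fraction in [0, 1]")
    return {"name": name, "bandwidth_mbps": bandwidth_mbps,
            "latency_ms": latency_ms, "loss_rate": loss_rate}

# The four networks from the diagram
nets = [
    network_spec("A", 1000, 0, 0.0),
    network_spec("B", 1000, 0, 0.0),
    network_spec("C", 100, 10, 0.01),    # constrained last-mile link
    network_spec("D", 1000, 100, 0.20),  # lossy "internet" segment
]
```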
17. Supported Experiments
• Controlled network quality of service
– Testing streaming applications
– Testing Internet of Services distributed applications
– Multi-cloud applications
• Experiments into how bandwidth on demand services could be used by cloud
applications
[Diagram: a hybrid cloud combining a private cloud and a public cloud]
18. FEDERICA offering
• FEDERICA is an e-Infrastructure based on virtualization of compute
and network resources
• BonFIRE experimenters will be able to request slices of network
resources and to:
– Select the network topology
– Decide the IP configuration
– Configure static routes and dynamic routing protocols (OSPF, BGP)
[Diagram: two BonFIRE cloud sites interconnected through a controlled network of FEDERICA routers and servers]
19. FEDERICA interconnection
• We will have:
– A pair of BonFIRE sites connected
to FEDERICA PoPs
– Access to resources from four
FEDERICA PoPs.
• BonFIRE is implementing an SFA
adaptor to request FEDERICA
slices.
– NOVI is developing an SFA interface
on top of FEDERICA Web Services.
– GÉANT3 will deploy the NOVI SFA
interface.
– We are using NOVI to test the SFA
adaptor.
20. Integration with AutoBAHN
• Integrate BonFIRE with the GÉANT Bandwidth-on-Demand interface
(AutoBAHN)
• This will allow network-aware experiments with requirements
for guaranteed bandwidth
[Diagram: the EPCC and PSNC BonFIRE sites connected by a bandwidth-on-demand link]
• Control
• Future service
• Why AutoBAHN?
– Reference BoD system for European NRENs and GÉANT
– Most mature solution in terms of specification, implementation and
deployment in the multi-domain environment interconnecting some of
the sites
– Handles both on demand and advance requests
22. Advanced features
• Vertical elasticity
– Scale up or down a virtual machine
– For some applications scale up may be better than scale out
• video transcoding
• Bandwidth on demand services
– As well as a control tool, BoD services can also be used by
network-aware applications
• Application and network
– e.g. OpenFlow for application specific routing
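Vertical elasticity as described above (scale up the same VM rather than adding VMs) can be sketched as a resize decision; the resizing semantics and threshold are assumptions for illustration, since the actual mechanism is testbed-specific.

```python
def scale_up(instance, cpu_load, threshold=0.85, max_cores=8):
    """Vertical elasticity sketch: grow the VM instead of scaling out.

    Doubles the core count when load exceeds the threshold; the
    threshold, cap and dict layout are illustrative assumptions.
    """
    if cpu_load > threshold and instance["cores"] < max_cores:
        instance = dict(instance, cores=instance["cores"] * 2)
    return instance

# A busy video transcoder benefits more from a bigger VM than from more VMs
vm = {"name": "transcoder", "cores": 2}
vm = scale_up(vm, cpu_load=0.95)
```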
23. Architecture
[Diagram: the layered BonFIRE architecture. A Portal (GUI and monitoring dashboard) and an API sit on top of the Experiment Manager, which drives the Resource Manager; the Resource Manager exposes OCCI, monitoring, scheduling, reservation and accounting interfaces and communicates, via an Enactor and a message queue, with the testbeds, which expose OCCI, scheduling, reservation, accounting, monitoring and SSH interfaces. An LDAP identity server is used by the Portal, Experiment Manager, Resource Manager and testbeds; an SSH gateway gives access to experiment VMs, and a monitoring aggregator VM collects metrics. The legend distinguishes interfaces accessible by end users or user agents, BonFIRE-internal interfaces or APIs accessible only by the layer above, internal APIs accessible from multiple layers or from user agents running on BonFIRE VMs, and components used by multiple layers.]
24. SFA Adaptor implementation
• OCCI: one request per
resource.
• RSpec: a single request
per experiment (including
all resources).
• Extensions:
– Add an Enactor DB to batch
the OCCI requests belonging
to one experiment.
– Add an SFAAdaptorEndPoint
to convert and map the OCCI
requests to an RSpec request.
– SFA dialogue with the
Aggregate Manager (AM).
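The batching step above (many per-resource OCCI requests mapped to one experiment-wide RSpec) can be sketched as follows; the tag names are a simplification of the real GENI RSpec schema, and the request dicts are hypothetical.

```python
import xml.etree.ElementTree as ET

def occi_to_rspec(occi_requests):
    """Batch per-resource OCCI requests into one RSpec-style document.

    Mirrors the SFA adaptor idea on this slide; element and attribute
    names are illustrative, not the full GENI RSpec vocabulary.
    """
    rspec = ET.Element("rspec")
    for req in occi_requests:
        node = ET.SubElement(rspec, "node", client_id=req["name"])
        ET.SubElement(node, "sliver_type").text = req["type"]
    return ET.tostring(rspec, encoding="unicode")

# One experiment's worth of (hypothetical) OCCI resource requests
doc = occi_to_rspec([{"name": "router1", "type": "federica.router"},
                     {"name": "server1", "type": "federica.server"}])
```

The single resulting document can then be sent to the Aggregate Manager in one SFA call instead of one call per resource.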
25. Summary
• Multi-cloud test facility for services community
• Focus on:
– Observability
– Control
– Advanced Features
– Usability
• Supports a wide range of experimenters
• Not targeting network experimenters, but the network is a key
part of experiments
• Uses a broker architecture for federation
• Being used by open call experiments
• Open for general use by around October 2012
[Speaker notes]
Extended cloud scenario: uses four BonFIRE computing sites interconnected through today's public internet.
Cloud with a controlled experimental network scenario: uses nodes connected over a user-controlled, emulated virtual internet. This scenario is supported by the Virtual Wall environment enriched with cloud capabilities.
Extended cloud with complex network implications scenario: uses two cloud sites interconnected through a controlled network, which allows studying the implications of network behaviour on services and clouds, and experimenting with new solutions and approaches. This scenario implies a vertical federation with a cloud-to-network interface. Implementing this third scenario is very powerful, as it allows cross-cutting issues (network- and service-level aspects) to be analysed in the same test.
Scenario 1 – Extended Cloud: federation of cloud sites using the normal internet.
Scenario 2 – Cloud with emulated and controlled network: a network emulation environment for cross-cutting tests.
Scenario 3 – Federation with other external facilities: potential use of FEDERICA network slices (network interactions / experimental internet); connection to Open Cirrus or other cloud infrastructures.