3. CPE Overview
• CPE Charter
– Consolidated cloud infrastructure that offers platform services
for Symantec cloud applications
• Symantec Cloud Infrastructure already operating at
scale
– Compute – Reputation-based security
– Storage – Consumer and Enterprise backup
– Network – Hosted email security
• How do we apply the best practices and insights from
operating at scale to the new platform?
• Core objectives
– Secure, scalable and reliable OpenStack-based cloud platform
4. CPE Platform Architecture
[Diagram: Cloud Platform Engineering (CPE) platform architecture]
• Clients: Cloud Applications, CLIs, Scripts
• Access: REST/JSON API, Web Portal
• Core Services (Compute, Networking, Storage, Big Data, Messaging)
– Compute (Nova), Image (Glance), SDN (Neutron), Load Balancing, DNS,
Object Store, SQL, K/V Store, Batch Analytics, Stream Processing,
Msg Queue, Mem Cache, Email Relay, SSL
• Identity & Access (Keystone)
– Authn, Roles, User Mgmt, Tenancy, Quotas
• Supporting Services
– Logging, Metering, Monitoring, Deployment
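5. CPE Reference App #1 - Log Collection service
[Diagram: Log sources (e.g. security metadata, install logs, telemetry)
reach the Log Collection App in the CPE Cloud via DNS queries and a load
balancer (LB). The app runs on two Compute VMs (VM0, VM1), uses Keystone
authentication, and stores logs in an Object Store (Swift) container.]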
1 Acquire an authentication token
2 Create two VMs, associate a network and start them using a CentOS image
3 Create a LB endpoint, place the two VMs in it and configure a DNS entry
4 Provision a container in the Object store
5 Deploy and start the flask application
6 Fetch log files from Object store
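Step 1 above can be sketched as follows. This is a minimal illustration, assuming the Keystone v3 Identity API (the slides do not say which API version CPE exposes). The endpoint URL, user, project and domain values are placeholders, and the helper names are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request


def build_token_request(keystone_url, user, password, project, domain="Default"):
    """Build (but do not send) a Keystone v3 password-auth token request.

    Assumption: Keystone v3; all argument values are placeholders.
    """
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            # Scope the token to a project so it can be used with other services.
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }
    return urllib.request.Request(
        url=keystone_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_headers):
    """Keystone v3 returns the token in the X-Subject-Token response header."""
    return response_headers.get("X-Subject-Token")
```

Sending the request with `urllib.request.urlopen(req)` and passing `response.headers` to `extract_token` would yield the token used by steps 2 to 6.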
6. Problem Statement
• Cloud infrastructure at scale is a highly dynamic
environment
– Diversity of cloud workloads
• Cannot predict application behaviors and patterns
– Addition and removal of resources (machines, network
equipment, etc.)
– Configuration drift over a period of time
– External events causing huge variations in network, compute
and storage consumption
– Stability issues occur when you cross scale boundaries (jump
an order of magnitude)
• Key Question – What validation tools/frameworks do
we need to identify issues at scale and remediate
them?
7. What capabilities do we need in a validation
framework?
• Ability to test generic REST/JSON endpoints (services)
– Including OpenStack and platform services
• Ability to quickly create tests for functionality, stability
and performance
– Should not be burdensome for developers
• Ability to customize/extend test conditions and/or verification functions
• Independent channel of verification
– Higher order verification
• E.g. don’t just check the return status from individual services; verify the
end-to-end function
– Extensible, pluggable design
• Provide continuous visibility into the health and
performance of production cloud
– Proactively monitor transient and persistent errors
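The idea of pluggable verification functions applied to a REST/JSON response can be sketched as below. This is an illustrative sketch only, not SCTF's actual design: the function names (`status_is`, `json_field_equals`, `verify`) and the canned response are assumptions.

```python
import json


def status_is(expected):
    """Condition: HTTP status code equals `expected`."""
    return lambda status, body: status == expected


def json_field_equals(key, expected):
    """Condition: top-level JSON field `key` equals `expected`."""
    return lambda status, body: json.loads(body).get(key) == expected


def verify(status, body, conditions):
    """Return the names of the conditions that failed (empty list means pass)."""
    return [name for name, check in conditions.items() if not check(status, body)]


# A canned response stands in for a real service call.
conditions = {
    "status is 200": status_is(200),
    "vm is ACTIVE": json_field_equals("status", "ACTIVE"),
}
failed = verify(200, '{"status": "BUILD"}', conditions)
print(failed)  # prints ['vm is ACTIVE']
```

The example shows why checking the return status alone is not enough: the call returned 200, yet the higher-order condition on the resource state still fails.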
9. Symantec Cloud Test Framework (SCTF)
• What is SCTF?
– A set of Python libraries, scripts and simple text files (YAML)
that facilitate the validation of a cloud infrastructure
– Primitives for expressing REST requests and validating
responses
[Screenshot: sample SCTF test definition, with callouts for the built-in
exec function, the test command and the validation condition]
10. How to run SCTF?
[Screenshot: running SCTF, with callouts for the input YAML file, the test
case name and the validation summary]
11. SCTF Usage – Simple web request
[Screenshot: test definition with callouts for the built-in web service
function, the request URL and method, and the response code]
12. SCTF Usage – Reusable Primitives
[Screenshot: test definition with callouts for the test procedure name, the
variable definitions and the test case definition]
13. SCTF Usage – Independent channel of verification
[Screenshot: test definition with callouts for the built-in exec function
started after VM create, the ssh command line and the retry args]
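The pattern named in the callouts above (run an ssh command after VM creation and retry until it succeeds) can be sketched in plain Python. This is not SCTF's implementation. `run_with_retry`, its parameters and the `runner` hook are hypothetical names introduced for illustration.

```python
import subprocess
import time


def run_with_retry(command, is_valid, retries=5, delay=2.0, runner=subprocess.run):
    """Run `command` until `is_valid(result)` is true or retries run out.

    Returns the 1-based attempt number that succeeded, or None on failure.
    `runner` is injectable so the loop can be exercised without a real VM.
    """
    for attempt in range(1, retries + 1):
        result = runner(command, capture_output=True, text=True)
        if is_valid(result):
            return attempt
        if attempt < retries:
            time.sleep(delay)  # a freshly created VM may not accept ssh yet
    return None


# Usage (placeholder host; not executed here):
# run_with_retry(["ssh", "user@vm-address", "uptime"],
#                lambda r: r.returncode == 0, retries=10, delay=5.0)
```

Reaching the VM over ssh is an independent channel: it confirms the instance is actually usable instead of trusting the status reported by the compute API.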
15. SCTF Roadmap
• Stream files – enable large file downloads
• Test Runner – execute all test files in a directory hierarchy
• Preserve comments – retain comments after programmatic
manipulation
• Improve error reporting – make stack traces and error
reporting more descriptive
• Incorporate Salt to allow remote execution and job
management
• Allow tests to be run in parallel (multiple possible ways)
– Use pykka ( https://github.com/jodal/pykka ) for actors in a single
process
– Call out to Julia ( http://julialang.org/ ) and use its parallel
facilities
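The Test Runner and parallel execution items can be sketched together. Note the substitution: this sketch uses the standard-library `concurrent.futures` thread pool as a stand-in, not pykka actors or Julia as proposed above. The function names and the `.yaml`/`.yml` suffixes are assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def discover_tests(root, suffixes=(".yaml", ".yml")):
    """Return every test file under `root`, sorted for a stable order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def run_all(paths, run_one, workers=4):
    """Run `run_one(path)` for each path in parallel; map path -> result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(paths, pool.map(run_one, paths)))
```

Threads suit this workload because REST-based tests spend most of their time waiting on the network; `run_one` would be whatever function executes a single test file.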
16. SCTF Roadmap – Cont’d
• Allow test results to be written to files and databases
• Allow test documentation to be queried
• Determine why the test failed
– Diagnosis
– Remediation
– Validate remediation
• Add timing and metadata to test output
• Performance as test criteria
• Add extension type to allow type handlers to be added
at run-time
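Adding timing to test output and treating performance as a pass/fail criterion could look like the sketch below. It is a hypothetical wrapper, not part of SCTF: `timed_test`, `max_seconds` and the result fields are assumed names.

```python
import time


def timed_test(name, func, max_seconds=None):
    """Run `func`, record its duration, and optionally enforce a time budget."""
    start = time.perf_counter()
    try:
        func()
        passed, error = True, None
    except Exception as exc:  # record the failure instead of aborting the run
        passed, error = False, repr(exc)
    elapsed = time.perf_counter() - start
    # Performance as a test criterion: a correct but slow test still fails.
    if passed and max_seconds is not None and elapsed > max_seconds:
        passed, error = False, "exceeded %.3fs budget" % max_seconds
    return {"name": name, "passed": passed, "seconds": elapsed, "error": error}
```

The returned dictionary is plain data, so it could be written to a file or database as the first roadmap item on this slide suggests.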
17. Summary/Conclusion
• We plan to use SCTF as a primary means of functional
and performance validation
– Enable continuous monitoring of the stability and performance
of the CPE cloud
– Ability to associate diagnosis and remediation with failing
functional tests
– Scale the ability to generate tests along with the cloud
– Enable shorter mean time to resolution
• Planning to collaborate with other similar open source
projects
• Our primary motivation is to ensure the stability of an
OpenStack-based cloud when deployed at scale