How a Cloud Computing Provider Reached the Holy Grail of Visibility
1. SPO3378
How a Cloud Computing
Provider Reached the
Holy Grail of Visibility
Elad Gotfrid, CloudShare
Leena Joshi, Splunk Inc
#vmworldsponsor
2. How A Cloud Computing Provider
Reached the Holy Grail of Visibility
SPO3378
Elad Gotfrid
Director of IT @ CloudShare
Leena Joshi
Director, Solutions Marketing, Splunk
CONFIDENTIAL
3. Company Overview
About:
Headquartered in San Mateo, CA
Founded in 2007
70,000+ users worldwide
Backed by leading VCs:
Sequoia, CRV, Globespan, Gemini
The Leading Cloud for Pre-Production
Focus on Dev/Test/Pre Production Segment
Many Fortune 500 customers including:
McAfee, HP, SAP, Cisco, Dell, Microsoft, IBM, Juniper
40% of Microsoft SharePoint MVPs and MCMs have already adopted
CloudShare for development, testing and training
4. Company Platform Benefits
CloudShare's IaaS (infrastructure as a service) platform gives each
customer a private, multi-VM networked environment, including
compute resources, networking, IP addresses, and a preinstalled OS.
5. CloudShare Operations Overview
CloudShare platform is designed to handle high load:
Running 150,000 Customer Virtual Machines per month
During peak hours our system performs ~500 VM resume/suspend
operations per hour
Robust dynamic assignment of infrastructure
resources including:
ESX Server
Storage units
Firewall
Switches
VLANs
Public IPs
6. CloudShare Custom Cloud
CloudShare uses its own patent-pending backend
“private cloud” system, designed to handle the entire virtual
machine and datacenter life cycle:
Environments operation
Environment lifecycle
Self healing & Error correction
Resource management
Manage large scale infrastructure:
15 VMware Virtual Centers
20 storage units
Hundreds of switch ports/Gateway configuration
7. IT/Operations Challenges
Looking for a centralized console for complete IT/Operations visibility
Business Requirements:
Data Aggregation
• Aggregate all IT/Infrastructure data into a single console
Data Correlation
• Correlate business data with performance/application data
Data Analysis
• Analyze and search the data
• Find patterns and correlation between events
8. The Trick Is Finding a Way to Interact
9. Enter Splunk
Evaluated Splunk for a narrow use initially
Quickly realized it could do a lot more
Eventually standardized on it
15. Splunk Adoption
Splunk adoption was IT and R&D driven:
From hundreds of daily email alerts to a few actionable email alerts
Massive use in QA for finding anomalies and issues
Dashboards for:
Performance trends
Current system status
Capacity planning
Root cause analysis
Business metrics
Viral adoption within the organization.
From DevOps to IT, R&D, Marketing and Management
16. Splunk As a Data Aggregator
Data sources shown feeding into Splunk:
App IIS
Google Docs
VMware
Backend SQL data
Network/GWs/FW
Storage
API Actions
Incident Management
Salesforce
17. Splunk As a Central Platform in CloudShare
Support/NOC:
• Performance data
• System alerts
IT/Ops:
• Capacity planning
• Performance monitoring
• System usage
R&D:
• Debug / error logs
• Health measurements
• Logs
Management:
• SLA compliance
• BI (cohort analysis, dashboards)
Marketing:
• Tracking – visits, leads, deals
• Usage patterns
• Qualifying leads
• A/B testing
Other labels on the slide: IT Operations, Security, Developer Framework
18. Splunk Provides Operational Intelligence
Allows CloudShare to correlate the business data (users, usage)
with the IT/Infrastructure data
Examples:
Understand how many resources each customer consumes
(CPU, memory, network, etc.) and when
A customer can have more than one VM or environment; Splunk helps
us aggregate the data easily and look at customer-level usage
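The customer-level rollup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how CloudShare or Splunk implement it; the field names (customer, vm, cpu_mhz, mem_mb) and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-VM usage samples; field names and values are illustrative only.
samples = [
    {"customer": "acme", "vm": "vm-1", "cpu_mhz": 1200, "mem_mb": 2048},
    {"customer": "acme", "vm": "vm-2", "cpu_mhz": 800, "mem_mb": 1024},
    {"customer": "globex", "vm": "vm-3", "cpu_mhz": 500, "mem_mb": 4096},
]


def usage_by_customer(rows):
    """Sum per-VM metrics into one total per customer."""
    totals = defaultdict(lambda: {"cpu_mhz": 0, "mem_mb": 0, "vms": set()})
    for row in rows:
        entry = totals[row["customer"]]
        entry["cpu_mhz"] += row["cpu_mhz"]
        entry["mem_mb"] += row["mem_mb"]
        entry["vms"].add(row["vm"])
    # Replace the set of VM names with a simple count for reporting.
    return {
        customer: {
            "cpu_mhz": t["cpu_mhz"],
            "mem_mb": t["mem_mb"],
            "vm_count": len(t["vms"]),
        }
        for customer, t in totals.items()
    }


customer_usage = usage_by_customer(samples)
print(customer_usage)
```

Grouping by customer rather than by VM is what lets a customer with several environments show up as a single line in a usage report.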
19. Splunk Dashboards
Management Dashboard – full visibility into business-critical metrics
SLA Dashboards
- Measure service level
- Analyze and present
statistics according to
business guidelines
Capacity Planning
- High Level status for
management on capacity
- True visibility into
operational data
20. Splunk Dashboards
Dashboard for high utilization storage consumers:
All storage-related data is collected by Splunk
Lists the number of IOPS per business unit or customer
We can easily identify our top storage consumers
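As a rough sketch of the ranking behind such a dashboard, the Python below totals IOPS per customer and returns the heaviest consumers. The sample data and the function name top_storage_consumers are assumptions made for illustration; the slide does not show the actual search.

```python
from collections import Counter

# Hypothetical (customer, IOPS) samples; values are illustrative only.
iops_samples = [
    ("acme", 1500),
    ("globex", 300),
    ("acme", 700),
    ("initech", 900),
]


def top_storage_consumers(samples, n=2):
    """Total IOPS per customer and return the n largest consumers."""
    totals = Counter()
    for customer, iops in samples:
        totals[customer] += iops
    return totals.most_common(n)


top = top_storage_consumers(iops_samples)
print(top)
```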
21. Splunk Dashboards
Dashboard for storage latency :
All VMware storage-related data is collected by Splunk
We can easily identify poorly performing storage units
Latency is calculated whenever the 20 ms threshold is breached
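One way to picture the threshold check is the Python sketch below, which counts samples above 20 ms per storage unit and lists the worst units first. Only the 20 ms threshold comes from the slide; the datastore names, the sample values, and the choice to count breaches (rather than, say, average them) are assumptions.

```python
LATENCY_THRESHOLD_MS = 20  # threshold quoted on the slide

# Hypothetical (storage unit, latency in ms) samples.
latency_samples = [
    ("datastore-a", 5.0),
    ("datastore-a", 25.0),
    ("datastore-b", 12.0),
    ("datastore-a", 31.5),
    ("datastore-c", 22.0),
]


def breaches_by_unit(samples, threshold_ms=LATENCY_THRESHOLD_MS):
    """Count samples above the threshold per storage unit, worst first."""
    breaches = {}
    for unit, latency_ms in samples:
        if latency_ms > threshold_ms:
            breaches[unit] = breaches.get(unit, 0) + 1
    return sorted(breaches.items(), key=lambda item: item[1], reverse=True)


slow_units = breaches_by_unit(latency_samples)
print(slow_units)
```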
22. Splunk Dashboards
High Level Dashboard of system usage (used by the NOC/Support)
23. Splunk Dashboards
High Level Dashboard of Network usage (used by the IT/Network Ops)
Shows a per-user drilldown on number of connections, packets, traffic
Full visibility on traffic usage and patterns per customer
24. BI usage for Marketing and Management
Tracking of conversion:
Visit to Lead
Lead to Deal
Lead Qualification
A/B Testing
Churn analysis
Cohort Analysis
User engagement score
Feature usage
Customer usage patterns
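The conversion tracking listed above (visit to lead, lead to deal) comes down to simple ratios. The Python sketch below shows the calculation with made-up funnel counts; the numbers and the helper name conversion_rate are hypothetical and are not taken from CloudShare's data.

```python
def conversion_rate(converted, total):
    """Fraction of `total` that converted; 0.0 when there is nothing to convert."""
    return converted / total if total else 0.0


# Hypothetical funnel counts for one reporting period.
funnel = {"visits": 2000, "leads": 100, "deals": 10}

visit_to_lead = conversion_rate(funnel["leads"], funnel["visits"])
lead_to_deal = conversion_rate(funnel["deals"], funnel["leads"])

print(f"Visit to lead: {visit_to_lead:.1%}")
print(f"Lead to deal:  {lead_to_deal:.1%}")
```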
25. Splunk App for VMware
Active participants in the beta group for the Splunk App
for VMware
The Splunk app collects data from ESX and VC, including:
ESX Logs, VC Logs
Performance
Tasks/Events
Inventory
Topology
Collects metrics from the host ESX/ESXi servers at fine
granularity (20-second intervals)
26. Splunk App for VMware #2
Collecting VMware performance data at large scale
is a “Big Data” problem:
50,000,000 events per day
~2 million events per hour
Five dedicated Splunk FAs (data forwarder appliances)
are used to gather all data in real time
Data flow: ESX -> Forwarder Appliance -> Splunk -> Real Time Data Dashboard
27. Splunk Data Statistics
How Splunk handles CloudShare's “Big Data”
Total of 6,000,000,000 events stored in
the Splunk datastore
90,000,000 events per day
~3.5 million events per hour
CloudShare deployed a Splunk scale-out architecture:
2 Indexers
1 Search Head
32GB RAM per server
1.5TB (total DB size)
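A back-of-the-envelope check of these figures, as a Python sketch. It assumes events arrive evenly over 24 hours and that 1.5 TB means decimal terabytes; both are simplifying assumptions, and the derived values are rough averages rather than numbers reported by CloudShare.

```python
EVENTS_PER_DAY = 90_000_000    # from the slide
TOTAL_EVENTS = 6_000_000_000   # from the slide
TOTAL_DB_BYTES = 1.5e12        # 1.5 TB, assuming decimal terabytes

# Average hourly rate if events were spread evenly across the day.
events_per_hour = EVENTS_PER_DAY / 24

# How many days of data the stored total represents at the current daily rate.
days_of_data = TOTAL_EVENTS / EVENTS_PER_DAY

# Average on-disk footprint per stored event.
bytes_per_event = TOTAL_DB_BYTES / TOTAL_EVENTS

print(f"{events_per_hour:,.0f} events/hour")
print(f"{days_of_data:.1f} days of data")
print(f"{bytes_per_event:.0f} bytes/event")
```

The even-spread average comes out slightly above the ~3.5 million events per hour quoted on the slide, which is plausible for a rounded figure or a load that varies across the day.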
28. Best Practices Deploying Splunk
For Your Cloud Environments
Splunk App for VMware:
Take into consideration the large amount of data each ESX host generates
Properly size the FA hardware (CPU, memory)
Add more engine processes to each FA if needed
Consider creating a dedicated indexer for the VMware data in order to
reduce the load
Storage monitoring
Let Splunk collect both physical storage latency (from storage disks) and VMware
vDisk latency to better understand latency problems and find their root cause
In a linked clone environment, consider monitoring both volume-level and vDisk-level
latency in order to understand whether the problem is on the master disk or the clones
Network monitoring
Monitor network traffic for anomalies such as a high rate of open connections
Create real-time searches in order to react quickly and shut down abuse
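The network-monitoring advice can be illustrated with a small Python sketch that buckets connection events into fixed time windows and flags any customer who exceeds a limit within one window. The event format, the 60-second window, and the limit are all hypothetical; in practice this logic would live in a real-time Splunk search rather than in standalone code.

```python
from collections import Counter

# Hypothetical connection-open events: (timestamp in seconds, customer).
events = [
    (0, "acme"),
    (10, "acme"),
    (20, "acme"),
    (30, "globex"),
    (70, "acme"),
]


def flag_high_connection_rate(events, window_sec=60, limit=2):
    """Return customers who opened more than `limit` connections in any window."""
    counts = Counter()
    for timestamp, customer in events:
        # Bucket each event into a fixed, non-overlapping time window.
        counts[(customer, timestamp // window_sec)] += 1
    flagged = {customer for (customer, _), n in counts.items() if n > limit}
    return sorted(flagged)


suspects = flag_high_connection_rate(events)
print(suspects)
```

Fixed windows are the simplest choice; a sliding window would catch bursts that straddle a window boundary, at the cost of more bookkeeping.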
29. Summary
For CloudShare, Splunk is our platform for operational
intelligence
Once all data is placed in a central repository, it's very easy to
correlate events and understand patterns
Use Splunk dashboards to give other groups in the organization
visibility
30. Experiment with Splunk
All users joining the session will receive a two-week trial account in
CloudShare with a dedicated, preinstalled and fully configured
Splunk environment
To get access to your dedicated environment, browse to:
http://tinyurl.com/SplunkDemo
33. Company Highlights
Company
Founded 2004, first software release in 2006
Headquarters: San Francisco, CA
Regional headquarters in Hong Kong and London
Over 580 employees, based in 10 countries
Q1 Revenue: $37.2 million; +80% year-over-year
Business Model / Product
Free download
Current release: Splunk Enterprise 4.3
4,000+ Customers
Customers in over 75 countries
54 of the Fortune 100
Largest Customer: 100 Terabytes per day
34. FILL OUT
A SURVEY
EVERY COMPLETE SURVEY
IS ENTERED INTO
DRAWING FOR A
$25 VMWARE COMPANY
STORE GIFT CERTIFICATE