The Modern Data Center Topology: The High Availability Mantra
Topics
• The Modern Data Center Overview
• The High Availability (HA) Mantra
• Operating Challenges
• A Solution
Modern Data Center
Overview
Multiple Classes of Data Centers
• Internet Data Center
  - Used by external clients connecting from the Internet.
  - Supports the servers and devices required for B2C transaction-based applications (e-commerce).
• Extranet Data Center
  - Provides support and services for external B2B partner transactions.
  - Accessed over secure VPN connections or private WAN links between the partner network and the enterprise extranet.
• Intranet Data Center
  - Hosts applications and services accessed mostly by internal employees with connectivity to the internal enterprise network.
• Special Purpose Data Center
  - For specialized application areas, such as geological and geophysical workloads in the oil and gas industry.
These classes may or may not be interconnected.
Common Objective: Business Continuity
• Disaster Recovery (DR) Data Center
  - Each class may have a dedicated or shared DR center.
  - Usually located separately from the primary data center.
• High Availability (HA) Data Center
  - Each data center is provisioned with significant redundancies.
  - The DR center comes into play only when a disaster strikes.
  - Component or system failures within any data center should either be self-healing or be absorbed by redundancies within that data center.
• Insurance Against Power & Network Outages
  - Reliability through multiple service providers
  - Internal backups
• Securing the Data Center
  - Against malicious hacking that can bring down the data center and impact business continuity
  - Implementing firewalls/virtual firewalls
Common Complexity: Multitude of Assets
Multitude of Assets
  - Divided between two worlds: IT & Facilities
  - Includes mission-critical applications
  - Like a manufacturing operation
  - Raw material: power & networks
  - Processing: data
  - Output: information service
  - Needs: asset management and resource optimization, à la manufacturing
The High Availability
Mantra
Today’s High Availability Data Center
• Extreme redundancies for 99.99% uptime -> higher power consumption
• Huge population of N+1/N+2 equipment -> asset underutilization, and too complex to manage with spreadsheets and Visio
• Chain of interdependent equipment -> multiple points of failure
• Growing heat loads, carbon emissions and e-waste -> sustainability issues
• kW per rack increases as more processing capacity is added -> trade-off between supporting more per rack and the extra space and heat load
High availability is inversely related to asset utilization and energy efficiency.
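The link between redundancy and underutilization can be made concrete with a little arithmetic. The sketch below is not from the original deck; the function name and the example group sizes are illustrative assumptions. It computes the highest average load each unit may carry if an N+k group must still serve the full load after k units fail.

```python
def max_utilization(n_required: int, n_spare: int) -> float:
    """Highest average load per unit (0..1) that still lets the group
    carry the full load after `n_spare` units have failed.

    n_required: units needed to carry the full load (the "N" in N+1).
    n_spare:    redundant units installed on top of N.
    """
    if n_required <= 0 or n_spare < 0:
        raise ValueError("need n_required > 0 and n_spare >= 0")
    # The full load equals n_required units' worth of capacity,
    # spread evenly over all installed units.
    return n_required / (n_required + n_spare)


if __name__ == "__main__":
    # Illustrative group sizes only (hypothetical).
    for n, k in [(4, 1), (4, 2), (1, 1)]:
        print(f"N={n}, spares={k}: at most {max_utilization(n, k):.0%} per unit")
```

Under these assumptions, a group with one unit and one spare can never run above half load per unit, which is the underutilization the slide points to.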
When HA Fails: A Tale of Two Disasters

RBS: Tech fault at RBS and Natwest freezes millions of UK bank balances
RBS and Natwest have failed to register inbound payments for up to three days, customers have reported, leaving people unable to pay for bills, travel and even food. The banks - both owned by RBS Group - have confirmed that technical glitches have left bank accounts displaying the wrong balances and certain services unavailable. There is no fix date available.

Amazon: Amazon cloud outage takes down Netflix, Instagram, Pinterest, & more
With the critical Amazon outage, which is the second this month, we wouldn’t be surprised if these popular services started looking at other options, including Rackspace, SoftLayer, Microsoft’s Azure, and Google’s just-introduced Compute Engine. Some of Amazon’s biggest EC2 outages occurred in April and August of last year.

Which Will Be The Next One?
What’s the High Availability Mantra?
Amazon data centers (built to Tier 4 standards, with an expected availability of 99.995%) have already had two outages in 2012, each lasting over 3 hours!
• Tier 3/Tier 4 ratings are defined only by hardware redundancies
• Glaring gaps in operating procedures to prevent fatal human errors
• Lack of purpose-built business continuity planning (BCP) software to predict failures
• Lack of chain of custody to detect root cause

Availability %               Downtime per year   Downtime per month*   Downtime per week
99% ("two nines")            3.65 days           7.20 hours            1.68 hours
99.5%                        1.83 days           3.60 hours            50.4 minutes
99.8%                        17.52 hours         86.4 minutes          20.16 minutes
99.9% ("three nines")        8.76 hours          43.2 minutes          10.1 minutes
99.95%                       4.38 hours          21.6 minutes          5.04 minutes
99.99% ("four nines")        52.56 minutes       4.32 minutes          1.01 minutes
99.999% ("five nines")       5.26 minutes        25.9 seconds          6.05 seconds
99.9999% ("six nines")       31.5 seconds        2.59 seconds          0.605 seconds
99.99999% ("seven nines")    3.15 seconds        0.259 seconds         0.0605 seconds
* Monthly figures assume a 30-day month.
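The downtime budgets in a table like this follow directly from the availability percentage. The sketch below shows the arithmetic; the 365-day year, 30-day month and 7-day week are assumptions of this example, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(availability_pct: float, period_days: float) -> float:
    """Downtime budget, in seconds, for a given availability over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = (100 - availability_pct) / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed period lengths: 365-day year, 30-day month, 7-day week.
    for pct in (99.0, 99.9, 99.99, 99.999):
        year = allowed_downtime_seconds(pct, 365) / 60
        month = allowed_downtime_seconds(pct, 30) / 60
        week = allowed_downtime_seconds(pct, 7) / 60
        print(f"{pct}%: {year:.2f} min/year, {month:.2f} min/month, {week:.2f} min/week")
```

Running it for a target of interest makes it easy to compare an observed outage (for example, three hours) against the yearly budget implied by the target.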
Delivering the High Availability Promise
Adequate Redundancies
• Are there any single points of failure, besides power and external networks, that can impact uptime? (Not everything is N+1.)
• What are my redundancy paths?
• Are the relationships & dependencies among critical assets clearly defined?
• Can I do an impact analysis on the outage/downtime of any equipment? Can I predict
the cascading effect of such an outage on other assets/applications in the data center?
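The impact-analysis question can be sketched as a graph traversal. The example below is not from the deck: the asset names and the `FEEDS` map are hypothetical, and the traversal deliberately ignores redundancy, so it returns a worst-case list of everything downstream of a failed asset.

```python
from collections import deque

# Hypothetical dependency map: each key feeds the assets listed as its value.
FEEDS = {
    "UPS-A": ["PDU-1", "PDU-2"],
    "PDU-1": ["Rack-01"],
    "PDU-2": ["Rack-02"],
    "Rack-01": ["web-server"],
    "Rack-02": ["db-server"],
    "web-server": ["storefront-app"],
    "db-server": ["storefront-app"],
}


def impacted_assets(failed: str, feeds: dict) -> list:
    """Return every asset downstream of `failed`, in breadth-first order.

    Worst case only: an asset is listed as soon as any one of its
    upstream feeds is affected, even if it has a redundant feed.
    """
    seen = {failed}
    queue = deque([failed])
    order = []
    while queue:
        current = queue.popleft()
        for dependent in feeds.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


if __name__ == "__main__":
    print(impacted_assets("PDU-1", FEEDS))
```

A fuller model would also record which feeds are redundant, so that an asset is only flagged when all of its alternative paths are lost.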
Preventing Failures
• Can failures be predicted so that proactive measures can be taken? Do I get alerts on threshold breaches so that I can take preventive action before a failure happens?
• Is there a history of Move-Add-Change (MAC) operations that I should be aware of?
• What is the impact of a MAC on space, power and cooling?
• Where are new devices/servers best placed (Floor -> Rack -> Cage)? How can this be determined from the current infrastructure and other dependencies to avoid a failure?
• How do I prevent a fatal human error?
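The placement question can be illustrated with a simple capacity check. This is a minimal sketch under stated assumptions, not a method from the deck: the `Rack` fields, the sample values, and the rule that a device's heat load equals its power draw are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rack:
    name: str
    free_units: int         # free rack units (U)
    free_power_kw: float    # remaining power budget
    free_cooling_kw: float  # remaining cooling capacity


def best_rack(racks: List[Rack], units: int, power_kw: float) -> Optional[Rack]:
    """Pick a rack that can take the device without exceeding any limit.

    Assumption: heat load is treated as equal to the power drawn.
    Among the racks that fit, prefer the one left with the most power headroom.
    Returns None when no rack fits.
    """
    candidates = [
        r for r in racks
        if r.free_units >= units
        and r.free_power_kw >= power_kw
        and r.free_cooling_kw >= power_kw
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.free_power_kw - power_kw)


if __name__ == "__main__":
    # Hypothetical inventory.
    racks = [
        Rack("R1", free_units=4, free_power_kw=2.0, free_cooling_kw=2.0),
        Rack("R2", free_units=10, free_power_kw=1.0, free_cooling_kw=3.0),
        Rack("R3", free_units=6, free_power_kw=3.5, free_cooling_kw=3.0),
    ]
    choice = best_rack(racks, units=2, power_kw=1.5)
    print(choice.name if choice else "no rack fits")
```

Checking space, power and cooling together before a move is what keeps a Move-Add-Change from creating a hot spot or an overloaded circuit.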
Operating Challenges
The High Availability Challenge
Asset Over-Provisioning
  - Too many assets; two classes of assets
  - Absence of a software portfolio (even if hardware assets are tracked)
  - Move-Add-Change: decisions not based on simulations or analysis
  - Absence of change management
  - Absence of workflow approvals
  - Unable to predict failures
  - No chain of custody
Lack of HA Management Tool
  - IT assets tracked by a systems management tool
  - Facilities assets tracked by a building management system (BMS)
  - The two are not interoperable: unable to determine the missing link for HA
  - Unable to track redundancy paths
  - HA fails if any equipment or software in the critical path fails
  - HA fails if there is a fatal human error
  - Health and history of equipment, or previous MAC impact, not tracked
Need to Predict Failures
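The deck does not say how failures should be predicted. As one simple illustration, the sketch below fits a straight line to recent sensor readings and estimates how long it will take to reach an alert threshold. Linear extrapolation, the function name and the sample readings are all assumptions of this example, not the author's method.

```python
from typing import List, Optional, Tuple


def hours_until_breach(readings: List[Tuple[float, float]],
                       threshold: float) -> Optional[float]:
    """Estimate hours from the last reading until `threshold` is reached.

    readings: (hour, value) pairs in time order, with at least two distinct hours.
    Returns 0.0 if the last reading is already at or above the threshold,
    and None if the fitted trend is flat or falling.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = sum(t for t, _ in readings) / n
    mean_v = sum(v for _, v in readings) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in readings)
    if var_t == 0:
        raise ValueError("readings must span more than one point in time")
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in readings) / var_t
    intercept = mean_v - slope * mean_t

    last_t, last_v = readings[-1]
    if last_v >= threshold:
        return 0.0
    if slope <= 0:
        return None
    breach_time = (threshold - intercept) / slope
    return max(0.0, breach_time - last_t)


if __name__ == "__main__":
    # Hypothetical inlet temperatures (hour, degrees C).
    temps = [(0, 24.0), (1, 25.0), (2, 26.0), (3, 27.0)]
    print(hours_until_breach(temps, threshold=30.0))
```

A real tool would likely use more robust models and more signals, but even a trend line turns a threshold alert into advance warning.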
Beyond HA: Infrastructure & Operational Challenges
Energy Problems
  - Higher power consumption and growing power bills
  - Power use not monitored at device level
  - Dissipation of enormous heat
  - Creation of hot spots
  - Drastic reduction in expected life of computing equipment
  - Failure of a data center
  - Increase in CO2 emissions
Operational Problems
  - Low level asset tracking
  - Underutilization of many computing resources
  - Old, inefficient equipment kept running
  - Decisions not based on analysis
  - Cooling not optimized
  - Floor and rack space: non-optimal placement of equipment
  - Increasing demand for rack space
  - Absence of capacity planning
A Solution
Solution That Bridges the Gap Between IT & Facilities
Data Center Infrastructure Management (DCIM) Software
[Diagram: Data Center Infrastructure Management positioned between IT System Performance Management and the Building Management System]
Solution That Addresses the High Availability Challenge
DCIM Helps to Predict Failures
Asset Over-Provisioning
  - Too many assets; two classes of assets
  - Absence of a software portfolio (even if hardware assets are tracked)
  - Move-Add-Change: decisions not based on simulations or analysis
  - Absence of change management
  - Absence of workflow approvals
  - Unable to predict failures
  - No chain of custody
Lack of HA Management Tool
  - IT assets tracked by a systems management tool
  - Facilities assets tracked by a building management system (BMS)
  - The two are not interoperable: unable to determine the missing link for HA
  - Unable to track redundancy paths
  - HA fails if any equipment or software in the critical path fails
  - HA fails if there is a fatal human error
  - Health and history of equipment, or previous MAC impact, not tracked
Solution That Addresses Infra & Operational Challenges
DCIM Improves Energy & Operational Efficiencies
Energy Problems
  - Higher power consumption and growing power bills
  - Power use not monitored at device level
  - Dissipation of enormous heat
  - Creation of hot spots
  - Drastic reduction in expected life of computing equipment
  - Failure of a data center
  - Increase in CO2 emissions
Operational Problems
  - Low level asset tracking
  - Underutilization of many computing resources
  - Old, inefficient equipment kept running
  - Decisions not based on analysis
  - Cooling not optimized
  - Floor and rack space: non-optimal placement of equipment
  - Increasing demand for rack space
  - Absence of capacity planning
Anatomy of a DCIM Software: GFS Crane
Thank You
http://www.greenfieldsoft.com
Email: sales@greenfieldsoft.com
See also on slideshare:
Data Center Infrastructure Management:
ERP for the Data Center Manager

The Modern Data Center Topology - The High Availability Mantra

  • 1. 1 The Modern Data Center Topology: The High Availability Mantra
  • 2. 2 TopicsTopics • The Modern Data Center Overview • The High Availability (HA) Mantra • Operating Challenges • A Solution
  • 4. 4 Multiple Classes of Data CentersMultiple Classes of Data Centers • Internet Data Center  used by external clients connecting from the Internet  supports servers and devices required for B2C transaction-based applications (e- commerce). • Extranet Data Center  provides support and services for external B2B partner transactions.  accessed over secure VPN connections or private WAN links between the partner network and the enterprise extranet. • Intranet Data Center  hosts applications and services mostly accessed by internal employees with connectivity to the internal enterprise network. ness services. • Special Purpose Data Center  For specialized application areas like Geological & Geophysical for Oil & Gas Industry May or may not be inter-connected
  • 5. 5 Common Objective: Business ContinuityCommon Objective: Business Continuity • Disaster Recovery Data Center  Each Class may have dedicated or Shared DR Center  Usually located separately from Primary Data Center • High Availability (HA) Data Center  Each Data Center provided for with significant redundancies  DR Center comes into play only when a Disaster strikes.  Component or system failures within any DC should be either self-healing or redundancies within the DC should take over • Insurance Against Power & Network Outages  Reliability through multiple service providers  Internal Back-ups ness services. • Securing the Data Center  Against malicious hacking that can bring down the Data Center impacting business continuity  Implementing Firewalls/ Virtual Firewalls
  • 6. Common Complexity: Multitude of Assets
      Multitude of Assets
          - Divided between two worlds: IT & Facilities
          - Includes mission-critical applications
          - Like a manufacturing operation
              Raw material: Power & Networks
              Processing: Data
              Output: Information Service
          - Needs: Asset Management, Resource Optimization, à la manufacturing
  • 8. Today’s High Availability Data Center
      - Extreme redundancies for 99.99% uptime -> higher power consumption
      - Huge population of N+1/N+2 equipment -> asset underutilization, and too complex to manage with spreadsheets & Visio tools
      - Chain of inter-dependent equipment -> multiple points of failure
      - Growing heat loads, carbon emissions & e-waste -> sustainability issues
      - kW per rack increases as more processing capacity is added -> trade-off: the need to support more per rack versus extra space & heat loads
      High Availability is Inversely Proportional to Asset Utilization & Energy Efficiency
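The two effects named on slide 8 (a chain of inter-dependent equipment multiplying failure points, and redundancy lowering utilization) can be illustrated with the standard series/parallel availability formulas. This is only a sketch: it assumes component failures are independent, the function names are hypothetical, and the example figures in the usage note are invented rather than taken from the deck.

```python
from math import prod


def series_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def parallel_availability(components):
    """Availability of redundant units when any single working unit suffices."""
    return 1 - prod(1 - a for a in components)


def n_plus_k_utilization(n, k):
    """Fraction of installed units that carry load in an N+k design."""
    return n / (n + k)
```

Under these assumptions, chaining two 99% components in series gives roughly 98.01%, duplicating one of them gives roughly 99.99%, and a 4+1 design leaves only 80% of the installed units carrying load, which is the utilization cost the slide refers to.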
  • 9. When HA Fails - Tale of Two Disasters
      Amazon: "Amazon cloud outage takes down Netflix, Instagram, Pinterest, & more"
          With the critical Amazon outage, which is the second this month, we wouldn’t be surprised if these popular services started looking at other options, including Rackspace, SoftLayer, Microsoft’s Azure, and Google’s just-introduced Compute Engine. Some of Amazon’s biggest EC2 outages occurred in April and August of last year.
      RBS: "Tech fault at RBS and Natwest freezes millions of UK bank balances"
          RBS and Natwest have failed to register inbound payments for up to three days, customers have reported, leaving people unable to pay for bills, travel and even food. The banks, both owned by RBS Group, have confirmed that technical glitches have left bank accounts displaying the wrong balances and certain services unavailable. There is no fix date available.
      Which Will Be The Next One?
  • 10. What’s the High Availability Mantra?
      Amazon data centers (built to Tier 4 standards, with an expected availability of 99.995%) have had two outages already in 2012, each over 3 hours!
      • Tier 3/Tier 4 is defined only by hardware redundancies
      • Glaring gaps in operating procedures to prevent fatal human errors
      • Lack of purpose-built BCP software to predict failures
      • Lack of chain of custody to detect root cause

      Availability %               Downtime per year   Downtime per month*   Downtime per week
      99% ("two nines")            3.65 days           7.20 hours            1.68 hours
      99.5%                        1.83 days           3.60 hours            50.4 minutes
      99.8%                        17.52 hours         86.23 minutes         20.16 minutes
      99.9% ("three nines")        8.76 hours          43.8 minutes          10.1 minutes
      99.95%                       4.38 hours          21.56 minutes         5.04 minutes
      99.99% ("four nines")        52.56 minutes       4.32 minutes          1.01 minutes
      99.999% ("five nines")       5.26 minutes        25.9 seconds          6.05 seconds
      99.9999% ("six nines")       31.5 seconds        2.59 seconds          0.605 seconds
      99.99999% ("seven nines")    3.15 seconds        0.259 seconds         0.0605 seconds
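The downtime figures in the slide 10 availability table follow from one line of arithmetic: allowed downtime = period length × (1 − availability). A minimal sketch is shown here; the 365-day year and 30-day month are assumptions (the slide does not state its month basis, so some monthly figures may differ slightly), and `downtime_hours` is a hypothetical helper name.

```python
# Period lengths in hours. The 30-day month is an assumption, not from the slide.
HOURS = {"year": 365 * 24, "month": 30 * 24, "week": 7 * 24}


def downtime_hours(availability_pct, period):
    """Allowed downtime, in hours, for an availability percentage over a period."""
    return HOURS[period] * (100 - availability_pct) / 100


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99, 99.995, 99.999):
        per_year = downtime_hours(pct, "year")
        print(f"{pct}% -> {per_year:.2f} hours/year ({per_year * 60:.1f} minutes)")
```

By the same arithmetic, the 99.995% figure quoted for Tier 4 allows about 26 minutes of downtime per year, so a single outage of over 3 hours exceeds several years' worth of that budget.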
  • 11. Delivering the High Availability Promise
      Adequate Redundancies
      • Are there any points of failure, besides power and external networks, that can impact uptime? (Not everything is N+1.)
      • What are my redundancy paths?
      • Are the relationships & dependencies among critical assets clearly defined?
      • Can I do an impact analysis on the outage/downtime of any equipment? Can I predict the cascading effect of such an outage on other assets/applications in the data center?
      Preventing Failures
      • Can any failure be predicted so that proactive measures can be taken? Do I get alerts on threshold breaches so that I can take preventive action before a failure happens?
      • Is there a history of Move-Add-Change (MAC) that I should be aware of?
      • What is the impact of a MAC on space, power, and cooling?
      • Where can new devices/servers best be placed (Floor -> Rack -> Cage)? How can this be determined from the current infrastructure and other dependencies to avoid a failure?
      • How do I prevent a fatal human error?
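The impact-analysis question on slide 11 (which assets and applications are affected if a given piece of equipment goes down) amounts to a traversal of the asset dependency graph. The sketch below uses a breadth-first search; the asset names and the `FEEDS` map are invented for illustration, and it deliberately ignores redundant paths, so it reports the worst case in which every dependency is a single point of failure.

```python
from collections import deque

# Hypothetical dependency map: each asset lists the assets that depend on it.
FEEDS = {
    "ups-a": ["pdu-1", "pdu-2"],
    "pdu-1": ["rack-1"],
    "pdu-2": ["rack-2"],
    "rack-1": ["web-server"],
    "rack-2": ["db-server"],
    "web-server": ["storefront-app"],
    "db-server": ["storefront-app"],
}


def impacted_assets(failed, feeds):
    """Return every asset downstream of `failed`, in breadth-first order."""
    seen = {failed}
    order = []
    queue = deque([failed])
    while queue:
        for dependent in feeds.get(queue.popleft(), []):
            if dependent not in seen:  # visit each asset once, even if reachable twice
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order
```

Calling `impacted_assets("pdu-1", FEEDS)` walks from the PDU through its rack and server to the application. A fuller model would also record which dependencies have an alternate feed, so that a failure is only propagated when no redundant path remains.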
  • 13. The High Availability Challenge
      Lack of HA Management Tool
          - IT assets tracked by Systems Management Tool
          - Facilities assets tracked by BMS
          - The two are not inter-operable: unable to determine the missing link for HA
          - Unable to track redundancy paths
          - HA fails if any equipment or software in the critical path fails
          - HA fails if there’s a fatal human error
          - Health and history of equipment, or previous MAC impact, not tracked
      Asset Over Provisioning
          - Too many assets; two classes of assets
          - Absence of Software Portfolio (even if hardware assets are tracked)
          - Move-Add-Change: decisions not based on simulations, analysis
          - Absence of change management
          - Absence of workflow approvals
          - Unable to predict failures
          - No chain of custody
      Need to Predict Failures
  • 14. Beyond HA: Infrastructure & Operational Challenges
      Operational Problems
          - Low-level asset tracking
          - Underutilization of many computing resources
          - Running of old, inefficient equipment
          - Decisions not based on analysis
          - Cooling not optimized
          - Floor & rack space: non-optimal placement of equipment
          - Increasing demand for rack space
          - Absence of capacity planning
      Energy Problems
          - Higher power consumption & growing power bills
          - Not monitoring power use at device level
          - Dissipation of enormous heat
          - Creation of hot spots
          - Drastic reduction in expected life of computing equipment
          - Failure of a data center
          - Increase in CO2 emissions
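Slide 14 lists "not monitoring power use at device level" and "creation of hot spots" among the energy problems. As a rough sketch of what device-level monitoring makes possible, the code below rolls device readings up to kW per rack and flags racks running close to capacity. The reading format, the function names, and the 80% threshold are all assumptions made for illustration, not anything specified in the deck.

```python
from collections import defaultdict


def rack_power_kw(readings):
    """Sum device-level readings of (rack, device, watts) into kW per rack."""
    totals = defaultdict(float)
    for rack, _device, watts in readings:
        totals[rack] += watts / 1000
    return dict(totals)


def racks_over_threshold(readings, capacity_kw, threshold=0.8):
    """Return, sorted, the racks whose load exceeds `threshold` of rack capacity."""
    limit = capacity_kw * threshold
    return sorted(rack for rack, kw in rack_power_kw(readings).items() if kw > limit)
```

With hypothetical readings such as `[("r1", "a", 2500), ("r1", "b", 2000), ("r2", "c", 1000)]` and a 5 kW rack capacity, only `r1` is flagged, which is the kind of early warning that could inform equipment placement before a hot spot forms.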
  • 16. Solution That Bridges the Gap Between IT & Facilities
      Data Center Infrastructure Management (DCIM) Software
      Diagram labels: IT System Performance Management; Data Center Infrastructure Management; Building Management System
  • 17. Solution That Addresses The High Availability Challenge
      DCIM Helps to Predict Failures
      Lack of HA Management Tool
          - IT assets tracked by Systems Management Tool
          - Facilities assets tracked by BMS
          - The two are not inter-operable: unable to determine the missing link for HA
          - Unable to track redundancy paths
          - HA fails if any equipment or software in the critical path fails
          - HA fails if there’s a fatal human error
          - Health and history of equipment, or previous MAC impact, not tracked
      Asset Over Provisioning
          - Too many assets; two classes of assets
          - Absence of Software Portfolio (even if hardware assets are tracked)
          - Move-Add-Change: decisions not based on simulations, analysis
          - Absence of change management
          - Absence of workflow approvals
          - Unable to predict failures
          - No chain of custody
  • 18. Solution That Addresses Infra & Operational Challenges
      DCIM Improves Energy & Operational Efficiencies
      Operational Problems
          - Low-level asset tracking
          - Underutilization of many computing resources
          - Running of old, inefficient equipment
          - Decisions not based on analysis
          - Cooling not optimized
          - Floor & rack space: non-optimal placement of equipment
          - Increasing demand for rack space
          - Absence of capacity planning
      Energy Problems
          - Higher power consumption & growing power bills
          - Not monitoring power use at device level
          - Dissipation of enormous heat
          - Creation of hot spots
          - Drastic reduction in expected life of computing equipment
          - Failure of a data center
          - Increase in CO2 emissions
  • 19. Anatomy of a DCIM Software: GFS Crane
  • 20. Thank You
      http://www.greenfieldsoft.com
      Email: sales@greenfieldsoft.com
      See also on SlideShare: Data Center Infrastructure Management: ERP for the Data Center Manager