SharePoint best practices dictate that a proper disaster recovery plan should be in place before you launch your SharePoint farm. Standard disaster planning methodologies for SharePoint deal with the traditional scenarios in which your datacenter is a smoldering hole in the ground. Processes such as SQL Server database backups or STSADM backups of site collections are often employed to cater to such scenarios. But when something seemingly benign strikes, such as corruption of a Secure Store Service Application, architects and administrators often come to the sad conclusion that a complete farm rebuild is their only recourse. In addition, the risks associated with applying the regular bi-monthly SharePoint Cumulative Updates and periodic service packs, none of which has an uninstall or undo feature, increase the probability of facing a complete emergency farm rebuild at some point in an architect’s or administrator’s career. Long after a rebuild is completed and business has been restored to "almost" normal status, you’ll still be troubleshooting server configurations and tweaking the environment to get back to your pre-disaster level.
This workshop takes you through a dramatically new way of architecting your disaster plan. By applying the principles of this new methodology, you’ll cut your disaster response time to the point of almost avoiding disasters entirely.
1. SharePoint Disaster Avoidance Architecture for Large Scale Enterprises
Cornelius J. van Dyk
Crayveon Corporation
c@crayveon.com
@cjvandyk
Jason Himmelstein
Sentri
jhimmelstein@sentri.com
@sharepointlhorn
3. • SharePoint Practice Director, Sentri Inc.
• MCITP, MCTS SharePoint 2010
• Microsoft vTSP
● virtual Technology Solutions Professional
• SharePoint Foundation Logger
(http://spflogger.codeplex.com)
• Web: www.sentri.com
• Blog: www.sharepointlonghorn.com
• Twitter: @sharepointlhorn
• LinkedIn: www.linkedin.com/in/jasonhimmelstein
4. Why do we do this?
Jason’s Family Cornelius’ Family
5. GET TO KNOW YOU
• Name
• Company
• What you do with SharePoint
• Something interesting about yourself
6. DISASTER
• Outage vs Disaster
• When is a disaster actually a disaster?
• Traditional disaster planning
7. DISCUSSION GROUP BREAKOUT
• What is disaster planning to you?
• In the context of SharePoint
• Critical points
8. BUSINESS CONTINUITY PLANNING
• Business continuity planning identifies an organization's exposure to internal and external threats and synthesizes hard and soft assets to provide effective prevention and recovery for the organization, whilst maintaining competitive advantage and system integrity.
• Components
● Planning
● Testing
● Validation
10. DISASTER PLANNING STEPS
• Executive Management Commitment
● This costs money
● Must invest to protect
● Think of Insurance
11. DISASTER PLANNING STEPS
• Planning Committee
● All business units represented
● One person to lead – think Chief Justice
● Responsibility
● Authority
22. DISASTER PLANNING STEPS
• Perform Data Collection
● Critical phone numbers
● Hardware inventory
• Vendor contact and equipment information
● Software inventory
● Notification checklist
23. DISASTER PLANNING STEPS
• Organize & Document a Written Plan
● Plan should follow a checklist
● Think rebuild from scratch
• Notifications
• Hardware
• Software
• Restore backups
24. DISASTER PLANNING STEPS
• Organize & Document a Written Plan (cont.)
● Think rebuild from scratch (cont.)
• Re-establish systems
• Test & Validate
• Communicate
• After Action Review
25. DISASTER PLANNING STEPS
• Develop Testing Criteria & Procedures
• Test the plan
• Test the plan again
• Approve the plan
26. DISASTER PLANNING STEPS
• Ongoing plan validation
● Annual testing
● Scenario testing
● Testing when something changes
29. RECOVERY vs AVOIDANCE
• What is Disaster Avoidance?
• A new way of looking at DR
• Why another DR strategy?
• What makes SPDAALSE different?
30. CAUSES OF DISASTERS
• Natural disasters such as floods, hurricanes,
earthquakes, tornados, storms etc.
• Human induced such as accidents, acts of
terrorism etc.
• Hardware failures such as drive crashes,
memory or board failures etc.
31. CAUSES OF DISASTERS (cont.)
• Malware such as worms, viruses etc.
• The one everyone forgets about…
• Software incompatibility when upgrading:
● Operating systems
● Software service pack
● Software patches
32. SHAREPOINT CUMULATIVE UPDATES
• Bi-monthly
• Recommended by support
• History of hot fixes and re-releases
• Famously broke User Profile Services
33. CUs A NECESSARY EVIL
• Why apply them at all?
• What’s their risk?
• Can’t we just uninstall them?
• Compared to Exchange…
34. HOW DOES SPDAALSE HELP?
• Farm Architecture
• SharePoint databases
• Difference between data and configuration
• What makes Large Scale Enterprises different?
41. Agenda
• Infrastructure Design
● Analyze Customer Requirements
● Hardware requirements
● Server configuration
● Network recommendations
● Virtual vs. Physical
• SQL Server Performance
● Pre-grow vs. Auto-growth
● IO requirements
● Sizing recommendations
● Database Isolation
• SharePoint Server Performance
● Tier isolation vs. Location Proximity Requirements
● Load balancing your App Tier
● Load testing in your environment
● Governance & Troubleshooting
42. Infrastructure Design
• Analyze Customer Requirements
● High Availability
● Disaster Recovery
● Budget Constraints
● Location Awareness
● Number of Concurrent Users
43. Infrastructure Design
• Hardware requirements
● Web servers & Application servers
• Developer or Evaluation environments
– CPU: 4 cores, 64-bit required
– RAM: 4GB
– Hard Drive space: 80GB
• Production in Single Server or farm environments
– CPU: 4 cores, 64-bit required
– RAM: 8GB
– Hard Drive space: 80GB
● SQL servers
• Small Farm
– CPU: 4 cores, 64-bit required
– RAM: 8GB
– Hard Drive space: 80GB
• Medium Farm
– CPU: 8 cores, 64-bit required
– RAM: 16GB
– Hard Drive space: 80GB
• Large Farm
– Up to 2TB Content DBs: RAM: 32 GB
– From 2TB to 5TB Content DBs: RAM: 64 GB
• What constitutes a small/medium/large farm?
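The SQL server figures on this slide can be captured as a small lookup, which is handy when checking a proposed server spec against them. This is only a sketch: the function and dictionary names are hypothetical, the numbers are copied from the slide, and the slide gives no CPU figure for large farms and no guidance beyond 5TB, so the code refuses to guess there.

```python
# Minimums for small and medium farms, copied from the slide.
# The dictionary layout and all names here are illustrative only.
SQL_MINIMUMS = {
    "small": {"cores": 4, "ram_gb": 8},
    "medium": {"cores": 8, "ram_gb": 16},
}


def large_farm_ram_gb(content_tb: float) -> int:
    """Return the slide's RAM figure for a large farm, by total content DB size in TB."""
    if content_tb <= 0:
        raise ValueError("content size must be positive")
    if content_tb <= 2:
        return 32  # "Up to 2TB Content DBs"
    if content_tb <= 5:
        return 64  # "From 2TB to 5TB Content DBs"
    raise ValueError("the slide gives no figure beyond 5 TB")


def meets_minimum(farm_size: str, cores: int, ram_gb: int) -> bool:
    """Check a proposed SQL server against the small or medium farm minimums."""
    minimum = SQL_MINIMUMS[farm_size]
    return cores >= minimum["cores"] and ram_gb >= minimum["ram_gb"]


if __name__ == "__main__":
    print(large_farm_ram_gb(3.5))
    print(meets_minimum("medium", cores=8, ram_gb=16))
```

These are starting points only; the question on the slide of what actually constitutes a small, medium or large farm still has to be answered per environment.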
47. Infrastructure Design
• Network recommendations
● Traffic Isolation
• Web
• Database
• Search
• Service Applications
• Authentication
● Number of NICs per server
● Limit the number of hops
● Colocation of servers
48. Infrastructure Design
• Physical
● Benefits
• No virtualization overhead
• Ability to target DBs to separate physical spindles
• Only OS limits on Hardware
• Simple Networking
● Drawbacks
• Backup & recovery time
• Limited snapshot ability
• Costly & lacking Centralized Management
• Failover limitations
49. Infrastructure Design
• Virtualization
● Benefits
• Snapshot capability
• Rapid system deployment
• HADR ability
• Centralized Management
● Drawbacks
• Loss of minimum 8% compute for overhead
• Limitations on addressing full hardware
• Disks are stored as single/multi-file
• Centralized Networking
50. SQL Server Performance
• Pre-grow databases
● Requires more space initially
● Dramatic increase in performance
● Databases like contiguous space
• Auto-growth
● Immediately change from 1 MB increments
● Do not use “Grow by %” setting
● 50-100 MB maximum growth per increment, as required
● Schedule maintenance task to check size & grow in off peak hours as required
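The arithmetic behind these auto-growth bullets can be sketched as follows. The example sizes are made up for illustration and the function names are hypothetical; the point is only to show how many growth events a tiny fixed increment causes, and how large a single percentage-based growth step becomes on a big database.

```python
import math


def fixed_growth_events(start_mb: int, target_mb: int, increment_mb: int) -> int:
    """Count the auto-growth events needed to reach target_mb with a fixed increment."""
    if increment_mb <= 0:
        raise ValueError("increment must be positive")
    if target_mb <= start_mb:
        return 0
    return math.ceil((target_mb - start_mb) / increment_mb)


def percent_growth_step_mb(current_mb: int, percent: int) -> float:
    """Size of the next growth step when the file grows by a percentage of its size."""
    return current_mb * percent / 100


if __name__ == "__main__":
    # Growing a 100 MB file to 10,000 MB (illustrative sizes).
    print(fixed_growth_events(100, 10_000, 1))    # 1 MB increments
    print(fixed_growth_events(100, 10_000, 100))  # 100 MB increments
    # A 10% growth step on a 200,000 MB file.
    print(percent_growth_step_mb(200_000, 10))
```

With 1 MB increments the file grows thousands of times, each growth interrupting work, while a percentage setting makes each step larger as the database grows. A fixed 50-100 MB increment combined with scheduled off-peak growth avoids both extremes.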
51. SQL Server Performance
• IO requirements
#   DB Files                     RAID Level   Optimization
1   TempDB data                  10           Write
2   TempDB logs                  10           Write
3   ContentDB data               10           Read/Write
4   ContentDB logs               10           Write
5   Crawl DB logs                10           Write
6   Crawl DB data                10           Read/Write
7   Property DB logs             10           Write
8   Property DB data             10           Write
9   Services DB logs             10           Write
10  Services DB data             5/10         Read/Write
11  Archive Content DB           5            Read
12  Publishing Site Content DB   5            Read
52. SQL Server Performance
• Sizing recommendations
● Recommended limit for ContentDBs: 200 GB
• Maximum supported: 4TB
– Includes Remote BLOBs
● Backup/Restore timing
● Simple vs. Full recovery model
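A size check against these two limits might look like the sketch below. The thresholds come from the slide; the function name, the status strings and the choice to treat 4TB as 4096 GB are assumptions made for illustration. Remote BLOB storage is added to the database size because the slide says the maximum includes Remote BLOBs.

```python
RECOMMENDED_LIMIT_GB = 200       # recommended limit from the slide
MAX_SUPPORTED_GB = 4 * 1024      # 4TB from the slide, taken here as 4096 GB (assumption)


def classify_content_db(db_size_gb: float, remote_blob_gb: float = 0) -> str:
    """Classify a content database by its total size, counting Remote BLOBs."""
    total_gb = db_size_gb + remote_blob_gb
    if total_gb > MAX_SUPPORTED_GB:
        return "unsupported"
    if total_gb > RECOMMENDED_LIMIT_GB:
        return "over recommended limit"
    return "ok"


if __name__ == "__main__":
    # Hypothetical databases: (name, size in GB, Remote BLOB size in GB).
    for name, size_gb, blob_gb in [
        ("WSS_Content_Teams", 150, 0),
        ("WSS_Content_Records", 150, 100),
    ]:
        print(name, classify_content_db(size_gb, blob_gb))
```

A check like this could run as part of the scheduled maintenance task, flagging databases to split before backup and restore times grow past what the recovery plan allows.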
53. SQL Server Performance
• Database Instance Isolation
● Secure Store Database
● SharePoint core databases
● Content Databases
● Search
● Highly Transactional non-SharePoint DBs
• Drawback
● Lose the central management in a single SQL Server
Management Studio window
54. SharePoint Server Performance
• Tier isolation vs. Location Proximity Requirements
● Separation via vLAN
• Less chatter
• Increased hop count
● Collocating SharePoint in a single vLAN
• Increased chatter
• Lower hop count
• Key takeaway
● Know your network, determine your topology based upon traffic & requirements
55. SharePoint Server Performance
• Load balancing your App Tier
● Know your load
● Scale based upon need, not perception
• Find your choke point, then release the grasp
● Don’t assume, validate!
56. SharePoint Server Performance
• Load testing in your environment
● Example
• 2 Web Servers (4 cores, 16 GB RAM) using NLB
• 1 App Server (4 cores, 16 GB RAM)
• 1 SQL Server Instance (16 cores, 128 GB RAM)
• Simple CRUD operations
– Login, create list item, open item, modify item, save item, delete item, log out
57. SharePoint Server Performance
• Load testing in your environment
● Results
• Farm was completely non-responsive at ~500 concurrent users
● Root cause
• Watching this test on the server side, we found that we were immediately CPU bound.
● Conclusion
• Add CPUs or Web Servers to the farm to handle additional load
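As a rough planning aid, the result above can be turned into a first estimate of how many web servers a target load would need. This sketch assumes capacity scales linearly with the number of web servers, which the test did not establish, and it treats ~500 users on 2 web servers as the saturation point rather than a safe operating level. The function name and the headroom parameter are hypothetical.

```python
def web_servers_needed(
    target_users: int,
    observed_users: int = 500,    # farm became non-responsive at ~500 concurrent users
    observed_servers: int = 2,    # web servers in the tested farm
    headroom_pct: int = 100,      # plan to run at this percentage of the saturation point
) -> int:
    """Estimate web server count for target_users, assuming linear scaling (an assumption)."""
    if target_users <= 0:
        raise ValueError("target_users must be positive")
    if not 0 < headroom_pct <= 100:
        raise ValueError("headroom_pct must be between 1 and 100")
    numerator = target_users * observed_servers * 100
    denominator = observed_users * headroom_pct
    # Integer ceiling division, so partial servers round up.
    return -(-numerator // denominator)


if __name__ == "__main__":
    print(web_servers_needed(1000))
    print(web_servers_needed(1000, headroom_pct=80))
```

Any number this produces is only a hypothesis to load test, in line with "Don’t assume, validate!": adding web servers can move the choke point to the app tier or SQL Server.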
58. References
• Jason’s Blog
http://www.sharepointlonghorn.com
• Sentri, Inc
http://www.sentri.com
• SharePoint Foundation Logger
http://spflogger.codeplex.com
• My Article on SharePoint Pro
http://www.sharepointpromag.com/content1/topic/sharepoint-performance-troubleshooting-141506/catpath/sharepoint-server-2010
• Cornelius J. van Dyk’s Blog
http://www.cjvandyk.com/blog
• Eric Shupps’s Blog
http://www.sharepointcowboy.com
• SharePoint Server 2010 Hardware and software requirements
http://technet.microsoft.com/en-us/library/cc262485.aspx
• SharePoint Server 2010 Capacity Management: Software Boundaries and Limits
http://technet.microsoft.com/en-us/library/cc262787.aspx
• Capacity Management and Sizing Overview for SharePoint Server 2010
http://technet.microsoft.com/en-us/library/ff758647.aspx
• Capacity Planning for SharePoint Server 2010
http://technet.microsoft.com/en-us/library/ff758645.aspx
• Performance Testing for SharePoint Server 2010
http://technet.microsoft.com/en-us/library/ff758659.aspx
• Storage and SQL Server Capacity Planning and Configuration
http://technet.microsoft.com/en-us/library/cc298801.aspx
• Performance and Capacity Technical Case Studies
http://technet.microsoft.com/en-us/library/cc261716.aspx
• Monitoring and Maintaining SharePoint Server 2010
http://technet.microsoft.com/en-us/library/ff758658.aspx
• The Load Testing Kit for Visual Studio Team System
http://technet.microsoft.com/en-us/library/ff823731.aspx
• Web Capacity Analysis Tool (WCAT)
http://www.iis.net/community/default.aspx?tabid=34&g=6&i=1466