In this session, we discuss challenges that remain when attempting to scale only using SharePoint’s native functionality. Afterward, we’ll share vital strategies and available solutions for ensuring seamless, centralized enterprise-wide management and efficient externalization of Binary Large Objects in order to free up valuable SQL Server space and subsequently improve SharePoint performance.
1. Twitter: @GarthLuke Garth Luke (MCSE, MCP) Vice President, Sales AvePoint, Inc. Effective SharePoint Scalability & Management: To BLOB or not to BLOB, that’s the question…
2. AvePoint Company Overview AvePoint Confidential and Proprietary World’s Largest Provider of Integrated SharePoint Infrastructure Management Backup & Recovery, Administration, Replication, Migration, Compliance, Storage Optimization Products & Customer Growth Products Customers
3. Reduced total migration time to Microsoft's internal hosted SharePoint 2010 environment by two months Consolidation is actively happening Migrated 12,000 site collections from SharePoint 2007 to SharePoint 2010 Transferred approximately 200 lists to SharePoint 2010 while maintaining customizations, metadata, and field values Minimized business disruption by scheduling migration jobs to automatically occur off-hours http://www.avepoint.com/about/mtc-migration-to-2010/
4. Real World Scalability Examples Access 15TB of file-share data within SharePoint without migration Reduce project time by 9-12 months Enabled full SharePoint presentation & management of legacy file-share content without extra storage cost http://www.avepoint.com/resources/case-studies/
6. Growing With SharePoint Enterprise Content Management Line of Business Applications Return on Investment Collaboration Tool More Valuable Content Repository More Complex
8. SharePoint Lessons Information Architecture is ongoing Changing Topology Changing Taxonomy Consolidation is actively happening Global Farms – Central Farms Service applications are the future SharePoint as a Business O.S.
9. Implications of Growing Deployments Platform availability and integrity Scalability on settings, permissions, and policies High cost of storage Binary large objects’ (BLOBs) impact on performance and scalability
10. Optimizing Scalability Architect for Scale and Global Access Physical Architecture Administration Considerations Network Considerations Bandwidth Considerations Accommodating Growth: Storage RBS or EBS (Plus a 3rd Party Provider) FileStream
11. Architecting for Scalability: Physical Architecture Build redundancy into production to decrease downtime Recommend using a multi-stage approach Development Testing / Quality Assurance Staging / Pre-production Production Ensure all multi-stage environments are identical
16. Example: Multiple Farms Sharing Services Farm B Farm A Remote farm consumes published services via HTTP/S Servers providing service apps can publish specific apps Other servers in the farm
21. Content Publication Consistency is Key – SharePoint Ecosystem Two-way replication with conflict resolution Local server for data survivability Publication of solutions / applications Business-rule driven replication and publication
22. Plan for Growth: Scaling Administration Distribute Admin tasks Don’t forget about governance! Who can create sites and subsites? Who can delete them? What are my main content types and what metadata should be required for each? Who manages term stores and content type hubs? Who can add terms? Who can add content? Is there a review process? Who can add users and edit permissions? What are the security groups? Consider 3rd Party Administration Tools
24. Planning for Growth: The Big Picture Problem begins with initial migration Need data for legal retention SLAs still cover ALL SharePoint content Data in SQL Server
25. Storage Decisions for SharePoint Comfort level vs. Cost of Storage What makes the most sense for SharePoint Data?
26. What is stored in SharePoint? BLOB (Binary Large OBject) Basically, a file
31. Preventative Measures Set site quotas and alerts! 10 GB quota, 8 GB alert is my favorite Monitor growth trends Sites: slow over time or large jump in size? Overall content DB size Split Content DBs if they get “too big”
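The quota-and-alert advice on this slide can be sketched as a simple check. This is only an illustration: the `SiteUsage` record and `quota_status` function are hypothetical names, not part of SharePoint, and the 10 GB / 8 GB defaults simply mirror the presenter's stated preference.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # bytes per gigabyte


@dataclass
class SiteUsage:
    url: str         # hypothetical site collection URL
    used_bytes: int  # storage currently consumed by the site collection


def quota_status(site: SiteUsage, quota_gb: float = 10, alert_gb: float = 8) -> str:
    """Classify one site collection against its quota and alert levels."""
    if site.used_bytes >= quota_gb * GB:
        return "over_quota"
    if site.used_bytes >= alert_gb * GB:
        return "alert"
    return "ok"


sites = [
    SiteUsage("http://intranet/sites/hr", 3 * GB),
    SiteUsage("http://intranet/sites/sales", 9 * GB),
]
for site in sites:
    print(site.url, quota_status(site))
```

Running a check like this on a schedule and keeping the results would also give the growth trend the slide asks you to monitor.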
32. Modify your storage architecture Extend BLOBs out of SQL BLOBs: Binary Large Objects SharePoint Content = BLOB + Metadata Content DB = database of … BLOBs + Metadata Archive content
33. Default SharePoint Storage SharePoint WFE SharePoint Object Model BLOBs & Metadata SQL Server Content DB Config DB
34. BLOB Externalization: RBS & EBS RBS: Remote BLOB Storage For SharePoint 2010 only; introduced in the SQL Server 2008 R2 Feature Pack EBS: External BLOB Storage Introduced in SharePoint 2007 SP1 On the deprecation list in SharePoint 2010 EBS-to-RBS migration can be performed with PowerShell or a 3rd party tool
35. RBS SharePoint WFE Not unique to SharePoint, available to any application A Provider Library can be associated with each database SharePoint Object Model BLOB & Metadata SQL Server Relational Access RBS Client Library Metadata BLOB Provider Library X Provider Library Y Content DB X Content DB Y BLOB Store BLOB Store
36. Anticipate Growth from the Start Leverage RBS in SharePoint – 3rd party tools User and API driven Transparent user access Transparent to development Stub Metadata BLOB Upload Database Extender File Disk Storage Web Front-end User
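The stub-and-BLOB flow on this slide can be illustrated with a small conceptual sketch. It is not the RBS or EBS API: an in-memory SQLite database stands in for the content database, a local directory stands in for file disk storage, and `ExternalizingStore` and its methods are hypothetical names. The point is only that the database keeps metadata plus a stub, while the bytes live elsewhere and the caller never sees the difference.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class ExternalizingStore:
    """Keeps metadata and a stub in the database, and the BLOB on disk."""

    def __init__(self, blob_dir: Path):
        self.blob_dir = blob_dir
        # Stand-in for the content database (assumption: SQLite, not SQL Server).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE docs (name TEXT PRIMARY KEY, author TEXT, stub TEXT)"
        )

    def upload(self, name: str, author: str, data: bytes) -> None:
        # The stub is a content hash that locates the BLOB in external storage.
        stub = hashlib.sha256(data).hexdigest()
        (self.blob_dir / stub).write_bytes(data)
        self.db.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?, ?)", (name, author, stub)
        )

    def stub_for(self, name: str) -> str:
        row = self.db.execute(
            "SELECT stub FROM docs WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return row[0]

    def download(self, name: str) -> bytes:
        # Callers ask by document name; the stub lookup is transparent to them.
        return (self.blob_dir / self.stub_for(name)).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    store = ExternalizingStore(Path(tmp))
    store.upload("plan.docx", "garth", b"binary file contents")
    print(store.download("plan.docx"))
```

Because the database row and the file on disk are written separately, any backup or replication tool has to treat them as a pair, which is the management complexity the later slides discuss.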
37. Architecture Scalability: Anticipate Growth Web Front-End Servers Application Server Application Server Extender Connector Storage Storage Storage Access Access Cloud Storage File Server Clustered SQL Server
58. Granular Platform vs. Granular Backup Contents within database Quickest for day-to-day recovery Flexible for aggressive SLA Segment data by business unit Full farm consistency Consistency / DR Requires staging / indexing Larger roll-back points VSS / hardware point of integration Platform
59. Not All Data is Created Equal SQL Database content grouped by backup frequency:
Hourly / Hourly / Daily: Wikis, Support FAQs/References, Document Libraries, etc. | Ongoing projects, Active meeting sites, etc. | Sales leads, Customer records, etc.
Hourly / Daily / Weekly: Support User Guides, Training Materials, Blogs | Time Sheets, Price Sheets, Other meeting sites, etc. | Financial reports, Daily sales reports, etc.
Daily / Weekly / Weekly: HR employee guides, Personal sites, Vacation Policies, etc. | Marketing brochures, Sales materials, Pre-sales literature, etc. | Annual reports, Mo. sales reports, Board reports, etc.
60. BLOB Backup and Recovery Options Hardware snapshots (if externalizing to same Hardware, e.g. NetApp) Cloud storage (Offers built-in redundancy for DR) Most SLAs will be for entire databases/content stores, many may not have granular recovery SLAs, or allow for synchronous backups DFSR - Replication of File Shares storing BLOBs Restore from replicated location Most SLAs will be for entire databases/content stores, consider data corruption, ability to perform synchronous backups, etc 3rd party platform tools Are synchronous backups of File Shares and SharePoint DBs achievable?
61. Planning for Platform Recovery Account for: Data corruption Accidental deletions Etc… Test! How long does it take? What are the compliance implications? If metadata (author, time, etc.) changes on restore, have I “falsified records”? If I can’t recover a single document, have we accidentally “destroyed data”? Don’t forget about your item-level recovery strategy
62. Managing the content lifecycle of BLOBs Archiving for RM: Records Center Another SharePoint site Higher % inactive content Consider separate Content DB, with an RBS provider implemented for this DB Archiving for Storage Savings: Backup and delete Workflow 3rd party tools
63. 3rd Party Archiving Tools What rules are available? Last modified time Author Versions What scope can I apply rules to? (farm to item) Does it use RBS/EBS APIs? Does it integrate with other infrastructure management tools? (backup, replication, etc.)
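The rule types listed on this slide (last modified time, author, versions) can be sketched as a small rule evaluator. The `Item` and `ArchiveRule` types and the `matched_criteria` function are hypothetical and do not reflect any particular vendor's rule engine; they only show how such criteria might be combined.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Item:
    path: str
    author: str
    last_modified: datetime
    version_count: int


@dataclass(frozen=True)
class ArchiveRule:
    max_idle_days: Optional[int] = None   # archive if untouched for longer than this
    authors: FrozenSet[str] = frozenset() # archive content written by these authors
    max_versions: Optional[int] = None    # flag items holding more versions than this


def matched_criteria(item: Item, rule: ArchiveRule, now: datetime) -> List[str]:
    """Return the names of the rule criteria that this item satisfies."""
    reasons = []
    if rule.max_idle_days is not None:
        if now - item.last_modified > timedelta(days=rule.max_idle_days):
            reasons.append("last_modified")
    if item.author in rule.authors:
        reasons.append("author")
    if rule.max_versions is not None and item.version_count > rule.max_versions:
        reasons.append("versions")
    return reasons


rule = ArchiveRule(max_idle_days=365, max_versions=10)
doc = Item("/sites/hr/policy.docx", "alice", datetime(2009, 6, 1), 12)
print(matched_criteria(doc, rule, now=datetime(2011, 1, 1)))
```

The scope question on the slide (farm down to item) would decide which collection of items a rule like this is run against.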
65. DocAve Architecture SharePoint 2010 Hosted SharePoint SharePoint 2007 SQL Databases Cloud Storage File Server
Every company has a vision to use SharePoint as the sole ECM. There are many steps and challenges along the way. The greatest leaps are going from ‘collaboration’ to ‘development’ and customization, and then again from ‘development’ to full ‘enterprise content management’ (ECM). You must have tools to get you there, and DocAve will accelerate your growth.
Single farm to full, multi-platform SaaS
Considerations for scaling your SharePoint environment for growth
Best practice for the recommended architecture: Redundancy & HA for each tier in your production environment. Maintain a test environment where you QA all changes and code before deploying in production, to improve production stability.
Scaling brings stability. Stability brings availability. Availability comes through deployment (Dev -> Prod). Deployment comes through architecture (virtual is common).
One of the best ways to ensure production stability is to abide by the proven practice of maintaining a multiple-farm environment, keeping all development and testing completely separate from production. This means that if, for some reason, something like a looping workflow is created, only the dev or test farm will be brought down, and production will remain unaffected. If we were testing in a separate web application on the production farm, we would have been screwed. Maintaining this staged environment means all workflows, features, customizations, and metadata modifications, for instance, are properly tested, and the effect they’ll have on production is known before anything is even installed.
But assuming we set up global SharePoint environments and comply with Microsoft best practices of maintaining separate SharePoint farms for Dev/Testing/Staging/Prod, how do we manage changes (to design elements, solutions, features, workflows, site content) across these SharePoint deployments? Documentation is key here. If a step is missed when installing a feature, if we forget to reset IIS, or forget that one file that was stored on the other web front end, the deployment fails; repeatability is key. You’ll have to be diligent about documenting change and who’s responsible for which components of change. To make your lives easier in this respect, there are 3rd party tools available to help you through this process.
All-inclusive deployment: Design, Solutions, WFE
One of the great new features of service applications is that they can be published and shared between farms. This is achieved by the source server “publishing” one or more of its service applications. When a service application is published a URL is created that can be fed into the second farm, being the consumer of the service. The servers are required to be configured appropriately first, by doing things such as exchanging security certificates so the farms will trust each other, but once that is done the consumer will be able to interact with the service as if it was local, and it can associate it with its local web applications.
This theory of publishing a service application opens up some interesting possibilities around how you can architect environments that have multiple farms, and one such way to work with this is the model of having a services farm. The premise is that you can have one farm that will run service applications that would be common to your organisation, such as user profile import, or enterprise search, or managed metadata for an enterprise taxonomy. These could then be managed and run in only one location and then published out for other farms in the organisation to consume. This means that user profiles can be imported once and used globally, or content can be indexed in one location and the index published out to other farms, or a taxonomy can be built in one place and then shared to the entire organisation.
A fully distributed global architecture will provide quick access to local SharePoint content with a good user experience. You want to replicate only the relevant/global content. You also want to handle special remote locations (like the Alaska oil rig in this slide) via local infrastructure and replication. Requires a 3rd party tool.
Distribute admin tasks: site collection administration, permissions management, etc. It may suit you to break up into separate site collections for different business units in order to achieve the desired governance. You can only distribute tasks so much. Beyond that, you will require additional personnel or an admin tool.
There are also significant cost savings to be realized by moving your data through different tiers of storage. I’ve provided an example for a 1TB content database stored on Tier 1 storage. In this example, the customer saved $11k on a single database by moving data to a cheaper tier of storage.
Discussion: Are these numbers accurate? What are people paying for Tier 1 storage these days? Tier 2? Tier 3? In my example, I’ve referenced cloud storage in Tier 3. Is anyone storing data in the cloud? Are people comfortable storing data in the cloud?
Going back one step, what is actually stored in SharePoint and how is it stored? 95% of what’s stored is BLOBs (Binary Large Objects). BLOB = anything I’m adding to SharePoint (> 256 KB)
So, SharePoint content = BLOB + Metadata
So, content DB = database of … BLOBs + Metadata
This graph shows requests per second for varying numbers of users. Requests per second for a SQL DB is on the left; RPS with RBS is on the right. For SharePoint, the ability to scale is critical.
We see with the first graph, regarding SharePoint’s ability to scale, that as we increase user threads, the number of requests SharePoint is able to handle per second decreases. This correlates to the green line here, showing us that as user count increases, the time it takes for SharePoint to respond to a request drastically increases. Now let’s look at a SharePoint environment with its BLOBs externalized. Here we see right off the bat that SharePoint is able to handle many more requests per second than before, and the response time per request stays extremely low.
This is definitely something that requires further investigation, to see what happens when we get up into the thousands. For the purpose of this test, we kept the user threads to 500 and under, as this is where we see drastic changes in how many requests per second SharePoint is able to handle. This test was also conducted on SharePoint 2007, so it would be interesting to see how this changes between SharePoint 2007 and 2010, natively, and then with EBS or RBS enabled.
For SharePoint 2010, Microsoft recommends you split a content DB when it hits 200 GB. For 2007, split when it’s around 90 GB.
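A split like this can be planned from per-site-collection sizes. The sketch below is a hypothetical planning helper, not a Microsoft tool: it uses a first-fit-decreasing heuristic (my choice for illustration) to group site collections into content databases that stay at or under a size limit, such as the 200 GB and 90 GB figures quoted in these notes. A site collection lives in exactly one content database, so the helper never splits one; an oversized site collection simply gets a database to itself.

```python
from typing import Dict, List


def plan_split(site_sizes_gb: Dict[str, float], limit_gb: float) -> List[List[str]]:
    """Group site collections into content DBs using first-fit decreasing."""
    databases: List[dict] = []  # each entry: {"total": float, "sites": [names]}
    largest_first = sorted(site_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in largest_first:
        for db in databases:
            if db["total"] + size <= limit_gb:
                db["sites"].append(name)
                db["total"] += size
                break
        else:
            # No existing database has room, so start a new one.
            databases.append({"total": size, "sites": [name]})
    return [db["sites"] for db in databases]


sizes = {"hr": 120, "sales": 90, "projects": 80, "wiki": 30}  # made-up sizes in GB
print(plan_split(sizes, limit_gb=200))
```

The heuristic does not guarantee the minimum number of databases, and in practice you would also want to leave headroom for growth rather than fill each database to the limit.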
Here’s a look at a very basic SharePoint architecture. Here we have the front end with the object model, and the SharePoint content (BLOBs and metadata) is stored in SQL.
No native archiving tools.
EBS extended to include RBS for BLOB removal: available only in SQL Server 2008 SP2; only accessible via API.
BCS (BDC in 2007) extended to allow for easier connectivity with legacy data systems: not intended for controlling growth, only exposing additional data from other systems.
Integrated with SharePoint!
Users can access contents by: clicking and downloading directly through SharePoint; opening the file using their Office client; referencing the URL; searching for contents natively in SharePoint.
Users can interact with contents by: modifying metadata and content types; modifying permissions; applying alerts; using workflows or publishing templates; using site quotas and locks.
So now let’s take a look at RBS… As RBS is SQL specific, it can be used across applications that leverage SQL, not just SharePoint, so this gives you more of an enterprise-wide storage architecture versus EBS. Here’s how enabling an RBS provider would affect your SharePoint storage architecture. You can have an RBS provider per database. No context, no ability to manage the object.
Free to help with your migration planning.
Anticipation starts when you plan your architecture
When you anticipate growth and architect your environment for scalability, you reap the benefits! HOWEVER, we do need to realize that externalization, while great for performance, user experience, and cost, does create complexities in how we manage our infrastructure. Take backup and recovery operations, for instance: we will still need a way to back up BLOBs SYNCHRONOUSLY. If we want to leverage content management tools for restructuring, or replication technology for keeping multiple sites or farms in sync, those tools need to be able to account for externalized content: whether they copy or move BLOBs as well, whether they just copy or move the stubs, or even whether replication technology can copy the stubs and redirect with DFSR.
Backup implications: you need a method to back up BLOBs synchronously, or your DBs can get out of sync with your file system. SharePoint 2007: this isn’t very efficient. SharePoint 2010: this works very well.
AND LAST BUT NOT LEAST: IT’S COMPLETELY SEAMLESS FOR END USERS! End users can still access and INTERACT with content! Content still works with workflows, alerts, Office applications, etc.
Microsoft will provide a PowerShell solution to migrate from EBS to RBS (check this fact!).
To BLOB or not to BLOB (Microsoft Research): <256 KB, SQL better; 256 KB to 1 MB, SQL and file system comparable; >1 MB, file system better.
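The size thresholds quoted in these notes can be written down as a small decision helper. The function name and return labels are hypothetical, and the boundaries are only the rough guidance cited above, not a rule enforced by SharePoint or SQL Server.

```python
KB = 1024
MB = 1024 * KB


def preferred_store(size_bytes: int) -> str:
    """Map an object size to the storage the quoted guidance favours."""
    if size_bytes < 256 * KB:
        return "sql"         # small objects: the database performs better
    if size_bytes <= 1 * MB:
        return "comparable"  # 256 KB to 1 MB: database and file system are similar
    return "filesystem"      # larger objects: the file system performs better


for size in (40 * KB, 512 * KB, 25 * MB):
    print(size, preferred_store(size))
```

An externalization threshold in a storage tool plays the same role: objects below it stay in the database, and objects above it go to the BLOB store.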
It is extremely critical to factor BLOB externalization into your data protection strategies, just as you should consider it in other content management strategies as well. And here’s why…
These are just a few options for planning for BLOB externalization. The thing to remember here is that even if some of these methods will keep BLOBs protected, if they’re not allowing you to perform SYNCHRONOUS backups of your databases AND your BLOB store (whether it’s in the cloud or elsewhere), the BLOB backups will only be useful if, by chance, the timer jobs are in sync. Make sure to test whatever strategy you want to leverage.
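One way to test a strategy is to compare what the restored database expects with what the restored BLOB store actually contains. The sketch below assumes you can list the stub identifiers referenced by a database backup and the BLOB identifiers present in a store backup; how you obtain those lists depends on your provider, and the function and key names here are hypothetical.

```python
from typing import Dict, Iterable, List


def consistency_report(
    stubs_in_db_backup: Iterable[str],
    blobs_in_store_backup: Iterable[str],
) -> Dict[str, List[str]]:
    """Compare stub references against BLOBs present in the store backup."""
    stubs = set(stubs_in_db_backup)
    blobs = set(blobs_in_store_backup)
    return {
        # Referenced by the database but absent from the store: broken documents.
        "missing_blobs": sorted(stubs - blobs),
        # Present in the store but unreferenced: wasted space, not data loss.
        "orphaned_blobs": sorted(blobs - stubs),
    }


report = consistency_report(
    stubs_in_db_backup=["b1", "b2", "b3"],
    blobs_in_store_backup=["b2", "b3", "b4"],
)
print(report)
```

An empty `missing_blobs` list is the outcome to look for after a test restore; anything in it means the two backups were not taken in sync.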
To optimize the user experience, and users’ reliance on SharePoint as a platform and a safe place to store their documents, you’ll need to find a way to provide granular restore capabilities. Perfect example: I’ve set up countless SharePoint libraries internally. On one, I made the mistake of not turning versioning on right away to keep track of document histories, which is critical to collaboration. While looking through a SharePoint doc, I could have sworn I’d made other edits, so I thought maybe the latest version was on my laptop. Instinctively, I uploaded the document and overwrote the file. Come to find out, I’d actually uploaded a previous version, overwritten my latest edits, and lost a week’s worth of work. This is a risk. Because I didn’t delete the file, it wasn’t in the recycle bin; the document had to be restored for me to continue.
Now let’s look at archiving. We mentioned this was another way to optimize storage to save costs, increase SharePoint’s scalability, and improve performance. So first I just want to recap the types of archiving, because depending on who you’re talking to, you could mean a couple of different things.
First, there’s archiving for compliance. Natively, SharePoint offers the Records Center. If you’re leveraging the Records Center, be aware that it is essentially just another location, still in SQL, to store content. The best practice here would be to put the Records Center on its own database and leverage RBS to offload content.
Next, we have archiving for storage savings. Essentially, this means leveraging multiple tiers of storage: not just 0 or 1, or 1 and 2, like we do with BLOB externalization, but maybe tiers 1-4, for example, to really achieve the greatest cost savings. Now, I mentioned there were no native archiving tools, but in reality, you could essentially just create backup files of the content you’d want to “archive” and then delete that content out of SharePoint. OR, you look at 3rd parties, like AvePoint’s DocAve Archiver, to build business rules into your archival plans.
Things to look for in 3rd party tools: EBS/RBS support is key. It keeps content “in” SharePoint, so it stays accessible and users can still interact with it.