3. Mailbox Server Platform Evolution
Component | Exchange 2010 | Exchange 2013 | Challenges | Solutions
Processor | 8 cores (2 socket) | 20+ cores (2 socket) | Store scales poorly past 12 cores; eggs/basket increase | Multi-role servers; multi-process Store
Memory | 32GB (8x4GB DIMM) | 96GB (12x8GB DIMM) | n/a | n/a
Disk | 3.5” 7.2K 2TB (35/server) | 3.5” 7.2K 6TB (12/server) | Disks getting larger but not faster; DB maintenance times; seed reliability/duration; eggs/basket increase | Reduce IOPS by 50%; multiple databases/disk; tune DB maintenance
Network | 4 x 1Gbit NICs | 2 x 10Gbit NICs | How do you leverage the bandwidth for seeding | Faster seeding with multiple databases/disk
4. Exchange Server 2013 Goals
Decrease hardware costs
Increase reliability and availability
Provide data protection enhancements
Enable faster root cause analysis through better diagnostics
Deliver core platform investments for future innovations
6. Decrease Hardware Costs
Reduce IOPS by 50% compared with Exchange 2010
Disk sizes increasing (8TB) with no corresponding increase in IOPS
Larger mailboxes (100GB)
Mailbox schema and ESE pre-read optimizations
Support multiple databases per volume
Maximize disk space utilization without increased reseed times
Distribute active users across available database volumes
Drive higher adoption of JBOD deployment
Take advantage of low-cost locally attached storage
7. IOPS Reductions
Improvements to logical contiguity of store schema
Property blobs are used to store actual message properties
Several messages per page means fewer large IOs to retrieve message properties
Use of long-value storage is reduced, though when accessed, large sequential IOs are used
Reduction in passive copy IO
100MB checkpoint depth reduces write IO
Transaction log code refactored for faster failover
8. Tables Optimized for Sequential IO
Global Tables
Catalog – registry of tables existing in database
Globals – database version, etc
Mailbox – MailboxNumber, Owner Info, Locale, LastLogonTime, etc
DeliveredTo – duplicate delivery information
Events – reliable events for assistants
Tables partitioned by MailboxNumber
Folder - FolderId, Item Count, Size, PropertyBlob
Message – DocumentId, MessageId, FolderId, PropertyBlob, OffPagePropertyBlob, MessageClass; ordered by DateReceived
Attachment – AttachmentId, Name, Size, CreationTime, etc
PhysicalIndexes (partitioned by LogicalIndex)
9. Message Table Property Storage
Blobs used to store collection of MAPI properties
Referred to as On-page and Off-page property blobs
ESE compression optimizes physical storage of blob data
Compression more efficient when input contains more properties
PropertyBlob
Contains properties previously stored in the Header table, now kept in a message table column
Property promotion from the OffPagePropertyBlob to the PropertyBlob is possible
Blob size limited to eliminate LV tree access for core message properties
OffPagePropertyBlob
ESE LV Hints push storage of this blob into separate LV tree
Reading LV tree involves large sequential I/O
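The on-page/off-page split above can be sketched as a simple packing decision. This is an illustrative model only, not the actual ESE/Store implementation; the 2 KB cap and the smallest-first packing order are assumptions made for the example.

```python
# Illustrative sketch (NOT the real ESE/Store code): split a message's MAPI
# properties into a size-limited on-page blob and an off-page blob that would
# be backed by a long-value (LV) tree.

ON_PAGE_BLOB_LIMIT = 2048  # hypothetical cap keeping core properties on-page

def partition_properties(properties):
    """Pack properties into the on-page blob until the size cap is hit;
    spill the remainder to the off-page blob (read via large sequential LV IO)."""
    on_page, off_page, used = {}, {}, 0
    # Assumption for illustration: pack smallest values first so the largest
    # number of core properties stays on-page.
    for name, value in sorted(properties.items(), key=lambda kv: len(kv[1])):
        if used + len(value) <= ON_PAGE_BLOB_LIMIT:
            on_page[name] = value
            used += len(value)
        else:
            off_page[name] = value
    return on_page, off_page
```

Small view-related properties stay on-page, so a single page read serves view operations, while large bodies spill off-page where LV access is sequential.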
11. Higher Reliability and Availability
Improved isolation from hardware/software failures
Store process per database, faster failover and disk failure handling
Built-in monitoring and availability management
Best copy and server selection includes health of entire protocol stack
Service recovery through failover and/or restart
Non-stop operations
No scheduled mailbox database maintenance
Autoreseed automatically restores redundancy on disk failure
Maintain data protection without manual intervention
Dynamically uses spare disks to restore database copy health
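The Autoreseed behavior described above amounts to mapping failed copies onto spare volumes and healthy seeding sources. A minimal sketch with hypothetical names; the real logic lives in the Microsoft Exchange Replication service and involves volume mount-point remapping not shown here.

```python
# Hedged sketch of the Autoreseed idea: when a database copy fails because its
# disk died, claim a spare volume and reseed the copy from a healthy source.

def autoreseed(failed_copies, spare_volumes, healthy_sources):
    """Return a list of (database, spare_volume, source_server) reseed actions.

    failed_copies: database names whose disk failed
    spare_volumes: ordered list of unused spare volumes
    healthy_sources: dict of database name -> server hosting a healthy copy
    """
    actions = []
    spares = list(spare_volumes)
    for db in failed_copies:
        if not spares:
            break  # no spares left: redundancy stays degraded, alert instead
        source = healthy_sources.get(db)
        if source is None:
            continue  # no healthy copy to seed from
        actions.append((db, spares.pop(0), source))
    return actions
```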
13. Managed Store
Store service/process (Microsoft.Exchange.Store.Service.exe)
Microsoft Information Store service
Manages worker process lifetime based on mount/dismount
Logs failure item when store worker process problems detected
Terminates store worker process in response to “dirty” dismount during failover
Store worker process (Microsoft.Exchange.Store.Worker.exe)
One process per database, RPC endpoint instance is database GUID
Responsible for block-mode replication for passive databases
Fast transition to active when mounted
Transition from passive to active increases ESE cache size 5X
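The process-per-database lifecycle can be modeled as a thin controller tracking one worker per database GUID. This sketch only models the state transitions; the real Store uses separate OS processes (Microsoft.Exchange.Store.Worker.exe) supervised by Microsoft.Exchange.Store.Service.exe.

```python
# Illustrative lifecycle model of the "one worker per database" architecture.

class StoreServiceController:
    def __init__(self):
        self.workers = {}  # database GUID -> state ("active" or "passive")

    def mount(self, db_guid, active=True):
        # One worker per database; its RPC endpoint is the database GUID.
        self.workers[db_guid] = "active" if active else "passive"

    def dismount(self, db_guid):
        # Terminating one worker never affects other databases' workers.
        self.workers.pop(db_guid, None)

    def fail_over(self, db_guid):
        # Passive copy transitions to active (and, per the deck, would grow
        # its ESE cache 5x on that transition).
        if self.workers.get(db_guid) == "passive":
            self.workers[db_guid] = "active"
```

The payoff of this isolation is that a dismount or crash of one database's worker leaves every other worker untouched.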
14. Microsoft Exchange Replication service
Replication service process (MSExchangeRepl.exe)
Detecting unexpected database failures
Issues mount/dismount operations to Store
Provides administrative interface for management tasks
Initiates failovers on failures reported by ESE, Store and Responders
15. ESE Cache Management
Allocates 25% of memory for store worker process ESE cache
This is referred to as the max cache target
Amount allocated to each store worker process based on number of hosted database copies and value of MaximumActiveDatabases
Static amount of cache allocated to passive and active copies
Store worker process will only use max cache target when copy is active
Passive database copies allocate 20% of max cache target
Max cache target computed at service process startup
Restart Store service process when adding/removing copies or changing value of MaximumActiveDatabases
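A rough sketch of the cache budgeting described above, assuming a simple split formula: 25% of server memory forms the max cache target, an active copy's share is that target divided by MaximumActiveDatabases, and a passive copy gets 20% of an active share. The exact production formula is not documented here, so treat the arithmetic as illustrative.

```python
# Illustrative ESE cache budgeting; the split formula is an assumption.

def cache_targets(server_memory_gb, copies, max_active):
    """copies: dict of database name -> 'active' or 'passive'.
    max_active: value of MaximumActiveDatabases.
    Returns per-database cache allocation in GB (static until service restart)."""
    max_cache_target = server_memory_gb * 0.25   # 25% of memory for ESE cache
    full_share = max_cache_target / max_active   # an active copy's share
    weights = {db: (1.0 if state == "active" else 0.2)  # passive gets 20%
               for db, state in copies.items()}
    return {db: round(w * full_share, 2) for db, w in weights.items()}
```

Because the targets are computed once at startup, adding or removing copies (or changing MaximumActiveDatabases) requires a Store service restart, as noted above.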
17. Recurring Maintenance
Scheduled maintenance is eliminated in Exchange 2013
Recurring maintenance now part of time-based assistant (TBA) infrastructure
StoreMaintenance: lazy index maintenance, isinteg
StoreDirectoryServiceMaintenance: disconnected mailbox expiration
Workload Management monitors CPU, RPC latency, and replication health
Task execution throttled/deferred when resource pressure exists
Background ESE database scanning further throttled
Based on datacenter disk failure analysis, target to complete background database scan within 4 weeks (using multiple databases on 8 TB disks)
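The 4-week target implies a very modest sustained scan rate, which is why the throttling is feasible; a quick back-of-the-envelope check:

```python
# Sustained read rate needed to cover a full disk once in the given window.

def required_scan_rate_mb_s(disk_tb, weeks):
    total_mb = disk_tb * 1024 * 1024   # TB -> MB
    seconds = weeks * 7 * 24 * 3600    # weeks -> seconds
    return total_mb / seconds

# An 8 TB disk scanned over 4 weeks needs only ~3.5 MB/s sustained, a tiny
# fraction of a 7.2K drive's sequential throughput, so heavy throttling is safe.
rate = required_scan_rate_mb_s(8, 4)
```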
18. Managed Availability
Tests determine viability of various components on Mailbox server
Database connectivity and replication
Protocol services (Outlook, OWA, EAS, IMAP, POP)
Recommend HA actions when service-impacting condition found
Database failover
Restart service
Restart computer
Escalate when auto recovery unsuccessful and service not restored
Integration with System Center to raise awareness of service-impacting conditions that cannot be automatically resolved
19. Managed Availability
Name | Trigger | Recovery sequence
Database availability | 12 logon failures in 16 minutes | Escalate
Store service not running | Service stopped | Restart service → Bugcheck → Escalate
Database free space | Free disk space drops below 10% | Escalate
Store service process repeatedly crashing | 3 crashes of the Store service in 1 hour | Escalate
Store worker process repeatedly crashing | 3 crashes of store workers (across all workers) in 1 hour | Escalate
Percent RPC requests | 90% of available threads per database in use | Database failover → Escalate
70ms RPC latency | 70ms average RPC latency | Determine impact scope → Identify/quarantine mailbox → Escalate
150ms RPC latency | 150ms average RPC latency | Determine impact scope → Identify/quarantine mailbox → Escalate
Mailbox quarantined | More than 1 mailbox quarantined on database | Escalate
Assistants service not running | Service stopped | Restart service → Escalate
Event assistants behind watermarks* | Assistant watermark age exceeds threshold | Escalate
Number of search tasks* | Count of search tasks exceeds threshold | Escalate
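The recovery sequences in the table follow one pattern: try each step in order and stop as soon as health is restored, escalating only when everything else fails. A minimal sketch of that pattern (not the Managed Availability engine itself):

```python
# Generic "try steps in order until healthy" responder pattern.

def run_recovery(sequence, try_step):
    """sequence: ordered step names, e.g. ['Restart service', 'Escalate'].
    try_step(step): callback returning True if the step restored health.
    Returns the steps actually attempted."""
    attempted = []
    for step in sequence:
        attempted.append(step)
        if try_step(step):
            break  # healthy again; stop escalating
    return attempted
```

For example, the Store-service responder would attempt a service restart first, bugcheck the machine only if that fails, and escalate to a human only if both fail.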
20. Mailbox Quota Management
Reduction in overhead to generate over-quota notification
At logon time, system evaluates mailbox quota against policy
Sends over-quota notification message once per notification interval; notifications are NOT sent to inactive mailboxes
Mailbox size calculation is a more accurate measurement of mailbox database storage used
Includes both internal and end-user items/properties
Mailbox size will likely increase when moved to Exchange 2013
Search metadata stored on items increases overall mailbox size
No increase in database footprint
Should plan to increase quota per mailbox
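The notification throttle described above can be sketched as a small predicate evaluated at logon. The 24-hour interval comes from the speaker notes; the function shape and parameter names are hypothetical.

```python
# Illustrative over-quota notification throttle, evaluated at logon time.

from datetime import datetime, timedelta

NOTIFY_INTERVAL = timedelta(hours=24)  # per the deck: at most once per 24 hours

def should_notify(mailbox_size, quota, last_notified, now, is_active):
    """True if an over-quota notification should be sent for this logon."""
    if not is_active or mailbox_size <= quota:
        return False  # inactive mailboxes never get notifications
    if last_notified is not None and now - last_notified < NOTIFY_INTERVAL:
        return False  # already notified within the interval
    return True
```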
22. Data Protection Improvements
Autoreseed automatically restores redundancy on disk failure
Lag copies can “care for themselves”
Play down when low on space, during page patching, and when required for availability (no other copies available)
Lagged copy activation can be simplified with transport enhancements
24. Data Protection Improvements
Continued support for the VSS backup API
3rd-party VSS applications should be compatible with Exchange 2013 without major changes¹
VSS full, copy, incremental, and differential backup/restore supported
Windows Server Backup supports backup of both active and passive database copies
Scheduled backup succeeds regardless of mount state
¹ Backup vendors are responsible for integration and supportability statements
26. Diagnostic Improvements
Insight into runtime without dumps or external clients
PowerShell access to mailbox database internals and in-memory state
Eliminate need for end-user repro
Always-on tracing to capture “outlier” operational behavior
30. Core Investments
Integration of new search engine – Search Foundation
Same search engine used by SharePoint 2013
C# Development Platform
Improved developer productivity through better tools
Larger community of developers within team available to contribute
Better layering of implementation
Implementation of the physical layer isolates the underlying database engine from the upper (logical) layers and the MAPI implementation
31. Virus Scanning API (VSAPI)
Exchange 2013 does not support VSAPI
It does include transport extensibility to scan messages in-flight
3rd party A/V extensions no longer run in Store process
On-demand scanning not considered an effective solution with clients that cache data (Outlook, OWA, EAS, POP, IMAP)
EWS is available for scheduled and on-demand scan scenarios
33. E14 vs. E15: DITL Performance Comparison
[Chart: E14 vs. E15 per-mailbox metrics, measuring DB IOPS/Mailbox (E14 0.65 vs. E15 0.16), RPC Average Latency, Mcycles per RPC packet, and Store Memory per Mailbox (MB)]
Online Mode | Cached Mode
• 48% | 76% reduction in disk IOPS
• 18% | 41% reduction in average RPC latency
• 17% | 34% increase in CPU per RPC processed
• ~4x increase in Store memory overhead
34. E14 vs. E15: DITL Performance Comparison
LoadGen simulation: 10 DBs / 1000 users
[Chart: same per-mailbox metrics as slide 33, with DB IOPS/Mailbox falling from 0.65 (E14) to 0.16 (E15)]
Two profiles: Online and Cached (Default/Optimized)
Online Mode | Cached Mode
• 48% | 76% reduction in disk IOPS
• 18% | 41% reduction in average RPC latency
• 17% | 34% increase in CPU per RPC processed
• ~4x increase in Store memory overhead
Perf gains are not free: there is an increase in CPU and memory
CPU increase is a factor of optimizing for two-socket servers and moving to a multi-process architecture
Enables scale-out using multi-core processors without having to cross the processor bridge to access shared L2 cache
Some CPU overhead comes from using managed code
Memory increase is also a factor of the multi-process architecture
Most of the memory is in small and large object heaps in .NET, primarily used for object allocation and cleanup
36. Summary
Mailbox storage has…
Reduced IOPS by 50-70%...again!
Optimized for large disks (8TB) and larger mailboxes (+100GB)
Better isolation leading to higher reliability
Built-in monitoring and recovery to drive higher availability
Improved data protection to reduce risk of data loss
Use the on-page blob in the message table (instead of the header table with individual ESE columns) to store properties used frequently in view operations. Reduce the use of LV storage, and optimize LV access when it is necessary.
A single process per database drives faster failover. Previously, failover could take longer than 90 seconds in some (~5%) scenarios; Exchange 2013 eliminates this, as the database can be stopped in a very consistent manner, and a dirty database mount in Exchange 2013 is very timely. Around 1% of failovers were slow, causing up to many 10-minute outages per week. There is now also the ability to abort a mount (for example, to choose a better copy): HA can choose a different copy that may be better if the current mount doesn't succeed within the timeout interval. Scheduled maintenance is gone as well.
An individual worker can be killed without a negative impact on other databases; it is all about process isolation. 40 databases means 41 processes: the 41st is the Store service process controller, and each database has its own worker process. The controller is very thin and very reliable, but if it dies, all worker processes die (they detect that the service process is gone and exit). The controller monitors the health of all store worker processes on the server. Forcible or unexpected termination of Microsoft.Exchange.Store.Service.exe causes an immediate failover of all active database copies.
This moves to a world of priority and health of service. Discretionary workload tasks run when resources are available; as systems degrade, they are degraded as gracefully as possible, so the server runs as well as it can even in the absence of good management. The TBA infrastructure monitors CPU, RPC latency, and replication feedback; task execution is throttled or deferred when resource pressure exists, with higher-priority tasks running instead of lower-priority ones. Lazy index maintenance is opportunistic (it can be deferred), but the longer it is deferred, the more likely that maintenance will need to be applied to the index in real time to service an RPC operation (like QueryRows). Maintenance records can reduce IO by deferring operations and coalescing IOPS. DB scan was originally designed to run on the order of a week; it is now reasonable to throttle it and scan the entire database once every 4 weeks, targeting 8 TB disks.
This list shows the responders that are enabled by default on-premises.
Administrators need to allocate more quota, but not more capacity.

Quota notification: the inefficiency increases as the number of mailboxes per database increases. In E14, notifications were generated on user logon, and only for active mailboxes. In E15, the check happens in the RPC logon if the user has not received a notification in the last 24 hours: a message is sent at most once every 24 hours as long as the client is performing RPC operations. Notifications are randomly distributed across users and give them a current calculation.

Quota calculation is measured differently in E15:
1. E14 computed quota on a subset of the content in the mailbox, i.e., the tip of the iceberg.
2. In E15, the entire size is attributed to the mailbox quota.
3. Any property that requires storage contributes to the quota calculation.
4. In O365 there have been problems with item properties counting against storage but not against quota.
5. This should not matter to the end user, who will get a large mailbox.
6. The extra space counted against the quota means 14 GB is really 14 GB, even if it appeared as 12 GB to the user in E14; it only affects mailboxes if the administrator does not increase the quota.

An intra-org move within the forest will be allowed to succeed, though it may degrade the experience; a move to the cloud or a cross-forest migration will not be allowed. E15 focuses on the administrator's experience of quota, not the user's perception, and lets the administrator manage storage. The quota is based on the virtual (logical) size, not the actual size; the on-disk footprint may be smaller than the logical size, depending on the stage, and an item may change size as it moves through the Exchange system. The quota is based on the pre-compression calculation, as long as we do not overprovision. Analogy: NTFS quotas work on logical file size and are not impacted by enabling NTFS file compression. Both the Store and NTFS can store content compressed; physical compression does not affect the logical size used for quota calculation, but it does reduce the storage required.
VSAPI was historically the only way (prior to Exchange 2007). Since then we have moved to a set of solutions that address most of the scenarios in other ways: messages can be scanned in flight, and EWS enables access to content via a bulk interface (taking the content of a mailbox and backing it up is a bulk example). These new methods allow us to drop support for VSAPI. Third-party code ran inside the Store and directly impacted the reliability of the product; VSAPI allowed problems to be introduced inside our software, and the Store layer will no longer allow this code to run in-process. On-demand scanning was dropped because the scenario is not relevant with cached clients (Outlook, OWA, ActiveSync, IMAP, POP). We are no longer in the age of 25 MB mailboxes, which means some of the simplistic scanning methods no longer work. Blindly rerunning a virus scan on a schedule does not offer much value; you have to have detection and protection at the desktop, and the OS has to be stronger, too.
DITL = day in the life. Results are based on daily Outlook LoadGen simulations (10 databases, 1000 users) measuring key metrics used to identify performance improvements and regressions.

Since the beginning of the Exchange 2013 release cycle, we have run a standard daily performance test comparing Exchange 2013 (blue) with Exchange 2010 (yellow), using identical LoadGen profiles for both. There are two profiles: Outlook in Online Mode and Outlook in Cached Mode. The charts depict the default profile, Outlook in Cached Mode (which Exchange 2013 is tuned for, and which is the best-case example of IOPS reductions). The chart on the left represents the reduction in IOPS per mailbox from Exchange 2010 (0.65) to Exchange 2013 (0.16).

We also track three other metrics that matter from a performance standpoint because they relate to the user experience and to the hardware. We especially need to maintain a low average RPC latency, as it relates directly to response time and thus to the quality of the user experience. Because we have reduced IO, we were able to reduce average RPC latency by 18% (Online Mode) to 41% (Cached Mode).

These gains are not free: there is a corresponding increase in both CPU and memory utilization on the server. The CPU increase stems from optimizing for two-socket servers and moving to a multi-process architecture for the Store. This lets store processes run on separate cores and effectively scales the Store out across them. In a four-socket environment, instructions need to cross the processor bridge to access the L2 cache on another core; in a two-socket environment, we can scale out using multi-core processors and a multi-process architecture for our databases, so processes can be distributed across multiple cores without crossing the bridge to reach the shared L2 cache.

There is another cost: the multi-process architecture is written in managed code (C#), a higher-level language than the C++ used for all previous versions of Store.exe, and C# has higher CPU requirements as well. The result is an increase in CPU requirements of up to 35%; however, with more and more powerful processors and the ability to add more cores, we expect some if not much of this increase to be offset.

Above and beyond the ESE cache, the Store uses memory for the objects it creates as overhead. Exchange 2010 used about 1 GB; Exchange 2013 uses about 4 GB. That is a large increase, but relative to the amount of memory in the server (e.g., 48, 64, or 96 GB servers), it can be a small increase overall. The memory increase is also a factor of the multi-process environment: the chart on the right shows a server instantiating 10 databases, and there is memory overhead associated with each instantiated database. Essentially, this is the price paid for the process isolation that multi-process provides, and for optimizing our logic around it. Where does the memory go? It is in the heap: we spend a lot of time in the small and large object heaps in .NET trying to minimize the memory used for object allocation and cleanup, but there are simply things in .NET we cannot solve, so we have accepted this increase as something we will live with.