What is Business Continuity?

    Ÿ   Business Continuity is the preparation for, response to, and recovery from an application
        outage that adversely affects business operations
    Ÿ   Business Continuity Solutions address systems unavailability, degraded application
        performance, or unacceptable recovery strategies

There are many factors to consider when calculating the cost of downtime. A
formula to calculate the cost of an outage should capture both the cost of lost employee
productivity and the cost of lost income from missed sales (a worked sketch follows the list).
            Ÿ Estimated average cost of 1 hour of downtime = (Employee costs per hour)
                * (Number of employees affected by outage) + (Average income per hour).
            Ÿ Employee costs per hour is simply the total salaries and benefits of all employees
                per week, divided by the average number of working hours per week.
            Ÿ Average income per hour is simply the total income of an institution per week,
                divided by the average number of hours per week that the institution is open for
                business.
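The arithmetic above can be sketched directly. The figures below are hypothetical and used
only to illustrate the formula; the function name and parameters are illustrative, not part of
any standard.

```python
# Minimal sketch of the downtime-cost formula above; all figures are
# hypothetical and used only to illustrate the arithmetic.

def downtime_cost_per_hour(weekly_salaries_and_benefits, weekly_work_hours,
                           employees_affected, weekly_income, weekly_open_hours):
    """Estimated average cost of one hour of downtime."""
    employee_cost_per_hour = weekly_salaries_and_benefits / weekly_work_hours
    average_income_per_hour = weekly_income / weekly_open_hours
    return employee_cost_per_hour * employees_affected + average_income_per_hour

# Example: $200,000/week payroll, 40-hour work week, 50 employees affected,
# $500,000/week income, open 60 hours/week.
print(downtime_cost_per_hour(200_000, 40, 50, 500_000, 60))  # -> ~258333.33
```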

Recovery Point Objective (RPO) is the point in time to which systems and data must be
recovered after an outage. This defines the amount of data loss a business can endure. Different
business units within an organization may have varying RPOs.

Recovery Time Objective (RTO) is the period of time within which systems, applications, or
functions must be recovered after an outage. This defines the amount of downtime that a
business can endure, and survive.

Disaster Recovery versus Disaster Restart

    Ÿ   Most business critical applications have some level of data interdependencies
    Ÿ   Disaster recovery
            – Restoring previous copy of data and applying logs to that copy to bring it to a
                known point of consistency
            – Generally implies the use of backup technology
            – Data copied to tape and then shipped off-site
            – Requires manual intervention during the restore and recovery processes
    Ÿ   Disaster restart
            – Process of restarting mirrored consistent copies of data and applications
            – Allows restart of all participating DBMS to a common point of consistency utilizing
                automated application of recovery logs during DBMS initialization
            – The restart time is comparable to the length of time required for the application
                to restart after a power failure.

Elevated demand for increased application availability confirms the need to ensure business
continuity practices are consistent with business needs.
Interruptions are classified as either planned or unplanned. Failure to address these specific
outage categories seriously compromises a company’s ability to meet business goals.
Planned downtime is expected and scheduled, but it is still downtime causing data to be
unavailable. Causes of planned downtime include:
             Ÿ New hardware installation/integration/maintenance
             Ÿ Software upgrades/patches
             Ÿ Backups
             Ÿ Application and data restore
             Ÿ Data center disruptions from facility operations (renovations, construction, other)
             Ÿ Refreshing a testing or development environment with production data
             Ÿ Porting a testing/development environment over to the production environment

Today, the most critical component of an organization is information. Any disaster occurrence will
affect information availability critical to run normal business operations.
In our definition of disaster, the organization’s primary systems, data, applications are damaged
or destroyed. Not all unplanned disruptions constitute a disaster.

Business Continuity is a holistic approach to planning, preparing, and recovering from an adverse
event. The focus is on prevention, identifying risks, and developing procedures to ensure the
continuity of business function. Disaster recovery planning should be included as part of
business continuity.
BC Objectives include:
             Ÿ    Facilitate uninterrupted business support despite the occurrence of problems.
             Ÿ    Create plans that identify risks and mitigate them wherever possible.
             Ÿ    Provide a road map to recover from any event.
Disaster Recovery is more about specific cures to restore service and repair damaged assets after
an adverse event. In our context, Disaster Recovery is the coordinated process of restoring systems,
data, and infrastructure required to support key ongoing business operations.

Business Continuity Planning (BCP) is a risk management discipline. It involves the entire
business--not just IT. BCP proactively identifies vulnerabilities and risks, planning in advance how
to prepare for and respond to a business disruption. A business with strong BC practices in place
is better able to continue running the business through the disruption and to return to “business
as usual.”
BCP actually reduces the risk and costs of an adverse event because the process often uncovers
and mitigates potential problems.
The Business Continuity Planning process includes the following stages:
    1. Objectives
             Ÿ Determine business continuity requirements and objectives including scope and
                  budget
             Ÿ Team selection (include all areas of the business and subject matter expertise,
                  internal/external)
             Ÿ Create the project plan
    2. Perform analysis
             Ÿ Collect information on data, business processes, infrastructure supports,
                  dependencies, frequency of use
             Ÿ Identify critical needs and assign recovery priorities.
             Ÿ Create a risk analysis (areas of exposure) and mitigation strategies wherever
                  possible.
             Ÿ Create a Business Impact Analysis (BIA)
             Ÿ Create a Cost/benefit analysis – identify the cost (per hour/day, etc.) to the
                  business when data is unavailable.
             Ÿ Evaluate Options
    3. Design and Develop the BCP/Strategies
             Ÿ Evaluate options
             Ÿ Define roles/responsibilities
             Ÿ Develop contingency scenarios
             Ÿ Develop emergency response procedures
             Ÿ Detail recovery, resumption, and restore procedures
             Ÿ Design data protection strategies and develop infrastructure
             Ÿ Implement risk management/mitigation procedures
    4. Train, test, and document
    5. Implement, maintain, and assess

This is an example of Business Impact Analysis (BIA). The dollar values are arbitrary and are
used just for illustration. BIA quantifies the impact that an outage will have on the business and
the potential costs associated with the interruption. It helps businesses channel their resources
based on probability of failure and associated costs.
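As a rough sketch of how a BIA might be tabulated, the following ranks hypothetical business
functions by expected annual outage cost (hourly cost of unavailability multiplied by expected
annual downtime hours). All function names and figures are invented for illustration.

```python
# Hypothetical Business Impact Analysis sketch: rank business functions by
# expected annual outage cost. All figures are illustrative only.

functions = [
    # (business function, cost per hour of downtime, expected downtime hours/year)
    ("Order entry",    50_000, 10),
    ("E-mail",          5_000, 20),
    ("Data warehouse",  8_000, 15),
]

for name, cost_per_hour, hours in sorted(
        functions, key=lambda f: f[1] * f[2], reverse=True):
    print(f"{name:15s} expected annual impact: ${cost_per_hour * hours:,}")
```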




Identifying Single Points of Failure

    Ÿ   Configure multiple HBAs, and use multi-pathing software
            Ø Protects against HBA failure
            Ø Can provide improved performance (vendor dependent)
Planning and configuring clusters is a complex task. At a high level (a toy failover sketch
follows this list):
    Ÿ   A cluster is two or more hosts with access to the same set of storage (array) devices
    Ÿ   The simplest configuration is a two node (host) cluster
    Ÿ   One of the nodes would be the production server while the other would be configured
        as a standby. This configuration is described as Active/Passive.
    Ÿ   Participating nodes exchange “heart-beats” or “keep-alives” to inform each other about
        their health.
    Ÿ   In the event of a primary node failure, cluster management software will shift the
        production workload to the standby server.
    Ÿ   Implementation of the cluster failover process is vendor specific.
    Ÿ   A more complex configuration would be to have both nodes run production workloads
        on the same set of devices. Either the cluster software or the application/database should
        then provide a locking mechanism so that the nodes do not try to update the same areas
        on disk simultaneously. This would be an Active/Active configuration.
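The toy sketch below illustrates the heartbeat-and-failover idea from the list above. It is not
any vendor's cluster software; the heartbeat interval, missed-beat limit, and callback names are
assumptions made for illustration.

```python
# Toy sketch of Active/Passive failover logic: if the production node misses
# too many heartbeats, the cluster manager shifts the workload to the standby.
# Real cluster failover implementations are vendor specific.
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeat checks (illustrative)
MISSED_LIMIT = 3           # heartbeats missed before declaring the node dead

def monitor(node_is_alive, fail_over):
    missed = 0
    while True:
        if node_is_alive():
            missed = 0
        else:
            missed += 1
            if missed >= MISSED_LIMIT:
                fail_over()          # promote the standby node to production
                return
        time.sleep(HEARTBEAT_INTERVAL)
```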
Local Replication:

    Ÿ   Data from the production devices is copied over to a set of target (replica) devices.
    Ÿ   After some time, the replica devices will contain data identical to that on the production
        devices.
    Ÿ   Subsequently, the copying of data can be halted. At this point-in-time, the replica devices
        can be used independently of the production devices.
    Ÿ   The replicas can then be used for restore operations in the event of data corruption or
        other events.
    Ÿ   Alternatively the data from the replica devices can be copied to tape. This off-loads the
        burden of backup from the production devices.

Remote Replication:

    Ÿ   The goals of remote replication are the same as local replication except that data is
        replicated to different storage arrays
    Ÿ   Storage arrays can be side by side or thousands of miles apart.
    Ÿ   If replicated to remote location, business can continue with little or no interruption and
        little or no loss of data if primary site is lost.

Backup Restore:

    Ÿ   Backup to tape has been the predominant method for ensuring data availability and
        business continuity.
    Ÿ   Low cost, high capacity disk drives are now being used for backup to disk. This
        considerably speeds up the backup and the restore process.
    Ÿ   Frequency of backup will be dictated by defined RPO/RTO requirements as well as the
        rate of change of data.

Powerpath:

PowerPath is host-based software that resides between the application and the disk device
layers. Every I/O from the host to the array must pass through the PowerPath driver software.
This allows PowerPath to work in conjunction with the array and connectivity environment to
provide intelligent I/O path management. This includes path failover and dynamic load balancing,
while remaining transparent to any application I/O requests as it automatically detects and
recovers from host-to-array path failures.
PowerPath is supported on various hosts and operating systems such as Sun Solaris, IBM AIX,
HP-UX, Microsoft Windows, Linux, and Novell. Storage arrays from EMC, Hitachi, HP, and IBM are
supported. The level of OS and array models supported will vary between PowerPath software
versions.




PowerPath maximizes application availability, optimizes performance, and automates online
storage management while reducing complexity and cost, all from one powerful data path
management solution. PowerPath supports the following features:

   Ÿ Multiple path support - PowerPath supports multiple paths between a logical device and a
host. Multiple paths enable the host to access a logical device even if a specific path is
unavailable. Multiple paths also enable sharing of the I/O workload to a given logical device.
   Ÿ Dynamic load balancing - PowerPath is designed to use all paths at all times. PowerPath
distributes I/O requests to a logical device across all available paths, rather than requiring a
single path to bear the entire I/O burden.
   Ÿ Proactive path testing and automatic path recovery - PowerPath uses a path test to ascertain
the viability of a path. After a path fails, PowerPath continues testing it periodically to determine
if it is fixed. If the path passes the test, PowerPath restores it to service and resumes sending I/O
to it.
   Ÿ Automatic path failover - If a path fails, PowerPath redistributes I/O traffic from that path to
functioning paths.
   Ÿ Online configuration and management - PowerPath management interfaces include a
command line interface and a GUI interface on Windows.
   Ÿ High availability cluster support - PowerPath is particularly beneficial in cluster environments,
as it can prevent operational interruptions and costly downtime.

Without PowerPath, if a host needed access to 40 devices and there were four host bus
adapters, you would most likely configure it to present 10 unique devices to each host bus
adapter. With PowerPath, you would configure it so that all 40 devices can be “seen” by all
four host bus adapters.
PowerPath supports up to 32 paths to a logical volume. The host can be connected to the
array using a number of interconnect topologies such as SAN, SCSI, or iSCSI.




The PowerPath filter driver is a platform independent driver that resides between the application
and HBA driver.
The driver identifies all paths that read and write to the same device and builds a routing table
called a volume path set for the device. A volume path set is created for each shared device in
the array.
PowerPath can use any path in the set to service an I/O request. If a path fails, PowerPath can
redirect an I/O request from that path to any other available path in the set. This redirection is
transparent to the application, which does not receive an error.
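The following is a simplified illustration of the volume path set idea, not PowerPath's actual
algorithm: paths that have failed are skipped and, among the survivors, the least-busy path is
chosen for each I/O. Path names and the selection policy are assumptions for illustration.

```python
# Simplified illustration of multi-path I/O routing (not PowerPath's actual
# algorithm): keep a volume path set per device, skip failed paths, and pick
# the least-busy surviving path for each I/O.

class VolumePathSet:
    def __init__(self, paths):
        # path name -> number of I/Os currently outstanding on that path
        self.paths = {p: 0 for p in paths}
        self.failed = set()

    def mark_failed(self, path):
        self.failed.add(path)

    def choose_path(self):
        candidates = [p for p in self.paths if p not in self.failed]
        if not candidates:
            raise IOError("no surviving path to device")
        return min(candidates, key=lambda p: self.paths[p])

pathset = VolumePathSet(["hba0:port0", "hba1:port0", "hba0:port1", "hba1:port1"])
pathset.mark_failed("hba0:port0")
print(pathset.choose_path())   # I/O is redirected to a surviving path
```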




This example depicts how PowerPath failover works. When a failure occurs, PowerPath
transparently redirects the I/O down the most suitable alternate path. The PowerPath filter driver
looks at the volume path set for the device, considers current workload, load balancing, and
device priority settings, and chooses the best path to send the I/O down. In the example,
PowerPath has three remaining paths to redirect the failed I/O and to load balance.

A Backup is a copy of the online data that resides on primary storage. The backup copy is
created and retained for the sole purpose of recovering deleted, broken, or corrupted data on the
primary disk.
The backup copy is usually retained over a period of time, depending on the type of the data
and on the type of backup. There are three backup derivatives: disaster recovery, archival,
and operational backup. We will review them in more detail on the next slide.
The data that is backed up may be on such media as disk or tape, depending on the backup
derivative the customer is targeting. For example, backing up to disk may be more efficient than
tape in operational backup environments.

Several choices are available to get the data written to the backup media.

1. You can simply copy the data from the primary storage to the secondary storage (disk or
   tape), onsite. This is a simple strategy, easily implemented, but impacts the production
   server where the data is located, since it will use the server’s resources. This may be
   tolerated on some applications, but not high demand ones.
2. To avoid an impact on the production application, and to perform serverless backups, you
   can mirror (or snap) a production volume. For example, you can mount it on a separate
   server and then copy it to the backup media (disk or tape). This option will completely free
   up the production server, with the added infrastructure cost associated with additional
   resources.
3. Remote backup can be used to comply with offsite requirements. A copy from the primary
   storage is made directly to backup media sitting at another site. The backup media
   can be a real library, a virtual library, or even a remote filesystem.
4. You can copy to a first set of backup media, which will be kept onsite for operational
   restore requirements, and then duplicate it to another set of media for offsite purposes. To
   simplify the procedure, you can replicate it to an offsite location to remove any manual
   steps associated with moving the backup media to another site.

Disaster Recovery addresses the requirement to be able to restore all, or a large part of, an IT
infrastructure in the event of a major disaster.
Archival is a common requirement used to preserve transaction records, email, and other
business work products for regulatory compliance. The regulations could be internal,
governmental, or perhaps derived from specific industry requirements.
Operational is typically the collection of data for the eventual purpose of restoring, at some
point in the future, data that has become lost or corrupted.

Reasons for a backup plan include:

Ÿ   Physical damage to a storage element (such as a disk) that can result in data loss.
Ÿ   People make mistakes and unhappy employees or external hackers may breach security and
    maliciously destroy data.
Ÿ   Software failures can destroy or lose data and viruses can destroy data, impact data
    integrity, and halt key operations.
Ÿ   Physical security breaches can destroy equipment that contains data and applications.
Ÿ   Natural disasters and other events such as earthquakes, lightning strikes, floods, tornados,
    hurricanes, accidents, chemical spills, and power grid failures can cause not only the loss of
    data but also the loss of an entire computer facility. Offsite data storage is often justified to
    protect a business from these types of events.
Ÿ   Government regulations may require certain data to be kept for extended timeframes.
    Corporations may establish their own extended retention policies for intellectual property to
    protect them against litigation. The regulations and business requirements that drive data as
    an archive generally require data to be retained at an offsite location.

Backup products vary, but they do have some common characteristics. The basic architecture of
a backup system is client-server, with a backup server and some number of backup clients or
agents. The backup server directs the operations and owns the backup catalog (the
information about the backup). The catalog contains the table-of-contents for the data set. It
also contains information about the backup session itself.
The backup server depends on the backup client to gather the data to be backed up. The backup
client can be local or it can reside on another system, presumably to backup the data visible to
that system. A backup server receives backup metadata from backup clients to perform its
activities.
There is another component called a storage node. The storage node is the entity responsible
for writing the data set to the backup device. Typically there is a storage node packaged with the
backup server and the backup device is attached directly to the backup server’s host platform.
Storage nodes play an important role in backup planning as they can be used to consolidate backup
servers.
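As a hedged sketch of the catalog metadata described above, the structures below model one
backup session plus its table of contents. The field names are illustrative and do not
correspond to any specific backup product's schema.

```python
# Hedged sketch of the kind of metadata a backup server keeps in its catalog:
# one record per backup session plus a table of contents for the data set.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogEntry:
    path: str          # file that was backed up
    size_bytes: int
    media_label: str   # tape or disk volume the data was written to

@dataclass
class BackupSession:
    client: str                # backup client that supplied the data
    storage_node: str          # node that wrote the data to the backup device
    level: str                 # "full", "incremental", or "cumulative"
    started_at: str
    contents: List[CatalogEntry] = field(default_factory=list)
```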

The following represents a typical Backup
process:

Ÿ   The Backup Server initiates the
    backup process (starts the backup
    application).
Ÿ   The Backup Server sends a request to
    a server to “send me your data”.
Ÿ   The server sends the data to the
    Backup Server and/or Storage Node.
Ÿ   The Storage Node sends the data to
    the tape storage device and the
    Backup Server begins building the
    catalog (metadata) of the backup
    session.
Ÿ   When all of the data has been
    transferred from the server to the
    Backup Server, the Backup Server
    writes the catalog to a disk file and
    closes the connection to the tape
    device.


Some important decisions that need consideration before implementing a Backup/Restore
solution are shown above. Some examples include:
Ÿ The Recovery Point Objective (RPO)
Ÿ The Recovery Time Objective (RTO)
Ÿ The media type to be used (disk or tape)
Ÿ Where and when the restore operations will occur – especially if an alternative host will be
    used to receive the restore data.
Ÿ When to perform backups.
Ÿ The granularity of backups – Full, Incremental, or Cumulative.
Ÿ   How long to keep the backup – for example, some backups need to be retained for 4 years,
    others just for 1 month
Ÿ   Whether or not it is necessary to make copies of the backup.

Data Considerations: File Characteristics

Ÿ   Location: Many organizations have dozens of heterogeneous platforms that support a
    complex application. Consider a data warehouse where data from many sources is fed into
    the warehouse. When this scenario is viewed as “The Data Warehouse Application”, it easily
    fits this model. Some of the issues are:
−   How the backups for subsets of the data are synchronized
−   How these applications are restored
Ÿ   Size: Backing up a large amount of data that consists of a few big files may have less system
    overhead than backing up a large number of small files. If a file system contains millions of
    small files, the very nature of searching the file system structures for changed files can take
    hours, since the entire file structure is searched.
Ÿ   Number: a file system containing one million files with a ten-percent daily change rate will
    potentially have to create 100,000 entries in the backup catalog. This brings up other issues
    such as:
−   How a massive file system search impacts the system
−   Search time/Media impact
−   Is there an impact on tape start/stop processing?

Data Considerations: Data Compression

Many backup devices, such as tape drives, have built-in hardware compression technologies. To
effectively use these technologies, it is important to understand the characteristics of the data.
Some data, such as application binaries, do not compress well. Text data can compress very well,
while other data, such as JPEG and ZIP files, are already compressed.

Data Considerations: Retention Periods

As mentioned before, there are three types of backup models (Operational, Disaster Recovery,
and Archive). Each can be defined by its retention period. Retention Periods are the length
of time that a particular version of a dataset is available to be restored.
Retention periods are driven by the type of recovery the business is trying to achieve:
Ÿ For operational restore, data sets could be maintained on a disk primary backup storage
     target for a period of time, where most restore requests are likely to be achieved, and then
     moved to a secondary backup storage target, such as tape, for long term offsite storage.
Ÿ For disaster recovery, backups must be done and moved to an offsite location.
Ÿ For archiving, requirements usually will be driven by the organization’s policy and regulatory
     conformance requirements. Tapes can be used for some applications, but for others a more
     robust and reliable solution, such as disks, may be more appropriate.

Backup Methods:

Backing up databases can occur using two different methods:
              Ÿ A Hot backup, which means that the application is still up and running, with
                  users accessing it, while backup is taking place.
              Ÿ A Cold backup, which means that the application will be shut down for the
                  backup to take place.
Most backup applications offer various Backup Agents to do these kinds of operations. There
will be different agents for different types of data and applications.
The granularity and levels for backups depend on business needs, and, to some extent,
technological limitations. Some backup strategies define as many as ten levels of backup. IT
organizations use a combination of these to fulfill their requirements. Most use some combination
of Full, Cumulative, and Incremental backups.
A Full backup is a backup of all data on the target volumes, regardless of any changes made to
the data itself.
An Incremental backup contains the changes since the last backup, of any type, whichever
was most recent.
A Cumulative backup, also known as a Differential backup, is a type of incremental backup that
contains all changes made since the last full backup.
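A minimal sketch of how the backup level determines file selection, assuming a simple
last-modified-time comparison (real backup agents may use archive bits, change journals, or
catalogs instead):

```python
# Sketch of how the backup level determines which files are selected:
# incremental takes files changed since the last backup of any type,
# cumulative (differential) takes files changed since the last full backup.

def select_files(files, last_full_time, last_backup_time, level):
    """files: dict of name -> last-modified timestamp (e.g. epoch seconds)."""
    if level == "full":
        return list(files)
    cutoff = last_full_time if level == "cumulative" else last_backup_time
    return [name for name, mtime in files.items() if mtime > cutoff]

files = {"file1": 100, "file2": 100, "file3": 250, "file4": 220}
print(select_files(files, last_full_time=200, last_backup_time=230, level="incremental"))
# ['file3']
print(select_files(files, last_full_time=200, last_backup_time=230, level="cumulative"))
# ['file3', 'file4']
```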




Following is an example of an incremental backup and restore:

    1. A full backup of the business data is taken on Monday evening. Each day after that, an
       incremental backup is taken. These incremental backups only backup files that are new
       or that have changed since the last full or incremental backup.
    2. On Tuesday, a new file is added, File 4. No other files have changed. Since File 4 is a
       new file added after the previous backup on Monday evening, it will be backed up
       Tuesday evening.
   3. On Wednesday, there are no new files added since Tuesday, but File 3 has changed.
      Since File 3 was changed after the previous evening backup (Tuesday), it will be backed
      up Wednesday evening.
   4. On Thursday, no files have changed but a new file has been added, File 5. Since File 5
      was added after the previous evening backup, it will be backed up Thursday evening.
   5. On Friday morning, there is a data corruption, so the data must be restored from tape.
           a. The first step is to restore the full backup from Monday evening.
           b. Then, every incremental backup taken since that full backup must be applied:
                in this example, the Tuesday, Wednesday, and Thursday incremental backups.




The following is an example of cumulative backup and restore:

   1. A full backup of the data is taken on Monday evening. Each day after that, a cumulative
      backup is taken. These cumulative backups backup ALL FILES that have changed since
      the LAST FULL BACKUP.
   2. On Tuesday, File 4 is added. Since File 4 is a new file that has been added since the last
      full backup, it will be backed up Tuesday evening.
   3. On Wednesday, File 5 is added. Now, since both File 4 and File 5 are files that have
      been added or changed since the last full backup, both files will be backed up
      Wednesday evening.
   4. On Thursday, File 6 is added. Again, File 4, File 5, and File 6 are files that have been
      added or changed since the last full backup; all three files will be backed up Thursday
      evening.
   5. On Friday morning, there is a corruption of the data, so the data must be restored from
      tape.
           a. The first step is to restore the full backup from Monday evening.
           b. Then, only the backup from Thursday evening is restored because it contains all
               the new/changed files from Tuesday, Wednesday, and Thursday.
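The two examples above imply different restore chains. A minimal sketch, with illustrative
labels only:

```python
# Sketch of the restore sequence implied by the two examples above: an
# incremental scheme replays every backup since the last full, while a
# cumulative scheme needs only the last full plus the most recent cumulative.

def restore_chain(last_full, later_backups, scheme):
    """later_backups: backups taken after the full, oldest first."""
    if scheme == "incremental":
        return [last_full] + later_backups
    if scheme == "cumulative":
        return [last_full] + later_backups[-1:]
    raise ValueError("unknown scheme")

print(restore_chain("Mon full", ["Tue", "Wed", "Thu"], "incremental"))
# ['Mon full', 'Tue', 'Wed', 'Thu']
print(restore_chain("Mon full", ["Tue", "Wed", "Thu"], "cumulative"))
# ['Mon full', 'Thu']
```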
Backup Architecture Topologies


   Ÿ   There are 3 basic backup topologies:
          – Direct Attached Based Backup
          – LAN Based Backup
          – SAN Based Backup
   Ÿ   These topologies can be integrated, forming a “mixed” topology

Direct Attached Based Backups




LAN Based Backups:
SAN Based Backups (LAN Free):




SAN/LAN Mixed Based Backups:
Backup Media:

    Ÿ   Tape
            –   Traditional destination for backups
            –   Sequential access
            –   No protection
    Ÿ   Disk
            –   Random access
            –   Protected by the storage array (RAID, hot spare, etc.)




    Ÿ   Multiple streams interleaved to achieve higher throughput on tape
            – Keeps the tape streaming, for maximum write performance
            – Helps prevent tape mechanical failure
            – Greatly increases time to restore

Tape drive streaming is recommended by all vendors in order to keep the drive busy. If you
do not keep the drive busy during the backup process (writing), performance will suffer. Multiple
streaming improves performance drastically, but it also introduces one issue: the backup data
becomes interleaved, and thus recovery times are increased.
Backup to disk replaces tape and its associated devices with disk as the primary backup
target. Backup-to-disk systems offer major advantages over equivalent-scale tape systems in
terms of capital costs, operating costs, support costs, and quality of service. It can be
implemented fully on day one or through a phased approach.

In a traditional approach for backup and archive, businesses take a backup of production.
Typically backup jobs use weekly full backups and nightly incremental backups. Based on
business requirements, they will then copy the backup jobs and eject the tapes to have them
sent offsite, where they will be stored for a specified amount of time.
The problem with this approach is simple - as the production environment grows, so does the
backup environment.

   Ÿ   Production environment grows
          – Requires constant tuning and data placement to maintain performance
          – Need to add more tier-1 storage
   Ÿ   Backup environment grows
          – Backup windows get longer and jobs do not complete
          – Restores take longer
          – Requires more tape drives and silos to keep up with service levels
   Ÿ   Archive environment grows
           – Impacts flexibility to retrieve content when requested
          – Requires more media, adding management cost
          – No investment protection for long term retention requirements

Differences Between Backup / Recovery & Archive:




The recovery process is much more important than the backup process. It is based on the
appropriate recovery-point objectives (RPOs) and recovery-time objectives (RTOs). The process
usually drives a decision to have a combination of technologies in place, from online local
replicas, to backup to disk, to backup to tape for long-term, passive RPOs.
Archive processes are determined not only by the required retention times, but also by retrieval-
time service levels and the availability requirements of the information in the archive.
For both processes, a combination of hardware and software is needed to deliver the appropriate
service level. The best way to discover the appropriate service level is to classify the data and
align the business applications with it.




Replication: What is replication?

Replication is a technique for ensuring Business Continuity by making exact copies of data.
With replication, data on the replica will be identical to the data on the original at the point-in-
time that the replica was created.
Examples:
Ÿ Copy a specific file
Ÿ Copy all the data used by a database application
Ÿ Copy all the data in a UNIX Volume Group (including underlying logical volumes, file systems,
    etc.)
Ÿ Copy data on a storage array to a remote storage array

Replicas can be used to address a number of Business Continuity functions:
Ÿ Provide an alternate source for backup to alleviate the impact on production.
Ÿ Provide a source for fast recovery to facilitate faster RPO and RTO.
Ÿ Decision Support activities such as reporting.
    −   For example, a company may have a requirement to generate periodic reports. Running
        the reports off of the replicas greatly reduces the burden placed on the production
        volumes. Typically reports would need to be generated once a day or once a week, etc.
Ÿ Developing and testing proposed changes to an application or an operating environment.
    −   For example, the application can be run on an alternate server using the replica
        volumes, and any proposed design changes can be tested.
Ÿ Data migration.
    −   Migration can be as simple as moving applications from one server to the next, or as
        complicated as migrating entire data centers from one location to another.

Key factors to consider with replicas:
Ÿ What makes a replica good:
    −   Recoverability from a failure on the production volumes. The replication technology
        must allow for the restoration of data from the replicas to the production volumes and
        then allow production to resume with a minimal RPO and RTO.
    −   Consistency/re-startability is very important if data on the replicas will be accessed
        directly or if the replicas will be used for restore operations.
Ÿ Replicas can either be Point-in-Time (PIT) or continuous (a worked RPO example follows this list):
    −   Point-in-Time (PIT) - the data on the replica is an identical image of the production at
        some specific timestamp.
        Ø   For example, a replica of a file system is created at 4:00 PM on Monday. This
            replica would then be referred to as the Monday 4:00 PM Point-in-Time copy.
        Ø   Note: The RPO will be a finite value with any PIT. The RPO spans from the time
            the PIT was created to the time when any kind of failure on the production
            occurred. If there is a failure on the production at 8:00 PM and there is a 4:00 PM
            PIT available, the RPO would be 4 hours (8 – 4 = 4). To minimize RPO with PITs,
            take periodic PITs.
    −   Continuous replica - the data on the replica is synchronized with the production data
        at all times.
        Ø   The objective with any continuous replication is to reduce the RPO to zero.
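The 4:00 PM PIT / 8:00 PM failure note above can be expressed as a worked calculation (the
date is arbitrary, used only so the times can be subtracted):

```python
# The RPO arithmetic from the note above, as a worked example: the exposure
# is the gap between the most recent PIT and the failure.
from datetime import datetime

pit_created = datetime(2024, 1, 1, 16, 0)   # 4:00 PM point-in-time copy
failure_at  = datetime(2024, 1, 1, 20, 0)   # failure on production at 8:00 PM

rpo = failure_at - pit_created
print(rpo)   # 4:00:00 -> up to four hours of data may be lost
```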




Database replication can be offline or online:

    •   Offline – replication takes place when the
        database and the application are shutdown.
    •   Online – replication takes place when the database
        and the application are running.




    Ÿ   Databases/Applications maintain integrity by following the “Dependent Write I/O
        Principle”
            – Dependent Write: A write I/O that will not be issued by an application until a
                 prior related write I/O has completed
                     Ø A logical dependency, not a time dependency
            – Inherent in all Database Management Systems (DBMS)
                     Ø e.g. Page (data) write is dependent write I/O based on a successful log
                          write
            – Applications can also use this technology
            – Necessary for protection against local outages
                     Ø Power failures create a dependent write consistent image
                     Ø A Restart transforms the dependent write consistent to transactionally
                          consistent
                              v i.e. Committed transactions will be recovered, in-flight
                                  transactions will be discarded
Database applications require that, for a transaction to be deemed complete, a series of writes
have to occur in a particular order (Dependent Write I/O); these writes are recorded on the
various devices/file systems (a sketch follows this list).
            Ÿ In this example, steps 1-4 must complete for the transaction to be deemed
                complete.
                      − Step 4 is dependent on Step 3 and will occur only if Step 3 is complete
                      − Step 3 is dependent on Step 2 and will occur only if Step 2 is complete
                      − Step 2 is dependent on Step 1 and will occur only if Step 1 is complete
            Ÿ Steps 1-4 are written to the database’s buffer and then to the physical disks.
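A minimal sketch of the Dependent Write I/O principle described above: the data (page) write is
issued only after the related log write completes. The functions are placeholders, not a real
DBMS API.

```python
# Sketch of the Dependent Write I/O principle: the page (data) write is not
# issued until the related log write has completed.

def write_log(record):
    # ... issue the log I/O and wait for the device to acknowledge it ...
    return True   # completion status (placeholder)

def write_page(page):
    # ... issue the data I/O only after the log write has completed ...
    return True

def commit(record, page):
    if not write_log(record):          # step N: log write
        raise IOError("log write failed; dependent data write is never issued")
    write_page(page)                   # step N+1: dependent data write
```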




At the point in time when the replica is created, all the writes to the source devices must be
captured on the replica devices to ensure data consistency on the replica.
Ÿ In this example, steps 1-4 on the source devices must be captured on the replica devices for
    the data on the replicas to be consistent.

Creating a PIT for multiple devices happens quickly, but not instantaneously.
    Ÿ   Steps 1-4, which are dependent write I/Os, have occurred and have been recorded
        successfully on the source devices.
    Ÿ   It is possible that steps 3 and 4 were copied to the replica devices, while steps 1 and 2
        were not copied.
    Ÿ   In this case, the data on the replica is inconsistent with the data on the source. If a
        restart were to be performed on the replica devices, Step 4, which is available on the
        replica, might indicate that a particular transaction is complete, but all the data
        associated with the transaction will be unavailable on the replica, making the replica
        inconsistent.
Database replication can be performed with the application offline (i.e., application is shutdown,
no I/O activity) or online (i.e., while the application is up and running). If the application is
offline, the replica will be consistent because there is no activity. However, consistency is an
issue if the database application is replicated while it is up and running.

Online Replication
– Some database applications allow replication while the application is up and running
– The production database would have to be put in a state which would allow it to be
   replicated while it is active
– Some level of recovery must be performed on the replica to make the replica consistent




An alternative way to ensure that an online replica is consistent is to:

            1. Hold I/O to all the devices at the same instant.
            2. Create the replica.
            3. Release the I/O.
Holding I/O is similar to a power failure and most databases have the ability to restart from a
power failure.
Note: While holding I/O simultaneously ensures that the data on the replica is identical to
that on the source devices, the database application will time out if I/O is held for too long.
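A minimal sketch of the hold/create/release sequence above, assuming placeholder functions for
holding I/O and creating the replica; the timeout value is an assumption reflecting the warning
that I/O cannot be held for too long.

```python
# Sketch of the hold-I/O approach to a consistent online replica: quiesce all
# devices at the same instant, create the PIT, then release I/O.

def create_consistent_pit(devices, hold_io, create_replica, release_io,
                          max_hold_seconds=5):
    hold_io(devices)                       # 1. hold I/O to all devices at once
    try:
        create_replica(devices, timeout=max_hold_seconds)   # 2. create the PIT
    finally:
        release_io(devices)                # 3. release I/O even on failure
```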




Changes will occur on the production volume after the creation of a PIT, and changes could also
occur on the target. Typically the target device will be re-synchronized with the source device at
some future time in order to obtain a more recent PIT.
Note: The replication technology employed should have a mechanism to keep track of changes.
This makes the re-synchronization process much faster. If the replication technology does not
track changes between the source and target, every re-synchronization operation will have to
be a full operation.

Replication technologies can be classified by:

            Ÿ   Distance over which replication is performed - local or remote
            Ÿ   Where the replication is performed - host or array based
                     − Host based - all the replication is performed by using the CPU resources
                           of the host using software that is running on the host.
                     − Array based - all replication is performed on the storage array using CPU
                           resources on the array via the array’s operating environment.
Note: In the context of this discussion, local replication refers to replication that is performed
within a data center if it is host based and within a storage array if it is array based.

    Ÿ   Host based
            – Logical Volume Manager (LVM) based mirroring
            – File System Snapshots
    Ÿ   Storage Array based
            – Full volume mirroring
            – Full volume: Copy on First Access
            – Pointer based: Copy on First Write

    Ÿ   Host resident software responsible for creating and controlling host level logical storage
           – Physical view of storage is converted to a logical view by mapping. Logical data
                blocks are mapped to physical data blocks.
           – Logical layer resides between the physical layer (physical devices and device
                drivers) and the application layer (OS and applications see logical view of
                storage).
    Ÿ   Usually offered as part of the operating system or as third party host software
    Ÿ   LVM Components:
           – Physical Volumes
           – Volume Groups
           – Logical Volumes

A Volume Group is created by grouping together one or more Physical Volumes. Physical
Volumes:
              Ÿ Can be added or removed from a Volume Group dynamically.
               Ÿ Cannot be shared between Volume Groups; the entire Physical Volume becomes
                  part of a Volume Group.
Each Physical Volume is partitioned into equal-sized data blocks. The size of a Logical Volume is
based on a multiple of the equal-sized data block.
The Volume Group is handled as a single unit by the LVM.
              Ÿ A Volume Group as a whole can be activated or deactivated.
              Ÿ A Volume Group would typically contain related information. For example, each
                 host would have a Volume Group which holds all the OS data, while applications
                 would be on separate Volume Groups.
Logical Volumes are created within a given Volume Group. A Logical Volume can be thought of as
a virtual disk partition, while the Volume Group itself can be thought of as a disk. A Volume Group
can have a number of Logical Volumes.
Logical Volumes (LV) form the basis of logical storage. They contain logically contiguous data
blocks (or logical partitions) within the volume group. Each logical partition is mapped to at least
one physical partition on a physical volume within the Volume Group. The OS treats an LV like a
physical device and accesses it via device special files (character or block). A Logical Volume:
             Ÿ Can only belong to one Volume Group. However, a Volume Group can have
                 multiple LVs.
             Ÿ Can span multiple physical volumes.
             Ÿ Can be made up of physical disk blocks that are not physically contiguous.
             Ÿ Appears as a series of contiguous data blocks to the OS.
             Ÿ Can contain a file system or be used directly. Note: There is a one-to-one
                 relationship between LV and a File System.
Note: Under normal circumstances there is a one-to-one mapping between a logical and physical
Partition. A one-to-many mapping between a logical and physical partition leads to mirroring of
Logical Volumes.
    Ÿ   LVM based replicas add overhead on host CPUs
    Ÿ   If host devices are already Storage Array devices then the added redundancy provided
        by LVM mirroring is unnecessary
            – The devices will have some RAID protection already
    Ÿ   Host based replicas can usually be presented back to the same server
    Ÿ   Keeping track of changes after the replica has been created

With storage array based local replication:

            Ÿ   Replication performed by the Array Operating Environment
                    − Array CPU resources are used for the replication operations
                    − Host CPU resources can be devoted to production operations instead of
                         replication operations
            Ÿ   Replicas are on the same array
                    − Can be accessed by an alternate host for any BC operations
            Ÿ   Typically array based replication is performed at an array device level.
                    − Need to map the storage components used by an application back to the
                         specific array devices used – then replicate those devices on the array.
                    − A database could be laid out over multiple physical volumes; one would
                         have to replicate all of those devices for a PIT copy of the database.




    Ÿ   For future re-synchronization to be incremental, most vendors have the ability to track
        changes at some level of granularity (e.g., 512 byte block, 32 KB, etc.)
            – Tracking is typically done with some kind of bitmap (a sketch follows this list)
    Ÿ   Target device must be at least as large as the Source device
            – For full volume copies the minimum amount of storage required is the same as
                the size of the source
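A sketch of bitmap change tracking for incremental re-synchronization; the 32 KB extent size is
one of the example granularities above, and the class and method names are illustrative.

```python
# Sketch of bitmap change tracking: one bit per fixed-size extent; only
# extents whose bit is set are copied on the next re-synchronization.

EXTENT_SIZE = 32 * 1024          # tracking granularity (illustrative)

class ChangeTracker:
    def __init__(self, device_size_bytes):
        n_extents = (device_size_bytes + EXTENT_SIZE - 1) // EXTENT_SIZE
        self.bitmap = [False] * n_extents

    def record_write(self, offset, length):
        first = offset // EXTENT_SIZE
        last = (offset + length - 1) // EXTENT_SIZE
        for i in range(first, last + 1):
            self.bitmap[i] = True

    def extents_to_resync(self):
        return [i for i, dirty in enumerate(self.bitmap) if dirty]

tracker = ChangeTracker(device_size_bytes=1 * 1024**3)   # 1 GiB source device
tracker.record_write(offset=100 * 1024, length=8 * 1024)
print(tracker.extents_to_resync())   # only the touched extents are recopied
```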

Copy on First Access (COFA) provides an alternate method to create full volume copies. Unlike
Full Volume mirrors, the replica is immediately available when the session is started (no waiting
for full synchronization).
             Ÿ The PIT is determined by the time of activation of the session. Just like the full
                 volume mirror technology this method requires the Target devices to be at least
                 as large as the source devices.
             Ÿ A protection map is created for all the data on the Source device at some level of
                 granularity (e.g., 512 byte block, 32 KB, etc.). Then the data is copied from the
                 source to the target in the background based on the mode with which the
                 replication session was invoked.




In the Copy on First Access mode (or the deferred mode), data is copied from the source to the
target only when:
     Ÿ    A write is issued for the first time after the PIT to a specific address on the source
     Ÿ    A read or write is issued for the first time after the PIT to a specific address on the
          target
Since data is only copied when required, if the replication session is terminated the target device
will only have data that was copied (not the entire contents of the source at the PIT). In this
scenario, the data on the Target cannot be used as it is incomplete.

    Ÿ   Targets do not hold actual data, but hold pointers to where the data is located
            – Actual storage requirement for the replicas is usually a small fraction of the size
                 of the source volumes
    Ÿ   A replication session is set up between the Source and Target devices and started
            –   When the session is set up, based on the specific vendor’s implementation, a
                protection map is created for all the data on the Source device at some level of
                granularity (e.g., 512 byte block, 32 KB, etc.)
            –   Target devices are accessible immediately when the session is started
            –   At the start of the session the Target device holds pointers to the data on the
                Source device




The original data block from the Source is copied to the save location when a data block is first
written to after the PIT.
             Ÿ Prior to a new write to the source or target device:
                       − Data is copied from the source to a “save” location
                       − The pointer for that specific address on the Target then points to the
                           “save” location
                       − Writes to the Target result in writes to the “save” location and the
                           updating of the pointer to the “save” location
             Ÿ If a write is issued to the source for the first time after the PIT the original data
                  block is copied to the save location and the pointer is updated from the Source
                  to the save location.
             Ÿ If a write is issued to the Target for the first time after the PIT the original data
                  is copied from the Source to the Save location, the pointer is updated and then
                  the new data is written to the save location.
             Ÿ Reads from the Target are serviced by the Source device or from the save
                  location, based on where the pointer directs the read.
                       − Source – When data has not changed since PIT
                       − Save Location – When data has changed since PIT
Data on the replica is a combined view of unchanged data on the Source and the save location.
Hence if the Source device becomes unavailable the replica will no longer have valid data.
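A minimal sketch of the Copy on First Write bookkeeping described above: the first write to a
block after the PIT copies the original block to the save location and repoints the replica
there. Names and structures are illustrative, not a vendor implementation.

```python
# Minimal Copy on First Write (CoFW) sketch: the target holds pointers only;
# the first write to a source block preserves the original in a save location.

class CoFWSession:
    def __init__(self, source):
        self.source = source            # block number -> current data
        self.save_location = {}         # block number -> original (PIT) data
        # target pointer: block -> "source" (unchanged) or "save" (changed)
        self.pointer = {blk: "source" for blk in source}

    def write_source(self, blk, new_data):
        if self.pointer[blk] == "source":          # first write after the PIT
            self.save_location[blk] = self.source[blk]
            self.pointer[blk] = "save"
        self.source[blk] = new_data

    def read_replica(self, blk):
        # reads are serviced from the source (unchanged) or the save location
        return self.save_location[blk] if self.pointer[blk] == "save" else self.source[blk]

session = CoFWSession({0: "A", 1: "B"})
session.write_source(0, "A2")
print(session.read_replica(0), session.read_replica(1))   # A B  (the PIT image)
```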
Most array based replication technologies will allow the creation of Consistent replicas by holding
IO to all devices simultaneously when the PIT is created.
             Ÿ Typically applications are spread out over multiple devices
                     − Could be on the same array or multiple arrays
             Ÿ Replication technology must ensure that the PIT for the whole application is
                 consistent
                     − Need mechanism to ensure that updates do not occur while PIT is
                         created
             Ÿ Hold IO to all devices simultaneously for an instant, create PIT and release IO
                     − Cannot hold IO for too long, application will timeout

Mechanisms to hold IO
            Ÿ Host based
                     − Some host based application could be used to hold IO to all the array
                         devices that are to be replicated when the PIT is created
                     − Typically achieved at the device driver level or above before the IO
                         reaches the HBAs
                             Ø Some vendors implement this at the multi-pathing software layer
            Ÿ Array based
                     − IOs can be held for all the array devices that are to be replicated by the
                         Array Operating Environment in the array itself when the PIT is created
What if the application straddles multiple hosts and multiple arrays?
            Ÿ Federated Databases
            Ÿ Some array vendors are able to ensure consistency in this situation

Array Replicas: Restore/Restart Considerations

    Ÿ   Production has a failure
            – Logical Corruption
            – Physical failure of production devices
            – Failure of Production server
    Ÿ   Solution
            –   Restore data from replica to production
                    Ø The restore would typically be done in an incremental manner and the
                        applications would be restarted even before the synchronization is
                        complete, leading to a very small RTO
            ----- OR -----
            –   Start production on replica
                    Ø Resolve issues with production while continuing operations on replicas
                    Ø After issue resolution, restore the latest data on the replica to production

   –   Before a Restore
           – Stop all access to the Production devices and the Replica devices
           – Identify Replica to be used for restore
                    Ø Based on RPO and Data Consistency
           – Perform Restore
   –   Before starting production on Replica
           – Stop all access to the Production devices and the Replica devices
           – Identify Replica to be used for restart
                    Ø Based on RPO and Data Consistency
           – Create a “Gold” copy of Replica
                    Ø As a precaution against further failures
           – Start production on Replica
   –   RTO drives choice of replication technology

   Ÿ   Full Volume Replicas
            – Restores can be performed to either the original source device or to any other
                device of like size
                    Ø Restores to the original source could be incremental in nature
                    Ø Restore to a new device would involve a full synchronization
   Ÿ   Pointer Based Replicas
            – Restores can be performed to the original source or to any other device of like
                size as long as the original source device is healthy
                    Ø Target only has pointers
                             v Pointers to source for data that has not been written to after PIT
                             v Pointers to the “save” location for data that was written after PIT
                    Ø Thus to perform a restore to an alternate volume the source must be
                         healthy to access data that has not yet been copied over to the target

   Ÿ   Full Volume Replica
            – Replica is a full physical copy of the source device
            – Storage requirement is identical to the source device
            – Restore does not require a healthy source device
            – Activity on replica will have no performance impact on the source device
            – Good for full backup, decision support, development, testing and restore to last
                PIT
            – RPO depends on when the last PIT was created
            – RTO is extremely small

   Ÿ   Pointer based - COFW
           – Replica contains pointers to data
                   Ø Storage requirement is a fraction of the source device (lower cost)
           – Restore requires a healthy source device
           – Activity on replica will have some performance impact on source
                    Ø Any first write to the source or target will require data to be copied to
                         the save location and the pointer moved to the save location
                    Ø Any read IO to data not in the save location will have to be serviced by
                         the source device
            –   Typically recommended if the changes to the source are less than 30%
            –   RPO depends on when the last PIT was created
            –   RTO is extremely small

    Ÿ   Full Volume – COFA Replicas
             – Replica only has data that was accessed
             – Restore requires a healthy source device
             – Activity on replica will have some performance impact
                     Ø Any first access on target will require data to be copied to target before
                         the I/O to/from target can be satisfied
             – Typically replicas created with COFA only are not as useful as replicas created
                 with the full copy mode – the recommendation would be to use the full copy mode
                 if the technology allows such an option




EMC – Local Replication Solutions

All the local replication solutions that were discussed in this module are available on EMC
Symmetrix and CLARiiON arrays.
              Ÿ EMC TimeFinder/Mirror and EMC TimeFinder/Clone are full volume replication
                   solutions on the Symmetrix arrays, while EMC TimeFinder/Snap is a pointer
                   based replication solution on the Symmetrix. EMC SnapView on the CLARiiON
                   arrays allows full volume replication via SnapView Clone and pointer based
                   replication via SnapView Snapshot.
              Ÿ EMC TimeFinder/Mirror: Highly available, ultra-performance mirror images of
                   Symmetrix volumes that can be non-disruptively split off and used as point-in-
                   time copies for backups, restores, decision support, or contingency uses.
              Ÿ EMC TimeFinder/Clone: Highly functional, high-performance, full volume copies
                   of Symmetrix volumes that can be used as point-in-time copies for data
                   warehouse refreshes, backups, online restores, and volume migrations.
              Ÿ EMC SnapView Clone: Highly functional, high-performance, full volume copies of
                   CLARiiON volumes that can be used as point-in-time copies for data warehouse
                   refreshes, backups, online restores, and volume migrations.
             Ÿ EMC TimeFinder/Snap: High function, space-saving, pointer-based copies (logical
                images) of Symmetrix volumes that can be used for fast and efficient disk-based
                restores.
            Ÿ EMC SnapView Snapshot: High function, space-saving, pointer-based copies
                (logical images) of CLARiiON volumes that can be used for fast and efficient disk-
                based restores.
We will discuss EMC TimeFinder/Mirror and EMC SnapView Snapshot in more detail in the next
few slides.

    Ÿ   Array based local replication technology for Full Volume Mirroring on EMC Symmetrix
        Storage Arrays
            – Create Full Volume Mirrors of an EMC Symmetrix device within an Array
    Ÿ   TimeFinder/Mirror uses special Symmetrix devices called Business Continuance Volumes
        (BCV). BCVs:
            – Are devices dedicated for Local Replication
            – Can be dynamically, non-disruptively established with a Standard device. They
                can be subsequently split instantly to create a PIT copy of data.
    Ÿ   The PIT copy of data can be used in a number of ways:
            – Instant restore – Use BCVs as standby data for recovery
            – Decision Support operations
            – Backup – Reduce application downtime to a minimum (offline backup)
            – Testing
    Ÿ   TimeFinder/Mirror is available in both Open Systems and Mainframe environments

    Ÿ   Establish
           – Synchronize the Standard volume to the BCV volume
           – BCV is set to a Not Ready state when established
                  Ø BCV cannot be independently addressed
           – Re-synchronization is incremental
           – BCVs cannot be established to other BCVs
           – Establish operation is non-disruptive to the Standard device
           – Operations to the Standard can proceed as normal during the establish

    Ÿ   Split
            –   Time of Split is the Point-in-Time
            –   BCV is made accessible for BC Operations
             –   Consistency across related devices can be ensured
                    Ø Consistent Split option
             –   Changes to the Standard and the BCV are tracked after the split

The TimeFinder/Mirror Consistent Split option ensures that the data on the BCVs is consistent
with the data on the Standard devices. Consistent Split holds I/O across a group of devices using
a single Consistent Split command, so all the BCVs in the group are consistent point-in-time
copies. It is used to create a consistent point-in-time copy of an entire system, an entire
database, or any associated set of volumes.
The holding of I/O can be done either by the EMC PowerPath multi-pathing software or by the
Symmetrix Microcode (Enginuity Consistency Assist). A PowerPath-based consistent split is
executed by the host doing the I/O; I/O is held at the host before the split.
An Enginuity Consistency Assist (ECA) based consistent split can be executed by the host doing
the I/O or by a control host in an environment with distributed and/or related databases. I/O is
held at the Symmetrix until the split operation is completed. Since I/O is held at the Symmetrix,
ECA can be used to perform consistent splits on BCV pairs across multiple, heterogeneous hosts.
Ÿ   Restore
           – Synchronize contents of BCV volume to the Standard volume
           – Restore can be full or incremental
           – BCV is set to a Not Ready state
           – I/Os to the Standard and BCVs should be stopped before the restore is initiated
    Ÿ   Query
           – Provide current status of BCV/Standard volume pairs
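
Taken together, Establish, Split, Restore and Query amount to a small state machine on a
Standard/BCV pair. The sketch below models those states as described in this module; the state
names and methods are illustrative simplifications, not Solutions Enabler syntax or behaviour:

    # Simplified state model of a TimeFinder/Mirror Standard/BCV pair, following
    # the operations described above (Establish, Split, Restore, Query).
    class StdBcvPair:
        def __init__(self):
            self.state = "never_established"
            self.bcv_accessible = False

        def establish(self, full=False):
            # Synchronize Standard -> BCV; BCV becomes Not Ready (not addressable).
            # After a first full establish, re-establish is incremental.
            sync = "full" if full or self.state == "never_established" else "incremental"
            self.state = "established"
            self.bcv_accessible = False
            return f"{sync} synchronization Standard -> BCV"

        def split(self):
            # Time of the split is the point in time; the BCV becomes host-accessible
            # and changes to Standard and BCV are tracked from this point. A consistent
            # split would first hold I/O across the whole device group.
            self.state = "split"
            self.bcv_accessible = True

        def restore(self, incremental=True):
            # Synchronize BCV -> Standard; BCV is set Not Ready again.
            # Host I/O to both devices should be stopped before initiating this.
            self.state = "restored"
            self.bcv_accessible = False
            return "incremental" if incremental else "full"

        def query(self):
            return {"state": self.state, "bcv_accessible": self.bcv_accessible}

    pair = StdBcvPair()
    pair.establish(full=True)
    pair.split()            # PIT copy now usable for backup / decision support
    print(pair.query())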




TimeFinder/Mirror allows a given Standard device to maintain incremental relationships with
multiple BCVs.
This means that different BCVs can be established and then split incrementally from a standard
volume at different times of the day. For example, a BCV that was split at 4:00 a.m. can be re-
established incrementally even though another BCV was established and split at 5:00 a.m. In this
way, a user can split and incrementally re-establish volumes throughout the day or night and still
keep re-establish times to a minimum.
Incremental information can be retained between a STD device and multiple BCV devices,
provided the BCV devices have not been paired with different STD devices.
The incremental relationship is maintained between each STD/BCV pairing by the Symmetrix
Microcode.

    Ÿ   Two BCVs can be established concurrently with the same Standard device
    Ÿ   Establish BCVs simultaneously or one after the other
    Ÿ   BCVs can be split individually or simultaneously
    Ÿ   Simultaneous restores (“Concurrent Restores”) are not allowed
SnapView is software that runs on the CLARiiON Storage Processors, and is part of the CLARiiON
Replication Software suite of products, which includes SnapView, MirrorView and SAN Copy.
SnapView can be used to make point-in-time (PIT) copies in two different ways – Clones, also
called BCVs or Business Continuance Volumes, are full copies, whereas Snapshots use a pointer-
based mechanism. Full-volume copies were covered earlier with Symmetrix TimeFinder; SnapView
Snapshots are covered here.
The generic pointer-based mechanism has been discussed in a previous section, so we’ll
concentrate on SnapView here.
Snapshots require a save area, called the Reserved LUN Pool. The ‘Reserved’ part of the name
implies that the LUNs are reserved for use by CLARiiON software, and can therefore not be
assigned to a host. LUNs which cannot be assigned to a host are known as private LUNs in the
CLARiiON environment.
To keep the number of pointers, and therefore the pointer map, at a reasonable size, SnapView
divides the LUN to be snapped, called a Source LUN, into areas of 64 kB in size. Each of these
areas is known as a chunk. Any change to data inside a chunk will cause that chunk to be written
to the Reserved LUN Pool, if it is being modified for the first time. The 64 kB copied from the
Source LUN must fit into a 64 kB area in the Reserved LUN, so Reserved LUNs are also divided
into chunks for tracking purposes.
The next 2 slides show more detail on the Reserved LUN Pool, and allocation of Reserved LUNs
to a Source LUN.
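
The 64 kB chunk bookkeeping reduces to simple arithmetic, as sketched below. The function and
variable names are hypothetical; only the chunk size comes from the text above:

    # SnapView-style chunk bookkeeping sketch (illustrative only).
    CHUNK_SIZE = 64 * 1024          # 64 kB granularity used for CoFW tracking

    def chunk_of(offset_bytes):
        """Map a byte offset on the Source LUN to its chunk number."""
        return offset_bytes // CHUNK_SIZE

    copied_chunks = set()           # chunks already preserved in the Reserved LUN Pool

    def on_source_write(offset_bytes, length):
        """Return the chunks that must be copied to the pool before this write proceeds."""
        first = chunk_of(offset_bytes)
        last = chunk_of(offset_bytes + length - 1)
        to_copy = [c for c in range(first, last + 1) if c not in copied_chunks]
        copied_chunks.update(to_copy)
        return to_copy              # these 64 kB chunks are written to the pool first

    print(on_source_write(130_000, 10))   # offsets 130000..130009 fall in chunk 1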




The CLARiiON storage system must be configured with a Reserved LUN Pool in order to use
SnapView Snapshot features. The Reserved LUN Pool consists of 2 parts: LUNs for use by SPA
and LUNs for use by SPB. Each of those parts is made up of one or more Reserved LUNs. The
LUNs used are bound in the normal manner. However, they are not placed in storage groups and
allocated to hosts; they are used internally by the storage system software. These are known as
private LUNs because they cannot be used, or seen, by attached hosts.
Like any LUN, a Reserved LUN is owned by only one SP at any time and may be trespassed if the
need arises (i.e., if an SP should fail).
Just as each storage system model has a maximum number of LUNs it will support, each also has
a maximum number of LUNs which may be added to the Reserved LUN Pool.
The first step in SnapView configuration will usually be the assignment of LUNs to the Reserved
LUN Pool. Only then will SnapView Sessions be allowed to start. Remember that as snappable
LUNs are added to the storage system, the LUN Pool size will have to be reviewed. Changes may
be made online.
LUNs used in the Reserved LUN Pool are not host-visible, though they do count towards the
maximum number of LUNs allowed on a storage system.

Remote Replication Concepts

   Ÿ   Replica is available at a remote facility
           – Could be a few miles away or halfway around the world
          – Backup and Vaulting are not considered remote replication
   Ÿ   Synchronous Replication
          – Replica is identical to source at all times – Zero RPO
   Ÿ   Asynchronous Replication
          – Replica is behind the source by a finite margin – Small RPO
   Ÿ   Connectivity
          – Network infrastructure over which data is transported from source site to remote
                site

Synchronous Replication

   Ÿ   A write has to be secured on the remote replica and the source before it is acknowledged
       to the host
   Ÿ   Ensures that the source and remote replica have identical data at all times
           – Write ordering is maintained at all times
                   Ø Replica receives writes in exactly the same order as the source
   Ÿ   Synchronous replication provides the lowest RPO and RTO
           – Goal is zero RPO
           – RTO is as small as the time it takes to start application on the remote site

   Ÿ   Response Time Extension
           – Application response time will be extended due to synchronous replication
                   Ø Data must be transmitted to remote site before write can be
                         acknowledged
                   Ø Time to transmit will depend on distance and bandwidth
   Ÿ   Bandwidth
            – To minimize impact on response time, sufficient bandwidth must be provided at
                all times
   Ÿ   Rarely deployed beyond 200 km
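
A synchronous write path, as described above, can be sketched as follows; the delay value is an
arbitrary assumption used to show where the response-time extension comes from:

    # Synchronous remote replication sketch: the host acknowledgement waits for
    # both the source array and the remote replica to secure the write.
    import time

    def synchronous_write(source, replica, block, data, one_way_delay_s=0.002):
        source[block] = data                    # write secured on the source array
        time.sleep(one_way_delay_s)             # transmit to the remote site
        replica[block] = data                   # write secured on the remote replica
        time.sleep(one_way_delay_s)             # remote acknowledgement returns
        return "ack to host"                    # only now is the host acknowledged

    src, rep = {}, {}
    start = time.perf_counter()
    synchronous_write(src, rep, 0, b"payload")
    print(f"host-visible latency ~{time.perf_counter() - start:.4f}s")  # grows with distance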

Asynchronous Replication

   Ÿ   Write is acknowledged to host as soon as it is received by the source
   Ÿ   Data is buffered and sent to remote
            – Some vendors maintain write ordering
            – Other vendors do not maintain write ordering, but ensure that the replica will
                always be a consistent re-startable image
   Ÿ   Finite RPO
            – Replica will be behind the Source by a finite amount
            – Typically configurable
   Ÿ   Response Time unaffected
   Ÿ   Bandwidth
            – Need sufficient bandwidth on average
   Ÿ   Buffers
             – Need sufficient buffers
   Ÿ   Can be deployed over long distances
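
By contrast, an asynchronous write is acknowledged as soon as the source receives it, and a
buffer is drained to the remote replica in the background. A minimal sketch, with the buffering
and draining deliberately simplified:

    # Asynchronous remote replication sketch: acknowledge immediately, buffer the
    # write, and let a background drain apply it to the remote replica later.
    from collections import deque

    class AsyncReplicator:
        def __init__(self):
            self.source, self.replica = {}, {}
            self.buffer = deque()               # pending writes (the finite RPO lives here)

        def write(self, block, data):
            self.source[block] = data           # secured locally
            self.buffer.append((block, data))   # queued for the remote site
            return "ack to host"                # response time unaffected by distance

        def drain_once(self):
            # In a real product this runs continuously, subject to link bandwidth.
            if self.buffer:
                block, data = self.buffer.popleft()   # FIFO preserves write ordering
                self.replica[block] = data

    r = AsyncReplicator()
    r.write(0, "v1")
    print(len(r.buffer))    # 1 -> the replica is behind the source by this much (RPO)
    r.drain_once()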

Remote Replication Technologies

   Ÿ   Host based
           – Logical Volume Manager (LVM)
                   Ø Synchronous/Asynchronous
           – Log Shipping
   Ÿ   Storage Array based
           – Synchronous
           – Asynchronous
           – Disk Buffered - Consistent PITs
                   Ø Combination of Local and Remote Replication

   Ÿ   Duplicate Volume Groups at local and remote sites
   Ÿ   All writes to the source Volume Group are replicated to the remote Volume Group by the
       LVM
            – Synchronous or Asynchronous




LVM Based Remote Replication

   Ÿ   In the event of a network failure
           –   Writes are queued in the log file
           –   When the issue is resolved the queued writes are sent over to the remote
           –   The maximum size of the log file determines the length of outage that can be
               withstood
   Ÿ   In the event of a failure at the source site, production operations can be transferred to
       the remote site
   Ÿ   Advantages
           – Different storage arrays and RAID protection can be used at the source and
               remote sites
           – Standard IP network can be used for replication
            – Response time impact can be eliminated with asynchronous mode, at the cost of
                an extended RPO
    Ÿ   Disadvantages
            – Extended network outages require large log files
            – CPU overhead on host
                    Ø For maintaining and shipping log files
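
The queue-and-drain behaviour during a network outage can be sketched as follows; the bounded
log models the statement that the maximum log size determines the length of outage that can be
withstood (names and sizes are illustrative):

    # LVM-style remote replication sketch with a bounded log for network outages.
    class LvmRemoteMirror:
        def __init__(self, max_log_entries=1000):
            self.remote = {}
            self.log = []                        # queued writes while the link is down
            self.max_log_entries = max_log_entries
            self.link_up = True

        def write(self, block, data):
            if self.link_up:
                self.remote[block] = data        # replicated by the LVM (sync or async)
            elif len(self.log) < self.max_log_entries:
                self.log.append((block, data))   # queue until the link recovers
            else:
                raise RuntimeError("log full: outage longer than the design allows")

        def link_restored(self):
            self.link_up = True
            for block, data in self.log:         # send queued writes to the remote VG
                self.remote[block] = data
            self.log.clear()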
Host Based Log Shipping

Log Shipping is a host based replication technology for databases offered by most DB Vendors
            Ÿ Initial State – All the relevant storage components that make up the database are
                 replicated to a standby server (over IP or other means) while the database
                 is shut down
           Ÿ Database is started on the production server – As and when log switches occur
                the log file that was closed is sent over IP to the standby server
           Ÿ Database is started in standby mode on the standby server, as and when log
                files arrive they are applied to the standby database
           Ÿ Standby database is consistent up to the last log file that was applied
Advantages
           Ÿ Minimal CPU overhead on production server
           Ÿ Low bandwidth (IP) requirement
           Ÿ Standby Database consistent to last applied log
                     − RPO can be reduced by controlling log switching
Disadvantages
           Ÿ Need host based mechanism on production server to periodically ship logs
           Ÿ Need host based mechanism on standby server to periodically apply logs and
                check for consistency
           Ÿ IP network outage could lead to standby database falling further behind
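
The log shipping flow (ship each closed log file, apply it on the standby) can be sketched as
below; transport, log switching, and consistency checking are deliberately simplified:

    # Database log shipping sketch: closed logs are shipped (e.g. over IP) and
    # applied to a standby database that stays consistent to the last applied log.
    class StandbyDatabase:
        def __init__(self, initial_copy):
            self.data = dict(initial_copy)       # replicated while production was shut down
            self.last_applied_log = None

        def apply_log(self, log_id, changes):
            self.data.update(changes)            # redo the changes from the shipped log
            self.last_applied_log = log_id       # RPO = logs generated but not yet applied

    shipped = []                                 # transport queue (IP link)
    standby = StandbyDatabase({"acct_1": 100})

    def on_log_switch(log_id, changes):
        shipped.append((log_id, changes))        # production side: ship the closed log

    on_log_switch(41, {"acct_1": 120})
    for log_id, changes in shipped:              # standby side: apply logs as they arrive
        standby.apply_log(log_id, changes)
    print(standby.last_applied_log)              # 41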

Array Based – Remote Replication

Replication Process
            Ÿ A Write is initiated by an application/server
            Ÿ Received by the source array
            Ÿ Source array transmits the write to the remote array via dedicated channels
                (ESCON, Fibre Channel or Gigabit Ethernet) over a dedicated or shared network
                infrastructure
            Ÿ Write received by the remote array
             Ÿ Only writes are forwarded to the remote array
            Ÿ Reads are from the source devices

Array Based – Synchronous Replication
Array Based – Asynchronous Replication




    Ÿ   No impact on response time
    Ÿ   Extended distances between arrays
    Ÿ   Lower bandwidth as compared to Synchronous

    Ÿ   Ensuring Consistency
           – Maintain write ordering
                   Ø Some vendors attach a time stamp and sequence number with each of
                        the writes, then ship the writes to the remote array and apply the writes
                        to the remote devices in the exact order based on the time stamp and
                        sequence numbers
                   Ø Remote array applies the writes in the exact order they were received,
                        just like synchronous
           – Dependent write consistency
                   Ø Some vendors buffer the writes in the cache of the source array for a
                        period of time (between 5 and 30 seconds)
                   Ø At the end of this time the current buffer is closed in a consistent
                        manner and the buffer is switched, new writes are received in the new
                        buffer
                   Ø The closed buffer is then transmitted to the remote array
                    Ø Remote replica will contain a consistent, re-startable image of the
                         application
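
The buffer-and-switch approach to dependent-write consistency can be sketched as follows; the
cycle length and data structures are illustrative assumptions, not a specific product's delta set
implementation:

    # Buffered asynchronous consistency sketch: writes collect in an open buffer;
    # every cycle the buffer is closed consistently, switched, and the closed
    # buffer is transmitted and applied as a unit at the remote array.
    class DeltaSetReplicator:
        def __init__(self, cycle_seconds=30):
            self.cycle_seconds = cycle_seconds   # e.g. between 5 and 30 seconds
            self.open_buffer = {}                # receives new writes
            self.in_transit = None               # closed buffer being transmitted
            self.replica = {}

        def write(self, block, data):
            self.open_buffer[block] = data       # dependent writes land within one cycle

        def cycle_switch(self):
            # Close the current buffer consistently and start receiving into a new one.
            self.in_transit, self.open_buffer = self.open_buffer, {}

        def remote_apply(self):
            # The whole closed buffer is applied, so the replica moves from one
            # consistent, re-startable image to the next.
            if self.in_transit is not None:
                self.replica.update(self.in_transit)
                self.in_transit = None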

Disk buffered consistent PITs are a combination of Local and Remote replication technologies.
The idea is to make a Local PIT replica and then create a Remote replica of the Local PIT. The
advantage of disk buffered PITs is lower bandwidth requirements and the ability to replicate over
extended distances. Disk buffered replication is typically used when the RPO requirements are of
the order of hours or so, thus a lower bandwidth network can be used to transfer data from the
Local PIT copy to the remote site. The data transfer may take a while, but the solution would be
designed to meet the RPO.
We will take a look at two disk buffered PIT solutions.

Disk buffered replication allows for the incremental resynchronization between a Local Replica
which acts as a source for a Remote Replica.
Benefits include:
             Ÿ Reduction in communication link cost and improved resynchronization time for
                 long-distance replication implementations
             Ÿ The ability to use the various replicas to provide disaster recovery testing, point-
                 in-time backups, decision support operations, third-party software testing, and
                 application upgrade testing or the testing of new applications.




Synchronous + Extended Distance Consistent PIT




Synchronous + Extended Distance Buffered Replication benefits include:
          Ÿ Bunker site provides a zero RPO DR Replica
          Ÿ The ability to resynchronize only changed data between the intermediate Bunker
              site and the final target site, reducing required network bandwidth
          Ÿ Reduction in communication link cost and improved resynchronization time for
              long-distance replication implementations
          Ÿ The ability to use the replicas to provide disaster recovery testing, point-in-time
              backups, decision support operations, third-party software testing, and
              application upgrade testing or the testing of new applications.

Remote Replicas – Tracking Changes

    Ÿ   Remote replicas can be used for BC Operations
            – Typically remote replication operations will be suspended when the remote
                replicas are used for BC Operations
    Ÿ   During BC Operations, changes can occur on both the source and the remote replicas
            – Most remote replication technologies have the ability to track changes made to
                the source and remote replicas to allow for incremental re-synchronization
            – Resuming remote replication operations will require re-synchronization between
                the source and replica
Primary Site Failure – Operations at Remote Site

   Ÿ   Remote replicas are typically not available for use while the replication session is in
       progress
   Ÿ   In the event of a primary site failure the replicas have to be made accessible for use
   Ÿ   Create a local replica of the remote devices at the remote site
   Ÿ   Start operations at the Remote site
           – No remote protection while primary site issues are resolved
   Ÿ   After issue resolution at Primary Site
           – Stop activities at remote site
           – Restore latest data from remote devices to source
           – Resume operations at Primary (Source) Site

Array Based – Which Technology?

   Ÿ   Synchronous
           – Is a must if zero RPO is required
           – Need sufficient bandwidth at all times
           – Application response time elongation will prevent extended distance solutions
               (rarely above 125 miles)
   Ÿ   Asynchronous
           – Extended distance solutions with minimal RPO (order of minutes)
           – No Response time elongation
           – Generally requires lower Bandwidth than synchronous
           – Must design with adequate cache/buffer or sidefile/logfile capacity
   Ÿ   Disk Buffered Consistent PITs
           – Extended distance solution with RPO in the order of hours
           – Generally lower bandwidth than synchronous or asynchronous
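
The selection guidance above can be condensed into a small helper function; the thresholds are
the rule-of-thumb figures quoted in this module, not product limits:

    # Rule-of-thumb selector based on the guidance above (illustrative only).
    def choose_remote_replication(rpo_minutes, distance_km, bandwidth_guaranteed):
        if rpo_minutes == 0:
            # Zero RPO requires synchronous replication, sufficient bandwidth at
            # all times, and a short distance (rarely beyond ~200 km / 125 miles).
            if distance_km <= 200 and bandwidth_guaranteed:
                return "synchronous"
            return "zero RPO not achievable for this distance/bandwidth"
        if rpo_minutes <= 60:
            return "asynchronous"                # RPO of minutes, extended distance
        return "disk-buffered consistent PITs"   # RPO of hours, lowest bandwidth

    print(choose_remote_replication(0, 80, True))      # synchronous
    print(choose_remote_replication(15, 2000, True))   # asynchronous
    print(choose_remote_replication(240, 2000, False)) # disk-buffered consistent PITs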

Storage Array Based – Remote Replication

   Ÿ   Network Options
          – Most vendors support ESCON or Fibre Channel adapters for remote replication
                  Ø Can connect to any optical or IP networks with appropriate protocol
                      converters for extended distances
                         v DWDM
                         v SONET
                         v IP Networks
            – Some vendors have native Gigabit Ethernet adapters, which allow the array to
                be connected directly to IP networks without the need for protocol converters

Dense Wavelength Division Multiplexing (DWDM)

   Ÿ   DWDM is a technology that puts data from different sources together on an optical fiber
       with each signal carried on its own separate light wavelength (commonly referred to as a
       lambda or λ).
   Ÿ   Up to 32 protected and 64 unprotected separate wavelengths of data can be multiplexed
       into a light stream transmitted on a single optical fiber.
Synchronous Optical Network (SONET)

Synchronous Optical Network (SONET) is a standard for optical telecommunications transport
formulated by the Exchange Carriers Standards Association (ECSA) for the American National
Standards Institute (ANSI). The equivalent international standard is referred to as Synchronous
Digital Hierarchy and is defined by the European Telecommunications Standards Institute (ETSI).
Within Metropolitan Area Networks (MANs) today, SONET/SDH rings are used to carry both voice
and data traffic over fiber.

    Ÿ   SONET is Time Division Multiplexing (TDM) technology where traffic from multiple
        subscribers is multiplexed together and sent out onto the SONET ring as an optical signal
    Ÿ   Synchronous Digital Hierarchy (SDH) similar to SONET but is the European standard
    Ÿ   SONET/SDH offers the ability to service multiple locations, high reliability/availability,
        automatic protection switching, and restoration

EMC – Remote Replication Solutions

    Ÿ   EMC Symmetrix Arrays
           – EMC SRDF/Synchronous
           – EMC SRDF/Asynchronous
           – EMC SRDF/Automated Replication
    Ÿ   EMC CLARiiON Arrays
           – EMC MirrorView/Synchronous
           – EMC MirrorView/Asynchronous

All remote replication solutions that were discussed in this module are available on EMC
Symmetrix and CLARiiON Arrays.
The SRDF (Symmetrix Remote Data Facility) family of products provides Synchronous,
Asynchronous and Disk Buffered remote replication solutions on the EMC Symmetrix Arrays.
The MirrorView family of products provides Synchronous and Asynchronous remote replication
solutions on the EMC CLARiiON Arrays.
SRDF/Synchronous (SRDF/S): High-performance, host-independent, real-time synchronous
remote replication from one Symmetrix to one or more Symmetrix systems.
MirrorView/Synchronous (MirrorView/S): Host-independent, real-time synchronous remote
replication from one CLARiiON to one or more CLARiiON systems.
SRDF/Asynchronous (SRDF/A): High-performance extended distance asynchronous
replication for Symmetrix arrays using a Delta Set architecture for reduced bandwidth
requirements and no host performance impact. Ideal for Recovery Point Objectives of the order
of minutes.
MirrorView/Asynchronous (MirrorView/A): Asynchronous remote replication on CLARiiON
arrays. Designed with low-bandwidth requirements, delivers a cost-effective remote replication
solution ideal for Recovery Point Objectives (RPOs) of 30 minutes or greater.
SRDF/Automated Replication: Rapid business restart over any distance with no data
exposure through advanced single-hop and multi-hop configurations using combinations of
TimeFinder/Mirror and SRDF on Symmetrix Arrays.

EMC SRDF/Synchronous - Introduction

    Ÿ   Array based Synchronous Remote Replication technology for EMC Symmetrix Storage
        Arrays
            – Facility for maintaining real-time physically separate mirrors of selected volumes
    Ÿ   SRDF/Synchronous uses special Symmetrix devices
            – Source arrays have SRDF R1 devices
            – Target arrays have SRDF R2 devices
            – Data written to R1 devices is replicated to R2 devices
    Ÿ   SRDF uses dedicated channels to send data from source to target array
           – ESCON, Fibre Channel or Gigabit Ethernet are supported
    Ÿ   SRDF is available in both Open Systems and Mainframe environments

SRDF Source and Target Volumes

    Ÿ   SRDF R1 and R2 Volumes can have any local RAID Protection
           – E.g. Volumes could have RAID-1 or RAID-5 protection
    Ÿ   SRDF R2 volumes are in a Read Only state when remote replication is in effect
           – Changes cannot be made to the R2 volumes
    Ÿ   SRDF R2 volumes are accessed under certain circumstances
           – Failover – Invoked when the primary volumes become unavailable
           – Split – Invoked when the R2 volumes need to be concurrently accessed for BC
               operations

SRDF/Synchronous




SRDF Operations - Failover

Failover operations are performed if the SRDF R1 Volumes become unavailable and the decision
is made to start operations on the R2 Devices. Failover could also be performed when DR
processes are being tested or for any maintenance tasks that have to be performed at the source
site.

If failing over for a maintenance operation: to obtain a clean, consistent, coherent point-in-time
copy that can be used with minimal recovery on the target side, some or all of the following steps
may have to be taken on the source side:
             Ÿ     Stop all applications (database or otherwise)
             Ÿ     Unmount the file systems
             Ÿ     Deactivate the Volume Group
             Ÿ     A failover sets the source side to a Read Only (RO) state. If a device suddenly
                   changes from RW to RO while it is in use, the host's reaction can be
                   unpredictable. Hence the suggestion to stop applications, unmount file
                   systems, and deactivate Volume Groups.
SRDF Operations - Failback
SRDF Operations – Establish/Restore

   Ÿ   Establish - Resume SRDF operation retaining data from source and overwriting any
       changed data on target
   Ÿ   Restore - SRDF operation retaining data on target and overwriting any changed data on
       source
SRDF Operations - Split

   Ÿ   Enables read and write operations on both source and target volumes
   Ÿ   Suspends replication




EMC CLARiiON MirrorView/A Overview

   Ÿ   Optional storage system software for remote replication on EMC CLARiiON arrays
           – No host cycles used for data replication
   Ÿ   Provides a remote image for disaster recovery
           – Remote image updated periodically - asynchronously
           – Remote image cannot be accessed by hosts while replication is active
           – Snapshot of mirrored data can be host-accessible at remote site
   Ÿ   Mirror topology (connecting primary array to secondary arrays)
           – Direct connect and switched FC topology supported
           – WAN connectivity supported using specialized hardware
MirrorView Terms:
    Ÿ   Primary storage system
            – Holds the local (primary) image for a given mirror
    Ÿ   Secondary storage system
            – Holds the remote (secondary) image for a given mirror
   Ÿ   Bidirectional mirroring
           – A storage system can hold local and remote images
   Ÿ   Mirror Synchronization
           – Process that copies data from local image to remote image
   Ÿ   MirrorView Fractured state
           – Condition when a Secondary storage system is unreachable by the Primary
                storage system
MirrorView Configuration:
   Ÿ   MirrorView/A Setup
            – MirrorView/A software must be loaded on both the Primary and Secondary storage
                systems
           – Remote LUN must be exactly the same size as local LUN
           – Secondary LUN does not need to be the same RAID type as Primary
           – Reserved LUN Pool space must be configured
   Ÿ   Management via Navisphere Manager and CLI
Consistency Groups allow all LUNs belonging to a given application, usually a database, to be
treated as a single entity, and managed as a whole. This helps to ensure that the remote images
are consistent, i.e. all made at the same point in time. As a result, the remote images are always
restartable copies of the local images, though they may contain data which is not as new as that
on the primary images.
It is a requirement that all the local images of a Consistency Group be on the same CLARiiON,
and that all the remote images for a Consistency Group be on the same remote CLARiiON. All
information related to the Consistency Group will be sent to the remote CLARiiON from the local
CLARiiON.
The operations which can be performed on a Consistency Group match those which may be
performed on a single mirror, and will affect all mirrors in the Consistency Group. If, for some
reason, an operation cannot be performed on one or more mirrors in the Consistency Group,
then that operation will fail, and the images will be unchanged.

Más contenido relacionado

La actualidad más candente

What Is A BCP?
What Is A BCP?What Is A BCP?
What Is A BCP?wattersr
 
Field Services overview for Data Centers
Field Services overview for Data CentersField Services overview for Data Centers
Field Services overview for Data CentersAnton Svinenkov
 
BCM Roadmap
BCM RoadmapBCM Roadmap
BCM Roadmapbtrmuray
 
Create a right sized disaster recovery plan
Create a right sized disaster recovery planCreate a right sized disaster recovery plan
Create a right sized disaster recovery planInfo-Tech Research Group
 
Managing and Implementing a National BCM Programme: A World's First
Managing and Implementing a National BCM Programme: A World's FirstManaging and Implementing a National BCM Programme: A World's First
Managing and Implementing a National BCM Programme: A World's FirstBCM Institute
 
Joe Honan Virtualization Trends
Joe Honan   Virtualization TrendsJoe Honan   Virtualization Trends
Joe Honan Virtualization Trends1velocity
 
Reduce Costs Through Printer Consolidation
Reduce Costs Through Printer ConsolidationReduce Costs Through Printer Consolidation
Reduce Costs Through Printer ConsolidationAdrian Boucek
 
Improve HR System Capabilities SilverRoad Solutions
Improve HR System Capabilities   SilverRoad SolutionsImprove HR System Capabilities   SilverRoad Solutions
Improve HR System Capabilities SilverRoad SolutionsTom Sonde
 
Time to Talk Throughput Webinar
Time to Talk Throughput WebinarTime to Talk Throughput Webinar
Time to Talk Throughput Webinarabrowne44
 
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12peak10marketing
 
Dmg emc-avamar-optimized-backup-recovery-dedupe[1]
Dmg emc-avamar-optimized-backup-recovery-dedupe[1]Dmg emc-avamar-optimized-backup-recovery-dedupe[1]
Dmg emc-avamar-optimized-backup-recovery-dedupe[1]Nitesh Bhat
 
HR Best Practices for Improving HRMS / HRIS - Tom Sonde - SilverRoad Solutions
HR Best Practices for Improving HRMS / HRIS  -  Tom Sonde - SilverRoad SolutionsHR Best Practices for Improving HRMS / HRIS  -  Tom Sonde - SilverRoad Solutions
HR Best Practices for Improving HRMS / HRIS - Tom Sonde - SilverRoad SolutionsTom Sonde
 
HRIS Optimization: Methods to Improve your HR System Capabilities
HRIS Optimization:Methods to Improve your HR System Capabilities HRIS Optimization:Methods to Improve your HR System Capabilities
HRIS Optimization: Methods to Improve your HR System Capabilities Tom Sonde
 
What You Need to Know Before Implementing a New HR System Tom Sonde SilverR...
What You Need to Know Before Implementing a New HR System   Tom Sonde SilverR...What You Need to Know Before Implementing a New HR System   Tom Sonde SilverR...
What You Need to Know Before Implementing a New HR System Tom Sonde SilverR...Tom Sonde
 
Strain Sprain Injury Prevention Programs
Strain Sprain Injury Prevention ProgramsStrain Sprain Injury Prevention Programs
Strain Sprain Injury Prevention ProgramsDale Rhodes
 
Best practices in printer fleet management
Best practices in printer fleet managementBest practices in printer fleet management
Best practices in printer fleet managementEKM Global Marketing
 

La actualidad más candente (20)

What Is A BCP?
What Is A BCP?What Is A BCP?
What Is A BCP?
 
Field Services overview for Data Centers
Field Services overview for Data CentersField Services overview for Data Centers
Field Services overview for Data Centers
 
BCM Roadmap
BCM RoadmapBCM Roadmap
BCM Roadmap
 
Firstcomm construction of a DR plan
Firstcomm construction of a DR planFirstcomm construction of a DR plan
Firstcomm construction of a DR plan
 
Create a right sized disaster recovery plan
Create a right sized disaster recovery planCreate a right sized disaster recovery plan
Create a right sized disaster recovery plan
 
Vandana Yadav
Vandana YadavVandana Yadav
Vandana Yadav
 
Managing and Implementing a National BCM Programme: A World's First
Managing and Implementing a National BCM Programme: A World's FirstManaging and Implementing a National BCM Programme: A World's First
Managing and Implementing a National BCM Programme: A World's First
 
Om 5
Om 5Om 5
Om 5
 
Joe Honan Virtualization Trends
Joe Honan   Virtualization TrendsJoe Honan   Virtualization Trends
Joe Honan Virtualization Trends
 
Reduce Costs Through Printer Consolidation
Reduce Costs Through Printer ConsolidationReduce Costs Through Printer Consolidation
Reduce Costs Through Printer Consolidation
 
Improve HR System Capabilities SilverRoad Solutions
Improve HR System Capabilities   SilverRoad SolutionsImprove HR System Capabilities   SilverRoad Solutions
Improve HR System Capabilities SilverRoad Solutions
 
Raghavendra_TSM_2.2yr
Raghavendra_TSM_2.2yrRaghavendra_TSM_2.2yr
Raghavendra_TSM_2.2yr
 
Time to Talk Throughput Webinar
Time to Talk Throughput WebinarTime to Talk Throughput Webinar
Time to Talk Throughput Webinar
 
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12
 
Dmg emc-avamar-optimized-backup-recovery-dedupe[1]
Dmg emc-avamar-optimized-backup-recovery-dedupe[1]Dmg emc-avamar-optimized-backup-recovery-dedupe[1]
Dmg emc-avamar-optimized-backup-recovery-dedupe[1]
 
HR Best Practices for Improving HRMS / HRIS - Tom Sonde - SilverRoad Solutions
HR Best Practices for Improving HRMS / HRIS  -  Tom Sonde - SilverRoad SolutionsHR Best Practices for Improving HRMS / HRIS  -  Tom Sonde - SilverRoad Solutions
HR Best Practices for Improving HRMS / HRIS - Tom Sonde - SilverRoad Solutions
 
HRIS Optimization: Methods to Improve your HR System Capabilities
HRIS Optimization:Methods to Improve your HR System Capabilities HRIS Optimization:Methods to Improve your HR System Capabilities
HRIS Optimization: Methods to Improve your HR System Capabilities
 
What You Need to Know Before Implementing a New HR System Tom Sonde SilverR...
What You Need to Know Before Implementing a New HR System   Tom Sonde SilverR...What You Need to Know Before Implementing a New HR System   Tom Sonde SilverR...
What You Need to Know Before Implementing a New HR System Tom Sonde SilverR...
 
Strain Sprain Injury Prevention Programs
Strain Sprain Injury Prevention ProgramsStrain Sprain Injury Prevention Programs
Strain Sprain Injury Prevention Programs
 
Best practices in printer fleet management
Best practices in printer fleet managementBest practices in printer fleet management
Best practices in printer fleet management
 

Destacado

Srdf overview latency_v.5
Srdf overview latency_v.5Srdf overview latency_v.5
Srdf overview latency_v.5jas3399
 
White Paper on Disaster Recovery in Geographically dispersed cross site virtu...
White Paper on Disaster Recovery in Geographically dispersed cross site virtu...White Paper on Disaster Recovery in Geographically dispersed cross site virtu...
White Paper on Disaster Recovery in Geographically dispersed cross site virtu...EMC Forum India
 
Micro Datacenters Venema Advies Nigeria Limited
Micro Datacenters Venema Advies Nigeria LimitedMicro Datacenters Venema Advies Nigeria Limited
Micro Datacenters Venema Advies Nigeria LimitedDick Venema
 
EMC: Business Continuity a Disaster Recovery pre virtuálne prostredia
EMC: Business Continuity a Disaster Recovery pre virtuálne prostrediaEMC: Business Continuity a Disaster Recovery pre virtuálne prostredia
EMC: Business Continuity a Disaster Recovery pre virtuálne prostrediaASBIS SK
 
Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...
Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...
Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...EMC Forum India
 
Veritas Software Foundations
Veritas Software FoundationsVeritas Software Foundations
Veritas Software Foundations.Gastón. .Bx.
 
Navisphere manager resume
Navisphere manager resumeNavisphere manager resume
Navisphere manager resume.Gastón. .Bx.
 
Enterprise-Grade Disaster Recovery Without Breaking the Bank
Enterprise-Grade Disaster Recovery Without Breaking the BankEnterprise-Grade Disaster Recovery Without Breaking the Bank
Enterprise-Grade Disaster Recovery Without Breaking the BankCloudEndure
 
Designing a Modern Disaster Recovery Environment
Designing a Modern Disaster Recovery EnvironmentDesigning a Modern Disaster Recovery Environment
Designing a Modern Disaster Recovery EnvironmentEagle Technologies
 
Logical Volume Manager. An Introduction
Logical Volume Manager. An IntroductionLogical Volume Manager. An Introduction
Logical Volume Manager. An IntroductionJuan A. Suárez Romero
 
Symm configuration management
Symm configuration managementSymm configuration management
Symm configuration management.Gastón. .Bx.
 
AIX Advanced Administration Knowledge Share
AIX Advanced Administration Knowledge ShareAIX Advanced Administration Knowledge Share
AIX Advanced Administration Knowledge Share.Gastón. .Bx.
 
Replication for Business Continuity, Disaster Recovery and High Availability
Replication for Business Continuity, Disaster Recovery and High AvailabilityReplication for Business Continuity, Disaster Recovery and High Availability
Replication for Business Continuity, Disaster Recovery and High AvailabilityTony Pearson
 
Enterprise grade disaster recovery without breaking the bank
Enterprise grade disaster recovery without breaking the bankEnterprise grade disaster recovery without breaking the bank
Enterprise grade disaster recovery without breaking the bankactualtechmedia
 
Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...
Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...
Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...Citrix Online
 

Destacado (20)

4. storage lvm
4. storage   lvm4. storage   lvm
4. storage lvm
 
Srdf overview latency_v.5
Srdf overview latency_v.5Srdf overview latency_v.5
Srdf overview latency_v.5
 
White Paper on Disaster Recovery in Geographically dispersed cross site virtu...
White Paper on Disaster Recovery in Geographically dispersed cross site virtu...White Paper on Disaster Recovery in Geographically dispersed cross site virtu...
White Paper on Disaster Recovery in Geographically dispersed cross site virtu...
 
Micro Datacenters Venema Advies Nigeria Limited
Micro Datacenters Venema Advies Nigeria LimitedMicro Datacenters Venema Advies Nigeria Limited
Micro Datacenters Venema Advies Nigeria Limited
 
EMC: Business Continuity a Disaster Recovery pre virtuálne prostredia
EMC: Business Continuity a Disaster Recovery pre virtuálne prostrediaEMC: Business Continuity a Disaster Recovery pre virtuálne prostredia
EMC: Business Continuity a Disaster Recovery pre virtuálne prostredia
 
Lvm
LvmLvm
Lvm
 
Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...
Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...
Best Practices from EMC: Ingest High Availability Performance, Trust and Effi...
 
Symm basics
Symm basicsSymm basics
Symm basics
 
Veritas Software Foundations
Veritas Software FoundationsVeritas Software Foundations
Veritas Software Foundations
 
Replistor Resume
Replistor ResumeReplistor Resume
Replistor Resume
 
Navisphere manager resume
Navisphere manager resumeNavisphere manager resume
Navisphere manager resume
 
Enterprise-Grade Disaster Recovery Without Breaking the Bank
Enterprise-Grade Disaster Recovery Without Breaking the BankEnterprise-Grade Disaster Recovery Without Breaking the Bank
Enterprise-Grade Disaster Recovery Without Breaking the Bank
 
Designing a Modern Disaster Recovery Environment
Designing a Modern Disaster Recovery EnvironmentDesigning a Modern Disaster Recovery Environment
Designing a Modern Disaster Recovery Environment
 
Logical Volume Manager. An Introduction
Logical Volume Manager. An IntroductionLogical Volume Manager. An Introduction
Logical Volume Manager. An Introduction
 
Provissioning storage
Provissioning storageProvissioning storage
Provissioning storage
 
Symm configuration management
Symm configuration managementSymm configuration management
Symm configuration management
 
AIX Advanced Administration Knowledge Share
AIX Advanced Administration Knowledge ShareAIX Advanced Administration Knowledge Share
AIX Advanced Administration Knowledge Share
 
Replication for Business Continuity, Disaster Recovery and High Availability
Replication for Business Continuity, Disaster Recovery and High AvailabilityReplication for Business Continuity, Disaster Recovery and High Availability
Replication for Business Continuity, Disaster Recovery and High Availability
 
Enterprise grade disaster recovery without breaking the bank
Enterprise grade disaster recovery without breaking the bankEnterprise grade disaster recovery without breaking the bank
Enterprise grade disaster recovery without breaking the bank
 
Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...
Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...
Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And ...
 

Similar a Business Continuity Knowledge Share

Disaster Recovery & Business Continuity Overview
Disaster Recovery & Business Continuity Overview Disaster Recovery & Business Continuity Overview
Disaster Recovery & Business Continuity Overview Aventis Systems, Inc.
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanRishu Mehra
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data  Disaster  Recovery  PlanCreating And Implementing A Data  Disaster  Recovery  Plan
Creating And Implementing A Data Disaster Recovery PlanRishu Mehra
 
Maximizing Business Continuity Success
Maximizing Business Continuity SuccessMaximizing Business Continuity Success
Maximizing Business Continuity SuccessSymantec
 
Business Continuity for Mission Critical Applications
Business Continuity for Mission Critical ApplicationsBusiness Continuity for Mission Critical Applications
Business Continuity for Mission Critical ApplicationsDataCore Software
 
Disaster Recovery: Develop Efficient Critique for an Emergency
Disaster Recovery: Develop Efficient Critique for an EmergencyDisaster Recovery: Develop Efficient Critique for an Emergency
Disaster Recovery: Develop Efficient Critique for an Emergencysco813f8ko
 
What every IT audit should know about backup and recovery
What every IT audit should know about backup and recoveryWhat every IT audit should know about backup and recovery
What every IT audit should know about backup and recoveryessbaih
 
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery:  Understanding Trend, Methodology, Solution, and StandardDisaster Recovery:  Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery: Understanding Trend, Methodology, Solution, and StandardPT Datacomm Diangraha
 
Disaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the differenceDisaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the differencejeetendra mandal
 
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...MaryJWilliams2
 
Business Continuity Getting Started
Business Continuity Getting StartedBusiness Continuity Getting Started
Business Continuity Getting Startedmxp5714
 
Map r whitepaper_zeta_architecture
Map r whitepaper_zeta_architectureMap r whitepaper_zeta_architecture
Map r whitepaper_zeta_architectureNarender Kumar
 
Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...
Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...
Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...CA Technologies
 
Business continuity planning and disaster recovery
Business continuity planning and disaster recoveryBusiness continuity planning and disaster recovery
Business continuity planning and disaster recoveryKrutiShah114
 
Module 4 disaster recovery student slides ver 1.0
Module 4 disaster recovery   student slides ver 1.0Module 4 disaster recovery   student slides ver 1.0
Module 4 disaster recovery student slides ver 1.0Aladdin Dandis
 
MDS.BackupandRecoveryServices.2011.1006.B
MDS.BackupandRecoveryServices.2011.1006.BMDS.BackupandRecoveryServices.2011.1006.B
MDS.BackupandRecoveryServices.2011.1006.BTracy Hawkey
 
Business Continuity and Recovery Planning for Power Outages
Business Continuity and Recovery Planning for Power OutagesBusiness Continuity and Recovery Planning for Power Outages
Business Continuity and Recovery Planning for Power OutagesARC Advisory Group
 
Not having a good backup
Not having a good backupNot having a good backup
Not having a good backupRita Crawford
 
Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...
Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...
Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...Jade Global
 

Similar a Business Continuity Knowledge Share (20)

Disaster Recovery & Business Continuity Overview
Disaster Recovery & Business Continuity Overview Disaster Recovery & Business Continuity Overview
Disaster Recovery & Business Continuity Overview
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery Plan
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data  Disaster  Recovery  PlanCreating And Implementing A Data  Disaster  Recovery  Plan
Creating And Implementing A Data Disaster Recovery Plan
 
Maximizing Business Continuity Success
Maximizing Business Continuity SuccessMaximizing Business Continuity Success
Maximizing Business Continuity Success
 
Business Continuity for Mission Critical Applications
Business Continuity for Mission Critical ApplicationsBusiness Continuity for Mission Critical Applications
Business Continuity for Mission Critical Applications
 
Disaster Recovery: Develop Efficient Critique for an Emergency
Disaster Recovery: Develop Efficient Critique for an EmergencyDisaster Recovery: Develop Efficient Critique for an Emergency
Disaster Recovery: Develop Efficient Critique for an Emergency
 
What every IT audit should know about backup and recovery
What every IT audit should know about backup and recoveryWhat every IT audit should know about backup and recovery
What every IT audit should know about backup and recovery
 
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery:  Understanding Trend, Methodology, Solution, and StandardDisaster Recovery:  Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
 
Disaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the differenceDisaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the difference
 
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
Shielding Data Assets: Exploring Data Protection and Disaster Recovery Strate...
 
Business Continuity Getting Started
Business Continuity Getting StartedBusiness Continuity Getting Started
Business Continuity Getting Started
 
DRP.ppt
DRP.pptDRP.ppt
DRP.ppt
 
Map r whitepaper_zeta_architecture
Map r whitepaper_zeta_architectureMap r whitepaper_zeta_architecture
Map r whitepaper_zeta_architecture
 
Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...
Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...
Case Study: Vivo Automated IT Capacity Management to Optimize Usage of its Cr...
 
Business continuity planning and disaster recovery
Business continuity planning and disaster recoveryBusiness continuity planning and disaster recovery
Business continuity planning and disaster recovery
 
Module 4 disaster recovery student slides ver 1.0
Module 4 disaster recovery   student slides ver 1.0Module 4 disaster recovery   student slides ver 1.0
Module 4 disaster recovery student slides ver 1.0
 
MDS.BackupandRecoveryServices.2011.1006.B
MDS.BackupandRecoveryServices.2011.1006.BMDS.BackupandRecoveryServices.2011.1006.B
MDS.BackupandRecoveryServices.2011.1006.B
 
Business Continuity and Recovery Planning for Power Outages
Business Continuity and Recovery Planning for Power OutagesBusiness Continuity and Recovery Planning for Power Outages
Business Continuity and Recovery Planning for Power Outages
 
Not having a good backup
Not having a good backupNot having a good backup
Not having a good backup
 
Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...
Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...
Getting Most Out of Your Disaster Recovery Infrastructure Using Active Data G...
 

Último

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Business Continuity Knowledge Share

Any disaster occurrence affects the availability of information critical to running normal business operations. In our definition of disaster, the organization's primary systems, data, and applications are damaged or destroyed. Not all unplanned disruptions constitute a disaster.

Business Continuity is a holistic approach to planning for, preparing for, and recovering from an adverse event. The focus is on prevention, identifying risks, and developing procedures to ensure the continuity of business functions. Disaster recovery planning should be included as part of business continuity.

BC objectives include:
• Facilitate uninterrupted business support despite the occurrence of problems.
• Create plans that identify risks and mitigate them wherever possible.
• Provide a road map to recover from any event.

Disaster Recovery is more about specific cures: restoring service and damaged assets after an adverse event. In our context, Disaster Recovery is the coordinated process of restoring systems, data, and infrastructure required to support key ongoing business operations.

Business Continuity Planning (BCP) is a risk management discipline. It involves the entire business, not just IT. BCP proactively identifies vulnerabilities and risks, planning in advance how to prepare for and respond to a business disruption. A business with strong BC practices in place is better able to continue running through the disruption and to return to "business as usual." BCP actually reduces the risk and cost of an adverse event, because the process often uncovers and mitigates potential problems.

The Business Continuity Planning process includes the following stages:
1. Objectives
   • Determine business continuity requirements and objectives, including scope and budget
   • Select the team, including all areas of the business and subject matter expertise (internal and external)
   • Create the project plan
2. Perform analysis
   • Collect information on data, business processes, infrastructure support, dependencies, and frequency of use
   • Identify critical needs and assign recovery priorities
   • Create a risk analysis (areas of exposure) and mitigation strategies wherever possible
   • Create a Business Impact Analysis (BIA)
   • Create a cost/benefit analysis – identify the cost (per hour, per day, etc.) to the business when data is unavailable
   • Evaluate options
3. Design and develop the BCP/strategies
   • Evaluate options
   • Define roles and responsibilities
   • Develop contingency scenarios
   • Develop emergency response procedures
   • Detail recovery, resumption, and restore procedures
   • Design data protection strategies and develop infrastructure
   • Implement risk management/mitigation procedures
4. Train, test, and document
5. Implement, maintain, and assess

This is an example of a Business Impact Analysis (BIA); the dollar values are arbitrary and are used just for illustration. A BIA quantifies the impact that an outage will have on the business and the potential costs associated with the interruption. It helps businesses channel their resources based on probability of failure and associated costs (a small illustrative calculation appears below).

Identifying Single Points of Failure

Configure multiple HBAs and use multi-pathing software:
  – Protects against HBA failure
  – Can provide improved performance (vendor dependent)
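To make the BIA idea above concrete, here is a purely illustrative sketch; the business functions, outage durations, hourly costs, and failure probabilities below are hypothetical and are not taken from the course material.

    # Hypothetical BIA calculation: rank business functions by outage exposure.
    # All names and dollar figures are invented for illustration only.
    functions = [
        # (business function, cost per hour of downtime, expected outage hours, failures per year)
        ("Order entry",    50_000,  4, 0.10),
        ("E-mail",          5_000,  8, 0.25),
        ("Data warehouse", 10_000, 24, 0.05),
    ]

    for name, cost_per_hour, outage_hours, probability in functions:
        impact = cost_per_hour * outage_hours      # cost of one outage
        annualized = impact * probability          # weighted by likelihood of failure
        print(f"{name:15s} impact per outage ${impact:>9,}  annualized exposure ${annualized:>9,.0f}")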
Planning and configuring clusters is a complex task. At a high level:
• A cluster is two or more hosts with access to the same set of storage (array) devices.
• The simplest configuration is a two-node (two-host) cluster.
• One of the nodes is the production server while the other is configured as a standby. This configuration is described as Active/Passive.
• Participating nodes exchange "heartbeats" or "keep-alives" to inform each other about their health.
• In the event of a primary node failure, the cluster management software shifts the production workload to the standby server.
• Implementation of the cluster failover process is vendor specific.
• A more complex configuration has both nodes running production workloads against the same set of devices. Either the cluster software or the application/database must then provide a locking mechanism so that the nodes do not try to update the same areas on disk simultaneously. This is an Active/Active configuration.
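The Active/Passive failover logic can be sketched as follows. This is a simplified model only; the timeout value and method names are assumptions, not any vendor's cluster manager.

    import time

    HEARTBEAT_TIMEOUT = 10  # seconds without a heartbeat before the peer is declared failed (illustrative)

    class StandbyNode:
        """Minimal active/passive sketch: the standby watches heartbeats from the
        primary and takes over the production workload if they stop."""

        def __init__(self):
            self.last_heartbeat = time.monotonic()
            self.active = False  # standby starts passive

        def on_heartbeat(self):
            # Called whenever a "keep-alive" message arrives from the primary.
            self.last_heartbeat = time.monotonic()

        def check_peer(self):
            # Called periodically; triggers failover when heartbeats stop arriving.
            if not self.active and time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
                self.failover()

        def failover(self):
            # A real cluster would also fence the failed node and take ownership of
            # the shared storage before starting the application here.
            self.active = True
            print("Primary heartbeat lost: starting production workload on standby")

In a real deployment the failover step is vendor specific, as noted above; the sketch only shows the heartbeat-and-takeover structure.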
Local Replication:
• Data from the production devices is copied over to a set of target (replica) devices.
• After some time, the replica devices contain data identical to that on the production devices.
• Copying of data can subsequently be halted. At this point in time, the replica devices can be used independently of the production devices.
• The replicas can then be used for restore operations in the event of data corruption or other events.
• Alternatively, the data on the replica devices can be copied to tape. This off-loads the burden of backup from the production devices.

Remote Replication:
• The goals of remote replication are the same as those of local replication, except that data is replicated to different storage arrays.
• Storage arrays can be side by side or thousands of miles apart.
• If data is replicated to a remote location, the business can continue with little or no interruption, and little or no loss of data, if the primary site is lost.

Backup and Restore:
• Backup to tape has been the predominant method for ensuring data availability and business continuity.
• Low-cost, high-capacity disk drives are now being used for backup to disk. This considerably speeds up the backup and restore processes.
• The frequency of backup is dictated by the defined RPO/RTO requirements as well as the rate of change of the data.

PowerPath:
PowerPath is host-based software that resides between the application and the disk device layers. Every I/O from the host to the array must pass through the PowerPath driver software. This allows PowerPath to work in conjunction with the array and connectivity environment to provide intelligent I/O path management, including path failover and dynamic load balancing, while remaining transparent to application I/O requests: it automatically detects and recovers from host-to-array path failures.
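A generic sketch of what such path management does is shown below. It illustrates the idea of a per-device path set with round-robin load balancing and transparent failover; the path names and class are invented, and this is not PowerPath's actual driver logic.

    import itertools

    class PathSet:
        """Toy per-device path table: round-robin load balancing across live paths,
        transparent failover when a path is marked dead, restore after repair."""

        def __init__(self, paths):
            self.paths = {p: True for p in paths}     # path name -> alive?
            self._rr = itertools.cycle(paths)

        def mark_failed(self, path):
            self.paths[path] = False                  # periodic path tests would later restore it

        def mark_restored(self, path):
            self.paths[path] = True

        def send_io(self, io):
            # Try paths in round-robin order; skip dead ones so the application never sees the failure.
            for _ in range(len(self.paths)):
                path = next(self._rr)
                if self.paths[path]:
                    return f"I/O {io} sent via {path}"
            raise IOError("all paths to the device have failed")

    dev = PathSet(["hba0->spA", "hba0->spB", "hba1->spA", "hba1->spB"])
    dev.mark_failed("hba0->spA")
    print([dev.send_io(i) for i in range(3)])   # I/O is spread across the three surviving paths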
PowerPath is supported on various hosts and operating systems such as Sun Solaris, IBM AIX, HP-UX, Microsoft Windows, Linux, and Novell. Storage arrays from EMC, Hitachi, HP, and IBM are supported. The level of OS and array-model support varies between PowerPath software versions.

PowerPath maximizes application availability, optimizes performance, and automates online storage management while reducing complexity and cost, all from one data path management solution. PowerPath supports the following features:
• Multiple path support – PowerPath supports multiple paths between a logical device and a host. Multiple paths enable the host to access a logical device even if a specific path is unavailable. Multiple paths also enable sharing of the I/O workload to a given logical device.
• Dynamic load balancing – PowerPath is designed to use all paths at all times. It distributes I/O requests to a logical device across all available paths, rather than requiring a single path to bear the entire I/O burden.
• Proactive path testing and automatic path recovery – PowerPath uses a path test to ascertain the viability of a path. After a path fails, PowerPath continues testing it periodically to determine whether it has been fixed. If the path passes the test, PowerPath restores it to service and resumes sending I/O to it.
• Automatic path failover – If a path fails, PowerPath redistributes I/O traffic from that path to functioning paths.
• Online configuration and management – PowerPath management interfaces include a command line interface and a GUI on Windows.
• High availability cluster support – PowerPath is particularly beneficial in cluster environments, as it can prevent operational interruptions and costly downtime.

Without PowerPath, if a host needed access to 40 devices and there were four host bus adapters, you would most likely configure it to present 10 unique devices to each host bus adapter. With PowerPath, you would configure it so that all 40 devices could be "seen" by all
four host bus adapters. PowerPath supports up to 32 paths to a logical volume. The host can be connected to the array using a number of interconnect topologies such as SAN, SCSI, or iSCSI.

The PowerPath filter driver is a platform-independent driver that resides between the application and the HBA driver. The driver identifies all paths that read and write to the same device and builds a routing table, called a volume path set, for the device. A volume path set is created for each shared device in the array. PowerPath can use any path in the set to service an I/O request. If a path fails, PowerPath can redirect an I/O request from that path to any other available path in the set. This redirection is transparent to the application, which does not receive an error.

This example depicts how PowerPath failover works. When a failure occurs, PowerPath transparently redirects the I/O down the most suitable alternate path. The PowerPath filter driver looks at the volume path set for the device, considers current workload, load balancing, and
device priority settings, and chooses the best path to send the I/O down. In the example, PowerPath has three remaining paths to which it can redirect the failed I/O and load balance.

A backup is a copy of the online data that resides on primary storage. The backup copy is created and retained for the sole purpose of recovering deleted, broken, or corrupted data on the primary disk. The backup copy is usually retained over a period of time, depending on the type of the data and on the type of backup. There are three derivatives of backup: disaster recovery, archival, and operational backup. We will review them in more detail below. The data that is backed up may reside on media such as disk or tape, depending on the backup derivative the customer is targeting. For example, backing up to disk may be more efficient than tape in operational backup environments.

Several choices are available for getting the data written to the backup media:
1. You can simply copy the data from the primary storage to the secondary storage (disk or tape), onsite. This is a simple strategy, easily implemented, but it impacts the production server where the data is located, since it uses the server's resources. This may be tolerable for some applications, but not for high-demand ones.
2. To avoid an impact on the production application, and to perform serverless backups, you can mirror (or snap) a production volume. For example, you can mount it on a separate server and then copy it to the backup media (disk or tape). This option completely frees up the production server, with the added infrastructure cost associated with the additional resources.
3. Remote backup can be used to comply with offsite requirements. A copy from the primary storage is made directly to backup media sitting at another site. The backup media can be a real library, a virtual library, or even a remote filesystem.
4. You can copy to a first set of backup media, which is kept onsite for operational restore requirements, and then duplicate it to another set of media for offsite purposes. To simplify the procedure, you can replicate it to an offsite location to remove any manual steps associated with moving the backup media to another site.

Disaster recovery addresses the requirement to be able to restore all, or a large part of, an IT infrastructure in the event of a major disaster. Archival is a common requirement used to preserve transaction records, email, and other business work products for regulatory compliance. The regulations could be internal, governmental, or perhaps derived from specific industry requirements. Operational backup is typically the collection of data for the eventual purpose of restoring, at some point in the future, data that has become lost or corrupted.

Reasons for a backup plan include:
• Physical damage to a storage element (such as a disk) can result in data loss.
• People make mistakes, and unhappy employees or external hackers may breach security and maliciously destroy data.
• Software failures can destroy or lose data, and viruses can destroy data, impact data integrity, and halt key operations.
• Physical security breaches can destroy equipment that contains data and applications.
• Natural disasters and other events such as earthquakes, lightning strikes, floods, tornados, hurricanes, accidents, chemical spills, and power grid failures can cause not only the loss of data but also the loss of an entire computer facility.
Offsite data storage is often justified to protect a business from these types of events.
• Government regulations may require certain data to be kept for extended timeframes. Corporations may establish their own extended retention policies for intellectual property to protect themselves against litigation. The regulations and business requirements that drive data archiving generally require data to be retained at an offsite location.

Backup products vary, but they do have some common characteristics. The basic architecture of a backup system is client-server, with a backup server and some number of backup clients or agents. The backup server directs the operations and owns the backup catalog (the information about the backup). The catalog contains the table of contents for the data set. It also contains information about the backup session itself. The backup server depends on the backup client to gather the data to be backed up. The backup client can be local, or it can reside on another system, presumably to back up the data visible to that system. A backup server receives backup metadata from backup clients to perform its activities.

There is another component called a storage node. The storage node is the entity responsible for writing the data set to the backup device. Typically, a storage node is packaged with the backup server, and the backup device is attached directly to the backup server's host platform. Storage nodes play an important role in backup planning because they can be used to consolidate backup servers.

The following represents a typical backup process (a minimal sketch of this flow appears after the decision list below):
• The backup server initiates the backup process (starts the backup application).
• The backup server sends a request to a server: "send me your data".
• The server sends the data to the backup server and/or storage node.
• The storage node sends the data to the tape storage device, and the backup server begins building the catalog (metadata) of the backup session.
• When all of the data has been transferred from the server to the backup server, the backup server writes the catalog to a disk file and closes the connection to the tape device.

Some important decisions need consideration before implementing a backup/restore solution. Examples include:
• The Recovery Point Objective (RPO)
• The Recovery Time Objective (RTO)
• The media type to be used (disk or tape)
• Where and when the restore operations will occur, especially if an alternative host will be used to receive the restored data
• When to perform backups
• The granularity of backups – full, incremental, or cumulative
• How long to keep the backup – for example, some backups need to be retained for 4 years, others for just 1 month
• Whether it is necessary to take copies of the backup

Data Considerations: File Characteristics
• Location: Many organizations have dozens of heterogeneous platforms that support a complex application. Consider a data warehouse where data from many sources is fed into the warehouse. When this scenario is viewed as "the data warehouse application", it easily fits this model. Some of the issues are:
  – How the backups for subsets of the data are synchronized
  – How these applications are restored
• Size: Backing up a large amount of data that consists of a few big files may have less system overhead than backing up a large number of small files. If a file system contains millions of small files, the very act of searching the file system structures for changed files can take hours, since the entire file structure is searched.
• Number: A file system containing one million files with a ten-percent daily change rate will potentially have to create 100,000 entries in the backup catalog. This raises other issues, such as:
  – How a massive file system search impacts the system
  – Search time and media impact
  – Whether there is an impact on tape start/stop processing

Data Considerations: Data Compression
Many backup devices, such as tape drives, have built-in hardware compression technologies. To use these technologies effectively, it is important to understand the characteristics of the data. Some data, such as application binaries, does not compress well. Text data can compress very well, while other data, such as JPEG and ZIP files, is already compressed.

Data Considerations: Retention Periods
As mentioned before, there are three backup models (operational, disaster recovery, and archive). Each can be defined by its retention period. Retention periods are the length of time that a particular version of a dataset is available to be restored. Retention periods are driven by the type of recovery the business is trying to achieve:
• For operational restore, data sets could be maintained on a disk primary backup storage target for a period of time, where most restore requests are likely to be satisfied, and then moved to a secondary backup storage target, such as tape, for long-term offsite storage.
• For disaster recovery, backups must be taken and moved to an offsite location.
• For archiving, requirements will usually be driven by the organization's policy and regulatory conformance requirements. Tapes can be used for some applications, but for others a more robust and reliable solution, such as disk, may be more appropriate.

Backup Methods:
Backing up databases can occur using two different methods:
• A hot backup, which means that the application is still up and running, with users accessing it, while the backup is taking place.
• A cold backup, which means that the application is shut down for the backup to take place.
Most backup applications offer various backup agents to perform these kinds of operations. There are different agents for different types of data and applications.
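As promised above, here is a minimal sketch of the backup-server / client / storage-node interaction. The class and method names are invented for illustration and do not correspond to any particular backup product.

    class BackupServer:
        """Toy model of the client-server backup flow: the server asks each client
        for its data, the storage node writes it to the backup device, and the
        server records the session metadata in its catalog."""

        def __init__(self, storage_node):
            self.storage_node = storage_node
            self.catalog = []  # table of contents plus session metadata

        def run_backup(self, clients):
            for client in clients:
                for path, data in client.gather_data():       # "send me your data"
                    location = self.storage_node.write(data)  # storage node writes to the backup device
                    self.catalog.append({"client": client.name, "path": path, "location": location})
            # When all data has been transferred, persist the catalog and release the device.
            self.storage_node.close()
            return self.catalog

    class Client:
        def __init__(self, name, files):
            self.name, self.files = name, files

        def gather_data(self):
            return self.files.items()

    class StorageNode:
        def __init__(self):
            self.blocks = []

        def write(self, data):
            self.blocks.append(data)
            return len(self.blocks) - 1  # the offset acts as the "location" stored in the catalog

        def close(self):
            pass  # a real node would unload the tape or close the disk file here

    catalog = BackupServer(StorageNode()).run_backup([Client("db01", {"/data/file1": b"..."})])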
The granularity and levels of backups depend on business needs and, to some extent, on technological limitations. Some backup strategies define as many as ten levels of backup. IT organizations use a combination of these to fulfill their requirements. Most use some combination of full, cumulative, and incremental backups.

A full backup is a backup of all data on the target volumes, regardless of any changes made to the data itself. An incremental backup contains only the changes since the last backup of any type, whichever was most recent. A cumulative backup, also known as a differential backup, is a type of incremental backup that contains all changes made to a file since the last full backup.

The following is an example of incremental backup and restore:
1. A full backup of the business data is taken on Monday evening. Each day after that, an incremental backup is taken. These incremental backups only back up files that are new or that have changed since the last full or incremental backup.
2. On Tuesday, a new file is added, File 4. No other files have changed. Since File 4 is a new file added after the previous backup on Monday evening, it will be backed up Tuesday evening.
3. On Wednesday, there are no new files added since Tuesday, but File 3 has changed. Since File 3 was changed after the previous evening's backup (Tuesday), it will be backed up Wednesday evening.
4. On Thursday, no files have changed, but a new file has been added, File 5. Since File 5 was added after the previous evening's backup, it will be backed up Thursday evening.
5. On Friday morning, there is a data corruption, so the data must be restored from tape.
   a. The first step is to restore the full backup from Monday evening. Then, every incremental backup taken since the last full backup must be applied, which, in this example, means the
   b. Tuesday,
   c. Wednesday, and
   d. Thursday incremental backups.

The following is an example of cumulative backup and restore:
1. A full backup of the data is taken on Monday evening. Each day after that, a cumulative backup is taken. These cumulative backups back up ALL files that have changed since the LAST FULL backup.
2. On Tuesday, File 4 is added. Since File 4 is a new file that has been added since the last full backup, it will be backed up Tuesday evening.
3. On Wednesday, File 5 is added. Now, since both File 4 and File 5 have been added or changed since the last full backup, both files will be backed up Wednesday evening.
4. On Thursday, File 6 is added. Again, File 4, File 5, and File 6 have been added or changed since the last full backup; all three files will be backed up Thursday evening.
5. On Friday morning, there is a corruption of the data, so the data must be restored from tape.
   a. The first step is to restore the full backup from Monday evening.
   b. Then, only the backup from Thursday evening needs to be restored, because it contains all the new and changed files from Tuesday, Wednesday, and Thursday.
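The difference between the two schemes, and the restore chains they produce, can be shown with a small simulation. The file names follow the incremental example above; the code itself is only an illustration.

    # Simulate the example week: which files each scheme backs up, and what a
    # Friday restore has to read back.
    full_monday = {"File 1", "File 2", "File 3"}
    daily_changes = {                       # files new or changed on each day
        "Tue": {"File 4"},
        "Wed": {"File 3"},
        "Thu": {"File 5"},
    }

    # Incremental: back up only what changed since the previous backup (of any type).
    incremental = {day: changes for day, changes in daily_changes.items()}

    # Cumulative (differential): back up everything changed since the last FULL backup.
    cumulative, since_full = {}, set()
    for day, changes in daily_changes.items():
        since_full |= changes
        cumulative[day] = set(since_full)

    # Restore on Friday morning:
    restore_incremental = [full_monday, incremental["Tue"], incremental["Wed"], incremental["Thu"]]
    restore_cumulative  = [full_monday, cumulative["Thu"]]   # full plus only the latest cumulative

    print("Incremental restore chain:", restore_incremental)
    print("Cumulative restore chain: ", restore_cumulative)

The trade-off visible here matches the text: incrementals write less each night but need every set applied at restore time, while cumulatives grow each night but need only the most recent set plus the full backup.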
Backup Architecture Topologies
• There are three basic backup topologies:
  – Direct attached based backup
  – LAN based backup
  – SAN based backup
• These topologies can be integrated, forming a "mixed" topology

Direct Attached Based Backups

LAN Based Backups:
SAN Based Backups (LAN Free):

SAN/LAN Mixed Based Backups:
Backup Media:
• Tape
  – Traditional destination for backups
  – Sequential access
  – No protection
• Disk
  – Random access
  – Protected by the storage array (RAID, hot spare, etc.)
• Multiple streams are interleaved to achieve higher throughput on tape
  – Keeps the tape streaming, for maximum write performance
  – Helps prevent tape mechanical failure
  – Greatly increases time to restore

Tape drive streaming is recommended by all vendors in order to keep the drive busy. If you do not keep the drive busy during the backup process (writing), performance will suffer. Multiple streaming improves performance drastically, but it introduces an issue as well: the backup data becomes interleaved, and thus recovery times are increased.
Backup to disk replaces tape, and its associated devices, as the primary target for backup. Backup-to-disk systems offer major advantages over equivalent-scale tape systems in terms of capital costs, operating costs, support costs, and quality of service. Backup to disk can be implemented fully on day one or over a phased approach.

In a traditional approach to backup and archive, businesses take a backup of production. Typically, backup jobs use weekly full backups and nightly incremental backups. Based on business requirements, they then copy the backup jobs and eject the tapes to have them sent offsite, where they are stored for a specified amount of time. The problem with this approach is simple: as the production environment grows, so does the backup environment.
• Production environment grows
  – Requires constant tuning and data placement to maintain performance
  – Need to add more tier-1 storage
• Backup environment grows
  – Backup windows get longer and jobs do not complete
  – Restores take longer
  – Requires more tape drives and silos to keep up with service levels
• Archive environment grows
  – Impacts flexibility to retrieve content when requested
  – Requires more media, adding management cost
  – No investment protection for long-term retention requirements

Differences Between Backup/Recovery and Archive:
The recovery process is much more important than the backup process. It is based on the appropriate recovery point objectives (RPOs) and recovery time objectives (RTOs). The process usually drives a decision to have a combination of technologies in place, from online local replicas, to backup to disk, to backup to tape for long-term, passive RPOs.
Archive processes are determined not only by the required retention times, but also by retrieval-time service levels and the availability requirements of the information in the archive. For both processes, a combination of hardware and software is needed to deliver the appropriate service level. The best way to discover the appropriate service level is to classify the data and align the business applications with it.

Replication:
What is replication? Local replication is a technique for ensuring business continuity by making exact copies of data. With replication, the data on the replica is identical to the data on the original at the point in time that the replica was created. Examples:
• Copy a specific file
• Copy all the data used by a database application
• Copy all the data in a UNIX Volume Group (including underlying logical volumes, file systems, etc.)
• Copy data on a storage array to a remote storage array

Replicas can be used to address a number of business continuity functions:
• Provide an alternate source for backup, to alleviate the impact on production.
• Provide a source for fast recovery, to facilitate faster RPO and RTO.
• Decision support activities such as reporting.
  – For example, a company may have a requirement to generate periodic reports. Running the reports off the replicas greatly reduces the burden placed on the production volumes. Typically, reports need to be generated once a day or once a week, etc.
• Developing and testing proposed changes to an application or an operating environment.
  – For example, the application can be run on an alternate server using the replica volumes, and any proposed design changes can be tested.
• Data migration.
  – Migration can be as simple as moving applications from one server to the next, or as complicated as migrating entire data centers from one location to another.

Key factors to consider with replicas:
• What makes a replica good:
  – Recoverability from a failure on the production volumes. The replication technology must allow for the restoration of data from the replicas to production, and then allow production to resume with a minimal RPO and RTO.
  – Consistency/restartability is very important if data on the replicas will be accessed directly or if the replicas will be used for restore operations.
• Replicas can be either point-in-time (PIT) or continuous:
  – Point-in-time (PIT) – the data on the replica is an identical image of production at some specific timestamp.
    ◦ For example, a replica of a file system is created at 4:00 PM on Monday. This replica would then be referred to as the Monday 4:00 PM point-in-time copy.
    ◦ Note: The RPO will be a finite value with any PIT. The RPO maps from the time the PIT was created to the time any failure on production occurred. If there is a failure on production at 8:00 PM and a 4:00 PM PIT is available, the RPO is 4 hours (8 - 4 = 4). To minimize RPO with PITs, take periodic PITs.
  – Continuous replica – the data on the replica is synchronized with the production data at all times.
    ◦ The objective of any continuous replication is to reduce the RPO to zero.

Database replication can be offline or online:
• Offline – replication takes place while the database and the application are shut down.
• Online – replication takes place while the database and the application are running.

• Databases and applications maintain integrity by following the "Dependent Write I/O Principle"
  – Dependent write: a write I/O that will not be issued by an application until a prior related write I/O has completed
    ◦ A logical dependency, not a time dependency
  – Inherent in all Database Management Systems (DBMS)
    ◦ e.g., a page (data) write is a dependent write I/O based on a successful log write
  – Applications can also use this technique
  – Necessary for protection against local outages
    ◦ Power failures create a dependent-write consistent image
    ◦ A restart transforms the dependent-write consistent image into a transactionally consistent one
      ▪ i.e., committed transactions will be recovered, in-flight transactions will be discarded
Database applications require that, for a transaction to be deemed complete, a series of writes must occur in a particular order (dependent write I/O). These writes are recorded on the various devices and file systems.
• In this example, steps 1-4 must complete for the transaction to be deemed complete.
  – Step 4 is dependent on Step 3 and will occur only if Step 3 is complete.
  – Step 3 is dependent on Step 2 and will occur only if Step 2 is complete.
  – Step 2 is dependent on Step 1 and will occur only if Step 1 is complete.
• Steps 1-4 are written to the database's buffer and then to the physical disks.

At the point in time when the replica is created, all the writes to the source devices must be captured on the replica devices to ensure data consistency on the replica.
• In this example, steps 1-4 on the source devices must be captured on the replica devices for the data on the replicas to be consistent.

Creating a PIT for multiple devices happens quickly, but not instantaneously.
• Steps 1-4, which are dependent write I/Os, have occurred and have been recorded successfully on the source devices.
• It is possible that steps 3 and 4 were copied to the replica devices, while steps 1 and 2 were not.
• In this case, the data on the replica is inconsistent with the data on the source. If a restart were performed on the replica devices, Step 4, which is available on the replica, might indicate that a particular transaction is complete, but the data associated with the transaction would be unavailable on the replica, making the replica inconsistent.
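The consistency problem can be made concrete with a small simulation: a replica that captures a later dependent write without all of the earlier writes it depends on is unusable, whereas a replica that captures any ordered prefix of the writes restarts cleanly. The step names are illustrative only.

    # Dependent writes: each step may only be applied if every earlier step is present.
    source_order = ["step1_log", "step2_data", "step3_data", "step4_commit"]

    def is_consistent(replica_contents):
        """A replica is restartable only if the captured writes form a prefix of the
        dependent-write order: once a write is missing, no later write may be present."""
        captured = [w in replica_contents for w in source_order]
        first_missing = captured.index(False) if False in captured else len(captured)
        return not any(captured[first_missing:])

    print(is_consistent({"step1_log", "step2_data"}))      # True  - a valid point of consistency
    print(is_consistent({"step3_data", "step4_commit"}))   # False - commit captured, earlier writes lost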
Database replication can be performed with the application offline (i.e., the application is shut down and there is no I/O activity) or online (i.e., while the application is up and running). If the application is offline, the replica will be consistent because there is no activity. However, consistency is an issue if the database application is replicated while it is up and running.

Online replication:
  – Some database applications allow replication while the application is up and running.
  – The production database must be put into a state that allows it to be replicated while it is active.
  – Some level of recovery must be performed on the replica to make the replica consistent.

An alternative way to ensure that an online replica is consistent is to:
1. Hold I/O to all the devices at the same instant.
2. Create the replica.
3. Release the I/O.
Holding I/O is similar to a power failure, and most databases have the ability to restart from a power failure.
Note: While holding I/O simultaneously ensures that the data on the replica is identical to that on the source devices, the database application will time out if I/O is held for too long.

Changes will occur on the production volume after the creation of a PIT; changes could also occur on the target. Typically, the target device will be re-synchronized with the source device at some future time in order to obtain a more recent PIT.
Note: The replication technology employed should have a mechanism to keep track of changes. This makes the re-synchronization process much faster. If the replication technology does
not track changes between the source and target, every re-synchronization operation will have to be a full operation.

Replication technologies can be classified by:
• The distance over which replication is performed – local or remote.
• Where the replication is performed – host based or array based.
  – Host based – all replication is performed using the CPU resources of the host, by software running on the host.
  – Array based – all replication is performed on the storage array, using CPU resources on the array, via the array's operating environment.
Note: In the context of this discussion, local replication refers to replication that is performed within a data center if it is host based, and within a storage array if it is array based.

• Host based
  – Logical Volume Manager (LVM) based mirroring
  – File system snapshots
• Storage array based
  – Full volume mirroring
  – Full volume: Copy on First Access
  – Pointer based: Copy on First Write

Host Based Local Replication – LVM:
• Host-resident software responsible for creating and controlling host-level logical storage
  – The physical view of storage is converted to a logical view by mapping. Logical data blocks are mapped to physical data blocks.
  – The logical layer resides between the physical layer (physical devices and device drivers) and the application layer (the OS and applications see the logical view of storage).
• Usually offered as part of the operating system or as third-party host software
• LVM components:
  – Physical Volumes
  – Volume Groups
  – Logical Volumes

A Volume Group is created by grouping together one or more Physical Volumes. Physical Volumes:
• Can be added to or removed from a Volume Group dynamically.
• Cannot be shared between Volume Groups; the entire Physical Volume becomes part of a Volume Group.
Each Physical Volume is partitioned into equal-sized data blocks. The size of a Logical Volume is based on a multiple of this equal-sized data block.

The Volume Group is handled as a single unit by the LVM.
• A Volume Group as a whole can be activated or deactivated.
• A Volume Group typically contains related information. For example, each host would have a Volume Group holding all the OS data, while applications would be on separate Volume Groups.
Logical Volumes are created within a given Volume Group. A Logical Volume can be thought of as a virtual disk partition, while the Volume Group itself can be thought of as a disk. A Volume Group can have a number of Logical Volumes.
Logical Volumes (LVs) form the basis of logical storage. They contain logically contiguous data blocks (logical partitions) within the Volume Group. Each logical partition is mapped to at least one physical partition on a Physical Volume within the Volume Group. The OS treats an LV like a physical device and accesses it via device special files (character or block).

A Logical Volume:
• Can belong to only one Volume Group. However, a Volume Group can have multiple LVs.
• Can span multiple Physical Volumes.
• Can be made up of physical disk blocks that are not physically contiguous.
• Appears as a series of contiguous data blocks to the OS.
• Can contain a file system or be used directly.
Note: There is a one-to-one relationship between an LV and a file system.
Note: Under normal circumstances, there is a one-to-one mapping between a logical and a physical partition. A one-to-many mapping between a logical and physical partition leads to mirroring of Logical Volumes.
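A minimal sketch of the logical-to-physical partition mapping described above is shown below; the volume names, sizes, and structures are invented for illustration. A mirrored Logical Volume is simply the one-to-many case.

    # Each logical partition of an LV maps to one physical partition (normal case)
    # or to several (mirrored LV). Physical partitions are (physical_volume, index) pairs.
    logical_volume = {
        "name": "lv_data",
        "partitions": [
            {"logical": 0, "physical": [("pv1", 10)]},               # one-to-one: unmirrored
            {"logical": 1, "physical": [("pv1", 11)]},
            {"logical": 2, "physical": [("pv2", 3), ("pv3", 7)]},    # one-to-many: mirrored extent
        ],
    }

    def write(lv, logical_index, data, disks):
        # The LVM fans a logical write out to every physical partition it maps to,
        # which is what LVM-based mirroring (and LVM-based replication) relies on.
        for pv, idx in lv["partitions"][logical_index]["physical"]:
            disks[pv][idx] = data

    disks = {"pv1": {}, "pv2": {}, "pv3": {}}
    write(logical_volume, 2, b"block", disks)
    print(disks)   # the mirrored extent lands on both pv2 and pv3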
Considerations for LVM based replicas:
• LVM-based replicas add overhead on host CPUs.
• If the host devices are already storage array devices, then the added redundancy provided by LVM mirroring is unnecessary.
  – The devices will already have some RAID protection.
• Host-based replicas can usually be presented back to the same server.
• Keeping track of changes after the replica has been created.

With storage array based local replication:
• Replication is performed by the array operating environment.
  – Array CPU resources are used for the replication operations.
  – Host CPU resources can be devoted to production operations instead of replication operations.
• Replicas are on the same array.
  – They can be accessed by an alternate host for any BC operations.
• Typically, array-based replication is performed at the array device level.
  – Storage components used by an application must be mapped back to the specific array devices used, and those devices are then replicated on the array.
  – A database could be laid out over multiple physical volumes; one would have to replicate all of the devices for a PIT copy of the database.
• For future re-synchronization to be incremental, most vendors have the ability to track changes at some level of granularity (e.g., 512-byte block, 32 KB, etc.)
  – Tracking is typically done with some kind of bitmap.
• The target device must be at least as large as the source device.
  – For full volume copies, the minimum amount of storage required is the same as the size of the source.

Copy on First Access (COFA) provides an alternative method to create full volume copies. Unlike full volume mirrors, the replica is immediately available when the session is started (there is no waiting for full synchronization).
• The PIT is determined by the time of activation of the session. Just like full volume mirror technology, this method requires the target devices to be at least as large as the source devices.
• A protection map is created for all the data on the source device at some level of granularity (e.g., 512-byte block, 32 KB, etc.). The data is then copied from the source to the target in the background, based on the mode with which the replication session was invoked.

In the Copy on First Access mode (or deferred mode), data is copied from the source to the target only when:
• A write is issued for the first time after the PIT to a specific address on the source, or
• A read or write is issued for the first time after the PIT to a specific address on the target.
Since data is only copied when required, if the replication session is terminated, the target device will contain only the data that was copied (not the entire contents of the source at the PIT). In this scenario, the data on the target cannot be used because it is incomplete.

Pointer based replicas (Copy on First Write):
• Targets do not hold actual data, but hold pointers to where the data is located.
  – The actual storage requirement for the replicas is usually a small fraction of the size of the source volumes.
• A replication session is set up between the source and target devices and started.
When the session is set up, depending on the specific vendor's implementation, a protection map is created for all the data on the source device at some level of granularity (e.g., 512-byte block, 32 KB, etc.).
  – Target devices are accessible immediately when the session is started.
  – At the start of the session, the target device holds pointers to the data on the source device.

The original data block from the source is copied to the save location when a data block is first written to after the PIT.
• Prior to a new write to the source or target device:
  – Data is copied from the source to a "save" location.
  – The pointer for that specific address on the target then points to the "save" location.
  – Writes to the target result in writes to the "save" location and the updating of the pointer to the "save" location.
• If a write is issued to the source for the first time after the PIT, the original data block is copied to the save location and the pointer is updated from the source to the save location.
• If a write is issued to the target for the first time after the PIT, the original data is copied from the source to the save location, the pointer is updated, and then the new data is written to the save location.
• Reads from the target are serviced by the source device or from the save location, based on where the pointer directs the read.
  – Source – when the data has not changed since the PIT.
  – Save location – when the data has changed since the PIT.
The data on the replica is a combined view of the unchanged data on the source and the save location. Hence, if the source device becomes unavailable, the replica will no longer have valid data.
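A minimal sketch of the Copy on First Write pointer mechanism follows. The chunk granularity, device representation, and save-area structure are simplified assumptions, not any vendor's implementation.

    class CofwSnapshot:
        """Pointer-based (Copy on First Write) snapshot of a source 'device'.
        The target holds no data of its own: each chunk is read either from the
        source (unchanged since the PIT) or from the save area (changed)."""

        def __init__(self, source):
            self.source = source      # dict: chunk number -> current data on the source
            self.save_area = {}       # original PIT data for chunks changed after the PIT

        def write_source(self, chunk, data):
            # First write to a chunk after the PIT: preserve the original in the save area.
            if chunk not in self.save_area:
                self.save_area[chunk] = self.source.get(chunk)
            self.source[chunk] = data

        def read_target(self, chunk):
            # Replica view = unchanged data on the source + preserved data in the save area.
            return self.save_area.get(chunk, self.source.get(chunk))

    source = {0: "A", 1: "B"}
    snap = CofwSnapshot(source)
    snap.write_source(0, "A'")        # production keeps changing after the PIT
    print(snap.read_target(0), snap.read_target(1))   # "A B" - still the point-in-time image

Note that read_target still depends on the live source for unchanged chunks, which is why a pointer-based replica is no longer valid if the source device is lost, exactly as stated above.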
Most array-based replication technologies allow the creation of consistent replicas by holding I/O to all devices simultaneously when the PIT is created.
• Typically, applications are spread out over multiple devices.
  – They could be on the same array or on multiple arrays.
• The replication technology must ensure that the PIT for the whole application is consistent.
  – A mechanism is needed to ensure that updates do not occur while the PIT is created.
• Hold I/O to all devices simultaneously for an instant, create the PIT, and release the I/O.
  – I/O cannot be held for too long, or the application will time out.

Mechanisms to hold I/O:
• Host based
  – A host-based application could be used to hold I/O to all the array devices that are to be replicated when the PIT is created.
  – This is typically achieved at the device driver level or above, before the I/O reaches the HBAs.
    ◦ Some vendors implement this at the multi-pathing software layer.
• Array based
  – I/O can be held for all the array devices that are to be replicated by the array operating environment, in the array itself, when the PIT is created.

What if the application straddles multiple hosts and multiple arrays?
• Federated databases
• Some array vendors are able to ensure consistency in this situation

Array Replicas: Restore/Restart Considerations
• Production has a failure
  – Logical corruption
  – Physical failure of production devices
  – Failure of the production server
• Solution
  – Restore data from the replica to production
    ◦ The restore would typically be done in an incremental manner, and the applications would be restarted even before the synchronization is complete, leading to a very small RTO.
  – OR
  – Start production on the replica
    ◦ Resolve issues with production while continuing operations on the replicas.
    ◦ After issue resolution, restore the latest data on the replica to production.

• Before a restore:
  – Stop all access to the production devices and the replica devices.
  – Identify the replica to be used for the restore.
    ◦ Based on RPO and data consistency.
  – Perform the restore.
• Before starting production on the replica:
  – Stop all access to the production devices and the replica devices.
  – Identify the replica to be used for the restart.
    ◦ Based on RPO and data consistency.
  – Create a "gold" copy of the replica.
    ◦ As a precaution against further failures.
  – Start production on the replica.
• RTO drives the choice of replication technology.

• Full volume replicas
  – Restores can be performed to either the original source device or to any other device of like size.
    ◦ Restores to the original source could be incremental in nature.
    ◦ Restores to a new device involve a full synchronization.
• Pointer-based replicas
  – Restores can be performed to the original source, or to any other device of like size, as long as the original source device is healthy.
    ◦ The target only has pointers:
      ▪ Pointers to the source for data that has not been written to after the PIT.
      ▪ Pointers to the "save" location for data that was written after the PIT.
    ◦ Thus, to perform a restore to an alternate volume, the source must be healthy in order to access data that has not yet been copied over to the target.

• Full volume replica
  – The replica is a full physical copy of the source device.
  – The storage requirement is identical to the source device.
  – A restore does not require a healthy source device.
  – Activity on the replica has no performance impact on the source device.
  – Good for full backup, decision support, development, testing, and restore to the last PIT.
  – RPO depends on when the last PIT was created.
  – RTO is extremely small.
• Pointer based – COFW
  – The replica contains pointers to data.
    ◦ The storage requirement is a fraction of the source device (lower cost).
  – A restore requires a healthy source device.
  – Activity on the replica will have some performance impact on the source.
    ◦ Any first write to the source or target requires data to be copied to the save location and the pointer to be moved to the save location.
    ◦ Any read I/O to data not in the save location has to be serviced by the source device.
  – Typically recommended if the changes to the source are less than 30%.
  – RPO depends on when the last PIT was created.
  – RTO is extremely small.
• Full volume – COFA replicas
  – The replica only has data that was accessed.
  – A restore requires a healthy source device.
  – Activity on the replica will have some performance impact.
    ◦ Any first access on the target requires data to be copied to the target before the I/O to or from the target can be satisfied.
  – Typically, replicas created with COFA only are not as useful as replicas created with the full copy mode.
  – The recommendation is to use the full copy mode if the technology allows such an option.

EMC – Local Replication Solutions
All the local replication solutions that were discussed in this module are available on EMC Symmetrix and CLARiiON arrays.
• EMC TimeFinder/Mirror and EMC TimeFinder/Clone are full volume replication solutions on the Symmetrix arrays, while EMC TimeFinder/Snap is a pointer-based replication solution on the Symmetrix. EMC SnapView on the CLARiiON arrays allows full volume replication via SnapView Clone and pointer-based replication via SnapView Snapshot.
• EMC TimeFinder/Mirror: Highly available, ultra-performance mirror images of Symmetrix volumes that can be non-disruptively split off and used as point-in-time copies for backups, restores, decision support, or contingency uses.
• EMC TimeFinder/Clone: Highly functional, high-performance, full volume copies of Symmetrix volumes that can be used as point-in-time copies for data warehouse refreshes, backups, online restores, and volume migrations.
• EMC SnapView Clone: Highly functional, high-performance, full volume copies of CLARiiON volumes that can be used as point-in-time copies for data warehouse refreshes, backups, online restores, and volume migrations.
• EMC TimeFinder/Snap: High-function, space-saving, pointer-based copies (logical images) of Symmetrix volumes that can be used for fast and efficient disk-based restores.
• EMC SnapView Snapshot: High-function, space-saving, pointer-based copies (logical images) of CLARiiON volumes that can be used for fast and efficient disk-based restores.
We will discuss EMC TimeFinder/Mirror and EMC SnapView Snapshot in more detail below.

TimeFinder/Mirror:
• Array-based local replication technology for full volume mirroring on EMC Symmetrix storage arrays.
  – Creates full volume mirrors of an EMC Symmetrix device within an array.
• TimeFinder/Mirror uses special Symmetrix devices called Business Continuance Volumes (BCVs). BCVs:
  – Are devices dedicated to local replication.
  – Can be dynamically and non-disruptively established with a standard device. They can subsequently be split instantly to create a PIT copy of data.
• The PIT copy of data can be used in a number of ways:
  – Instant restore – use BCVs as standby data for recovery
  – Decision support operations
  – Backup
  – Reduce application downtime to a minimum (offline backup)
  – Testing
• TimeFinder/Mirror is available in both open systems and mainframe environments.

• Establish
  – Synchronizes the standard volume to the BCV volume.
  – The BCV is set to a Not Ready state when established.
    ◦ The BCV cannot be independently addressed.
  – Re-synchronization is incremental.
  – BCVs cannot be established to other BCVs.
  – The establish operation is non-disruptive to the standard device.
  – Operations to the standard device can proceed as normal during the establish.
• Split
  – The time of the split is the point-in-time.
  – The BCV is made accessible for BC operations.
  – Consistency
    ◦ Consistent Split
  – Changes are tracked.

The TimeFinder/Mirror Consistent Split option ensures that the data on the BCVs is consistent with the data on the standard devices. Consistent Split holds I/O across a group of devices using a single Consistent Split command; thus all the BCVs in the group are consistent point-in-time copies. It is used to create a consistent point-in-time copy of an entire system, an entire database, or any associated set of volumes. The holding of I/O can be done either by the EMC PowerPath multi-pathing software or by the Symmetrix microcode (Enginuity Consistency Assist). A PowerPath-based consistent split is executed by the host doing the I/O; I/O is held at the host before the split. An Enginuity Consistency Assist (ECA) based consistent split can be executed by the host doing the I/O, or by a control host in an environment where there are distributed and/or related databases; I/O is held at the Symmetrix until the split operation is completed. Since I/O is held at the Symmetrix, ECA can be used to perform consistent splits on BCV pairs across multiple, heterogeneous hosts.
• Restore
  – Synchronizes the contents of the BCV volume to the standard volume.
  – A restore can be full or incremental.
  – The BCV is set to a Not Ready state.
  – I/O to the standard and BCV devices should be stopped before the restore is initiated.
• Query
  – Provides the current status of BCV/standard volume pairs.

TimeFinder/Mirror allows a given standard device to maintain incremental relationships with multiple BCVs. This means that different BCVs can be established and then split incrementally from a standard volume at different times of the day. For example, a BCV that was split at 4:00 a.m. can be re-established incrementally even though another BCV was established and split at 5:00 a.m. In this way, a user can split and incrementally re-establish volumes throughout the day or night and still keep re-establish times to a minimum. Incremental information can be retained between an STD device and multiple BCV devices, provided the BCV devices have not been paired with different STD devices. The incremental relationship is maintained for each STD/BCV pairing by the Symmetrix microcode.

• Two BCVs can be established concurrently with the same standard device.
• The BCVs can be established simultaneously or one after the other.
• BCVs can be split individually or simultaneously.
• Simultaneous "concurrent restores" are not allowed.
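Conceptually, the establish/split/restore cycle can be modelled as a small state machine over a standard/BCV pair. This is only an abstract sketch of the operations described above, with invented names and simplified change tracking; it is not TimeFinder's actual interface or behaviour.

    class MirrorPair:
        """Abstract standard/BCV pair: establish synchronizes, split creates the PIT,
        restore copies the mirror image back. Change tracking keeps re-establish incremental."""

        def __init__(self):
            self.state = "split"
            self.changed_tracks = set()      # tracks written on the standard since the last sync

        def establish(self):
            # Incremental: only tracks changed since the last establish/split are copied.
            copied = self.changed_tracks or {"full copy"}
            self.changed_tracks = set()
            self.state = "established"       # the mirror is Not Ready while established
            return copied

        def write_standard(self, track):
            self.changed_tracks.add(track)   # production I/O continues in either state

        def split(self):
            self.state = "split"             # time of the split is the point-in-time

        def restore(self):
            assert self.state == "split", "stop I/O and split before restoring"
            self.state = "restored"

    pair = MirrorPair()
    pair.establish(); pair.write_standard(42); pair.split()
    print(pair.establish())                  # only track 42 is re-copied: incremental re-establish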
SnapView is software that runs on the CLARiiON storage processors and is part of the CLARiiON Replication Software suite of products, which includes SnapView, MirrorView, and SAN Copy. SnapView can be used to make point-in-time (PIT) copies in two different ways: Clones, also called BCVs or Business Continuity Volumes, are full copies, whereas Snapshots use a pointer-based mechanism. Full copies were covered earlier, with Symmetrix TimeFinder; SnapView Snapshots are covered here. The generic pointer-based mechanism has been discussed in a previous section, so we will concentrate on SnapView here.

Snapshots require a save area, called the Reserved LUN Pool. The "Reserved" part of the name implies that the LUNs are reserved for use by CLARiiON software and can therefore not be assigned to a host. LUNs which cannot be assigned to a host are known as private LUNs in the CLARiiON environment.

To keep the number of pointers, and therefore the pointer map, at a reasonable size, SnapView divides the LUN to be snapped, called a Source LUN, into areas 64 kB in size. Each of these areas is known as a chunk. Any change to data inside a chunk causes that chunk to be written to the Reserved LUN Pool, if it is being modified for the first time. The 64 kB copied from the Source LUN must fit into a 64 kB area in the Reserved LUN, so Reserved LUNs are also divided into chunks for tracking purposes. The Reserved LUN Pool, and the allocation of Reserved LUNs to a Source LUN, are described in more detail below.

The CLARiiON storage system must be configured with a Reserved LUN Pool in order to use SnapView Snapshot features. The Reserved LUN Pool consists of two parts: LUNs for use by SP A and LUNs for use by SP B. Each of those parts is made up of one or more Reserved LUNs. The LUNs used are bound in the normal manner. However, they are not placed in storage groups and allocated to hosts; they are used internally by the storage system software. These are known as private LUNs because they cannot be used, or seen, by attached hosts. Like any LUN, a Reserved LUN is owned by only one SP at any time, and it may be trespassed if the need should arise (i.e., if an SP should fail). Just as each storage system model has a maximum number of LUNs it will support, each also has a maximum number of LUNs which may be added to the Reserved LUN Pool.

The first step in SnapView configuration will usually be the assignment of LUNs to the Reserved LUN Pool. Only then will SnapView sessions be allowed to start. Remember that as snapable
• 32. Remote Replication Concepts
Ÿ Replica is available at a remote facility
    – Could be a few miles away or halfway around the world
    – Backup and vaulting are not considered remote replication
Ÿ Synchronous Replication
    – Replica is identical to the source at all times
    – Zero RPO
Ÿ Asynchronous Replication
    – Replica is behind the source by a finite margin
    – Small RPO
Ÿ Connectivity
    – Network infrastructure over which data is transported from the source site to the remote site
Synchronous Replication
Ÿ A write has to be secured on the remote replica and the source before it is acknowledged to the host
Ÿ Ensures that the source and remote replica have identical data at all times
    – Write ordering is maintained at all times
        Ø The replica receives writes in exactly the same order as the source
Ÿ Synchronous replication provides the lowest RPO and RTO
    – Goal is zero RPO
    – RTO is as small as the time it takes to start the application at the remote site
Ÿ Response Time Extension
    – Application response time will be extended due to synchronous replication
        Ø Data must be transmitted to the remote site before the write can be acknowledged
        Ø Time to transmit will depend on distance and bandwidth
Ÿ Bandwidth
    – To minimize the impact on response time, sufficient bandwidth must be provided at all times
Ÿ Rarely deployed beyond 200 km
Asynchronous Replication
Ÿ Write is acknowledged to the host as soon as it is received by the source
Ÿ Data is buffered and sent to the remote site
    – Some vendors maintain write ordering
    – Other vendors do not maintain write ordering, but ensure that the replica will always be a consistent, restartable image
Ÿ Finite RPO
    – Replica will be behind the source by a finite amount
    – Typically configurable
Ÿ Response time unaffected
Ÿ Bandwidth
    – Need sufficient bandwidth on average
Ÿ Buffers
    – Need sufficient buffers
Ÿ Can be deployed over long distances
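The essential difference between the two modes is when the host write is acknowledged. The short sketch below contrasts the two acknowledgement paths; the function names, the send_to_remote callback, and the queue standing in for the asynchronous buffer are illustrative, not a vendor API.

```python
# Illustrative acknowledgement ordering for synchronous vs. asynchronous replication.
from collections import deque

remote_queue = deque()            # stands in for the buffered writes awaiting transmission

def synchronous_write(data, send_to_remote):
    send_to_remote(data)          # the write must be secured on the replica first...
    return "ack"                  # ...only then is the host acknowledged (adds round-trip time)

def asynchronous_write(data):
    remote_queue.append(data)     # buffered for later transmission; the replica lags by a finite RPO
    return "ack"                  # the host is acknowledged immediately, so response time is unaffected
```

In the asynchronous case the link only has to drain remote_queue on average, which is why average rather than peak bandwidth must be provisioned.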
• 33. Remote Replication Technologies
Ÿ Host based
    – Logical Volume Manager (LVM)
        Ø Synchronous/Asynchronous
    – Log Shipping
Ÿ Storage Array based
    – Synchronous
    – Asynchronous
    – Disk Buffered – Consistent PITs
        Ø Combination of Local and Remote Replication
LVM Based Remote Replication
Ÿ Duplicate Volume Groups at the local and remote sites
Ÿ All writes to the source Volume Group are replicated to the remote Volume Group by the LVM
    – Synchronous or Asynchronous
Ÿ In the event of a network failure (see the sketch after this slide)
    – Writes are queued in the log file
    – When the issue is resolved, the queued writes are sent over to the remote site
    – The maximum size of the log file determines the length of outage that can be withstood
Ÿ In the event of a failure at the source site, production operations can be transferred to the remote site
Ÿ Advantages
    – Different storage arrays and RAID protection can be used at the source and remote sites
    – A standard IP network can be used for replication
    – The response time issue can be eliminated with asynchronous mode, at the cost of an extended RPO
Ÿ Disadvantages
    – Extended network outages require large log files
    – CPU overhead on the host for maintaining and shipping log files
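The network-failure handling described above amounts to spooling writes into a bounded log and draining it when the link returns. A minimal sketch, assuming a hypothetical send_remote callback and log size; it is not any particular LVM's implementation.

```python
# Sketch of LVM-style remote replication with a bounded log for network outages.
# send_remote() and MAX_LOG_ENTRIES are hypothetical; a full log means the outage
# has exceeded what the configuration was designed to withstand.

MAX_LOG_ENTRIES = 10_000

class LvmReplicator:
    def __init__(self, send_remote):
        self.send_remote = send_remote
        self.log = []

    def write(self, block):
        # The local write to the source Volume Group happens first (not shown here).
        try:
            self._drain()                    # flush anything queued from an earlier outage
            self.send_remote(block)
        except ConnectionError:
            if len(self.log) >= MAX_LOG_ENTRIES:
                raise RuntimeError("log full: outage longer than the design allows")
            self.log.append(block)           # queue the write until the link is restored

    def _drain(self):
        while self.log:
            self.send_remote(self.log[0])    # preserve the original write order
            self.log.pop(0)
```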
• 34. Host Based Log Shipping
Log Shipping is a host based replication technology for databases offered by most database vendors.
Ÿ Initial state – all the relevant storage components that make up the database are replicated to a standby server (done over IP or other means) while the database is shut down
Ÿ Database is started on the production server
    – As and when log switches occur, the log file that was closed is sent over IP to the standby server
Ÿ Database is started in standby mode on the standby server; as and when log files arrive, they are applied to the standby database
Ÿ Standby database is consistent up to the last log file that was applied (see the sketch after this slide)
Advantages
Ÿ Minimal CPU overhead on the production server
Ÿ Low bandwidth (IP) requirement
Ÿ Standby database consistent to the last applied log
    – RPO can be reduced by controlling log switching
Disadvantages
Ÿ Need a host based mechanism on the production server to periodically ship logs
Ÿ Need a host based mechanism on the standby server to periodically apply logs and check for consistency
Ÿ An IP network outage could lead to the standby database falling further behind
Array Based – Remote Replication
Replication Process
Ÿ A write is initiated by an application/server
Ÿ Received by the source array
Ÿ Source array transmits the write to the remote array via dedicated channels (ESCON, Fibre Channel or Gigabit Ethernet) over a dedicated or shared network infrastructure
Ÿ Write received by the remote array
Ÿ Only writes are forwarded to the remote array
    – Reads are from the source devices
Array Based – Synchronous Replication
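The log shipping loop described at the top of this slide can be pictured as two cooperating sides: production ships each closed log file, and the standby applies arrivals in order. The file paths and helper functions below are hypothetical stand-ins for the database vendor's own mechanisms.

```python
# Conceptual log shipping: the production side ships each closed log file,
# the standby side applies them in sequence. All names are illustrative.
import os
import shutil

def on_log_switch(closed_log_path, standby_inbox):
    """Production server: called whenever the database closes (switches) a log file."""
    shutil.copy(closed_log_path, standby_inbox)        # ship over IP; low bandwidth requirement

def apply_pending_logs(standby_inbox, apply_log):
    """Standby server: apply newly arrived logs in order to the standby database."""
    for name in sorted(os.listdir(standby_inbox)):     # log files are sequentially named
        path = os.path.join(standby_inbox, name)
        apply_log(path)                                 # vendor-specific recovery/apply step
        os.remove(path)

# The standby database is consistent up to the last applied log, so forcing
# log switches more frequently shrinks the RPO.
```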
• 35. Array Based – Asynchronous Replication
Ÿ No impact on response time
Ÿ Extended distances between arrays
Ÿ Lower bandwidth as compared to synchronous
Ÿ Ensuring consistency
    – Maintain write ordering
        Ø Some vendors attach a time stamp and sequence number to each write, ship the writes to the remote array, and apply them to the remote devices in the exact order given by the time stamps and sequence numbers
        Ø The remote array applies the writes in the exact order they were received, just as in synchronous replication
    – Dependent write consistency (see the sketch after this slide)
        Ø Some vendors buffer the writes in the cache of the source array for a period of time (between 5 and 30 seconds)
        Ø At the end of this time the current buffer is closed in a consistent manner and the buffer is switched; new writes are received in the new buffer
        Ø The closed buffer is then transmitted to the remote array
        Ø The remote replica will contain a consistent, restartable image of the application
Disk buffered consistent PITs are a combination of local and remote replication technologies. The idea is to make a local PIT replica and then create a remote replica of the local PIT. The advantage of disk buffered PITs is lower bandwidth requirements and the ability to replicate over extended distances. Disk buffered replication is typically used when the RPO requirements are of the order of hours, so a lower bandwidth network can be used to transfer data from the local PIT copy to the remote site. The data transfer may take a while, but the solution would be designed to meet the RPO. We will take a look at two disk buffered PIT solutions. Disk buffered replication allows for incremental resynchronization between a local replica, which acts as the source, and its remote replica.
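The dependent-write-consistency approach above is essentially a buffer-switch cycle. The following sketch illustrates that cycle; the class name, the transmit callback, and the 15-second default are illustrative (real arrays do this in cache and switch on an internal timer).

```python
# Sketch of the buffered ("delta set" style) approach to dependent write consistency.
# Names and the cycle trigger are illustrative; vendors use roughly 5-30 second cycles.
import time

class DeltaSetReplicator:
    def __init__(self, transmit, cycle_seconds=15):
        self.transmit = transmit
        self.cycle_seconds = cycle_seconds
        self.open_buffer = []
        self.last_switch = time.monotonic()

    def write(self, data):
        self.open_buffer.append(data)                    # host is acknowledged immediately
        if time.monotonic() - self.last_switch >= self.cycle_seconds:
            self._switch_cycle()

    def _switch_cycle(self):
        closed, self.open_buffer = self.open_buffer, []  # close the buffer consistently and switch
        self.transmit(closed)                            # the remote array applies the whole set atomically,
        self.last_switch = time.monotonic()              # so the replica is always a restartable image
```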
• 36. Benefits include:
Ÿ Reduction in communication link cost and improved resynchronization time for long-distance replication implementations
Ÿ The ability to use the various replicas to provide disaster recovery testing, point-in-time backups, decision support operations, third-party software testing, and application upgrade testing or the testing of new applications
Synchronous + Extended Distance Consistent PIT
Synchronous + Extended Distance Buffered Replication benefits include:
Ÿ The Bunker site provides a zero RPO DR replica
Ÿ The ability to resynchronize only changed data between the intermediate Bunker site and the final target site, reducing required network bandwidth
Ÿ Reduction in communication link cost and improved resynchronization time for long-distance replication implementations
Ÿ The ability to use the replicas to provide disaster recovery testing, point-in-time backups, decision support operations, third-party software testing, and application upgrade testing or the testing of new applications
Remote Replicas – Tracking Changes
Ÿ Remote replicas can be used for BC operations
    – Typically, remote replication operations will be suspended when the remote replicas are used for BC operations
Ÿ During BC operations, changes will or could happen to both the source and the remote replicas
    – Most remote replication technologies have the ability to track changes made to the source and remote replicas to allow for incremental re-synchronization
    – Resuming remote replication operations will require re-synchronization between the source and the replica
• 37. Primary Site Failure – Operations at the Remote Site
Ÿ Remote replicas are typically not available for use while the replication session is in progress
Ÿ In the event of a primary site failure, the replicas have to be made accessible for use
Ÿ Create a local replica of the remote devices at the remote site
Ÿ Start operations at the remote site
    – No remote protection while primary site issues are resolved
Ÿ After issue resolution at the primary site
    – Stop activities at the remote site
    – Restore the latest data from the remote devices to the source
    – Resume operations at the primary (source) site
Array Based – Which Technology? (a small decision sketch follows this slide)
Ÿ Synchronous
    – Is a must if zero RPO is required
    – Needs sufficient bandwidth at all times
    – Application response time elongation will prevent extended distance solutions (rarely above 125 miles)
Ÿ Asynchronous
    – Extended distance solutions with minimal RPO (of the order of minutes)
    – No response time elongation
    – Generally requires lower bandwidth than synchronous
    – Must be designed with adequate cache/buffer or sidefile/logfile capacity
Ÿ Disk Buffered Consistent PITs
    – Extended distance solution with RPO of the order of hours
    – Generally lower bandwidth than synchronous or asynchronous
Storage Array Based – Remote Replication
Ÿ Network Options
    – Most vendors support ESCON or Fibre Channel adapters for remote replication
        Ø Can connect to optical or IP networks with appropriate protocol converters for extended distances
            v DWDM
            v SONET
            v IP networks
    – Some vendors have native Gigabit Ethernet adapters, which allow the array to be connected directly to IP networks without the need for protocol converters
Dense Wavelength Division Multiplexing (DWDM)
Ÿ DWDM is a technology that puts data from different sources together on an optical fiber, with each signal carried on its own separate light wavelength (commonly referred to as a lambda or λ)
Ÿ Up to 32 protected and 64 unprotected separate wavelengths of data can be multiplexed into a light stream transmitted on a single optical fiber
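The "Which Technology?" criteria can be reduced to a small decision helper. The thresholds below are taken from the bullets on this slide (zero RPO, RPO of minutes, RPO of hours, roughly 200 km for synchronous) and are otherwise illustrative, not a vendor recommendation.

```python
# Illustrative decision helper for choosing an array based remote replication mode.
def choose_replication(rpo_seconds, distance_km):
    if rpo_seconds == 0:
        if distance_km > 200:                    # response-time elongation rules out long distances
            raise ValueError("zero RPO beyond ~200 km is impractical with synchronous replication")
        return "synchronous"
    if rpo_seconds <= 30 * 60:                   # RPO of the order of minutes
        return "asynchronous"
    return "disk buffered consistent PIT"        # RPO of the order of hours, lowest bandwidth

print(choose_replication(0, 100))                # synchronous
print(choose_replication(600, 2000))             # asynchronous
print(choose_replication(6 * 3600, 8000))        # disk buffered consistent PIT
```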
• 38. Synchronous Optical Network (SONET)
Synchronous Optical Network (SONET) is a standard for optical telecommunications transport formulated by the Exchange Carriers Standards Association (ECSA) for the American National Standards Institute (ANSI). The equivalent international standard is referred to as Synchronous Digital Hierarchy (SDH) and is defined by the European Telecommunications Standards Institute (ETSI). Within Metropolitan Area Networks (MANs) today, SONET/SDH rings are used to carry both voice and data traffic over fiber.
Ÿ SONET is a Time Division Multiplexing (TDM) technology where traffic from multiple subscribers is multiplexed together and sent out onto the SONET ring as an optical signal
Ÿ Synchronous Digital Hierarchy (SDH) is similar to SONET but is the European standard
Ÿ SONET/SDH offers the ability to service multiple locations, high reliability/availability, automatic protection switching, and restoration
EMC – Remote Replication Solutions
Ÿ EMC Symmetrix Arrays
    – EMC SRDF/Synchronous
    – EMC SRDF/Asynchronous
    – EMC SRDF/Automated Replication
Ÿ EMC CLARiiON Arrays
    – EMC MirrorView/Synchronous
    – EMC MirrorView/Asynchronous
All remote replication solutions that were discussed in this module are available on EMC Symmetrix and CLARiiON arrays. The SRDF (Symmetrix Remote Data Facility) family of products provides Synchronous, Asynchronous and Disk Buffered remote replication solutions on the EMC Symmetrix arrays. The MirrorView family of products provides Synchronous and Asynchronous remote replication solutions on the EMC CLARiiON arrays.
SRDF/Synchronous (SRDF/S): High-performance, host-independent, real-time synchronous remote replication from one Symmetrix to one or more Symmetrix systems.
MirrorView/Synchronous (MirrorView/S): Host-independent, real-time synchronous remote replication from one CLARiiON to one or more CLARiiON systems.
SRDF/Asynchronous (SRDF/A): High-performance, extended-distance asynchronous replication for Symmetrix arrays using a Delta Set architecture for reduced bandwidth requirements and no host performance impact. Ideal for Recovery Point Objectives of the order of minutes.
MirrorView/Asynchronous (MirrorView/A): Asynchronous remote replication on CLARiiON arrays. Designed with low bandwidth requirements, it delivers a cost-effective remote replication solution ideal for Recovery Point Objectives (RPOs) of 30 minutes or greater.
SRDF/Automated Replication: Rapid business restart over any distance with no data exposure, through advanced single-hop and multi-hop configurations using combinations of TimeFinder/Mirror and SRDF on Symmetrix arrays.
EMC SRDF/Synchronous – Introduction
Ÿ Array based synchronous remote replication technology for EMC Symmetrix storage arrays
    – Facility for maintaining real-time, physically separate mirrors of selected volumes
Ÿ SRDF/Synchronous uses special Symmetrix devices
    – Source arrays have SRDF R1 devices
• 39. – Target arrays have SRDF R2 devices
    – Data written to R1 devices is replicated to R2 devices
Ÿ SRDF uses dedicated channels to send data from the source to the target array
    – ESCON, Fibre Channel or Gigabit Ethernet are supported
Ÿ SRDF is available in both Open Systems and Mainframe environments
SRDF Source and Target Volumes
Ÿ SRDF R1 and R2 volumes can have any local RAID protection
    – e.g., volumes could have RAID-1 or RAID-5 protection
Ÿ SRDF R2 volumes are in a Read Only state when remote replication is in effect
    – Changes cannot be made to the R2 volumes
Ÿ SRDF R2 volumes are accessed under certain circumstances
    – Failover – invoked when the primary volumes become unavailable
    – Split – invoked when the R2 volumes need to be concurrently accessed for BC operations
SRDF/Synchronous
SRDF Operations – Failover
Failover operations are performed if the SRDF R1 volumes become unavailable and the decision is made to start operations on the R2 devices. Failover could also be performed when DR processes are being tested, or for any maintenance tasks that have to be performed at the source site.
If failing over for a maintenance operation: for a clean, consistent, coherent point-in-time copy which can be used with minimal recovery on the target side, some or all of the following steps may have to be taken on the source side:
Ÿ Stop all applications (database or otherwise)
Ÿ Unmount the file systems
Ÿ Deactivate the Volume Group
Ÿ A failover leaves the source side in a Read Only (RO) state. If a device suddenly changes from Read Write (RW) to RO while it is in use, the reaction of the host can be unpredictable. Hence the suggestion to stop applications, unmount file systems, and deactivate Volume Groups.
• 40. SRDF Operations – Failback
SRDF Operations – Establish/Restore
Ÿ Establish – resume SRDF operation, retaining data from the source and overwriting any changed data on the target
Ÿ Restore – resume SRDF operation, retaining data on the target and overwriting any changed data on the source
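The failover, failback, establish, and restore operations differ in which side of the R1/R2 pair is writable and in which direction data is copied when mirroring resumes. The following is a simplified conceptual model only; the states and method names are illustrative and are not the actual SYMCLI commands or device flags.

```python
# Conceptual model of R1/R2 device states across SRDF operations (simplified for illustration).

class SrdfPair:
    def __init__(self):
        self.r1, self.r2 = "RW", "RO"         # normal replication: host writes to R1, R2 is read-only

    def failover(self):
        self.r1, self.r2 = "RO", "RW"         # production moves to the target (R2) side
        return "no copy; operations run on R2"

    def failback(self):
        self.r1, self.r2 = "RW", "RO"         # production returns to the source (R1) side
        return "resume mirroring toward R2"

    def establish(self):
        self.r1, self.r2 = "RW", "RO"
        return "copy R1 -> R2"                # changed target data is overwritten from the source

    def restore(self):
        self.r1, self.r2 = "RW", "RO"
        return "copy R2 -> R1"                # changed source data is overwritten from the target

pair = SrdfPair()
print(pair.failover())   # R1 becomes RO, R2 becomes RW
print(pair.failback())   # back to the normal R1 RW / R2 RO relationship
```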
• 41. SRDF Operations – Split
Ÿ Enables read and write operations on both source and target volumes
Ÿ Suspends replication
EMC CLARiiON MirrorView/A Overview
Ÿ Optional storage system software for remote replication on EMC CLARiiON arrays
    – No host cycles used for data replication
Ÿ Provides a remote image for disaster recovery
    – Remote image updated periodically – asynchronously
    – Remote image cannot be accessed by hosts while replication is active
    – A snapshot of the mirrored data can be host-accessible at the remote site
Ÿ Mirror topology (connecting the primary array to secondary arrays)
    – Direct connect and switched FC topologies are supported
    – WAN connectivity is supported using specialized hardware
MirrorView Terms:
Ÿ Primary storage system – holds the primary (local) image for a given mirror
Ÿ Secondary storage system – holds the secondary (remote) image for a given mirror
Ÿ Bidirectional mirroring – a storage system can hold both local and remote images
Ÿ Mirror synchronization – the process that copies data from the local image to the remote image
Ÿ MirrorView Fractured state – the condition when a Secondary storage system is unreachable by the Primary storage system
MirrorView Configuration:
Ÿ MirrorView/A setup
    – MirrorView/A software must be loaded on both the Primary and Secondary storage systems
    – The remote LUN must be exactly the same size as the local LUN
    – The Secondary LUN does not need to be the same RAID type as the Primary
    – Reserved LUN Pool space must be configured
Ÿ Management via Navisphere Manager and CLI
• 42. Consistency Groups allow all LUNs belonging to a given application, usually a database, to be treated as a single entity and managed as a whole. This helps to ensure that the remote images are consistent, i.e., all made at the same point in time. As a result, the remote images are always restartable copies of the local images, though they may contain data which is not as new as that on the primary images.

It is a requirement that all the local images of a Consistency Group be on the same CLARiiON, and that all the remote images for a Consistency Group be on the same remote CLARiiON. All information related to the Consistency Group will be sent to the remote CLARiiON from the local CLARiiON.

The operations which can be performed on a Consistency Group match those which may be performed on a single mirror, and will affect all mirrors in the Consistency Group. If, for some reason, an operation cannot be performed on one or more mirrors in the Consistency Group, then that operation will fail, and the images will be unchanged.
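The all-or-nothing behaviour described above can be sketched as a simple group operation: an action is applied to every mirror in the group or to none of them. The Mirror and ConsistencyGroup classes below are illustrative stand-ins, not the MirrorView API.

```python
# Sketch of Consistency Group semantics: an operation succeeds on every member or on none,
# so the remote images always represent the same point in time. Names are illustrative.

class Mirror:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy

    def can_perform(self, operation):
        return self.healthy

    def perform(self, operation):
        print(f"{operation} applied to {self.name}")

class ConsistencyGroup:
    def __init__(self, mirrors):
        self.mirrors = list(mirrors)

    def apply(self, operation):
        # First verify that every member can perform the operation...
        if not all(m.can_perform(operation) for m in self.mirrors):
            return False                      # ...otherwise fail and leave all images unchanged
        for m in self.mirrors:
            m.perform(operation)              # all remote images advance to the same point in time
        return True

group = ConsistencyGroup([Mirror("LUN_0"), Mirror("LUN_1"), Mirror("LUN_2", healthy=False)])
print(group.apply("synchronize"))             # False: one member cannot comply, so nothing changes
```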