Web Archiving: The Next Phase in the Evolution of Archiving

An Osterman Research White Paper
Published November 2010

Osterman Research, Inc. • P.O. Box 1058 • Black Diamond, Washington 98010-1058
Tel: +1 253 630 5839 • Fax: +1 253 458 0934 • info@ostermanresearch.com
www.ostermanresearch.com • Twitter: @mosterman




Executive Summary
OVERVIEW
The web has become the primary communication and commerce channel for businesses
and government agencies. Digital media (web sites and other web-based content) has
all but replaced print media as the primary mode of communication with customers,
constituents, prospects, investors and others. The web is also becoming the primary
channel for transacting business, managing commerce for everything from online
purchases to tax payments.

However, businesses and governments do not yet understand that they are
liable for everything they publish online. Organizations that do not archive
web content run the risk of not preserving a record of their claims, offers and
other content posted on their web sites. Retaining this content has become
both a legal and regulatory requirement, and so the question is not if web
content should be retained, but only how much and for how long it should be
preserved.

Web archiving has been going on for quite some time, but enterprise-class solutions
have only recently become available. New, state-of-the-art technology is now available
to manage web archiving and it has the power and flexibility to meet existing and
emerging web archiving requirements. As a result, any organization that uses the web
to communicate or manage commerce should consider developing a web archiving
policy and deploying the appropriate technology to support that policy.

KEY TAKEAWAYS
The fundamental message of this white paper is:

•   Web archiving is, without question, a best practice for virtually any organization.
    Organizations that do not archive web content are placing themselves at
    unnecessary risk from both a legal and regulatory viewpoint, and they are denying
    themselves the use of capabilities that can provide a distinct competitive advantage.

•   Web archiving is fundamentally identical to what many organizations have already
    implemented in the context of email archiving, file archiving and long-term retention
    of other types of important business content. In essence, web archiving is merely a
    superset of traditional types of archiving that are already well established in business
    and government.

•   Many current web archiving technologies are not designed with enterprise-class
    capabilities that will retain web content of evidentiary value.

•   Organizations should consider developing a web archiving policy, particularly as
    more content migrates to the web and web-based applications.




©2010 Osterman Research, Inc.                                                              1

ABOUT THIS WHITE PAPER
This white paper discusses the importance and benefits of web archiving and various
use cases for it. It also briefly discusses the sponsor of this white paper and their
relevant offerings in the space.



Why the Web Represents the Next Phase of Archiving
WHAT IS WEB ARCHIVING?
Web archiving is what its name implies: the capture and archival storage of web-based
content. This can include individual web pages, entire web sites, content from web 2.0
applications like social networking sites, and other web-based content that is important
to capture and retain, normally for long periods.

The concept of web archiving is not new. For example, the Wayback Machine – a web
archiving service maintained by the non-profit organization Internet Archive based in
San Francisco, California – has been archiving web content since 1996[i]. However, the
Wayback Machine has several limitations for use in a business context:

•   Web content is captured only intermittently, not on a continuous basis. This can prevent
    the capture of a large proportion of web content, particularly for sites that update
    content frequently. Further, changes to a web page or web site may not be
    captured if the change occurs between content “snapshots”, the frequency of which
    is determined by Internet Archive.

•   There is no guarantee that all web content will be captured.

•   Web content is not necessarily captured in a way that will satisfy evidentiary rules
    during legal or regulatory proceedings.

As a result, while the Wayback Machine is a good first step toward archiving web
content, more sophisticated – and enterprise-class – web archiving is becoming a
necessity for a growing number of applications, as discussed below.
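
The snapshot limitation described above can be illustrated with a minimal sketch. It assumes a hypothetical page whose versions are published at known times and a crawler that captures at known times; the function name `missed_versions` and the sample timestamps are illustrative only and do not describe how the Wayback Machine actually schedules its crawls.

```python
from datetime import datetime


def missed_versions(publish_times, snapshot_times):
    """Return the publish times of page versions that no snapshot captured.

    A version is live from its publish time until the next version is
    published (the last version stays live indefinitely). It is captured
    only if at least one snapshot falls inside that window.
    """
    publish_times = sorted(publish_times)
    missed = []
    for i, start in enumerate(publish_times):
        end = publish_times[i + 1] if i + 1 < len(publish_times) else None
        captured = any(
            start <= snap and (end is None or snap < end)
            for snap in snapshot_times
        )
        if not captured:
            missed.append(start)
    return missed


# Hypothetical example: three versions of a page, two crawler snapshots.
versions = [
    datetime(2010, 11, 1, 9, 0),   # original offer posted
    datetime(2010, 11, 3, 14, 0),  # offer revised
    datetime(2010, 11, 4, 8, 0),   # offer revised again
]
snapshots = [datetime(2010, 11, 2, 0, 0), datetime(2010, 11, 5, 0, 0)]

lost = missed_versions(versions, snapshots)
```

In this example the version published on November 3 is replaced before the next snapshot is taken, so it never reaches the archive, which is exactly the gap that matters when a disputed claim was live for only a short time.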

WHAT DRIVES THE NEED FOR WEB ARCHIVING?
Many of the drivers for web archiving are fundamentally the same as those for email
and other electronic content archiving:

•   Web content can be required for e-discovery and other litigation support
    requirements in much the same way that emails, word processing files, PDF files and
    other content are required.

•   Similarly, web content can be required to demonstrate an organization’s compliance
    (or lack thereof) with regulatory requirements in the context of advertising, forward-
    looking statements, claims of suitability and other content that must – or must not –
    be posted to web sites.





•   Many organizations have a requirement, often driven by a need to reduce risk or
    maintain adequate records, to preserve web site content as part of their overall
    records retention and records management strategy.

•   Unlike more traditional forms of archiving, web archiving can actually be used as a
    competitive and/or investigative tool to understand content posted on competitors’
    web sites.

WEB ARCHIVING vs. SERVER BACKUPS
There are some significant differences between web backups and web archives:

•   Although both a backup and an archive of a web site can reproduce content at a
    later date for forensic, e-discovery or data mining purposes, a web archive will do so
    more quickly, more affordably and more easily.

•   Because of the ubiquity of database-driven web sites, a backup must retain copies
    of all of the files, as well as all of the databases that control the web site.

•   Searching through backups of a web site is much more difficult and more time-
    consuming than searching through an archive.

WEB ARCHIVING: THE NEXT STEP
Web archiving can rightly be considered the next logical extension of an organization’s
traditional archiving of email, files and other electronic content. While email and other
types of electronic content archiving tend to focus on internal content – emails sent to
and from employees and business, word processing files and presentations created for
internal uses, and so forth – web archiving tends to focus much more on publicly
available content. Because the web – including static web sites, web applications, social
networking content, etc. – is primarily public-facing in nature, web archiving focuses
primarily on content that the public has already seen or has had the opportunity to see.

As a result, web archiving is focused to a greater degree than traditional electronic
content archiving on issues like brand protection; reputation management; policy
enforcement; protection of content based on when it is created, posted and taken
down; business continuity and corporate memory.



Archiving Is Already an Established Best Practice
THE WEB IS GROWING RAPIDLY
The amount of content on the web has ballooned exponentially in recent years. For
example, as of December 2009, there were 234 million web sites, 47 million of which
were added just in 2009[ii], an average of nearly 129,000 web sites added every day.
Further, even as far back as 2008 there were well in excess of one trillion unique URLs
on the web and the number continues to grow at a rapid pace.
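
The daily average quoted above follows from simple division. The short check below assumes a 365-day year and uses only the figure given in the text.

```python
# Figure quoted in the text: web sites added during 2009.
sites_added_2009 = 47_000_000
days_in_year = 365  # assumption: 2009 was not a leap year

average_per_day = sites_added_2009 / days_in_year
print(round(average_per_day))  # about 128,767, i.e. nearly 129,000
```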

Growth of the web is being driven by a number of factors, including the ubiquity of web
access, the ease and low cost with which content can be published and updated, and
greater cultural acceptance of the web as a medium of information-sharing and
commerce. For these reasons, both business and government are increasingly reliant on
the web as their primary means of communication and process management.

Consequently, the market for web archiving – as well as archiving of email, files,
SharePoint content and other information – is growing at a healthy pace. Web
archiving, currently a small segment of the total content archiving market, is poised to
become an enormous area of growth, driven by the issues discussed in this white paper.

GROWTH IN THE MARKET IS DRIVEN BY A VARIETY OF FACTORS
For just about any company, government agency or educational institution, there are
four primary drivers for archiving their electronic content. However, the importance of
these drivers will vary by an organization’s size, the industry(ies) in which it participates,
the advice of its internal and external legal counsel or compliance officers, and the
locales in which it operates:

•   Driver #1: Litigation
    Electronic content stores, including web sites, contain a growing proportion of
    business records that must be preserved for long periods of time. Further, this
    content is frequently requested during discovery proceedings under the Federal
    Rules of Civil Procedure (FRCP) and their state counterparts. As a result, it is
    critical that all relevant electronic content be made available for e-discovery
    purposes.

    Further, when a hold on data is required, it is imperative that an organization
    immediately be able to begin preserving all relevant data. For example, if a dispute
    arises because of a claim made on a page of a company’s web site, that content
    must be preserved for as long as a court, regulator or other authorized entity may
    deem necessary. An enterprise-class web archiving system allows organizations to
    immediately place a hold on data when requested by a court or on the advice of
    legal counsel.

    If an organization is not able to adequately place a hold on data when it is obligated
    to do so, it can suffer a variety of serious consequences, ranging from
    embarrassment to major legal sanctions or heavy fines. Litigants that fail to
    preserve electronic content properly are subject to a wide variety of consequences,
    including brand damage, additional costs for third parties to review or search for
    data, court sanctions, directed verdicts or instructions to a jury that it can view a
    defendant’s failure to produce data as evidence of culpability.

    In addition to the e-discovery and legal hold benefits, an enterprise-class web
    archiving system allows an organization to perform either formal or informal early
    case assessment activities. For example, if a customer makes a claim against a
    company based on a statement made on the company’s web site, senior managers
    can search the archive for information that will help them determine the potential
    liability they face. If this assessment of the potential lawsuit results in a
    determination that the company was indeed wrong in making the claim, they can
    instruct legal counsel to pursue a quick legal settlement. If, on the other hand, the
    assessment results in the discovery of information that supports the company’s
    position, that information can be used to convince the customer to drop the case or
    it can help win the case if it goes to trial. In either case, an archiving system can
    help the organization to understand its position early on, either avoiding
    unnecessary legal fees or an adverse judgment, or reducing its costs by proving the
    sufficiency of its case.

•   Driver #2: Regulatory Compliance
    For just about every organization, there are a large and growing number of
    regulatory obligations to preserve electronic content. Some of the more important
    requirements are:

    o   Sarbanes-Oxley Act of 2002
        The Sarbanes-Oxley Act of 2002 requires all public companies and their auditors
        to retain such relevant records as audit workpapers, memoranda,
        correspondence and electronic records for a period of seven years. Further,
        Section 403 of Sarbanes-Oxley amended Section 16 of the Securities
        Exchange Act of 1934 to include a requirement for public companies to post
        certain types of content on their web sites.

        Under Sarbanes-Oxley, company officers are obliged to report internal controls
        and procedures for financial reporting and auditors are required to test the
        internal control structures. Businesses have to ensure that information is
        preserved – whether paper or electronic – that would be relevant to the
        company’s financial reporting.

    o   Health Insurance Portability and Accountability Act of 1996 (HIPAA)
        All organizations operating in the healthcare field need to comply with HIPAA to
        ensure the safety of Protected Health Information. Organizations are required to
        protect the data from unauthorized users, as well as to retain for six years a
        broad range of documentation regarding their compliance.

        As part of the American Recovery and Reinvestment Act of 2009 (ARRA), the
        provisions of HIPAA have been significantly expanded. A key component of
        ARRA is the Health Information Technology for Economic and Clinical Health Act
        (HITECH). Now, business partners of entities already covered by HIPAA, such as
        pharmacies, healthcare providers and others, are required to comply with HIPAA
        provisions. This includes attorneys, accounting firms, external billing companies
        and others that do business with covered entities. While these business
        associates were accountable to the covered entities with which they did business
        under the old HIPAA, these associates are now liable for governmental penalties
        under the new law.

        The consequences of HIPAA violations have been expanded dramatically. For
        example, if a covered entity or one of its business associates loses 500 or more
        patient records, it must notify HHS and a “prominent media outlet” of what has
        occurred. Section 13402 of HITECH requires that if a “covered entity has
        insufficient or out-of-date contact information for 10 or more individuals, the
        covered entity must provide substitute individual notice by either posting the
        notice on the home page of its web site or by providing the notice in major print
        or broadcast media where the affected individuals likely reside.”

        Fines for HIPAA violations can now reach as high as $1.5 million per calendar
        year.

    o   Securities and Exchange Commission Rules
        Members of national securities exchanges, brokers and dealers are obliged to
        preserve all records for a minimum of six years, the first two years in an easily
        accessible place (SEC Rule 17a-4). The affected records are broad and encompass
        originals of communications generated and received by individuals within financial
        institutions, including inter-office memoranda and internal audit working papers.
        Also included are automated messages sent to all customers, which could include
        email blasts. The records may be "immediately produced or reproduced on
        'micrographic media' [microfilm, microfiche or similar] or by means of 'electronic
        storage media'." As noted above, the Securities Exchange Act of 1934 has been
        amended to specifically include the requirement to post certain types of content
        on the web.

    o   Financial Industry Regulatory Authority (FINRA)
        FINRA is a non-governmental regulator formed in 2007 by the merger of various
        functions of the New York Stock Exchange and the National Association of
        Securities Dealers. FINRA manages a wide variety of rules that are imposed upon
        the more than 5,000 brokerage firms and nearly 675,000 registered representatives
        it oversees.

        FINRA requires that various types of communications with the public must be
        filed prior to their use, including content that often would be posted on web
        sites[iii]. This includes CMO advertisements, sales literature and investment
        analysis tools.

        Recent FINRA Disciplinary Actions Related to Web Content

        •   An individual posted false and misleading information on a Google Finance
            bulletin board relating to securities recommendations. The posting contained
            predictions and projections of future prices for the securities that were
            recommended, but the posting was made without approval. FINRA fined the
            individual $10,000 and suspended him from associating with any FINRA
            member for six months.

        •   A company made false and misleading statements on its web site related to
            low cost commission rates and direct access to traders. The company was
            censured and fined $20,000.

        •   An affiliate of a company participated in and won CD auctions without
            disclosing it was an auction participant. Further, the advertising materials
            used contained misleading, unwarranted and exaggerated statements, and
            published misleading market clearing yields on its web site. The company was
            found to have violated Rule 2210 and fined $225,000.




    o   Model Requirements for the Management of Electronic Records
        (MoReq)
        MoReq is a specification, originally developed in 2001, that defines the functional
        requirements for the manner in which electronic records are managed in an
        Electronic Records Management System. MoReq has been used widely in
        Europe and has been updated with MoReq2.

    o   Other requirements
        A small sampling of the many other requirements for data retention are FINRA
        3010, the Investment Advisers Act of 1940 (hedge funds), the Gramm-Leach-
        Bliley Act, IDA 29.7, FDA 21 CFR Part 11, OCC Advisory, the Financial
        Modernization Act 1999, Medicare Conditions of Participation, the Fair Labor
        Standards Act, the Americans with Disabilities Act, the Toxic Substances Control
        Act, the UK Companies Act, the UK Company Law Reform Bill - Electronic
        Communications, the UK Combined Code on Corporate Governance 2003, the UK
        Human Rights Act, Basel II, and the Markets in Financial Instruments Directive.

•   Driver #3: Knowledge Management and Data Mining
    There is an enormous amount of useful content posted to a company’s own web
    site and to other sites. Mining an archive of this content makes it possible to
    identify and extract information about companies’ products, their public financial
    information, their participation in trade shows and a wealth of other details.
    Applications for this information include competitive analysis, determination of
    compliance with various statutes, performing analytics to determine at what time of
    year certain events take place, and so on.

•   Driver #4: Maintain Corporate Memory
    Web archiving can be very useful for maintaining a corporate record of what has
    been posted to a web site, how long this content was maintained or when it was
    replaced. For example, a company may want a record of its web site for historical
    purposes, or it may need an archive in order to re-use some of its content at a later
    date. Maintaining an accurate archive of web content can significantly reduce the
    costs associated with recreating this content.
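
The retention periods and legal-hold behavior described under Drivers #1 and #2 can be sketched as a simple disposal rule. This is a minimal illustration, not a description of any product: the `ArchivedItem` class, the `RETENTION_YEARS` table and the `can_dispose` function are hypothetical names, the retention values are the ones quoted above (seven years for Sarbanes-Oxley records, six years for HIPAA compliance documentation and for SEC Rule 17a-4 records), and an actual policy should come from legal counsel.

```python
from dataclasses import dataclass
from datetime import date

# Retention periods in years, as quoted in the text above (illustrative only).
RETENTION_YEARS = {"sox": 7, "hipaa": 6, "sec_17a4": 6}


@dataclass
class ArchivedItem:
    url: str
    captured_on: date
    regulation: str           # key into RETENTION_YEARS
    legal_hold: bool = False  # set when a court or counsel requires a hold


def retention_end(item: ArchivedItem) -> date:
    """Date on which the retention period for this item expires."""
    target_year = item.captured_on.year + RETENTION_YEARS[item.regulation]
    try:
        return item.captured_on.replace(year=target_year)
    except ValueError:
        # Captured on 29 February and the target year is not a leap year.
        return item.captured_on.replace(year=target_year, day=28)


def can_dispose(item: ArchivedItem, today: date) -> bool:
    """An item may be disposed of only after retention ends and with no hold."""
    return not item.legal_hold and today >= retention_end(item)


# Hypothetical example: a broker-dealer web page captured in November 2010.
item = ArchivedItem("https://example.com/offer", date(2010, 11, 15), "sec_17a4")
before_expiry = can_dispose(item, date(2016, 11, 14))
after_expiry = can_dispose(item, date(2016, 11, 15))

item.legal_hold = True  # a dispute arises over the page
while_on_hold = can_dispose(item, date(2020, 1, 1))
```

The design point is that a legal hold overrides the retention schedule: once the flag is set, the item stays in the archive regardless of how long ago its retention period expired.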



The Consequences of Not Archiving Web Content
The vast majority of organizations do not adequately archive their web content and they
face a number of risks from not doing so:

•   Increased risk in legal disputes
    An inability to produce past content from web sites – as with any electronic content
    – carries with it increased risk during legal actions. This includes an inability to
    produce time-stamped copies of web pages that will be admissible in court, an
    inability to respond to e-discovery requests when specific web content is required,
    and an inability to place legal holds on data so that existing web content is not
    overwritten when a legal dispute has been initiated or is anticipated.




•   Risk of non-compliance with regulatory obligations
    Many heavily regulated organizations, such as broker-dealers, have specific
    obligations to make (or not make) statements or claims on their web site. For
    example, FINRA Rule 2210 requires broker-dealers to archive their institutional
    communications, retail communications and correspondence. Because advertising
    and other public-facing communications often appear on regulated entities’ web
    sites, it is critical that web content is archived.

•   Loss of context for notices, marketing messages, etc.
    An organization that is not able to archive its web content cannot easily provide the
    context for its various web-based marketing messages and other communications.
    The use of this otherwise lost historical data can help a company keep track of past
    marketing campaigns, offers, policy statements, notifications to the public and a
    wide range of other content.

•   An inability to prove when statements were made or retracted
    Similarly, not archiving web content makes it very difficult to prove exactly when
    content was posted to or removed from a web site or web page. For example, if a
    press release is embargoed until a certain date and time, a web archiving system
    can demonstrate exactly when the content was posted, and conversely can prove
    that the content was not posted before the embargo had been lifted.

    Another example is that of warning letters issued by the US Food and Drug
    Administration. These letters warn pharmaceutical manufacturers and other
    regulated companies about misleading statements, missing information and other
    claims. One of the many examples of such letters is an October 18, 2010
    letter[iv] to a pharmaceutical company, in which it was advised that two of its web
    pages discussing a magnetic resonance imaging contrast media it produces “omits
    important information about the approved indication for [the product], and both
    webpages misleadingly suggest unapproved new uses for the drugs.”

    Maintaining a web archive is critical to ensuring that an accurate record of content
    can be preserved and demonstrated when required.

•   Loss of digital heritage/corporate memory
    When web content is not archived, a significant proportion of an organization’s
    digital heritage – or corporate memory – simply disappears. Preservation of this
    content is important on a number of levels – legal, regulatory, productivity, etc. –
    but also because it represents something of the corporate history of the firm in the
    form of announcements to the public and other content that constitutes an
    organization’s digital record.

•   An inability to gauge the effectiveness of web campaigns
    Some organizations use their web site extensively to present marketing campaigns,
    post notifications of sales or special offers, and announce promotions of various
    types. If an organization cannot accurately archive its web content, it is at a
    disadvantage when attempting to correlate customer activity like sales calls or web
    inquiries with the specific timing of announcements and other web content.



•   Productivity and monetary loss from recreating unarchived content
    If web content is not archived and must be recreated, there can be significant time
    and money lost by those who created the original content, those who must code the
    content anew, etc. A web archive can, therefore, make various types of employees
    more efficient and save the organization money by allowing web content to be easily
    discovered and reused.
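
Several of the risks above come down to proving what a page said and when. One common way to make captures verifiable is to store a cryptographic hash and a UTC timestamp with each capture; the sketch below shows the idea using only the Python standard library. The record layout and the names `make_capture_record`, `verify_capture` and `first_and_last_seen` are assumptions for illustration, and a hash recorded by the archiving party alone does not by itself satisfy any court's evidentiary rules.

```python
import hashlib
from datetime import datetime, timezone


def make_capture_record(url, content, captured_at=None):
    """Build a capture record holding the content hash and capture time."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at": captured_at,
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def verify_capture(record, content):
    """Return True if the content still matches the hash in the record."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


def first_and_last_seen(records, content):
    """Earliest and latest capture times at which this exact content was seen."""
    digest = hashlib.sha256(content).hexdigest()
    times = sorted(r["captured_at"] for r in records if r["sha256"] == digest)
    return (times[0], times[-1]) if times else None


# Hypothetical example: a press release that replaces a placeholder page.
placeholder = b"<html>Coming soon</html>"
press_release = b"<html>Embargoed release</html>"
t1 = datetime(2010, 11, 1, 9, 0, tzinfo=timezone.utc)
t2 = datetime(2010, 11, 2, 9, 0, tzinfo=timezone.utc)

records = [
    make_capture_record("https://example.com/news", placeholder, t1),
    make_capture_record("https://example.com/news", press_release, t2),
]
seen = first_and_last_seen(records, press_release)
```

With records like these, the embargo example reduces to a lookup: the earliest capture whose hash matches the press release bounds when it was posted, and any later edit to the stored copy would cause `verify_capture` to fail.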



There Are Many Use Cases for Web Archiving
There is a large and diverse set of use cases for web archiving, some examples of which
are discussed below:

•   Facilitating regulatory compliance
    There is a wide range of applications for web archiving in the context of regulatory
    compliance. For example, state consumer protection agencies, the Federal Trade
    Commission, various watchdog groups and similar organizations worldwide have an
    interest in monitoring the claims, advertising, marketing messages and other content
    posted by companies on their web sites. Archiving web content from these
    companies is crucial to monitoring their compliance with various regulations and
    statutes. One example of the myriad compliance obligations that exist is
    the aforementioned FINRA Rule 2210, a set of compliance obligations imposed on
    broker-dealers and certain others in the financial services industry to advertise their
    services accurately.

    Similarly, government agencies have obligations with regard to state sunshine and
    freedom-of-information laws to provide content to citizens upon demand. Archiving
    of web content posted on government-operated web sites is key to helping
    government agencies fulfill their obligations under these requirements.

•   Checking web content for copyright violations
    Web archiving can be extremely useful in capturing content from various sources on
    the web and then searching that content for potential violations of copyright. For
    example, a major US-based men’s magazine uses the Wayback Machine roughly
    every month to search for content on the web that might be using its trademarked
    logo or other content, particularly its published images. As noted above, while the
    Wayback Machine offers some utility for this type of application, an enterprise-class
    web archiving capability can provide timelier and more complete information, not to
    mention the ability to accurately determine when content was posted and deleted
    from web pages. This can be particularly important in cases where a violator takes
    down content after receiving notice of a legal action by a copyright holder – an
    inability to prove exactly when the content was taken down can undermine a legal
    case.

    An important case in this regard was Innervision Web Solutions’ use of the domain
    name “DellComputersSuck.com”. Because Dell contended that Innervision had used
    the domain name to redirect visitors to the Innervision web site for commercial gain,
    and because they were able to prove this based on archived web content, Dell was
    able to have this domain transferred to its ownership because Innervision was found
    to have registered the domain in bad faith.

•   Proving the bona fides of expert witnesses
    The Federal Rules of Civil Procedure, Rule 26 requires that expert witnesses whose
    testimony is introduced during legal proceedings offer “the witness’s qualifications,
    including a list of all publications authored in the previous 10 years.” Because a
    growing proportion of many such experts’ publications are electronic in nature, such
    as blog posts or other web-based content, it is increasingly important for this
    content to be available to all parties during a legal proceeding.

    From the perspective of the litigating party that has not hired an expert witness, it is
    particularly important to be able to access web archives of all of the content offered
    by that witness. For example, if a litigant can access content older than 10 years, or
    if they can uncover an obscure blog post that might be contrary to the testimony
    offered in court, this may prove to be extremely valuable.

•   Demonstrating the veracity of electronic content
    In Vinhnee v. American Express, the defendant owed American Express in excess of
    $40,000 and the company sued to recover. Although American Express presented
    records of the defendant’s monthly statements, the company could not demonstrate
    the authenticity of these records and so lost the case, even after an appeal.

    In another case, Janssen-Ortho Inc. v. Novopharm Limited, an affidavit was
    presented that contained the link to a home page, but it did not include a copy of
    the page contents. The Federal Court in Canada that heard the case did not accept
    this affidavit, finding it to represent insufficient evidence.

    In both cases, a web archiving capability that could demonstrate the veracity of the
    information presented, along with verifiable time and date stamping, would likely
    have enabled the losing party to win its case.

•   Performing marketing analysis
    A web archiving capability can be very useful when researching various types of
    marketing messages as part of a promotional campaign, even when this research is
    about a competitor. For example, a hotel chain may wish to archive the web
    content of its three leading competitors to determine when specific messages were
    posted to the web and when they were taken down. This information can then be
    correlated with sales data, marketing reports and other information to determine
    which messages were most or least effective.

•   Conducting research
    A web archiving capability can be extraordinarily useful in a wide range of research
    applications, such as a journalist exploring the positions of a political candidate prior
    to conducting an interview, a customer researching exactly when a company’s stated
    policy was first posted to its web site or when it was withdrawn, a human resources
    staffer investigating statements posted to a blog or Facebook wall by a
    prospective employee, or a company establishing when and where information about a
    trade secret was first posted to the web. These are only a few of the many
    potential research-focused use cases for web archiving.
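
As a minimal sketch of how timestamped captures can answer the question "when was this posted, and when was it taken down?", the Python below scans a series of snapshots for a phrase. The Snapshot record, the visibility_window function and the sample hotel messages are hypothetical names and data introduced only for illustration; they do not describe any particular archiving product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One archived capture of a page (hypothetical record layout)."""
    captured_at: datetime
    text: str


def visibility_window(
    snapshots: Iterable[Snapshot], phrase: str
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (first_seen, first_missing) capture times for a phrase.

    first_seen is the earliest capture containing the phrase. first_missing is
    the earliest later capture that no longer contains it, or None if the
    phrase never disappears. Both are None if the phrase is never captured.
    """
    ordered = sorted(snapshots, key=lambda s: s.captured_at)
    needle = phrase.lower()
    first_seen: Optional[datetime] = None
    for snap in ordered:
        present = needle in snap.text.lower()
        if first_seen is None:
            if present:
                first_seen = snap.captured_at
        elif not present:
            return first_seen, snap.captured_at
    return first_seen, None


# Illustrative data: weekly captures of a competitor's rates page.
captures = [
    Snapshot(datetime(2010, 3, 1), "Spring rates from $99"),
    Snapshot(datetime(2010, 3, 8), "Spring rates from $99. Kids stay free!"),
    Snapshot(datetime(2010, 3, 15), "Spring rates from $99. Kids stay free!"),
    Snapshot(datetime(2010, 3, 22), "Summer rates from $129"),
]
posted, removed = visibility_window(captures, "kids stay free")
print("first captured:", posted, "| first missing:", removed)
```

The answer can only be as precise as the capture interval: with weekly captures, the message was posted at some point in the week before the first capture that contains it. This is one reason frequent, regular capture matters when exact posting and removal dates are in dispute.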

THE BOTTOM LINE
While there are a variety of applications for web archiving technology, the bottom line is
that web content must be preserved for the same reasons that email and other
electronic content must be archived. This was summarized in a landmark court decision (see endnote v)
in which the presiding judge wrote, “This Court sees no reason to treat websites
differently than other electronic files.”



Key Issues in Selecting a Web Archiving Vendor
There are a number of features, functions and capabilities that decision makers should
consider as they evaluate web archiving solutions. Among these are the following:

BREADTH OF WEB CONTENT ARCHIVING
A web archiving solution should be able to archive a wide variety of content, from
individual web pages to entire web sites. This should also include social media pages,
RSS feeds, blogs and any other content that might be required for e-discovery, research
or other uses.

SUPPORT FOR A WIDE RANGE OF TECHNOLOGIES
A wide and growing variety of technologies are used on the web, including Adobe Flash,
AJAX, JavaScript, PHP, various image formats (JPG, PNG, GIF, etc.), video content and
other formats. Any web archiving technology must be able to capture content delivered
using all of these technologies. Further, it must accommodate new technologies as they become
available.

FLEXIBILITY OF ARCHIVING
A web archiving platform must also provide flexibility in the timing of archiving. Unlike
email or file archiving that is driven by the creation of discrete emails or files, web
archiving is based on specific timing requirements. For example, a web archive should
be able to archive all necessary web content at regular intervals, on a one-off basis,
automatically, manually, etc. In short, a web archiving platform must be able to archive
web content whenever it is required.
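
To illustrate the timing flexibility described above, the following Python is a rough sketch of how a scheduler might decide which pages are due for capture. The CaptureTarget record, the due_for_capture function and the example.com URLs are assumptions made for this example, not features of any specific product. A target with no interval is treated as manual (one-off) only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CaptureTarget:
    """A URL to archive and how often (hypothetical configuration record)."""
    url: str
    interval: Optional[timedelta]              # None means manual / one-off only
    last_captured: Optional[datetime] = None   # None means never captured


def due_for_capture(targets: List[CaptureTarget], now: datetime) -> List[str]:
    """List the URLs whose scheduled interval has elapsed as of `now`."""
    due = []
    for target in targets:
        if target.interval is None:
            continue  # captured only when someone explicitly requests it
        never_captured = target.last_captured is None
        if never_captured or now - target.last_captured >= target.interval:
            due.append(target.url)
    return due


# Illustrative schedule: daily home page, weekly blog, hourly press page,
# and a legal page that is captured manually.
now = datetime(2010, 11, 1, 12, 0)
schedule = [
    CaptureTarget("http://example.com/", timedelta(days=1), datetime(2010, 10, 31, 11, 0)),
    CaptureTarget("http://example.com/blog", timedelta(days=7), datetime(2010, 10, 30)),
    CaptureTarget("http://example.com/press", timedelta(hours=1)),
    CaptureTarget("http://example.com/legal", None),
]
print(due_for_capture(schedule, now))
```

Keeping the interval on each target, rather than using a single global setting, lets frequently updated pages be captured more often than static ones.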

ANALYSIS AND REPORTING TOOLS
Web archiving capabilities should also provide robust analysis and reporting tools so that
content can be analyzed for purposes of e-discovery, litigation support, regulatory
compliance, marketing analysis or other purposes; or for purposes of reporting high-
level results to senior managers. For example, senior counsel may want to analyze an
entire web site’s contents over a particular date range for a set of keywords that may be
required as part of an early case assessment exercise. Or, a marketing manager may
want to search a competitor’s blog over the past year to search for instances of business
partners being mentioned. Analysis tools will ideally support the creation of charts to
aid in the analysis of trends, such as comparisons of web content over time.
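
The keyword analysis described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: archived pages are represented as (capture date, text) pairs, and keyword_hits_by_month is a hypothetical helper written for this example, not part of any archiving product. It counts, per month, the captures within a date range that mention any of the given keywords, which is the kind of series a trend chart could be built from.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple


def keyword_hits_by_month(
    pages: Iterable[Tuple[date, str]],
    keywords: List[str],
    start: date,
    end: date,
) -> Dict[str, int]:
    """Count captures per month that mention any keyword within [start, end]."""
    terms = [k.lower() for k in keywords]
    counts: Counter = Counter()
    for captured_on, text in pages:
        if not (start <= captured_on <= end):
            continue  # outside the requested date range
        lowered = text.lower()
        if any(term in lowered for term in terms):
            counts[captured_on.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


# Illustrative data: captures of a competitor's blog.
blog_captures = [
    (date(2009, 12, 31), "Acme named as launch partner"),
    (date(2010, 1, 5), "Partner Acme joins our reseller program"),
    (date(2010, 1, 20), "Office closed for the holiday"),
    (date(2010, 2, 3), "New integration with Acme and Globex"),
]
print(keyword_hits_by_month(blog_captures, ["acme", "globex"],
                            date(2010, 1, 1), date(2010, 12, 31)))
```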


INTEGRATION WITH EXISTING SYSTEMS
Web archiving capabilities should integrate with other systems in place in the
organization, including analysis tools, existing archiving systems for email, etc. The
ability to integrate with these systems will make searching and analyzing web content
easier and more efficient, and will allow organizations to respond more quickly to time-
sensitive requests. Further, integration with existing systems will allow data to be
analyzed without users learning a new tool, interface, etc.

DELIVERY MODELS
A web archiving platform should support a flexible delivery model. While many
organizations prefer an on-premises solution that can be managed completely behind the
corporate firewall, a growing number of organizations are opting for cloud-based
solutions that are completely managed by a third-party service provider.

FISMA-COMPLIANCE FOR FEDERAL GOVERNMENT CUSTOMERS
The Federal Information Security Management Act of 2002 (FISMA) requires US federal
agencies to create, implement and document an information security program to
support their information management goals. A key goal of FISMA is the archiving of
information assets, including web sites. Consequently, a best practice focused on FISMA
compliance will include regular capture of all relevant web site content, including secure,
long-term storage of this content.

ABILITY TO PERFORM FULL-TEXT/CONTENT SEARCHING
Another important feature of any web archiving solution is the ability to search for
content using full-text/content searching capabilities. This is particularly important when
searching for specific keywords or phrases during an e-discovery or similar exercise in
much the same way that this type of search is critical for any other type of archived
content, such as email or files.

USE OF ORGANIZATIONAL TOOLS
Organizational tools are also a very useful feature for a web archiving system because they
allow reviewers to organize content for subsequent searches. For example, the ability
to organize content into folders, tag specific sections or pages for later review, or add
notes to pages or sections is very helpful for paralegals who are scouring archived web
content for later and more thorough review by senior counsel.

ABILITY TO COLLABORATE USING ARCHIVED CONTENT
Finally, it is important that any web archiving capability allow users to collaborate based
on this archived content. Just as with email or other types of content archiving, teams
of individuals will normally work on large cases involving archived web content and their
ability to collaborate is essential.




Conclusion: Consider Web Archiving
Because the web continues to grow in importance for both business and government as
a medium for communication and commerce, archiving of web content should become
an essential element of any organization’s risk mitigation and compliance strategy. As a
result, organizations should seriously consider developing a web archiving policy and
deploying technology that can support this policy.



About the Sponsor of This White Paper
ABOUT REED TECHNOLOGY
Reed Technology & Information Services (RTIS) offers the Reed Tech Web Archiving
Service for corporate enterprises, government, and professional services companies.

Reed Tech has been providing clients with information capture, conversion,
management, distribution and transformation services for almost 50 years. Reed Tech’s
clients include large government agencies like the U.S. Patent & Trademark Office, a
wide range of pharmaceutical and other life sciences companies, and law firms of all
sizes.

Reed Tech is a wholly-owned subsidiary of Reed Elsevier, an $8b global provider of
professional information and online workflow solutions in the Science, Medical, Legal,
and Risk and Business sectors. With almost 1,000 full time employees, Reed Tech
reports in through LexisNexis, a leading global provider of content-enabled workflow
solutions to professionals in law firms, corporations, government, law enforcement, tax,
accounting, academic institutions and risk and compliance assessment.

ABOUT ITERASI
Iterasi Inc. creates enterprise-class web archiving technology applications specifically
for regulatory compliance, litigation protection, and e-discovery. Pete Grillo, CEO,
founded the company in 2007.




© 2010 Osterman Research, Inc. All rights reserved.

No part of this document may be reproduced in any form by any means, nor may it be distributed without the permission
of Osterman Research, Inc., nor may it be resold or distributed by any entity other than Osterman Research, Inc., without
prior written authorization of Osterman Research, Inc.

Osterman Research, Inc. does not provide legal advice. Nothing in this document constitutes legal advice, nor shall this
document or any software product or other offering referenced herein serve as a substitute for the reader’s compliance
with any laws (including but not limited to any act, statute, regulation, rule, directive, administrative order, executive
order, etc. (collectively, “Laws”)) referenced in this document. If necessary, the reader should consult with competent
legal counsel regarding any Laws referenced herein. Osterman Research, Inc. makes no representation or warranty
regarding the completeness or accuracy of the information contained in this document.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. ALL EXPRESS OR IMPLIED
REPRESENTATIONS, CONDITIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE
DETERMINED TO BE ILLEGAL.

i     http://www.archive.org/about/faqs.php#The_Wayback_Machine
ii    http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/
iii   Filing Communications for FINRA Review Webcast
iv    http://www.fda.gov/ICECI/EnforcementActions/WarningLetters/ucm230796.htm
v     Arteria Prop. Pty Ltd. v. Universal Funding V.T.O., Inc., 2008 WL 4513696 (D.N.J. Oct. 1, 2008)



  • 1. Web Archiving: The Next Phase in the Evolution of Archiving by An Osterman Research White Paper Published November 2010 SPONSORED BY ( ! ! ! ! ! ! ! ! !! ! ! ! !"#$!#%&'()*( Osterman Research, Inc. • P.O. Box 1058 • Black Diamond, Washington 98010-1058 Tel: +1 253 630 5839 • Fax: +1 253 458 0934 • info@ostermanresearch.com www.ostermanresearch.com • Twitter: @mosterman !
  • 2. Web Archiving: The Next Phase in the Evolution of Archiving Executive Summary OVERVIEW The web has become the primary communication and commerce channel for businesses and government agencies. Digital media (web sites and other web-based content) has all but replaced print media as the primary mode of communication with customers, constituents, prospects, investors and others. The web is also becoming the primary channel for transacting business, managing commerce for everything from online purchases to tax payments. However, business and governments do not yet understand that they are liable for everything they publish online. Organizations that do not archive web content run the risk of not preserving a record of their claims, offers and other content posted on their web sites. Retaining this content has become both a legal and regulatory requirement, and so the question is not if web content should be retained, but only how much and for how long it should be preserved. Web archiving has been going on for quite some time, but enterprise-class solutions have only recently become available. New, state-of-the-art technology is now available to manage web archiving and it has the power and flexibility to meet existing and emerging web archiving requirements. As a result, any organization that uses the web to communicate or manage commerce should consider developing a web archiving policy and deploy the appropriate technology to support that policy. KEY TAKEAWAYS The fundamental message of this white paper is: • Web archiving is, without question, a best practice for virtually any organization. Organizations that do not archive web content are placing their organizations at unnecessary risk from both a legal and regulatory viewpoint, and they are denying themselves the use of capabilities that can provide a distinct competitive advantage. 
• Web archiving is fundamentally identical to what many organizations have already implemented in the context of email archiving, file archiving and long-term retention of other types of important business content. In essence, web archiving is merely a superset of traditional types of archiving that are already well established in business and government. • Many current web archiving technologies are not designed with enterprise-class capabilities that will retain web content of evidentiary value. • Organizations should consider developing a web archiving policy, particularly as more content migrates to the web and web-based applications. ©2010 Osterman Research, Inc. 1
  • 3. Web Archiving: The Next Phase in the Evolution of Archiving ABOUT THIS WHITE PAPER This white paper discusses the importance and benefits of web archiving and various use cases for it. It also briefly discusses the sponsor of this white paper and their relevant offerings in the space. Why the Web Represents the Next Phase of Archiving WHAT IS WEB ARCHIVING? Web archiving is what its name implies: the capture and archival storage of web-based content. This can include individual web pages, entire web sites, content from web 2.0 applications like social networking sites, and other web-based content that is important to capture and retain, normally for long periods. The concept of web archiving is not new. For example, the Wayback Machine – a web archiving service maintained by the non-profit organization Internet Archive based in San Francisco, California – has been archiving web content since 1996i. However, the Wayback Machine has several limitations for use in a business context: • Web content is captured only periodically, not on a regular basis. This can prevent the capture of a large proportion of web content, particularly for sites that update content frequently. Further, changes to a web page or web site may not be captured if the change occurs between content “snapshots”, the frequency of which is determined by Internet Archive. • There is no guarantee that all web content will be captured. • Web content is not necessarily captured in a way that will satisfy evidentiary rules during legal or regulatory proceedings. As a result, while the Wayback Machine is a good first step toward archiving web content, more sophisticated – and enterprise-class – web archiving is becoming a necessity for a growing number of applications, as discussed below. WHAT DRIVES THE NEED FOR WEB ARCHIVING? 
Many of the drivers for web archiving are fundamentally the same as those for email and other electronic content archiving: • Web content can be required for e-discovery and other litigation support requirements in much the same way that emails, word processing files, PDF files and other content are required. • Similarly, web content can be required to demonstrate an organization’s compliance (or lack thereof) with regulatory requirements in the context of advertising, forward- looking statements, claims of suitability and other content that must – or must not – be posted to web sites. ©2010 Osterman Research, Inc. 2
  • 4. Web Archiving: The Next Phase in the Evolution of Archiving • Many organizations have a requirement, often driven by a need to reduce risk or maintain adequate records, to preserve web site content as part of their overall records retention and records management strategy. • Unlike more traditional forms of archiving, web archiving can actually be used as a competitive and/or investigative tool to understand content posted on competitors’ web sites. WEB ARCHIVING vs. SERVER BACKUPS There are some significant differences between web backups and web archives: • Although both a backup and an archive of a Web site can reproduce content at a later date for forensic, e-discovery or data mining purposes, a web archive will do so more quickly, more affordably and more easily. • Because of the ubiquity of database-driven web sites, a backup must retain archives of all of the files, as well as all of the databases that control the web site. • Searching through backups of a web site is much more difficult and more time- consuming than searching through an archive. WEB ARCHIVING: THE NEXT STEP Web archiving can rightly be considered the next logical extension of an organization’s traditional archiving of email, files and other electronic content. While email and other types of electronic content archiving tend to focus on internal content – emails sent to and from employees and business, word processing files and presentations created for internal uses, and so forth – web archiving trends to focus much more on publicly available content. Because the web – including static web sites, web applications, social networking content, etc. – is primarily public-facing in nature, web archiving focuses primarily on content that the public has already seen or has had the opportunity to see. 
As a result, web archiving is focused to a greater degree than traditional electronic content archiving on issues like brand protection; reputation management; policy enforcement; protection of content based on when it is created, posted and taken down; business continuity and corporate memory. Archiving Is Already an Established Best Practice THE WEB IS GROWING RAPIDLY The amount of content on the web has ballooned exponentially in recent years. For example, as of December 2009, there were 234 million web sites, 47 million of which were added just in 2009ii - an average of nearly 129,000 web sites added every day. Further, even as far back as 2008 there were well in excess of one trillion unique URLs on the web and the number continues to grow at a rapid pace. Growth of the web is being driven by a number of factors, including the ubiquity of web access, the ease and low cost with which content can be published and updated, and ©2010 Osterman Research, Inc. 3
  • 5. Web Archiving: The Next Phase in the Evolution of Archiving greater cultural acceptance of the web as a medium of information-sharing and commerce. For these reasons, both business and government are increasingly reliant on the web as their primary means of communications and process management. Consequently, the market for web archiving – as well as archiving of email, files, SharePoint content and other information – is growing at a healthy pace. Web archiving, currently a small segment of the total content archiving market, is poised to become an enormous area of growth, driven by the issues discussed in this white paper. GROWTH IN THE MARKET IS DRIVEN BY A VARIETY OF FACTORS For just about any company, government agency or educational institution, there are four primary drivers for archiving their electronic content. However, the importance of these drivers will vary by an organization’s size, the industry(ies) in which it participates, the advice of its internal and external legal counsel or compliance officers, and the locales in which it operates: • Driver #1: Litigation Electronic content stores, including web sites, contain a growing proportion of business records that must be preserved for long periods of time. Further, this content is frequently requested during discovery proceedings because of the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is critical that all relevant electronic content be made available for e-discovery purposes. Further, when a hold on data is required, it is imperative that an organization immediately be able to begin preserving all relevant data. For example, if a dispute arises because of a claim made on a page of a company’s web site, that content must be preserved for as long as a court, regulator or other authorized entity may deem necessary. 
An enterprise-class web archiving system allows organizations to immediately place a hold on data when requested by a court or on the advice of legal counsel. If an organization is not able to adequately place a hold on data when it is obligated to do so, it can suffer a variety of serious consequences, ranging from embarrassment to major legal sanctions or heavy fines. Litigants that fail to preserve electronic content properly are subject to a wide variety of consequences, including brand damage, additional costs for third parties to review or search for data, court sanctions, directed verdicts or instructions to a jury that it can view a defendant’s failure to produce data as evidence of culpability.

In addition to the e-discovery and legal hold benefits, an enterprise-class web archiving system allows an organization to perform either formal or informal early case assessment activities. For example, if a customer makes a claim against a company based on a statement made on the company’s web site, senior managers can search the archive for information that will help them determine the potential liability they face. If this assessment of the potential lawsuit results in a determination that the company was indeed wrong in making the claim, they can instruct legal counsel to pursue a quick legal settlement. If, on the other hand, the
assessment results in the discovery of information that supports the company’s position, that information can be used to convince the customer to drop the case or it can help win the case if it goes to trial. In either case, an archiving system can help the organization to understand its position early on, either avoiding unnecessary legal fees or an adverse judgment, or reducing its costs by proving the sufficiency of its case.

• Driver #2: Regulatory Compliance
For just about every organization, there are a large and growing number of regulatory obligations to preserve electronic content. Some of the more important requirements are:

o Sarbanes-Oxley Act of 2002
The Sarbanes-Oxley Act of 2002 requires all public companies and their auditors to retain such relevant records as audit workpapers, memoranda, correspondence and electronic records for a period of seven years. Further, Section 403 of Sarbanes-Oxley amended Section 16 of the Securities Exchange Act of 1934 to include a requirement for public companies to post certain types of content on their web sites.

Under Sarbanes-Oxley, company officers are obliged to report internal controls and procedures for financial reporting and auditors are required to test the internal control structures. Businesses have to ensure that information that would be relevant to the company’s financial reporting is preserved, whether paper or electronic.

o Health Insurance Portability and Accountability Act of 1996 (HIPAA)
All organizations operating in the healthcare field need to comply with HIPAA to ensure the safety of Protected Health Information. Organizations are required to protect the data from unauthorized users, as well as to retain for six years a broad range of documentation regarding their compliance.

As part of the American Recovery and Reinvestment Act of 2009 (ARRA), the provisions of HIPAA have been significantly expanded.
A key component of ARRA is the Health Information Technology for Economic and Clinical Health Act (HITECH). Now, business partners of entities already covered by HIPAA, such as pharmacies, healthcare providers and others, are required to comply with HIPAA provisions. This includes attorneys, accounting firms, external billing companies and others that do business with covered entities. While these business associates were accountable to the covered entities with which they did business under the old HIPAA, these associates are now liable for governmental penalties under the new law.

The consequences of HIPAA violations have been expanded dramatically. For example, if a covered entity or one of its business associates loses 500 or more patient records, it must notify HHS and a “prominent media outlet” to let them know what has occurred. Section 13402 of HITECH requires that if a “covered entity has insufficient or out-of-date contact information for 10 or more individuals, the
covered entity must provide substitute individual notice by either posting the notice on the home page of its web site or by providing the notice in major print or broadcast media where the affected individuals likely reside.” Fines for HIPAA violations can now reach as high as $1.5 million per calendar year.

o Securities and Exchange Commission Rules
Members of national securities exchanges, brokers and dealers are obliged to preserve all records for a minimum of six years, the first two years in an easily accessible place (SEC Rule 17a-4). The affected records are broad and encompass originals of communications generated and received by individuals within financial institutions, including inter-office memoranda and internal audit working papers. Also included are automated messages sent to all customers, which could include email blasts. The records may be "immediately produced or reproduced on 'micrographic media' [microfilm, microfiche or similar] or by means of 'electronic storage media'." As noted above, the Securities Exchange Act of 1934 has been amended to specifically include the requirement to post certain types of content on the web.

o Financial Industry Regulatory Authority (FINRA)
FINRA is a non-governmental regulator formed in 2007 by the merger of various functions of the New York Stock Exchange and the National Association of Securities Dealers. FINRA manages a wide variety of rules that are imposed upon the more than 5,000 brokerage firms and nearly 675,000 registered representatives it oversees.

FINRA requires that various types of communications with the public must be filed prior to their use, including content that often would be posted on web sites [iii]. This includes CMO advertisements, sales literature and investment analysis tools.

Sidebar: Recent FINRA Disciplinary Actions Related to Web Content

• An individual posted false and misleading information on a Google Finance bulletin board relating to securities recommendations. The posting contained predictions and projections of future prices for the securities that were recommended, but the posting was made without approval. FINRA fined the individual $10,000 and suspended him from associating with any FINRA member for six months.

• A company made false and misleading statements on its web site related to low cost commission rates and direct access to traders. The company was censured and fined $20,000.

• An affiliate of a company participated in and won CD auctions without disclosing it was an auction participant. Further, the advertising materials used contained misleading, unwarranted and exaggerated statements, and published misleading market clearing yields on its web site. The company was found to have violated Rule 2210 and fined $225,000.
o Model Requirements for the Management of Electronic Records (MoReq)
MoReq is a specification, originally developed in 2001, that defines the functional requirements for the manner in which electronic records are managed in an Electronic Records Management System. MoReq has been used widely in Europe and has been updated with MoReq2.

o Other requirements
A small sampling of the many other requirements for data retention are FINRA 3010, the Investment Advisers Act of 1940 (hedge funds), the Gramm-Leach-Bliley Act, IDA 29.7, FDA 21 CFR Part 11, OCC Advisory, the Financial Modernization Act 1999, Medicare Conditions of Participation, the Fair Labor Standards Act, the Americans with Disabilities Act, the Toxic Substances Control Act, the UK Companies Act, the UK Company Law Reform Bill - Electronic Communications, the UK Combined Code on Corporate Governance 2003, the UK Human Rights Act, Basel II, and the Markets in Financial Instruments Directive.

• Driver #3: Knowledge Management and Data Mining
There is an enormous amount of useful content that is posted to a company’s own web site or other sites. This includes information about companies’ products, their public financial information, their participation in trade shows and a wealth of other types of content that can be identified and extracted. Applications for this information include competitive analysis, determination of compliance with various statutes, performing analytics to determine at what time of year certain events take place, and so on.

• Driver #4: Maintain Corporate Memory
Web archiving can be very useful for maintaining a corporate record of what has been posted to a web site, how long this content was maintained or when it was replaced. For example, a company may want a record of its web site for historical purposes, or it may need an archive in order to re-use some of its content at a later date.
Maintaining an accurate archive of web content can significantly reduce the costs associated with recreating this content.

The Consequences of Not Archiving Web Content

The vast majority of organizations do not adequately archive their web content and they face a number of risks from not doing so:

• Increased risk in legal disputes
An inability to produce past content from web sites – as with any electronic content – carries with it increased risk during legal actions. This includes an inability to produce time-stamped copies of web pages that will be admissible in court, an inability to respond to e-discovery requests when specific web content is required, and an inability to place legal holds on data so that existing web content is not overwritten when a legal dispute has been initiated or is anticipated.
• Risk of non-compliance with regulatory obligations
Many heavily regulated organizations, such as broker-dealers, have specific obligations to make (or not make) statements or claims on their web site. For example, FINRA Rule 2210 requires broker-dealers to archive their institutional communications, retail communications and correspondence. Because advertising and other public-facing communications often appear on regulated entities’ web sites, it is critical that web content is archived.

• Loss of context for notices, marketing messages, etc.
An organization that is not able to archive its web content cannot easily provide the context for its various web-based marketing messages and other communications. The use of this otherwise lost historical data can help a company keep track of past marketing campaigns, offers, policy statements, notifications to the public and a wide range of other content.

• An inability to prove when statements were made or retracted
Similarly, not archiving web content makes it very difficult to prove exactly when content was posted to or removed from a web site or web page. For example, if a press release is embargoed until a certain date and time, a web archiving system can demonstrate exactly when the content was posted, and conversely can prove that the content was not posted before the embargo had been lifted.

Another example is that of warning letters issued by the US Food and Drug Administration. These letters warn pharmaceutical manufacturers and other regulated companies about misleading statements, missing information and other claims.
One of the many examples of such letters is an October 18, 2010 letter [iv] to a pharmaceutical company, in which it was advised that two of its web pages discussing a magnetic resonance imaging contrast media it produces “omits important information about the approved indication for [the product], and both webpages misleadingly suggest unapproved new uses for the drugs.” Maintaining a web archive is critical to ensuring that an accurate record of content can be preserved and demonstrated when required.

• Loss of digital heritage/corporate memory
When web content is not archived, a significant proportion of an organization’s digital heritage – or corporate memory – simply disappears. Preservation of this content is important on a number of levels – legal, regulatory, productivity, etc. – but also because it represents something of the corporate history of the firm in the form of announcements to the public and other content that constitutes an organization’s digital record.

• An inability to gauge the effectiveness of web campaigns
Some organizations use their web site extensively to present marketing campaigns, post notifications of sales or special offers, and announce promotions of various types. If an organization cannot accurately archive its web content, it is at a disadvantage when attempting to correlate customer activity like sales calls or web inquiries with the specific timing of announcements and other web content.
• Productivity and monetary loss from recreating unarchived content
If web content is not archived and must be recreated, there can be significant time and money lost by those who created the original content, those who must code the content anew, etc. A web archive can, therefore, make various types of employees more efficient and save the organization money by allowing web content to be easily discovered and reused.

There Are Many Use Cases for Web Archiving

There is a large and diverse set of use cases for web archiving, some examples of which are discussed below:

• Facilitating regulatory compliance
There is a wide range of applications for web archiving in the context of regulatory compliance. For example, state consumer protection agencies, the Federal Trade Commission, various watchdog groups and similar organizations worldwide have an interest in monitoring the claims, advertising, marketing messages and other content posted by companies on their web sites. Archiving web content from these companies is crucial to monitoring their compliance with various regulations and statutes. One example of the myriad such compliance obligations is the aforementioned FINRA Rule 2210, a set of obligations imposed on broker-dealers and certain others in the financial services industry to advertise their services accurately.

Similarly, government agencies have obligations with regard to state sunshine and freedom-of-information laws to provide content to citizens upon demand. Archiving of web content posted on government-operated web sites is key to helping government agencies fulfill their obligations under these requirements.

• Checking web content for copyright violations
Web archiving can be extremely useful in capturing content from various sources on the web and then searching that content for potential violations of copyright.
For example, a major US-based men’s magazine uses the Wayback Machine roughly every month to search for content on the web that might be using its trademarked logo or other content, particularly its published images. As noted above, while the Wayback Machine offers some utility for this type of application, an enterprise-class web archiving capability can provide timelier and more complete information, not to mention the ability to accurately determine when content was posted and deleted from web pages. This can be particularly important in cases where a violator takes down content after receiving notice of a legal action by a copyright holder – an inability to prove exactly when the content was taken down can undermine a legal case.

An important case in this regard was Innervision Web Solutions’ use of the domain name “DellComputersSuck.com”. Because Dell contended that Innervision had used the domain name to redirect visitors to the Innervision web site for commercial gain, and because they were able to prove this based on archived web content, Dell was
able to have this domain transferred to its ownership because Innervision was found to have registered the domain in bad faith.

• Proving the bona fides of expert witnesses
Rule 26 of the Federal Rules of Civil Procedure requires that expert witnesses whose testimony is introduced during legal proceedings offer “the witnesses’ qualifications, including a list of all publications authored in the previous 10 years.” Because a growing proportion of many such experts’ publications are electronic in nature, such as blog posts or other web-based content, it is increasingly important for this content to be available to all parties during a legal proceeding.

From the perspective of the litigating party that has not hired an expert witness, it is particularly important to be able to access web archives of all of the content offered by that witness. For example, if a litigant can access content older than 10 years, or if they can uncover an obscure blog post that might be contrary to the testimony offered in court, this may prove to be extremely valuable.

• Demonstrating the veracity of electronic content
In Vinhnee v. American Express, the defendant owed American Express in excess of $40,000 and the company sued to recover. Although American Express presented records of the defendant’s monthly statements, the company could not demonstrate the authenticity of these records and so lost the case, even after an appeal.

In another case, Janssen-Ortho Inc. v. Novopharm Limited, an affidavit was presented that contained the link to a home page, but it did not include a copy of the page contents. The Federal Court in Canada that heard the case did not accept this affidavit, finding it to represent insufficient evidence.
In both cases, a web archiving capability that could demonstrate the veracity of the information presented, along with verifiable time and date stamping, would likely have enabled the losing party to win its case.

• Performing marketing analysis
A web archiving capability can be very useful when researching various types of marketing messages as part of a promotional campaign, even when this research is about a competitor. For example, a hotel chain may wish to archive the web content of its three leading competitors to determine when specific messages were posted to the web and when they were taken down. This information can then be correlated with sales data, marketing reports and other information to determine which messages were most or least effective.

• Conducting research
A web archiving capability can be extraordinarily useful in a wide range of research applications, such as a journalist exploring the positions of a political candidate prior to conducting an interview, a customer researching exactly when a company’s stated policy was first posted to its web site or when it was withdrawn, a human resources staffer investigating the statements made in a blog post or on a Facebook wall by a prospective employee, or when and where information about a trade secret was first
posted to the web, to name but a few of the tens of thousands of potential use cases for web archiving focused on research.

THE BOTTOM LINE

While there are a variety of applications for web archiving technology, the bottom line is that web content must be preserved for the same reasons that email and other electronic content must be archived. This was summarized in a landmark court decision [v] in which the presiding judge wrote, “This Court sees no reason to treat websites differently than other electronic files.”

Key Issues in Selecting a Web Archiving Vendor

There are a number of features, functions and capabilities that decision makers should consider as they evaluate web archiving solutions. Among these are the following:

BREADTH OF WEB CONTENT ARCHIVING

A web archiving solution should be able to archive a wide variety of content, from individual web pages to entire web sites. This should also include social media pages, RSS feeds, blogs and any other content that might be required for e-discovery, research or other uses.

SUPPORT FOR A WIDE RANGE OF TECHNOLOGIES

A wide and growing variety of technologies are used on the web, including Adobe Flash, AJAX, JavaScript, PHP, various image formats (JPG, PNG, GIF, etc.), video content and other formats. Any web archiving technology must be able to archive content built with all of these technologies. Further, it must accommodate new technologies as they become available.

FLEXIBILITY OF ARCHIVING

A web archiving platform must also provide flexibility in the timing of archiving. Unlike email or file archiving that is driven by the creation of discrete emails or files, web archiving is based on specific timing requirements. For example, a web archive should be able to archive all necessary web content at regular intervals, on a one-off basis, automatically, manually, etc. In short, a web archiving platform must be able to archive web content whenever it is required.
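The timing requirement described above (captures at regular intervals plus one-off manual captures) can be illustrated with a minimal Python sketch. It is not taken from any vendor's product: the `Snapshot` and `Archive` names, the one-day interval and the example URL are all hypothetical, and the page bytes are passed in directly rather than fetched from the web. Each capture stores a UTC timestamp and a SHA-256 digest so that a later reviewer can tell when a version was recorded and whether its content changed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    url: str
    captured_at: datetime  # UTC time the capture was recorded
    sha256: str            # digest of the captured bytes
    trigger: str           # "scheduled" or "manual"


class Archive:
    """Hypothetical in-memory archive; a real system would persist snapshots."""

    def __init__(self, interval: timedelta) -> None:
        self.interval = interval
        self.snapshots: List[Snapshot] = []

    def latest(self, url: str) -> Optional[Snapshot]:
        matches = [s for s in self.snapshots if s.url == url]
        return matches[-1] if matches else None

    def capture(self, url: str, body: bytes, now: datetime, trigger: str) -> Snapshot:
        # One-off capture: always recorded, regardless of the schedule.
        snap = Snapshot(url, now, hashlib.sha256(body).hexdigest(), trigger)
        self.snapshots.append(snap)
        return snap

    def capture_if_due(self, url: str, body: bytes, now: datetime) -> Optional[Snapshot]:
        # Scheduled capture: recorded only if no capture exists yet
        # or the configured interval has elapsed since the last one.
        last = self.latest(url)
        if last is None or now - last.captured_at >= self.interval:
            return self.capture(url, body, now, "scheduled")
        return None


archive = Archive(interval=timedelta(days=1))
url = "https://www.example.com/pricing"  # placeholder URL
t0 = datetime(2010, 11, 1, 9, 0, tzinfo=timezone.utc)

first = archive.capture_if_due(url, b"<html>v1</html>", t0)
skipped = archive.capture_if_due(url, b"<html>v1</html>", t0 + timedelta(hours=6))
manual = archive.capture(url, b"<html>v2</html>", t0 + timedelta(hours=6), "manual")
```

In this sketch the second scheduled attempt is skipped because only six hours have passed, while the manual capture is recorded immediately. Measuring the interval from the most recent capture of any kind is a design choice of the sketch, not a requirement stated in the paper.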
ANALYSIS AND REPORTING TOOLS

Web archiving capabilities should also provide robust analysis and reporting tools so that content can be analyzed for purposes of e-discovery, litigation support, regulatory compliance, marketing analysis or other purposes; or for purposes of reporting high-level results to senior managers. For example, senior counsel may want to analyze an entire web site’s contents over a particular date range for a set of keywords that may be required as part of an early case assessment exercise. Or, a marketing manager may want to search a competitor’s blog over the past year for instances of business partners being mentioned. Analysis tools will ideally support the creation of charts to aid in the analysis of trends, such as comparisons of web content over time.
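As a rough illustration of the keyword analysis over a date range described above, the following Python sketch counts how often each keyword appears per month in archived page text. The record layout (a capture date paired with extracted text), the function name `keyword_trend` and the sample data are assumptions made for the example, not features of any particular archiving product.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical archive record: (capture date, extracted page text)
Record = Tuple[date, str]


def keyword_trend(records: Iterable[Record], keywords: List[str],
                  start: date, end: date) -> Dict[str, Counter]:
    """Count case-insensitive keyword occurrences per month within [start, end]."""
    trend: Dict[str, Counter] = {k: Counter() for k in keywords}
    for captured, text in records:
        if not (start <= captured <= end):
            continue  # outside the requested date range
        lowered = text.lower()
        month = captured.strftime("%Y-%m")
        for k in keywords:
            trend[k][month] += lowered.count(k.lower())
    return trend


sample_records = [
    (date(2010, 1, 5), "Lifetime warranty on all models. Warranty terms apply."),
    (date(2010, 3, 9), "New pricing announced."),
    (date(2011, 1, 1), "Warranty discontinued."),
]
trend = keyword_trend(sample_records, ["warranty"],
                      date(2010, 1, 1), date(2010, 12, 31))
```

The per-month counts could then feed a chart of how often a term appeared on a site over time. Simple substring counting is used here for brevity; a production tool would more likely tokenize the text.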
INTEGRATION WITH EXISTING SYSTEMS

Web archiving capabilities should integrate with other systems in place in the organization, including analysis tools, existing archiving systems for email, etc. The ability to integrate with these systems will make searching and analyzing web content easier and more efficient, and will allow organizations to respond more quickly to time-sensitive requests. Further, integration with existing systems will allow data to be analyzed without users learning a new tool, interface, etc.

DELIVERY MODELS

A web archiving platform should support a flexible delivery model. While many organizations prefer an on-premise solution that can be managed completely behind the corporate firewall, a growing number of organizations are opting for cloud-based solutions that are completely managed by a third-party service provider.

FISMA-COMPLIANCE FOR FEDERAL GOVERNMENT CUSTOMERS

The Federal Information Security Management Act of 2002 (FISMA) requires US federal agencies to create, implement and document an information security program to support their information management goals. A key goal of FISMA is the archiving of information assets, including web sites. Consequently, a best practice focused on FISMA compliance will include regular capture of all relevant web site content, including secure, long-term storage of this content.

ABILITY TO PERFORM FULL-TEXT/CONTENT SEARCHING

Another important feature of any web archiving solution is the ability to search for content using full-text/content searching capabilities. This is particularly important when searching for specific keywords or phrases during an e-discovery or similar exercise, in much the same way that this type of search is critical for any other type of archived content, such as email or files.
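One common way to support full-text searching of this kind is an inverted index, which maps each term to the archived pages that contain it. The Python sketch below is a minimal illustration under stated assumptions: the page identifiers (a page name plus capture date), the sample text and the function names are hypothetical, and the query semantics (all terms must appear) are a simplification of what a commercial search engine would offer.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of archived page IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted IDs of pages containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


# Hypothetical archived captures, keyed by page name and capture date.
archived_pages = {
    "home@2010-10-01": "Low cost commission rates",
    "home@2010-11-01": "Direct access to traders",
    "promo@2010-11-01": "Low commission rates and direct access",
}
page_index = build_index(archived_pages)
```

With this sample data, `search(page_index, "commission rates")` returns the two captures that mention both terms, and the capture date embedded in each ID shows which version of a page carried the wording.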
USE OF ORGANIZATIONAL TOOLS

Organizational tools are also a very useful feature for a web archiving system because they allow reviewers to organize content for subsequent searches. For example, the ability to organize content into folders, tag specific sections or pages for later review, or add notes to pages or sections is very helpful for paralegals who are scouring archived web content for later and more thorough review by senior counsel.

ABILITY TO COLLABORATE USING ARCHIVED CONTENT

Finally, it is important that any web archiving capability allow users to collaborate based on this archived content. Just as with email or other types of content archiving, teams of individuals will normally work on large cases involving archived web content and their ability to collaborate is essential.
Conclusion: Consider Web Archiving

Because the web continues to grow in importance for both business and government as a medium for communication and commerce, archiving of web content should become an essential element of any organization’s risk mitigation and compliance strategy. As a result, organizations should seriously consider developing a web archiving policy and deploying technology that can support this policy.

About the Sponsor of This White Paper

ABOUT REED TECHNOLOGY

Reed Technology & Information Services (RTIS) offers the Reed Tech Web Archiving Service for corporate enterprises, government, and professional services companies. Reed Tech has been providing clients with information capture, conversion, management, distribution and transformation services for almost 50 years. Reed Tech’s clients include large government agencies like the U.S. Patent & Trademark Office, a wide range of pharmaceutical and other life sciences companies, and law firms of all sizes.

Reed Tech is a wholly-owned subsidiary of Reed Elsevier, an $8 billion global provider of professional information and online workflow solutions in the Science, Medical, Legal, and Risk and Business sectors. With almost 1,000 full time employees, Reed Tech reports in through LexisNexis, a leading global provider of content-enabled workflow solutions to professionals in law firms, corporations, government, law enforcement, tax, accounting, academic institutions and risk and compliance assessment.

ABOUT ITERASI

Iterasi Inc. creates enterprise-class web archiving technology applications specifically for regulatory compliance, litigation protection, and e-discovery. Pete Grillo, CEO, founded the company in 2007.
© 2010 Osterman Research, Inc. All rights reserved.

No part of this document may be reproduced in any form by any means, nor may it be distributed without the permission of Osterman Research, Inc., nor may it be resold or distributed by any entity other than Osterman Research, Inc., without prior written authorization of Osterman Research, Inc.

Osterman Research, Inc. does not provide legal advice. Nothing in this document constitutes legal advice, nor shall this document or any software product or other offering referenced herein serve as a substitute for the reader’s compliance with any laws (including but not limited to any act, statute, regulation, rule, directive, administrative order, executive order, etc. (collectively, “Laws”)) referenced in this document. If necessary, the reader should consult with competent legal counsel regarding any Laws referenced herein. Osterman Research, Inc. makes no representation or warranty regarding the completeness or accuracy of the information contained in this document.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. ALL EXPRESS OR IMPLIED REPRESENTATIONS, CONDITIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE DETERMINED TO BE ILLEGAL.

[i] http://www.archive.org/about/faqs.php#The_Wayback_Machine
[ii] http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/
[iii] Filing Communications for FINRA Review Webcast
[iv] http://www.fda.gov/ICECI/EnforcementActions/WarningLetters/ucm230796.htm
[v] Arteria Prop. Pty Ltd. v. Universal Funding V.T.O., Inc., 2008 WL 4513696 (D.N.J. Oct. 1, 2008)