Web Archiving for
               Compliance & eDiscovery




ALEPH ARCHIVES Ltd. ✉ 600 Blv de Maisonneuve suite 1700 - Montréal, Québec (Canada) / chemin des Croix-Rouges 16 - 1007 Lausanne (Switzerland)
                                             ✎ info@aleph-archives.com ☞ aleph-archives.com
Copyright © 2012 Aleph Archives. All Rights Reserved.




WEB ARCHIVING

INTRODUCTION
Quick access to digital data and electronic information stored online is a «must have» when it comes to
devising strategies for litigation or regulatory compliance disputes.


There are, however, many obstacles to providing and managing such access efficiently, given both the
frequent complexity of these disputes and the legal context in which they arise. It is often impossible,
or too late, to obtain the relevant information when it is needed, for example during eDiscovery
processes.


Aleph Archives is an IT service provider dedicated to companies with specific needs regarding Web-
content preservation. Aleph Archives offers turnkey tools to easily and efficiently retrieve relevant data
stored online.


According to recent research, the average life expectancy of a website is less than 75 days, and
disputes over the content of websites are on the increase. In a number of countries, regulatory and
archiving compliance rules (e.g. the Sarbanes-Oxley Act, the Health Insurance Portability and
Accountability Act, the Gramm-Leach-Bliley Act and the Federal Rules of Civil Procedure in the US)
govern the different industry sectors, and authorities (e.g. the SEC and FINRA in the US, the Financial
Services Authority in the UK) supervise those sectors on that basis.


Through a unique cloud-based Web archiving platform named CAMA®, Aleph Archives provides
«Web Preservation» services for regulatory compliance, litigation support and eDiscovery, helping
corporate entities and legal and governmental authorities collect, manage and archive their large and
growing volume of Web content. CAMA® is the only platform that archives and keeps records of
your websites, webpages, and web presence at large. CAMA® provides clear evidence of which website
content was shown to a particular end user during a visit and, equally important, which content
(and hence which data) was not.


Web archiving for eDiscovery is a recent "technological niche", as opposed to legacy eDiscovery,
which has been used for years to preserve electronic data (e.g. email, files, etc.). The Web archiving
eDiscovery process is based on three main features, as outlined by the Electronic Discovery
Reference Model: thorough gathering of electronically stored information from websites, full access
to and playback of any archived web content, and conversion to a form that allows full-text search.







PRODUCTS & INNOVATION

CAMA® Web Archiving Platform
Aleph Archives is a pioneer in the domain of Web archiving. We offer high-quality archive accessibility
and rendering. With CAMA®, Aleph Archives raises the web archiving process and the related quality
assurance (QA) to a higher level by working with crawl engineering experts, dedicated QA teams and
a powerful, yet easy-to-use, archive access technology¹.




[Figure: CAMA® in action: archived (07/04/2011) version of Toyota’s Corporate website.
Callouts: « Load the archived version with a click »; « Testimonials and videos ».]


Aleph Archives targets companies in need of strict, reliable archiving processes to ensure
compliance with SEC and FINRA regulations. The CAMA® Web archiving platform is more efficient
and more reliable than the solutions of its main competitors. Aleph Archives offers technology that is
open (WARC, ISO 28500:2009²), adaptive (cloud-based computing) and innovative (scheduled crawls,
export of Web archives as PDF/PNG, antivirus checks, CAMA® Appliance, real-time deduplication of
search results, multilingual search and translation, etc.).


1   Products demo at: http://www.youtube.com/user/alepharchives/

2   WARC ISO file format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717






CAMA® belongs to the category of « client-based & web-served » archiving solutions (refer to
Appendix A and B for more details), which allow the creation and maintenance of stable,
time-structured, verifiably authentic and independent versions of a corporate web presence,
« social media » included.


[Figure: CAMA® in action: archived (05/10/2011) version of AerzteZeitung online German Newspaper.
Callout: « Play all embedded videos as usual ».]




Aleph Archives’s strategy is to meet the needs of every client: CAMA® offers high-quality archived
websites (which can be filed as evidence in case of litigation), easy-to-use browsing and access tools,
and a fully Web-based service that reduces costs (refer to Appendix C).









[Figure: Today (08/02/2011) live version of NY Daily newspaper.]

[Figure: CAMA® in action: archived (10/05/2011) version of NY Daily newspaper.
Callouts: « Timeline »; « Qrcode, Digital Signing, and Timestamping »; « Options Pane ».]



  




MARKET SECTORS: who is
CAMA® suitable for?

Corporates
a. E-Discovery
Litigation Protection — Websites contain a growing proportion of business records that must be pre-
served for long periods of time. This content is frequently requested during discovery proceedings be-
cause of the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is
critical that all relevant electronic content be made available for e-discovery purposes.


Legal Hold — When a hold on data is required, it is imperative that an organization immediately begin
preserving all relevant data. Our web archiving platform CAMA® allows organizations to immediately
place a hold on data when requested by a court or on the advice of legal counsel. If an organization is
not able to adequately place a hold on data when it is obligated to do so, it can suffer a variety of
serious consequences, ranging from embarrassment to major legal sanctions or heavy fines.


b. Regulatory Compliance
For just about every organization, there is a large and growing number of regulatory obligations to
preserve electronic content. Some of the more important requirements are:


    • Sarbanes-Oxley Act of 2002
    • Health Insurance Portability and Accountability Act of 1996 (HIPAA)
    • Securities and Exchange Commission Rules (SEC)
    • Financial Industry Regulatory Authority (FINRA)
    • Model Requirements for the Management of Electronic Records (MoReq)

c. Maintain Corporate Memory & Knowledge Management
Web archiving can be very useful for maintaining a corporate record of what has been posted to a Web
site, how long this content was maintained or when it was replaced. For example, a company may want
a record of its Web site for historical purposes, or it may need an archive in order to re-use some of its
content at a later date. Maintaining an accurate archive of Web content can significantly reduce the
costs associated with recreating this content.









Government
Virtually all government agencies have regulatory obligations to preserve electronic content. Because
your agency’s online content is increasing both in complexity and volume, and because governments
are held accountable for the information they publish on the web, you need to employ a records re-
tention policy.


The 2006 changes to the Federal Rules of Civil Procedure indicate that all organizations (including
governments) must be able to find, capture, and produce electronically stored information that might be
relevant to a judicial or regulatory request. This can’t be done with server backups, CMS revision
control, or other outdated methods. You need a solution that can provide indisputable proof of the
integrity and authenticity of your online records (as required by the Federal Rules of Evidence).


For example, 2010 saw the Executive Office of the President (EOP) issue a solicitation to:


« Provide the necessary services to capture, store, extract to approved formats, and transfer content
published by EOP on publicly-accessible web sites, along with information posted by non-EOP persons
on publicly-accessible web sites where the EOP offices under PRA maintains a presence, throughout
the term of the contract. »


Other requirements come from:


    • Presidential Records Act (PRA)
    • National Archives and Records Administration (NARA)
        • E-GOV - electronic records management initiatives
        • Guidance on Managing Records in Web 2.0/Social Media Platforms, October 20th, 2010
        • Library of Congress
    • Federal Rules of Civil Procedure (FRCP)
    • Department of Commerce
    • Department of Energy
    • Department of Justice
    • Environmental Protection Agency
    • Office of Management & Budget
    • Securities and Exchange Commission Rules (SEC)
    • Library & Archives Canada









Website and « social media archiving³ » is a good solution for e-discovery preparedness. Aleph
Archives technology uses web bots (i.e. crawlers) that capture all web pages (including social media).
The web pages are stored exactly as they are captured (including links, rich media, video, and Flash),
which satisfies regulatory requirements for digital records. Aleph Archives also provides a digital
timestamp and signature for each archived page, ensuring data integrity and authenticity. Because this
is a SaaS solution (no software to install), governments can sign up and begin archiving in less than
an hour.


Adopting a web archiving policy is essential, and not just for big cities or the federal government.
Aleph Archives’s pricing is competitive so that even small towns can stay prepared.
The Internet will only continue to grow in scale and complexity, and governments are increasingly
interested in how it can be used for civic growth and development. The issue of records retention must
be addressed from the start, so that agencies can move forward confidently online.


               « Government websites are public records and must be archived to comply with
                                   Public Records Laws. Start archiving now. »



Finance
Online marketing/communications can present a challenge for securities traders, investment advisors,
banks, and others in the financial services industry. The benefits of advancing technologies must be
weighed against the risks associated with non-compliance in the area of books and records retention.
Failure to meet the demands of industry standards can result in hefty fines and bad publicity.

Multiple sets of rules for the financial industry (from the SEC, FDIC, FSA and FINRA, under SOX, and
others) demand the preservation of business records (both paper and electronic) in such a way that
the data can be reproduced for a regulator in a timely and complete manner. These requirements are
now being extended to include newer tools such as social media platforms, and FINRA has advised
that no compliance grace period will be in effect for these new technologies.

It’s critical that firms implement a robust records retention policy for their websites and « social media
pages ». Should your corporate web presence be investigated or questioned, an exact representation
of your company’s online activity is a necessity, and that’s exactly what CAMA® provides.



                 « Website archiving is vital to fulfilling many key FINRA and SEC regulations.
                                            Start complying today. »



3   Twitter and Government Transparency






Food and Drugs Companies
In archiving their electronic data, publicly traded companies need to comply with the records
management regulations of the Sarbanes-Oxley (SOX) Act.


The past year has seen a dramatic increase in the FDA’s enforcement of regulations that deal with
product claims and labeling. In an effort to be more proactive, the agency has been investigating
companies for compliance with the FD&C Act, particularly section 403 A, which deals specifically
with product descriptions and claims. As a result, a number of companies have received warning
letters addressing the product claims made on their labels or websites. These letters are viewable
online, which damages brand reputation.

Since most marketing now happens via websites, social media, and other Internet tools, it is of ut-
most importance for your company to have a reliable, accurate archive of all online activity. Should
your claims be investigated or questioned, defensible evidence of your website’s precise content is a
necessity — and that’s exactly what CAMA® provides.

Using crawling technology, we take automated snapshots of your website. Only new pages or chan-
ged pages are archived, saving storage space. The whole process is automatic — you don’t have to
remember to do anything.
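
The "only new or changed pages" step can be illustrated with a minimal sketch. It assumes change detection by content hash, which is one common technique; the brochure does not say how CAMA® detects changes, and the function names, the digest index and the example.com URLs below are hypothetical.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes of a page."""
    return hashlib.sha256(body).hexdigest()


def select_pages_to_archive(previous: dict, snapshot: dict) -> dict:
    """Return only the pages that are new or whose content changed.

    previous maps URL -> digest recorded at the last crawl (hypothetical index).
    snapshot maps URL -> page bytes fetched during the current crawl.
    """
    to_archive = {}
    for url, body in snapshot.items():
        if previous.get(url) != content_digest(body):
            to_archive[url] = body
    return to_archive


# Illustrative data only: the URLs and page bodies are placeholders.
last_crawl = {
    "http://example.com/": content_digest(b"<h1>Home</h1>"),
    "http://example.com/about": content_digest(b"<h1>About</h1>"),
}
current = {
    "http://example.com/": b"<h1>Home</h1>",           # unchanged, skipped
    "http://example.com/about": b"<h1>About us</h1>",  # changed, archived
    "http://example.com/news": b"<h1>News</h1>",       # new, archived
}
changed = select_pages_to_archive(last_crawl, current)
print(sorted(changed))
```

Comparing digests rather than full page bodies means the index of the previous crawl stays small even when the archive itself is large.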

                « Have a reliable, accurate and defensible archive of all online activity. »




Law firms
Law firms and companies that create content online can use CAMA® to provide legal proof of
intellectual property. CAMA® provides each page with a digital timestamp and a digital signature that
cannot be altered without detection and, hence, creates legal proof of copyright. This trusted,
irrefutable evidence stands up in a court of law if copyright ownership is ever questioned.
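
The idea of a tamper-evident timestamp and signature can be sketched as follows. This is not CAMA®'s mechanism, which the brochure does not describe. As a stand-in that runs with the Python standard library alone, the sketch uses an HMAC (a keyed hash) over the page digest and the capture time; a real digital signature would use an asymmetric key pair, and a legally meaningful timestamp would normally come from a trusted timestamping authority. The names `seal_page`, `verify_page` and `DEMO_KEY` are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def seal_page(body: bytes, key: bytes, captured_at: datetime) -> dict:
    """Bind a page's digest to its capture time with a keyed hash (HMAC)."""
    digest = hashlib.sha256(body).hexdigest()
    timestamp = captured_at.astimezone(timezone.utc).isoformat()
    message = f"{digest}|{timestamp}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"sha256": digest, "captured_at": timestamp, "tag": tag}


def verify_page(body: bytes, key: bytes, seal: dict) -> bool:
    """Return True only if neither the page nor its timestamp was altered."""
    digest = hashlib.sha256(body).hexdigest()
    message = f"{digest}|{seal['captured_at']}".encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return digest == seal["sha256"] and hmac.compare_digest(expected, seal["tag"])


# Assumption: a demo key; a real service would protect its signing key.
DEMO_KEY = b"demo-key-not-for-production"
page = b"<html><body>Original article</body></html>"
seal = seal_page(page, DEMO_KEY, datetime(2011, 5, 10, tzinfo=timezone.utc))

print(verify_page(page, DEMO_KEY, seal))              # untouched page
print(verify_page(page + b"edited", DEMO_KEY, seal))  # altered page
```

Changing a single byte of the page, or the recorded capture time, makes verification fail, which is the "cannot be altered without detection" property in miniature.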


             « Use websites as legal evidence in court. Have CAMA® create integral and
                           authentic evidence with support for e-Discovery. »




                 “ This Court sees no reason to treat web
                 sites differently than other electronic files. ”
                                          Arteria Prop. Pty Ltd. vs. Universal Funding V.T.O., Inc









CAMA® for Social Media e-Discovery

Organizations and their employees are leveraging social media tools at unprecedented levels. With
over 150 million blogs, an average of 140 million tweets every day, and more than 800 million users of
social media sites worldwide (Facebook, LinkedIn, MySpace...), organizations are challenged to define
usage policies and implement solutions to appropriately govern, discover and preserve relevant
information from these complex and malleable data sources. Discovery on social media sites is
further complicated by the fact that these sites also include rich media such as audio and video.
Legacy tools and manual processes cannot effectively manage the risk associated with social media
sites and interactive content.


To successfully manage discovery of social media and protect themselves from potential risk, organi-
zations must embrace new technologies to harness and understand the meaning of the social media
content. Since social media content can be subject to legal hold if it contains relevant information, le-
gal teams must be prepared to search, identify, preserve and collect this information. Social media
sites must be managed as other enterprise data sources, as part of a comprehensive Social Media
eDiscovery and information governance program. Given the complexity and volume of social media
content, legal teams must be prepared with an automated solution that can understand meaning and
cull through voluminous data sources to find relevant information.


According to a report issued by Gartner, Inc., a leading technology research and advisory firm, half of
all companies will have been asked to produce material from social media sites for e-Discovery by the
end of 2013. Debra Logan, vice president and distinguished analyst at Gartner, wrote:


« In e-Discovery, there is no difference between social media and electronic or even paper artifacts.
The phrase to remember is if it exists, it is discoverable. Unique aspects of social media present addi-
tional challenges, but as with an overall information governance strategy, the key to avoiding or miti-
gating potential legal issues in the use of social media for business purposes is to have a governance
framework, policy and user education. »


In addition to meeting legal hold and preservation obligations, organizations, including those in the
financial services, healthcare, and pharmaceutical industries, must ensure that employees are not
violating regulations by creating or posting non-compliant content. As regulators recognize the
influence and risks associated with social media channels, they are beginning to require
organizations to actively monitor and govern employees' social media interactions.


For instance, FINRA (Financial Industry Regulatory Authority) Regulatory Notice 10-06 requires
member firms to supervise and archive content posted to social media sites. The Food and Drug
Administration (FDA), the Federal Trade Commission (FTC), and the National Futures Association
(NFA) are also developing rules associated with the use of social media, and the Federal Courts have
issued guidelines for monitoring and managing the use of social media sites (see Resources & Links
section).


   For example, if you don’t have an archiving system, you could be in trouble trying to find something
   you posted.




[Figure: CAMA® in action: archived (05/17/2011) version of NYTimes newspaper on Facebook.
Callouts: « Loading archived version »; « NYTimes newspaper on Facebook »; « All media types
(Flash, photos, videos, posts...) are preserved in their native format »; « All links are clickable.
Browse the archived pages, play videos, load images... ».]




   According to Facebook4:
   « Currently, you can only search for content that has been posted in the last 30 days. The range of the
   search history may be expanded in the future. »




   4   The same applies to Twitter and LinkedIn; see « Archiving Social Media prepares you for e-Discovery ».

   



Aleph Archives’s advanced web archiving platform for e-Discovery enables organizations to
proactively manage, search for, identify and preserve any social media content. CAMA® enables
organizations to take advantage of the power and business value of social networks while ensuring
FRCP and regulatory compliance.



Unique Selling Proposition

The main competitive advantages of the CAMA® platform are:


      • superior technology to capture multiple web formats in dynamic websites,
      • more comprehensive web archiving process with crawl engineering experts,
      • high-quality archive accessibility and rendering,
      • Universal Archives View (UAW) independent from OSes and browser types or versions,
      • optimized fulltext search engine tailored to very large web archive collections (billions of
         documents),
      • deduplicated full-text search results in real-time,
      • daily archiving capabilities,
      • support of WARC ISO file format,
      • dedicated quality assurance teams and processes,
      • ability to be deployed over commodity machines,
      • fault tolerant software design,
      • high availability⁵



CAMA® is the only solution on the market that can provide access to the archives without an
Internet connection and that can also be fully deployed « In-House » (i.e. inside the customer’s
infrastructure). The « In-House » solution gives you the freedom to exploit the full potential of
CAMA® (training required).



                                            DISASTER & DATA RECOVERY
                                       « Your data safe and secure »

             Aleph Archives’s “retention service” includes shadow copies of your archived
             data in geographically distinct locations (USA, Canada, Switzerland, France). This
             means that two copies of your web archives exist at any given time, providing
             high data availability and protecting against data loss.




5   See our Service Level Agreement (SLA)






Pricing Model
Cloud-based solution

This section describes the implementation process for Aleph’s enterprise web archiving service and
the pricing for the Set Up phases and for the provision of archive services thereafter.


Aleph may calculate the fees using one of two methods of estimation.

1. Where requirements are not fully defined, a simple overall price can be provided, which will be
based on the size and scope of the archive policy in broad terms. A breakdown of these fees may be
provided for transparency.

2. Where requirements are more fully defined, a more rigorous approach to estimating fees may be
used. This will provide a price per URL (i.e. per archived resource), which will be more accurate than
the simple overall price because it is based on the specifics of an archive strategy defined by the
more detailed requirements. Three parameters are involved here: the scope, the frequency, and the
price per URL.


    • The scope defines which URLs are "in" a particular crawl: the list of URLs the customer would
      like to archive.
    • The archiving frequency for each scope can vary from daily, to weekly, to monthly to quarterly, to
      annually. Aleph Archives is the only web archiving company offering a daily archiving service.
    • The price per URL is composed of:
       ‣ System administration charges;
       ‣ Archiving services fees;
       ‣ Infrastructure and storage costs (retention, data integrity, data security, etc.).
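
How the three parameters might combine can be shown with a small worked example. The brochure names the parameters but does not give a formula, so the multiplicative model below (URLs in scope × captures per year × price per URL) is an assumption, as are the function name, the price and the URL count.

```python
from decimal import Decimal

# Captures per year for each archiving frequency listed above.
CAPTURES_PER_YEAR = {
    "daily": 365,
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "annually": 1,
}


def estimate_annual_fee(url_count: int, frequency: str, price_per_url: Decimal) -> Decimal:
    """Assumed model: every URL in scope is charged once per capture."""
    if url_count < 0 or price_per_url < 0:
        raise ValueError("url_count and price_per_url must be non-negative")
    return url_count * CAPTURES_PER_YEAR[frequency] * price_per_url


# Hypothetical figures: 200 URLs in scope, archived weekly, 0.05 per URL.
fee = estimate_annual_fee(200, "weekly", Decimal("0.05"))
print(fee)
```

Under this assumed model, moving the same scope from weekly to daily archiving multiplies the estimate by roughly seven, which is why scope and frequency are worth fixing before asking for a per-URL quote.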


InHouse solution

Customers interested in the InHouse version of CAMA® are welcome to contact us for a quote.









                               APPENDIX A.

Web Archiving Policy

A web archiving policy is the only means of creating and maintaining a stable, time-structured, verifia-
bly authentic and independent version of the corporate web presence. « Independent » means that
access to the content must be possible without requiring the original CMS version to be installed,
configured and running. Having a web archiving policy is the only way the corporate Web-publishing
infrastructure can evolve without threatening accessibility to legacy content. It is also the only way to
avoid the continuous licensing and maintenance costs of legacy CMSs.


A substantial and enduring web archive can be achieved by generating a flat, stable and
time-structured version of the published content, capturing authentic snapshots according to the
corporate archiving policy. These snapshots must be taken as user-centric views of the content, i.e.
accurately reflecting the user’s experience of that particular content. In addition, they must be stored
and made accessible in precisely the same form, thereby meeting legal and compliance requirements
as authentic copies. They must also enable discovery using familiar web paradigms such as full-text
search, as well as more sophisticated e-discovery techniques including metadata, tagging, filters and
complex search.



A1. How to choose your web archiving solution?
Web archiving has made significant progress during the last five to seven years. It now offers a choice
of approach to both policy and supporting technology. These choices should be considered carefully
against business objectives before the decision is made. The main differences lie in the capture and
access methods used.


Three different methods exist to capture and archive web content:


    a. client-side archiving
    b. transaction archiving
    c. server-side archiving







A2. Client-side Archiving
« Client-side archiving » uses an archival crawler, derived from search engine crawler technologies,
with significant enhancements to ensure that complex and hard-to-reach content can be found and
captured, as well as stored without change. Starting from seed pages or entry points, these tools
automatically capture pages and parse them to extract all links. The process repeats and continues as
long as newly discovered pages remain within the scope defined for the crawl. The captured web
content and embedded files are stored unchanged (original and authentic copies, an exact equivalent
of what the generic user would have received in their browser at the time) and preserved in a flat,
standards-based and self-contained file format that can be confidently considered future-proof.
This is especially important within a legal context.
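
The crawl loop described above (start from seeds, capture, extract links, stay in scope) can be sketched in a few lines. This is a simplified illustration, not the CAMA® crawler: scope is assumed to be a set of allowed host names, and `fetch` is a hypothetical callable so that the example runs against an in-memory stand-in for the web instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(seeds, allowed_hosts, fetch, max_pages=100):
    """Breadth-first capture; fetch(url) returns the body bytes or None."""
    archive = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        body = fetch(url)
        if body is None:
            continue
        archive[url] = body  # stored exactly as received
        # A real crawler would check the Content-Type before parsing.
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        for link in parser.links:
            target = urldefrag(urljoin(url, link)).url
            if target not in seen and urlparse(target).netloc in allowed_hosts:
                seen.add(target)
                queue.append(target)
    return archive


# In-memory stand-in for the web; all URLs are placeholders.
FAKE_WEB = {
    "http://example.com/": b'<a href="/a">A</a> <a href="http://other.example/x">X</a>',
    "http://example.com/a": b'<a href="/">home</a> <img src="/logo.png">',
    "http://example.com/logo.png": b"PNG-bytes",
    "http://other.example/x": b"out of scope",
}
captured = crawl(["http://example.com/"], {"example.com"}, FAKE_WEB.get)
print(sorted(captured))
```

The sketch only follows links that appear as plain attributes in the markup. Links built by scripts would be missed, which is why link extraction quality matters so much for this method.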


To be effective this method requires a crawler with excellent link extraction and path-finding algorithms
that can work in a wide range of circumstances and site/page designs. In addition to client-side archi-
ving, there are two alternative methods to capture web content. Both methods need to be operated
from the server-side; require prior authorisation to services; and need access to both front-end and
back-end servers.




A3. Transaction Archiving
The first of these alternative methods, called « transaction archiving », consists of the systematic cap-
ture and archiving of all browser/server exchanges (request/response pairs), resulting from the interac-
tion of users with sites, regardless of their content type and how they are produced.


Transaction archiving enables tracking and recording of every actual instantiation of content in an au-
thentic flat HTML form, easy to maintain and preserve over time. Moreover, it can be used to archive
hidden web content, provided this content is requested, i.e. read, by the websites’ users during the
capture time.


However, transaction archiving generates unnecessary duplicates of frequently-visited pages and rai-
ses serious privacy concerns as the method implicitly relies on usage tracking.
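
The capture of request/response pairs can be sketched as a small piece of server-side middleware. The example assumes a Python WSGI application purely for illustration; the brochure does not name any server technology, and `TransactionRecorder` and `demo_app` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults


class TransactionRecorder:
    """WSGI middleware that logs each request/response pair it relays."""

    def __init__(self, app, log):
        self.app = app
        self.log = log  # any list-like store; a plain list in this sketch

    def __call__(self, environ, start_response):
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return start_response(status, headers, exc_info)

        result = self.app(environ, recording_start_response)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        self.log.append({
            "method": environ["REQUEST_METHOD"],
            "path": environ["PATH_INFO"],
            "query": environ.get("QUERY_STRING", ""),
            "status": captured.get("status"),
            "headers": captured.get("headers"),
            "body": body,
        })
        return [body]


def demo_app(environ, start_response):
    """Tiny application standing in for a real website."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Hello ", environ["PATH_INFO"].encode("utf-8"), b"</h1>"]


transactions = []
wrapped = TransactionRecorder(demo_app, transactions)

environ = {}
setup_testing_defaults(environ)  # fills in a minimal GET request
environ["PATH_INFO"] = "/pricing"
response = wrapped(environ, lambda status, headers, exc_info=None: None)
print(transactions[0]["status"], transactions[0]["path"])
```

Every visit appends another entry, even when the page is identical to one already recorded, which illustrates the duplication concern raised above. The log also ties content to individual requests, which is the source of the privacy concern.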









A4. Server-side Archiving
The second, and more obvious, alternative to client side archiving is « server-side archiving ». This
consists of directly copying files in the document folders to back-up servers. Although it might appear
to be the simplest approach, it is in fact seriously flawed, from both the preservation and archive ac-
cess points of view.


To make certain that any web content archived using this method can be properly restored,
server-side archiving requires that all original CMSs, databases and other software are archived
alongside the content or are actively maintained in an operational state; or that the content is
migrated to newer CMSs, databases, etc. In any case, these activities will be required for the whole
period of archive retention. Interestingly, IT backups rely on this method in almost all cases, and so
systematically fail to provide the long-term preservation and access capabilities that are essential for
legal and compliance requirements. However, for some types of hidden-web content, this method can
prove useful, mainly in situations where it is required to archive parts of websites that a client-side
crawler cannot reach.



A5. Comparison of Content Capture Methods
The following table summarises the main content capture methods, where: ✔ = fully supported
and ● = possible/custom development.



                                                                      | Server-side | Transaction | Client-side |
    Content captured as user sees it, unchanged, and authentic        |             |      ✔      |      ✔      |
    Archive access independent of original publishing technology      |             |      ✔      |      ✔      |
    Able to capture interactive or query based content                |      ✔      |      ✔      |      ●      |
    Retains web URL space (not dependent on server link mapping)      |             |      ✔      |      ✔      |
    De-duplication possible                                           |             |      ●      |      ✔      |
    Easily directed and scheduled capture                             |      ✔      |             |      ✔      |
    Flexible archival scope, for a wide range of needs                |             |      ✔      |      ✔      |
    Able to capture browser/server exchanges (request/response pairs) |             |      ✔      |             |
    Web server technology independence                                |             |             |      ✔      |
    Archiving services can be centralized in one place                |             |             |      ✔      |
    Cost effective and efficient operations over time                 |             |             |      ✔      |


In most cases, client-side archiving is the best approach for capturing content. The quality of the
resulting archive depends mainly on the capabilities of the crawler, particularly its ability to extract
links, including links encoded in scripts and executables. Link extraction is one of the key determinants
of whether all files are captured in a consistent and timely manner.
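
To illustrate why link extraction matters, the following is a minimal Python sketch of a link extractor
that reads links from HTML attributes and also looks for quoted absolute URLs inside script text. The
names LinkExtractor and extract_links, and the regular-expression heuristic, are assumptions made for
this example; it does not describe how the CAMA® crawler or any particular product works.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry links in HTML markup.
LINK_ATTRS = {"href", "src", "action"}
# Naive heuristic (an assumption for this sketch): quoted absolute URLs in script text.
SCRIPT_URL = re.compile(r"""["'](https?://[^"'\s]+)["']""")


class LinkExtractor(HTMLParser):
    """Collects links from tag attributes and from inline script text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                # Resolve relative links against the page URL.
                self.links.add(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.links.update(SCRIPT_URL.findall(data))


def extract_links(html, base_url):
    """Return the sorted list of links found in one HTML page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return sorted(parser.links)
```

A crawler that only inspected href attributes would miss the URL embedded in the script in a page such as
'<a href="/about.html">About</a><script>var u = "http://example.com/report.pdf";</script>'.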







                                     APPENDIX B.

Accessing your Web Archives

Two different methods exist to provide access to archives:


      a. website-copier approach
      b. Web-served approach


The choice is largely determined by how the files are stored. This is critically important, because web
URLs and file systems follow different naming conventions, with different permissible and reserved
characters, escaping rules, case sensitivity, etc.



B1. Website-copier Approach
Website copiers write all captured files directly to disk, and therefore need to modify names and links
as they are stored in order to make the archive accessible. This results in an archive that is not an au-
thentic version of the original server’s response stream.
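
The renaming problem can be sketched in a few lines of Python. The function url_to_local_name and its
naming scheme are hypothetical, chosen only for illustration; real website copiers use their own rules.
The point is that any such mapping changes the stored names, and can make distinct URLs indistinguishable.

```python
import re
from urllib.parse import urlsplit

# Characters that are legal in URLs but reserved or unsafe on common file systems.
UNSAFE = re.compile(r'[<>:"/\\|?*]')


def url_to_local_name(url):
    """Map a URL to a flat, file-system-safe name (hypothetical scheme)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"      # a directory URL has no file name of its own
    name = parts.netloc + path
    if parts.query:
        name += "?" + parts.query  # the query string must be folded into the name
    # Replace reserved characters and lower-case for case-insensitive disks.
    return UNSAFE.sub("_", name).lower()
```

Under this scheme, http://example.com/A.html and http://example.com/a.html map to the same local name,
so every link to either page has to be rewritten and the original URL space is no longer preserved.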



B2. Web-served Approach
Archive web servers, on the other hand, store responses from the original server unchanged in con-
tainer files. This ensures the content and server response stream are kept in an authentic form.


The standard for web archive container files is WARC⁶, the Web ARChive file format, published as
ISO 28500:2009. It is already being adopted as the foundation for web archive storage and
preservation. A WARC file records the sequence of harvested web files captured by the crawler,
each file preceded by a header containing metadata that briefly describes the harvested content, its
length and checksum.
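
As a rough illustration of that layout, the Python sketch below wraps one unmodified HTTP response in a
WARC "response" record. The header field names follow the WARC format; the function name
build_warc_response_record and the choice of a base32-encoded SHA-1 digest are assumptions of this
example, not a description of any specific archiving product.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri, http_response, capture_time=None):
    """Wrap raw HTTP response bytes, unchanged, in a WARC 'response' record."""
    capture_time = capture_time or datetime.now(timezone.utc)
    # Checksum of the stored block, written as "algorithm:value".
    digest = base64.b32encode(hashlib.sha1(http_response).digest()).decode("ascii")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {capture_time.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"WARC-Block-Digest: sha1:{digest}",
        f"Content-Length: {len(http_response)}",
    ]
    header = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8")
    # The record is: header, blank line, untouched block, then two CRLFs.
    return header + http_response + b"\r\n\r\n"
```

The original URL travels in the WARC-Target-URI field rather than in a file name, so nothing in the
server's response needs to be renamed or rewritten.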


WARC ensures the preservation of the original naming scheme and linking, thereby providing archive
storage of content in an authentic form, as well as providing the means for additional integrity checks
during the entire period of custodianship.
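
One such integrity check can be sketched as follows: recompute the checksum of a stored block and compare
it with the digest recorded at capture time. The function name digest_matches and the "sha1:" plus base32
digest notation are assumptions made for this example.

```python
import base64
import hashlib


def digest_matches(block, recorded_digest):
    """Return True if `block` still hashes to the digest recorded at capture time."""
    algorithm, _, expected = recorded_digest.partition(":")
    if algorithm != "sha1":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    actual = base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")
    return actual == expected
```

Running a check of this kind periodically over an archive would flag any record whose content has
changed since it was captured.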


6   WARC file ISO format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717






B3. Comparison of Access Methods
The following table summarises the main archive access methods, where: ✔ = fully supported
and ● = possible/custom development.


                                                                       Website Copy      Web-served Archive
    Searchable                                                               ✔                     ✔
    Browsable                                                                ✔                     ✔
    Content directly navigable from disk                                     ✔                     ●
    Content stored and accessed unchanged, and authentic                                           ✔
    Links independent of naming conventions                                                        ✔
    Storage and preservation of metadata                                     ✔                     ✔
    Access independent of file location                                                             ✔
    Standards-based archives                                                                       ✔



There is a consensus today that the website-copier approach has serious limitations concerning au-
thenticity of the archive, whereas the Web-served approach can ensure authenticity by design. In pro-
fessional use therefore, especially where legal and regulatory obligations are business priorities, the
Web-served approach is a necessity.









                                 APPENDIX C.

Web Archiving as a Best Practice
The web has matured into a central communication channel for businesses and government agencies,
with digital media (websites and other web-based content) all but replacing print media as the primary
mode of communication with customers, constituents, prospects, investors, and others.


Organizations using the web must keep accurate records of web content — online communication is
just as much of a liability as any other form of communication. As a recent case ruled: « This Court sees
no reason to treat websites differently than other electronic files. »


Web archiving has become a best practice for any organization using the web to communicate.
Organizations that neglect to retain accurate records of their web presence are placing themselves at
unnecessary risk, from both a compliance and a litigation standpoint.


Protect your organization by regularly archiving web content with CAMA®, the Aleph Archives Web
Archiving Platform. We provide all the technology and services you need to archive your websites and
web presence from any domain.









                                 APPENDIX D.

ALEPH ARCHIVES' CAMA® PLATFORM ARCHITECTURE OVERVIEW

More details about the architecture internals are available upon request.
                               




                                APPENDIX E.

Elements of a Web Archiving Plan
Setup
Aleph Archives runs, tests, and calibrates the CAMA® robots to determine the best capture rules, so
that your website(s) are captured with the highest quality.


Capture
The cost of crawling the target URLs at a specified frequency, including the related crawl engineering.


Retention
The annual cost of storing and retaining archives of the target websites. The standard plan calls for
7 years of retention.


Operation
Includes keeping the designated servers and machines up and running for CAMA®, archive access,
retention, and quality assurance.


Quality Assurance (QA)
    - QA Level 1: we check and verify one level deep (depth 1) from the website root (i.e. the home page).
    - QA Level 2: we check and verify two levels deep from the root, and likewise for QA Level 3 and
     QA Level 4.
     QA can go as deep into the website as the client needs. In industry practice, QA Level 4 is
     sufficient for most enterprises for regulatory compliance, legal, and operational purposes.
    - Exhaustive QA: we check and verify all designated websites, verifying every page to
     the website’s full depth. Exhaustive QA may be cost-prohibitive, depending on the customer’s
     requirements. Upon request, Aleph Archives will provide a price quotation for Exhaustive QA.
    - Mixed QA: we combine a sampled QA per website level with an exhaustive QA up to a certain level.
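
The QA levels above can be read as a depth limit on the pages selected for checking. The Python sketch
below shows that reading with a breadth-first walk over a link graph. The function pages_to_check and the
dictionary representation of a site are hypothetical, introduced only for this example; they are not part
of the CAMA® platform.

```python
from collections import deque


def pages_to_check(link_graph, root, max_depth=None):
    """Return {url: depth} for pages within max_depth links of root.

    max_depth=None means no limit, i.e. every page reachable from the root
    (the "exhaustive" case).
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        if max_depth is not None and depths[url] >= max_depth:
            continue  # do not follow links beyond the requested level
        for target in link_graph.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths
```

With max_depth=1 the result contains the home page and the pages it links to directly, which corresponds
to QA Level 1 as described above.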








                                             APPENDIX F.

Aleph Archives provides the following CAMA® Plans:

                           FEATURE                 PROFESSIONAL       ENTERPRISE       PREMIUM
    Crawl engineering team                               ✔                  ✔               ✔
    WARC format (ISO 28500:2009) compliance              ✔                  ✔               ✔
    Scheduled crawls                                     ✔                  ✔               ✔
    Archives summary pane                                ✔                  ✔               ✔
    Document format handling (HTML, Word,                ✔                  ✔               ✔
    PowerPoint, PDF, Flash …)
    Full text search                                  standard          advanced        advanced
    Full text search history                             ✔                  ✔               ✔
    Full text search queries import & export             ✔                  ✔               ✔
    Automatic language detection                         ✔                  ✔               ✔
    Documents metadata extraction and indexing           ✔                  ✔               ✔
    Infinite archives retention                           ✔                  ✔               ✔
    ARC to WARC batch migration                          ✔                  ✔               ✔
    WARC to WARC batch conversion                        ✔                  ✔               ✔
    Archives verification and repair tools                ✔                  ✔               ✔
    Text summarizer                                      ✔                  ✔               ✔
    Audit trails identification and traceability                             ✔               ✔
    Deduplicated full text search                                           ✔               ✔
    Archived resources export (PDF, PNG)                                    ✔               ✔
    Multi-core aware archives servers                                       ✔               ✔
    Archives redundancy                                                     ✔               ✔
    Load balancing for archives access                                      ✔               ✔
    Antivirus checker                                                       ✔               ✔
    Trusted archives (digital signatures)                                   ✔               ✔
    SEC 17a-4 and FINRA compliance                                          ✔               ✔
    Secured archives access (SSL Encryption)                                ✔               ✔
    Multilanguage instant translator                                        ✔               ✔
    Custom Branding                                                         ✔               ✔
    Archives compression                                                    ✔               ✔
    Archived data processing and management                                                 ✔



                          FEATURE            PROFESSIONAL         ENTERPRISE       PREMIUM
    CAMA®    Appliance                                                                  ✔
    CAMA®    Appliance on USB pen drive                                                 ✔
    CAMA®    Kit (Access API)                                                           ✔
    CAMA®    64bits                                                                     ✔
    Quality Assurance team (level)                 basic             medium            high
    Custom metadata limit                           30               unlimited      unlimited
    Collections limit                              100               unlimited      unlimited
    Accounts limit                                  10               unlimited      unlimited
    Crawled resources per month                 up to 500K           up to 5M       unlimited
    Archived resources per month                up to 500GB         up to 1TB       up to 2TB




A « Custom Plan » is also available via an online form which allows customers to choose product fea-
tures that best suit their needs.









                     RESOURCES & LINKS

☞ Aleph Archives
    - Website
    - Products demo
☞ Records Management
    Finance
    - FINRA Regulation Notices
    - FINRA Guidance
    - FINRA Regulatory Notice 10-06 on Social Media
    - Summary of NASD Rule 3110 - Books and Records
    - Federal Rules of Evidence 901 - Data Integrity & Authenticity
    - SEC - Division of Trading and Markets
    - SEC - Division of Investment Management
    - SEC Rule 17a-4 - Books and Records
    - Sarbanes-Oxley Act (SOX)
    - Financial Services Authority (FSA) Handbook (Europe)
    - FSA Handbook Section 3.2 - see Records Requirements, Sec 3.2.20 (Europe)
    - Model Requirements for the Management of Electronic Records (MoReq) (Europe)

    Food and Drug Administration
    - Federal Rules of Evidence 901 - Data Integrity & Authenticity
    - FDA Guidance Documents - Food
    - FDA Compliance & Enforcement - Food
    - FDA Guidance Documents - Drugs
    - Code of Federal Regulations (CFR) Title 21
    - Model Requirements for the Management of Electronic Records (MoReq) (Europe)
    - Pharma Social Media Wiki
    - FDASM (Everything About the FDA, Internet, Social Media)


Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Web Archiving Whitepaper Aleph Archives

  • 1. Web Archiving for Compliance & eDiscovery ALEPH ARCHIVES Ltd. ✉ 600 Blv de Maisonneuve suite 1700 - Montréal, Québec (Canada) / chemin des Croix-Rouges 16 - 1007 Lausanne (Switzerland) ✎ info@aleph-archives.com ☞ aleph-archives.com
  • 2. Copyright © 2012 Aleph Archives. All Rights Reserved.
    WEB ARCHIVING
    INTRODUCTION
    Quick access to digital data and electronic information stored online is a «must have» when it comes to developing strategies for litigation or statutory compliance disputes.
    There are, however, many obstacles to providing and managing such access efficiently while taking into account both the frequent complexity of the dispute and the legal context that must be dealt with. It is often impossible, or too late, to obtain the relevant information when it is needed, such as during eDiscovery processes.
    Aleph Archives is an IT service provider dedicated to companies with specific needs regarding Web content preservation. Aleph Archives offers turnkey tools to easily and efficiently retrieve relevant data stored online.
    According to recent research, the average life expectancy of a website is less than 75 days, and disputes over the content of websites are on the increase. In a number of countries, regulatory and archiving compliance rules (e.g. Sarbanes-Oxley Act - US, Health Insurance Portability and Accountability Act - US, Gramm-Leach-Bliley Act - US, Federal Rules of Civil Procedure - US) govern the different industry sectors, and authorities (e.g. SEC and FINRA - US, Financial Services Authority - UK) supervise those sectors on that basis.
    Through a cloud-based Web archiving platform named CAMA®, Aleph Archives provides «Web Preservation» services for regulatory compliance, litigation support and eDiscovery, helping corporate entities and legal and governmental authorities collect, manage and archive their large and growing Web content. CAMA® is the only platform that archives and keeps records of your websites, webpages, and web presence at large.
    CAMA® clearly evidences which website content was shown to a particular end user during a visit and, equally important, which content (and hence which data) was not.
    Web archiving for eDiscovery is a recent "technological niche", as opposed to legacy eDiscovery, which has been used for years to preserve electronic data (e.g. email, files). The Web archiving eDiscovery process is based on three main features, as outlined by the Electronic Discovery Reference Model: thorough gathering of electronically stored information from websites, full access to and playback of any archived web content, and conversion to a form that allows full-text search.
  • 3. PRODUCTS & INNOVATION
    CAMA® Web Archiving Platform
    Aleph Archives is a pioneer in the domain of Web archiving. We offer high-quality archive accessibility and rendering. With CAMA®, Aleph Archives takes the web archiving process and the related quality assurance (QA) to a higher level by working with crawl engineering experts, dedicated QA teams and a powerful, yet easy to use, archive access technology [1].
    [Figure: CAMA® in action: archived (07/04/2011) version of Toyota’s corporate website and videos. Callouts: "Load the archived version with a click", "Testimonials".]
    Aleph Archives targets companies that need strict, reliable archiving processes to ensure compliance with SEC and FINRA regulations. The CAMA® Web archiving platform is more efficient and more reliable than any solution from its main competitors. Aleph Archives offers a solution that is open (WARC, ISO 28500:2009 [2]), adaptive (cloud-based computing) and innovative (scheduled crawls, export of Web archives as PDF/PNG, antiviral check, CAMA® Appliance, real-time deduplication of results, multilingual search and translation, etc.).
    [1] Products demo at: http://www.youtube.com/user/alepharchives/
    [2] WARC ISO file format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
  • 4. CAMA® belongs to the category of « client-based & web-served » archiving solutions (refer to Appendix A and B for more details), which allow creating and maintaining stable, time-structured, verifiably authentic and independent versions of the corporate web presence, « social media » included.
    [Figure: CAMA® in action: archived (05/10/2011) version of the AerzteZeitung online German newspaper. Callout: "Play all embedded videos as usual".]
    Aleph Archives’s strategy aims at satisfying every client: CAMA® offers high-quality archived websites (which can be filed as evidence in case of litigation), easy-to-use browsing and access tools, and a fully Web-based service to reduce costs (refer to Appendix C).
  • 5. [Figure: live version of the NY Daily newspaper as of 08/02/2011, next to CAMA® in action: archived (10/05/2011) version of the NY Daily newspaper. Callouts: "Timeline", "Qrcode, Digital Signing, and Timestamping", "Options Pane".]
  • 6. MARKET SECTORS: who is CAMA® suitable for?
    Corporates
    a. E-Discovery
    Litigation Protection: Websites contain a growing proportion of business records that must be preserved for long periods of time. This content is frequently requested during discovery proceedings because of the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is critical that all relevant electronic content be made available for e-discovery purposes.
    Legal Hold: When a hold on data is required, it is imperative that an organization immediately begin preserving all relevant data. Our web archiving platform CAMA® allows organizations to immediately place a hold on data when requested by a court or on the advice of legal counsel. An organization that cannot adequately place a hold on data when it is obligated to do so can suffer a variety of serious consequences, ranging from embarrassment to major legal sanctions or heavy fines.
    b. Regulatory Compliance
    For just about every organization, there are a large and growing number of regulatory obligations to preserve electronic content. Some of the more important requirements are:
      • Sarbanes-Oxley Act of 2002
      • Health Insurance Portability and Accountability Act of 1996 (HIPAA)
      • Securities and Exchange Commission Rules (SEC)
      • Financial Industry Regulatory Authority (FINRA)
      • Model Requirements for the Management of Electronic Records (MoReq)
    c. Maintain Corporate Memory & Knowledge Management
    Web archiving can be very useful for maintaining a corporate record of what has been posted to a Web site, how long this content was maintained and when it was replaced. For example, a company may want a record of its Web site for historical purposes, or it may need an archive in order to re-use some of its content at a later date.
    Maintaining an accurate archive of Web content can significantly reduce the costs associated with recreating this content.
  • 7. Government
    Virtually all government agencies have regulatory obligations to preserve electronic content. Because your agency’s online content is increasing in both complexity and volume, and because governments are held accountable for the information they publish on the web, you need to employ a records retention policy.
    The 2006 changes to the Federal Rules of Civil Procedure indicate that all organizations (including governments) must be able to find, capture, and produce electronically stored information that might be relevant to a judicial or regulatory request. This can’t be done with server backups, CMS revision control, or other outdated methods. You need a solution that can provide indisputable proof of the integrity and authenticity of your online records (as required by the Federal Rules of Evidence).
    For example, in 2010 the Executive Office of the President (EOP) issued a solicitation to:
    « Provide the necessary services to capture, store, extract to approved formats, and transfer content published by EOP on publicly-accessible web sites, along with information posted by non-EOP persons on publicly-accessible web sites where the EOP offices under PRA maintains a presence, throughout the term of the contract. »
    Other requirements come from:
      • Presidential Records Act (PRA)
      • National Archives and Records Administration (NARA)
      • E-GOV - electronic records management initiatives
      • Guidance on Managing Records in Web 2.0/Social Media Platforms, October 20th, 2010
      • Library of Congress
      • Federal Rules of Civil Procedure (FRCP)
      • Department of Commerce
      • Department of Energy
      • Department of Justice
      • Environmental Protection Agency
      • Office of Management & Budget
      • Securities and Exchange Commission Rules (SEC)
      • Library & Archives Canada
  • 8. Website and « social media archiving [3] » is a good solution for e-discovery preparedness. Aleph Archives technology uses web bots (i.e. crawlers) that capture all web pages (including social media). The web pages are stored exactly as they are captured (including links, rich media, video, and Flash), which satisfies regulatory requirements for digital records. Aleph Archives also provides a digital timestamp and signature for each archived page, ensuring data integrity and authenticity.
    With this SaaS solution (no tedious software installation), governments can sign up and begin archiving in less than an hour. Adopting a web archiving policy is essential, and not just for big cities or the federal government. Aleph Archives’s pricing is competitive so that even small towns can stay prepared.
    The Internet will only continue to grow in scale and complexity, and governments are increasingly interested in how it can be used for civic growth and development. The issue of records retention must be addressed from the start, so that agencies can move forward confidently online.
    « Government websites are public records and must be archived to comply with Public Records Laws. Start archiving now. »
    Finance
    Online marketing and communications can present a challenge for securities traders, investment advisors, banks, and others in the financial services industry. The benefits of advancing technologies must be weighed against the risks associated with non-compliance in the area of books and records retention. Failure to meet the demands of industry standards can result in hefty fines and bad publicity.
    Multiple sets of guidelines for the financial industry (issued by SEC, FDIC, FSA, SOX, FINRA, and others) demand that business records (both paper AND electronic) be preserved in such a way that the data can be reproduced for a regulator in a timely and complete manner.
    These requirements are now being extended to include newer tools such as social media platforms, and FINRA has advised that no compliance grace period will be in effect for these new technologies.
    It’s critical that firms implement a robust records retention policy for their websites and « social media pages ». Should your corporate web presence be investigated or questioned, a perfect representation of your company’s online activity is a necessity, and that’s exactly what CAMA® provides.
    « Website archiving is vital to fulfilling many key FINRA and SEC regulations. Start complying today. »
    [3] Twitter and Government Transparency
  • 9. Food and Drugs Companies
    In archiving their electronic data, publicly traded companies need to comply with the records management regulations of the Sarbanes-Oxley (SOX) Act.
    The past year has seen a dramatic increase in the FDA’s enforcement of regulations that deal with product claims and labeling. In an effort to be more proactive, the agency has been investigating companies for compliance with the FD & C Act, particularly section 403 A, which deals specifically with product descriptions and claims. As a result, a number of companies have received warning letters addressing the product claims made on their labels or websites. These letters are viewable online and damage brand reputation.
    Since most marketing now happens via websites, social media, and other Internet tools, it is of utmost importance for your company to have a reliable, accurate archive of all online activity. Should your claims be investigated or questioned, defensible evidence of your website’s precise content is a necessity, and that’s exactly what CAMA® provides.
    Using crawling technology, we take automated snapshots of your website. Only new or changed pages are archived, saving storage space. The whole process is automatic: you don’t have to remember to do anything.
    « Have a reliable, accurate and defensible archive of all online activity. »
    Law firms
    Law firms and companies creating content online can use CAMA® to provide legal proof of intellectual property. CAMA® provides each page with a digital timestamp and a digital signature that cannot be altered without detection and, hence, creates legal proof of copyright. This trusted, non-refutable evidence stands up in a court of law if copyright ownership is ever questioned.
    « Use websites as legal evidence in court. Have CAMA® create integral and authentic evidence with support for e-Discovery. »
    “ This Court sees no reason to treat web sites differently than other electronic files. ” Arteria Prop. Pty Ltd. vs. Universal Funding V.T.O., Inc.
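    The two mechanisms described on this slide (archiving only new or changed pages, and detecting later alteration of an archived page) can be illustrated with content digests. The whitepaper does not say how CAMA® implements either one, so the following is only a minimal sketch under stated assumptions: `SnapshotStore`, `fingerprint` and `is_unaltered` are hypothetical names, and a plain SHA-256 digest stands in for the digital signature and trusted timestamp, which in practice would involve a signing key and a timestamping authority that are not shown here.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest of the captured bytes."""
    return hashlib.sha256(body).hexdigest()


class SnapshotStore:
    """Hypothetical in-memory store that keeps a page only when it changed."""

    def __init__(self):
        self.latest = {}    # url -> digest of the most recently archived version
        self.records = []   # every archived version, oldest first

    def archive(self, url: str, body: bytes, captured_at=None) -> bool:
        """Store the page if it is new or changed; return True when stored."""
        digest = fingerprint(body)
        if self.latest.get(url) == digest:
            return False  # identical to the last snapshot, nothing to store
        self.latest[url] = digest
        self.records.append({
            "url": url,
            "body": body,
            "sha256": digest,
            "captured_at": captured_at or datetime.now(timezone.utc),
        })
        return True


def is_unaltered(record: dict) -> bool:
    """Recompute the digest and compare it with the one stored at capture."""
    return fingerprint(record["body"]) == record["sha256"]
```

    In this sketch a second crawl of an unchanged page costs one hash comparison and no storage, and any later modification of a stored body makes `is_unaltered` return False. A digest alone only shows that the bytes changed; it does not prove who created the record or when.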
  • 10. CAMA® for Social Media e-Discovery
    Organizations and their employees are leveraging social media tools at unprecedented levels. With over 150 million blogs, an average of 140 million tweets every day, and more than 800 million users of social media sites worldwide (Facebook, LinkedIn, MySpace...), organizations are challenged to define usage policies and implement solutions to appropriately govern, discover and preserve relevant information from these complex and malleable data sources. Discovery on social media sites is further complicated by the fact that these sites also include rich media such as audio and video, adding to an already complex environment. Legacy tools and manual processes cannot effectively manage the risk associated with social media sites and interactive content.
    To successfully manage discovery of social media and protect themselves from potential risk, organizations must embrace new technologies to harness and understand the meaning of social media content. Since social media content can be subject to legal hold if it contains relevant information, legal teams must be prepared to search, identify, preserve and collect this information. Social media sites must be managed like other enterprise data sources, as part of a comprehensive Social Media eDiscovery and information governance program. Given the complexity and volume of social media content, legal teams must be prepared with an automated solution that can understand meaning and cull through voluminous data sources to find relevant information.
    According to a report issued by Gartner, Inc., a leading technology research and advisory firm, half of all companies will have been asked to produce material from social media sites for e-Discovery by the end of 2013.
    Debra Logan, vice president and distinguished analyst at Gartner, wrote:
    « In e-Discovery, there is no difference between social media and electronic or even paper artifacts. The phrase to remember is if it exists, it is discoverable. Unique aspects of social media present additional challenges, but as with an overall information governance strategy, the key to avoiding or mitigating potential legal issues in the use of social media for business purposes is to have a governance framework, policy and user education. »
    In addition to the challenge of meeting legal hold and preservation obligations, organizations, including those in the financial services, healthcare, and pharmaceutical industries, must ensure that employees are not violating regulations by creating or posting non-compliant content. As regulators recognize the influence and risks associated with social media channels, they are beginning to require organizations to actively monitor and govern employees' social media interactions.
    For instance, FINRA (Financial Industry Regulatory Authority) regulatory notice 10-06 requires member firms to supervise and archive content posted to social media sites. The Food and Drug Administration (FDA), Federal Trade Commission (FTC), and the National Futures Association (NFA) are also developing rules associated with the use of social media, and the Federal Courts have issued guidelines for monitoring and managing social media site usage (see Resources & Links section).
  • 11. For example, if you don’t have an archiving system, you could be in trouble trying to find something you posted.
    [Figure: CAMA® in action: archived (05/17/2011) version of the NYTimes newspaper on Facebook. Callouts: "Loading archived version", "All media types (Flash, photos, videos, posts...) are preserved in their native format", "All links are clickable. Browse the archived pages, play videos, load images...".]
    According to Facebook [4]: « Currently, you can only search for content that has been posted in the last 30 days. The range of the search history may be expanded in the future. »
    [4] The same applies to Twitter and LinkedIn; see « Archiving Social Media prepares you for e-Discovery ».
  • 12. Aleph Archives’s advanced web archiving platform for e-Discovery enables organizations to proactively manage, search for, identify and preserve any social media content. CAMA® enables organizations to take advantage of the power and business value of social networks while ensuring FRCP and regulatory compliance.
    Unique Selling Proposition
    The main competitive advantages of the CAMA® platform are:
      • superior technology to capture multiple web formats in dynamic websites,
      • more comprehensive web archiving process with crawl engineering experts,
      • high-quality archive accessibility and rendering,
      • Universal Archives View (UAW) independent of operating systems and of browser types or versions,
      • optimized full-text search engine tailored to very large web archive collections (billions of documents),
      • deduplicated full-text search results in real time,
      • daily archiving capabilities,
      • support of the WARC ISO file format,
      • dedicated quality assurance teams and processes,
      • ability to be deployed on commodity machines,
      • fault-tolerant software design,
      • high availability [5].
    CAMA® is the only solution on the market that can provide access to the archives without an Internet connection and that can also be fully deployed « In-House » (i.e. inside the customer’s infrastructure). The « In-House » solution gives you the freedom to exploit the full potential of CAMA® (training required).
    DISASTER & DATA RECOVERY
    « Your data safe and secure »
    Aleph Archives’s “retention service” includes shadow copies of your archived data in geographically distinct locations (USA, Canada, Switzerland, France). This means that two copies of your web archives exist at any given time, providing high data availability and avoiding data loss.
    [5] See our Service Level Agreement (SLA)
  • 13. Pricing Model
    Cloud-based solution
    This section describes the implementation process for Aleph’s enterprise web archiving service and the pricing for the set-up phases and for the provision of archive services thereafter.
    Aleph may calculate the fees using one of two methods of estimation.
    1. Where requirements are not fully defined, a simple overall price can be provided, based on the size and scope of the archive policy in broad terms. A breakdown of these fees may be provided for transparency.
    2. Where requirements are more fully defined, a more rigorous approach to estimating fees may be used. This provides a price per URL (i.e. per archived resource), which is more accurate than the simple overall price because it is based on the specifics of an archive strategy defined by the more detailed requirements. Three parameters are involved here: the scope, the frequency, and the price per URL.
      • The scope defines which URLs are "in" a particular crawl: the list of URLs the customer would like to archive.
      • The archiving frequency for each scope can be daily, weekly, monthly, quarterly or annual. Aleph Archives is the only web archiving company offering a daily archiving service.
      • The price per URL is composed of:
        ‣ System administration charges;
        ‣ Archiving services fees;
        ‣ Infrastructure and storage costs (retention, data integrity, data security, etc.).
    InHouse solution
    Customers interested in the InHouse version of CAMA® are welcome to contact us for a quote.
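    To make the interplay of the three parameters concrete, here is a hedged arithmetic sketch. The whitepaper names the parameters (scope, frequency, price per URL) but gives no formula and no prices, so the multiplication below, the crawl counts per year, the names `PricePerUrl` and `estimate_annual_fee`, and every amount used with them are assumptions for illustration only, not Aleph Archives's actual pricing.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumed number of crawls per year for each archiving frequency.
CRAWLS_PER_YEAR = {
    "daily": 365,
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "annually": 1,
}


@dataclass
class PricePerUrl:
    """The three components of the price per URL named in the text."""
    administration: Decimal   # system administration charges
    archiving: Decimal        # archiving services fees
    infrastructure: Decimal   # infrastructure and storage costs

    @property
    def total(self) -> Decimal:
        return self.administration + self.archiving + self.infrastructure


def estimate_annual_fee(url_count: int, frequency: str, price: PricePerUrl) -> Decimal:
    """Assumed model: URLs in scope x crawls per year x price per URL."""
    return url_count * CRAWLS_PER_YEAR[frequency] * price.total
```

    With a hypothetical scope of 1,000 URLs archived weekly at a made-up total of 0.04 per URL, this model gives 1,000 × 52 × 0.04 = 2,080 per year. Because only new or changed pages are stored, a real quote could well be lower than such a straight multiplication suggests.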
  • 14. APPENDIX A. Web Archiving Policy
    A web archiving policy is the only means of creating and maintaining a stable, time-structured, verifiably authentic and independent version of the corporate web presence. « Independent » means that access to the content must be possible without requiring the original CMS version to be installed, configured and running. Having a web archiving policy is the only way the corporate Web-publishing infrastructure can evolve without threatening accessibility to legacy content. It is also the only way to avoid the continuous licensing and maintenance costs of legacy CMSs.
    A substantial and enduring web archive can be achieved by generating a flat, stable and time-structured version of the published content, capturing authentic snapshots according to the corporate archiving policy. These snapshots must be taken as user-centric views of the content, i.e. accurately reflecting the user’s experience of that particular content. In addition, they must be stored and made accessible in precisely the same form, thereby meeting legal and compliance requirements as authentic copies. They must also enable discovery using familiar web paradigms such as full-text search, as well as more sophisticated e-discovery techniques including metadata, tagging, filters and complex search.
    A1. How to choose your web archiving solution?
    Web archiving has made significant progress during the last five to seven years. It now offers a choice of approaches to both policy and supporting technology. These choices should be weighed carefully against business objectives before a decision is made. The main differences lie in the capture and access methods used.
    Three different methods exist to capture and archive web content:
      a. client-side archiving
      b. transaction archiving
      c. server-side archiving
  • 15. A2. Client-side Archiving
    « Client-side archiving » uses an archival crawler, derived from search engine crawler technologies, with significant enhancements to ensure that complex and hard-to-reach content can be found and captured, as well as stored without change. Starting from seed pages or entry points, these tools automatically capture pages and parse them to extract all links. The process repeats and continues as long as newly discovered pages remain within the scope defined for the crawl. The captured web content and embedded files are stored unchanged, as original and authentic copies that are an exact equivalent of what a generic user would have received in their browser at the time, and are preserved in a flat, standards-based and self-contained file format that can be confidently considered future-proof. This is especially important within a legal context.
    To be effective, this method requires a crawler with excellent link extraction and path-finding algorithms that work across a wide range of circumstances and site/page designs.
    In addition to client-side archiving, there are two alternative methods to capture web content. Both need to be operated from the server side, require prior authorisation to services, and need access to both front-end and back-end servers.
    A3. Transaction Archiving
    The first of these alternative methods, called « transaction archiving », consists of the systematic capture and archiving of all browser/server exchanges (request/response pairs) resulting from the interaction of users with sites, regardless of their content type and how they are produced.
    Transaction archiving enables tracking and recording of every actual instantiation of content in an authentic flat HTML form that is easy to maintain and preserve over time. Moreover, it can be used to archive hidden web content, provided this content is requested, i.e. read, by the websites’ users during the capture time.
    However, transaction archiving generates unnecessary duplicates of frequently visited pages and raises serious privacy concerns, as the method implicitly relies on usage tracking.
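    The client-side crawl loop described in A2 (start from seeds, capture, extract links, repeat while pages stay in scope) can be sketched in a few lines. This is a simplified illustration and not CAMA®'s crawler: `crawl`, `LinkExtractor` and `same_host_scope` are hypothetical names, the `fetch` callable is injected so the sketch makes no network assumptions, and only `href` and `src` attributes are followed, whereas the text stresses that a production crawler must also find links encoded in scripts and executables.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href/src attribute values from a parsed page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_host_scope(host):
    """Build a scope test that keeps only URLs on the given host."""
    return lambda url: urlparse(url).netloc == host


def crawl(seeds, fetch, in_scope):
    """Breadth-first capture from the seeds; bodies are stored unchanged.

    fetch(url) must return the response bytes, or None on failure.
    """
    archive = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:
            continue
        archive[url] = body  # keep the exact bytes that were received
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        for link in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, link))
            if absolute not in seen and in_scope(absolute):
                seen.add(absolute)
                queue.append(absolute)
    return archive
```

    Passing `fetch` in as a parameter keeps the traversal logic separate from transport concerns such as politeness delays, retries and authentication, which a real archival crawler would need to handle.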
  • 16. A4. Server-side Archiving
    The second, and more obvious, alternative to client-side archiving is « server-side archiving ». This consists of directly copying files in the document folders to back-up servers. Although it might appear to be the simplest approach, it is in fact seriously flawed from both the preservation and the archive access points of view.
    To make certain that any web content archived using this method can be properly restored, server-side archiving requires that all original CMSs, databases and other software be archived alongside the content or actively maintained in an operational state, or that the content be migrated to newer CMSs, databases, etc. In any case, these activities will be required for the whole period of archive retention. Interestingly, IT backups rely on this method in almost all cases, systematically failing to provide the long-term preservation and access capabilities that are essential for legal and compliance requirements.
    However, for some types of hidden-web content this method can prove useful, mainly in situations where it is required to archive parts of websites that a client-side crawler cannot reach.
    A5. Comparison of Content Capture Methods
    The following table summarises the main content capture methods, where: ✔ = fully supported and ● = possible/custom development.
    Methods compared: Server-side, Transaction, Client-side
      Content captured as user sees it, unchanged, and authentic: ✔ ✔
      Archive access independent of original publishing technology: ✔ ✔
      Able to capture interactive or query based content: ✔ ✔ ●
      Retains web URL space (not dependent on server link mapping): ✔ ✔
      De-duplication possible: ● ✔
      Easily directed and scheduled capture: ✔ ✔
      Flexible archival scope, for a wide range of needs: ✔ ✔
      Able to capture browser/server exchanges (request/response pairs): ✔
      Web server technology independence: ✔
      Archiving services can be centralized in one place: ✔
      Cost effective and efficient operations over time: ✔
    In most cases client-side archiving is the best approach for capturing content. The quality of the resulting archive will depend mainly on the capabilities of the crawler, particularly with respect to link extraction, even when links are encoded in scripts and executables. This is one of the key determinants for capturing all files in a consistent and timely manner.
  • 17. APPENDIX B. Accessing your Web Archives
    Two different methods exist to provide access to archives:
      a. website-copier approach
      b. Web-served approach
    The choice is largely determined by how the files are stored. This is critically important, because web URLs and file systems use different naming conventions, with different permissible and reserved characters, escaping rules, case sensitivity, etc.
    B1. Website-copier Approach
    Website copiers write all captured files directly to disk, and therefore need to modify names and links as they are stored in order to make the archive accessible. This results in an archive that is not an authentic version of the original server’s response stream.
    B2. Web-served Approach
    Archive web servers, on the other hand, store responses from the original server unchanged in container files. This ensures the content and server response stream are kept in an authentic form.
    The emerging standard for web archive container files is WARC [6], the Web ARChive file format, published as ISO 28500:2009. It is already being adopted as the foundation for web archive storage and preservation. A WARC file records the sequence of harvested web files captured by the crawler, each preceded by a header containing metadata that briefly describes the harvested content, its length and its checksum.
    WARC ensures the preservation of the original naming scheme and linking, thereby providing archive storage of content in an authentic form, as well as the means for additional integrity checks during the entire period of custodianship.
    [6] WARC file ISO format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
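    The record layout described in B2 (the captured response stored unchanged, preceded by a header giving its target URL, length and checksum) can be sketched as follows. This is a simplified illustration modelled on the WARC/1.0 record layout, not a validated implementation of ISO 28500 and not CAMA®'s writer: `build_warc_record` and `verify_warc_record` are hypothetical helpers, only a handful of header fields are written, and the verifier only understands records produced by this same sketch.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def _sha1_base32(block: bytes) -> str:
    """Digest label in the 'sha1:<base32>' style commonly seen in WARC files."""
    return "sha1:" + base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")


def build_warc_record(target_uri: str, http_response: bytes, captured_at=None) -> bytes:
    """Wrap the raw, unmodified HTTP response in a WARC-style record."""
    captured_at = (captured_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"WARC-Block-Digest: {_sha1_base32(http_response)}",
        f"Content-Length: {len(http_response)}",
    ]
    header = "\r\n".join(header_lines).encode("utf-8")
    # Header, blank line, untouched block, then the record terminator.
    return header + b"\r\n\r\n" + http_response + b"\r\n\r\n"


def verify_warc_record(record: bytes) -> bool:
    """Integrity check: recompute length and digest from the stored block."""
    header_blob, _, rest = record.partition(b"\r\n\r\n")
    lines = header_blob.decode("utf-8").split("\r\n")[1:]  # skip the version line
    fields = dict(line.split(": ", 1) for line in lines)
    length = int(fields["Content-Length"])
    block = rest[:length]
    return len(block) == length and fields["WARC-Block-Digest"] == _sha1_base32(block)
```

    Because the original URL travels in the header rather than in a file name, nothing in the captured bytes has to be rewritten to fit file-system naming rules, which is the authenticity argument made against website copiers in B1. The stored digest is what makes the later integrity checks mentioned above possible.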
B3. Comparison of Access Methods

The following table summarises the main archive access methods, where: ✔ = fully supported and ● = possible/custom development.

                                                        Website Copy  Web-served Archive
Searchable                                              ✔             ✔
Browsable                                               ✔             ✔
Content directly navigable from disk                    ✔             ●
Content stored and accessed unchanged, and authentic                  ✔
Links independent of naming conventions                               ✔
Storage and preservation of metadata                    ✔             ✔
Access independent of file location                                   ✔
Standards-based archives                                              ✔

There is a consensus today that the website-copier approach has serious limitations concerning authenticity of the archive, whereas the Web-served approach can ensure authenticity by design. In professional use therefore, especially where legal and regulatory obligations are business priorities, the Web-served approach is a necessity.
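The dependence of a website copy on file naming conventions can be shown with a short sketch. The function `copier_filename` is a hypothetical renaming rule invented for this example, not the behaviour of any particular website copier. It replaces characters that are reserved on common file systems and folds case, as a case-insensitive file system effectively would.

```python
import re
from urllib.parse import urlsplit


def copier_filename(url: str) -> str:
    """Hypothetical copier rule: map a URL to a file-system-safe name."""
    parts = urlsplit(url)
    name = parts.netloc + (parts.path or "/")
    if name.endswith("/"):
        name += "index.html"  # directories need a file name on disk
    if parts.query:
        name += "_" + parts.query
    # Replace reserved characters and fold case.
    return re.sub(r'[<>:"\\|?*=&]', "_", name).lower()


first = copier_filename("http://example.com/Report?id=1")
second = copier_filename("http://example.com/report?id_1")
```

Both URLs map to the same file name, so the copier must rename one of them and rewrite every link that points to it. An archive that stores responses unchanged and keys them by their original URL avoids this collision.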
APPENDIX C. Web Archiving as a Best Practice

The web has matured into a central communication channel for businesses and government agencies, with digital media (websites and other web-based content) all but replacing print media as the primary mode of communication with customers, constituents, prospects, investors, and others.

Organizations using the web must keep accurate records of web content, because online communication is just as much of a liability as any other form of communication. As one court recently ruled:

« This Court sees no reason to treat websites differently than other electronic files. »

Web archiving has become a best practice for any organization using the web to communicate. Organizations that neglect to retain accurate records of their web presence are placing themselves at unnecessary risk, from both a compliance and a litigation standpoint.

Protect your organization by regularly archiving web content with CAMA®, the Aleph Archives Web Archiving Platform. We provide all the technology and services you need to archive your websites and web presence from any domain.
APPENDIX D. ALEPH ARCHIVES’ CAMA® PLATFORM ARCHITECTURE OVERVIEW

More details about the architecture internals are available upon request.
APPENDIX E. Elements of a Web Archiving Plan

Setup
Aleph Archives runs, tests, and calibrates the CAMA® robots to find the rules that capture your website(s) with the highest quality.

Capture
The cost of crawling and engineering the target URLs at a specified frequency.

Retention
The annual cost of storing and retaining archives of target websites. The standard plan calls for 7 years of retention.

Operation
Includes keeping the designated servers and machines up and running for CAMA®, archive access, retention, and quality assurance.

Quality Assurance (QA)
- QA Level 1: we check and verify one level deep (depth 1) from the website root (i.e. the home page).
- QA Level 2: we check and verify two levels deep from the root, and so on for QA Level 3 and QA Level 4. QA can go as far down in website depth as the client needs. In industry practice, QA Level 4 is sufficient for most enterprises for regulatory compliance, legal and operations purposes.
- Exhaustive QA: we check and verify all designated websites and levels, verifying every page to the website’s full depth. Exhaustive QA may be cost prohibitive, depending on the customer’s requirements. Upon request, Aleph Archives will provide a price quotation for Exhaustive QA.
- Mixed QA: we combine a sampled QA per website level with an exhaustive QA to a certain level.
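The QA levels above can be read as a depth limit on the site's link graph. The following is a minimal sketch under that reading, not the procedure Aleph Archives actually uses. The function `pages_for_qa_level` and the sample `site` link map are hypothetical. It performs a breadth-first walk from the root and stops at the requested depth.

```python
from collections import deque


def pages_for_qa_level(links, root, level):
    """Return {url: depth} for pages within `level` link hops of the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] == level:
            continue  # do not follow links beyond the requested QA level
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site: each key is a page, each value the pages it links to.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
}
level_1 = pages_for_qa_level(site, "/", 1)
level_2 = pages_for_qa_level(site, "/", 2)
```

With this sample, `level_1` covers the home page plus `/a` and `/b`, and `level_2` additionally covers `/a/1`. Exhaustive QA corresponds to removing the depth limit altogether.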
APPENDIX F. Aleph Archives provides the following CAMA® Plans:

FEATURE                                                           PROFESSIONAL  ENTERPRISE  PREMIUM
Crawl engineering team                                            ✔             ✔           ✔
WARC format (ISO 28500:2009) compliance                           ✔             ✔           ✔
Scheduled crawls                                                  ✔             ✔           ✔
Archives summary pane                                             ✔             ✔           ✔
Document format handling (HTML, Word, PowerPoint, PDF, Flash …)   ✔             ✔           ✔
Full text search                                                  standard      advanced    advanced
Full text search history                                          ✔             ✔           ✔
Full text search queries import & export                          ✔             ✔           ✔
Automatic language detection                                      ✔             ✔           ✔
Documents metadata extraction and indexing                        ✔             ✔           ✔
Infinite archives retention                                       ✔             ✔           ✔
ARC to WARC batch migration                                       ✔             ✔           ✔
WARC to WARC batch conversion                                     ✔             ✔           ✔
Archives verification and repair tools                            ✔             ✔           ✔
Text summarizer                                                   ✔             ✔           ✔
Audit trails identification and traceability                                    ✔           ✔
Deduplicated full text search                                                   ✔           ✔
Archived resources export (PDF, PNG)                                            ✔           ✔
Multi-core aware archives servers                                               ✔           ✔
Archives redundancy                                                             ✔           ✔
Load balancing for archives access                                              ✔           ✔
Antivirus checker                                                               ✔           ✔
Trusted archives (digital signatures)                                           ✔           ✔
SEC 17a-4 and FINRA compliance                                                  ✔           ✔
Secured archives access (SSL Encryption)                                        ✔           ✔
Multilanguage instant translator                                                ✔           ✔
Custom Branding                                                                 ✔           ✔
Archives compression                                                            ✔           ✔
Archived data processing and management                                                     ✔
FEATURE                                                           PROFESSIONAL  ENTERPRISE  PREMIUM
CAMA® Appliance                                                                             ✔
CAMA® Appliance on USB pen drive                                                            ✔
CAMA® Kit (Access API)                                                                      ✔
CAMA® 64bits                                                                                ✔
Quality Assurance team (level)                                    basic         medium      high
Custom metadata limit                                             30            unlimited   unlimited
Collections limit                                                 100           unlimited   unlimited
Accounts limit                                                    10            unlimited   unlimited
Crawled resources per month                                       up to 500K    up to 5M    unlimited
Archived resources per month                                      up to 500GB   up to 1TB   up to 2TB

A « Custom Plan » is also available via an online form which allows customers to choose the product features that best suit their needs.
RESOURCES & LINKS

☞ Aleph Archives
- Website
- Products demo

☞ Records Management

Finance
- FINRA Regulation Notices
- FINRA Guidance
- FINRA Regulatory Notice 10-06 on Social Media
- Summary of NASD Rule 3110: Books and Records
- Federal Rules of Evidence 901: Data Integrity & Authenticity
- SEC: Division of Trading and Markets
- SEC: Division of Investment Management
- SEC Rule 17a-4: Books and Records
- Sarbanes-Oxley Act (SOX)
- Financial Services Authority (FSA) Handbook (Europe)
- FSA Handbook Section 3.2: see Records Requirements, Sec 3.2.20 (Europe)
- Model Requirements for the Management of Electronic Records (MoReq) (Europe)

Food and Drug Administration
- Federal Rules of Evidence 901: Data Integrity & Authenticity
- FDA Guidance Documents: Food
- FDA Compliance & Enforcement: Food
- FDA Guidance Documents: Drugs
- Code of Federal Regulations (CFR) Title 21
- Model Requirements for the Management of Electronic Records (MoReq) (Europe)
- Pharma Social Media Wiki
- FDASM (Everything About the FDA, Internet, Social Media)