SlideShare una empresa de Scribd logo
1 de 9
Descargar para leer sin conexión
Ensuring Data Storage Security in Cloud Computing
                   Cong Wang, Qian Wang, and Kui Ren                                          Wenjing Lou
                              Department of ECE                                          Department of ECE
                        Illinois Institute of Technology                             Worcester Polytechnic Institute
                   Email: {cwang, qwang, kren}@ece.iit.edu                            Email: wjlou@ece.wpi.edu




   Abstract—Cloud Computing has been envisioned as the next-            number of reasons. Firstly, traditional cryptographic primitives
generation architecture of IT Enterprise. In contrast to tradi-         for the purpose of data security protection can not be directly
tional solutions, where the IT services are under proper physical,      adopted due to the users’ loss control of data under Cloud
logical and personnel controls, Cloud Computing moves the
application software and databases to the large data centers,           Computing. Therefore, verification of correct data storage
where the management of the data and services may not be                in the cloud must be conducted without explicit knowledge
fully trustworthy. This unique attribute, however, poses many           of the whole data. Considering various kinds of data for
new security challenges which have not been well understood.            each user stored in the cloud and the demand of long term
In this article, we focus on cloud data storage security, which         continuous assurance of their data safety, the problem of
has always been an important aspect of quality of service. To
ensure the correctness of users’ data in the cloud, we propose an       verifying correctness of data storage in the cloud becomes
effective and flexible distributed scheme with two salient features,     even more challenging. Secondly, Cloud Computing is not just
opposing to its predecessors. By utilizing the homomorphic token        a third party data warehouse. The data stored in the cloud
with distributed verification of erasure-coded data, our scheme          may be frequently updated by the users, including insertion,
achieves the integration of storage correctness insurance and data      deletion, modification, appending, reordering, etc. To ensure
error localization, i.e., the identification of misbehaving server(s).
Unlike most prior works, the new scheme further supports secure         storage correctness under dynamic data update is hence of
and efficient dynamic operations on data blocks, including: data         paramount importance. However, this dynamic feature also
update, delete and append. Extensive security and performance           makes traditional integrity insurance techniques futile and
analysis shows that the proposed scheme is highly efficient and          entails new solutions. Last but not the least, the deployment
resilient against Byzantine failure, malicious data modification         of Cloud Computing is powered by data centers running in
attack, and even server colluding attacks.
                                                                        a simultaneous, cooperated and distributed manner. Individual
                      I. I NTRODUCTION                                  user’s data is redundantly stored in multiple physical loca-
   Several trends are opening up the era of Cloud Computing,            tions to further reduce the data integrity threats. Therefore,
which is an Internet-based development and use of computer              distributed protocols for storage correctness assurance will be
technology. The ever cheaper and more powerful processors,              of most importance in achieving a robust and secure cloud data
together with the software as a service (SaaS) computing archi-         storage system in the real world. However, such important area
tecture, are transforming data centers into pools of computing          remains to be fully explored in the literature.
service on a huge scale. The increasing network bandwidth and              Recently, the importance of ensuring the remote data
reliable yet flexible network connections make it even possible          integrity has been highlighted by the following research
that users can now subscribe high quality services from data            works [3]–[7]. These techniques, while can be useful to ensure
and software that reside solely on remote data centers.                 the storage correctness without having users possessing data,
   Moving data into the cloud offers great convenience to               can not address all the security threats in cloud data storage,
users since they don’t have to care about the complexities              since they are all focusing on single server scenario and
of direct hardware management. The pioneer of Cloud Com-                most of them do not consider dynamic data operations. As
puting vendors, Amazon Simple Storage Service (S3) and                  an complementary approach, researchers have also proposed
Amazon Elastic Compute Cloud (EC2) [1] are both well                    distributed protocols [8]–[10] for ensuring storage correct-
known examples. While these internet-based online services              ness across multiple servers or peers. Again, none of these
do provide huge amounts of storage space and customizable               distributed schemes is aware of dynamic data operations.
computing resources, this computing platform shift, however,            As a result, their applicability in cloud data storage can be
is eliminating the responsibility of local machines for data            drastically limited.
maintenance at the same time. As a result, users are at the                In this paper, we propose an effective and flexible distributed
mercy of their cloud service providers for the availability and         scheme with explicit dynamic data support to ensure the
integrity of their data. Recent downtime of Amazon’s S3 is              correctness of users’ data in the cloud. We rely on erasure-
such an example [2].                                                    correcting code in the file distribution preparation to provide
   From the perspective of data security, which has always              redundancies and guarantee the data dependability. This con-
been an important aspect of quality of service, Cloud Com-              struction drastically reduces the communication and storage
puting inevitably poses new challenging security threats for            overhead as compared to the traditional replication-based file
distribution techniques. By utilizing the homomorphic token                                                              S
                                                                                                                     Me s e cu rity
                                                                                                                         sag e
with distributed verification of erasure-coded data, our scheme                             rit y low
                                                                                         cu F
                                                                                                                                F low

                                                                                       Se age
achieves the storage correctness insurance as well as data                            Mes
                                                                                          s Optional Third Party Auditor
                                                                                                                                                    Cloud
                                                                                                                                                   Storage
error localization: whenever data corruption has been detected                                                                                     Servers

during the storage correctness verification, our scheme can
                                                                                                    Data Flow
almost guarantee the simultaneous localization of data errors,                Users

i.e., the identification of the misbehaving server(s).                                                       essage Flow
                                                                                                  Security M
   Our work is among the first few ones in this field to consider                                                                         Cloud Service Provider

distributed data storage in Cloud Computing. Our contribution
can be summarized as the following three aspects:                                Fig. 1: Cloud data storage architecture
1) Compared to many of its predecessors, which only provide
binary results about the storage state across the distributed
servers, the challenge-response protocol in our work further        these operations we are considering are block update, delete,
provides the localization of data error.                            insert and append.
2) Unlike most prior works for ensuring remote data integrity,         As users no longer possess their data locally, it is of critical
the new scheme supports secure and efficient dynamic opera-          importance to assure users that their data are being correctly
tions on data blocks, including: update, delete and append.         stored and maintained. That is, users should be equipped with
                                                                    security means so that they can make continuous correctness
3) Extensive security and performance analysis shows that
                                                                    assurance of their stored data even without the existence of
the proposed scheme is highly efficient and resilient against
                                                                    local copies. In case that users do not necessarily have the
Byzantine failure, malicious data modification attack, and even
                                                                    time, feasibility or resources to monitor their data, they can
server colluding attacks.
                                                                    delegate the tasks to an optional trusted TPA of their respective
   The rest of the paper is organized as follows. Section II
                                                                    choices. In our model, we assume that the point-to-point
introduces the system model, adversary model, our design goal
                                                                    communication channels between each cloud server and the
and notations. Then we provide the detailed description of our
                                                                    user is authenticated and reliable, which can be achieved in
scheme in Section III and IV. Section V gives the security
                                                                    practice with little overhead. Note that we don’t address the
analysis and performance evaluations, followed by Section VI
                                                                    issue of data privacy in this paper, as in Cloud Computing,
which overviews the related work. Finally, Section VII gives
                                                                    data privacy is orthogonal to the problem we study here.
the concluding remark of the whole paper.
                                                                    B. Adversary Model
                  II. P ROBLEM S TATEMENT
                                                                       Security threats faced by cloud data storage can come
A. System Model                                                     from two different sources. On the one hand, a CSP can
   A representative network architecture for cloud data storage     be self-interested, untrusted and possibly malicious. Not only
is illustrated in Figure 1. Three different network entities can    does it desire to move data that has not been or is rarely
be identified as follows:                                            accessed to a lower tier of storage than agreed for monetary
   • User: users, who have data to be stored in the cloud and       reasons, but it may also attempt to hide a data loss incident
      rely on the cloud for data computation, consist of both       due to management errors, Byzantine failures and so on.
      individual consumers and organizations.                       On the other hand, there may also exist an economically-
   • Cloud Service Provider (CSP): a CSP, who has signif-           motivated adversary, who has the capability to compromise
      icant resources and expertise in building and managing        a number of cloud data storage servers in different time
      distributed cloud storage servers, owns and operates live     intervals and subsequently is able to modify or delete users’
      Cloud Computing systems.                                      data while remaining undetected by CSPs for a certain period.
   • Third Party Auditor (TPA): an optional TPA, who has            Specifically, we consider two types of adversary with different
      expertise and capabilities that users may not have, is        levels of capability in this paper:
      trusted to assess and expose risk of cloud storage services   Weak Adversary: The adversary is interested in corrupting the
      on behalf of the users upon request.                          user’s data files stored on individual servers. Once a server is
   In cloud data storage, a user stores his data through a CSP      comprised, an adversary can pollute the original data files by
into a set of cloud servers, which are running in a simulta-        modifying or introducing its own fraudulent data to prevent
neous, cooperated and distributed manner. Data redundancy           the original data from being retrieved by the user.
can be employed with technique of erasure-correcting code           Strong Adversary: This is the worst case scenario, in which
to further tolerate faults or server crash as user’s data grows     we assume that the adversary can compromise all the storage
in size and importance. Thereafter, for application purposes,       servers so that he can intentionally modify the data files as
the user interacts with the cloud servers via CSP to access or      long as they are internally consistent. In fact, this is equivalent
retrieve his data. In some cases, the user may need to perform      to the case where all servers are colluding together to hide a
block level operations on his data. The most general forms of       data loss or corruption incident.
C. Design Goals                                                      integrated with the verification of erasure-coded data [8] [12].
   To ensure the security and dependability for cloud data           Subsequently, it is also shown how to derive a challenge-
storage under the aforementioned adversary model, we aim             response protocol for verifying the storage correctness as well
to design efficient mechanisms for dynamic data verification           as identifying misbehaving servers. Finally, the procedure for
and operation and achieve the following goals: (1) Storage           file retrieval and error recovery based on erasure-correcting
correctness: to ensure users that their data are indeed stored       code is outlined.
appropriately and kept intact all the time in the cloud. (2)
                                                                     A. File Distribution Preparation
Fast localization of data error: to effectively locate the mal-
functioning server when data corruption has been detected. (3)          It is well known that erasure-correcting code may be used
Dynamic data support: to maintain the same level of storage          to tolerate multiple failures in distributed storage systems. In
correctness assurance even if users modify, delete or append         cloud data storage, we rely on this technique to disperse the
their data files in the cloud. (4) Dependability: to enhance data     data file F redundantly across a set of n = m + k distributed
availability against Byzantine failures, malicious data modifi-       servers. A (m + k, k) Reed-Solomon erasure-correcting code
cation and server colluding attacks, i.e. minimizing the effect      is used to create k redundancy parity vectors from m data
brought by data errors or server failures. (5) Lightweight:          vectors in such a way that the original m data vectors can be
to enable users to perform storage correctness checks with           reconstructed from any m out of the m + k data and parity
minimum overhead.                                                    vectors. By placing each of the m + k vectors on a different
                                                                     server, the original data file can survive the failure of any k of
D. Notation and Preliminaries                                        the m+k servers without any data loss, with a space overhead
  •   F – the data file to be stored. We assume that F can be         of k/m. For support of efficient sequential I/O to the original
      denoted as a matrix of m equal-sized data vectors, each        file, our file layout is systematic, i.e., the unmodified m data
      consisting of l blocks. Data blocks are all well represented   file vectors together with k parity vectors is distributed across
      as elements in Galois Field GF (2p ) for p = 8 or 16.          m + k different servers.
  •   A – The dispersal matrix used for Reed-Solomon coding.            Let F = (F1 , F2 , . . . , Fm ) and Fi = (f1i , f2i , . . . , fli )T
  •   G – The encoded file matrix, which includes a set of            (i ∈ {1, . . . , m}), where l ≤ 2p − 1. Note all these blocks
      n = m + k vectors, each consisting of l blocks.                are elements of GF (2p ). The systematic layout with parity
  •   fkey (·) – pseudorandom function (PRF), which is defined        vectors is achieved with the information dispersal matrix A,
      as f : {0, 1}∗ × key → GF (2p ).                               derived from an m × (m + k) Vandermonde matrix [13]:
  •   φkey (·) – pseudorandom permutation (PRP), which is                                                                            
                                                                              1        1     ...      1       1     ...      1
      defined as φ : {0, 1}log2 (l) × key → {0, 1}log2 (l) .             β1
                                                                                      β2    ...     βm     βm+1 . . .     βn        
  •   ver – a version number bound with the index for individ-                                                                       ,
                                                                              .
                                                                              .         .
                                                                                        .    ..        .
                                                                                                       .      .
                                                                                                              .     ..       .
                                                                                                                             .
      ual blocks, which records the times the block has been                 .         .       .      .      .        .     .        
      modified. Initially we assume ver is 0 for all data blocks.            m−1      m−1               m−1         m−1           m−1
                                                                           β1       β2          . . . βm          βm+1    . . . βn
  •   sver – the seed for PRF, which depends on the file name,
       ij
      block index i, the server position j as well as the optional   where βj (j ∈ {1, . . . , n}) are distinct elements randomly
      block version number ver.                                      picked from GF (2p ).
                                                                       After a sequence of elementary row transformations, the
           III. E NSURING C LOUD DATA S TORAGE                       desired matrix A can be written as
   In cloud data storage system, users store their data in                                                                   
                                                                                     1 0 . . . 0 p11 p12 . . . p1k
the cloud and no longer possess the data locally. Thus, the                         0 1 . . . 0 p21 p22 . . . p2k 
                                                                                                                             
correctness and availability of the data files being stored on the    A = (I|P) =  . . .            .    .      .   ..     . ,
distributed cloud servers must be guaranteed. One of the key                        . .
                                                                                      . .       .. ..    .
                                                                                                         .      .
                                                                                                                .      .   . 
                                                                                                                           .
issues is to effectively detect any unauthorized data modifica-                       0 0 . . . 1 pm1 pm2 . . . pmk
tion and corruption, possibly due to server compromise and/or        where I is a m × m identity matrix and P is the secret parity
random Byzantine failures. Besides, in the distributed case          generation matrix with size m × k. Note that A is derived
when such inconsistencies are successfully detected, to find          from a Vandermonde matrix, thus it has the property that any
which server the data error lies in is also of great significance,    m out of the m + k columns form an invertible matrix.
since it can be the first step to fast recover the storage errors.       By multiplying F by A, the user obtains the encoded file:
   To address these problems, our main scheme for ensuring
cloud data storage is presented in this section. The first part of     G= F·A =            (G(1) , G(2) , . . . , G(m) , G(m+1) , . . . , G(n) )
the section is devoted to a review of basic tools from coding                        =    (F1 , F2 , . . . , Fm , G(m+1) , . . . , G(n) ),
theory that are needed in our scheme for file distribution across
                                                                                          (j)    (j)        (j)
cloud servers. Then, the homomorphic token is introduced.            where G(j) = (g1 , g2 , . . . , gl )T (j ∈ {1, . . . , n}). As
The token computation function we are considering belongs            noticed, the multiplication reproduces the original data file
to a family of universal hash function [11], chosen to pre-          vectors of F and the remaining part (G(m+1) , . . . , G(n) ) are
serve the homomorphic properties, which can be perfectly             k parity vectors generated based on F.
Algorithm 1 Token Pre-computation                                        Algorithm 2 Correctness Verification and Error Localization
 1: procedure                                                             1: procedure C HALLENGE (i)
                                                                                                                    (i)
 2:    Choose parameters l, n and function f, φ;                          2:    Recompute αi = fkchal (i) and kprp from KP RP ;
                                                                                               (i)
 3:    Choose the number t of tokens;                                     3:    Send {αi , kprp } to all the cloud servers;
 4:    Choose the number r of indices per verification;                    4:    Receive from servers:
                                                                                    (j)
 5:    Generate master key Kprp and challenge kchal ;                           {Ri = r αq ∗ G(j) [φk(i) (q)]|1 ≤ j ≤ n}
                                                                                              q=1 i
       for vector G(j) , j ← 1, n do
                                                                                                               prp
 6:                                                                       5:    for (j ← m + 1, n) do
 7:        for round i← 1, t do                                           6:         R(j) ← R(j) − r fkj (sIq ,j )·αq , Iq = φk(i) (q)
                                           (i)                                                            q=1             i            prp
 8:            Derive αi = fkchal (i) and kprp from KP RP .               7:    end for
                            (j)    r
 9:            Compute vi = q=1 αq ∗ G(j) [φk(i) (q)]
                                        i                                 8:
                                                                                        (1)         (m)
                                                                                if ((Ri , . . . , Ri ) · P==(Ri
                                                                                                                  (m+1)            (n)
                                                                                                                        , . . . , Ri )) then
                                                   prp
10:        end for                                                        9:         Accept and ready for the next challenge.
11:    end for                                                           10:    else
12:    Store all the vi s locally.                                       11:         for (j ← 1, n) do
13: end procedure                                                                               (j)     (j)
                                                                         12:              if (Ri ! =vi ) then
                                                                         13:                  return server j is misbehaving.
                                                                         14:              end if
B. Challenge Token Precomputation                                        15:         end for
   In order to achieve assurance of data storage correctness             16:    end if
                                                                         17: end procedure
and data error localization simultaneously, our scheme entirely
relies on the pre-computed verification tokens. The main idea
is as follows: before file distribution the user pre-computes a
certain number of short verification tokens on individual vector          user stores them locally to obviate the need for encryption
G(j) (j ∈ {1, . . . , n}), each token covering a random subset           and lower the bandwidth overhead during dynamic data op-
of data blocks. Later, when the user wants to make sure the              eration which will be discussed shortly. The details of token
storage correctness for the data in the cloud, he challenges             generation are shown in Algorithm 1.
                                                                           Once all tokens are computed, the final step before
the cloud servers with a set of randomly generated block                                                                        (j)
indices. Upon receiving challenge, each cloud server computes            file distribution is to blind each parity block gi          in
a short “signature” over the specified blocks and returns them            (G(m+1) , . . . , G(n) ) by
to the user. The values of these signatures should match the                           (j)         (j)
                                                                                     gi      ← gi        + fkj (sij ), i ∈ {1, . . . , l},
corresponding tokens pre-computed by the user. Meanwhile,
as all servers operate over the same subset of the indices, the          where kj is the secret key for parity vector G(j) (j ∈ {m +
requested response values for integrity check must also be a             1, . . . , n}). This is for protection of the secret matrix P. We
valid codeword determined by secret matrix P.                            will discuss the necessity of using blinded parities in detail
   Suppose the user wants to challenge the cloud servers t               in Section V. After blinding the parity information, the user
times to ensure the correctness of data storage. Then, he                disperses all the n encoded vectors G(j) (j ∈ {1, . . . , n})
must pre-compute t verification tokens for each G(j) (j ∈                 across the cloud servers S1 , S2 , . . . , Sn .
{1, . . . , n}), using a PRF f (·), a PRP φ(·), a challenge key          C. Correctness Verification and Error Localization
kchal and a master permutation key KP RP . To generate the                  Error localization is a key prerequisite for eliminating errors
ith token for server j, the user acts as follows:                        in storage systems. However, many previous schemes do not
   1) Derive a random challenge value αi of GF (2p ) by αi =             explicitly consider the problem of data error localization,
                                          (i)
        fkchal (i) and a permutation key kprp based on KP RP .           thus only provide binary results for the storage verification.
   2) Compute the set of r randomly-chosen indices:                      Our scheme outperforms those by integrating the correctness
                                                                         verification and error localization in our challenge-response
         {Iq ∈ [1, ..., l]|1 ≤ q ≤ r}, where Iq = φk(i) (q).
                                                             prp         protocol: the response values from servers for each challenge
  3) Calculate the token as:                                             not only determine the correctness of the distributed storage,
                       r
                                                                         but also contain information to locate potential data error(s).
           (j)                                                     (j)      Specifically, the procedure of the i-th challenge-response for
          vi      =         αq ∗ G(j) [Iq ], where G(j) [Iq ] = gIq .
                             i
                      q=1
                                                                         a cross-check over the n servers is described as follows:
                                                                            1) The user reveals the αi as well as the i-th permutation
            (j)                                                                      (i)
Note that vi , which is an element of GF (2p ) with small                      key kprp to each servers.
size, is the response the user expects to receive from server j             2) The server storing vector G(j) aggregates those r rows
                                                                                                      (i)
when he challenges it on the specified data blocks.                             specified by index kprp into a linear combination
   After token generation, the user has the choice of either                                                r
                                                                                               (j)
keeping the pre-computed tokens locally or storing them in                                    Ri     =           αq ∗ G(j) [φk(i) (q)].
                                                                                                                  i             prp
encrypted form on the cloud servers. In our case here, the                                                 q=1
(j)                                            Algorithm 3 Error Recovery
  3) Upon receiving Ri s from all the servers, the user takes
     away blind values in R(j) (j ∈ {m + 1, . . . , n}) by                       1: procedure
                             r                                                      % Assume the block corruptions have been detected
        (j)        (j)                                                              among
      Ri      ← Ri −               fkj (sIq ,j ) · αq , where Iq = φk(i) (q).
                                                    i
                           q=1
                                                                         prp
                                                                                    % the specified r rows;
                                                                                    % Assume s ≤ k servers have been identified misbehaving
  4) Then the user verifies whether the received values re-
                                                                                 2:    Download r rows of blocks from servers;
     main a valid codeword determined by secret matrix P:
                                                                                 3:    Treat s servers as erasures and recover the blocks.
                 (1)             (m)        ?     (m+1)            (n)           4:    Resend the recovered blocks to corresponding servers.
              (Ri , . . . , Ri         ) · P = (Ri        , . . . , Ri ).
                                                                                 5: end procedure
   Because all the servers operate over the same subset of
indices, the linear aggregation of these r specified rows
   (1)         (n)
(Ri , . . . , Ri ) has to be a codeword in the encoded file                      operations of update, delete and append to modify the data file
matrix. If the above equation holds, the challenge is passed.                   while maintaining the storage correctness assurance.
Otherwise, it indicates that among those specified rows, there                      The straightforward and trivial way to support these opera-
exist file block corruptions.                                                    tions is for user to download all the data from the cloud servers
   Once the inconsistency among the storage has been success-                   and re-compute the whole parity blocks as well as verification
fully detected, we can rely on the pre-computed verification                     tokens. This would clearly be highly inefficient. In this section,
tokens to further determine where the potential data error(s)                   we will show how our scheme can explicitly and efficiently
                                  (j)
lies in. Note that each response Ri is computed exactly in the                  handle dynamic data operations for cloud data storage.
                       (j)
same way as token vi , thus the user can simply find which
server is misbehaving by verifying the following n equations:                   A. Update Operation
                         (j) ?     (j)
                                                                                   In cloud data storage, sometimes the user may need to
                    Ri      = vi , j ∈ {1, . . . , n}.                          modify some data block(s) stored in the cloud, from its
Algorithm 2 gives the details of correctness verification and                    current value fij to a new one, fij + ∆fij . We refer this
error localization.                                                             operation as data update. Due to the linear property of Reed-
                                                                                Solomon code, a user can perform the update operation and
D. File Retrieval and Error Recovery                                            generate the updated parity blocks by using ∆fij only, without
   Since our layout of file matrix is systematic, the user can                   involving any other unchanged blocks. Specifically, the user
reconstruct the original file by downloading the data vectors                    can construct a general update matrix ∆F as
from the first m servers, assuming that they return the correct                                                                   
                                                                                                      ∆f11 ∆f12 . . . ∆f1m
response values. Notice that our verification scheme is based                                        ∆f21 ∆f22 . . . ∆f2m 
                                                                                                                                 
on random spot-checking, so the storage correctness assurance                            ∆F =  .                .     ..     .   
is a probabilistic one. However, by choosing system param-                                          .   .       .
                                                                                                                 .        .   .
                                                                                                                              .   
eters (e.g., r, l, t) appropriately and conducting enough times                                      ∆fl1 ∆fl2 . . . ∆flm
of verification, we can guarantee the successful file retrieval                                  =   (∆F1 , ∆F2 , . . . , ∆Fm ).
with high probability. On the other hand, whenever the data
                                                                                Note that we use zero elements in ∆F to denote the unchanged
corruption is detected, the comparison of pre-computed tokens
                                                                                blocks. To maintain the corresponding parity vectors as well as
and received response values can guarantee the identification
                                                                                be consistent with the original file layout, the user can multiply
of misbehaving server(s), again with high probability, which
                                                                                ∆F by A and thus generate the update information for both
will be discussed shortly. Therefore, the user can always
                                                                                the data vectors and parity vectors as follows:
ask servers to send back blocks of the r rows specified in
the challenge and regenerate the correct blocks by erasure                       ∆F · A =       (∆G(1) , . . . , ∆G(m) , ∆G(m+1) , . . . , ∆G(n) )
correction, shown in Algorithm 3, as long as there are at most                          =       (∆F1 , . . . , ∆Fm , ∆G(m+1) , . . . , ∆G(n) ),
k misbehaving servers are identified. The newly recovered
blocks can then be redistributed to the misbehaving servers                     where ∆G(j) (j ∈ {m + 1, . . . , n}) denotes the update
to maintain the correctness of storage.                                         information for the parity vector G(j) .
                                                                                   Because the data update operation inevitably affects some
  IV. P ROVIDING DYNAMIC DATA O PERATION S UPPORT                               or all of the remaining verification tokens, after preparation of
   So far, we assumed that F represents static or archived                      update information, the user has to amend those unused tokens
data. This model may fit some application scenarios, such                        for each vector G(j) to maintain the same storage correctness
as libraries and scientific datasets. However, in cloud data                     assurance. In other words, for all the unused tokens, the user
storage, there are many potential scenarios where data stored                   needs to exclude every occurrence of the old data block and
in the cloud is dynamic, like electronic documents, photos, or                  replace it with the new one. Thanks to the homomorphic
log files etc. Therefore, it is crucial to consider the dynamic                  construction of our verification token, the user can perform
case, where a user may wish to perform various block-level                      the token update efficiently. To give more details, suppose a
(j)
block G(j) [Is ], which is covered by the specific token vi , has                       To support block append operation, we need a slight modi-
been changed to G(j) [Is ] + ∆G(j) [Is ], where Is = φk(i) (s).                     fication to our token pre-computation. Specifically, we require
                                                                              prp
                                                    (j)
To maintain the usability of token vi , it is not hard to verify                    the user to expect the maximum size in blocks, denoted as
that the user can simply update it by                                               lmax , for each of his data vector. The idea of supporting
                                                                                    block append, which is similar as adopted in [7], relies on
                      (j)         (j)
                     vi     ← vi        + αs ∗ ∆G(j) [Is ],
                                           i                                        the initial budget for the maximum anticipated data size lmax
                                                                                    in each encoded data vector as well as the system parameter
without retrieving any other r − 1 blocks required in the pre-
                    (j)                                                             rmax = ⌈r ∗ (lmax /l)⌉ for each pre-computed challenge-
computation of vi .
                                                                                    response token. The pre-computation of the i-th token on
   After the amendment to the affected tokens1 , the user needs
                                     (j)                                            server j is modified as follows:
to blind the update information ∆gi for each parity block in
                                                                                                                  rmax
(∆G(m+1) , . . . , ∆G(n) ) to hide the secret matrix P by
                                                                                                           vi =          αq ∗ G(j) [Iq ],
                                                                                                                          i
             (j)            (j)
           ∆gi       ←    ∆gi      +    fkj (sver ), i
                                              ij          ∈ {1, . . . , l}.                                       q=1

Here we use a new seed sver for the PRF. The version number                         where
                           ij
ver functions like a counter which helps the user to keep                                                    G(j) [φk(i) (q)]      , [φk(i) (q)] ≤ l
track of the blind information on the specific parity blocks.                                G(j) [Iq ] =                 prp            prp


After blinding, the user sends update information to the cloud                                               0                     , [φk(i) (q)] > l
                                                                                                                                        prp

servers, which perform the update operation as                                      This formula guarantees that on average, there will be r indices
            G  (j)
                     ←G     (j)
                                  + ∆G    (j)
                                                , (j ∈ {1, . . . , n}).             falling into the range of existing l blocks. Because the cloud
                                                                                    servers and the user have the agreement on the number of
B. Delete Operation                                                                 existing blocks in each vector G(j) , servers will follow exactly
   Sometimes, after being stored in the cloud, certain data                         the above procedure when re-computing the token values upon
blocks may need to be deleted. The delete operation we are                          receiving user’s challenge request.
considering is a general one, in which user replaces the data                          Now when the user is ready to append new blocks, i.e.,
block with zero or some special reserved data symbol. From                          both the file blocks and the corresponding parity blocks are
this point of view, the delete operation is actually a special case                 generated, the total length of each vector G(j) will be increased
of the data update operation, where the original data blocks                        and fall into the range [l, lmax ]. Therefore, the user will update
can be replaced with zeros or some predetermined special                            those affected tokens by adding αs ∗ G(j) [Is ] to the old vi
                                                                                                                           i
blocks. Therefore, we can rely on the update procedure to                           whenever G(j) [Is ] = 0 for Is > l, where Is = φk(i) (s). The
                                                                                                                                           prp
support delete operation, i.e., by setting ∆fij in ∆F to be                         parity blinding is similar as introduced in update operation,
−∆fij . Also, all the affected tokens have to be modified and                        thus is omitted here.
the updated parity information has to be blinded using the
same method specified in update operation.                                           D. Insert Operation
                                                                                      An insert operation to the data file refers to an append
C. Append Operation
                                                                                    operation at the desired index position while maintaining the
   In some cases, the user may want to increase the size of                         same data block structure for the whole data file, i.e., inserting
his stored data by adding blocks at the end of the data file,                        a block F [j] corresponds to shifting all blocks starting with
which we refer as data append. We anticipate that the most                          index j + 1 by one slot. An insert operation may affect many
frequent append operation in cloud data storage is bulk append,                     rows in the logical data file matrix F, and a substantial number
in which the user needs to upload a large number of blocks                          of computations are required to renumber all the subsequent
(not a single block) at one time.                                                   blocks as well as re-compute the challenge-response tokens.
   Given the file matrix F illustrated in file distribution prepa-                    Therefore, an efficient insert operation is difficult to support
ration, appending blocks towards the end of a data file is                           and thus we leave it for our future work.
equivalent to concatenate corresponding rows at the bottom
of the matrix layout for file F. In the beginning, there are                         V. S ECURITY A NALYSIS AND P ERFORMANCE E VALUATION
only l rows in the file matrix. To simplify the presentation,                           In this section, we analyze our proposed scheme in terms
we suppose the user wants to append m blocks at the end of                          of security and efficiency. Our security analysis focuses on
file F, denoted as (fl+1,1 , fl+1,2 , ..., fl+1,m ) (We can always                   the adversary model defined in Section II. We also evaluate
use zero-padding to make a row of m elements.). With the                            the efficiency of our scheme via implementation of both file
secret matrix P, the user can directly calculate the append                         distribution preparation and verification token precomputation.
blocks for each parity server as
                                                     (m+1)           (n)
                                                                                    A. Security Strength Against Weak Adversary
           (fl+1,1 , ..., fl+1,m ) · P = (gl+1               , ..., gl+1 ).
                                                                                      1) Detection Probability against data modification: In our
  1 In practice, it is possible that only a fraction of tokens need amendment,      scheme, servers are required to operate on specified rows in
since the updated blocks may not be covered by all the tokens.                      each correctness verification for the calculation of requested
Iw qo s09 (1)
Iw qo s09 (1)
Iw qo s09 (1)

Más contenido relacionado

La actualidad más candente

Iaetsd an efficient secure scheme for multi user in cloud
Iaetsd an efficient secure scheme for multi user in cloudIaetsd an efficient secure scheme for multi user in cloud
Iaetsd an efficient secure scheme for multi user in cloudIaetsd Iaetsd
 
Proposed Model for Enhancing Data Storage Security in Cloud Computing Systems
Proposed Model for Enhancing Data Storage Security in Cloud Computing SystemsProposed Model for Enhancing Data Storage Security in Cloud Computing Systems
Proposed Model for Enhancing Data Storage Security in Cloud Computing SystemsHossam Al-Ansary
 
Excellent Manner of Using Secure way of data storage in cloud computing
Excellent Manner of Using Secure way of data storage in cloud computingExcellent Manner of Using Secure way of data storage in cloud computing
Excellent Manner of Using Secure way of data storage in cloud computingEditor IJMTER
 
Security Issues in Cloud Computing by rahul abhishek
Security Issues in Cloud Computing  by rahul abhishekSecurity Issues in Cloud Computing  by rahul abhishek
Security Issues in Cloud Computing by rahul abhishekEr. rahul abhishek
 
Data Partitioning Technique In Cloud: A Survey On Limitation And Benefits
Data Partitioning Technique In Cloud: A Survey On Limitation And BenefitsData Partitioning Technique In Cloud: A Survey On Limitation And Benefits
Data Partitioning Technique In Cloud: A Survey On Limitation And BenefitsIJERA Editor
 
Security policy enforcement in cloud infrastructure
Security policy enforcement in cloud infrastructureSecurity policy enforcement in cloud infrastructure
Security policy enforcement in cloud infrastructurecsandit
 
Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...
Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...
Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...ijsrd.com
 
Security Issues in Cloud Computing by rahul abhishek
Security Issues in Cloud Computing  by rahul abhishekSecurity Issues in Cloud Computing  by rahul abhishek
Security Issues in Cloud Computing by rahul abhishekEr. rahul abhishek
 
A Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud ComputingA Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud Computingvivatechijri
 
Ijarcet vol-2-issue-4-1405-1409
Ijarcet vol-2-issue-4-1405-1409Ijarcet vol-2-issue-4-1405-1409
Ijarcet vol-2-issue-4-1405-1409Editor IJARCET
 
Secure Redundant Data Avoidance over Multi-Cloud Architecture.
Secure Redundant Data Avoidance over Multi-Cloud Architecture. Secure Redundant Data Avoidance over Multi-Cloud Architecture.
Secure Redundant Data Avoidance over Multi-Cloud Architecture. IJCERT JOURNAL
 
ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...
ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...
ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...pharmaindexing
 
Cloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion DetectionCloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion Detectionijsrd.com
 

La actualidad más candente (16)

Iaetsd an efficient secure scheme for multi user in cloud
Iaetsd an efficient secure scheme for multi user in cloudIaetsd an efficient secure scheme for multi user in cloud
Iaetsd an efficient secure scheme for multi user in cloud
 
Proposed Model for Enhancing Data Storage Security in Cloud Computing Systems
Proposed Model for Enhancing Data Storage Security in Cloud Computing SystemsProposed Model for Enhancing Data Storage Security in Cloud Computing Systems
Proposed Model for Enhancing Data Storage Security in Cloud Computing Systems
 
Excellent Manner of Using Secure way of data storage in cloud computing
Excellent Manner of Using Secure way of data storage in cloud computingExcellent Manner of Using Secure way of data storage in cloud computing
Excellent Manner of Using Secure way of data storage in cloud computing
 
Security Issues in Cloud Computing by rahul abhishek
Security Issues in Cloud Computing  by rahul abhishekSecurity Issues in Cloud Computing  by rahul abhishek
Security Issues in Cloud Computing by rahul abhishek
 
Data Partitioning Technique In Cloud: A Survey On Limitation And Benefits
Data Partitioning Technique In Cloud: A Survey On Limitation And BenefitsData Partitioning Technique In Cloud: A Survey On Limitation And Benefits
Data Partitioning Technique In Cloud: A Survey On Limitation And Benefits
 
Security policy enforcement in cloud infrastructure
Security policy enforcement in cloud infrastructureSecurity policy enforcement in cloud infrastructure
Security policy enforcement in cloud infrastructure
 
Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...
Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...
Effective & Flexible Cryptography Based Scheme for Ensuring User`s Data Secur...
 
L018137479
L018137479L018137479
L018137479
 
Security Issues in Cloud Computing by rahul abhishek
Security Issues in Cloud Computing  by rahul abhishekSecurity Issues in Cloud Computing  by rahul abhishek
Security Issues in Cloud Computing by rahul abhishek
 
A Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud ComputingA Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud Computing
 
489 493
489 493489 493
489 493
 
Ijarcet vol-2-issue-4-1405-1409
Ijarcet vol-2-issue-4-1405-1409Ijarcet vol-2-issue-4-1405-1409
Ijarcet vol-2-issue-4-1405-1409
 
Fog doc
Fog doc Fog doc
Fog doc
 
Secure Redundant Data Avoidance over Multi-Cloud Architecture.
Secure Redundant Data Avoidance over Multi-Cloud Architecture. Secure Redundant Data Avoidance over Multi-Cloud Architecture.
Secure Redundant Data Avoidance over Multi-Cloud Architecture.
 
ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...
ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...
ENHANCING SECURITY IN CLOUD COMPUTING BY COMBINING DYNAMIC BROADCAST ENCRYPTI...
 
Cloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion DetectionCloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion Detection
 

Similar a Iw qo s09 (1)

Distributed Scheme to Authenticate Data Storage Security in Cloud Computing
Distributed Scheme to Authenticate Data Storage Security in Cloud ComputingDistributed Scheme to Authenticate Data Storage Security in Cloud Computing
Distributed Scheme to Authenticate Data Storage Security in Cloud ComputingAIRCC Publishing Corporation
 
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGDISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGAIRCC Publishing Corporation
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudeSAT Journals
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudeSAT Publishing House
 
S.A.kalaiselvan toward secure and dependable storage services
S.A.kalaiselvan toward secure and dependable storage servicesS.A.kalaiselvan toward secure and dependable storage services
S.A.kalaiselvan toward secure and dependable storage serviceskalaiselvanresearch
 
Privacy and Integrity Preserving in Cloud Storage Devices
Privacy and Integrity Preserving in Cloud Storage DevicesPrivacy and Integrity Preserving in Cloud Storage Devices
Privacy and Integrity Preserving in Cloud Storage DevicesIOSR Journals
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Enabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerEnabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerIOSR Journals
 
Towards Secure and Dependable Storage Services in Cloud Computing
Towards Secure and Dependable Storage Services in Cloud  Computing Towards Secure and Dependable Storage Services in Cloud  Computing
Towards Secure and Dependable Storage Services in Cloud Computing IJMER
 
Dynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for CloudDynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for CloudAM Publications
 
Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...
Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...
Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...Editor IJMTER
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyIDES Editor
 
Enabling Public Audit Ability and Data Dynamics for Storage Security in Clou...
Enabling Public Audit Ability and Data Dynamics for Storage  Security in Clou...Enabling Public Audit Ability and Data Dynamics for Storage  Security in Clou...
Enabling Public Audit Ability and Data Dynamics for Storage Security in Clou...IOSR Journals
 
Secure distributed deduplication systems with improved reliability
Secure distributed deduplication systems with improved reliabilitySecure distributed deduplication systems with improved reliability
Secure distributed deduplication systems with improved reliabilityPvrtechnologies Nellore
 

Similar a Iw qo s09 (1) (20)

Distributed Scheme to Authenticate Data Storage Security in Cloud Computing
Distributed Scheme to Authenticate Data Storage Security in Cloud ComputingDistributed Scheme to Authenticate Data Storage Security in Cloud Computing
Distributed Scheme to Authenticate Data Storage Security in Cloud Computing
 
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGDISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloud
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloud
 
Toward secure and dependable
Toward secure and dependableToward secure and dependable
Toward secure and dependable
 
L01246974
L01246974L01246974
L01246974
 
S.A.kalaiselvan toward secure and dependable storage services
S.A.kalaiselvan toward secure and dependable storage servicesS.A.kalaiselvan toward secure and dependable storage services
S.A.kalaiselvan toward secure and dependable storage services
 
Privacy and Integrity Preserving in Cloud Storage Devices
Privacy and Integrity Preserving in Cloud Storage DevicesPrivacy and Integrity Preserving in Cloud Storage Devices
Privacy and Integrity Preserving in Cloud Storage Devices
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Enabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerEnabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud Server
 
Towards Secure and Dependable Storage Services in Cloud Computing
Towards Secure and Dependable Storage Services in Cloud  Computing Towards Secure and Dependable Storage Services in Cloud  Computing
Towards Secure and Dependable Storage Services in Cloud Computing
 
Dynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for CloudDynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for Cloud
 
H1803035056
H1803035056H1803035056
H1803035056
 
234 237
234 237234 237
234 237
 
L04302088092
L04302088092L04302088092
L04302088092
 
Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...
Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...
Public Key Encryption algorithms Enabling Efficiency Using SaaS in Cloud Comp...
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through Steganography
 
Enabling Public Audit Ability and Data Dynamics for Storage Security in Clou...
Enabling Public Audit Ability and Data Dynamics for Storage  Security in Clou...Enabling Public Audit Ability and Data Dynamics for Storage  Security in Clou...
Enabling Public Audit Ability and Data Dynamics for Storage Security in Clou...
 
Secure distributed deduplication systems with improved reliability
Secure distributed deduplication systems with improved reliabilitySecure distributed deduplication systems with improved reliability
Secure distributed deduplication systems with improved reliability
 
J017236366
J017236366J017236366
J017236366
 

Último

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Último (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Iw qo s09 (1)

  • 1. Ensuring Data Storage Security in Cloud Computing Cong Wang, Qian Wang, and Kui Ren Wenjing Lou Department of ECE Department of ECE Illinois Institute of Technology Worcester Polytechnic Institute Email: {cwang, qwang, kren}@ece.iit.edu Email: wjlou@ece.wpi.edu Abstract—Cloud Computing has been envisioned as the next- number of reasons. Firstly, traditional cryptographic primitives generation architecture of IT Enterprise. In contrast to tradi- for the purpose of data security protection can not be directly tional solutions, where the IT services are under proper physical, adopted due to the users’ loss control of data under Cloud logical and personnel controls, Cloud Computing moves the application software and databases to the large data centers, Computing. Therefore, verification of correct data storage where the management of the data and services may not be in the cloud must be conducted without explicit knowledge fully trustworthy. This unique attribute, however, poses many of the whole data. Considering various kinds of data for new security challenges which have not been well understood. each user stored in the cloud and the demand of long term In this article, we focus on cloud data storage security, which continuous assurance of their data safety, the problem of has always been an important aspect of quality of service. To ensure the correctness of users’ data in the cloud, we propose an verifying correctness of data storage in the cloud becomes effective and flexible distributed scheme with two salient features, even more challenging. Secondly, Cloud Computing is not just opposing to its predecessors. By utilizing the homomorphic token a third party data warehouse. The data stored in the cloud with distributed verification of erasure-coded data, our scheme may be frequently updated by the users, including insertion, achieves the integration of storage correctness insurance and data deletion, modification, appending, reordering, etc. To ensure error localization, i.e., the identification of misbehaving server(s). Unlike most prior works, the new scheme further supports secure storage correctness under dynamic data update is hence of and efficient dynamic operations on data blocks, including: data paramount importance. However, this dynamic feature also update, delete and append. Extensive security and performance makes traditional integrity insurance techniques futile and analysis shows that the proposed scheme is highly efficient and entails new solutions. Last but not the least, the deployment resilient against Byzantine failure, malicious data modification of Cloud Computing is powered by data centers running in attack, and even server colluding attacks. a simultaneous, cooperated and distributed manner. Individual I. I NTRODUCTION user’s data is redundantly stored in multiple physical loca- Several trends are opening up the era of Cloud Computing, tions to further reduce the data integrity threats. Therefore, which is an Internet-based development and use of computer distributed protocols for storage correctness assurance will be technology. The ever cheaper and more powerful processors, of most importance in achieving a robust and secure cloud data together with the software as a service (SaaS) computing archi- storage system in the real world. However, such important area tecture, are transforming data centers into pools of computing remains to be fully explored in the literature. service on a huge scale. The increasing network bandwidth and Recently, the importance of ensuring the remote data reliable yet flexible network connections make it even possible integrity has been highlighted by the following research that users can now subscribe high quality services from data works [3]–[7]. These techniques, while can be useful to ensure and software that reside solely on remote data centers. the storage correctness without having users possessing data, Moving data into the cloud offers great convenience to can not address all the security threats in cloud data storage, users since they don’t have to care about the complexities since they are all focusing on single server scenario and of direct hardware management. The pioneer of Cloud Com- most of them do not consider dynamic data operations. As puting vendors, Amazon Simple Storage Service (S3) and an complementary approach, researchers have also proposed Amazon Elastic Compute Cloud (EC2) [1] are both well distributed protocols [8]–[10] for ensuring storage correct- known examples. While these internet-based online services ness across multiple servers or peers. Again, none of these do provide huge amounts of storage space and customizable distributed schemes is aware of dynamic data operations. computing resources, this computing platform shift, however, As a result, their applicability in cloud data storage can be is eliminating the responsibility of local machines for data drastically limited. maintenance at the same time. As a result, users are at the In this paper, we propose an effective and flexible distributed mercy of their cloud service providers for the availability and scheme with explicit dynamic data support to ensure the integrity of their data. Recent downtime of Amazon’s S3 is correctness of users’ data in the cloud. We rely on erasure- such an example [2]. correcting code in the file distribution preparation to provide From the perspective of data security, which has always redundancies and guarantee the data dependability. This con- been an important aspect of quality of service, Cloud Com- struction drastically reduces the communication and storage puting inevitably poses new challenging security threats for overhead as compared to the traditional replication-based file
  • 2. distribution techniques. By utilizing the homomorphic token S Me s e cu rity sag e with distributed verification of erasure-coded data, our scheme rit y low cu F F low Se age achieves the storage correctness insurance as well as data Mes s Optional Third Party Auditor Cloud Storage error localization: whenever data corruption has been detected Servers during the storage correctness verification, our scheme can Data Flow almost guarantee the simultaneous localization of data errors, Users i.e., the identification of the misbehaving server(s). essage Flow Security M Our work is among the first few ones in this field to consider Cloud Service Provider distributed data storage in Cloud Computing. Our contribution can be summarized as the following three aspects: Fig. 1: Cloud data storage architecture 1) Compared to many of its predecessors, which only provide binary results about the storage state across the distributed servers, the challenge-response protocol in our work further these operations we are considering are block update, delete, provides the localization of data error. insert and append. 2) Unlike most prior works for ensuring remote data integrity, As users no longer possess their data locally, it is of critical the new scheme supports secure and efficient dynamic opera- importance to assure users that their data are being correctly tions on data blocks, including: update, delete and append. stored and maintained. That is, users should be equipped with security means so that they can make continuous correctness 3) Extensive security and performance analysis shows that assurance of their stored data even without the existence of the proposed scheme is highly efficient and resilient against local copies. In case that users do not necessarily have the Byzantine failure, malicious data modification attack, and even time, feasibility or resources to monitor their data, they can server colluding attacks. delegate the tasks to an optional trusted TPA of their respective The rest of the paper is organized as follows. Section II choices. In our model, we assume that the point-to-point introduces the system model, adversary model, our design goal communication channels between each cloud server and the and notations. Then we provide the detailed description of our user is authenticated and reliable, which can be achieved in scheme in Section III and IV. Section V gives the security practice with little overhead. Note that we don’t address the analysis and performance evaluations, followed by Section VI issue of data privacy in this paper, as in Cloud Computing, which overviews the related work. Finally, Section VII gives data privacy is orthogonal to the problem we study here. the concluding remark of the whole paper. B. Adversary Model II. P ROBLEM S TATEMENT Security threats faced by cloud data storage can come A. System Model from two different sources. On the one hand, a CSP can A representative network architecture for cloud data storage be self-interested, untrusted and possibly malicious. Not only is illustrated in Figure 1. Three different network entities can does it desire to move data that has not been or is rarely be identified as follows: accessed to a lower tier of storage than agreed for monetary • User: users, who have data to be stored in the cloud and reasons, but it may also attempt to hide a data loss incident rely on the cloud for data computation, consist of both due to management errors, Byzantine failures and so on. individual consumers and organizations. On the other hand, there may also exist an economically- • Cloud Service Provider (CSP): a CSP, who has signif- motivated adversary, who has the capability to compromise icant resources and expertise in building and managing a number of cloud data storage servers in different time distributed cloud storage servers, owns and operates live intervals and subsequently is able to modify or delete users’ Cloud Computing systems. data while remaining undetected by CSPs for a certain period. • Third Party Auditor (TPA): an optional TPA, who has Specifically, we consider two types of adversary with different expertise and capabilities that users may not have, is levels of capability in this paper: trusted to assess and expose risk of cloud storage services Weak Adversary: The adversary is interested in corrupting the on behalf of the users upon request. user’s data files stored on individual servers. Once a server is In cloud data storage, a user stores his data through a CSP comprised, an adversary can pollute the original data files by into a set of cloud servers, which are running in a simulta- modifying or introducing its own fraudulent data to prevent neous, cooperated and distributed manner. Data redundancy the original data from being retrieved by the user. can be employed with technique of erasure-correcting code Strong Adversary: This is the worst case scenario, in which to further tolerate faults or server crash as user’s data grows we assume that the adversary can compromise all the storage in size and importance. Thereafter, for application purposes, servers so that he can intentionally modify the data files as the user interacts with the cloud servers via CSP to access or long as they are internally consistent. In fact, this is equivalent retrieve his data. In some cases, the user may need to perform to the case where all servers are colluding together to hide a block level operations on his data. The most general forms of data loss or corruption incident.
  • 3. C. Design Goals integrated with the verification of erasure-coded data [8] [12]. To ensure the security and dependability for cloud data Subsequently, it is also shown how to derive a challenge- storage under the aforementioned adversary model, we aim response protocol for verifying the storage correctness as well to design efficient mechanisms for dynamic data verification as identifying misbehaving servers. Finally, the procedure for and operation and achieve the following goals: (1) Storage file retrieval and error recovery based on erasure-correcting correctness: to ensure users that their data are indeed stored code is outlined. appropriately and kept intact all the time in the cloud. (2) A. File Distribution Preparation Fast localization of data error: to effectively locate the mal- functioning server when data corruption has been detected. (3) It is well known that erasure-correcting code may be used Dynamic data support: to maintain the same level of storage to tolerate multiple failures in distributed storage systems. In correctness assurance even if users modify, delete or append cloud data storage, we rely on this technique to disperse the their data files in the cloud. (4) Dependability: to enhance data data file F redundantly across a set of n = m + k distributed availability against Byzantine failures, malicious data modifi- servers. A (m + k, k) Reed-Solomon erasure-correcting code cation and server colluding attacks, i.e. minimizing the effect is used to create k redundancy parity vectors from m data brought by data errors or server failures. (5) Lightweight: vectors in such a way that the original m data vectors can be to enable users to perform storage correctness checks with reconstructed from any m out of the m + k data and parity minimum overhead. vectors. By placing each of the m + k vectors on a different server, the original data file can survive the failure of any k of D. Notation and Preliminaries the m+k servers without any data loss, with a space overhead • F – the data file to be stored. We assume that F can be of k/m. For support of efficient sequential I/O to the original denoted as a matrix of m equal-sized data vectors, each file, our file layout is systematic, i.e., the unmodified m data consisting of l blocks. Data blocks are all well represented file vectors together with k parity vectors is distributed across as elements in Galois Field GF (2p ) for p = 8 or 16. m + k different servers. • A – The dispersal matrix used for Reed-Solomon coding. Let F = (F1 , F2 , . . . , Fm ) and Fi = (f1i , f2i , . . . , fli )T • G – The encoded file matrix, which includes a set of (i ∈ {1, . . . , m}), where l ≤ 2p − 1. Note all these blocks n = m + k vectors, each consisting of l blocks. are elements of GF (2p ). The systematic layout with parity • fkey (·) – pseudorandom function (PRF), which is defined vectors is achieved with the information dispersal matrix A, as f : {0, 1}∗ × key → GF (2p ). derived from an m × (m + k) Vandermonde matrix [13]: • φkey (·) – pseudorandom permutation (PRP), which is   1 1 ... 1 1 ... 1 defined as φ : {0, 1}log2 (l) × key → {0, 1}log2 (l) .  β1  β2 ... βm βm+1 . . . βn   • ver – a version number bound with the index for individ-  , . . . . .. . . . . .. . . ual blocks, which records the times the block has been  . . . . . . .  modified. Initially we assume ver is 0 for all data blocks. m−1 m−1 m−1 m−1 m−1 β1 β2 . . . βm βm+1 . . . βn • sver – the seed for PRF, which depends on the file name, ij block index i, the server position j as well as the optional where βj (j ∈ {1, . . . , n}) are distinct elements randomly block version number ver. picked from GF (2p ). After a sequence of elementary row transformations, the III. E NSURING C LOUD DATA S TORAGE desired matrix A can be written as In cloud data storage system, users store their data in   1 0 . . . 0 p11 p12 . . . p1k the cloud and no longer possess the data locally. Thus, the  0 1 . . . 0 p21 p22 . . . p2k    correctness and availability of the data files being stored on the A = (I|P) =  . . . . . . .. . , distributed cloud servers must be guaranteed. One of the key  . . . . .. .. . . . . . .  . issues is to effectively detect any unauthorized data modifica- 0 0 . . . 1 pm1 pm2 . . . pmk tion and corruption, possibly due to server compromise and/or where I is a m × m identity matrix and P is the secret parity random Byzantine failures. Besides, in the distributed case generation matrix with size m × k. Note that A is derived when such inconsistencies are successfully detected, to find from a Vandermonde matrix, thus it has the property that any which server the data error lies in is also of great significance, m out of the m + k columns form an invertible matrix. since it can be the first step to fast recover the storage errors. By multiplying F by A, the user obtains the encoded file: To address these problems, our main scheme for ensuring cloud data storage is presented in this section. The first part of G= F·A = (G(1) , G(2) , . . . , G(m) , G(m+1) , . . . , G(n) ) the section is devoted to a review of basic tools from coding = (F1 , F2 , . . . , Fm , G(m+1) , . . . , G(n) ), theory that are needed in our scheme for file distribution across (j) (j) (j) cloud servers. Then, the homomorphic token is introduced. where G(j) = (g1 , g2 , . . . , gl )T (j ∈ {1, . . . , n}). As The token computation function we are considering belongs noticed, the multiplication reproduces the original data file to a family of universal hash function [11], chosen to pre- vectors of F and the remaining part (G(m+1) , . . . , G(n) ) are serve the homomorphic properties, which can be perfectly k parity vectors generated based on F.
  • 4. Algorithm 1 Token Pre-computation Algorithm 2 Correctness Verification and Error Localization 1: procedure 1: procedure C HALLENGE (i) (i) 2: Choose parameters l, n and function f, φ; 2: Recompute αi = fkchal (i) and kprp from KP RP ; (i) 3: Choose the number t of tokens; 3: Send {αi , kprp } to all the cloud servers; 4: Choose the number r of indices per verification; 4: Receive from servers: (j) 5: Generate master key Kprp and challenge kchal ; {Ri = r αq ∗ G(j) [φk(i) (q)]|1 ≤ j ≤ n} q=1 i for vector G(j) , j ← 1, n do prp 6: 5: for (j ← m + 1, n) do 7: for round i← 1, t do 6: R(j) ← R(j) − r fkj (sIq ,j )·αq , Iq = φk(i) (q) (i) q=1 i prp 8: Derive αi = fkchal (i) and kprp from KP RP . 7: end for (j) r 9: Compute vi = q=1 αq ∗ G(j) [φk(i) (q)] i 8: (1) (m) if ((Ri , . . . , Ri ) · P==(Ri (m+1) (n) , . . . , Ri )) then prp 10: end for 9: Accept and ready for the next challenge. 11: end for 10: else 12: Store all the vi s locally. 11: for (j ← 1, n) do 13: end procedure (j) (j) 12: if (Ri ! =vi ) then 13: return server j is misbehaving. 14: end if B. Challenge Token Precomputation 15: end for In order to achieve assurance of data storage correctness 16: end if 17: end procedure and data error localization simultaneously, our scheme entirely relies on the pre-computed verification tokens. The main idea is as follows: before file distribution the user pre-computes a certain number of short verification tokens on individual vector user stores them locally to obviate the need for encryption G(j) (j ∈ {1, . . . , n}), each token covering a random subset and lower the bandwidth overhead during dynamic data op- of data blocks. Later, when the user wants to make sure the eration which will be discussed shortly. The details of token storage correctness for the data in the cloud, he challenges generation are shown in Algorithm 1. Once all tokens are computed, the final step before the cloud servers with a set of randomly generated block (j) indices. Upon receiving challenge, each cloud server computes file distribution is to blind each parity block gi in a short “signature” over the specified blocks and returns them (G(m+1) , . . . , G(n) ) by to the user. The values of these signatures should match the (j) (j) gi ← gi + fkj (sij ), i ∈ {1, . . . , l}, corresponding tokens pre-computed by the user. Meanwhile, as all servers operate over the same subset of the indices, the where kj is the secret key for parity vector G(j) (j ∈ {m + requested response values for integrity check must also be a 1, . . . , n}). This is for protection of the secret matrix P. We valid codeword determined by secret matrix P. will discuss the necessity of using blinded parities in detail Suppose the user wants to challenge the cloud servers t in Section V. After blinding the parity information, the user times to ensure the correctness of data storage. Then, he disperses all the n encoded vectors G(j) (j ∈ {1, . . . , n}) must pre-compute t verification tokens for each G(j) (j ∈ across the cloud servers S1 , S2 , . . . , Sn . {1, . . . , n}), using a PRF f (·), a PRP φ(·), a challenge key C. Correctness Verification and Error Localization kchal and a master permutation key KP RP . To generate the Error localization is a key prerequisite for eliminating errors ith token for server j, the user acts as follows: in storage systems. However, many previous schemes do not 1) Derive a random challenge value αi of GF (2p ) by αi = explicitly consider the problem of data error localization, (i) fkchal (i) and a permutation key kprp based on KP RP . thus only provide binary results for the storage verification. 2) Compute the set of r randomly-chosen indices: Our scheme outperforms those by integrating the correctness verification and error localization in our challenge-response {Iq ∈ [1, ..., l]|1 ≤ q ≤ r}, where Iq = φk(i) (q). prp protocol: the response values from servers for each challenge 3) Calculate the token as: not only determine the correctness of the distributed storage, r but also contain information to locate potential data error(s). (j) (j) Specifically, the procedure of the i-th challenge-response for vi = αq ∗ G(j) [Iq ], where G(j) [Iq ] = gIq . i q=1 a cross-check over the n servers is described as follows: 1) The user reveals the αi as well as the i-th permutation (j) (i) Note that vi , which is an element of GF (2p ) with small key kprp to each servers. size, is the response the user expects to receive from server j 2) The server storing vector G(j) aggregates those r rows (i) when he challenges it on the specified data blocks. specified by index kprp into a linear combination After token generation, the user has the choice of either r (j) keeping the pre-computed tokens locally or storing them in Ri = αq ∗ G(j) [φk(i) (q)]. i prp encrypted form on the cloud servers. In our case here, the q=1
  • 5. (j) Algorithm 3 Error Recovery 3) Upon receiving Ri s from all the servers, the user takes away blind values in R(j) (j ∈ {m + 1, . . . , n}) by 1: procedure r % Assume the block corruptions have been detected (j) (j) among Ri ← Ri − fkj (sIq ,j ) · αq , where Iq = φk(i) (q). i q=1 prp % the specified r rows; % Assume s ≤ k servers have been identified misbehaving 4) Then the user verifies whether the received values re- 2: Download r rows of blocks from servers; main a valid codeword determined by secret matrix P: 3: Treat s servers as erasures and recover the blocks. (1) (m) ? (m+1) (n) 4: Resend the recovered blocks to corresponding servers. (Ri , . . . , Ri ) · P = (Ri , . . . , Ri ). 5: end procedure Because all the servers operate over the same subset of indices, the linear aggregation of these r specified rows (1) (n) (Ri , . . . , Ri ) has to be a codeword in the encoded file operations of update, delete and append to modify the data file matrix. If the above equation holds, the challenge is passed. while maintaining the storage correctness assurance. Otherwise, it indicates that among those specified rows, there The straightforward and trivial way to support these opera- exist file block corruptions. tions is for user to download all the data from the cloud servers Once the inconsistency among the storage has been success- and re-compute the whole parity blocks as well as verification fully detected, we can rely on the pre-computed verification tokens. This would clearly be highly inefficient. In this section, tokens to further determine where the potential data error(s) we will show how our scheme can explicitly and efficiently (j) lies in. Note that each response Ri is computed exactly in the handle dynamic data operations for cloud data storage. (j) same way as token vi , thus the user can simply find which server is misbehaving by verifying the following n equations: A. Update Operation (j) ? (j) In cloud data storage, sometimes the user may need to Ri = vi , j ∈ {1, . . . , n}. modify some data block(s) stored in the cloud, from its Algorithm 2 gives the details of correctness verification and current value fij to a new one, fij + ∆fij . We refer this error localization. operation as data update. Due to the linear property of Reed- Solomon code, a user can perform the update operation and D. File Retrieval and Error Recovery generate the updated parity blocks by using ∆fij only, without Since our layout of file matrix is systematic, the user can involving any other unchanged blocks. Specifically, the user reconstruct the original file by downloading the data vectors can construct a general update matrix ∆F as from the first m servers, assuming that they return the correct   ∆f11 ∆f12 . . . ∆f1m response values. Notice that our verification scheme is based  ∆f21 ∆f22 . . . ∆f2m    on random spot-checking, so the storage correctness assurance ∆F =  . . .. .  is a probabilistic one. However, by choosing system param-  . . . . . . .  eters (e.g., r, l, t) appropriately and conducting enough times ∆fl1 ∆fl2 . . . ∆flm of verification, we can guarantee the successful file retrieval = (∆F1 , ∆F2 , . . . , ∆Fm ). with high probability. On the other hand, whenever the data Note that we use zero elements in ∆F to denote the unchanged corruption is detected, the comparison of pre-computed tokens blocks. To maintain the corresponding parity vectors as well as and received response values can guarantee the identification be consistent with the original file layout, the user can multiply of misbehaving server(s), again with high probability, which ∆F by A and thus generate the update information for both will be discussed shortly. Therefore, the user can always the data vectors and parity vectors as follows: ask servers to send back blocks of the r rows specified in the challenge and regenerate the correct blocks by erasure ∆F · A = (∆G(1) , . . . , ∆G(m) , ∆G(m+1) , . . . , ∆G(n) ) correction, shown in Algorithm 3, as long as there are at most = (∆F1 , . . . , ∆Fm , ∆G(m+1) , . . . , ∆G(n) ), k misbehaving servers are identified. The newly recovered blocks can then be redistributed to the misbehaving servers where ∆G(j) (j ∈ {m + 1, . . . , n}) denotes the update to maintain the correctness of storage. information for the parity vector G(j) . Because the data update operation inevitably affects some IV. P ROVIDING DYNAMIC DATA O PERATION S UPPORT or all of the remaining verification tokens, after preparation of So far, we assumed that F represents static or archived update information, the user has to amend those unused tokens data. This model may fit some application scenarios, such for each vector G(j) to maintain the same storage correctness as libraries and scientific datasets. However, in cloud data assurance. In other words, for all the unused tokens, the user storage, there are many potential scenarios where data stored needs to exclude every occurrence of the old data block and in the cloud is dynamic, like electronic documents, photos, or replace it with the new one. Thanks to the homomorphic log files etc. Therefore, it is crucial to consider the dynamic construction of our verification token, the user can perform case, where a user may wish to perform various block-level the token update efficiently. To give more details, suppose a
  • 6. (j) block G(j) [Is ], which is covered by the specific token vi , has To support block append operation, we need a slight modi- been changed to G(j) [Is ] + ∆G(j) [Is ], where Is = φk(i) (s). fication to our token pre-computation. Specifically, we require prp (j) To maintain the usability of token vi , it is not hard to verify the user to expect the maximum size in blocks, denoted as that the user can simply update it by lmax , for each of his data vector. The idea of supporting block append, which is similar as adopted in [7], relies on (j) (j) vi ← vi + αs ∗ ∆G(j) [Is ], i the initial budget for the maximum anticipated data size lmax in each encoded data vector as well as the system parameter without retrieving any other r − 1 blocks required in the pre- (j) rmax = ⌈r ∗ (lmax /l)⌉ for each pre-computed challenge- computation of vi . response token. The pre-computation of the i-th token on After the amendment to the affected tokens1 , the user needs (j) server j is modified as follows: to blind the update information ∆gi for each parity block in rmax (∆G(m+1) , . . . , ∆G(n) ) to hide the secret matrix P by vi = αq ∗ G(j) [Iq ], i (j) (j) ∆gi ← ∆gi + fkj (sver ), i ij ∈ {1, . . . , l}. q=1 Here we use a new seed sver for the PRF. The version number where ij ver functions like a counter which helps the user to keep G(j) [φk(i) (q)] , [φk(i) (q)] ≤ l track of the blind information on the specific parity blocks. G(j) [Iq ] = prp prp After blinding, the user sends update information to the cloud 0 , [φk(i) (q)] > l prp servers, which perform the update operation as This formula guarantees that on average, there will be r indices G (j) ←G (j) + ∆G (j) , (j ∈ {1, . . . , n}). falling into the range of existing l blocks. Because the cloud servers and the user have the agreement on the number of B. Delete Operation existing blocks in each vector G(j) , servers will follow exactly Sometimes, after being stored in the cloud, certain data the above procedure when re-computing the token values upon blocks may need to be deleted. The delete operation we are receiving user’s challenge request. considering is a general one, in which user replaces the data Now when the user is ready to append new blocks, i.e., block with zero or some special reserved data symbol. From both the file blocks and the corresponding parity blocks are this point of view, the delete operation is actually a special case generated, the total length of each vector G(j) will be increased of the data update operation, where the original data blocks and fall into the range [l, lmax ]. Therefore, the user will update can be replaced with zeros or some predetermined special those affected tokens by adding αs ∗ G(j) [Is ] to the old vi i blocks. Therefore, we can rely on the update procedure to whenever G(j) [Is ] = 0 for Is > l, where Is = φk(i) (s). The prp support delete operation, i.e., by setting ∆fij in ∆F to be parity blinding is similar as introduced in update operation, −∆fij . Also, all the affected tokens have to be modified and thus is omitted here. the updated parity information has to be blinded using the same method specified in update operation. D. Insert Operation An insert operation to the data file refers to an append C. Append Operation operation at the desired index position while maintaining the In some cases, the user may want to increase the size of same data block structure for the whole data file, i.e., inserting his stored data by adding blocks at the end of the data file, a block F [j] corresponds to shifting all blocks starting with which we refer as data append. We anticipate that the most index j + 1 by one slot. An insert operation may affect many frequent append operation in cloud data storage is bulk append, rows in the logical data file matrix F, and a substantial number in which the user needs to upload a large number of blocks of computations are required to renumber all the subsequent (not a single block) at one time. blocks as well as re-compute the challenge-response tokens. Given the file matrix F illustrated in file distribution prepa- Therefore, an efficient insert operation is difficult to support ration, appending blocks towards the end of a data file is and thus we leave it for our future work. equivalent to concatenate corresponding rows at the bottom of the matrix layout for file F. In the beginning, there are V. S ECURITY A NALYSIS AND P ERFORMANCE E VALUATION only l rows in the file matrix. To simplify the presentation, In this section, we analyze our proposed scheme in terms we suppose the user wants to append m blocks at the end of of security and efficiency. Our security analysis focuses on file F, denoted as (fl+1,1 , fl+1,2 , ..., fl+1,m ) (We can always the adversary model defined in Section II. We also evaluate use zero-padding to make a row of m elements.). With the the efficiency of our scheme via implementation of both file secret matrix P, the user can directly calculate the append distribution preparation and verification token precomputation. blocks for each parity server as (m+1) (n) A. Security Strength Against Weak Adversary (fl+1,1 , ..., fl+1,m ) · P = (gl+1 , ..., gl+1 ). 1) Detection Probability against data modification: In our 1 In practice, it is possible that only a fraction of tokens need amendment, scheme, servers are required to operate on specified rows in since the updated blocks may not be covered by all the tokens. each correctness verification for the calculation of requested