Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen



Problem Statement
The provenance of a data item is the metadata describing how, when and by whom the data item was produced.

Provenance is crucial in many settings, but it is often not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.

In this case, is it possible to reconstruct provenance post hoc?

An initial prototype implementation
As a first step we focus on dependencies between files instead of sequences of operations.

We implemented a prototype of the pipeline using open-source components such as Apache Lucene, Apache Tika and the Dropbox API. As signal detectors we used well-known similarity measures.

[Figure: the prototype pipeline. Docs → Preprocessing (Extract metadata, Index content, Infer semantic types) → Hypotheses Generation (Text similarity, Image similarity, Domain-specific similarity, Metadata similarity) → Hypotheses Pruning (Filter temporal incoherence, Similarity thresholds, Domain-specific filtering) → Aggregation and ranking (Weighted sum).]
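The multi-signal idea behind the prototype can be illustrated with a minimal Python sketch. It is not the prototype's code (which built on Apache Lucene and Apache Tika), and everything concrete in it is an assumption: files are given as already-extracted text plus a filesystem timestamp (so preprocessing is skipped), the two signal detectors (word-set Jaccard similarity and character-trigram cosine similarity) stand in for the unspecified "well-known similarity measures", the threshold of 0.3 and the equal weights are hypothetical, and "temporal incoherence" is read as "a file cannot derive from a newer file". The names `File`, `word_jaccard`, `trigram_cosine` and `reconstruct` are illustrative only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass
class File:
    """A file with already-extracted text (preprocessing is out of scope here)."""
    name: str
    text: str
    mtime: float  # filesystem timestamp, e.g. seconds since the epoch


def word_jaccard(a: str, b: str) -> float:
    """Hypothetical signal detector: Jaccard overlap of the two word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def trigram_cosine(a: str, b: str) -> float:
    """Hypothetical signal detector: cosine similarity of character-trigram counts."""
    ca = Counter(a[i:i + 3] for i in range(len(a) - 2))
    cb = Counter(b[i:i + 3] for i in range(len(b) - 2))
    dot = sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Hypothetical configuration: detectors, per-signal threshold, aggregation weights.
DETECTORS = {"words": word_jaccard, "trigrams": trigram_cosine}
THRESHOLD = 0.3
WEIGHTS = {"words": 0.5, "trigrams": 0.5}


def reconstruct(files: list[File]) -> list[tuple[str, str, float]]:
    """Rank candidate (source, derived, score) dependencies between files."""
    # Hypotheses generation: one hypothesis per ordered file pair and detector.
    hypotheses = [
        (src, dst, name, detect(src.text, dst.text))
        for src in files
        for dst in files
        if src is not dst
        for name, detect in DETECTORS.items()
    ]
    # Hypotheses pruning: a file can only derive from an older file
    # (temporal incoherence), and weak signals are discarded (threshold).
    kept = [h for h in hypotheses if h[0].mtime < h[1].mtime and h[3] >= THRESHOLD]
    # Aggregation: weighted sum of the surviving signals for each file pair.
    scores = defaultdict(float)
    for src, dst, name, value in kept:
        scores[(src.name, dst.name)] += WEIGHTS[name] * value
    # Ranking: most plausible dependencies first.
    return sorted(
        ((s, d, round(v, 3)) for (s, d), v in scores.items()),
        key=lambda t: t[2],
        reverse=True,
    )


if __name__ == "__main__":
    folder = [
        File("draft.txt", "provenance of files in a shared folder", 1.0),
        File("final.txt", "provenance of files in a shared folder is reconstructed", 2.0),
        File("notes.txt", "grocery list milk eggs", 3.0),
    ]
    for src, dst, score in reconstruct(folder):
        print(f"{src} -> {dst}: {score}")
```

Running the script prints a single hypothesis, draft.txt -> final.txt: the reverse direction is removed by the timestamp check, and notes.txt shares no words or trigrams with the other files. The sketch prunes individual signals before summing them, following the stage order Generation → Pruning → Aggregation; thresholding the aggregated score instead would be an equally plausible reading of the pipeline.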
Initial (encouraging) results
We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.
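The evaluation is reported as F1-scores, but the poster does not say what was counted. Assuming the reconstructed relations (file pairs) are compared with the relations annotated by the experts, F1 is the harmonic mean of precision and recall, as in this minimal Python sketch; the function name `precision_recall_f1` and the example pairs are invented for illustration and are not the experiment's data.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compare reconstructed relations with the annotated (gold) relations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: (source, derived) file pairs, not the experiment's data.
gold = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
predicted = {("a", "b"), ("b", "c"), ("c", "e")}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")  # precision=0.667 recall=0.500 F1=0.571
```

If the score was instead computed on clusters of publications, the same function applies to the sets of unordered file pairs placed in the same cluster (pairwise clustering F1).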


[Figure: the annotated publications, shown as nodes 0 to 24, in two panels, "Original" and "Predicted", each grouped into Cluster 1: Blood Cultures, Cluster 2: Markers and Cluster 3: General (Guideline).]

The reconstruction reached an F1-score of 0.49 using only text similarity, and an F1-score of 0.70 when aggregating several similarity measures.

Research Question
How can one automatically, accurately and efficiently reconstruct a plausible provenance of the files in a shared folder, where provenance is understood as the sequences of operations connecting the files?

Approach & Methodology
We propose a multi-signal pipeline approach that reconstructs plausible provenance traces, using the contents of the files and their metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several components that can be executed in parallel:
[Figure: the pipeline stages Docs → Preprocessing → Hypotheses Generation → Hypotheses Pruning → Aggregation and ranking, with parallel components in each stage: Extract metadata, Index content, ... (preprocessing); Signal Detector 1, Signal Detector 2, ... (generation); Signal Filter 1, Signal Filter 2, ... (pruning); Aggregator 1, Aggregator 2, ... (aggregation). The output is a set of relations between documents (e.g. "copy"), each with a confidence value.]

The research methodology is iterative: it will incrementally integrate existing approaches from the literature and evaluate their performance on benchmark corpora.

Future work
Following the planned methodology, we will explore additional components for each of the pipeline phases and also consider computational efficiency.

Bibliography
(1) Sara Magliacane: Reconstructing Provenance. ISWC Doctoral Consortium, 2012.
(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance. Third International Workshop on the Role of Semantic Web in Provenance Management, ESWC 2012.
Provenance is crucial in many ./01(
                                  settings, but often it is ./21( tracked,
                                              ./31(          not                                                                                              We implemented a prototype of the pip
resulting in collections of files with only basic filesystem                                                                                                  components, like Apache Lucene, Apa
                                                                                                                                                                                          Cluster 1: Blood Cultures
                                                                                                                                                                                          EvidenceQ||
                                                                                                                                                                                                                                                  Cluster 2: Markers
                                                                                                                                                                                                                                                  EvidenceQX
                                                                                                                                                                                                                                                                                            Cluster 3: General
                                                                                                                                                                                                                                                                                            Guideline

metadata, e.g. timestamps.                                                                                                                                    As signal detectors we used well-know




                                                                                                                                                                        !"#$#%&'(
                                                                                                                                                                                                                                                                      22




                                                                                                                                                                                                                 23                                              17




In this case, is it possible to reconstruct provenance post hoc?                                                                                         <2,4%   C*.7*2,.4491;%                              D672)A.4.4%E.1.*+521%                 15
                                                                                                                                                                                                                                                                            D672)A.4.4%C*F191;%                           2




Research Question

How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, intended as the sequences of operations connecting the files?
[Figure: example provenance expressed as a sequence of operations on files, e.g. "… edited the file poster.ppt adding an image from the file paper.pdf."]
Approach & Methodology

We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and their metadata as evidence of the relationships between files.
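As an illustration of how contents and metadata can both act as evidence, here is a minimal sketch of a single signal. It assumes a plain bag-of-words cosine similarity as the content signal (not necessarily one of the well-known measures the prototype used) and uses modification timestamps to orient the relationship. The dict-based file records and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Bag-of-words vector: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two texts' bag-of-words vectors (0.0 if either is empty)."""
    va, vb = term_frequencies(a), term_frequencies(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def text_similarity_signal(file_a, file_b):
    """Hypothetical signal for one pair of files.

    The content (cosine similarity) gives the strength of the evidence; the
    modification timestamps give the direction: the older file is treated as
    the candidate source of the newer one.
    """
    source, target = sorted((file_a, file_b), key=lambda f: f["mtime"])
    return source["name"], target["name"], cosine_similarity(source["text"], target["text"])
```

For example, with `draft = {"name": "draft.txt", "mtime": 1, "text": "Provenance is often not tracked."}` and `final = {"name": "final.txt", "mtime": 2, "text": "Provenance is often not tracked in shared folders."}`, calling `text_similarity_signal(final, draft)` returns `("draft.txt", "final.txt", ~0.79)`: the argument order does not matter, because the timestamps decide which file is the candidate source.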
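The pipeline consists of four stages, each containing several components that can be executed in parallel:

1. Preprocessing: extracting metadata and indexing content.
2. Hypotheses generation: signal detectors, e.g. text, image and metadata similarity.
3. Hypotheses pruning: signal filters, e.g. filtering temporal incoherence, similarity thresholds and domain-specific filtering.
4. Aggregation and ranking: aggregators combine the signals and rank the resulting hypotheses.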
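The four stages (preprocessing, hypotheses generation, hypotheses pruning, aggregation and ranking) can be sketched end to end as below. This is a hypothetical, minimal Python version and not the prototype itself. The detectors are simple `difflib` ratios over text and file names; the filters check temporal coherence (a source must be older than the file derived from it) and a mean-score threshold of 0.5 (an arbitrary choice); aggregation is a plain average. Detectors run in a thread pool to mirror the statement that components can execute in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import permutations


@dataclass(frozen=True)
class FileRecord:
    """A file as seen by the pipeline (hypothetical structure)."""
    name: str
    mtime: float  # modification timestamp from the filesystem
    text: str     # extracted text content


# Stage 1 - Preprocessing: normalise the extracted text (lower-case, single spaces).
def preprocess(files):
    return [FileRecord(f.name, f.mtime, " ".join(f.text.lower().split())) for f in files]


# Signal detectors: each returns a score in [0, 1] for an ordered (source, target) pair.
def text_detector(src, dst):
    return SequenceMatcher(None, src.text, dst.text).ratio()


def name_detector(src, dst):
    return SequenceMatcher(None, src.name, dst.name).ratio()


DETECTORS = {"text": text_detector, "name": name_detector}


# Stage 2 - Hypotheses generation: every detector on every ordered pair, run in parallel.
def generate(files):
    with ThreadPoolExecutor() as pool:
        futures = {
            (src, dst, label): pool.submit(detect, src, dst)
            for src, dst in permutations(files, 2)
            for label, detect in DETECTORS.items()
        }
    hypotheses = {}
    for (src, dst, label), future in futures.items():
        hypotheses.setdefault((src, dst), {})[label] = future.result()
    return hypotheses


# Signal filters: a hypothesis survives only if every filter accepts it.
def temporally_coherent(src, dst, scores):
    return src.mtime < dst.mtime  # a file cannot be derived from a newer one


def above_threshold(src, dst, scores, threshold=0.5):
    return sum(scores.values()) / len(scores) >= threshold


FILTERS = [temporally_coherent, above_threshold]


# Stage 3 - Hypotheses pruning.
def prune(hypotheses):
    return {
        (src, dst): scores
        for (src, dst), scores in hypotheses.items()
        if all(keep(src, dst, scores) for keep in FILTERS)
    }


# Stage 4 - Aggregation and ranking: average the signals, best hypotheses first.
def aggregate_and_rank(hypotheses):
    ranked = [
        (src.name, dst.name, sum(scores.values()) / len(scores))
        for (src, dst), scores in hypotheses.items()
    ]
    return sorted(ranked, key=lambda h: h[2], reverse=True)


def reconstruct(files):
    return aggregate_and_rank(prune(generate(preprocess(files))))
```

Keeping detectors and filters in plain lists makes each stage extensible: a new signal (for example image similarity) is one more entry in `DETECTORS`, and a new pruning rule is one more entry in `FILTERS`.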
Initial (encouraging) results

We performed an experiment with a small … publications, annotated manually by two … We obtained an F1-score of 0.49 using only text similarity and an F1-score of 0.70 for the aggregation of va…

[Figure: experiment results, grouped into Cluster 1: Blood Cultures, Cluster 2: Markers and Cluster 3: General Guideline.]

Future work

Following the planned methodology, we … components for each of the pipeline phases … computational efficiency.
                                                                                                                                                                                   EvidenceQ||
                                                                                                                                                                                        EvidenceQ||
                                                                                                                                                                                                                         Cluster 2: Markers
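The F1-scores above measure agreement with the experts' manual annotation. The following is a minimal sketch of how such a score can be computed. It assumes that both the reconstructed and the annotated provenance are represented as sets of directed (source, derived) file pairs. The helper `precision_recall_f1` and all file names are hypothetical and are not taken from the experiment.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted dependency pairs against gold-standard (expert) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: each pair (a, b) means "b depends on / was derived from a".
gold = {("draft.docx", "paper.pdf"), ("data.csv", "figure1.png"), ("figure1.png", "paper.pdf")}
predicted = {("draft.docx", "paper.pdf"), ("data.csv", "figure1.png"), ("notes.txt", "paper.pdf")}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Treating dependencies as directed pairs means that a correct link predicted in the wrong direction counts as both a false positive and a false negative. Whether the experiment scored direction this way is not stated here.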
Future work
Following the planned methodology, we w… components for each of the pipeline phases … computational efficiency.

Research Question
How can one automatically, accurately and efficiently reconstruct plausible provenance of files in a shared folder, understood as the sequences of operations connecting the files?

Approach & Methodology
The research methodology is an iterative process that will incrementally integrate existing approaches from the literature and evaluate their performance on benchmark corpora.

We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several components that can be executed in parallel:

Figure: the four-stage pipeline. Docs → Preprocessing (Extract metadata, Index content) → Hypotheses Generation (Signal Detectors) → Hypotheses Pruning (Signal Filters) → Aggregation and ranking (Aggregators).
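The four stages can be sketched as a small Python driver. This is an illustrative sketch under stated assumptions, not the prototype's code. The `Doc` type, the word-overlap detector, the timestamp filter and the mean aggregator are hypothetical stand-ins, and file names are assumed to be unique. Detectors within a stage run concurrently via `concurrent.futures.ThreadPoolExecutor` to mirror the parallel components.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Doc:
    name: str        # assumed unique within the shared folder
    text: str
    modified: float  # hypothetical filesystem timestamp


def run_pipeline(docs, preprocessors, detectors, filters, aggregate):
    """Sketch of the four stages: preprocess, generate, prune, aggregate and rank."""
    # Stage 1 (Preprocessing): e.g. extract metadata or normalise content.
    for preprocess in preprocessors:
        docs = [preprocess(d) for d in docs]
    # Every ordered pair (source, derived) is a candidate dependency.
    candidates = list(permutations(docs, 2))

    def detect(detector):
        return {(a.name, b.name): detector(a, b) for a, b in candidates}

    # Stage 2 (Hypotheses Generation): independent signal detectors run concurrently.
    with ThreadPoolExecutor() as pool:
        signals = list(pool.map(detect, detectors))
    # Stage 3 (Hypotheses Pruning): keep only pairs that pass every signal filter.
    kept = [(a.name, b.name) for a, b in candidates if all(f(a, b) for f in filters)]
    # Stage 4 (Aggregation and ranking): combine the signals per pair, best first.
    scores = {pair: aggregate([s[pair] for s in signals]) for pair in kept}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def lowercase(d):
    # Hypothetical preprocessor: normalise case.
    return Doc(d.name, d.text.lower(), d.modified)


def word_jaccard(a, b):
    # Hypothetical detector: Jaccard overlap of word sets.
    wa, wb = set(a.text.split()), set(b.text.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def not_newer(a, b):
    # Hypothetical filter: a source should not be newer than what derives from it.
    return a.modified <= b.modified


docs = [
    Doc("notes.txt", "Blood culture results", 1.0),
    Doc("report.txt", "blood culture results and markers", 2.0),
    Doc("recipe.txt", "vegan seitan", 3.0),
]
result = run_pipeline(docs, [lowercase], [word_jaccard], [not_newer],
                      aggregate=lambda xs: sum(xs) / len(xs))
print(result[0])
```

Threads keep the sketch short. For CPU-bound detectors in CPython, a `ProcessPoolExecutor` would usually give real parallelism, but the mapped function would then have to be a picklable, module-level function rather than the nested `detect` used here.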
Figure: components of the initial prototype
Preprocessing: Extract metadata; Index content; Infer semantic types
Hypotheses Generation: Text similarity; Image similarity; Domain-specific similarity; Metadata similarity
Hypotheses Pruning: Filter temporal incoherence; Similarity thresholds; Domain-specific filtering
Aggregation and ranking: Weighted Sum
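Several of the prototype components can be illustrated in plain Python: a text-similarity detector, the temporal-incoherence filter, a similarity threshold and the weighted-sum aggregator. The prototype used Apache Lucene for indexing. The TF-IDF cosine below is a self-contained stand-in and does not reproduce Lucene's scoring. The temporal filter assumes that modification timestamps reflect the order in which files were produced. The "same author" metadata signal, the weights, the threshold and the files are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build simple TF-IDF vectors (raw term frequency times smoothed IDF)."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def temporally_coherent(source_mtime, derived_mtime):
    # Assumption: a derived file is not older than the file it was derived from.
    return source_mtime <= derived_mtime


def weighted_sum(signals, weights):
    return sum(weights[name] * value for name, value in signals.items())


# Hypothetical files: name -> (text, modification timestamp, author)
files = {
    "protocol.txt": ("blood culture sampling protocol", 100.0, "alice"),
    "results.txt": ("blood culture sampling results for markers", 200.0, "alice"),
    "guideline.txt": ("general guideline for hospital hygiene", 150.0, "bob"),
}
WEIGHTS = {"text": 0.7, "metadata": 0.3}  # hypothetical weights
THRESHOLD = 0.2  # hypothetical minimum text similarity

names = list(files)
vectors = dict(zip(names, tfidf_vectors([files[n][0] for n in names])))
ranking = []
for src in names:
    for dst in names:
        if src == dst or not temporally_coherent(files[src][1], files[dst][1]):
            continue  # pruning: temporal incoherence
        text_sim = cosine(vectors[src], vectors[dst])
        if text_sim < THRESHOLD:
            continue  # pruning: similarity threshold
        meta_sim = 1.0 if files[src][2] == files[dst][2] else 0.0  # hypothetical metadata signal
        score = weighted_sum({"text": text_sim, "metadata": meta_sim}, WEIGHTS)
        ranking.append(((src, dst), score))
ranking.sort(key=lambda item: item[1], reverse=True)
print(ranking)
```

In this toy run, the pair linking the guideline and the results shares only the word "for". That pair falls below the threshold, which illustrates why a pruning stage is useful even after similarity detection.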




ISWC DC poster "Reconstructing Provenance"

More related content

Similar to ISWC DC poster "Reconstructing Provenance"

Recomendação de Conteúdo para Redes Sociais Educativas (Marcel Caraciolo)
Blueprint+: Developing a Tool for Service Design (Andy Polaine)
OECD, Higher education workshop, Helsinki, 2007, Finland (Ilkka Kakko)
3 q09 presentation (SiteriCR2)
ApresentaçãO 3 Q09 Cr2 (CR2)
Manifiesto En Defensa De Una Sociedad Laica (guest45bb716a5)
slam robotic navigatin genetic localization (lzenki)
E-Enabling the Nation’s Data (Ed Parsons)
Banco de dados apostila (fabiobelem7)
Organizational development (Seta Wicaksana)
[HetRec2011@RecSys]Experience Discovery: Hybrid Recommendation of Student Act... (YONG ZHENG)
2 q09 presentation (SiteriCR2)
Cocina vegana seitan-soja (elbisaltico)
Risk management: Social media usage in enterprises (daenu)
Tweak Your Slides: Ten Design Principles for Educators (version 3.0) (Chiara Ojeda)

Similar to ISWC DC poster "Reconstructing Provenance" (20)

Haiku licence experience - fossa2010
Recomendação de Conteúdo para Redes Sociais Educativas
Blueprint+: Developing a Tool for Service Design
OECD, Higher education workshop, Helsinki, 2007, Finland
Exec ed june '10 ss
3 q09 presentation
ApresentaçãO 3 Q09 Cr2
Manifiesto En Defensa De Una Sociedad Laica
slam robotic navigatin genetic localization
E-Enabling the Nation’s Data
Ekaw2010 tutorial3 practical
Banco de dados apostila
Organizational development
[HetRec2011@RecSys]Experience Discovery: Hybrid Recommendation of Student Act...
Layouts
All about Apache ACE
2 q09 presentation
Cocina vegana seitan-soja
Risk management: Social media usage in enterprises
Tweak Your Slides: Ten Design Principles for Educators (version 3.0)
 


ISWC DC poster "Reconstructing Provenance"

  • 1. Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen

Problem Statement
The provenance of a data item is the metadata describing how, when and by whom the data item was produced.
Provenance is crucial in many settings, but it is often not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.
In this case, is it possible to reconstruct provenance post hoc?

Research Question
How can one automatically, accurately and efficiently reconstruct a plausible provenance of the files in a shared folder, intended as the sequences of operations connecting the files?

Approach & Methodology
We propose a multi-signal pipeline approach that reconstructs plausible provenance traces, using the contents and metadata of the files as evidence of the relationships between them.
The pipeline consists of four stages, each containing several components that can be executed in parallel:
[Figure: general pipeline. Docs → Preprocessing (e.g. Extract metadata, Index content) → Hypotheses Generation (Signal Detectors) → Hypotheses Pruning (Signal Filters) → Aggregation and ranking (Aggregators) → candidate relations between files (e.g. copy), each with a confidence value]
The research methodology is an iterative process that will incrementally integrate existing approaches from the literature and evaluate their performance on benchmark corpora.

An initial prototype implementation
As a first step, we focus on dependencies between files instead of sequences of operations.
We implemented a prototype of the pipeline using open-source components such as Apache Lucene, Apache Tika and the Dropbox API. As signal detectors, we used well-known similarity measures.
[Figure: prototype pipeline. Docs → Preprocessing: Extract metadata, Index content, Infer semantic types → Hypotheses Generation: Text similarity, Image similarity, Domain-specific similarity, Metadata similarity → Hypotheses Pruning: Filter temporal incoherence, Similarity thresholds, Domain-specific filtering → Aggregation and ranking: Weighted sum]

Initial (encouraging) results
We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.
F1-score of 0.49 using only text similarity
F1-score of 0.70 for the aggregation of various similarities
[Figure: two graphs with nodes numbered 0 to 24, grouped into Cluster 1: Blood Cultures, Cluster 2: Markers and Cluster 3: General; node types: EvidenceQ||, EvidenceQX, Guideline]

Future work
Following the planned methodology, we will explore additional components for each of the pipeline phases and also consider computational efficiency.

Bibliography
(1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012
(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
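The poster reports F1-scores (0.49 for text similarity alone, 0.70 for the aggregation of several similarities) without stating exactly how they were computed. Assuming, purely for illustration, that predicted file-to-file dependencies are compared as directed (source, target) pairs against the experts' annotations, precision, recall and F1 = 2PR / (P + R) could be computed as in this hypothetical sketch; the toy data are invented and unrelated to the experiment.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted dependencies against a gold standard.

    Both arguments are iterables of (source, target) pairs; direction matters.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: four annotated dependencies, three predicted ones.
gold = {("d1", "d2"), ("d2", "d3"), ("d1", "d4"), ("d3", "d5")}
predicted = {("d1", "d2"), ("d2", "d3"), ("d2", "d4")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predictions match the annotations, so the script prints `precision=0.667 recall=0.500 f1=0.571`. Treating pairs as directed means that predicting a dependency in the wrong direction counts as both a false positive and a false negative.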