SlideShare una empresa de Scribd logo
1 de 154
Descargar para leer sin conexión
The Dataverse Network:
                      An Infrastructure for Data Sharing

                                            Gary King
                                        Harvard University


                  (5/28/08 talk at the Society for Scholarly Publishing)




                                                                             (5/28/08 talk at the Society
Gary King Harvard University ()   The Dataverse Network: An Infrastructure for Data Sharing          / 22
Papers




   Gary King (Harvard)   Dataverse Network   2 / 22
Papers




   Gary King, An Introduction to the Dataverse Network as an
   Infrastructure for Data Sharing, Sociological Methods and Research,
   32, 2 (November, 2007): 173–199.




   Gary King (Harvard)       Dataverse Network                       2 / 22
Papers




   Gary King, An Introduction to the Dataverse Network as an
   Infrastructure for Data Sharing, Sociological Methods and Research,
   32, 2 (November, 2007): 173–199.
   Micah Altman and Gary King. A Proposed Standard for the Scholarly
   Citation of Quantitative Data, D-Lib Magazine, 13, 3/4
   (March/April).




   Gary King (Harvard)       Dataverse Network                       2 / 22
Papers




   Gary King, An Introduction to the Dataverse Network as an
   Infrastructure for Data Sharing, Sociological Methods and Research,
   32, 2 (November, 2007): 173–199.
   Micah Altman and Gary King. A Proposed Standard for the Scholarly
   Citation of Quantitative Data, D-Lib Magazine, 13, 3/4
   (March/April).
   Dataverse Network project: http://TheData.org




   Gary King (Harvard)       Dataverse Network                       2 / 22
Infrastructure for Quantitative Data




    Gary King (Harvard)   Dataverse Network   3 / 22
Infrastructure for Quantitative Data


    Accessibility:




    Gary King (Harvard)   Dataverse Network   3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives




    Gary King (Harvard)            Dataverse Network   3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available
    Problems even with professional archives:




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available
    Problems even with professional archives:
           Data in different archives have different identifiers




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available
    Problems even with professional archives:
           Data in different archives have different identifiers
           One major archive renumbered all its acquisitions




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available
    Problems even with professional archives:
           Data in different archives have different identifiers
           One major archive renumbered all its acquisitions
           Changes to data are made; identifiers are reused or deaccessioned; old
           data are lost




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available
    Problems even with professional archives:
           Data in different archives have different identifiers
           One major archive renumbered all its acquisitions
           Changes to data are made; identifiers are reused or deaccessioned; old
           data are lost
    Data sets are not like books




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available
    Problems even with professional archives:
           Data in different archives have different identifiers
           One major archive renumbered all its acquisitions
           Changes to data are made; identifiers are reused or deaccessioned; old
           data are lost
    Data sets are not like books
           Static data files (even if on the web): unreadable after a few years




    Gary King (Harvard)             Dataverse Network                                3 / 22
Infrastructure for Quantitative Data


    Accessibility:
           Most large data sets: in public archives
           Most data in published articles: not accessible, results not replicable
           without the original author
           Most data sets from NSF & NIH grants: not publicly available
    Problems even with professional archives:
           Data in different archives have different identifiers
           One major archive renumbered all its acquisitions
           Changes to data are made; identifiers are reused or deaccessioned; old
           data are lost
    Data sets are not like books
           Static data files (even if on the web): unreadable after a few years
           When storage methods change: some data sets are lost; others have
           altered content!


    Gary King (Harvard)             Dataverse Network                                3 / 22
What About a Centralized Data Access Solution?




   Gary King (Harvard)   Dataverse Network       4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible




   Gary King (Harvard)        Dataverse Network   4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place




   Gary King (Harvard)        Dataverse Network                      4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place
   Impossible when data are heterogeneous in format, origin, size, effort
   needed to collect or analyze, IRB access rules, etc.




   Gary King (Harvard)        Dataverse Network                       4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place
   Impossible when data are heterogeneous in format, origin, size, effort
   needed to collect or analyze, IRB access rules, etc.
   Why don’t researchers put data in public archives?




   Gary King (Harvard)        Dataverse Network                       4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place
   Impossible when data are heterogeneous in format, origin, size, effort
   needed to collect or analyze, IRB access rules, etc.
   Why don’t researchers put data in public archives?
          The Archive gets the credit




   Gary King (Harvard)            Dataverse Network                   4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place
   Impossible when data are heterogeneous in format, origin, size, effort
   needed to collect or analyze, IRB access rules, etc.
   Why don’t researchers put data in public archives?
          The Archive gets the credit
          Upon questioning: they want credit, control, and visibility




   Gary King (Harvard)            Dataverse Network                     4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place
   Impossible when data are heterogeneous in format, origin, size, effort
   needed to collect or analyze, IRB access rules, etc.
   Why don’t researchers put data in public archives?
          The Archive gets the credit
          Upon questioning: they want credit, control, and visibility
          (So why don’t they worry about print publishers getting all the credit?




   Gary King (Harvard)            Dataverse Network                            4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place
   Impossible when data are heterogeneous in format, origin, size, effort
   needed to collect or analyze, IRB access rules, etc.
   Why don’t researchers put data in public archives?
          The Archive gets the credit
          Upon questioning: they want credit, control, and visibility
          (So why don’t they worry about print publishers getting all the credit?
          Lack of data citations!)




   Gary King (Harvard)            Dataverse Network                            4 / 22
What About a Centralized Data Access Solution?


   Highly desirable when feasible
   Works great in genomics, astronomy, etc., when data formats are
   universal, goals are common, and agreements are in place
   Impossible when data are heterogeneous in format, origin, size, effort
   needed to collect or analyze, IRB access rules, etc.
   Why don’t researchers put data in public archives?
          The Archive gets the credit
          Upon questioning: they want credit, control, and visibility
          (So why don’t they worry about print publishers getting all the credit?
          Lack of data citations!)
   We propose: technological solutions to these political and social
   problems



   Gary King (Harvard)            Dataverse Network                            4 / 22
Requirements for Effective Data Sharing Infrastructure




   Gary King (Harvard)   Dataverse Network              5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.




   Gary King (Harvard)          Dataverse Network                            5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author




   Gary King (Harvard)          Dataverse Network                            5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met




   Gary King (Harvard)          Dataverse Network                            5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization




   Gary King (Harvard)          Dataverse Network                            5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .




   Gary King (Harvard)          Dataverse Network                            5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted




   Gary King (Harvard)          Dataverse Network                            5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R,




   Gary King (Harvard)          Dataverse Network                        5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux,




   Gary King (Harvard)          Dataverse Network                        5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux, and from 8 inch
    magnetic tape to 5.25 inch floppies to a DVD.




   Gary King (Harvard)          Dataverse Network                        5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux, and from 8 inch
    magnetic tape to 5.25 inch floppies to a DVD.
    Ease of Use Neither editors nor authors employ professional archivists




   Gary King (Harvard)         Dataverse Network                        5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux, and from 8 inch
    magnetic tape to 5.25 inch floppies to a DVD.
    Ease of Use Neither editors nor authors employ professional archivists
    Legal Protection:




   Gary King (Harvard)         Dataverse Network                        5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux, and from 8 inch
    magnetic tape to 5.25 inch floppies to a DVD.
    Ease of Use Neither editors nor authors employ professional archivists
    Legal Protection:
          Journals have liability protection for print; none for data




   Gary King (Harvard)             Dataverse Network                    5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux, and from 8 inch
    magnetic tape to 5.25 inch floppies to a DVD.
    Ease of Use Neither editors nor authors employ professional archivists
    Legal Protection:
          Journals have liability protection for print; none for data
          If you put data on the web without IRB approval, you are violating
          federal regulations



   Gary King (Harvard)            Dataverse Network                            5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux, and from 8 inch
    magnetic tape to 5.25 inch floppies to a DVD.
    Ease of Use Neither editors nor authors employ professional archivists
    Legal Protection:
          Journals have liability protection for print; none for data
          If you put data on the web without IRB approval, you are violating
          federal regulations
          (IRB approval must be for data distribution, not merely for the study)

   Gary King (Harvard)            Dataverse Network                           5 / 22
Requirements for Effective Data Sharing Infrastructure
    Recognition, for authors, journals, etc. in (1) citations to data, (2)
    citations to associated articles, and (3) visibility on the web.
    Public Distribution, without permission from the author
    Authorization: fulfill requirements the author originally met
    Validation: check that data exists, without authorization
    Persistence Decades from now. . . .
    Verification: data remains unchanged, even if converted from SPSS
    to Stata to R, from a PC to a Mac to Linux, and from 8 inch
    magnetic tape to 5.25 inch floppies to a DVD.
    Ease of Use Neither editors nor authors employ professional archivists
    Legal Protection:
          Journals have liability protection for print; none for data
          If you put data on the web without IRB approval, you are violating
          federal regulations
          (IRB approval must be for data distribution, not merely for the study)
          Solution must not require lawyers (we’ve automated the IRB)
   Gary King (Harvard)            Dataverse Network                           5 / 22
Rules for Citing Printed Matter




   Gary King (Harvard)   Dataverse Network   6 / 22
Rules for Citing Printed Matter



  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




First author (last name first)




    Gary King (Harvard)         Dataverse Network                    6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Second author




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Third author




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Year




       Gary King (Harvard)    Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Article title




     Gary King (Harvard)      Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Journal (no longer exists)




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Volume number




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Issue number




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Season




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Pages




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Special formatting codes




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Special indentation




    Gary King (Harvard)       Dataverse Network                      6 / 22
Rules for Citing Printed Matter


  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Citations: rule-based, precise, redundant




    Gary King (Harvard)         Dataverse Network                    6 / 22
Rules for Citing Printed Matter

  Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on
    Factor Analyzing Dichotomous Variables: The Case of Political
    Participation,” Political Methodology, Vol. 4: No. 2 (Spring):
    Pp. 39–62.




Print Citations Work: authors don’t think publishers get all the credit;
cited articles can be found; copyeditors don’t need to see the original to
know it exists; the link from citation to print persists



    Gary King (Harvard)         Dataverse Network                            6 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?==




    Gary King (Harvard)        Dataverse Network                        7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?==




  1   Author




      Gary King (Harvard)      Dataverse Network                        7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?==




  1   Author
  2   Year




      Gary King (Harvard)      Dataverse Network                        7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?==




  1   Author
  2   Year
  3   Title




      Gary King (Harvard)      Dataverse Network                        7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?==




  1   Author
  2   Year
  3   Title
  4   Unique Global Identifier: will work after URLs stop working




      Gary King (Harvard)       Dataverse Network                       7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?==




  1   Author
  2   Year
  3   Title
  4   Unique Global Identifier: will work after URLs stop working
  5   Linked to a Bridge Service (presently a URL:
      http://id.thedata.org/hdl%3A1902.4%2F00754)




      Gary King (Harvard)       Dataverse Network                       7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?==




  1   Author
  2   Year
  3   Title
  4   Unique Global Identifier: will work after URLs stop working
  5   Linked to a Bridge Service (presently a URL:
      http://id.thedata.org/hdl%3A1902.4%2F00754)
  6   Universal Numeric Fingerprint (UNF)



      Gary King (Harvard)       Dataverse Network                       7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?== Annals of Applied Statistics
  [Distributor];



  1   Author
  2   Year
  3   Title
  4   Unique Global Identifier: will work after URLs stop working
  5   Linked to a Bridge Service (presently a URL:
      http://id.thedata.org/hdl%3A1902.4%2F00754)
  6   Universal Numeric Fingerprint (UNF)
  7   Standard rules for adding citation elements

      Gary King (Harvard)        Dataverse Network                  7 / 22
A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,
  UNF:3:6:ZNQRI14053UZq389x0Bffg?== Annals of Applied Statistics
  [Distributor]; NORC [Producer].



  1   Author
  2   Year
  3   Title
  4   Unique Global Identifier: will work after URLs stop working
  5   Linked to a Bridge Service (presently a URL:
      http://id.thedata.org/hdl%3A1902.4%2F00754)
  6   Universal Numeric Fingerprint (UNF)
  7   Standard rules for adding citation elements

      Gary King (Harvard)        Dataverse Network                  7 / 22
Data to Universal Numeric Fingerprints




   Gary King (Harvard)   Dataverse Network   8 / 22
Data to Universal Numeric Fingerprints


      1   4    4     21   ···    121
                                      
  
     1   2    2     91   ···    212   
                                       
  
     1   9    2     72   ···    104   
                                       
  
     0   2    2      2   ···    321   
                                       
  
     1   6    2     12   ···    204   
                                       
  
     1   9    4     52   ···    311   
                                       
  
     0   3    2     23   ···    92    
                                       
  
     0   2    5     91   ···    212   
                                       
  
     0   5    8     91   ···    91    
                                       
  
     1   9    1     72   ···    104   
                                       
     .
      .   .
          .    .
               .      .
                      .   ..      .
                                  .
                                       
     .   .    .      .      .    .    
      1 2 2 91 · · · 212


   Gary King (Harvard)                 Dataverse Network   8 / 22
Data to Universal Numeric Fingerprints


      1   4    4     21   ···    121
                                      
  
     1   2    2     91   ···    212   
                                       
  
     1   9    2     72   ···    104   
                                       
  
     0   2    2      2   ···    321   
                                       
  
     1   6    2     12   ···    204   
                                       
     1   9    4     52   ···    311   
                                        =⇒ ZNQRI14053UZq389x0Bffg?==
                                      
  
     0   3    2     23   ···    92    
  
     0   2    5     91   ···    212   
                                       
  
     0   5    8     91   ···    91    
                                       
  
     1   9    1     72   ···    104   
                                       
     .
      .   .
          .    .
               .      .
                      .   ..      .
                                  .
                                       
     .   .    .      .      .    .    
      1 2 2 91 · · · 212


   Gary King (Harvard)                 Dataverse Network               8 / 22
Advantages of UNFs




   Gary King (Harvard)   Dataverse Network   9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file:


               .




   Gary King (Harvard)       Dataverse Network       9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware,

               .




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,

               .




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system,
           .




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software,
           .




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database,
           .




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.
   Cryptographic technology: any change in data content changes the
   UNF. (cannot tinker after the fact!)




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.
   Cryptographic technology: any change in data content changes the
   UNF. (cannot tinker after the fact!)
   Noninvertible properties




   Gary King (Harvard)      Dataverse Network                         9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.
   Cryptographic technology: any change in data content changes the
   UNF. (cannot tinker after the fact!)
   Noninvertible properties
          UNFs convey no information about data content




   Gary King (Harvard)          Dataverse Network                     9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.
   Cryptographic technology: any change in data content changes the
   UNF. (cannot tinker after the fact!)
   Noninvertible properties
          UNFs convey no information about data content
          OK to distribute for highly sensitive, confidential, or proprietary data




   Gary King (Harvard)             Dataverse Network                            9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.
   Cryptographic technology: any change in data content changes the
   UNF. (cannot tinker after the fact!)
   Noninvertible properties
          UNFs convey no information about data content
          OK to distribute for highly sensitive, confidential, or proprietary data
          Copyeditor can validate data’s existence even without authorization




   Gary King (Harvard)             Dataverse Network                            9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.
   Cryptographic technology: any change in data content changes the
   UNF. (cannot tinker after the fact!)
   Noninvertible properties
          UNFs convey no information about data content
          OK to distribute for highly sensitive, confidential, or proprietary data
          Copyeditor can validate data’s existence even without authorization
   The citation refers to one specific data set that can’t ever be altered,
   even if journal doesn’t keep a copy




   Gary King (Harvard)             Dataverse Network                            9 / 22
Advantages of UNFs

   UNF is calculated from the content not the file: Its the Same UNF
   regardless of changes in computer hardware, storage medium,
   operating system, statistical software, database, or spreadsheet
   software.
   Cryptographic technology: any change in data content changes the
   UNF. (cannot tinker after the fact!)
   Noninvertible properties
          UNFs convey no information about data content
          OK to distribute for highly sensitive, confidential, or proprietary data
          Copyeditor can validate data’s existence even without authorization
   The citation refers to one specific data set that can’t ever be altered,
   even if journal doesn’t keep a copy
   Future researchers can quickly check that they have the same data as
   used by the author: merely recalculate the UNF

   Gary King (Harvard)             Dataverse Network                            9 / 22
Web 2.0 Terminology




   Gary King (Harvard)   Dataverse Network   10 / 22
Web 2.0 Terminology



   Software: find CD, install locally,




   Gary King (Harvard)        Dataverse Network   10 / 22
Web 2.0 Terminology



   Software: find CD, install locally, hit next,




   Gary King (Harvard)         Dataverse Network   10 / 22
Web 2.0 Terminology



   Software: find CD, install locally, hit next, hit next,




   Gary King (Harvard)         Dataverse Network            10 / 22
Web 2.0 Terminology



   Software: find CD, install locally, hit next, hit next, hit next. . .




   Gary King (Harvard)          Dataverse Network                         10 / 22
Web 2.0 Terminology



   Software: find CD, install locally, hit next, hit next, hit next. . .
   Web application software: no installation; load web browser and run
   (Dataverse Network Software)




   Gary King (Harvard)          Dataverse Network                         10 / 22
Web 2.0 Terminology



   Software: find CD, install locally, hit next, hit next, hit next. . .
   Web application software: no installation; load web browser and run
   (Dataverse Network Software)
   Host: The computers where the web application software runs
   (universities, archives, libraries)




   Gary King (Harvard)          Dataverse Network                         10 / 22
Web 2.0 Terminology



   Software: find CD, install locally, hit next, hit next, hit next. . .
   Web application software: no installation; load web browser and run
   (Dataverse Network Software)
   Host: The computers where the web application software runs
   (universities, archives, libraries)
   Virtual host: Where the web application software seems to run, but
   does not (web sites of: authors, journals, granting agencies, research
   centers, universities, scholarly organizations, etc.)




   Gary King (Harvard)          Dataverse Network                         10 / 22
po wered by the
                                                                                               Dataverse
                                                                                               Network™
                                                                                                    Pr oject




               Your web site                       Your dataverse branded as your web site but
                                                   served by the Dataverse Network, therefore re-
                                                   quiring no local installation and providing an
                                                   enormous array of services




Gary King (Harvard)            Dataverse Network                                                           11 / 22
po wered by the
                                          Dataverse
                                          Network™
                                               Pr oject




Gary King (Harvard)   Dataverse Network               12 / 22
po wered by the
                                          Dataverse
                                          Network™
                                               Pr oject




Gary King (Harvard)   Dataverse Network                     13 / 22
po wered by the
                                          Dataverse
                                          Network™
                                               Pr oject




Gary King (Harvard)   Dataverse Network                     14 / 22
po wered by the
                                                                                               Dataverse
                                                                                               Network™
                                                                                                    Pr oject




               Your web site                       Your dataverse branded as your web site but
                                                   served by the Dataverse Network, therefore re-
                                                   quiring no local installation and providing an
                                                   enormous array of services




Gary King (Harvard)            Dataverse Network                                                           15 / 22
Gary King (Harvard)   Dataverse Network   16 / 22
A Journal Dataverse for a Replication Data Archive




   Gary King (Harvard)   Dataverse Network           17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )




   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset




   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset
    Branded as yours: with the look and feel of your site




   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset
    Branded as yours: with the look and feel of your site
    Easy to setup: give DVN your style, and include a link to your new
    dataverse




   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset
    Branded as yours: with the look and feel of your site
    Easy to setup: give DVN your style, and include a link to your new
    dataverse
    Easy to manage: no software or hardware installation, backups, worry
    about archiving standards, or data format transations; still exists if
    you move; easy to rebrand




   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset
    Branded as yours: with the look and feel of your site
    Easy to setup: give DVN your style, and include a link to your new
    dataverse
    Easy to manage: no software or hardware installation, backups, worry
    about archiving standards, or data format transations; still exists if
    you move; easy to rebrand
    Does not disrupt workflow: copyeditor ensures data is cited; author is
    given password to upload data. Everything else is self-service




   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset
    Branded as yours: with the look and feel of your site
    Easy to setup: give DVN your style, and include a link to your new
    dataverse
    Easy to manage: no software or hardware installation, backups, worry
    about archiving standards, or data format transations; still exists if
    you move; easy to rebrand
    Does not disrupt workflow: copyeditor ensures data is cited; author is
    given password to upload data. Everything else is self-service
    High acceptability: experiments indicate > 90% uptake for authors
    (without publication at stake)




   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset
    Branded as yours: with the look and feel of your site
    Easy to setup: give DVN your style, and include a link to your new
    dataverse
    Easy to manage: no software or hardware installation, backups, worry
    about archiving standards, or data format transations; still exists if
    you move; easy to rebrand
    Does not disrupt workflow: copyeditor ensures data is cited; author is
    given password to upload data. Everything else is self-service
    High acceptability: experiments indicate > 90% uptake for authors
    (without publication at stake)
    Reuse: data may appear on author’s dataverse too


   Gary King (Harvard)           Dataverse Network                         17 / 22
A Journal Dataverse for a Replication Data Archive
    Full service virtual archive, with numerous data services (citation,
    metadata, archiving, subsetting, conversion, translation, analysis, . . . )
    Hierarchical list of data: year/volume/issue/article/dataset
    Branded as yours: with the look and feel of your site
    Easy to setup: give DVN your style, and include a link to your new
    dataverse
    Easy to manage: no software or hardware installation, backups, worry
    about archiving standards, or data format transations; still exists if
    you move; easy to rebrand
    Does not disrupt workflow: copyeditor ensures data is cited; author is
    given password to upload data. Everything else is self-service
    High acceptability: experiments indicate > 90% uptake for authors
    (without publication at stake)
    Reuse: data may appear on author’s dataverse too
    Results: Journals with replication policies have three times the impact
    factor! (with dataverse, it should be more)
   Gary King (Harvard)           Dataverse Network                         17 / 22
Dataverse Uses




   Gary King (Harvard)   Dataverse Network   18 / 22
Dataverse Uses

   Journals, for replication data archives




   Gary King (Harvard)         Dataverse Network   18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data




   Gary King (Harvard)         Dataverse Network   18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download




   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis




   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis
   Sections of scholarly organizations, to organize existing data




   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis
   Sections of scholarly organizations, to organize existing data
   Granting agencies




   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis
   Sections of scholarly organizations, to organize existing data
   Granting agencies
   Research centers




   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis
   Sections of scholarly organizations, to organize existing data
   Granting agencies
   Research centers
   Major Research Projects




   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis
   Sections of scholarly organizations, to organize existing data
   Granting agencies
   Research centers
   Major Research Projects
   Academic departments, universities, data centers, libraries



   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis
   Sections of scholarly organizations, to organize existing data
   Granting agencies
   Research centers
   Major Research Projects
   Academic departments, universities, data centers, libraries
   Data archives


   Gary King (Harvard)        Dataverse Network                       18 / 22
Dataverse Uses

   Journals, for replication data archives
   Authors, for their own data
   Future Researchers: browse or search for a dataverse or dataset;
   forward citation search; verification via UNFs; subsetting; read
   metdata, abstract, & documentation; check for new versions;
   translate format; statistical analyses; download
   Teachers, a list or for in depth analysis
   Sections of scholarly organizations, to organize existing data
   Granting agencies
   Research centers
   Major Research Projects
   Academic departments, universities, data centers, libraries
   Data archives
   Using DVN tools with outside data
   Gary King (Harvard)        Dataverse Network                       18 / 22
The Universe of Data meets the Universe of Methods




   Gary King (Harvard)   Dataverse Network           19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing




   Gary King (Harvard)       Dataverse Network       19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first




   Gary King (Harvard)          Dataverse Network                   19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality




   Gary King (Harvard)           Dataverse Network                      19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers




   Gary King (Harvard)            Dataverse Network                      19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software




   Gary King (Harvard)            Dataverse Network                      19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods




   Gary King (Harvard)            Dataverse Network                      19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods
          Users incorporate original packages via simple bridge functions via a
          simple model description language




   Gary King (Harvard)            Dataverse Network                           19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods
          Users incorporate original packages via simple bridge functions via a
          simple model description language
          Result: Unified Syntax, the same 3 commands to use any method




   Gary King (Harvard)            Dataverse Network                           19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods
          Users incorporate original packages via simple bridge functions via a
          simple model description language
          Result: Unified Syntax, the same 3 commands to use any method
          Automatically generated Graphical User Interface with all the world’s
          methods




   Gary King (Harvard)            Dataverse Network                          19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods
          Users incorporate original packages via simple bridge functions via a
          simple model description language
          Result: Unified Syntax, the same 3 commands to use any method
          Automatically generated Graphical User Interface with all the world’s
          methods
   R + Zelig + Dataverse Network




   Gary King (Harvard)            Dataverse Network                          19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods
          Users incorporate original packages via simple bridge functions via a
          simple model description language
          Result: Unified Syntax, the same 3 commands to use any method
          Automatically generated Graphical User Interface with all the world’s
          methods
   R + Zelig + Dataverse Network
          Greatly reduced time from methods development to use




   Gary King (Harvard)            Dataverse Network                          19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods
          Users incorporate original packages via simple bridge functions via a
          simple model description language
          Result: Unified Syntax, the same 3 commands to use any method
          Automatically generated Graphical User Interface with all the world’s
          methods
   R + Zelig + Dataverse Network
          Greatly reduced time from methods development to use
          Easy for applied researchers even if non-programmers



   Gary King (Harvard)            Dataverse Network                          19 / 22
The Universe of Data meets the Universe of Methods

   R Project for Statistical Computing
          nearly 1000 packages; most new methods appear in R first
          Highly diverse examples, syntax, documentation, and quality
          Difficult for statistics-types; harder for applied researchers
   Zelig: Everyone’s Statistical Software
          An ontology we developed of almost all statistical methods
          Users incorporate original packages via simple bridge functions via a
          simple model description language
          Result: Unified Syntax, the same 3 commands to use any method
          Automatically generated Graphical User Interface with all the world’s
          methods
   R + Zelig + Dataverse Network
          Greatly reduced time from methods development to use
          Easy for applied researchers even if non-programmers
          Can save R script for replication or further analysis


   Gary King (Harvard)            Dataverse Network                          19 / 22
Licensing




   Gary King (Harvard)   Dataverse Network   20 / 22
Licensing


    Web application software




   Gary King (Harvard)         Dataverse Network   20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs




   Gary King (Harvard)             Dataverse Network   20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time




   Gary King (Harvard)           Dataverse Network              20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software




   Gary King (Harvard)           Dataverse Network              20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License




   Gary King (Harvard)           Dataverse Network              20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License
          Open source, public ownership




   Gary King (Harvard)           Dataverse Network              20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License
          Open source, public ownership
          If you don’t like the new version: you can make a new one




   Gary King (Harvard)           Dataverse Network                    20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License
          Open source, public ownership
          If you don’t like the new version: you can make a new one
          You own the software & the underlying code




   Gary King (Harvard)           Dataverse Network                    20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License
          Open source, public ownership
          If you don’t like the new version: you can make a new one
          You own the software & the underlying code
          The license guarantees that this will remain true in the future




   Gary King (Harvard)            Dataverse Network                         20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License
          Open source, public ownership
          If you don’t like the new version: you can make a new one
          You own the software & the underlying code
          The license guarantees that this will remain true in the future
    Licensing data




   Gary King (Harvard)            Dataverse Network                         20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License
          Open source, public ownership
          If you don’t like the new version: you can make a new one
          You own the software & the underlying code
          The license guarantees that this will remain true in the future
    Licensing data
          DVN automates the IRB process; no lawyers necessary




   Gary King (Harvard)            Dataverse Network                         20 / 22
Licensing


    Web application software
          Pros: easy to use, no installation costs
          Cons: the software can vanish or change at any time
    Dataverse Network Software
          Affero GPL License
          Open source, public ownership
          If you don’t like the new version: you can make a new one
          You own the software & the underlying code
          The license guarantees that this will remain true in the future
    Licensing data
          DVN automates the IRB process; no lawyers necessary
          Data may be restricted in many ways, while metadata is available




   Gary King (Harvard)            Dataverse Network                          20 / 22
For more information




                         http://TheData.org




   Gary King (Harvard)         Dataverse Network   21 / 22
Technology used in DVN Software




   Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team
   picked for JavaOne; Sun engineers regularly call for advice)




   Gary King (Harvard)      Dataverse Network                      22 / 22
Technology used in DVN Software




   Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team
   picked for JavaOne; Sun engineers regularly call for advice)
   Application server: GlassFish (wrote press release on our project)




   Gary King (Harvard)        Dataverse Network                         22 / 22
Technology used in DVN Software




   Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team
   picked for JavaOne; Sun engineers regularly call for advice)
   Application server: GlassFish (wrote press release on our project)
   Database: we use PostgreSQL (can substitute others)




   Gary King (Harvard)        Dataverse Network                         22 / 22
Technology used in DVN Software




   Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team
   picked for JavaOne; Sun engineers regularly call for advice)
   Application server: GlassFish (wrote press release on our project)
   Database: we use PostgreSQL (can substitute others)
   Statistical computing: R and Zelig




   Gary King (Harvard)        Dataverse Network                         22 / 22

Más contenido relacionado

La actualidad más candente

Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesRebekah Cummings
 
FSCI Data Discovery
FSCI Data DiscoveryFSCI Data Discovery
FSCI Data DiscoveryARDC
 
D4Science Data infrastructure: a facilitator for a FAIR data management
D4Science Data infrastructure: a facilitator for a FAIR data managementD4Science Data infrastructure: a facilitator for a FAIR data management
D4Science Data infrastructure: a facilitator for a FAIR data managementResearch Data Alliance
 
Fair data principles for AOASG
Fair data principles for AOASGFair data principles for AOASG
Fair data principles for AOASGKeith Russell
 
Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...Rebekah Cummings
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse CommonsMerce Crosas
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData ManagementUlrike Wittig
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESMicah Altman
 
LIBER Webinar: Are the FAIR Data Principles really fair?
LIBER Webinar: Are the FAIR Data Principles really fair?LIBER Webinar: Are the FAIR Data Principles really fair?
LIBER Webinar: Are the FAIR Data Principles really fair?LIBER Europe
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSBROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSMicah Altman
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...EUDAT
 
Addressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessAddressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessBen Busby
 
dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017dkNET
 
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...datascienceiqss
 
INSTRUCT - Integrated Structural Biology Infrastructure
INSTRUCT - Integrated Structural Biology InfrastructureINSTRUCT - Integrated Structural Biology Infrastructure
INSTRUCT - Integrated Structural Biology InfrastructureResearch Data Alliance
 

La actualidad más candente (20)

"Cool" metadata for FAIR data
"Cool" metadata for FAIR data"Cool" metadata for FAIR data
"Cool" metadata for FAIR data
 
Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and Humanities
 
FSCI Data Discovery
FSCI Data DiscoveryFSCI Data Discovery
FSCI Data Discovery
 
D4Science Data infrastructure: a facilitator for a FAIR data management
D4Science Data infrastructure: a facilitator for a FAIR data managementD4Science Data infrastructure: a facilitator for a FAIR data management
D4Science Data infrastructure: a facilitator for a FAIR data management
 
Fair data principles for AOASG
Fair data principles for AOASGFair data principles for AOASG
Fair data principles for AOASG
 
Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...
 
Shareable by Design: Making Better Use of your Research
Shareable by Design: Making Better Use of your ResearchShareable by Design: Making Better Use of your Research
Shareable by Design: Making Better Use of your Research
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse Commons
 
Demography pro sem
Demography pro semDemography pro sem
Demography pro sem
 
FAIR data overview
FAIR data overviewFAIR data overview
FAIR data overview
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData Management
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
 
LIBER Webinar: Are the FAIR Data Principles really fair?
LIBER Webinar: Are the FAIR Data Principles really fair?LIBER Webinar: Are the FAIR Data Principles really fair?
LIBER Webinar: Are the FAIR Data Principles really fair?
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSBROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
 
Rusbridge Feb 8 Improving Clarity around Continuing Access
Rusbridge Feb 8 Improving Clarity around Continuing AccessRusbridge Feb 8 Improving Clarity around Continuing Access
Rusbridge Feb 8 Improving Clarity around Continuing Access
 
Addressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_accessAddressing privacy concerns_in_the_age_of_federated_data_access
Addressing privacy concerns_in_the_age_of_federated_data_access
 
dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017
 
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
 
INSTRUCT - Integrated Structural Biology Infrastructure
INSTRUCT - Integrated Structural Biology InfrastructureINSTRUCT - Integrated Structural Biology Infrastructure
INSTRUCT - Integrated Structural Biology Infrastructure
 

Destacado

Dataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth QuigleyDataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth Quigleydatascienceiqss
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseMicah Altman
 
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...Merce Crosas
 
Introduction to the Logic of social inquiry
Introduction to the Logic of social inquiryIntroduction to the Logic of social inquiry
Introduction to the Logic of social inquiryJohn Bradford
 
Dataverse: Helping Researchers Publish Their Data Through Automation
Dataverse: Helping Researchers Publish Their Data Through Automation�Dataverse: Helping Researchers Publish Their Data Through Automation�
Dataverse: Helping Researchers Publish Their Data Through AutomationEleni Castro, MLIS
 
Metadata & Data Curation Services by Thu-Mai Christian
Metadata & Data Curation Services by Thu-Mai ChristianMetadata & Data Curation Services by Thu-Mai Christian
Metadata & Data Curation Services by Thu-Mai Christiandatascienceiqss
 
Dataverse in China: Internationalization, Curation and Promotion by Yin Shenqin
Dataverse in China: Internationalization, Curation and Promotion by Yin ShenqinDataverse in China: Internationalization, Curation and Promotion by Yin Shenqin
Dataverse in China: Internationalization, Curation and Promotion by Yin Shenqindatascienceiqss
 
Dataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. BorgmanDataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. Borgmandatascienceiqss
 
Dataverse at Cariniana network
Dataverse at Cariniana networkDataverse at Cariniana network
Dataverse at Cariniana networkCariniana Rede
 

Destacado (9)

Dataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth QuigleyDataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth Quigley
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with Dataverse
 
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
 
Introduction to the Logic of social inquiry
Introduction to the Logic of social inquiryIntroduction to the Logic of social inquiry
Introduction to the Logic of social inquiry
 
Dataverse: Helping Researchers Publish Their Data Through Automation
Dataverse: Helping Researchers Publish Their Data Through Automation�Dataverse: Helping Researchers Publish Their Data Through Automation�
Dataverse: Helping Researchers Publish Their Data Through Automation
 
Metadata & Data Curation Services by Thu-Mai Christian
Metadata & Data Curation Services by Thu-Mai ChristianMetadata & Data Curation Services by Thu-Mai Christian
Metadata & Data Curation Services by Thu-Mai Christian
 
Dataverse in China: Internationalization, Curation and Promotion by Yin Shenqin
Dataverse in China: Internationalization, Curation and Promotion by Yin ShenqinDataverse in China: Internationalization, Curation and Promotion by Yin Shenqin
Dataverse in China: Internationalization, Curation and Promotion by Yin Shenqin
 
Dataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. BorgmanDataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. Borgman
 
Dataverse at Cariniana network
Dataverse at Cariniana networkDataverse at Cariniana network
Dataverse at Cariniana network
 

Similar a 295 king

Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environmentphilipdurbin
 
Linking Data to Publications through Citation and Virtual Archives
Linking Data to Publications through Citation and Virtual ArchivesLinking Data to Publications through Citation and Virtual Archives
Linking Data to Publications through Citation and Virtual ArchivesMicah Altman
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumMerce Crosas
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...Todd Vision
 
The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...Hilmar Lapp
 
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseLaurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseGigaScience, BGI Hong Kong
 
Preparing your data for sharing and publishing
Preparing your data for sharing and publishingPreparing your data for sharing and publishing
Preparing your data for sharing and publishingVarsha Khodiyar
 
IEDA Data Publication Workshop @AGU
IEDA Data Publication Workshop @AGUIEDA Data Publication Workshop @AGU
IEDA Data Publication Workshop @AGUKerstin Lehnert
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingMerce Crosas
 
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...dkNET
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorialJosh Young
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsMerce Crosas
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverseMerce Crosas
 
National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0mehmood78
 
FAIR for the future: embracing all things data
FAIR for the future: embracing all things dataFAIR for the future: embracing all things data
FAIR for the future: embracing all things dataARDC
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchRobert Grossman
 
Reading Group: From Database to Dataspaces
Reading Group: From Database to DataspacesReading Group: From Database to Dataspaces
Reading Group: From Database to DataspacesJürgen Umbrich
 
Publishing perspectives on data management & future directions
Publishing perspectives on data management & future directionsPublishing perspectives on data management & future directions
Publishing perspectives on data management & future directionsARDC
 

Similar a 295 king (20)

Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
 
Linking Data to Publications through Citation and Virtual Archives
Linking Data to Publications through Citation and Virtual ArchivesLinking Data to Publications through Citation and Virtual Archives
Linking Data to Publications through Citation and Virtual Archives
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...
 
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseLaurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
 
Preparing your data for sharing and publishing
Preparing your data for sharing and publishingPreparing your data for sharing and publishing
Preparing your data for sharing and publishing
 
IEDA Data Publication Workshop @AGU
IEDA Data Publication Workshop @AGUIEDA Data Publication Workshop @AGU
IEDA Data Publication Workshop @AGU
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
 
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverse
 
National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0
 
FAIR for the future: embracing all things data
FAIR for the future: embracing all things dataFAIR for the future: embracing all things data
FAIR for the future: embracing all things data
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
 
Reading Group: From Database to Dataspaces
Reading Group: From Database to DataspacesReading Group: From Database to Dataspaces
Reading Group: From Database to Dataspaces
 
Publishing perspectives on data management & future directions
Publishing perspectives on data management & future directionsPublishing perspectives on data management & future directions
Publishing perspectives on data management & future directions
 
Introduction of Linked Data for Science
Introduction of Linked Data for ScienceIntroduction of Linked Data for Science
Introduction of Linked Data for Science
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 

Más de Society for Scholarly Publishing

04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadows
04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadows04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadows
04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadowsSociety for Scholarly Publishing
 
04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterick
04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterick04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterick
04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterickSociety for Scholarly Publishing
 

Más de Society for Scholarly Publishing (20)

10052016 ssp seminar2_newsham
10052016 ssp seminar2_newsham10052016 ssp seminar2_newsham
10052016 ssp seminar2_newsham
 
10052016 ssp seminar2_rivera
10052016 ssp seminar2_rivera10052016 ssp seminar2_rivera
10052016 ssp seminar2_rivera
 
10052016 ssp seminar2_pesanelli
10052016 ssp seminar2_pesanelli10052016 ssp seminar2_pesanelli
10052016 ssp seminar2_pesanelli
 
10052016 ssp seminar2_harley
10052016 ssp seminar2_harley10052016 ssp seminar2_harley
10052016 ssp seminar2_harley
 
10042016 ssp seminar1_session4_myers
10042016 ssp seminar1_session4_myers10042016 ssp seminar1_session4_myers
10042016 ssp seminar1_session4_myers
 
10042016 ssp seminar1_session4_demers
10042016 ssp seminar1_session4_demers10042016 ssp seminar1_session4_demers
10042016 ssp seminar1_session4_demers
 
10042016 ssp seminar1_session4_cochran
10042016 ssp seminar1_session4_cochran10042016 ssp seminar1_session4_cochran
10042016 ssp seminar1_session4_cochran
 
10042016 ssp seminar1_session3_stanley
10042016 ssp seminar1_session3_stanley10042016 ssp seminar1_session3_stanley
10042016 ssp seminar1_session3_stanley
 
10042016 ssp seminar1_session3_ranganathan
10042016 ssp seminar1_session3_ranganathan10042016 ssp seminar1_session3_ranganathan
10042016 ssp seminar1_session3_ranganathan
 
10042016 ssp seminar1_session3_odike
10042016 ssp seminar1_session3_odike10042016 ssp seminar1_session3_odike
10042016 ssp seminar1_session3_odike
 
10042016 ssp seminar1_session3_cochran
10042016 ssp seminar1_session3_cochran10042016 ssp seminar1_session3_cochran
10042016 ssp seminar1_session3_cochran
 
10042016 ssp seminar1_session2_walker
10042016 ssp seminar1_session2_walker10042016 ssp seminar1_session2_walker
10042016 ssp seminar1_session2_walker
 
10042016 ssp seminar1_session2_ivins
10042016 ssp seminar1_session2_ivins10042016 ssp seminar1_session2_ivins
10042016 ssp seminar1_session2_ivins
 
10042016 ssp seminar1_session2_holland
10042016 ssp seminar1_session2_holland10042016 ssp seminar1_session2_holland
10042016 ssp seminar1_session2_holland
 
10042016 ssp seminar1_session1_stanley
10042016 ssp seminar1_session1_stanley10042016 ssp seminar1_session1_stanley
10042016 ssp seminar1_session1_stanley
 
10042016 ssp seminar1_session1_keane
10042016 ssp seminar1_session1_keane10042016 ssp seminar1_session1_keane
10042016 ssp seminar1_session1_keane
 
10042016 ssp seminar1_session1_ivins
10042016 ssp seminar1_session1_ivins10042016 ssp seminar1_session1_ivins
10042016 ssp seminar1_session1_ivins
 
10042016 ssp seminar1_session1_asadilari
10042016 ssp seminar1_session1_asadilari10042016 ssp seminar1_session1_asadilari
10042016 ssp seminar1_session1_asadilari
 
04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadows
04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadows04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadows
04142015 ssp webinar_theworldisflatforscholarlypublishing_caitlinmeadows
 
04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterick
04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterick04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterick
04142015 ssp webinar_theworldisflatforscholarlypublishing_bruceheterick
 

295 king

  • 1. The Dataverse Network: An Infrastructure for Data Sharing Gary King Harvard University (5/28/08 talk at the Society for Scholarly Publishing) (5/28/08 talk at the Society Gary King Harvard University () The Dataverse Network: An Infrastructure for Data Sharing / 22
  • 2. Papers Gary King (Harvard) Dataverse Network 2 / 22
  • 3. Papers Gary King, An Introduction to the Dataverse Network as an Infrastructure for Data Sharing, Sociological Methods and Research, 32, 2 (November, 2007): 173–199. Gary King (Harvard) Dataverse Network 2 / 22
  • 4. Papers Gary King, An Introduction to the Dataverse Network as an Infrastructure for Data Sharing, Sociological Methods and Research, 32, 2 (November, 2007): 173–199. Micah Altman and Gary King. A Proposed Standard for the Scholarly Citation of Quantitative Data, D-Lib Magazine, 13, 3/4 (March/April). Gary King (Harvard) Dataverse Network 2 / 22
  • 5. Papers Gary King, An Introduction to the Dataverse Network as an Infrastructure for Data Sharing, Sociological Methods and Research, 32, 2 (November, 2007): 173–199. Micah Altman and Gary King. A Proposed Standard for the Scholarly Citation of Quantitative Data, D-Lib Magazine, 13, 3/4 (March/April). Dataverse Network project: http://TheData.org Gary King (Harvard) Dataverse Network 2 / 22
  • 6. Infrastructure for Quantitative Data Gary King (Harvard) Dataverse Network 3 / 22
  • 7. Infrastructure for Quantitative Data Accessibility: Gary King (Harvard) Dataverse Network 3 / 22
  • 8. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Gary King (Harvard) Dataverse Network 3 / 22
  • 9. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Gary King (Harvard) Dataverse Network 3 / 22
  • 10. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Gary King (Harvard) Dataverse Network 3 / 22
  • 11. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Problems even with professional archives: Gary King (Harvard) Dataverse Network 3 / 22
  • 12. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Problems even with professional archives: Data in different archives have different identifiers Gary King (Harvard) Dataverse Network 3 / 22
  • 13. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Problems even with professional archives: Data in different archives have different identifiers One major archive renumbered all its acquisitions Gary King (Harvard) Dataverse Network 3 / 22
  • 14. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Problems even with professional archives: Data in different archives have different identifiers One major archive renumbered all its acquisitions Changes to data are made; identifiers are reused or deaccessioned; old data are lost Gary King (Harvard) Dataverse Network 3 / 22
  • 15. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Problems even with professional archives: Data in different archives have different identifiers One major archive renumbered all its acquisitions Changes to data are made; identifiers are reused or deaccessioned; old data are lost Data sets are not like books Gary King (Harvard) Dataverse Network 3 / 22
  • 16. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Problems even with professional archives: Data in different archives have different identifiers One major archive renumbered all its acquisitions Changes to data are made; identifiers are reused or deaccessioned; old data are lost Data sets are not like books Static data files (even if on the web): unreadable after a few years Gary King (Harvard) Dataverse Network 3 / 22
  • 17. Infrastructure for Quantitative Data Accessibility: Most large data sets: in public archives Most data in published articles: not accessible, results not replicable without the original author Most data sets from NSF & NIH grants: not publicly available Problems even with professional archives: Data in different archives have different identifiers One major archive renumbered all its acquisitions Changes to data are made; identifiers are reused or deaccessioned; old data are lost Data sets are not like books Static data files (even if on the web): unreadable after a few years When storage methods change: some data sets are lost; others have altered content! Gary King (Harvard) Dataverse Network 3 / 22
  • 18. What About a Centralized Data Access Solution? Gary King (Harvard) Dataverse Network 4 / 22
  • 19. What About a Centralized Data Access Solution? Highly desirable when feasible Gary King (Harvard) Dataverse Network 4 / 22
  • 20. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Gary King (Harvard) Dataverse Network 4 / 22
  • 21. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc. Gary King (Harvard) Dataverse Network 4 / 22
  • 22. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc. Why don’t researchers put data in public archives? Gary King (Harvard) Dataverse Network 4 / 22
  • 23. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc. Why don’t researchers put data in public archives? The Archive gets the credit Gary King (Harvard) Dataverse Network 4 / 22
  • 24. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc. Why don’t researchers put data in public archives? The Archive gets the credit Upon questioning: they want credit, control, and visibility Gary King (Harvard) Dataverse Network 4 / 22
  • 25. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc. Why don’t researchers put data in public archives? The Archive gets the credit Upon questioning: they want credit, control, and visibility (So why don’t they worry about print publishers getting all the credit? Gary King (Harvard) Dataverse Network 4 / 22
  • 26. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc. Why don’t researchers put data in public archives? The Archive gets the credit Upon questioning: they want credit, control, and visibility (So why don’t they worry about print publishers getting all the credit? Lack of data citations!) Gary King (Harvard) Dataverse Network 4 / 22
  • 27. What About a Centralized Data Access Solution? Highly desirable when feasible Works great in genomics, astronomy, etc., when data formats are universal, goals are common, and agreements are in place Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc. Why don’t researchers put data in public archives? The Archive gets the credit Upon questioning: they want credit, control, and visibility (So why don’t they worry about print publishers getting all the credit? Lack of data citations!) We propose: technological solutions to these political and social problems Gary King (Harvard) Dataverse Network 4 / 22
  • 28. Requirements for Effective Data Sharing Infrastructure Gary King (Harvard) Dataverse Network 5 / 22
  • 29. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Gary King (Harvard) Dataverse Network 5 / 22
  • 30. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Gary King (Harvard) Dataverse Network 5 / 22
  • 31. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Gary King (Harvard) Dataverse Network 5 / 22
  • 32. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Gary King (Harvard) Dataverse Network 5 / 22
  • 33. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Gary King (Harvard) Dataverse Network 5 / 22
  • 34. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted Gary King (Harvard) Dataverse Network 5 / 22
  • 35. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, Gary King (Harvard) Dataverse Network 5 / 22
  • 36. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, Gary King (Harvard) Dataverse Network 5 / 22
  • 37. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8 inch magnetic tape to 5.25 inch floppies to a DVD. Gary King (Harvard) Dataverse Network 5 / 22
  • 38. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8 inch magnetic tape to 5.25 inch floppies to a DVD. Ease of Use Neither editors nor authors employ professional archivists Gary King (Harvard) Dataverse Network 5 / 22
  • 39. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8 inch magnetic tape to 5.25 inch floppies to a DVD. Ease of Use Neither editors nor authors employ professional archivists Legal Protection: Gary King (Harvard) Dataverse Network 5 / 22
  • 40. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8 inch magnetic tape to 5.25 inch floppies to a DVD. Ease of Use Neither editors nor authors employ professional archivists Legal Protection: Journals have liability protection for print; none for data Gary King (Harvard) Dataverse Network 5 / 22
  • 41. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8 inch magnetic tape to 5.25 inch floppies to a DVD. Ease of Use Neither editors nor authors employ professional archivists Legal Protection: Journals have liability protection for print; none for data If you put data on the web without IRB approval, you are violating federal regulations Gary King (Harvard) Dataverse Network 5 / 22
  • 42. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8 inch magnetic tape to 5.25 inch floppies to a DVD. Ease of Use Neither editors nor authors employ professional archivists Legal Protection: Journals have liability protection for print; none for data If you put data on the web without IRB approval, you are violating federal regulations (IRB approval must be for data distribution, not merely for the study) Gary King (Harvard) Dataverse Network 5 / 22
  • 43. Requirements for Effective Data Sharing Infrastructure Recognition, for authors, journals, etc. in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web. Public Distribution, without permission from the author Authorization: fulfill requirements the author originally met Validation: check that data exists, without authorization Persistence Decades from now. . . . Verification: data remains unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8 inch magnetic tape to 5.25 inch floppies to a DVD. Ease of Use Neither editors nor authors employ professional archivists Legal Protection: Journals have liability protection for print; none for data If you put data on the web without IRB approval, you are violating federal regulations (IRB approval must be for data distribution, not merely for the study) Solution must not require lawyers (we’ve automated the IRB) Gary King (Harvard) Dataverse Network 5 / 22
  • 44. Rules for Citing Printed Matter Gary King (Harvard) Dataverse Network 6 / 22
  • 45. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Gary King (Harvard) Dataverse Network 6 / 22
  • 46. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. First author (last name first) Gary King (Harvard) Dataverse Network 6 / 22
  • 47. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Second author Gary King (Harvard) Dataverse Network 6 / 22
  • 48. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Third author Gary King (Harvard) Dataverse Network 6 / 22
  • 49. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Year Gary King (Harvard) Dataverse Network 6 / 22
  • 50. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Article title Gary King (Harvard) Dataverse Network 6 / 22
  • 51. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Journal (no longer exists) Gary King (Harvard) Dataverse Network 6 / 22
  • 52. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Volume number Gary King (Harvard) Dataverse Network 6 / 22
  • 53. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Issue number Gary King (Harvard) Dataverse Network 6 / 22
  • 54. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Season Gary King (Harvard) Dataverse Network 6 / 22
  • 55. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Pages Gary King (Harvard) Dataverse Network 6 / 22
  • 56. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Special formatting codes Gary King (Harvard) Dataverse Network 6 / 22
  • 57. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Special indentation Gary King (Harvard) Dataverse Network 6 / 22
  • 58. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Citations: rule-based, precise, redundant Gary King (Harvard) Dataverse Network 6 / 22
  • 59. Rules for Citing Printed Matter Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62. Print Citations Work: authors don’t think publishers get all the credit; cited articles can be found; copyeditors don’t need to see the original to know it exists; the link from citation to print persists Gary King (Harvard) Dataverse Network 6 / 22
  • 60. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== Gary King (Harvard) Dataverse Network 7 / 22
  • 61. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== 1 Author Gary King (Harvard) Dataverse Network 7 / 22
  • 62. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== 1 Author 2 Year Gary King (Harvard) Dataverse Network 7 / 22
  • 63. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== 1 Author 2 Year 3 Title Gary King (Harvard) Dataverse Network 7 / 22
  • 64. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== 1 Author 2 Year 3 Title 4 Unique Global Identifier: will work after URLs stop working Gary King (Harvard) Dataverse Network 7 / 22
  • 65. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== 1 Author 2 Year 3 Title 4 Unique Global Identifier: will work after URLs stop working 5 Linked to a Bridge Service (presently a URL: http://id.thedata.org/hdl%3A1902.4%2F00754) Gary King (Harvard) Dataverse Network 7 / 22
  • 66. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== 1 Author 2 Year 3 Title 4 Unique Global Identifier: will work after URLs stop working 5 Linked to a Bridge Service (presently a URL: http://id.thedata.org/hdl%3A1902.4%2F00754) 6 Universal Numeric Fingerprint (UNF) Gary King (Harvard) Dataverse Network 7 / 22
  • 67. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== Annals of Applied Statistics [Distributor]; 1 Author 2 Year 3 Title 4 Unique Global Identifier: will work after URLs stop working 5 Linked to a Bridge Service (presently a URL: http://id.thedata.org/hdl%3A1902.4%2F00754) 6 Universal Numeric Fingerprint (UNF) 7 Standard rules for adding citation elements Gary King (Harvard) Dataverse Network 7 / 22
  • 68. A New Citation Standard for Numeric Data Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== Annals of Applied Statistics [Distributor]; NORC [Producer]. 1 Author 2 Year 3 Title 4 Unique Global Identifier: will work after URLs stop working 5 Linked to a Bridge Service (presently a URL: http://id.thedata.org/hdl%3A1902.4%2F00754) 6 Universal Numeric Fingerprint (UNF) 7 Standard rules for adding citation elements Gary King (Harvard) Dataverse Network 7 / 22
  • 69. Data to Universal Numeric Fingerprints Gary King (Harvard) Dataverse Network 8 / 22
  • 70. Data to Universal Numeric Fingerprints 1 4 4 21 ··· 121     1 2 2 91 ··· 212     1 9 2 72 ··· 104     0 2 2 2 ··· 321     1 6 2 12 ··· 204     1 9 4 52 ··· 311     0 3 2 23 ··· 92     0 2 5 91 ··· 212     0 5 8 91 ··· 91     1 9 1 72 ··· 104    . . . . . . . . .. . .   . . . . . .  1 2 2 91 · · · 212 Gary King (Harvard) Dataverse Network 8 / 22
  • 71. Data to Universal Numeric Fingerprints 1 4 4 21 ··· 121     1 2 2 91 ··· 212     1 9 2 72 ··· 104     0 2 2 2 ··· 321     1 6 2 12 ··· 204    1 9 4 52 ··· 311   =⇒ ZNQRI14053UZq389x0Bffg?==     0 3 2 23 ··· 92    0 2 5 91 ··· 212     0 5 8 91 ··· 91     1 9 1 72 ··· 104    . . . . . . . . .. . .   . . . . . .  1 2 2 91 · · · 212 Gary King (Harvard) Dataverse Network 8 / 22
  • 72. Advantages of UNFs Gary King (Harvard) Dataverse Network 9 / 22
  • 73. Advantages of UNFs UNF is calculated from the content not the file: . Gary King (Harvard) Dataverse Network 9 / 22
  • 74. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, . Gary King (Harvard) Dataverse Network 9 / 22
  • 75. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, . Gary King (Harvard) Dataverse Network 9 / 22
  • 76. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, . Gary King (Harvard) Dataverse Network 9 / 22
  • 77. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, . Gary King (Harvard) Dataverse Network 9 / 22
  • 78. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, . Gary King (Harvard) Dataverse Network 9 / 22
  • 79. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Gary King (Harvard) Dataverse Network 9 / 22
  • 80. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Cryptographic technology: any change in data content changes the UNF. (cannot tinker after the fact!) Gary King (Harvard) Dataverse Network 9 / 22
  • 81. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Cryptographic technology: any change in data content changes the UNF. (cannot tinker after the fact!) Noninvertible properties Gary King (Harvard) Dataverse Network 9 / 22
  • 82. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Cryptographic technology: any change in data content changes the UNF. (cannot tinker after the fact!) Noninvertible properties UNFs convey no information about data content Gary King (Harvard) Dataverse Network 9 / 22
  • 83. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Cryptographic technology: any change in data content changes the UNF. (cannot tinker after the fact!) Noninvertible properties UNFs convey no information about data content OK to distribute for highly sensitive, confidential, or proprietary data Gary King (Harvard) Dataverse Network 9 / 22
  • 84. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Cryptographic technology: any change in data content changes the UNF. (cannot tinker after the fact!) Noninvertible properties UNFs convey no information about data content OK to distribute for highly sensitive, confidential, or proprietary data Copyeditor can validate data’s existence even without authorization Gary King (Harvard) Dataverse Network 9 / 22
  • 85. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Cryptographic technology: any change in data content changes the UNF. (cannot tinker after the fact!) Noninvertible properties UNFs convey no information about data content OK to distribute for highly sensitive, confidential, or proprietary data Copyeditor can validate data’s existence even without authorization The citation refers to one specific data set that can’t ever be altered, even if journal doesn’t keep a copy Gary King (Harvard) Dataverse Network 9 / 22
  • 86. Advantages of UNFs UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software. Cryptographic technology: any change in data content changes the UNF. (cannot tinker after the fact!) Noninvertible properties UNFs convey no information about data content OK to distribute for highly sensitive, confidential, or proprietary data Copyeditor can validate data’s existence even without authorization The citation refers to one specific data set that can’t ever be altered, even if journal doesn’t keep a copy Future researchers can quickly check that they have the same data as used by the author: merely recalculate the UNF Gary King (Harvard) Dataverse Network 9 / 22
  • 87. Web 2.0 Terminology Gary King (Harvard) Dataverse Network 10 / 22
  • 88. Web 2.0 Terminology Software: find CD, install locally, Gary King (Harvard) Dataverse Network 10 / 22
  • 89. Web 2.0 Terminology Software: find CD, install locally, hit next, Gary King (Harvard) Dataverse Network 10 / 22
  • 90. Web 2.0 Terminology Software: find CD, install locally, hit next, hit next, Gary King (Harvard) Dataverse Network 10 / 22
  • 91. Web 2.0 Terminology Software: find CD, install locally, hit next, hit next, hit next. . . Gary King (Harvard) Dataverse Network 10 / 22
  • 92. Web 2.0 Terminology Software: find CD, install locally, hit next, hit next, hit next. . . Web application software: no installation; load web browser and run (Dataverse Network Software) Gary King (Harvard) Dataverse Network 10 / 22
  • 93. Web 2.0 Terminology Software: find CD, install locally, hit next, hit next, hit next. . . Web application software: no installation; load web browser and run (Dataverse Network Software) Host: The computers where the web application software runs (universities, archives, libraries) Gary King (Harvard) Dataverse Network 10 / 22
  • 94. Web 2.0 Terminology Software: find CD, install locally, hit next, hit next, hit next. . . Web application software: no installation; load web browser and run (Dataverse Network Software) Host: The computers where the web application software runs (universities, archives, libraries) Virtual host: Where the web application software seems to run, but does not (web sites of: authors, journals, granting agencies, research centers, universities, scholarly organizations, etc.) Gary King (Harvard) Dataverse Network 10 / 22
  • 95. po wered by the Dataverse Network™ Pr oject Your web site Your dataverse branded as your web site but served by the Dataverse Network, therefore re- quiring no local installation and providing an enormous array of services Gary King (Harvard) Dataverse Network 11 / 22
  • 96. po wered by the Dataverse Network™ Pr oject Gary King (Harvard) Dataverse Network 12 / 22
  • 97. po wered by the Dataverse Network™ Pr oject Gary King (Harvard) Dataverse Network 13 / 22
  • 98. po wered by the Dataverse Network™ Pr oject Gary King (Harvard) Dataverse Network 14 / 22
  • 99. po wered by the Dataverse Network™ Pr oject Your web site Your dataverse branded as your web site but served by the Dataverse Network, therefore re- quiring no local installation and providing an enormous array of services Gary King (Harvard) Dataverse Network 15 / 22
  • 100. Gary King (Harvard) Dataverse Network 16 / 22
  • 101. A Journal Dataverse for a Replication Data Archive Gary King (Harvard) Dataverse Network 17 / 22
  • 102. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Gary King (Harvard) Dataverse Network 17 / 22
  • 103. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Gary King (Harvard) Dataverse Network 17 / 22
  • 104. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Branded as yours: with the look and feel of your site Gary King (Harvard) Dataverse Network 17 / 22
  • 105. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Branded as yours: with the look and feel of your site Easy to setup: give DVN your style, and include a link to your new dataverse Gary King (Harvard) Dataverse Network 17 / 22
  • 106. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Branded as yours: with the look and feel of your site Easy to setup: give DVN your style, and include a link to your new dataverse Easy to manage: no software or hardware installation, backups, worry about archiving standards, or data format transations; still exists if you move; easy to rebrand Gary King (Harvard) Dataverse Network 17 / 22
  • 107. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Branded as yours: with the look and feel of your site Easy to setup: give DVN your style, and include a link to your new dataverse Easy to manage: no software or hardware installation, backups, worry about archiving standards, or data format transations; still exists if you move; easy to rebrand Does not disrupt workflow: copyeditor ensures data is cited; author is given password to upload data. Everything else is self-service Gary King (Harvard) Dataverse Network 17 / 22
  • 108. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Branded as yours: with the look and feel of your site Easy to setup: give DVN your style, and include a link to your new dataverse Easy to manage: no software or hardware installation, backups, worry about archiving standards, or data format transations; still exists if you move; easy to rebrand Does not disrupt workflow: copyeditor ensures data is cited; author is given password to upload data. Everything else is self-service High acceptability: experiments indicate > 90% uptake for authors (without publication at stake) Gary King (Harvard) Dataverse Network 17 / 22
  • 109. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Branded as yours: with the look and feel of your site Easy to setup: give DVN your style, and include a link to your new dataverse Easy to manage: no software or hardware installation, backups, worry about archiving standards, or data format transations; still exists if you move; easy to rebrand Does not disrupt workflow: copyeditor ensures data is cited; author is given password to upload data. Everything else is self-service High acceptability: experiments indicate > 90% uptake for authors (without publication at stake) Reuse: data may appear on author’s dataverse too Gary King (Harvard) Dataverse Network 17 / 22
  • 110. A Journal Dataverse for a Replication Data Archive Full service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . ) Hierarchical list of data: year/volume/issue/article/dataset Branded as yours: with the look and feel of your site Easy to setup: give DVN your style, and include a link to your new dataverse Easy to manage: no software or hardware installation, backups, worry about archiving standards, or data format transations; still exists if you move; easy to rebrand Does not disrupt workflow: copyeditor ensures data is cited; author is given password to upload data. Everything else is self-service High acceptability: experiments indicate > 90% uptake for authors (without publication at stake) Reuse: data may appear on author’s dataverse too Results: Journals with replication policies have three times the impact factor! (with dataverse, it should be more) Gary King (Harvard) Dataverse Network 17 / 22
  • 111. Dataverse Uses Gary King (Harvard) Dataverse Network 18 / 22
  • 112. Dataverse Uses Journals, for replication data archives Gary King (Harvard) Dataverse Network 18 / 22
  • 113. Dataverse Uses Journals, for replication data archives Authors, for their own data Gary King (Harvard) Dataverse Network 18 / 22
  • 114. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Gary King (Harvard) Dataverse Network 18 / 22
  • 115. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Gary King (Harvard) Dataverse Network 18 / 22
  • 116. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Sections of scholarly organizations, to organize existing data Gary King (Harvard) Dataverse Network 18 / 22
  • 117. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Sections of scholarly organizations, to organize existing data Granting agencies Gary King (Harvard) Dataverse Network 18 / 22
  • 118. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Sections of scholarly organizations, to organize existing data Granting agencies Research centers Gary King (Harvard) Dataverse Network 18 / 22
  • 119. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Sections of scholarly organizations, to organize existing data Granting agencies Research centers Major Research Projects Gary King (Harvard) Dataverse Network 18 / 22
  • 120. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Sections of scholarly organizations, to organize existing data Granting agencies Research centers Major Research Projects Academic departments, universities, data centers, libraries Gary King (Harvard) Dataverse Network 18 / 22
  • 121. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Sections of scholarly organizations, to organize existing data Granting agencies Research centers Major Research Projects Academic departments, universities, data centers, libraries Data archives Gary King (Harvard) Dataverse Network 18 / 22
  • 122. Dataverse Uses Journals, for replication data archives Authors, for their own data Future Researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metdata, abstract, & documentation; check for new versions; translate format; statistical analyses; download Teachers, a list or for in depth analysis Sections of scholarly organizations, to organize existing data Granting agencies Research centers Major Research Projects Academic departments, universities, data centers, libraries Data archives Using DVN tools with outside data Gary King (Harvard) Dataverse Network 18 / 22
  • 123. The Universe of Data meets the Universe of Methods Gary King (Harvard) Dataverse Network 19 / 22
  • 124. The Universe of Data meets the Universe of Methods R Project for Statistical Computing Gary King (Harvard) Dataverse Network 19 / 22
  • 125. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Gary King (Harvard) Dataverse Network 19 / 22
  • 126. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Gary King (Harvard) Dataverse Network 19 / 22
  • 127. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Gary King (Harvard) Dataverse Network 19 / 22
  • 128. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software Gary King (Harvard) Dataverse Network 19 / 22
  • 129. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Gary King (Harvard) Dataverse Network 19 / 22
  • 130. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Users incorporate original packages via simple bridge functions via a simple model description language Gary King (Harvard) Dataverse Network 19 / 22
  • 131. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Users incorporate original packages via simple bridge functions via a simple model description language Result: Unified Syntax, the same 3 commands to use any method Gary King (Harvard) Dataverse Network 19 / 22
  • 132. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Users incorporate original packages via simple bridge functions via a simple model description language Result: Unified Syntax, the same 3 commands to use any method Automatically generated Graphical User Interface with all the world’s methods Gary King (Harvard) Dataverse Network 19 / 22
  • 133. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Users incorporate original packages via simple bridge functions via a simple model description language Result: Unified Syntax, the same 3 commands to use any method Automatically generated Graphical User Interface with all the world’s methods R + Zelig + Dataverse Network Gary King (Harvard) Dataverse Network 19 / 22
  • 134. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Users incorporate original packages via simple bridge functions via a simple model description language Result: Unified Syntax, the same 3 commands to use any method Automatically generated Graphical User Interface with all the world’s methods R + Zelig + Dataverse Network Greatly reduced time from methods development to use Gary King (Harvard) Dataverse Network 19 / 22
  • 135. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Users incorporate original packages via simple bridge functions via a simple model description language Result: Unified Syntax, the same 3 commands to use any method Automatically generated Graphical User Interface with all the world’s methods R + Zelig + Dataverse Network Greatly reduced time from methods development to use Easy for applied researchers even if non-programmers Gary King (Harvard) Dataverse Network 19 / 22
  • 136. The Universe of Data meets the Universe of Methods R Project for Statistical Computing nearly 1000 packages; most new methods appear in R first Highly diverse examples, syntax, documentation, and quality Difficult for statistics-types; harder for applied researchers Zelig: Everyone’s Statistical Software An ontology we developed of almost all statistical methods Users incorporate original packages via simple bridge functions via a simple model description language Result: Unified Syntax, the same 3 commands to use any method Automatically generated Graphical User Interface with all the world’s methods R + Zelig + Dataverse Network Greatly reduced time from methods development to use Easy for applied researchers even if non-programmers Can save R script for replication or further analysis Gary King (Harvard) Dataverse Network 19 / 22
  • 137. Licensing Gary King (Harvard) Dataverse Network 20 / 22
  • 138. Licensing Web application software Gary King (Harvard) Dataverse Network 20 / 22
  • 139. Licensing Web application software Pros: easy to use, no installation costs Gary King (Harvard) Dataverse Network 20 / 22
  • 140. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Gary King (Harvard) Dataverse Network 20 / 22
  • 141. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Gary King (Harvard) Dataverse Network 20 / 22
  • 142. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Gary King (Harvard) Dataverse Network 20 / 22
  • 143. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Open source, public ownership Gary King (Harvard) Dataverse Network 20 / 22
  • 144. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Open source, public ownership If you don’t like the new version: you can make a new one Gary King (Harvard) Dataverse Network 20 / 22
  • 145. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Open source, public ownership If you don’t like the new version: you can make a new one You own the software & the underlying code Gary King (Harvard) Dataverse Network 20 / 22
  • 146. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Open source, public ownership If you don’t like the new version: you can make a new one You own the software & the underlying code The license guarantees that this will remain true in the future Gary King (Harvard) Dataverse Network 20 / 22
  • 147. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Open source, public ownership If you don’t like the new version: you can make a new one You own the software & the underlying code The license guarantees that this will remain true in the future Licensing data Gary King (Harvard) Dataverse Network 20 / 22
  • 148. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Open source, public ownership If you don’t like the new version: you can make a new one You own the software & the underlying code The license guarantees that this will remain true in the future Licensing data DVN automates the IRB process; no lawyers necessary Gary King (Harvard) Dataverse Network 20 / 22
  • 149. Licensing Web application software Pros: easy to use, no installation costs Cons: the software can vanish or change at any time Dataverse Network Software Affero GPL License Open source, public ownership If you don’t like the new version: you can make a new one You own the software & the underlying code The license guarantees that this will remain true in the future Licensing data DVN automates the IRB process; no lawyers necessary Data may be restricted in many ways, while metadata is available Gary King (Harvard) Dataverse Network 20 / 22
  • 150. For more information http://TheData.org Gary King (Harvard) Dataverse Network 21 / 22
  • 151. Technology used in DVN Software Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team picked for JavaOne; Sun engineers regularly call for advice) Gary King (Harvard) Dataverse Network 22 / 22
  • 152. Technology used in DVN Software Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team picked for JavaOne; Sun engineers regularly call for advice) Application server: GlassFish (wrote press release on our project) Gary King (Harvard) Dataverse Network 22 / 22
  • 153. Technology used in DVN Software Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team picked for JavaOne; Sun engineers regularly call for advice) Application server: GlassFish (wrote press release on our project) Database: we use PostgreSQL (can substitute others) Gary King (Harvard) Dataverse Network 22 / 22
  • 154. Technology used in DVN Software Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team picked for JavaOne; Sun engineers regularly call for advice) Application server: GlassFish (wrote press release on our project) Database: we use PostgreSQL (can substitute others) Statistical computing: R and Zelig Gary King (Harvard) Dataverse Network 22 / 22