Scientific Data Management
A tutorial at ICADL 2011
October 24, 2011

Jian Qin
School of Information Studies
Syracuse University
http://eslib.ischool.syr.edu/

The morning ahead

An environmental scan
•  E-Science, cyberinfrastructure, and data
•  What do all these have to do with me?

Case study: The gravitational wave research data management

Group work: Role play in developing data management initiatives

12/18/11 15:51    Overview of E-Science    2

An environmental scan
•  E-Science, cyberinfrastructure, and data
•  What do all these have to do with me?

Overview of E-Science
Characteristics of e-science
Data sets, data collections, and data repositories
Why does it matter to libraries?

E-Science

"In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet."

National e-Science Center. (2008). Defining e-Science. http://www.nesc.ac.uk/nesc/define.html

E-Infrastructure for the research lifecycle

http://epubs.cclrc.ac.uk/bitstream/3857/science_lifecycle_STFC_poster1.PDF

Shift in Science Paradigms

•  Thousand years ago: Science was empirical, describing natural phenomena
•  A few hundred years ago: Theoretical branch using models, generalizations
•  A few decades ago: A computational approach simulating complex phenomena
•  Today: Data exploration (eScience), unifying theory, experiment, and simulation
     –  Data captured by instruments or generated by simulator
     –  Processed by software
     –  Information/knowledge stored in computer
     –  Scientist analyzes database/files using data management and statistics

Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt

X-Info

•  The evolution of X-Info and Comp-X for each discipline X
•  How to codify and represent our knowledge

[Diagram: experiments and instruments, other archives, literature, and simulations supply facts; questions go in and answers come out.]

The Generic Problems
•  Data ingest
•  Managing a petabyte
•  Common schema
•  How to organize it
•  How to reorganize it
•  How to share with others
•  Query and Vis tools
•  Building and executing models
•  Integrating data and literature
•  Documenting experiments
•  Curation and long-term preservation

Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt

Useful resources

•  What is eScience?
•  eScience Initiatives
•  Science Research and Data
•  Science Data Management
•  Literature Reviews
•  Data Policy Issues
•  eScience Research Centers
•  http://eslib.ischool.syr.edu/index.php?option=com_content&view=section&id=9&Itemid=83

http://research.microsoft.com/en-us/collaboration/fourthparadigm/

A FEW IMPORTANT CONCEPTS

Data

Any and all complex data entities from observations, experiments, simulations, models, and higher order assemblies, along with the associated documentation needed to describe and interpret the data.

[Figure: An artist's conception depicts fundamental NEON observatory instrumentation and systems as well as potential spatial organization of the environmental measurements made by these instruments and systems. http://www.nsf.gov/pubs/2007/nsf0728/nsf0728_4.pdf]

Scientific data formats

Common data format
Image formats
Matrix formats
Microarray file formats
Communication protocols

Scientific datasets

•  The scientific data set, or SDS, is a group of data structures used to store and describe multidimensional arrays of scientific data.
•  The boundaries of datasets vary from discipline to discipline

NCSA HDF Development Group. (1998). HDF 4.1r2 User's Guide. http://www.hdfgroup.org/training/HDFtraining/UsersGuide/SDS_SD.fm1.html#48894
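
As a rough illustration of the SDS definition above, the sketch below pairs a multidimensional array with the structures that describe it. It is plain Python, not the HDF library API; the class name `ScientificDataSet`, its fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class ScientificDataSet:
    """Hypothetical stand-in for an SDS: an array plus its description."""
    name: str
    shape: tuple          # size of each dimension, e.g. (rows, cols)
    dim_names: tuple      # one label per dimension
    values: list          # array contents, flattened in row-major order
    attributes: dict = field(default_factory=dict)  # units, provenance, ...

    def __post_init__(self):
        # The description must agree with the data it describes.
        if len(self.shape) != len(self.dim_names):
            raise ValueError("one dimension name is needed per dimension")
        if prod(self.shape) != len(self.values):
            raise ValueError("shape does not match the number of values")

    def get(self, *index):
        """Return the element at a multidimensional index."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        for i, size in zip(index, self.shape):
            if not 0 <= i < size:
                raise IndexError("index out of range")
            offset = offset * size + i   # row-major flattening
        return self.values[offset]


# Invented sample: 2 time steps x 3 stations of temperature readings.
sds = ScientificDataSet(
    name="surface_temperature",
    shape=(2, 3),
    dim_names=("time", "station"),
    values=[11.5, 12.0, 9.8, 11.9, 12.4, 10.1],
    attributes={"units": "degC"},
)
```

Calling `sds.get(1, 2)` looks up the reading for the second time step at the third station. Because the descriptive fields sit beside the values, the array can still be interpreted when it is moved away from the code that produced it.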
  
Scientific workflows

•  Steps in the data collection and analysis process
•  Different types of scientific workflows:
     –  Data-intensive
     –  Compute-intensive
     –  Analysis-intensive
     –  Visualization-intensive

Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J., & Zhao, Y. (2006). Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, 18(10): 1039-1065.

Example: Ecological dataset

•  Floristic diversity data
     –  Related links
     –  Data attributes
     –  Download link

Example: Biodiversity dataset

•  Actions for Porcupine Marine Natural History Society - Marine flora and fauna records from the North-east Atlantic
     –  Metadata record output in different standard formats
     –  URL for dataset download

Example: The Significant Earthquake Database

•  The Significant Earthquake Database
     –  A database containing data about significant earthquake events and the damage caused
     –  An interface for extracting a subset of data
     –  A link to download the whole dataset
     –  Documentation

Social Science Data

Research data collections

Data output, characterized by size, metadata standards, and management:

Size                        Metadata standards         Management
Larger, discipline-based    Multiple, comprehensive    Organized, institutionalized
Smaller, team-based         None or random             Heroic individual inside the team

Research collections

•  Limited processing or long-term management
•  Do not conform to any data standards
•  Varying sizes and formats of data files
•  Low level of processing, lack of a plan for data products
•  Low awareness of metadata standards and data management issues

Resource collections

•  Authored by a community of investigators, within a domain of science or engineering
•  Developed with community-level standards
•  Lifetime is between mid- and long-term
•  Example: Hubbard Brook Ecosystem Study (http://www.hubbardbrook.org)
     –  One of the regional sites in the Long Term Ecological Research Network (LTER)
     –  Community of the ecological domain
     –  Community of investigators from around the country on ecosystem study
     –  Ecological Metadata Language (EML), a community-level standard
     –  Cataloged, searchable dataset collections

Reference collection

•  Example: Global Biodiversity Information Facility
     –  Created by large segments of the science community
     –  Conform to robust, well-established and comprehensive standards, e.g.
          •  ABCD (Access to Biological Collection Data)
          •  Darwin Core
          •  DiGIR (Distributed Generic Information Retrieval)
          •  Dublin Core Metadata standard
          •  GGF (Global Grid Forum)
          •  Invasive Alien Species Profile
          •  LSID (Life Sciences Identifier)
          •  OGC (Open Geospatial Consortium)
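
To make one of the listed standards concrete, here is a minimal sketch of writing a dataset description as simple Dublin Core XML with Python's standard library. The namespace URI is the published Dublin Core element set namespace; the wrapper element `metadata`, the function name `to_dublin_core_xml`, and every value in the sample record are assumptions made for illustration, not taken from GBIF.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def to_dublin_core_xml(record: dict) -> str:
    """Serialize a flat dict of Dublin Core elements to an XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; the values are invented for illustration.
record = {
    "title": "Marine flora and fauna records (sample)",
    "creator": "Example Natural History Society",
    "type": "Dataset",
    "format": "text/csv",
}
dc_xml = to_dublin_core_xml(record)
```

The same `record` dict could be passed to a second serializer for another standard, which is roughly what "metadata record output in different standard formats" amounts to on a data portal.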



Global Biodiversity Information Facility

http://www.tdwg.org/standards/
http://www.gbif.org/informatics/discoverymetadata/a-metadata-infrastructure/

Datasets, data collections, and data repositories

•  Data collections are built for larger segments of science and engineering
•  Datasets
     –  typically centered around an event or a study
     –  contain a single file or multiple files in various formats
     –  coupled with documentation about the background of data collection and processing

Data repository: System for storing, managing, preserving, and providing access to datasets
A repository may contain one or more data collections
A data collection may contain one or more datasets
A dataset may contain one or more data files
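
The containment relationships above can be sketched as a small data model. This is only an illustration under assumed names: the classes `DataFile`, `Dataset`, `DataCollection`, and `DataRepository`, their fields, and the sample content are hypothetical and do not describe any particular repository system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    fmt: str   # file format, e.g. "csv"


@dataclass
class Dataset:
    title: str
    documentation: str   # background on data collection and processing
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataCollection:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class DataRepository:
    name: str
    collections: List[DataCollection] = field(default_factory=list)

    def all_files(self) -> List[DataFile]:
        """Walk repository -> collections -> datasets -> files."""
        return [f for c in self.collections for d in c.datasets for f in d.files]


# Invented example content.
stream = Dataset(
    title="Stream chemistry 2010",
    documentation="Weekly grab samples; see the accompanying methods file.",
    files=[DataFile("chem.csv", "csv"), DataFile("methods.pdf", "pdf")],
)
repo = DataRepository("Campus repository", [DataCollection("Ecology", [stream])])
```

Keeping `documentation` on the dataset rather than on individual files mirrors the point that a dataset is coupled with background information about how its data were collected and processed.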
  
An emerging trend in academic libraries

Initiatives in research libraries

Data support and services in institutions: 45%
Libraries involved in supporting eScience: 73%

•  Pressure points:
     –  Lack of resources
     –  Difficulty acquiring the appropriate staff and expertise to provide eScience and data management or curation services
     –  Lack of a unifying direction on campus

Source: Soehner, C., Steeves, C. & Ward, J. (2010). E-Science and data support services: A study of ARL member institutions. http://www.arl.org/bm~doc/escience_report2010.pdf

Data management challenges

•  No one-size-fits-all solution
•  Requires an in-depth understanding of scientific workflows and research lifecycle
•  Involves not only technical design and planning but also organizational collaboration and institutionalization of data policy

Data preservation challenges

•  Data formats
     –  Vary in data types, e.g. vector and raster data types
     –  Format conversions, e.g. from an old version to a newer one
•  Data relations
     –  e.g. there are data models, annotations, classification schemes, and symbolization files for a digital map
•  Semantic issues
     –  Naming datasets and attributes

Data access challenges

•  Reliability
•  Authenticity
•  Leverage technology to make data access easier and more effective
     –  Cross-database search
     –  Integration applications

Supporting digital research data

•  Lifecycle of research data
     –  Create: data creation/capture/gathering from laboratory experiments, field work, surveys, devices, media, simulation output…
     –  Edit: organize, annotate, clean, filter…
     –  Use/reuse: analyze, mine, model, derive additional data, visualize, input to instruments/computers
     –  Publish: disseminate data via portals and associate datasets with research publications
     –  Preserve/destroy: store/preserve, store/replicate/preserve, store/ignore, destroy…
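
One way to reason about the lifecycle stages above is as a small state machine. The sketch below is an assumption-laden illustration: the stage names follow the list, but the allowed transitions in `TRANSITIONS` and the helper `advance` are hypothetical, since the slide names the stages without prescribing a strict order.

```python
from enum import Enum


class Stage(Enum):
    CREATE = "create"
    EDIT = "edit"
    USE = "use/reuse"
    PUBLISH = "publish"
    PRESERVE = "preserve"
    DESTROY = "destroy"


# Allowed moves between stages. These transitions are an assumption made for
# illustration; the slide lists the stages but not a strict order.
TRANSITIONS = {
    Stage.CREATE: {Stage.EDIT},
    Stage.EDIT: {Stage.USE},
    Stage.USE: {Stage.EDIT, Stage.PUBLISH, Stage.PRESERVE, Stage.DESTROY},
    Stage.PUBLISH: {Stage.USE, Stage.PRESERVE},
    Stage.PRESERVE: {Stage.USE, Stage.DESTROY},
    Stage.DESTROY: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Walk one dataset through a typical path.
stage = Stage.CREATE
for nxt in (Stage.EDIT, Stage.USE, Stage.PUBLISH, Stage.PRESERVE):
    stage = advance(stage, nxt)
```

Recording the current stage per dataset makes it possible to ask simple management questions, such as which datasets have been used but never published or preserved.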
  

Supporting data management

The data deluge
Numerical, image, video
Models, simulations, bit streams
XML, CSV, DB, HTML

Researchers need:
Specialized search engines to discover the data they need
Powerful data mining tools to use and analyze the data

Research data management

[Diagram: eScience librarian at the center, between Institution and Community. Factors: financial and policy support; science domain; data content idiosyncrasies; user requirements.]

Evolving and interconnecting:
Institutional repository, Community repository, National repository, International repository

Implications for the scholarly communication process

Publishing: Data publishing; new scholarly publishing models: open access, institutional and community repositories, self-publishing, library publishing, ...
Curation: Maintaining, preserving and adding value to digital research data throughout its lifecycle.
Archiving: The long-term storage, retrieval, and use of scientific data and methods.

Evolution of terminology

12/18/11 15:50    Promoting scholarly communication: How to take the first step?    34

Case study 1: Developing an institutional policy for data preservation and sharing

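Current state

[Diagram: researchers' data and files reside on college and department servers and campus servers, and flow to the campus institutional repository, the disciplinary repository, and journal and conference paper publication.]

Is there a disciplinary repository?
Are data being deposited?
Is the campus repository connected to the disciplinary repository?

•    What file formats are used?
•    How are the files organized?
•    How are they used?
•    Can they be shared with people outside the project team?
•    If so, under what conditions and rules?
•    How are files and data preserved?
•    Which laws and regulations must be followed?
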
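Goals

Current state
•  No unified rules and regulations
•  No awareness of file and data management
•  No policies on data use and sharing

Steps
•  Survey existing institutional data policies
•  Obtain support from university leadership and relevant departments
•  Proof of Concept Project

Goals
•  Establish unified policies for data acquisition, use, management, and sharing
•  Establish an institutional data repository (campus cyberinfrastructure-enabled support)
•  Publicize widely and persuade researchers with facts
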
Actions!

[Organization chart: President; VP for Research and VP for Academic Affairs; Office of Research, Library, IT services, iSchool, College…]

•  Collaborate with the relevant university units
•  Survey existing institutional data policies, write a report, and offer recommendations to the VP for Research

DATA MANAGEMENT PRACTICES IN ACADEMIC LIBRARIES

http://researchdata.wisc.edu/
https://confluence.cornell.edu/display/rdmsgweb/Home
http://libraries.mit.edu/guides/subjects/data-management/

Summary

•  Managing research data is motivated by:
     –  Government funding agency's policy
     –  Needs for data sharing, cross validation of data and research, credit, and large-scale interdisciplinary discovery
•  Organizational changes:
     –  New organizational units within the university library or at the university level
     –  Virtual group
     –  Collaboration among key units: Libraries, IT services, research administration office

Summary

•  Types of services
     –  Training faculty and students for data literacy
     –  Data curation services (data repositories, digital libraries, archiving data)
     –  Consulting services
     –  Data management plan
     –  Developing data policies
