Sla2009 D Curation Heidorn



  1. 1. Societal Need for Digital Curation Specialists in the Library Setting June 16, 2009 Special Libraries Association P. Bryan Heidorn
  2. 2. Introduction <ul><li>Program Manager, Division of Biological Infrastructure, National Science Foundation </li></ul><ul><li>Associate Professor, Graduate School of Library and Information Science, University of Illinois </li></ul><ul><li>JRS Biodiversity Foundation, Board of Directors </li></ul>
  3. 3. Why Libraries <ul><li>Libraries manage the scholarly output of society </li></ul><ul><li>Scholars in the humanities and sciences are generating primary and secondary data at unprecedented rates </li></ul><ul><li>Social investment is not only in journal publications but all scholarly knowledge </li></ul><ul><li>Need for specialists for information organization, access and preservation </li></ul><ul><li>Libraries have the institutional structure and many of the skills needed to curate data and other digital resources. </li></ul>
  4. 4. Cyberinfrastructure Vision <ul><li>“The anticipated growth in both the production and repurposing of digital data raises complex issues not only of scale and heterogeneity, but also of stewardship, curation and long-term access.” </li></ul><ul><ul><li>NSF Cyberinfrastructure Vision for 21st Century Discovery (2007), Chapter 3 </li></ul></ul>
  5. 5. Recognition of need for data curation <ul><li>“Recommendation 6: The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.” </li></ul><ul><li>Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005), Recommendations </li></ul>
  6. 6. Interagency Working Group on Digital Data <ul><li>Recognition of the importance of information </li></ul><ul><li>Recognition of the need for education </li></ul><ul><li>New work roles within traditional institutions </li></ul>
  7. 7. New Information Disciplines <ul><li>Digital Curator : an expert knowledgeable of and with responsibility for the content of a digital collection(s) </li></ul><ul><li>Digital Archivist : an expert competent to appraise, acquire, authenticate, preserve, and provide access to records in digital form </li></ul><ul><li>Data Scientists : the information and computer scientists, database and software engineers and programmers, disciplinary experts, expert annotators, and others, who are crucial to the successful management of a digital data collection </li></ul><ul><li>(Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, report of the National Science Board, September 2005) </li></ul>
  8. 8. Library Skills
  9. 9. Where is the data now? <ul><li>Not in reference collections </li></ul><ul><li>Varied mandates for sharing </li></ul><ul><li>Unsustainable models </li></ul><ul><ul><li>Individual researchers </li></ul></ul><ul><ul><li>Boutique databases </li></ul></ul><ul><li>Most data is from small projects </li></ul><ul><li>Big science and independent science </li></ul>
  10. 10. Economics of the long tail <ul><li>The Long Tail, by Chris Anderson. Wired Magazine 12.10, 2004. </li></ul><ul><li>NetFlix versus BlockBuster </li></ul><ul><li>GenBank versus Mary’s Lab </li></ul>
  11. 11. Naive View of Science Data. [Figure: data volume plotted against science projects and initiatives, with GenBank and PDB labeled. Power law of science data: f(x) = ax^k + o(x^k), with the region x < .20 marked.]
  12. 12. Does NSF’s Data Follow the Power Law? I do not know, but if $1 = X bytes…
  13. 13. 20-80 Rule: The small are big! Total grants: 9,347, totaling $2,137,636,716. Largest 20% of grants: 1,869 grants, $1,199,088,125 total, awards ranging from $6,892,810 to $350,000. Remaining 80% of grants: 7,478 grants, $938,548,595 total, awards ranging from $350,000 to $831.
  14. 14. <ul><li>Dark data is the data that we know is/was there but we can’t see it. </li></ul>[Image: Hubble Space Telescope composite image showing a "ring" of dark matter in the galaxy cluster Cl 0024+17]
  15. 15. Related Ideas <ul><li>John Porter: </li></ul><ul><ul><li>Deep versus Wide databases </li></ul></ul><ul><li>Swanson: </li></ul><ul><ul><li>Undiscovered Public Knowledge </li></ul></ul><ul><li>Science Commons: </li></ul><ul><ul><li>Big versus Small science </li></ul></ul>
  16. 16. Why is the tail also important? <ul><li>Valuable science data is in the tail </li></ul><ul><li>Many scientists could use the tail data </li></ul><ul><li>Unpublished observations of flowering time in Concord by Alfred Hosmer from 1888 to 1902 </li></ul><ul><li>Photographs of flowers </li></ul><ul><li>Blue Hill Observatory meteorological data </li></ul><ul><li>Richard B. Primack, Abraham J. Miller-Rushing, Daniel Primack, and Sharda Mukunda (2007). Using Photographs to Show the Effects of Climate Change on Flowering Time. Arnoldia 65(1), pp. 2-9. </li></ul><ul><li>Science innovation occurs in the long tail </li></ul><ul><li>Unpublished negative results / aka dark data </li></ul><ul><li>We know very little about the tail </li></ul><ul><li>Transformative science happens in the tail </li></ul><ul><li>Computational thinking needed to free the tail </li></ul><ul><li>NSF current investments in the tail </li></ul><ul><li>OECD Principles and Guidelines for Access to Research Data from Public Funding </li></ul>
  17. 17. The Case of Lake Victoria Data <ul><li>Lake Victoria is the largest freshwater lake in Africa </li></ul><ul><li>Nile perch, water hyacinth, deforestation and human waste are destroying the fishery </li></ul><ul><li>Hundreds of data sets have been created over 50 years </li></ul><ul><li>There is no access to most of that information </li></ul>
  18. 18. Barriers <ul><li>Lack of professional reward structure </li></ul><ul><li>Lack of education in data curation </li></ul><ul><li>Intellectual property rights (IPR) </li></ul><ul><li>Lack of technology </li></ul><ul><li>Lack of financial reward structure </li></ul><ul><li>Undervaluation / lack of investment </li></ul><ul><li>Cost of infrastructure creation </li></ul><ul><li>Cost of infrastructure maintenance </li></ul><ul><li>PDF, Excel, MS Word, ArcView, floppy disks </li></ul>
  19. 19. Technical Solutions: Move the tail to the head (increase k) <ul><li>Data standards </li></ul><ul><ul><li>e.g. Ecological Metadata Language (EML) </li></ul></ul><ul><ul><li>e.g. TaxonX - taXMLit </li></ul></ul><ul><li>Metadata </li></ul><ul><ul><li>Darwin Core (DwC) </li></ul></ul><ul><ul><li>Access to Biological Collection Data (ABCD) </li></ul></ul><ul><li>Protocols </li></ul><ul><ul><li>TAPIR </li></ul></ul>
  20. 20. Solutions <ul><li>Controlled Vocabularies </li></ul><ul><ul><li>MeSH, ZooBank, IPNI, ITIS </li></ul></ul><ul><li>Ontologies </li></ul><ul><ul><li>Gene Ontology (GO) </li></ul></ul><ul><ul><li>Science Environment for Ecological Knowledge (SEEK) </li></ul></ul><ul><ul><li>EcoGrid </li></ul></ul><ul><ul><li>Leopold Semi-Automated ontology generation for Amphibian Morphology DBI-0640053 </li></ul></ul><ul><li>(Semantic) web software </li></ul><ul><li>DataNet </li></ul>
  21. 21. Institutional Solutions <ul><li>Well Paid Librarians </li></ul><ul><li>Well-heeled Museums </li></ul><ul><li>Professional societies </li></ul><ul><li>Generous Publishers </li></ul>Library director John Hanson told the Associated Press that a couple of dozen people are cited each year for failure to return materials or pay fines. The incident cost Dalibor about $30 for the two overdue paperbacks. It cost her mother $172 to free her.
  22. 22. Organizational Solutions <ul><li>Phase One of a Lake Victoria Biodiversity Informatics Project </li></ul><ul><li>DataNet (DataONE and Data Conservancy) </li></ul><ul><li>Dryad </li></ul><ul><li>LTER, NEON, GBIF, TDWG </li></ul><ul><li>National Center for Ecological Analysis and Synthesis (NCEAS) </li></ul><ul><li>National Evolutionary Synthesis Center (NESCent) </li></ul><ul><li>European Union Networks of Excellence (NoE) </li></ul><ul><li>European Distributed Institute of Taxonomy (EDIT) </li></ul>
  23. 23. Education Programs <ul><li>Biological Information Specialist </li></ul><ul><li>Concentration in Data Curation (MSLIS) </li></ul><ul><li>Certificate of Advanced Study in Data Curation </li></ul><ul><li>Summer Institutes in Data Curation </li></ul><ul><li>Information and professional education in biodiversity informatics </li></ul>
  24. 24. Biological Information Specialists <ul><li>At present: </li></ul><ul><li>Biologists at all degree levels self-trained in information technology </li></ul><ul><li>Information technologists at all degree levels self-trained in biology </li></ul><ul><ul><li>(both with gaps in knowledge for many months, years) </li></ul></ul><ul><li>Differing roles of BIS in large and small science </li></ul>
  25. 25. Master of Science in Biological Informatics <ul><li>Degree Program began September 2007 </li></ul><ul><li>Part of campus-wide bioinformatics masters program </li></ul><ul><li>NSF/CISE/IIS, Education Research and Curriculum Development, 0534567 (Palmer, PI) </li></ul><ul><li>Combines Biology, Bioinformatics, Computer Science core with LIS courses </li></ul>
  26. 26. What does a BIS need to know? <ul><li>Biological training and interest in solving biological research problems </li></ul><ul><li>Information skills </li></ul><ul><li>Evaluation and implementation of information systems: user based assessment and continual quality improvement for the development of tools that work and are used. </li></ul><ul><li>Information acquisition, management, and dissemination: development of digital libraries, data archives, institutional repositories, and related tools. </li></ul><ul><li>Information organization and integration: ontology development, structuring information for optimal use and sharing, and standards development. </li></ul>
  27. 27. UIUC bioinformatics core coursework <ul><li>Cross-disciplinary course distribution requirement </li></ul><ul><ul><ul><li>Bioinformatics: Computing in Molecular Biology Algorithms in Bioinformatics Principles of Systematics </li></ul></ul></ul><ul><ul><ul><li>Computer Science: Algorithms Database Systems </li></ul></ul></ul><ul><ul><ul><li>Biology: Human Genetics Introductory Biochemistry Macromolecular Modeling </li></ul></ul></ul>
  28. 28. Sample of existing LIS courses <ul><li>Information Organization and Knowledge Representation </li></ul><ul><li>LIS 551 Interfaces to Information Systems </li></ul><ul><li>LIS 590DM Document Modeling </li></ul><ul><li>LIS 590RO Representing and Organizing Information Resources </li></ul><ul><li>LIS590ON Ontologies in Natural Science </li></ul><ul><li>Information Resources, Uses and users </li></ul><ul><li>LIS 503 Use and Users of Information </li></ul><ul><li>LIS 522 Information Sources in the Sciences </li></ul><ul><li>LIS 590TR Information Transfer and Collaboration in Science </li></ul><ul><li>Information Systems </li></ul><ul><li>LIS 456 Information Storage and Retrieval </li></ul><ul><li>LIS 509 Building Digital Libraries </li></ul><ul><li>LIS 566 Architecture of Network Information Systems </li></ul><ul><li>LIS 590EP Electronic Publishing </li></ul><ul><li>Disciplinary Focus </li></ul><ul><li>LIS 530B Health Sciences Information Services and Resources </li></ul><ul><li>LIS 590HI Healthcare Informatics (Healthcare Infrastructure) </li></ul><ul><li>LIS 590EI/BDI Ecological Informatics (Biodiversity Informatics) </li></ul>
  29. 29. MSLIS Data Curation Concentration <ul><li>Data Curation Educational Program (DCEP) </li></ul><ul><ul><li>IMLS – Laura Bush 21st Century Librarian Program, </li></ul></ul><ul><ul><ul><li>RE-05-06-0036-06 (Heidorn, PI) </li></ul></ul></ul><ul><li>Students with the DC concentration will be trained to add value to data and promote sharing across labs and disciplinary specializations </li></ul>
  30. 30. DCEP Curriculum <ul><li>Required of All Master's Students </li></ul><ul><li>LIS501 Information Organization and Access </li></ul><ul><li>LIS502 (2 hrs only) Libraries, Information and Society </li></ul><ul><li>Required for the DC Concentration </li></ul><ul><li>LIS590DC Foundations of Data Curation </li></ul><ul><li>LIS590PD Digital Preservation </li></ul><ul><li>LIS453 Systems Analysis and Management </li></ul><ul><li>Field Experience Seminar (Req’d if taking a practicum, 2 hours) </li></ul>
  31. 31. DCEP courses, cont’d <ul><li>DCEP List of Recommended Electives </li></ul><ul><li>(Students required to take two, we recommend four) </li></ul><ul><li>LIS452 Foundations of Information Processing in LIS </li></ul><ul><li>LIS590BDI Biodiversity and Ecoinformatics </li></ul><ul><li>LIS590DI Digital Libraries: Research and Practice </li></ul><ul><li>LIS590DM Document Modeling </li></ul><ul><li>LIS590IM Information Modeling </li></ul><ul><li>LIS590MD Metadata in Theory and Practice </li></ul><ul><li>LIS590OD Ontology Development </li></ul><ul><li>LIS590RO Representing and Organizing Information Resources </li></ul>
  32. 32. Core course content <ul><li>Foundations of Data Curation </li></ul><ul><li>Digital Data and Collections </li></ul><ul><li>Scholarly Communication and Scientific Information Work </li></ul><ul><li>Lifecycles, Workflows; Data Re-use and Value </li></ul><ul><li>Infrastructures and Repositories </li></ul><ul><li>Selection and Appraisal </li></ul><ul><li>Metadata, Standards and Protocols </li></ul><ul><li>Archiving and Preservation </li></ul><ul><li>Intellectual Property and Legal Issues </li></ul><ul><li>Policy, Collaboration and Cooperative Alignments </li></ul><ul><li>Assignments on: </li></ul><ul><li>Analysis of Data Management Plans </li></ul><ul><li>Discipline-based data curation needs assessment </li></ul><ul><li>Digital Preservation </li></ul><ul><li>Archival Theory & Diplomatics </li></ul><ul><li>OAIS Reference Model </li></ul><ul><li>Data Formats </li></ul><ul><li>Digital Archival Objects </li></ul><ul><li>Data Curation </li></ul><ul><li>Preservation Strategies: </li></ul><ul><li>Emulation vs. migration </li></ul><ul><li>Authenticity, Integrity & Trust </li></ul><ul><li>Evaluation & Value </li></ul><ul><li>Digital Preservation & The Law </li></ul><ul><li>Assignments on: </li></ul><ul><li>Planning Grant Application </li></ul><ul><li>Trusted Repository Assessment </li></ul>
  33. 33. Summer Institute in Data Curation 1 <ul><li>Seminar format </li></ul><ul><ul><li>opportunities for small group discussion </li></ul></ul><ul><ul><li>hands-on session </li></ul></ul><ul><li>10 presenters (GSLIS; National Snow and Ice Data Center; Purdue, UIUC, and Johns Hopkins Univ. Libraries) </li></ul><ul><li>6-person panel (3 librarians and 3 scientists): Librarians and Scientists </li></ul><ul><li>Keynote by Anna Gold, Associate Dean for Public Services at California Polytechnic State University </li></ul>
  34. 34. Field Work Opportunities <ul><li>Internships </li></ul><ul><li>6 week, funded placements </li></ul><ul><li>project-oriented </li></ul><ul><ul><li>Digital Research and Curation Center, Sheridan Libraries, Johns Hopkins University (2008) </li></ul></ul><ul><ul><li>National Agricultural Library, USDA (2009) </li></ul></ul><ul><ul><li>Distributed Data Curation Center, Purdue University Libraries (2009) </li></ul></ul><ul><ul><li>Smithsonian (2009) </li></ul></ul><ul><li>Practica </li></ul><ul><li>100 hours, course credit </li></ul><ul><li>organizational orientation; shadowing </li></ul><ul><ul><li>Nat’l Snow and Ice Data Center (2009) </li></ul></ul>
  35. 35. New research directions <ul><li>Focus on integration and scale </li></ul><ul><li>Informatics infrastructure as competitive edge </li></ul><ul><ul><li>Sample areas of development </li></ul></ul><ul><ul><li>Landinformatics Group </li></ul></ul><ul><ul><ul><li>Atmospheric science, hydrology, nutrient balance, carbon cycle, ecology, agronomy </li></ul></ul></ul><ul><li>Focus on data integration problems across larger range of sciences </li></ul>
  36. 36. Conclusion <ul><li>Data is increasingly a scholarly product </li></ul><ul><li>Data is currently not managed for the long term </li></ul><ul><li>Libraries are the logical institutions to manage data </li></ul><ul><li>Additional Training will be needed </li></ul><ul><li>Librarians do not do it for free unless they want to </li></ul>
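
Worked example for slide 13 (20-80 Rule). The "small are big" claim can be checked with simple arithmetic on the slide's own figures. The sketch below is only an illustration: it assumes the slide's two columns are the largest 20% and the remaining 80% of grants by award size, and the names `HEAD`, `TAIL`, `share` and `summarize` are hypothetical, not from the talk.

```python
# Figures copied from slide 13; all names here are illustrative assumptions.
TOTAL_GRANTS = 9347
TOTAL_DOLLARS = 2_137_636_716

HEAD = {"grants": 1869, "dollars": 1_199_088_125}  # assumed: largest 20% of grants
TAIL = {"grants": 7478, "dollars": 938_548_595}    # assumed: remaining 80% of grants


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def summarize() -> dict:
    """Compute each group's share of the grant count and of the dollars."""
    return {
        "head_grant_share": share(HEAD["grants"], TOTAL_GRANTS),
        "tail_grant_share": share(TAIL["grants"], TOTAL_GRANTS),
        "head_dollar_share": share(HEAD["dollars"], TOTAL_DOLLARS),
        "tail_dollar_share": share(TAIL["dollars"], TOTAL_DOLLARS),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}%")
```

Under these assumptions the small grants, about 80% of the awards, account for roughly 44% of the dollars. That is far more than a strict "80% of the value sits in 20% of the items" reading would predict, which is the point of "the small are big."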
