HathiTrust Research Center Data Capsule
v1.0: An Overview of Functionality
IU Libraries’ Digital Library Brownbag Series | 09.10.14
Beth Plale | Jiaan Zeng | Robert H. McDonald | Miao Chen
Data To Insight Center
Indiana University
Tweet us - @HathiTrust #HTRC
HATHI TRUST RESEARCH CENTER
Many thanks …
HTRC Data Capsule@IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan
HTRC Data Capsule@Michigan Team
• Atul Prakash (PI)
• Alexander Crowell
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and
Beth Plale. 2014. Cloud computing data capsules for non-consumptive
use of texts. In Proceedings of the 5th ACM Workshop on Scientific
Cloud Computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16.
DOI=10.1145/2608029.2608031
http://doi.acm.org/10.1145/2608029.2608031
Special Thanks to
• Samitha Liyanage
• Milinda Pathirage
• Zong Peng
• Earlence Fernandes
• Ajit Aluri
9/10/2014 #HTRC @HathiTrust
HathiTrust Digital Library
• HathiTrust is a partnership of academic &
research institutions, offering a collection of
millions of titles digitized from libraries around
the world.
– IU is a founding member of the HathiTrust along
with University of Michigan, University of
California, and the University of Virginia.
http://www.hathitrust.org/htrc
http://www.hathitrust.org
• HathiTrust is a large corpus that provides an
opportunity for new forms of computational
investigation
• The bigger the data, the less able we are to
move it to the researcher’s desktop
• Research on large collections requires that
computation move to the data, not data to the
computation
Mission of the HT Research Center
• Public research arm of HathiTrust
• Goal: enable researchers world-wide to
accomplish tera-scale text data-mining and
analysis
– Develop cutting-edge software tools for processing,
analyzing text
– Develop cyberinfrastructure to enable HPC access to
the HathiTrust Digital Library
• Established: July, 2011
• Collaborative center: Indiana University &
University of Illinois
HTRC Timeline
• Phase I: development 01 Jul 2011 – 31 Mar 2013
– HTRC software and services release v1.0
https://github.com/htrc
• Phase II: outreach, 01 Apr 2013 – 30 June 2014
– 2nd HTRC UnCamp Sep ’13
• Phase III: operations, 01 July 2014 - present
HTRC 2014-2018 Org Chart
• HathiTrust HTRC Advisory Board
• HTRC Executive Management
– IU Managing Director (.25 FTE)
– UI Managing Director (.11 FTE)
• Administrative Support
– Senior Library Personnel (4 supervisors at .05 FTE)
– Senior Project Coordinator (.25 FTE)
– Executive Assistant (.5 FTE)
• Core Development
– Sr. Software Architect (1.0 FTE)
– Research Programmer (.5 FTE)
– Library Research Programmer (.5 FTE)
– IU Systems Administrator (.25 FTE)
– User Interface Specialist (2 years at 1.0 FTE)
– Informatics Developers (2 developers for 2 years at .15 FTE)
• Advanced Research
– CS PhD Students
– LIS PhD Students
– UI Systems Administrator (.5 FTE)
• Advanced Collaborative Support (coordinated by M. Chen)
– Research Programmer (.5 FTE)
– Computational Research Liaison (.5 FTE)
– Asst Dir Outreach & Education (M. Chen) (1 year at .25 FTE)
• Scholarly Commons
– Dig Humanities Specialist (1.0 FTE)
– CLIR Postdoctoral Research Associate (2 years at 1.0 FTE)
– Digital Research Librarian support (.2 FTE)
– Scholars Commons Support (.5 FTE)
– LIS MS Students
Key:
– Area
– Proposed for funding by HathiTrust
– Proposed for joint funding by HathiTrust
– Funded by Indiana University
– Funded by University of Illinois
Scholarly Commons
User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH
centers at HT institutions
• Assist researchers in HTRC text data
mining research projects
• Led out of University of Illinois
Library; smaller group at IU
• Resourced at 2.7 FTE.
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users,
whether acting alone or in cooperation with
other users, over the duration of one or multiple
sessions, can result in sufficient information
being gathered from the collection of copyrighted
works to reassemble pages from the collection.
• The definition disallows collusion between users
and accumulation of material over time. It also
differentiates a human researcher from a proxy,
which is not a user. Users are human beings.
[Figure: HTRC as a complexity-hiding interface. Labels: Request; All the complexity; Tabular info; Statistical plots; Spatial plots]
HTRC v2.0
HTRC v2.0 + HTRC Data Capsule
• There is a mismatch between what HTRC v2.0
provides and what users need.
– HTRC v2.0 provides predefined algorithms and
runs them on users’ behalf. This prevents
leaks of copyrighted data.
– However, a user usually wants to run her own
algorithm and examine the results interactively.
• HTRC Data Capsule is designed to strike a
balance between preventing data leaks and
keeping HTRC as flexible as possible for users.
Research Questions
• Non-consumptive use*: can framework provide
safe handling of large amounts of protected data?
• Openness: can framework support user-
contributed analysis without resorting to code
walkthroughs prior to acceptance?
• Large-scale and low cost: can protections be
extended to utilization of large-scale national
(public) computational resources?
*Non-consumptive use is defined as computational analysis of the copyrighted
content that is carried out in such a way that human consumption of texts is
prohibited.
HTRC Data Capsule
• Provisions virtual machines (VM) for
researchers to run their algorithms over
copyrighted data.
• Trusts researchers to not deliberately leak
copyrighted data.
• Prevents malware acting on researcher’s
behalf from leaking data.
Building Block – Data Capsule
[Figure: Data Capsule building block. Labels: user; Data Capsule; sensitive data; arbitrary input; arbitrary output; output constraint]
Computation is carried out inside the Data Capsule.
K. Borders, E. V. Weele, B. Lau, and A. Prakash.
Protecting confidential data on personal computers with storage capsules.
In 18th USENIX Security Symposium, SSYM’09, pages 367–382. USENIX Association, 2009.
Design Options
• HTRC Data Capsule extends the data capsule
concept by building a cloud environment around
it to serve multiple users.
– Option 1: build the system around an existing
cloud platform, e.g., OpenStack or Eucalyptus.
(Data Capsule relies on low-level control of the VM, which would require
extensive customization of an existing cloud platform. In addition, OpenStack
allows a user to configure the VM network, which poses a threat to Data Capsule.)
– Option 2: build the system from scratch using
web services and QEMU.
(This option gives us the highest degree of flexibility to implement HTRC
Data Capsule.)
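To illustrate the kind of low-level VM control the from-scratch option allows, here is a minimal Python sketch of a web service assembling a QEMU command line itself. The function name `qemu_command`, its parameters, and the image path are hypothetical and do not come from the HTRC code base. The QEMU options used (`-m`, `-drive`, `-vnc`, `-snapshot`) are standard ones, but nothing here claims they are the options HTRC actually passes.

```python
from __future__ import annotations


def qemu_command(image_path: str, vnc_display: int, memory_mb: int = 2048,
                 discard_writes: bool = False) -> list[str]:
    """Build an argument list for launching one VM (hypothetical helper)."""
    cmd = [
        "qemu-system-x86_64",
        "-m", str(memory_mb),                         # guest memory in MiB
        "-drive", f"file={image_path},format=qcow2",  # the VM's disk image
        "-vnc", f":{vnc_display}",                    # expose the console over VNC
    ]
    if discard_writes:
        # QEMU's -snapshot writes guest disk changes to temporary files
        # instead of the image, so they vanish when the VM exits.
        cmd.append("-snapshot")
    return cmd


# Assumed example path; the list could be handed to subprocess.Popen.
print(qemu_command("/var/capsules/vm1.qcow2", vnc_display=1, discard_writes=True))
```

Because the service composes every argument, a user has no way to alter the VM's configuration, which is the concern raised about letting users configure the VM network in OpenStack.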
Threat Model
• The user is trustworthy.
• The virtual machine manager and the host it
runs on are also trusted.
• The VM is NOT trusted. We assume the
possibility of malware being installed as well
as other remotely initiated attacks on the VM,
which are undetectable to the user.
Threat Model (Cont.)
• The VNC session and the final result download
are two channels through which data could
potentially leak.
– For the VNC session, we could encrypt the session
to prevent eavesdropping.
– For the final result download, we could monitor
traffic on the release channel to detect leakage
automatically.
• Covert channels between VMs on the same host
could also potentially leak data.
– In the future, we could run VMs on separate hosts
to provide strong isolation.
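As one hypothetical illustration of automatic leak detection on the release channel, the sketch below flags a result if it shares any run of `n` consecutive words with a protected text. The slides do not say how HTRC monitors the channel. The function names, the word-level n-gram technique, and the default `n = 8` are all assumptions made for this example.

```python
from __future__ import annotations


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return every run of n consecutive lower-cased words in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaks_verbatim_text(result: str, protected: str, n: int = 8) -> bool:
    """True if result shares any run of n consecutive words with protected."""
    return not word_ngrams(result, n).isdisjoint(word_ngrams(protected, n))


protected = "it was the best of times it was the worst of times"
derived = "word counts: times=2 best=1 worst=1"
quoted = "sample: it was the best of times indeed"

print(leaks_verbatim_text(derived, protected, n=4))
print(leaks_verbatim_text(quoted, protected, n=4))
```

A check like this catches only verbatim runs. Paraphrased or encoded text would pass, so it could complement, not replace, the isolation measures listed above.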
HTRC Data Capsule Architecture
[Figure: architecture diagram in three tiers (web frontend, web service, backend). Components shown: Web UI, Web Services, Database, User Authentication, Hypervisor Scripts, Firewall, Audit, ImageStore, VolumeStore, and hosts Host-1 … Host-N, each running VM-1 … VM-k]
HTRC Data Capsule Workflow
HTRC Data Capsule Access
Leak through network?
Leak through transition?
Data Capsule Mode Switch
[Figure: a Data Capsule (VM) switching from maintenance mode to secure mode and back]
1. A snapshot of the VM is taken.
2. The VM switches to secure mode. The network is blocked.
3. Copyrighted texts and the secure volume are available.
4. The VM switches to maintenance mode.
5. The snapshot is restored. VM state in secure mode is discarded.
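The mode switch above can be sketched as a small state machine. This is a toy model, not the HTRC implementation: the `Capsule` class, its fields, and the use of plain dictionaries for disk state are hypothetical. It also assumes that the secure volume keeps its contents across switches, which the slide does not state explicitly.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(Enum):
    MAINTENANCE = "maintenance"
    SECURE = "secure"


@dataclass
class Capsule:
    """Toy model of one Data Capsule VM; all names are hypothetical."""
    disk: dict = field(default_factory=dict)           # ordinary VM state
    secure_volume: dict = field(default_factory=dict)  # assumed to persist
    mode: Mode = Mode.MAINTENANCE
    snapshot: Optional[dict] = None

    @property
    def network_open(self) -> bool:
        # The network is blocked while the VM is in secure mode.
        return self.mode is Mode.MAINTENANCE

    @property
    def texts_available(self) -> bool:
        # Copyrighted texts are reachable only in secure mode.
        return self.mode is Mode.SECURE

    def enter_secure(self) -> None:
        if self.mode is Mode.SECURE:
            return
        self.snapshot = dict(self.disk)  # step 1: snapshot
        self.mode = Mode.SECURE          # steps 2-3: switch, texts available

    def enter_maintenance(self) -> None:
        if self.mode is Mode.MAINTENANCE:
            return
        self.disk = dict(self.snapshot)  # step 5: restore, discard secure state
        self.snapshot = None
        self.mode = Mode.MAINTENANCE     # step 4: back to maintenance mode


capsule = Capsule()
capsule.disk["tools"] = "installed in maintenance mode"
capsule.enter_secure()
capsule.disk["scratch"] = "derived from copyrighted text"
capsule.secure_volume["results"] = "word counts"
capsule.enter_maintenance()
print(sorted(capsule.disk))  # ['tools']
```

Restoring the snapshot is what closes the "leak through transition" path: anything written to the VM disk while the texts were readable is gone before the network reopens.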
VM Operations Screenshots
VM in shutdown state.
VM in maintenance mode.
VM in secure mode.
VM Access Screenshots
Maintenance Mode
Secure Mode
User Feedback
• Non-consumptive use
– Initial users report that they can only access the
internet in maintenance mode and HTRC data
service in secure mode. They can neither make
persistent changes to VMs in secure mode, nor
access other users’ VMs by SSH’ing.
• Openness and efficiency
– Initial users report that they are able to configure
the VM as needed, and run their analysis against
HTRC data interactively.
Future Work
• The upcoming hands-on session!
– Sep 15, 2-5pm
– Wells Library E174
– Bring your own laptop
– One of the Scholars Commons events
– Register at the Scholars Commons page [1]
– Work on text analytics tasks in the HTRC Data
Capsule environment
[1] http://libraries.iub.edu/tools/workshops/workshop-listings/series-view/283/series
Future Work
• Copyrighted content in progress
• Advanced Collaborative Support
– The award model
– Award content is HTRC ACS staff time
– Collaborate with scholars on addressing their research needs related
to HTRC
– E.g. prototyping, running text analysis
– Advocate open source; encourage extending the work to a grant
submission
• Scholars Commons
– Interaction with scholars to help them use HTRC tools and services
– An interface for interacting with HTRC users through the Scholars
Commons
– Series of workshops at IU and other places
– Weekly consulting time
– Every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R
– Contact: Miao Chen, Nicholae Cline
• For details http://www.hathitrust.org/htrc/faq
• General contact info
– Beth Plale, Director HTRC, plale@indiana.edu
• Requests for capability, interest
– Miao Chen, Asst. Director for Outreach HTRC
miaochen@indiana.edu

 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Robert H. McDonald
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15Robert H. McDonald
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesRobert H. McDonald
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Robert H. McDonald
 
ER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesRobert H. McDonald
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsRobert H. McDonald
 
Kuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesKuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesRobert H. McDonald
 
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudCharleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudRobert H. McDonald
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoRobert H. McDonald
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science Robert H. McDonald
 
New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...Robert H. McDonald
 
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Robert H. McDonald
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...Robert H. McDonald
 
HathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionHathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionRobert H. McDonald
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceRobert H. McDonald
 

Más de Robert H. McDonald (20)

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations Panel
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational Services
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
 
ER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote Slides
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your Patrons
 
Kuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesKuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for Libraries
 
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudCharleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and Demo
 
SCONUL Kuali OLE Briefing
SCONUL Kuali OLE BriefingSCONUL Kuali OLE Briefing
SCONUL Kuali OLE Briefing
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science
 
New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...
 
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
 
Kuali OLE @ LITA Forum 2012
Kuali OLE @ LITA Forum 2012Kuali OLE @ LITA Forum 2012
Kuali OLE @ LITA Forum 2012
 
HathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionHathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast Version
 
HTRC Architecture Overview
HTRC Architecture OverviewHTRC Architecture Overview
HTRC Architecture Overview
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability Science
 

Último

Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.MateoGardella
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 

Último (20)

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 

HathiTrust Research Center Data Capsule Overview 09.10.14

  • 1. HathiTrust Research Center Data Capsule v1.0: An Overview of Functionality
      IU Libraries’ Digital Library Brownbag Series | 09.10.14
      Beth Plale | Jiaan Zeng | Robert H. McDonald | Miao Chen
      Data To Insight Center, Indiana University
      Tweet us - @HathiTrust #HTRC
      HATHI TRUST RESEARCH CENTER
  • 2. Many thanks …
      HTRC Data Capsule@IU Team
      • Beth Plale (PI)
      • Jiaan Zeng
      • Guangchen Ruan
      HTRC Data Capsule@Michigan Team
      • Atul Prakash (PI)
      • Alexander Crowell
      Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
      Special Thanks to
      • Samitha Liyanage
      • Milinda Pathirage
      • Zong Peng
      • Earlence Fernandes
      • Ajit Aluri
  • 3. 9/10/2014 #HTRC @HathiTrust
      HathiTrust Digital Library
      • HathiTrust is a partnership of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world.
          – IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia.
      http://www.hathitrust.org/htrc
      http://www.hathitrust.org
  • 4. HathiTrust is a large corpus, providing opportunity for new forms of computational investigation
       The bigger the data, the less able we are to move it to the researcher’s desktop
       Research on large collections requires that computation move to the data, not data to the computation
  • 5. Mission of the HT Research Center
      • Public research arm of HathiTrust
      • Goal: enable researchers world-wide to accomplish tera-scale text data-mining and analysis
          – Develop cutting-edge software tools for processing and analyzing text
          – Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library
      • Established: July, 2011
      • Collaborative center: Indiana University & University of Illinois
  • 6. HTRC Timeline
      • Phase I: development, 01 Jul 2011 – 31 Mar 2013
          – HTRC software and services release v1.0 https://github.com/htrc
      • Phase II: outreach, 01 Apr 2013 – 30 June 2014
          – 2nd HTRC UnCamp Sep ’13
      • Phase III: operations, 01 July 2014 - present
  • 7. HTRC 2014-2018 org chart
      [Org chart figure; labels in extraction order]
      HathiTrust; HTRC Advisory Board; HTRC Executive Management
      Administrative Support: Senior Library Personnel (4 supervisors at .05 FTE); Senior Project Coordinator (.25 FTE); Executive Assistant (.5 FTE)
      Core Development: Sr. Software Architect (1.0 FTE); Research Programmer (.5 FTE); Library Research Programmer (.5 FTE); IU Systems Administrator (.25 FTE); User Interface Specialist (2 years at 1.0 FTE); Informatics Developers (2 developers for 2 years at .15 FTE)
      Advanced Research: CS PhD Students; LIS PhD Students; UI Systems Administrator (.5 FTE)
      Advanced Collaborative Support (coordinated by M. Chen): Research Programmer (.5 FTE); Computational Research Liaison (.5 FTE); Asst Dir Outreach & Education (M. Chen) (1 year at .25 FTE)
      Scholarly Commons: Dig Humanities Specialist (1.0 FTE); CLIR Postdoctoral Research Associate (2 years at 1.0 FTE); Digital Research Librarian support (.2 FTE); Scholars Commons Support (.5 FTE); LIS MS Students
      IU Managing Director (.25 FTE); UI Managing Director (.11 FTE)
      Key: Area; Proposed for funding by HathiTrust; Funded by Indiana University; Funded by University of Illinois; Proposed for joint funding by HathiTrust
  • 8. Scholarly Commons User Support Service
      • Develop training materials
      • Educational workshops
      • Tool and workset creation
      • Collaborate with librarians and DH centers at HT institutions
      • Assist researchers in HTRC text data mining research projects
      • Led out of University of Illinois Library; smaller group at IU
      • Resourced at 2.7 FTE.
      [Org chart figure from the previous slide repeated]
  • 9. Non-Consumptive Research Paradigm
      • No action or set of actions on the part of users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information gathered from a collection of copyrighted works to reassemble pages from the collection.
      • Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  • 10. HTRC Complexity hiding interface
      [Diagram labels: Request; All the complexity; Tabular info; Statistical plots; Spatial plots]
  • 12. HTRC v2.0 + HTRC Data Capsule
      • There is a mismatch between what HTRC v2.0 provides and users’ needs.
          – HTRC v2.0 provides predefined algorithms to users and runs them on users’ behalf. This is to prevent leaks of copyrighted data.
          – However, a user usually wants to run her own algorithm and examine the results interactively.
      • HTRC Data Capsule is developed to strike a balance between preventing data leaks and keeping HTRC as flexible as possible for users.
  • 13. Research Questions
      • Non-consumptive use*: can the framework provide safe handling of large amounts of protected data?
      • Openness: can the framework support user-contributed analysis without resorting to code walkthroughs prior to acceptance?
      • Large-scale and low cost: can protections be extended to utilization of large-scale national (public) computational resources?
      *Non-consumptive use is defined as computational analysis of the copyrighted content that is carried out in such a way that human consumption of texts is prohibited.
  • 14. HTRC Data Capsule
      • Provisions virtual machines (VMs) for researchers to run their algorithms over copyrighted data.
      • Trusts researchers to not deliberately leak copyrighted data.
      • Prevents malware acting on the researcher’s behalf from leaking data.
  • 15. Building Block – Data Capsule
      [Diagram labels: user; sensitive data; Data Capsule; arbitrary input; arbitrary output; output constraint]
      Computation is carried out inside Data Capsule.
      K. Borders, E. V. Weele, B. Lau, and A. Prakash. Protecting confidential data on personal computers with storage capsules. In 18th USENIX Security Symposium, SSYM’09, pages 367–382. USENIX Association, 2009.
  • 16. Design Options
      • HTRC Data Capsule extends data capsule to build a cloud environment around data capsule to serve multiple users.
          – Build the system around an existing cloud platform, e.g., OpenStack;
          – Build the system from scratch through web service and QEMU.
  • 17. Design Options
      • HTRC Data Capsule extends data capsule to build a cloud environment around data capsule to serve multiple users.
          – Build the system around an existing cloud platform, e.g., OpenStack, Eucalyptus;
          – Build the system from scratch through web service and QEMU.
      (Data Capsule relies on low-level control of the VM, which requires a lot of customization of existing cloud platforms. In addition, OpenStack allows a user to configure the VM network, which poses threats to Data Capsule.)
  • 18. Design Options
      • HTRC Data Capsule extends data capsule to build a cloud environment around data capsule to serve multiple users.
          – Build the system around an existing cloud platform, e.g., OpenStack;
          – Build the system from scratch through web services and QEMU.
      (This option gives us the highest degree of flexibility to implement HTRC Data Capsule.)
  • 19. Threat Model
      • The user is trustworthy.
      • The virtual machine manager and the host it runs on are also trusted.
      • The VM is NOT trusted. We assume the possibility of malware being installed as well as other remotely initiated attacks on the VM, which are undetectable to the user.
  • 20. Threat Model (Cont.)
      • The VNC session and final result download are two channels through which data could potentially leak.
          – For the VNC session, we could encrypt the session to prevent eavesdropping.
          – For the final result download, we could monitor traffic on the release channel as a means to automatically detect leakage.
      • Covert channels between VMs on the same host could also potentially leak data.
          – In the future, we could run VMs on separate hosts to provide strong isolation.
  • 21. HTRC Data Capsule Architecture
      [Architecture diagram. Tiers: Web frontend; Web service; Backend. Components shown: Web UI; Web Services; Database; User Authentication; Scripts; Hypervisor; Firewall; Audit; ImageStore; VolumeStore; Host-1 … Host-N, each running VM-1 … VM-k]
  • 22. HTRC Data Capsule Workflow
  • 23. HTRC Data Capsule Access
      Leak through network? Leak through transition?
  • 24. Data Capsule Mode Switch
      [Diagram labels: Data Capsule (VM); Secure volume; Copyrighted texts]
      1. Snapshot
      2. Switch to secure mode
      3. Copyrighted texts and secure volume are available.
      4. Switch to maintenance mode
      5. Snapshot is restored.
      Network is blocked. VM state in secure mode is discarded.
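The mode switch on slide 24 can be pictured as a small state machine. The following is a minimal sketch only, not the HTRC implementation: the class `Capsule`, its fields, and its method names are hypothetical, the VM disk is modelled as a plain dictionary, and it assumes that the secure volume is a separate store that is not rolled back with the snapshot (the slide does not state this explicitly).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(Enum):
    MAINTENANCE = "maintenance"
    SECURE = "secure"


@dataclass
class Capsule:
    """Toy model of one Data Capsule VM (hypothetical names throughout)."""
    mode: Mode = Mode.MAINTENANCE
    disk: dict = field(default_factory=dict)           # VM state, rolled back on exit
    secure_volume: dict = field(default_factory=dict)  # assumption: not rolled back
    _snapshot: Optional[dict] = None

    def switch_to_secure(self) -> None:
        if self.mode is not Mode.MAINTENANCE:
            raise RuntimeError("already in secure mode")
        self._snapshot = dict(self.disk)  # step 1: snapshot
        self.mode = Mode.SECURE           # step 2: switch to secure mode

    def write_secure_volume(self, key: str, value: object) -> None:
        # Step 3: the secure volume is available only in secure mode.
        if self.mode is not Mode.SECURE:
            raise PermissionError("secure volume is only available in secure mode")
        self.secure_volume[key] = value

    def switch_to_maintenance(self) -> None:
        if self.mode is not Mode.SECURE:
            raise RuntimeError("already in maintenance mode")
        self.mode = Mode.MAINTENANCE      # step 4: switch to maintenance mode
        self.disk = dict(self._snapshot)  # step 5: restore snapshot, discard secure-mode state
        self._snapshot = None
```

In this sketch, anything written to `disk` while in secure mode disappears on the way back to maintenance mode, which is the property that keeps copyrighted text from surviving the transition.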
  • 25. VM Operations Screenshots
      [Screenshots: VM in shutdown state; VM in maintenance mode; VM in secure mode]
  • 26. VM Access Screenshots
      [Screenshots: Maintenance Mode; Secure Mode]
  • 27. User Feedback
      • Non-consumptive use
          – Initial users report that they can only access the internet in maintenance mode and the HTRC data service in secure mode. They can neither make persistent changes to VMs in secure mode, nor access other users’ VMs by SSH’ing.
      • Openness and efficiency
          – Initial users report that they are able to configure the VM as needed, and run their analysis against HTRC data interactively.
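The access behaviour that initial users report on slide 27 can be summarised as a small policy table. This is an illustrative sketch under stated assumptions, not the firewall configuration HTRC uses: the destination labels, the `ALLOWED` table, and both function names are hypothetical.

```python
# Which network destinations a capsule may reach in each mode,
# as described in the user feedback (labels are hypothetical).
ALLOWED = {
    "maintenance": {"internet"},
    "secure": {"htrc_data_service"},
}


def is_allowed(mode: str, destination: str) -> bool:
    """Return True if a VM in `mode` may connect to `destination`."""
    return destination in ALLOWED.get(mode, set())


def can_ssh(user: str, vm_owner: str) -> bool:
    """Users may only reach their own VM, never another user's."""
    return user == vm_owner
```

For example, `is_allowed("secure", "internet")` is `False`, matching the report that the internet is reachable only in maintenance mode.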
  • 28. Future Work
      • The upcoming hands-on session!
          – Sep 15, 2-5pm
          – Wells Library E174
          – Bring your own laptop
          – One of the Scholar Commons Events
          – Register at the Scholar Commons page [1]
          – Work on text analytics tasks in the HTRC Data Capsule environment
      [1] http://libraries.iub.edu/tools/workshops/workshop-listings/series-view/283/series
  • 29. Future Work
      • Copyrighted content in progress
      • Advanced Collaborative Support
          – The award model
          – Award content is HTRC ACS staff time
          – Collaborate with scholars on addressing their research needs related to HTRC
          – E.g. prototyping, running text analysis
          – Advocate open source; encourage extending the work to a grant submission
      • Scholars Commons
          – Interaction with scholars to help them use HTRC tools and services
          – An interface to interact with HTRC users via the channel of Scholars Commons
          – Series of workshops at IU and other places
          – Weekly consulting time
          – Every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R
          – Contact: Miao Chen, Nicholae Cline
  • 30. For details: http://www.hathitrust.org/htrc/faq
      • General contact info
          – Beth Plale, Director HTRC, plale@indiana.edu
      • Requests for capability, interest
          – Miao Chen, Asst. Director for Outreach HTRC, miaochen@indiana.edu

Editor's notes

  1. Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
  2. The Scholarly Commons User Support service gives HT institutions exclusive access to training and learning materials that help them establish programs integrating HTRC tools and services into the scholarly commons programs of their libraries and digital humanities centers. The SC will be physically located in the University of Illinois Library’s Scholarly Commons. Several Library staff and faculty will support this service. Key among these is the Digital Humanities Research Specialist, who will assist with the development of training and outreach initiatives in support of researchers working with the HathiTrust Research Center and of HathiTrust Digital Library affiliates who seek to start their own HTRC research services. This will involve planning, implementing, and continuously developing training materials, educational workshops, potential tools, and outreach activities in support of the use of HTRC tools and datasets. The HTRC Digital Humanities Specialist will focus on developing HTRC research services at HathiTrust member institutions, and will collaborate with public services and data services librarians at those institutions on developing support services for digital humanities research with the HTRC corpus. The specialist will work closely with the English and Digital Humanities Librarian at the University of Illinois Library to develop research data services for the humanities, with particular emphasis on the HTRC corpus and tools. Additional professionals are focused on related aspects of HTRC work, including a CLIR Postdoc researching user requirements for HTRC tools, a Technical Specialist, and other technical support.
These professionals contribute to the work of the Scholarly Commons and to the HT community by helping to articulate the relationship between new technologies and humanities scholarship to the community of humanists, by advising teaching faculty on the use of digitized textual corpora, and by providing technical support for the use of analytical tools. The scope and responsibilities will evolve in accordance with priorities established by the Library and the HathiTrust community. The specialist will spend up to 20 percent of their time on the support of research work with the HTRC. Examples of currently supported digital humanities projects involving the HTRC corpus include: a text mining project of eighteenth-century novels for changes in dialect; a textual analysis of nineteenth-century women's serial novels for thematic patterns; a comparative literature textual analysis project; and topic modeling of twentieth-century texts for depictions of African-American women.
  3. HTRC hides the complexity of analytics. In this sense it is like Google search, a simple interface that hides the complexity of searching billions of pages. An HTRC interaction returns things such as spatial relationships of words (and, of course, their frequencies), statistical plots, or tabular information.
  4. Shifting the complexity-hiding interface to the right, we open up the cloud to see what’s inside. HTRC at its simplest has 1) algorithms, drawn from SEASR and from other analysis tool suites including Mahout and MapReduce; 2) the HT corpus (and subsets of the corpus that users either hold personally as part of a workset or that are publicly available); and 3) other data sets that are used. HTRC brokers the bringing together of these pieces so that computation can take place on a resource like Big Red II (or XSEDE). Note that there is an arrow from the compute engine to the complexity-hiding interface. This is because researcher interaction with the texts isn’t an automated workflow; it requires levels of interaction with the computation as it is running.