Big Data Processing 
in the Cloud 
Zhiwu Xie and Collin Brittle
Data are abundant; 
(re)usable data are scarce. 
- Jane Silverthorne 
Deputy Assistant Director, NSF
2
Outline 
• Data, big data, and the library 
• Why should the library process big data? A case 
study: SEB 
• How is this different? Some characterizations. 
• How can cloud computing help?
3
Data, Big Data, and the Library 
• Research data as a first-class citizen 
• Data management mandates, OSTP memorandum, … 
• Research libraries’ roles 
• data consultancy, DMP Tools, institutional repositories, 
DPN, SHARE, … 
• From data to big data 
4
The Big Data Long Tails 
• Small number of large data sets as the head; large 
number of smaller, messier data sets as the tail 
• Small number of “grand-challenge” related data sets 
as the head; large number of modestly funded 
research projects that produce large data sets as the tail 
5
The Libraries’ Role 
• Large data sets and grand challenges -> National 
infrastructure, e.g., XSEDE 
• Smaller, messier -> Disciplinary repositories, 
institutional repositories 
• What about the large data sets that are not initially 
recognized as grand-challenge related? 
• Much easier to produce large data sets today 
• Their value must not be underestimated 
• Most institutional repositories limit the size of a deposit, e.g., to 
10 GB 
6
SEB: A Case Study 
7
Signature Engineering Building 
8
Signature Engineering Building 
9
Sensors 
10
Smart Infrastructure 
11
Data Sharing 
• Encourage exploratory and multidisciplinary research 
• Foster open and inclusive communities around 
• modeling of dynamic systems 
• structural health monitoring and damage detection 
• occupancy studies 
• sensor evaluation 
• data fusion 
• energy reduction 
• evacuation management 
• … 
12
Characterization 
• Compute intensive 
• Storage intensive 
• Potentially bandwidth intensive 
• On-demand 
• Scalability challenge 
13
Compute Intensive 
• About 6 GB of raw data per hour 
• Must be continuously processed, ingested, and 
further processed 
• User-generated computations 
• Must not interfere with data retrieval 
14
Storage Intensive 
• SEB will accumulate about 60 TB of raw data per year 
• To facilitate research on long-term effects, we must 
keep raw data for an extended period of time, e.g., 
>= 5 years 
• VT currently does not have an affordable storage 
facility to hold this much data 
• Within XSEDE, only TACC’s Ranch can allocate this 
much long-term storage 
15
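As a rough cross-check of the storage figures above, the hourly rate can be projected to a year and to the retention period. This is only a back-of-the-envelope sketch: it assumes the sensors record around the clock at a constant rate and uses decimal units (1 TB = 1000 GB), neither of which the slides state.

```python
# Figures taken from the slides; constant-rate, 24x7 recording is an assumption.
GB_PER_HOUR = 6            # "about 6 GB of raw data per hour"
HOURS_PER_YEAR = 24 * 365
RETENTION_YEARS = 5        # ">= 5 years"


def raw_tb_per_year(gb_per_hour: float) -> float:
    """Yearly raw data volume in TB, using decimal units (1 TB = 1000 GB)."""
    return gb_per_hour * HOURS_PER_YEAR / 1000


yearly_tb = raw_tb_per_year(GB_PER_HOUR)
retained_tb = yearly_tb * RETENTION_YEARS
print(f"{yearly_tb:.1f} TB per year, {retained_tb:.0f} TB over {RETENTION_YEARS} years")
```

Under these assumptions the result is a little over 50 TB per year, the same order as the "about 60 TB" on the slide, and a few hundred TB over the minimum retention period.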
(Potentially) Bandwidth Intensive 
• What if hundreds of researchers around the world 
each tried to download hundreds of TB of our data? 
16
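To give the question above a sense of scale, the transfer time for one researcher can be estimated. The 100 TB size and the sustained 1 Gbit/s link below are hypothetical inputs chosen for illustration, not measurements from the project.

```python
def download_days(size_tb: float, link_gbps: float) -> float:
    """Days needed to move size_tb terabytes over a link sustaining link_gbps."""
    bits = size_tb * 1e12 * 8              # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)     # gigabits per second to bits per second
    return seconds / 86_400                # seconds per day


# Hypothetical inputs: 100 TB per researcher, 1 Gbit/s sustained throughput.
days = download_days(100, 1.0)
print(f"{days:.1f} days per researcher")
```

Even with a dedicated link, one such download would take more than a week, which is why moving the computation to the data is attractive.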
On Demand 
• Exploratory and multidisciplinary research cannot 
predict data usage a priori 
17
Scalability 
• How to deal with these challenges in a scalable 
manner? 
18
Big Data + Cloud 
• Affordable 
• Elastic 
• Scalable 
19
A Data Reuse Scenario 
• Validate cross-disciplinary research hypotheses, e.g., 
search all SEB data for a novel vibration pattern 
• Lower the reuse barrier: must not require users to 
invest heavily in infrastructure before the initial 
analysis 
• No need to move data around 
• Can perform user-specified initial data filtering and analysis 
• Initial analysis results may be used to enrich the metadata 
and facilitate further discovery 
• Cost sharing 
20
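The reuse scenario above can be pictured as a user-supplied filter that runs next to the data and returns only a small summary. The sketch below is illustrative only: the record layout, the `run_near_data` function, and the threshold are hypothetical and do not describe the SEB implementation.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (timestamp in seconds, acceleration reading).
Sample = Tuple[float, float]


def run_near_data(samples: Iterable[Sample],
                  user_filter: Callable[[Sample], bool]) -> Dict[str, Optional[float]]:
    """Apply a user-supplied filter where the data lives; return a small summary."""
    matches = [s for s in samples if user_filter(s)]
    return {
        "match_count": len(matches),
        "first_match_time": matches[0][0] if matches else None,
        "last_match_time": matches[-1][0] if matches else None,
    }


# Toy stand-in for sensor data; a real run would stream from the repository.
readings = [(0.0, 0.01), (1.0, 0.35), (2.0, 0.02), (3.0, 0.41)]
summary = run_near_data(readings, lambda s: abs(s[1]) > 0.3)
print(summary)
```

Only the summary leaves the storage side, so the raw data does not move, and a summary of this kind could be attached to the data set as additional metadata.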
Thank You! 
• Questions? Comments? 
• zhiwuxie@vt.edu 
21

Sept 24 NISO Virtual Conference: Library Data in the Cloud


Editor's notes

  1. Today I will present a data repository project in which Virginia Tech Libraries is currently involved and which may alleviate some of the pain of data reuse. We are developing a technology prototype that targets a niche slightly outside the traditional comfort zone of library development. Nonetheless, the work is closely related to what academic and research libraries have been doing for a long time: archiving research outcomes and making them available. The library community certainly has a lot to contribute, but to do it well, we may need to pick up a few new tricks, such as cloud computing.
  2. The presentation will cover the following ground. First, I'll give you a general idea of data and big data, and of the capacity in which libraries have been involved so far. I'll then describe the new niche that our development is trying to explore, using the Virginia Tech Signature Engineering Building project, or SEB project, as the example. However, this is not a project briefing. I believe the problem we are trying to tackle is a general one and can potentially become a growth point for libraries. I will therefore extract the key requirements of the SEB project and generalize them as the characteristics of this new niche. I will then describe how this niche differs from the traditional repository work we are already familiar with, and how cloud computing can help. I will not cover many implementation details; this is more of a high-level, conceptual overview. My co-author Collin Brittle presented some of the implementation details at this year's Open Repositories conference in Helsinki, Finland. His presentation video is online in case you are interested.