Steve Marks.
PASIG — Preservation and Archiving Special Interest Group 2015 Meeting.
https://libraries.ucsd.edu/chronopolis/pasig/agenda_2/index_agenda.html
1. The Ontario Library Research Cloud
Steve Marks
Scholars Portal / Ontario Council of University Libraries
(University of Toronto)
2. What is Scholars Portal?
Shared technology service of the 21 university libraries of the Ontario Council of University Libraries, founded in 2003
Provides content aggregation and preservation services for member libraries
Journals – 16,000 titles and 38 million articles
Books – 610,000 ebooks
GeoPortal – GIS Data
ODESI – Numeric Data
Dataverse – Research Data
9. MTCU PIF Proposal
Nine partner libraries from OCUL; three-year project
University of Toronto as financial lead
Develop a 1.2PB object storage service for partners
Provide subscription storage services to other OCUL libraries
Develop interfaces with library repository applications
Create a compute cluster to support text analysis of content in the cloud
10. Storage RFP
• Storage hardware RFP issued Dec. 20, 2013
• High density disk storage servers (DSS)
• Evaluation and analysis through early March
• Awarded to Dell: 2nd week of March
• All equipment delivered by 31st March, 2014
11. Data Storage Server
• PowerEdge R720xd
• MD1200 disk drawers
• Each drawer contains 48TB (12 x 4TB NL-SAS drives)
• DSS capacity: 48TB to 432TB
14. GTAnet Pilot
Purpose of the pilot:
To understand how to design and implement an effective network topology to support the operation of the OLRC Storage Cloud
15. GTAnet Pilot
Execution of the pilot:
Model and record the network traffic generated between four OpenStack Swift storage nodes during routine operation and under various simulated disaster scenarios
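One way to record this kind of per-node traffic is to sample Linux interface counters at a fixed interval. The sketch below is a generic illustration of that approach, not the tooling the pilot actually used; eth0 is a placeholder interface name.

```python
# Sample cumulative byte counters from /proc/net/dev and report
# throughput in Mb/s. Generic monitoring sketch; not pilot tooling.
import time

def rx_tx_bytes(iface: str = "eth0") -> tuple[int, int]:
    """Read cumulative receive/transmit byte counters for one interface."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                return int(fields[0]), int(fields[8])  # rx bytes, tx bytes
    raise ValueError(f"interface {iface} not found")

prev = rx_tx_bytes()
for _ in range(6):                      # sample for one minute
    time.sleep(10)
    cur = rx_tx_bytes()
    rx_mbps = (cur[0] - prev[0]) * 8 / 10 / 1e6  # byte delta over 10s -> Mb/s
    tx_mbps = (cur[1] - prev[1]) * 8 / 10 / 1e6
    print(f"in: {rx_mbps:.1f} Mb/s  out: {tx_mbps:.1f} Mb/s")
    prev = cur
```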
18. Swift Node Considerations
How much bandwidth can they provide?
Will they enable jumbo frames?
Will they extend VLANs across their network?
How low are their one-time (OTO) and ongoing costs?
Do they have an ORION POP on site?
30. Status
Beta!
Develop end-user tools
Repository integration
Compute cluster and text mining
31. Acknowledgements
Our Partner Libraries
GTAnet – Doug Carson, Lloyd Kwong, Kevin Wong
ORION – Andy Lam, Mark Grant
OLRC Admin & Tech Committees
SP/UTL Systems teams – Steve Baroti, Chris Crebolder, Miki Wong, Harpinder Singh, Bikram Singh
32. Interested in Learning More or Getting Involved?
cloud@scholarsportal.info
https://spotdocs.scholarsportal.info/display/ODLRC
Editor's notes
The slide that will live in infamy.
logos
Develop a 1.2PB object storage service for partners
The majority of the money awarded was for hardware.
The first step was to buy the hardware; under the rules of the grant, the money had to be spent by March 31st.
The server has 2 H800 RAID controllers.
Up to 4 MD1200 drawers can be directly attached to EACH controller in a redundant fashion (a total of 8 per server).
The drives are configured as RAID 0.
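The 48TB-to-432TB range on slide 11 works out if the R720xd's twelve internal 3.5" bays carry the same 4TB drives as the drawers; that internal-bay detail is our assumption, not stated on the slide. A quick sketch of the arithmetic:

```python
# Capacity arithmetic for one Data Storage Server. Assumption: the
# R720xd's 12 internal bays hold the same 4TB NL-SAS drives as the
# MD1200 drawers (inferred, not stated on the slide).

DRAWER_TB = 12 * 4        # one MD1200: 12 x 4TB = 48TB
MAX_DRAWERS = 2 * 4       # 2 H800 controllers x 4 drawers each = 8
INTERNAL_TB = 12 * 4      # internal bays, assumed 4TB drives = 48TB

print(DRAWER_TB)                               # minimum build: 48TB
print(INTERNAL_TB + MAX_DRAWERS * DRAWER_TB)   # maximum: 48 + 8*48 = 432TB
```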
GTAnet is the Regional R&E network for the Greater Toronto Area
Three partner libraries (Ryerson, York, UofT) are connected via GTAnet. We consulted with the folks at GTAnet and they said they'd be happy to help set up a network for our pilot.
VLAN technology is used over a shared 10Gb connection (physically limited to 1Gb) running over GTAnet routers, extended through the backbones at Ryerson, York and Toronto:
- Each site is assigned a private /24
- VLANs are configured over existing GTAnet connectivity
- Routing is performed at the GTAnet router
- RyersonU, UofT and YorkU extend VLANs through their backbones
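The addressing scheme is simple enough to sketch with Python's ipaddress module. This is a minimal illustration only: the 10.20.0.0/16 parent block and the VLAN IDs are made up, as neither value is given in the slides.

```python
# Carve one private /24 (on its own VLAN) per pilot site out of a
# hypothetical RFC 1918 parent block. Subnet and VLAN values are
# placeholders, not the pilot's real configuration.
import ipaddress

sites = ["UofT", "Ryerson", "York"]
parent = ipaddress.ip_network("10.20.0.0/16")   # hypothetical parent block
subnets = parent.subnets(new_prefix=24)          # generator of /24s

for vlan_id, (site, net) in enumerate(zip(sites, subnets), start=100):
    print(f"{site}: VLAN {vlan_id}, subnet {net}")
    # e.g. UofT: VLAN 100, subnet 10.20.0.0/24
```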
IMPORTANT NOTE: On the graphs, ‘Inbound’ and ‘Outbound’ are reversed from what you would expect
Round 1 - Three storage nodes were set up, one each at UofT, Ryerson, and York. Additionally, a proxy node (which controls routing of traffic to the storage nodes) was set up at UofT. Once operational, baseline network traffic between the nodes was negligible (5-10Mb/s).
Round 2 - 35TB of data was loaded into the cluster. At 1Gb network speeds, replication and distribution of the data across the cluster took a couple of weeks (9.72 days of continuous transfer time). Network utilization for the storage nodes averaged 350 Mb/s, but the proxy node's 1Gb network connection was fully utilized the entire time.
Round 3 - Once loading of the 35TB was completed, 12 drives were added to each storage node, which led to a rebalancing of the content within the cluster. Network utilization for the storage nodes hovered between 500-850 Mb/s, while the proxy node did not come under significant load. Attempts to load additional content at this time led to complete saturation of the 1 Gb network connections between the storage nodes.
Round 4 - This test added an additional storage node at the UofT site, which led to another rebalancing of the loaded content. It took approximately 2.5 days (58 hours) to import ~26.25 TB of content to the new node, during which time the 1Gb connection to the new node was fully saturated. Network utilization of the other nodes was around 350-500 Mb/s.
It is worth noting that these scenarios represent extreme changes to the storage network. Outside of initial setup conditions, upgrades and other planned events will occur in a much smoother fashion, which should reduce the need for such high utilization. That said, they may be fairly representative of possible outage scenarios, in which the network might lose a node and have it reappear.
A more typical case would be a single hard drive dying. Each hard drive is 4TB in size, and over a 1Gb pipe it would take 8.88 hours to rebuild.
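These back-of-the-envelope figures all follow from the same transfer-time arithmetic. A minimal sketch, assuming decimal units, a fully saturated link with no protocol overhead, and 3x replication for the Round 2 bulk load (the 9.72-day figure only works out if the 35TB is written three times, which matches Swift's default replica count):

```python
# Sanity-check of the transfer times quoted in the notes above.
# Assumes 1 TB = 8000 Gb (decimal units) and a fully utilized link.

def transfer_hours(terabytes: float, gbps: float) -> float:
    """Hours to move `terabytes` of data over a `gbps` link."""
    return terabytes * 8000 / gbps / 3600

print(transfer_hours(35 * 3, 1) / 24)  # Round 2, 3x replication: ~9.72 days
print(transfer_hours(26.25, 1))        # Round 4 node import: ~58.3 hours
print(transfer_hours(4, 1))            # one 4TB drive: ~8.89 hours
```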
ORION, GTAnet, and SP staff came up with this:
- 1 R720xd and 4 drawers, as well as a switch, will be installed at each site
- ORION will provision a VPLS network to link the nodes
- Sites would pass the ORION VPLS VLAN through their networks to the OLRC switch
Under this architecture, the nodes can talk directly to each other without having to hit the GTAnet router.
I want to end with this chart, which shows the amount of time it would take to rebalance our cluster (and bring it back to 100% health) under our current configuration (1 server with 4 drawers) if there were 200TB of data on the cluster.
On the right are “failure zones”
We anticipate that the 2 most common occurrences will be drives failing and zones failing (the latter because network connectivity to a zone could be lost, and on a WAN the odds of this happening increase).
Failure | 2 Gb/s | 4 Gb/s | 6 Gb/s | 8 Gb/s | 10 Gb/s
Drive (hours) | 4.44 | 2.22 | 1.48 | 1.11 | 0.89
Drawer (hours) | 53.33 | 26.67 | 17.78 | 13.33 | 10.67
RAID card (hours) | 213.33 | 106.67 | 71.11 | 26.67 | 21.33
Zone (hours) | 266.67 | 133.33 | 88.89 | 66.67 | 53.33
Zone (days) | 11.11 | 5.56 | 3.70 | 2.78 | 2.22
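The hour figures in this table follow from the same arithmetic as before. A sketch that reproduces them, assuming the five columns correspond to aggregate rebuild bandwidths of 2, 4, 6, 8 and 10 Gb/s, and failure-domain sizes taken from the hardware described earlier (4TB drives, 48TB drawers, 4 drawers per H800 card, and a zone of 1 server plus 4 drawers, i.e. 240TB raw):

```python
# Reproduce the rebuild-time table. Column bandwidths and domain sizes
# are inferred from the hardware specs, not stated in the chart itself.

def rebuild_hours(tb: float, gbps: float) -> float:
    """Hours to re-replicate `tb` terabytes over `gbps` aggregate bandwidth."""
    return tb * 8000 / gbps / 3600   # 1 TB = 8000 Gb (decimal units)

domains_tb = {"Drive": 4, "Drawer": 48, "RAID card": 192, "Zone": 240}

for name, tb in domains_tb.items():
    row = [round(rebuild_hours(tb, g), 2) for g in (2, 4, 6, 8, 10)]
    print(f"{name:9s} {row}")        # divide the Zone row by 24 for days

# This matches the table above except the last two RAID-card cells, which
# the source gives as 26.67 and 21.33 (consistent with 16 and 20 Gb/s).
```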