1. From tape to cloud storage
4/19/2012
Steve Meier
https://cloud.sdsc.edu
SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Cyberinfrastructure Services Division
2. Agenda
• Where we were…
  • Tape archive overview
• Where SDSC is today
  • Current Data Services
  • Swift architecture overview
• Access methods
  • Cloud Explorer Web interface
  • UCSD Libraries Collections
  • Others (Cyberduck, Command Line, s3backer)
• Future Plans
3. SAM-QFS ARCHITECTURE
[Diagram: the SAM-QFS tape archive environment. Dual metadata servers (MDS1, MDS2); Oracle (RMAN) servers; NFS server with Commvault backups; data, login, and web nodes (GridFTP, SFTP); Force10 and Juniper T640 network core; 1.2 PB SAM-QFS SAN disk cache; 16 STK 9940B FC and 32 IBM 3592 FC tape drives; 6 STK 9310 silos with 32 PB capacity.]
4. SAM-QFS ARCHITECTURE CONT’D…
[Diagram: silo floor plan. Six LSMs (LSM_0 through LSM_5) linked by pass-through channels, each with an LCU and drive panels holding mixes of 9840, 9940B, 3590E, and J2A drives.]
DESCRIPTION
1.) One existing 20-drive panel on LSM4's panel 10 will hold 20 J2A.
2.) One existing 20-drive panel on LSM5's panel 10 will hold 12 J2A.
3.) Will install a 20-drive panel on LSM5's panel 1 to hold 12 J2A.
4.) Will install a 20-drive panel on LSM3's panel 9 to hold 20 J2A.
5. …TAKEAWAY
• Complex environment
  • Many dependencies (SAN, metadata, tape drives, silos)
• Aging infrastructure
  • Puts pressure on all the dependencies
  • Tech refresh is well overdue
• Archival data is difficult to access
  • High latency, lower bandwidth, limited user interfaces
• Difficult to share archival data with multiple users
• All too often archived data, particularly HPC simulations, is “write-once-read-never”
6. Where SDSC is today
Data Services Overview
Cloud Storage (OpenStack Swift)
• Purpose: Storage of Digital Data for Ubiquitous Access and High Durability
• Access Mechanisms: Swift/S3 API, Cloud Explorer, Clients, CLI
Traditional File Server Storage (NFS/CIFS)
• Purpose: Typical Project / User Storage Needs
• Access: NFS/CIFS/iSCSI
High Performance Computing Storage (PFS)
• Purpose: High Performance Transient Storage to Support HPC
• Access Mechanisms: Lustre on HPC Systems (Gordon, Trestles, Triton)
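As a hedged illustration of the Swift/S3 API access mechanism listed above, the sketch below reaches an S3-compatible endpoint with boto 2.x. The hostname, container name, and credentials are placeholders, not the service's real values.

```python
# Sketch: S3-compatible access with boto 2.x. Hostname, bucket name,
# and credentials are placeholders, not real SDSC values.
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

conn = S3Connection(
    aws_access_key_id='ACCESS_KEY',             # placeholder credential
    aws_secret_access_key='SECRET_KEY',         # placeholder credential
    host='s3.cloud.example.edu',                # placeholder endpoint
    calling_format=OrdinaryCallingFormat())     # path-style URLs for non-AWS hosts

bucket = conn.get_bucket('my-container')        # assumes the container exists
key = bucket.new_key('results/run42.dat')
key.set_contents_from_filename('run42.dat')     # upload
key.get_contents_to_filename('run42-copy.dat')  # download it back
```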
7. Goals for Cloud/Object Storage
• Support the NSF Data Management Plan
  • The required plan describes how research results are shared.
• 99.5% system availability
• Automated file replication
  • Two copies by default, with the ability to keep additional off-site replicas.
• Automated checksum verification and error correction
• Scalable
  • Performance and capacity grow by incremental bricks.
• Multifaceted accessibility
  • Web, API, graphical, and command-line clients
• Cost competitive
  • Operated as a recharge service
  • On par with current tape-based dual-copy costs of $0.0325/GB/month.
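The checksum goal above is handled server-side by Swift's auditor processes, but a client can also verify integrity end to end: for a non-segmented object, the ETag Swift returns is the MD5 of the stored bytes. A minimal sketch with the python-swiftclient library follows; the auth URL and credentials are placeholders.

```python
# Sketch: end-to-end integrity check against Swift's ETag (the MD5 of a
# non-segmented object). Auth URL and credentials are placeholders.
import hashlib
import swiftclient

def file_md5(path, chunk=1 << 20):
    """Hash the file in 1 MiB chunks so large objects fit in memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()

conn = swiftclient.Connection(
    authurl='https://cloud.example.edu/auth/v1.0',  # placeholder endpoint
    user='account:user', key='secret')              # placeholder credentials

conn.put_container('my-container')                  # no-op if it exists
with open('run42.dat', 'rb') as f:
    etag = conn.put_object('my-container', 'run42.dat', contents=f)

# Compare what the cluster stored against the local digest.
if etag != file_md5('run42.dat'):
    raise IOError('checksum mismatch: retry the upload')
```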
8. Why OpenStack?
Industry Standard
• More than 100 leading companies from over a dozen countries are participating in OpenStack, including Cisco, Citrix, Dell, Intel, and Microsoft.
Proven Software
• The OpenStack cloud operating system is the same software that powers many large public and private clouds, including RackSpace Cloud Storage.
Highly Compatible
• Compatibility with public OpenStack clouds means it’s easy to migrate data and apps to public clouds when desired, based on security policies, economics, and other key business criteria.
Control & Flexibility
• An open-source platform means no lock-in to a proprietary vendor, and the modular design can integrate with legacy or third-party technologies. The OpenStack project is provided under the Apache 2.0 license.
9. Design Highlights
• 100% Dual-Copy Disk Storage
• Initial 5.5 PB (petabytes)
• Dual 10 Gb Arista connected, 8 GB/s aggregate I/O performance
• Off-Site Replication (UC Berkeley)
• Continuous File Integrity Verification
• Helps PIs meet NSF Data Management requirements
10. OpenStack Cloud Storage Architecture
11. Current Usage
12. Usage Breakdown
[Chart: usage by category. Backups (the largest share); File system emulation (s3backer, Panzura, Whitewater); Application integration; Native clients.]
13. Access methods
Integrated
• UCSD Library Collection management
Client tools
• SDSC Cloud Explorer
• Swift Python client
• Cyberduck
• s3backer
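To make the client-tool list concrete, here is a minimal sketch of basic operations with the Swift Python client (python-swiftclient); the auth URL and credentials are placeholders.

```python
# Sketch: basic container and object operations with python-swiftclient.
# Auth URL and credentials are placeholders.
import swiftclient

conn = swiftclient.Connection(
    authurl='https://cloud.example.edu/auth/v1.0',  # placeholder endpoint
    user='account:user', key='secret')              # placeholder credentials

conn.put_container('projects')                      # create (idempotent)
with open('data.tar', 'rb') as f:
    conn.put_object('projects', 'data.tar', contents=f)

headers, objects = conn.get_container('projects')   # list the container
print([obj['name'] for obj in objects])

headers, body = conn.get_object('projects', 'data.tar')  # download
```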
14. UCSD Library Collections
15. SDSC Swift Web Client (Cloud Explorer)
Features:
• Upload / Download / Rename / Move
• Permissions management
• Change password
• Display container share URL
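Behind a feature like the container share URL, sharing in Swift typically comes down to a container read ACL. The sketch below shows that underlying API call, not Cloud Explorer's actual implementation, and uses placeholder names.

```python
# Sketch: make a container world-readable so its objects can be fetched
# at a plain URL. Endpoint and names are placeholders; this shows the
# underlying Swift ACL, not Cloud Explorer's own code.
import swiftclient

conn = swiftclient.Connection(
    authurl='https://cloud.example.edu/auth/v1.0',  # placeholder endpoint
    user='account:user', key='secret')              # placeholder credentials

# '.r:*' grants anonymous reads; '.rlistings' also allows listing.
conn.post_container('shared-data',
                    headers={'X-Container-Read': '.r:*,.rlistings'})

storage_url, _token = conn.get_auth()
print('Share URL base: %s/shared-data/' % storage_url)
```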
16. Others (Command Line, GUI, Filesystem)
Swift Python Client
• Batch processing
• Large file (segmented) upload support; see the sketch below
• Lacking in features and error logging/recovery
Cyberduck
• Drag-and-drop GUI for Macs and Windows
• No large file upload support
s3backer
• Compatible with existing tools (e.g., rsync, SFTP)
• File system semantics
• Familiar
• File-sharing challenges
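For context on the large-file support noted above: Swift clients of this era implemented large uploads as dynamic large objects, i.e., fixed-size segments plus a zero-byte manifest. A sketch under that assumption, with placeholder endpoint, credentials, and names:

```python
# Sketch: a segmented (dynamic large object) upload. Segments go into a
# side container; a zero-byte manifest stitches them together on GET.
# Endpoint, credentials, and names are placeholders.
import swiftclient

SEGMENT = 1 << 30  # 1 GiB segments (single objects were capped at 5 GiB)

conn = swiftclient.Connection(
    authurl='https://cloud.example.edu/auth/v1.0',  # placeholder endpoint
    user='account:user', key='secret')              # placeholder credentials

conn.put_container('big')
conn.put_container('big_segments')

with open('huge.tar', 'rb') as f:
    index = 0
    while True:
        chunk = f.read(SEGMENT)
        if not chunk:
            break
        conn.put_object('big_segments', 'huge.tar/%08d' % index,
                        contents=chunk)
        index += 1

# The manifest: GETs on big/huge.tar stream the segments back in order.
conn.put_object('big', 'huge.tar', contents=b'',
                headers={'X-Object-Manifest': 'big_segments/huge.tar/'})
```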
17. Upcoming Features
• Active Directory authentication integration
• Large file upload support (Cloud Explorer)
• Server-side encryption for data at rest
18. Questions?
Email SERVICES@SDSC.EDU for more info!
http://www.sdsc.edu
http://rci.ucsd.edu
Editor’s notes
An organized research unit on the campus of UC San Diego, funded by the National Science Foundation.
Describe the environment: complex; latency due to tape loads; small files affect performance.
Silo overview, pass-through ports
With high-latency tape access and lower bandwidth, users wait longer for their data. Many times one user can monopolize the staging queue. Because it’s difficult to share and access, most data tends to be write-once, read-never. Fine for things like backup, but not acceptable for valuable data sets.
Where we’re at
As of January 2011, an NSF data management plan is required for all new grant proposals.
Talk about the basic architecture: scale out by adding more components (proxies, storage bricks). Love the shared-nothing architecture. Then the changes we made to improve it. Doug: merging auth, load balancing, Arista-switched Data Oasis. Currently running CentOS 5 with a plan to update to CentOS 6.2 within the next couple of months.
Launched the service in August 2011; modest usage so far. The Commvault backup service has yet to come online.
Backups consume the most space, which is consistent with our tape archive usage as well, followed by file system emulation. Both backups and file systems are well understood and have familiar applications. Native client usage is a small percentage of use (Swift Python client, Cyberduck, Cloud Explorer). There is no robust, reliable (free) client available to our knowledge; we would like to see this get addressed. Speak to the challenges of each tool. We may be a little different in that we want to use Swift as a generic storage service vs. building an application.
Ron
Search, indexing, etc. are handled by the web application. Digital assets, text, and other associated data are stored in Swift.
Features: upload, download, permissions, drag-and-drop, change password. Not considered a large data movement tool; largely used for viewing objects, setting permissions, and viewing share URLs. Need to support large file uploads.
Swift: nice to add copy, move, rename. In general, we’d like to see a robust command-line client; we are interested in starting this effort if one doesn’t exist already, and in talking to others who have similar needs. Nearly all of our HPC users rely on command-line tools. Cyberduck is OK for light usage. Our hope is that once Cloud Explorer supports drag-and-drop uploads, it will obsolete that tool and let us focus on web and command-line tool development and support.
And with that I’ll take questions
Ron
Ron: PI-pertinent. Everything you put in is immediately shareable with groups of users, the world, or no one if you want. This form of storage is fully integrated with your applications and web services. Near-real-time collaboration from anywhere on the globe where you have web access.