In this video from the DDN User Group Meeting at ISC'13, Dr. Daniel Hanlon from University College London presents: Advancing Research at London's Global University.
"As UCL's storage demands grow, the university expects to build a storage foundation that will scale up to 100PB. Looking for a storage solution that was massively scalable yet simple to manage as part of the first phase of the infrastructure build out, UCL will use DDN object storage technology to store up to 600TB of research data. DDN object storage capabilities also will be able to empower UCL researchers to collaborate without having to worry about data reliability, compliance obligations or long-term retention of critical research assets."
Learn more: http://www.ddn.com/press-releases/2013/ucl-selects-ddn-object-storage-for-cloud-infrastructure
Watch the presentation video: http://inside-bigdata.com/video-advancing-research-at-londons-global-university/
2. London’s Global University
Consistently ranked in the world’s top 10 universities
Alumni include 20 Nobel Prize winners
Founded 1826; first to admit students without regard to race or religion, and women on equal terms with men
8,000 staff, 25% from 84 countries outside the UK
~25,000 students (over 1/3 postgrads)
Annual turnover > £900 million
> £15m dependent on University HPC
Highly multi-disciplinary
3. Delivering a Culture of Wisdom
“UCL is London’s research powerhouse, with a commitment to enhancing the lives of people in the capital, the UK and around the world. Our academics have breadth and depth of expertise across the entire range of academic disciplines. Individually, they expand our understanding of the world; collectively and collaboratively, they deliver analysis that addresses the major challenges facing humanity.”
Professor David Price, UCL Vice-Provost (Research)
• UCL Grand Challenges
– Impact
• UCL Research Frontiers
– Enquiry and ‘curiosity’
4. Underpinning Research
• Research IT Services
• Established June 2012
• Support and enablement across the research lifecycle
Services across the lifecycle (diagram): IRIS, HPC, collaboration tools, Research Data, RPS, UCL Discovery
5. Research IT Services
• Investment in people and expertise
– Research software development initiative
– Comprehensive training programme
• Collaboration and Innovation
– e-Infrastructure South
– UK Research Data community
– Vendor partnership
Innovation and Skills
6. UCL’s case for a Research Data Initiative
• Many departments and research groups with a long history of excellence
– Lost opportunity for new research building on existing data
• UCL is a highly multi-disciplinary institution
– Lost opportunity for cross-disciplinary re-use
• Unmanaged datasets are lost datasets
– Increasing burden to researchers
– Backup, failing USB hard drives, authentication (AuthN) problems
8. All carrots, no (hardly any) sticks
• Research Data is an offering
• Projects can and will opt out
• Remove the burden of managing storage
• Resiliency is better than backup
• Remove the burden of compliance with Research Council requirements
9. Requirements capture
• Researchers – We want…
– Everything! but more! faster! and shinier!
– NFS, CIFS, scp, GridFTP, Cloud (Dro***x)
• Institution – Solution must have…
– Low admin overhead
– Cheap (cost per TB)
– High density
10. Architecture choices
• Simple
– start with tried and trusted
– challenge the Big Data hysteria
• Strong abstractions
– avoiding lock-in to proprietary technology
• Hedge bets
– Build in migration between storage solutions
• Project-based
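Here is a minimal sketch of how "strong abstractions" and "build in migration between storage solutions" might look in practice. The Python below puts every storage tier behind one small interface and moves a single project from one tier to another. The `StorageBackend` interface, the `InMemoryBackend` class, the `migrate` function and the "project/filename" key layout are all hypothetical illustrations. They are not UCL's implementation or any vendor API.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical minimal interface that every storage tier would implement."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> list[str]: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real tier such as a POSIX filesystem or an object store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> list[str]:
        # Returns a new sorted list, so callers may delete while iterating over it.
        return sorted(self._objects)

    def delete(self, key: str) -> None:
        del self._objects[key]


def migrate(src: StorageBackend, dst: StorageBackend, project: str) -> int:
    """Move one project's objects from src to dst, verifying each copy before deleting the source."""
    prefix = project + "/"
    moved = 0
    for key in src.keys():
        if not key.startswith(prefix):
            continue
        data = src.get(key)
        dst.put(key, data)
        if dst.get(key) != data:
            raise IOError(f"verification failed for {key}")
        src.delete(key)
        moved += 1
    return moved


old_tier, new_tier = InMemoryBackend("posix"), InMemoryBackend("object")
old_tier.put("proj-a/run1.dat", b"123")
old_tier.put("proj-a/run2.dat", b"456")
old_tier.put("proj-b/notes.txt", b"x")
print(migrate(old_tier, new_tier, "proj-a"))  # 2
print(old_tier.keys())  # ['proj-b/notes.txt']
```

Because callers only depend on the interface, a new tier can be added as one more subclass. A project can then be moved one at a time, which fits the project-based approach above.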
11. Separate Live and Archive
• Live
– Mutable
– Addresses current requirements
– Private
• Archive
– Immutable
– Exploitable
– Transfer of responsibility
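The live/archive split can be sketched as two data types. One is a mutable live project, private to its owner. The other is a frozen archive record whose custody has passed to the institution. This is an illustrative model only. The class names, the `custodian` field and the rule that any user may read an archived project (standing in for "Exploitable") are assumptions.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class LiveProject:
    """Live phase: mutable and private to the owning research group."""
    name: str
    owner: str
    files: dict[str, bytes] = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data  # live data may be overwritten at any time

    def can_read(self, user: str) -> bool:
        return user == self.owner  # private: simplified to "owner only"


@dataclass(frozen=True)
class ArchivedProject:
    """Archive phase: immutable, with responsibility transferred to a custodian."""
    name: str
    custodian: str
    files: MappingProxyType  # read-only view of a snapshot copy

    def can_read(self, user: str) -> bool:
        return True  # assumption: "exploitable" modelled as readable by anyone


def archive(project: LiveProject, custodian: str = "institution") -> ArchivedProject:
    """Copy the live files into a read-only snapshot and hand custody over."""
    snapshot = MappingProxyType(dict(project.files))
    return ArchivedProject(project.name, custodian, snapshot)


live = LiveProject("proj-a", owner="alice")
live.write("results.csv", b"v1")
arc = archive(live)
live.write("results.csv", b"v2")  # later edits do not reach the archive
print(arc.files["results.csv"])  # b'v1'
print(live.can_read("bob"), arc.can_read("bob"))  # False True
```

Archiving takes a copy rather than a reference, so the live phase can carry on changing without affecting the archived record.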
13. iRODS
http://www.irods.org
• Bridge between different areas and storage types
– Conventional storage
– Object storage
– Tape
• Metadata store
– Possibility for enrichment during live phase
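iRODS attaches user metadata to data objects and collections as attribute-value-unit (AVU) triples held in the iCAT catalogue. The toy below mimics that shape in memory to show how a metadata store can bridge storage types: one query finds objects whether they sit on live storage or in an archive. It does not talk to iRODS. Against a real zone you would use the `imeta` icommand or the python-irodsclient library instead. The class name, paths and attribute names here are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class AVU(NamedTuple):
    """Attribute-Value-Unit triple, the shape iRODS uses for user metadata."""
    attribute: str
    value: str
    unit: Optional[str] = None


class ToyMetadataCatalog:
    """In-memory stand-in for a metadata catalogue such as the iCAT (illustration only)."""

    def __init__(self) -> None:
        self._avus: dict[str, list[AVU]] = defaultdict(list)

    def add(self, path: str, attribute: str, value: str, unit: Optional[str] = None) -> None:
        """Enrich an object with one more triple, e.g. during the live phase."""
        self._avus[path].append(AVU(attribute, value, unit))

    def get(self, path: str) -> list[AVU]:
        """Return a copy of the triples on one object (empty if none)."""
        return list(self._avus.get(path, []))

    def find(self, attribute: str, value: str) -> list[str]:
        """Return sorted paths carrying a matching pair, whatever tier they live on."""
        return sorted(
            path
            for path, avus in self._avus.items()
            if any(a.attribute == attribute and a.value == value for a in avus)
        )


cat = ToyMetadataCatalog()
cat.add("/zone/live/proj-a/run1.dat", "project", "proj-a")
cat.add("/zone/live/proj-a/run1.dat", "sample", "S-001")
cat.add("/zone/archive/proj-a/2012.tar", "project", "proj-a")
print(cat.find("project", "proj-a"))
# ['/zone/archive/proj-a/2012.tar', '/zone/live/proj-a/run1.dat']
```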
15. WOS Access
• NAS presentation
– expose WOS as CIFS or NFS mount points
– HA and backed up in WOS
• Open architecture based on Open Source projects
– easy to build upon
16. GPFS – IBM’s General Parallel File System
• The conventional choice
– POSIX filesystem
– High performance, parallel transfers
• Connect with UCL’s existing HPC resources
– Multi-cluster around UCL
• Many options for exports
– Native GPFS, Samba, scp, CNFS (Clustered NFS)
– Interactions and locking across these exports are non-trivial to manage
17. Cloud – “The storage that dare not speak its name”
• Very widely used in academia (unofficially)
– need to satisfy a very real requirement
• Rapidly evolving
– standards needed (not S3!)
• Oxygen Cloud
– Local storage
– Local authN
– Local files can be stubs or fully synchronised
18. Current state of play
• Number of projects
– 11
• Number of users
– 22
• Volume of data
– 35TB
• Continuing to deploy and slowly expanding the user base
19. Architecture overview (diagram)
• Project registration (manual process)
– Create project group in the AD/Moonshot AuthN infrastructure
– Store high-level project metadata in the iCAT metadata store
– E-mail with campus access details; e-mail with cloud access details
• Live storage: DDN SFA12K (smaller initial deployment, to scale with phase II)
– Mounted access via GPFS/CIFS/NFS/WebDAV
– Asynchronous access via scp, iRODS (sshd, gridftpd)
– Legion, CS cluster, etc. via GPFS/NFS/Samba/ssh/iRODS via PAM
– >10Gb network
• Cloud infrastructure, with storage connector and AuthN connector
– Client applications mirror local files into RD live storage
• Storage resiliency: DDN WOS
– Policy-driven replication, encryption and backup as required, with project-level granularity
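Policy-driven replication, encryption and backup with project-level granularity can be illustrated by keeping one policy record per project and deriving storage actions from it. The sketch below is hypothetical. `ProjectPolicy`, its fields and defaults, the site names and `plan_actions` are invented for illustration, and none of it uses any DDN WOS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectPolicy:
    """Hypothetical per-project storage policy; field names are illustrative."""
    project: str
    replicas: int = 2
    encrypt: bool = False
    backup: bool = False


def plan_actions(policy: ProjectPolicy, sites: list[str]) -> list[str]:
    """Turn one project's policy into the actions to apply to a newly written object."""
    if policy.replicas > len(sites):
        raise ValueError(
            f"{policy.project}: {policy.replicas} replicas requested, only {len(sites)} sites"
        )
    actions = [f"replicate to {site}" for site in sites[: policy.replicas]]
    if policy.encrypt:
        actions.append("encrypt at rest")
    if policy.backup:
        actions.append("schedule backup")
    return actions


sites = ["site-1", "site-2", "site-3"]
print(plan_actions(ProjectPolicy("proj-a"), sites))
# ['replicate to site-1', 'replicate to site-2']
print(plan_actions(ProjectPolicy("proj-med", replicas=3, encrypt=True), sites))
# ['replicate to site-1', 'replicate to site-2', 'replicate to site-3', 'encrypt at rest']
```

Keeping the policy per project means a project with sensitive data can ask for encryption and extra copies without imposing that cost on every other project.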
20. Challenges
• Authentication
– Multiple access mechanisms with single AuthN
– Cross-University collaboration (Moonshot)
• Networking
– central storage vs local NAS device
– poor connectivity to some departments
• Multiple access mechanisms
– Multiple views…