Research data zone: a secure and optimized network environment for researchers
1. Using Servers for Fast Data Transfers
Mary Hester
Relationship Manager Research
Netwerkdag 2017
14 December 2017
2. Using Servers for Fast Data Transfers
http://www.spiegel.de/wissenschaft/technik/niederlaender-wollen-radwege-mit-geothermie-beheizen-a-862937.html
3. What is the problem?
• For researchers, accessing and/or transferring data is hard.
• For example, moving data:
• To a supercomputing center
• To a local cluster
• To collaborators, etc.
4. Why this can happen…
Fasterdata: http://fasterdata.es.net/network-tuning/tcp-issues-explained/packet-loss/
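The Fasterdata page cited on this slide shows how even small packet-loss rates cut TCP throughput sharply, and more so as distance (round-trip time) grows. A common back-of-the-envelope model for this is the Mathis et al. approximation, throughput ≤ (MSS / RTT) · (C / √p) with C = √(3/2) ≈ 1.22. The sketch below simply evaluates that bound; the MSS, RTT and loss values are illustrative assumptions, not measurements from this presentation.

```python
import math

# Mathis et al. approximation of steady-state TCP (Reno-style) throughput:
#   rate <= (MSS / RTT) * (C / sqrt(p)),  with C = sqrt(3/2) ~ 1.22
MATHIS_C = math.sqrt(3 / 2)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Upper bound on one TCP stream's throughput, in bits per second."""
    if loss_rate <= 0:
        raise ValueError("the model only applies when there is some packet loss")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Illustrative values: standard 1460-byte MSS, 0.01% packet loss,
    # and a LAN-like, a national and a longer-distance RTT.
    for rtt_ms in (1, 10, 50):
        bps = mathis_throughput_bps(1460, rtt_ms / 1000, 0.0001)
        print(f"RTT {rtt_ms:>2} ms: at most {bps / 1e6:,.0f} Mbit/s")
```

Under these assumptions a single stream with 0.01% loss is capped at roughly 143 Mbit/s over a 10 ms path, and the cap falls in proportion to RTT. This is why loss that goes unnoticed on a LAN becomes a serious problem on WAN transfers.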
5. What needs to happen?
• What can we do?
• Provide infrastructure to make this possible:
• Lossless networks
• No firewalls, but still secure environments
• Servers that act as gateways for data transfers
• Provide education to support the use of this infrastructure:
• To ICT departments
• To research groups/departments as needed
6. One possible solution: DTNs and Science DMZs
• Dedicated servers for transferring data
• a.k.a. “data transfer nodes” (DTNs)
• Decouple LAN issues from the WAN
• Enable faster transfers
• Part of a higher-level concept called a Science DMZ
• End users do not log into the infrastructure directly
• Should be a seamless part of the infrastructure that improves performance for end users
[Figure: Science DMZ reference architecture: WAN (10G routed and virtual circuits) and border router; Science DMZ switch/router with a clean, high-bandwidth path to/from the WAN and a dedicated path for virtual circuit traffic; high-performance Data Transfer Node with high-speed storage (10GE/Nx10GE links); per-service security policy control points; enterprise border router/firewall in front of the site/campus LAN, with site/campus access to Science DMZ resources; perfSONAR measurement hosts.]
http://fasterdata.es.net/science-dmz/
7. One possible solution (continued)
• “High-performing” servers
• Host tuning
• Fast storage
• “High performance” is relative: 100G, 40G, 10G or multiple 1G links
• Lossless networks/connections are really important
• Security policies that do not hinder data transfers
• ACLs
• Host-based firewalls
• A limited set of ports used for applications (i.e., no web/email)
http://fasterdata.es.net/home/requirements-and-expectations/
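“Host tuning” on a DTN largely comes down to giving TCP enough buffer space to keep a fast, long path full. The data that must be in flight is the bandwidth-delay product, BDP = bandwidth × RTT. The sketch below computes it; the link speeds and RTTs are illustrative assumptions, and the Linux sysctl names in the comment are where such limits are usually raised, not settings taken from this presentation.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return round(bandwidth_bps * rtt_s / 8)


# Illustrative paths (assumed RTTs, not measurements).
EXAMPLE_PATHS = [
    ("1 Gbit/s, 10 ms RTT", 1e9, 0.010),
    ("10 Gbit/s, 10 ms RTT", 10e9, 0.010),
    ("10 Gbit/s, 100 ms RTT", 10e9, 0.100),
]

if __name__ == "__main__":
    for label, bw, rtt in EXAMPLE_PATHS:
        # On Linux, the maximum values in net.ipv4.tcp_rmem / net.ipv4.tcp_wmem
        # (and net.core.rmem_max / wmem_max) should be at least about this large.
        print(f"{label}: ~{bdp_bytes(bw, rtt) / 2**20:.1f} MiB of TCP buffer")
```

A buffer of a few MiB covers 1 Gbit/s over 10 ms, but under these assumptions a 10 Gbit/s path with a 100 ms RTT needs on the order of 120 MiB per stream.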
8. Relative comparison for data transfers
Approximate data volume transferred in:
Connection                                   1 minute   1 hour   1 day
Home internet (100 Mbps)                     700 MB     40 GB    1 TB
Campus internet (1,000+ Mbps)                7 GB       400 GB   10 TB
High performance (10,000+ Mbps = 10 Gbps)    70 GB      4 TB     100 TB
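The table values follow from simple arithmetic: volume = rate × time ÷ 8 bits per byte, rounded down on the slide. A small sketch to reproduce or extend them, assuming the link is fully used for the whole period with no protocol overhead:

```python
RATES_BPS = {
    "Home internet": 100e6,
    "Campus internet": 1e9,
    "High performance": 10e9,
}
PERIODS_S = {"1 minute": 60, "1 hour": 3600, "1 day": 86400}


def volume_bytes(rate_bps: float, seconds: float) -> float:
    """Bytes moved at a constant line rate (ignores protocol overhead)."""
    return rate_bps * seconds / 8


def human(n_bytes: float) -> str:
    """Format with decimal (SI) units, as link speeds are quoted."""
    for unit in ("B", "kB", "MB", "GB", "TB", "PB"):
        if n_bytes < 1000:
            return f"{n_bytes:.3g} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.3g} EB"


if __name__ == "__main__":
    print(f"{'':<18}" + "".join(f"{p:>12}" for p in PERIODS_S))
    for name, rate in RATES_BPS.items():
        cells = [human(volume_bytes(rate, s)) for s in PERIODS_S.values()]
        print(f"{name:<18}" + "".join(f"{c:>12}" for c in cells))
```

This gives the unrounded figures behind the table (for example 750 MB per minute at 100 Mbps and 4.5 TB per hour at 10 Gbps). Sustained rates in practice are lower, which is what the packet-loss slide is about.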
9. Other work in Europe
• People have been building networks like this for years
• HPC centers
• HEP facilities…
• JISC
• Jasmin Compute has a Science DMZ architecture
• Protocols
• The Spanish R&E community is investigating the performance of data transfer protocols (e.g., ASPERA)
• NII is working with MMCFTP
• HEP/CERN are looking into solutions beyond the Globus Toolkit (a GridFTP-based service)
11. UMC Research LAN Pilot
A common, virtual and trusted research infrastructure for University Medical Centers
Paul van Dijk, SURF
12. The challenge
• 8 UMCs in NL
• Researchers dealing with huge data sets
• Omics data, e.g. 75 GB per person for a full genome
• Imaging data
• Collaboration is key!
• How to deal with growing demands for data transfers and compute scale-out?
• How can Science DMZ concepts help?
13. The challenge
Can we create one virtual pool of resources?
• How can data and resources be shared in a safe, high-performance way?
• Requirements and perspectives?
• Researchers
• Resource owners
• (Research) IT staff
• Security officer
14. What is needed?
1. Facilities and approaches that help establish sufficient trust that UMCs are willing to open up internal resources to each other
2. High-performance configurations and solutions
15. From 1 to 2 network zones
• General purpose zone: many small files
• Research Data Zone: very large files
• Borrowing concepts from the “Science DMZ”
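One way to see why the slide separates “many small files” from “very large files” is a toy cost model, assumed here purely for illustration: each file pays a fixed per-file overhead (open, metadata, handshake, checksum), and the payload then streams at the link rate. The 10 ms overhead is a hypothetical parameter, not a measurement.

```python
def transfer_time_s(n_files: int, total_bytes: float, rate_bps: float,
                    per_file_overhead_s: float) -> float:
    """Toy model: a fixed cost per file plus streaming time for all bytes."""
    return n_files * per_file_overhead_s + total_bytes * 8 / rate_bps


if __name__ == "__main__":
    total = 1e12      # 1 TB in both cases
    rate = 10e9       # 10 Gbit/s link
    overhead = 0.01   # hypothetical 10 ms per-file overhead
    one_big = transfer_time_s(1, total, rate, overhead)
    many_small = transfer_time_s(1_000_000, total, rate, overhead)  # 1 MB files
    print(f"1 x 1 TB file:     {one_big / 60:6.1f} min")
    print(f"1,000,000 x 1 MB:  {many_small / 60:6.1f} min")
```

In this model the same terabyte takes about 13 minutes as one file but about three hours as a million small files, so the two workloads need different tuning, which supports keeping them in separate zones.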
17. UMC Research Data Zones – Interconnected
• Multipoint VPN using L3VPN
• Only one MSP port needed
• BGP routing via SURFnet core routers
• Facilitates:
• data transfers
• compute scale-out
• both, in all directions
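Inside a multipoint L3VPN, each site announces its research prefixes (here via BGP through the SURFnet core), and routers forward traffic by longest-prefix match on the shared VPN routing table (VRF). A minimal sketch of that lookup follows; the prefixes and site names are hypothetical, and the routes are hard-coded instead of being learned via BGP.

```python
import ipaddress
from typing import Optional

# Hypothetical VRF of the shared research L3VPN. In the pilot such routes
# would be exchanged via BGP; here they are simply hard-coded.
VRF_ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): "UMC-A research zone",
    ipaddress.ip_network("10.2.0.0/16"): "UMC-B research zone",
    ipaddress.ip_network("10.2.8.0/24"): "UMC-B DTN subnet",
    ipaddress.ip_network("10.3.0.0/16"): "UMC-C research zone",
}


def lookup(dst_ip: str) -> Optional[str]:
    """Longest-prefix match, as a router does when forwarding within the VRF."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in VRF_ROUTES if addr in net]
    if not matches:
        return None  # no route in this VRF: the packet is dropped
    return VRF_ROUTES[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    for dst in ("10.2.8.5", "10.2.9.5", "10.3.1.1", "192.168.1.1"):
        print(dst, "->", lookup(dst))
```

In this model, connecting another partner means announcing one more prefix into the same VRF, which fits the single-port, multipoint design described above.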
18. Next Steps
1. Add more partners
2. A common policy
3. Using federated identities for access control
19. Conclusions
• So far... happy researchers
• Minimal impact on (research) ICT staff after the initial setup
• General purpose network “off-loaded”
• From 8 UMCs to 1 UMC with 8 locations: a “local” national private UMC network
It feels like remote clusters are available locally
Fast data transfer speeds achieved
21. Ad hoc support is not scalable
Example projects: Climatology (UU), Population Imaging (LUMC), Bacterial drug resistance discovery (TUDelft), and many more...
• Ad hoc support is not efficient
• Compatibility is problematic
• Setting things up takes too much time for a single research project
Optimization
• Larger packet size (jumbo frames)
• Other network protocols (UDP)
• Specialized data transfer software (GridFTP)
• Access control instead of a firewall
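The first optimization, jumbo frames, reduces per-packet work for hosts and network devices. The sketch below assumes IPv4 and TCP headers without options (40 bytes), so a 1500-byte MTU gives a 1460-byte MSS and a 9000-byte MTU gives 8960. It compares the packet rate needed for 10 Gbit/s and notes that, in the Mathis et al. loss model, the per-stream throughput ceiling scales linearly with MSS.

```python
IP_TCP_HEADER_BYTES = 40  # assumption: IPv4 (20) + TCP (20), no options


def mss_for_mtu(mtu: int) -> int:
    """TCP payload per packet for a given IP MTU."""
    return mtu - IP_TCP_HEADER_BYTES


def packets_per_second(rate_bps: float, mtu: int) -> float:
    """Full-size packets per second needed to carry rate_bps (ignores L2 framing)."""
    return rate_bps / (mtu * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        pps = packets_per_second(10e9, mtu)
        print(f"MTU {mtu}: MSS {mss_for_mtu(mtu)} B, ~{pps:,.0f} packets/s at 10 Gbit/s")
    # In the Mathis et al. model the loss-limited ceiling is proportional to MSS.
    print(f"Loss-limited ceiling gain: {mss_for_mtu(9000) / mss_for_mtu(1500):.1f}x")
```

Jumbo frames only help if every device on the path supports the larger MTU, which is easier to guarantee inside a dedicated research zone than across a general-purpose campus network.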
22. Science DMZ concept
• Developed in the US
• Dedicated network zone for research data and services
• Optimized for research data
• Data Transfer Nodes with high throughput
• Standardized solution
• Compatibility
[Figure: simplified Science DMZ architecture: WAN and border router; Science DMZ switch/router with a clean, high-bandwidth WAN path; high-performance Data Transfer Node with high-speed storage on 10GE links; per-service security policy control points; enterprise border router/firewall in front of the site/campus LAN, with site/campus access to Science DMZ resources; perfSONAR measurement hosts.]
Fasterdata knowledge base:
http://fasterdata.es.net/science-dmz/
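The “access control instead of a firewall” optimization and the per-service security policy control points of the Science DMZ design both come down to a short, default-deny allow-list per service instead of inspecting every packet. A minimal sketch of such an ACL check for one DTN follows. It assumes hypothetical partner prefixes (IETF documentation ranges), the standard GridFTP control port 2811 and an assumed data-channel port range of 50000-51000. Real deployments would express this in router ACLs or a host-based firewall, not in Python.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    src: ipaddress.IPv4Network  # permitted source prefix
    ports: range                # permitted destination ports
    service: str


# Hypothetical allow-list for one DTN; prefixes stand in for partner zones.
PARTNERS = ["192.0.2.0/24", "198.51.100.0/24"]
DTN_ACL: List[Rule] = []
for prefix in PARTNERS:
    net = ipaddress.ip_network(prefix)
    DTN_ACL.append(Rule(net, range(2811, 2812), "GridFTP control"))
    DTN_ACL.append(Rule(net, range(50000, 51001), "GridFTP data (assumed range)"))


def permitted(src_ip: str, dst_port: int, acl: List[Rule] = DTN_ACL) -> Optional[str]:
    """Return the matching service name, or None (default deny)."""
    addr = ipaddress.ip_address(src_ip)
    for rule in acl:
        if addr in rule.src and dst_port in rule.ports:
            return rule.service
    return None


if __name__ == "__main__":
    for src, port in [("192.0.2.10", 2811), ("198.51.100.7", 50500),
                      ("192.0.2.10", 443), ("203.0.113.5", 2811)]:
        print(src, port, "->", permitted(src, port) or "denied")
```

The default deny and the absence of web or e-mail ports mirror the “limited set of ports used for applications” on slide 7.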
28. Discussion
• How can we roll this out in a scalable way?
• Minimizing costs and manpower
• What could SURF's role be?
• Knowledge and expertise?
• Coordinating standardization?
• Managing DTNs on site?
• What is the role of the institutions?
• Campus network?
• Supporting researchers?