HPCNow! outlines its dynamic provisioning of hybrid nodes, used primarily for HPC. OpenNebula is a fundamental component, providing the required flexibility and ease of use.
3. Quick introduction to HPCNow!
● Global HPC consulting company
● IT + scientific background
● HPC services and solutions
● User-oriented company
● Hardware agnostic
Company overview
7. User environment
User libraries, Modules, EasyBuild, Spack
Development tools
Compilers: GNU, Intel, PGI, IBM XL compilers
Debuggers and profilers: VTune, DDT, GDB
Scientific and engineering applications
More than 100 references. Contact us to know more.
13. What is High Performance Computing?
Many tasks and/or threads working together to solve different parts of a single larger problem.
This is achieved with parallel programming, which usually requires large shared-memory systems or a low-latency, high-bandwidth network.
Motivation
14. HPC users need more than just a compute solution
❅ Workflow: pre-processing and post-processing, workflow frameworks, ...
❅ Web services: RStudio, Galaxy, Jupyter notebook, JMS, ...
❅ Software managers: Anaconda, EasyBuild, Spack, ...
❅ Prebuilt software: Docker, Singularity, VM images (NeuroDebian, ...), ...
15. Convergence Solution
HPC Cluster, Singularity, Docker Swarm, OpenNebula
Allows the HPC solution to be dynamically re-architected or re-purposed to accommodate different roles and user needs.
20. Global configuration
● OpenNebula v5.6.0
● Ceph v13.2.1 (Mimic)
● Datastores
○ standard Ceph configuration
■ cephds, type Image
■ ceph_system, type System
● Nodes with the KVM hypervisor
● NICs with the virtio model
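The two datastores listed above could be described with templates along the following lines. This is a minimal sketch, not the configuration used in the deployment: the attribute names follow the OpenNebula Ceph datastore templates, the pool name `one` is taken from the `rbd` command shown later in the slides, and the values for `CEPH_HOST`, `CEPH_USER`, `CEPH_SECRET` and `BRIDGE_LIST` are placeholders. The helper name `write_ceph_templates` is hypothetical.

```shell
# Sketch: write OpenNebula templates for a Ceph image datastore and a
# Ceph system datastore into the directory given as first argument.
# All host names, user names and the secret UUID are placeholders.
write_ceph_templates() {
    outdir=$1

    # Image datastore (holds VM images as RBD volumes)
    cat > "$outdir/cephds.tmpl" <<'EOF'
NAME        = "cephds"
TYPE        = "IMAGE_DS"
DS_MAD      = "ceph"
TM_MAD      = "ceph"
DISK_TYPE   = "RBD"
POOL_NAME   = "one"
CEPH_HOST   = "ceph-mon1:6789"
CEPH_USER   = "libvirt"
CEPH_SECRET = "00000000-0000-0000-0000-000000000000"
BRIDGE_LIST = "node01"
EOF

    # System datastore (holds the disks of running VMs)
    cat > "$outdir/ceph_system.tmpl" <<'EOF'
NAME        = "ceph_system"
TYPE        = "SYSTEM_DS"
TM_MAD      = "ceph"
POOL_NAME   = "one"
CEPH_HOST   = "ceph-mon1:6789"
CEPH_USER   = "libvirt"
CEPH_SECRET = "00000000-0000-0000-0000-000000000000"
BRIDGE_LIST = "node01"
EOF
}
```

On the front-end, the generated files would then be registered with `onedatastore create cephds.tmpl` and `onedatastore create ceph_system.tmpl` after replacing the placeholder values.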
Architecture
22. Stumbling blocks along the way
● Snapshots
○ datastore for images configured as raw
■ recommended for Ceph when using RBD
○ images are stored as raw, even when created as qcow2
○ snapshot of the system disk, and recovery from Ceph
■ rbd ls -l -p one
example output (image and snapshot names):
one-2-103-0
one-2-103-0@0
one-2-104-0
● Bridge destroyed when no virtual NIC is linked
○ set keep_empty_bridge to true in /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf
■ a bug prevents the config from being transferred to the hypervisors at /var/tmp/one/etc/vnm/OpenNebulaNetwork.conf
○ create the virtual network with PHYDEV unset
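The keep_empty_bridge change described above can be scripted. The sketch below assumes the option appears in OpenNebulaNetwork.conf as a YAML line of the form `:keep_empty_bridge: false`; the function name `enable_keep_empty_bridge` is hypothetical.

```shell
# Sketch: switch :keep_empty_bridge: to true in the given config file.
# A backup of the original file is kept next to it with a .bak suffix.
enable_keep_empty_bridge() {
    conf=$1
    # Rewrite only the line that starts with the option name;
    # every other line is left untouched.
    sed -i.bak 's/^\(:keep_empty_bridge:\).*/\1 true/' "$conf"
}
```

A possible use is `enable_keep_empty_bridge /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf` on the front-end. Because of the transfer bug mentioned above, the edited file may also need to be copied by hand to /var/tmp/one/etc/vnm/ on each hypervisor.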
Implementation
23. Stumbling blocks along the way
● VMs could not communicate with each other
○ set the net.bridge.bridge-nf-call-iptables parameter to 0
○ tried to make it persistent in /etc/sysctl.d/bridge-nf-call.conf and /usr/lib/sysctl.d/00-system.conf
■ a bug prevents this from working: when sysctl runs, the bridge kernel module is not yet loaded
○ fixed by modifying /usr/lib/systemd/system/libvirtd.service (added lines are marked with +)
Type=notify
EnvironmentFile=-/etc/sysconfig/libvirtd
ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS
+ExecStartPost=/usr/bin/sleep 30s
+ExecStartPost=/usr/sbin/sysctl -w net.bridge.bridge-nf-call-iptables=0
+ExecStartPost=/usr/sbin/sysctl -p
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
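Files under /usr/lib/systemd/system belong to the package and can be overwritten by an update. As an alternative to the edit shown above (not the method used in the slides), the same three ExecStartPost lines could be placed in a systemd drop-in, which systemd appends to the existing unit. The sketch below only writes the drop-in file; the function name `write_libvirtd_dropin` and the file name `bridge-nf.conf` are hypothetical.

```shell
# Sketch: create a systemd drop-in that repeats the ExecStartPost lines
# from the slide without touching the packaged libvirtd.service.
# The drop-in directory is normally /etc/systemd/system/libvirtd.service.d
write_libvirtd_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/bridge-nf.conf" <<'EOF'
[Service]
ExecStartPost=/usr/bin/sleep 30s
ExecStartPost=/usr/sbin/sysctl -w net.bridge.bridge-nf-call-iptables=0
ExecStartPost=/usr/sbin/sysctl -p
EOF
}
```

After `write_libvirtd_dropin /etc/systemd/system/libvirtd.service.d`, running `systemctl daemon-reload` and `systemctl restart libvirtd` would activate the change.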
24. Stumbling blocks along the way
● VM creation from Sunstone ended in FAILED status
○ error: Cannot check QEMU binary /usr/bin/qemu-system-x86_64: No such file or directory
■ fixed with: ln -s /usr/libexec/qemu-kvm /usr/bin/qemu-system-x86_64
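When the symlink has to be created on many hypervisors, a guarded version of the command avoids errors on nodes where the path already exists. This is a minimal sketch; the function name `link_qemu_binary` is hypothetical and the default paths are the ones from the slide.

```shell
# Sketch: create the qemu-system-x86_64 symlink only when it is needed.
# $1 = existing QEMU binary, $2 = path expected by libvirt/OpenNebula.
link_qemu_binary() {
    target=${1:-/usr/libexec/qemu-kvm}
    link=${2:-/usr/bin/qemu-system-x86_64}

    # Refuse to create a dangling link.
    if [ ! -x "$target" ]; then
        echo "QEMU binary not found: $target" >&2
        return 1
    fi

    # Nothing to do if the expected path already resolves to a file.
    if [ -e "$link" ]; then
        return 0
    fi

    # -f replaces a stale (broken) symlink, -n avoids following it.
    ln -sfn "$target" "$link"
}
```

Called without arguments as root, `link_qemu_binary` performs the same step as the `ln -s` command above and can be repeated safely.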
26. Conclusions
● We architected and implemented a solution that deploys nodes with a hybrid role.
● This solution allows the cluster to be dynamically re-purposed to accommodate user needs.
● OpenNebula proved to be an easy tool to install, deploy and manage.
● The forum provided useful tips and collaboration for troubleshooting issues.