How to create a secured multi tenancy for clustered ML with JupyterHub
1. How to create a secured multi-tenancy for clustered ML with JupyterHub
Non-root + JupyterHub + Kerberos + IPython Cluster as a service
2. Introduction
After this presentation you should be able to build a Kerberos-secured architecture for interactive data analysis and machine learning, using Jupyter/JupyterHub backed by IPython Clusters that distribute processing across local and/or remote nodes, all running as a service under a non-root user.
3. Architecture
This architecture enables the following:
● Transparent data-science development
● User Authentication
● Authentication via Kerberos + SSH
● Cluster upgrades won’t affect ongoing development.
● Controlled access to data and resources via Kerberos tickets.
● Several coding APIs (Scala, R, Python, PySpark, etc.).
● Parallel Processing
● JupyterHub as a service, run by a non-root user
5. Pre-Assumptions
1. Jupyter Machine hostname: cm1.localdomain
2. Controller Node hostname: cm1.localdomain; Engine Node hostname: cm2.localdomain
3. Conda Python version: 3.8.5
4. Jupyter Machine Authentication Pre-Installed: Kerberos
a. Kerberos Realm DOMAIN.COM
5. JupyterHub Machine Authentication Not-Installed: Kerberos
6. Permissions: a user with root or sudo access
7. MIT Kerberos installed on your Windows machine
6. Miniconda
Add Anaconda User/Dir
adduser anaconda;
passwd anaconda;
mkdir /opt/anaconda;
Download and installation
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -P /tmp;
chmod +x /tmp/Miniconda3-latest-Linux-x86_64.sh;
/tmp/Miniconda3-latest-Linux-x86_64.sh -b -u -p /opt/anaconda;
Note 1: Replace the highlighted values with your own.
Note 2: JupyterHub requires Python 3.x, so Miniconda 3 is installed.
Set Miniconda Permissions
chown -R anaconda:anaconda /opt/anaconda;
chmod -R go-w /opt/anaconda && chmod -R go+rX /opt/anaconda;
mkdir -p /apps/anaconda/pkgs;
chown -R anaconda:anaconda /apps/anaconda/pkgs && chmod -R oug+rwx /apps;
7. Anaconda
Set Conda Bash Configurations
nano .bashrc;
export CONDA_PKGS_DIRS="/apps/anaconda/pkgs","/opt/anaconda/pkgs","/home/$USER/.conda/pkgs"
export CONDA_ENVS_DIRS="/apps/anaconda/$USER/envs"
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/anaconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/anaconda/etc/profile.d/conda.sh" ]; then
        . "/opt/anaconda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/anaconda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
conda config --set auto_update_conda False && conda config --add channels conda-forge;
conda config --set pip_interop_enabled True;
Note: Replace the highlighted values with your own.
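To make the effect of the two exports concrete, the following Python sketch mimics how the shell expands them for a given user. It is only an illustration: the function name conda_dirs_for is hypothetical, and it assumes conda splits CONDA_PKGS_DIRS on commas.

```python
def conda_dirs_for(user):
    """Return (pkgs_dirs, envs_dirs) as the .bashrc exports would set them for `user`."""
    # In bash, adjacent quoted strings joined by commas form ONE comma-separated value.
    pkgs_value = ",".join([
        "/apps/anaconda/pkgs",
        "/opt/anaconda/pkgs",
        f"/home/{user}/.conda/pkgs",
    ])
    envs_value = f"/apps/anaconda/{user}/envs"
    # Assumption: conda treats the comma as the separator for CONDA_PKGS_DIRS.
    return pkgs_value.split(","), [envs_value]


pkgs, envs = conda_dirs_for("jupyter")
print(pkgs)
print(envs)
```

For the jupyter user the environments directory resolves to /apps/anaconda/jupyter/envs, which matches the prefix of the sudospawner path used in the sudoers and JupyterHub configuration.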
8. Jupyter or JupyterHub?
JupyterHub is a multi-user notebook server that:
● Manages authentication.
● Spawns single-user notebook servers on demand.
● Gives each user a complete notebook server.
How to choose?
9. JupyterHub
JupyterHub must run with root privileges, or at least with some of them (for example, to read the PAM passwords). We therefore configure a dedicated user (with no password) that the sudospawner will use.
In this example, the user jupyter and the group jupyterhub are used to run the JupyterHub server as a service. Any new user who needs to access Jupyter and spawn notebooks must be added to the jupyterhub group.
Create User/Group to operate as Service
sudo useradd jupyter && sudo groupadd jupyterhub && sudo usermod jupyter -G jupyterhub;
Add jupyter to root group & Give Read Permissions (PAM)
sudo usermod -a -G root jupyter; sudo chmod g+r /etc/shadow;
Log in as the jupyter user
su - jupyter;
Note 1: Only the highlighted values need to be changed.
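Since every notebook user has to be in the jupyterhub group, a quick membership check can save debugging time later. The sketch below uses Python's standard grp module; the helper names and the example user list are hypothetical, and only supplementary group members (as set by usermod -G) are reported.

```python
import grp


def missing_from_group(users, group_members):
    """Return the users that are not yet listed as members of the group."""
    members = set(group_members)
    return [user for user in users if user not in members]


def group_members(group="jupyterhub"):
    """Look up the supplementary members of a group; empty list if it does not exist."""
    try:
        return list(grp.getgrnam(group).gr_mem)
    except KeyError:
        return []


# Hypothetical list of accounts that should be able to spawn notebooks.
wanted = ["jupyter", "tpsimoes"]
print(missing_from_group(wanted, group_members()))
```

Any name printed still needs to be added to the group, for example with usermod -a -G jupyterhub followed by the user name.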
10. JupyterHub
Set Conda Bash Configurations
Use the configuration from slide 7.
Create Environment for JupyterHub
conda create -n jupyterhub_env;
Activate Environment for JupyterHub
conda activate jupyterhub_env;
Install JupyterHub Packages
conda install jupyterhub jupyterlab notebook configurable-http-proxy;
Install sudospawner Package
conda install -c conda-forge sudospawner;
Check sudospawner location
which sudospawner;
Note 1: Only the highlighted values need to be changed.
Create JupyterHub Directories
sudo mkdir /etc/jupyterhub;
sudo chown jupyter:jupyterhub /etc/jupyterhub;
Generate JupyterHub Config file
cd /etc/jupyterhub && jupyterhub --generate-config;
11. JupyterHub
Create/Edit sudoers config
sudo nano /etc/sudoers.d/jupytersudoers;
Runas_Alias JUPYTER_USERS = jupyter
Cmnd_Alias JUPYTER_CMD = /apps/anaconda/jupyter/envs/jupyterhub_env/bin/sudospawner
%jupyterhub ALL=(jupyter) /usr/bin/sudo
jupyter ALL=(%jupyterhub) NOPASSWD:JUPYTER_CMD
Start JupyterHub Server With Config File
jupyterhub -f /etc/jupyterhub/jupyterhub_config.py;
Note: Only the highlighted values need to be changed, e.g. replace the IP address with your own.
Edit JupyterHub config file
nano /etc/jupyterhub/jupyterhub_config.py;
import os
import pwd
import subprocess

# Create the user's home directory before the first spawn if it does not exist yet.
def create_dir_hook(spawner):
    if not os.path.exists(os.path.join('/home/', spawner.user.name)):
        subprocess.call(["sudo", "/sbin/mkhomedir_helper", spawner.user.name])

c.Spawner.pre_spawn_hook = create_dir_hook
c.JupyterHub.bind_url = 'http://10.111.22.333:8000'
c.JupyterHub.hub_bind_url = 'http://10.111.22.333:8081'
c.JupyterHub.hub_ip = '10.111.22.333'
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
c.SudoSpawner.sudospawner_path = '/apps/anaconda/jupyter/envs/jupyterhub_env/bin/sudospawner'
c.Authenticator.admin_users = {'jupyter'}
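The same IP address appears three times in the configuration, so it is easy to update one value and forget another. As a minimal sketch, the three settings can be derived from a single value; the helper hub_urls and the address 192.0.2.10 are hypothetical placeholders, not part of JupyterHub.

```python
def hub_urls(ip, public_port=8000, hub_port=8081):
    """Derive the three address settings used in jupyterhub_config.py from one IP."""
    return {
        "bind_url": f"http://{ip}:{public_port}",
        "hub_bind_url": f"http://{ip}:{hub_port}",
        "hub_ip": ip,
    }


# Placeholder address; replace it with the IP of your JupyterHub machine.
urls = hub_urls("192.0.2.10")
for name, value in urls.items():
    print(f"c.JupyterHub.{name} = {value!r}")
```

Inside jupyterhub_config.py, where the object c is available, the values would be assigned as c.JupyterHub.bind_url = urls["bind_url"] and so on.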
12. JupyterHub
Create systemd JupyterHub Directory
sudo mkdir -p /home/jupyter/.config/systemd;
Create systemd JupyterHub service Configuration
sudo nano /home/jupyter/.config/systemd/jupyterhub.service;
[Unit]
Description=Jupyterhub Server
After=syslog.target network-online.target
[Service]
Type=simple
User=jupyter
ExecStart=/etc/jupyterhub/runJupyterhub.sh
WorkingDirectory=/etc/jupyterhub
Restart=on-failure
RestartSec=1min
TimeoutSec=5min
[Install]
WantedBy=multi-user.target
Note: Only the highlighted values need to be changed.
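Because the goal is to run JupyterHub as a non-root service, it can help to sanity-check the unit file before enabling it. The sketch below parses the unit text with Python's configparser, which is an assumption that only holds for simple unit files like this one (no repeated keys); check_unit is a hypothetical helper, not a systemd tool.

```python
import configparser

UNIT_TEXT = """\
[Unit]
Description=Jupyterhub Server
After=syslog.target network-online.target

[Service]
Type=simple
User=jupyter
ExecStart=/etc/jupyterhub/runJupyterhub.sh
WorkingDirectory=/etc/jupyterhub
Restart=on-failure
RestartSec=1min
TimeoutSec=5min

[Install]
WantedBy=multi-user.target
"""


def check_unit(text):
    """Return a list of problems found in a simple systemd unit file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of keys such as ExecStart
    parser.read_string(text)
    service = parser["Service"] if parser.has_section("Service") else {}
    problems = []
    # systemd runs a system service as root when no User= is given.
    if service.get("User", "root") == "root":
        problems.append("service runs as root")
    if not service.get("ExecStart", "").startswith("/"):
        problems.append("ExecStart is not an absolute path")
    return problems


print(check_unit(UNIT_TEXT))
```

An empty list means the two checks passed; it does not replace validation by systemd itself.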
Create JupyterHub Script for Systemd
nano /etc/jupyterhub/runJupyterhub.sh;
#!/bin/bash
export CONDA_PKGS_DIRS="/apps/anaconda/pkgs","/opt/anaconda/pkgs","/home/$USER/.conda/pkgs"
export CONDA_ENVS_DIRS="/apps/anaconda/$USER/envs"
__conda_setup="$('/opt/anaconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/anaconda/etc/profile.d/conda.sh" ]; then
        . "/opt/anaconda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/anaconda/bin:$PATH"
    fi
fi
unset __conda_setup
conda activate /apps/anaconda/jupyter/envs/jupyterhub_env
/apps/anaconda/jupyter/envs/jupyterhub_env/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py 2>&1 | tee /var/log/jupyter/jupyterhub.log
13. JupyterHub
Create systemd JupyterHub service symbolic link
sudo ln -s /home/jupyter/.config/systemd/jupyterhub.service /etc/systemd/system/jupyterhub.service;
Enable/Start systemd JupyterHub service
sudo systemctl enable jupyterhub.service;
sudo systemctl start jupyterhub && systemctl status jupyterhub;
Note: Only the highlighted values need to be changed.
14. IPython Clusters
This functionality lets the architecture distribute your Python processing across local and/or remote CPUs, making use of parallel processing.
Install ipyparallel
conda install ipyparallel;
Note: This package must be installed on the controller machine and on all remote engine nodes!
Apply to All Users
jupyter nbextension install --sys-prefix --py ipyparallel;
jupyter nbextension enable --sys-prefix --py ipyparallel;
jupyter serverextension enable --sys-prefix --py ipyparallel;
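Once a cluster is running, a notebook can hand work to the engines through an ipyparallel view. The following is a minimal sketch rather than the presentation's own code: square and parallel_squares are hypothetical helpers, and the profile name passed to connect is whichever profile your cluster was started with.

```python
def square(x):
    """Toy workload executed on the engines."""
    return x * x


def parallel_squares(view, values):
    """Apply square() to every value through the view's blocking map_sync()."""
    return list(view.map_sync(square, values))


def connect(profile="default"):
    """Return a DirectView over all engines of a running cluster."""
    # Imported lazily so the helpers above can be used without ipyparallel installed.
    import ipyparallel as ipp

    client = ipp.Client(profile=profile)
    return client[:]


# Typical use inside a notebook, with a cluster already started:
#   view = connect(profile="default")
#   print(parallel_squares(view, range(8)))
```

Any object with a compatible map_sync method can stand in for the view, which makes the helper easy to try without a cluster.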
15. IPython Clusters
Create the ssh profile for the user
ipython profile create --parallel --profile=ssh;
Note: This is done in the scope of the user who will run/spawn the notebook, e.g. tpsimoes.
Configure the ssh profile for the user
nano /home/tpsimoes/.ipython/profile_ssh/ipcluster_config.py;
c.IPClusterStart.controller_launcher_class = 'Local'
c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engines = { 'cm1.localdomain' : 2, 'cm2.localdomain' : 5 }
nano /home/tpsimoes/.ipython/profile_ssh/ipcontroller_config.py;
c.IPControllerApp.location = 'cm1.localdomain'
c.HubFactory.client_ip = '10.111.22.333'
c.HubFactory.engine_ip = '10.111.22.333'
c.HubFactory.ip = '*'
Note: Only the highlighted values need to be changed.
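The engines dictionary determines how many engines the cluster should start in total. A small sketch, assuming plain integer counts per host as in the configuration above (the helper names are hypothetical), shows how to compute that number and compare it with the engines that actually registered:

```python
# Same mapping as c.SSHEngineSetLauncher.engines: host name -> number of engines.
ENGINES = {"cm1.localdomain": 2, "cm2.localdomain": 5}


def expected_engine_count(engines):
    """Total number of engines requested across all hosts."""
    return sum(engines.values())


def all_engines_registered(engines, registered_ids):
    """True when at least the requested number of engines has registered."""
    return len(registered_ids) >= expected_engine_count(engines)


print(expected_engine_count(ENGINES))  # prints 7
```

In a notebook, registered_ids would typically be the ids attribute of an ipyparallel Client connected to the ssh profile.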
16. IPython Clusters
For the IPython Cluster controller (SSH profile) to communicate with all the engines (local and remote), SSH must be configured on the local machine and on the remote nodes.
KeyLess Configuration
ssh-keygen;
Copy the SSH Public Key (id_rsa.pub) to the user account on your target hosts.
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 22 tpsimoes@cm2.localdomain;
Add the SSH Public Key to the authorized_keys file on your target hosts.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys;
Add User to SSH
ssh tpsimoes@localhost;
ssh tpsimoes@cm1.localdomain;
ssh tpsimoes@cm2.localdomain;
Try connecting User via SSH
ssh -p '22' 'tpsimoes@cm2.localdomain';
Note: Only the highlighted values need to be changed.
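The engine launcher cannot answer a password prompt, so it is worth confirming that key-based login works non-interactively on every host. The sketch below wraps the ssh client with the BatchMode and ConnectTimeout options; the helper names are hypothetical and the timeout values are arbitrary choices.

```python
import subprocess


def ssh_check_command(user, host, port=22):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never ask for a password or passphrase
        "-o", "ConnectTimeout=5",   # give up quickly on unreachable hosts
        "-p", str(port),
        f"{user}@{host}",
        "true",                     # harmless remote command
    ]


def can_login(user, host, port=22):
    """Return True if a passwordless SSH login to the host succeeds."""
    try:
        result = subprocess.run(
            ssh_check_command(user, host, port),
            capture_output=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Example, to be run as the notebook user:
#   for host in ("localhost", "cm1.localdomain", "cm2.localdomain"):
#       print(host, can_login("tpsimoes", host))
```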
17. IPython Clusters
When starting a cluster via the JupyterHub UI, you should see the communication between machines in your logs:
JupyterHub Logs
[I 2021-02-22 14:28:43.979 SingleUserNotebookApp launcher:591] ensuring remote cm1.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm1.localdomain closed.
[I 2021-02-22 14:28:44.776 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-client.json to cm1.localdomain:.ipython/profile_ssh/security/ipcontroller-client.json
[I 2021-02-22 14:28:45.573 SingleUserNotebookApp launcher:591] ensuring remote cm1.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm1.localdomain closed.
[I 2021-02-22 14:28:46.405 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-engine.json to cm1.localdomain:.ipython/profile_ssh/security/ipcontroller-engine.json
[I 2021-02-22 14:28:47.308 SingleUserNotebookApp launcher:591] ensuring remote cm2.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm2.localdomain closed.
[I 2021-02-22 14:28:48.087 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-client.json to cm2.localdomain:.ipython/profile_ssh/security/ipcontroller-client.json
[I 2021-02-22 14:28:48.875 SingleUserNotebookApp launcher:591] ensuring remote cm2.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm2.localdomain closed.
[I 2021-02-22 14:28:49.652 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-engine.json to cm2.localdomain:.ipython/profile_ssh/security/ipcontroller-engine.json
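To confirm that every host received both connection files, the "sending ... to ..." entries can be extracted from the log. This is a rough sketch: the regular expression is written against the log lines shown above and may need adjusting for other ipyparallel versions, and files_sent is a hypothetical helper.

```python
import re

# Matches "sending <local path> to <host>:<remote path>", even if the line was wrapped.
SEND_RE = re.compile(r"sending (?P<src>\S+) to\s+(?P<host>[^:\s]+):(?P<dest>\S+)")


def files_sent(log_text):
    """Map each remote host to the file names that were copied to it."""
    sent = {}
    for match in SEND_RE.finditer(log_text):
        file_name = match.group("dest").rsplit("/", 1)[-1]
        sent.setdefault(match.group("host"), []).append(file_name)
    return sent


SAMPLE = (
    "[I 2021-02-22 14:28:44.776 SingleUserNotebookApp launcher:595] sending "
    "/home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-client.json to "
    "cm1.localdomain:.ipython/profile_ssh/security/ipcontroller-client.json\n"
    "Connection to cm1.localdomain closed.\n"
    "[I 2021-02-22 14:28:46.405 SingleUserNotebookApp launcher:595] sending "
    "/home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-engine.json to "
    "cm1.localdomain:.ipython/profile_ssh/security/ipcontroller-engine.json\n"
)

print(files_sent(SAMPLE))
```

A host that is missing from the result, or that lacks one of the two JSON files, is a good place to start looking when engines fail to register.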