Load Testing Jupyter Notebooks

Introduction

Consider this scenario: you set up a JupyterHub environment (to learn more, go to the JupyterHub section below) so that over 1000 participants of your online workshop can access Jupyter notebooks in JupyterHub. How do you ensure that the workshop runs smoothly? How do you ensure that the cloud servers you allocated for this event are sufficient? You might first reference similar threads to this one:

https://stackoverflow.com/questions/46569059/how-to-stress-load-test-jupyterhub-for-multiple-users

To learn the implementation- read on.

Performance Testing

Performance, scalability, and reliability of applications are key non-functional requirements of any product or service. This is especially true when a product is expected to be used by a large number of users.

This document overviews the Refactored platform and its JupyterHub environment, describes effective load tests for these types of systems, and addresses some of the main challenges the JupyterHub community faces when load testing in JupyterHub environments. The information presented in this document will benefit anyone interested in running load/stress tests in the JupyterHub environment.

Refactored

Refactored is an interactive, on-demand data training platform powered by AI. It provides a hands-on learning experience to accommodate various learning styles and levels of expertise. The platform consists of two core components:

Refactored website
JupyterHub environment.

JupyterHub

Jupyter is an open-source tool that provides an interface to create and share documents, including live code, the output of code execution, and visualizations. It also includes cells to create code or project documentation – all in a single document.

JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening the users with installation and maintenance tasks. Students, researchers, and data scientists can get their work done in their workspaces on shared resources, that can be managed efficiently by system administrators.

To learn how to create a JupyterHub setup using a Kubernetes cluster, go to https://zero-to-jupyterhub.readthedocs.io/en/latest/.

Load Testing Approach

Running load tests on JupyterHub requires a unique approach. This tool differs significantly in the way it works as compared to a regular web application. Further, a modern authentication mechanism severely limits the options available to run seamless end-to-end tests.

Load Testing Tool

We use k6.io to perform load testing on Refactored in the JupyterHub environment.
k6.io is a developer-centric, free, and open-source load testing tool built to ensure an effective and intuitive performance testing experience.

JupyterHub Testing

To start load testing the JupyterHub environment, we need to take care of the end-to-end flow. This includes the server configurations, login/authentication, serving notebooks, etc.

Since we use a cloud authentication provider on Refactored, we ran into issues testing end-to-end flow due to severe restrictions in load testing cloud provider components. Generally, load testing such platforms is the responsibility of the cloud application provider- they are typically well-tested for loads and scalability. Hence, we decided to temporarily remove the authentication from the cloud provider and use a dummy authentication for the JupyterHub environment.

To do that, we needed to change the configuration in k8s/config.yaml under the JupyterHub code.

Find the configuration entry that specifies authentication below:

auth:

type: custom

custom:

className: ***oauthenticator.***Auth0OAuthenticator

config:

NOTE:

client_id: “!*****************************”

client_secret: “********************************”

oauth_callback_url: “https://***.test.com/hub/oauth_callback”

admin:

users:

– admin

In our case, we use a custom authenticator. , so changing it to the following dummy authenticator:

auth:

type: dummy

dummy:

password: *******1234

admin:

users:

– admin

GCP Configuration

In the GCP console under the Kubernetes cluster, take a look at the user pool as shown below:

Edit the user pool to reflect the number of nodes required to support load tests.
There might be a calculation involved to figure out the number of nodes.
In our case, we wanted to load test 300 user pods in JupyterHub. We created about 20 nodes as below:

k6 Configuration

k6.io is a tool for running load tests. It uses JavaScript (ES6 JS) as the base language for creating the tests.

First, install k6 from the k6.io website. There are 2 versions – cloud & open-source versions. We use the open-source version downloaded into the testing environment.

To install it on Debian Linux-based unix systems:

sudo apt-key adv –keyserver hkp://keyserver.ubuntu.com:80 –recv-keys 379CE192D401AB61
echo “deb https://dl.bintray.com/loadimpact/deb stable main” | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install k6

Check the documentation at https://github.com/loadimpact/k6 for other types of OS.

Running Load Tests

Creating Test Scripts

In our case, we identified the key tests to be performed on the JupyterHub environment:
1. Login.
2. Heartbeat check.
3. Roundtrip for Jupyter notebooks.

To run tests on Refactored, we create the .js module that does the following via JS code.

1. Imports

import { check, group, sleep } from ‘k6’;
import http from ‘k6/http’;

2. Configuration options

We set up the configuration options ahead of the tests. These options include the duration of the tests, number of users, maximum users simulated, and other parameters such as shut-down grace time for the tests to complete.

Here is a sample set of options:
export let options = {
max_vus: 300,
vus: 100,
stages: [
{ duration: “30s”, target: 10 },
{ duration: “4m”, target: 100 },
{ duration: “30s”, target: 0 }
],
thresholds: {
“RTT”: [“avg r.status === 200 });
}

);

3. Actual tests

We created the actual tests as JS functions within group objects provided by the k6.io framework. We had various groups, including a login group, a heartbeat group, and other individual module check groups. Further groups can be chained within those groups.

Here is a sample set of groups to test our JupyterHub environment:

export default function() {

group(‘v1 Refactored load testing’, function() {

group(‘heart-beat’, function() {

let res = http.get(“https://refactored.ai”);

check(res, { “status is 200”: (r) => r.status === 200 });

});

group(‘course aws deep racer – Home ‘, function() {

let res = http.get(url_deepracer_home);

check(res, {

“status is 200”: (r) => r.status === 200,

“AWS Deepracer Home .. done”: (r) => r.body.includes(‘<h3>AWS DeepRacer</h3>’)

});

})

group(‘course aws deep racer Pre-Workshop- Create your AWS account ‘, function() {

let res = http.get(url_create_aws);

check(res, {

“status is 200”: (r) => r.status === 200,

“Create AWS account.. done”: (r) => r.body.includes(‘<h1 class=”main_heading”>Create your AWS account</h1>’)

});

group(‘course aws deep racer Pre-Workshop – Introduction to Autonomous Vehicle ‘, function() {

let res = http.get(url_intro_autonmous);

check(res, {

“status is 200”: (r) => r.status === 200,

“Introduction to Autonomous Vehicle.. done”: (r) => r.body.includes(‘<h1 class=”main_heading”>Introduction to Autonomous Vehicles</h1>’)

});

group(‘course aws deep racer Pre-Workshop – Introduction to Machine learning ‘, function() {

let res = http.get(url_intro_ml);

check(res, {

“status is 200”: (r) => r.status === 200,

“Introduction to Machine learning.. done”: (r) => r.body.includes(‘<h1 class=”main_heading”>Introduction to Machine learning</h1>’)

});

Load Test Results

The results of the load test are displayed while running the tests.

Start of the test.

The test is run by using the following command:

root@ip-172-31-0-241:REFACTORED-SITE-STAGE:# k6 run -u 300 -I 300 dsin100days_test.js

The parameters ‘u’ & ‘i’ provide the number of users to be simulated as well as the iterations to be performed respectively.

The first part of the test displays the configuration options, the test scenario, and the list of users created for the test.

Test execution.

Further results display the progress of the test. In this case, the login process, the creation of user pods, and the reading of the notebooks are displayed. Here is a snapshot of the output:

INFO[0027] loadtestuser25 Reading 1st notebook

INFO[0027] loadtestuser20 Reading 1st notebook

INFO[0027] loadtestuser220 Reading 1st notebook

INFO[0027] loadtestuser64 Reading 1st notebook

INFO[0027] loadtestuser98 Reading 1st notebook

INFO[0027] loadtestuser194 Reading 1st notebook

INFO[0028] loadtestuser273 Reading 1st notebook

INFO[0028] loadtestuser261 Reading 1st notebook

INFO[0028] loadtestuser218 Reading 1st notebook

INFO[0028] loadtestuser232 Reading 1st notebook

INFO[0028] loadtestuser52 Reading 1st notebook

INFO[0028] loadtestuser175 Reading 1st notebook

INFO[0028] loadtestuser281 Reading 1st notebook

INFO[0028] loadtestuser239 Reading 1st notebook

INFO[0028] loadtestuser112 Reading 1st notebook

INFO[0028] loadtestuser117 Reading 1st notebook

INFO[0028] loadtestuser159 Reading 1st notebook

INFO[0029] loadtestuser189 Reading 1st notebook

Final results

After the load test is completed, a summary of the test results is produced. This includes the time taken to complete the test and other statistics on the actual test.

Here is the final section of the results:

running (04m17.1s), 000/300 VUs, 300 complete and 0 interrupted iterations

default ✓ [======================================] 300 VUs 04m17.1s/10m0s 300/300 shared items

█ v1 Refactored Jupyter Hub load testing

█ login

✗ The login is successful..

↳ 89% — ✓ 267 / ✗ 33

█ Jupyter hub heart-beat

✗ Notebooks Availability…done

↳ 94% — ✓ 284 / ✗ 16

✓ heart-beat up..

█ get 01-Basic_data_types notebook

✓ Notebook loaded

✓ 01-Basic_data_types.. done

█ get 02-Lists_and_Nested_Lists notebook

✓ 02-Lists_and_Nested_Lists.. done

✓ Notebook loaded

█ get dealing-with-strings-and-dates notebook

✓ Notebook loaded

✓ dealing-with-strings-and-dates.. done

checks…………………: 97.97% ✓ 2367 ✗ 49

data_received…………..: 43 MB 166 kB/s

data_sent………………: 1.2 MB 4.8 kB/s

group_duration………….: avg=15.37s min=256.19ms med=491.52ms max=4m16s p(90)=38.11s p(95)=40.86s

http_req_blocked………..: avg=116.3ms min=2.54µs med=2.86µs max=1.79s p(90)=8.58µs p(95)=1.31s

http_req_connecting……..: avg=8.25ms min=0s med=0s max=98.98ms p(90)=0s p(95)=84.68ms

http_req_duration……….: avg=3.4s min=84.95ms med=453.65ms max=32.04s p(90)=13.88s p(95)=21.95s

http_req_receiving………: avg=42.37ms min=31.37µs med=135.32µs max=11.01s p(90)=84.24ms p(95)=84.96ms

http_req_sending………..: avg=66.1µs min=25.26µs med=50.03µs max=861.93µs p(90)=119.82µs p(95)=162.46µs

http_req_tls_handshaking…: avg=107.18ms min=0s med=0s max=1.68s p(90)=0s p(95)=1.23s

http_req_waiting………..: avg=3.35s min=84.83ms med=370.16ms max=32.04s p(90)=13.88s p(95)=21.95s

http_reqs………………: 3115 12.114161/s

iteration_duration………: avg=47.15s min=22.06s med=35.86s max=4m17s p(90)=55.03s p(95)=3m51s

iterations……………..: 300 1.166693/s

vus……………………: 1 min=1 max=300

vus_max………………..: 300 min=300 max=300

Interpreting Test Results:

In the above results, the key metrics are:

Http_reqs: gives the requests per second. In this case, it is 12.11 r/s. This is because it includes first-time requests. during the initial run, the codes are synced with GitHub and include idle time. This could also happen due to initial server spawns. In other cases, there could be as many as 70 requests per second.
Vus_max: maximum virtual users supported.
Iteration: 300 iterations. This could be n-fold as well.
Http_req_waiting: 3.35 s on average wait time during the round trip.

Running Individual Tests

The final step in this process is the testing of an individual user to see some key metrics around the usage of the JupyterHub environment.

The key metrics include:

Login-timed test
Notebook roundtrip
Notebook functions: start kernel, run all cells

This is performed by using a headless browser tool. In our case, we use PhantomJS as we are familiar with it. There are other tools to consider that will perform the same or even better.

Before we do the test, we must define the performance required from the page loads. The performance metrics include:

Load the notebook within 30 seconds.
Basic Python code execution must be completed within 30 seconds of the start of the execution.
In exceptional cases, due to the complexity of the code, it must not exceed 3 minutes of execution. This applies to the core data science code. There could be further exceptions to this rule depending on the notebook in advanced cases.

In the next technical paper, we will explore how to run individual tests.

– Written by Manikandan Rangaswamy