Load Testing Jupyter Notebooks
Introduction
Consider this scenario: you have set up a JupyterHub environment (see the JupyterHub section below) so that over 1,000 participants in your online workshop can access Jupyter notebooks. How do you ensure that the workshop runs smoothly? How do you ensure that the cloud servers you allocated for the event are sufficient? You might start with threads like this one:
https://stackoverflow.com/questions/46569059/how-to-stress-load-test-jupyterhub-for-multiple-users
To learn how we implemented load testing, read on.
Performance Testing
Performance, scalability, and reliability of applications are key non-functional requirements of any product or service. This is especially true when a product is expected to be used by a large number of users.
This document gives an overview of the Refactored platform and its JupyterHub environment, describes effective load tests for these types of systems, and addresses some of the main challenges of load testing JupyterHub environments. It will benefit anyone interested in running load or stress tests against JupyterHub.
Refactored
Refactored is an interactive, on-demand data training platform powered by AI. It provides a hands-on learning experience to accommodate various learning styles and levels of expertise. The platform consists of two core components:
- Refactored website
- JupyterHub environment
JupyterHub
Jupyter is an open-source tool that provides an interface for creating and sharing documents that combine live code, the output of code execution, and visualizations. Code and project documentation live in cells, all within a single document.
JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening them with installation and maintenance tasks. Students, researchers, and data scientists can get their work done in their own workspaces on shared resources that can be managed efficiently by system administrators.
To learn how to create a JupyterHub setup using a Kubernetes cluster, go to https://zero-to-jupyterhub.readthedocs.io/en/latest/.
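For reference, a typical Zero to JupyterHub installation is driven by Helm. The sketch below follows that guide; the release name jhub and the namespace are placeholder values, not necessarily the ones we used:
# Add the JupyterHub Helm chart repository (from the Zero to JupyterHub guide)
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
# Install (or upgrade) JupyterHub on the cluster; the release name "jhub"
# and the namespace are placeholders, so adjust them to your setup
helm upgrade --install jhub jupyterhub/jupyterhub \
  --namespace jhub --create-namespace \
  --values config.yaml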
Load Testing Approach
Running load tests on JupyterHub requires a unique approach. It differs significantly from a regular web application in the way it works, and a modern authentication mechanism severely limits the options available for seamless end-to-end tests.
Load Testing Tool
We use k6.io to perform load testing on Refactored and its JupyterHub environment.
k6.io is a developer-centric, free, and open-source load testing tool built to ensure an effective and intuitive performance testing experience.
JupyterHub Testing
To start load testing the JupyterHub environment, we need to take care of the end-to-end flow. This includes the server configurations, login/authentication, serving notebooks, etc.
Since Refactored uses a cloud authentication provider, we ran into issues testing the end-to-end flow due to severe restrictions on load testing cloud provider components. Generally, load testing such platforms is the responsibility of the cloud application provider; they are typically well tested for load and scalability. Hence, we decided to temporarily remove the cloud provider authentication and use a dummy authenticator for the JupyterHub environment.
To do that, we needed to change the configuration in k8s/config.yaml under the JupyterHub code.
Find the configuration entry that specifies authentication below:
auth:
  type: custom
  custom:
    className: ***oauthenticator.***Auth0OAuthenticator
    config:
      client_id: "!*****************************"
      client_secret: "********************************"
      oauth_callback_url: "https://***.test.com/hub/oauth_callback"
  admin:
    users:
      - admin
In our case, we use a custom authenticator, so we changed it to the following dummy authenticator:
auth:
  type: dummy
  dummy:
    password: *******1234
  admin:
    users:
      - admin
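After editing k8s/config.yaml, the change has to be applied to the cluster. A minimal sketch, assuming a Helm-based Zero to JupyterHub deployment; the release and namespace names are placeholders:
# Apply the updated configuration to the running JupyterHub release;
# "jhub" and the namespace are placeholder names for illustration
helm upgrade jhub jupyterhub/jupyterhub \
  --namespace jhub \
  --values k8s/config.yaml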
GCP Configuration
In the GCP console, open the Kubernetes cluster and locate the user node pool.
Edit the user pool to reflect the number of nodes required to support the load tests.
Calculating the number of nodes takes some work: it depends on the memory and CPU guaranteed to each user pod and the allocatable capacity of each node (see the sketch below).
In our case, we wanted to load test 300 user pods in JupyterHub, so we created about 20 nodes.
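As a rough illustration (the per-pod memory guarantee and machine type here are assumptions, not our exact values): if each user pod is guaranteed 1 GB of RAM and each node offers roughly 15 GB of allocatable memory, 300 pods need about 300 GB, which works out to roughly 20 nodes. The node pool can then be resized from the console or with gcloud:
# Resize the user node pool to 20 nodes; the cluster, pool, and zone
# names below are placeholders for illustration
gcloud container clusters resize my-cluster \
  --node-pool user-pool \
  --num-nodes 20 \
  --zone us-central1-a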
k6 Configuration
k6.io uses JavaScript (ES6) as the base language for writing tests.
First, install k6 from the k6.io website. It comes in two versions, cloud and open source; we use the open-source version, downloaded into the testing environment.
To install it on Debian-based Linux systems:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 379CE192D401AB61
echo "deb https://dl.bintray.com/loadimpact/deb stable main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install k6
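To confirm the installation succeeded, check the installed version:
# Verify that k6 is available on the path
k6 version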
Check the documentation at https://github.com/loadimpact/k6 for other types of OS.
Running Load Tests
Creating Test Scripts
In our case, we identified the key tests to be performed on the JupyterHub environment:
1. Login.
2. Heartbeat check.
3. Roundtrip for Jupyter notebooks.
To run tests on Refactored, we create a .js module that does the following:
1. Imports
import { check, group, sleep } from 'k6';
import http from 'k6/http';
2. Configuration options
We set up the configuration options ahead of the tests. These options include the duration of the tests, the number of virtual users, the maximum number of simulated users, and other parameters such as the shutdown grace time for tests to complete.
Here is a sample set of options:
export let options = {
  max_vus: 300,
  vus: 100,
  stages: [
    { duration: "30s", target: 10 },
    { duration: "4m", target: 100 },
    { duration: "30s", target: 0 }
  ],
  thresholds: {
    "RTT": ["avg<500"]
  }
};
3. Actual tests
We created the actual tests as JS functions within group objects provided by the k6.io framework. We had various groups, including a login group, a heartbeat group, and other individual module check groups. Groups can also be nested within other groups.
Here is a sample set of groups to test our JupyterHub environment:
// url_deepracer_home, url_create_aws, url_intro_autonmous, and
// url_intro_ml are course-page URLs defined elsewhere in the script.
export default function() {
  group('v1 Refactored load testing', function() {
    group('heart-beat', function() {
      let res = http.get("https://refactored.ai");
      check(res, { "status is 200": (r) => r.status === 200 });
    });
    group('course aws deep racer - Home', function() {
      let res = http.get(url_deepracer_home);
      check(res, {
        "status is 200": (r) => r.status === 200,
        "AWS Deepracer Home .. done": (r) => r.body.includes('<h3>AWS DeepRacer</h3>')
      });
    });
    group('course aws deep racer Pre-Workshop - Create your AWS account', function() {
      let res = http.get(url_create_aws);
      check(res, {
        "status is 200": (r) => r.status === 200,
        "Create AWS account.. done": (r) => r.body.includes('<h1 class="main_heading">Create your AWS account</h1>')
      });
    });
    group('course aws deep racer Pre-Workshop - Introduction to Autonomous Vehicle', function() {
      let res = http.get(url_intro_autonmous);
      check(res, {
        "status is 200": (r) => r.status === 200,
        "Introduction to Autonomous Vehicle.. done": (r) => r.body.includes('<h1 class="main_heading">Introduction to Autonomous Vehicles</h1>')
      });
    });
    group('course aws deep racer Pre-Workshop - Introduction to Machine learning', function() {
      let res = http.get(url_intro_ml);
      check(res, {
        "status is 200": (r) => r.status === 200,
        "Introduction to Machine learning.. done": (r) => r.body.includes('<h1 class="main_heading">Introduction to Machine learning</h1>')
      });
    });
  });
}
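The login group (the first key test listed above) is not shown in the sample. A minimal sketch of how it can be written against JupyterHub's dummy authenticator, placed inside the outer group; the base URL, username scheme, and password are illustrative assumptions:
// Sketch of a login group using JupyterHub's /hub/login form endpoint.
// The base URL, username scheme, and password are placeholder values.
group('login', function() {
  let payload = {
    username: `loadtestuser${__VU}`,  // one synthetic user per virtual user
    password: 'dummy-password'        // must match auth.dummy.password in config.yaml
  };
  let res = http.post('https://***.test.com/hub/login', payload);
  check(res, { 'The login is successful..': (r) => r.status === 200 });
});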
Load Test Results
The results of the load test are displayed while running the tests.
Start of the test
The test is run by using the following command:
root@ip-172-31-0-241:REFACTORED-SITE-STAGE:# k6 run -u 300 -i 300 dsin100days_test.js
The -u and -i parameters specify the number of virtual users to simulate and the number of iterations to perform, respectively.
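If you want to analyze results afterwards, k6 can also stream raw metric samples to a file with the --out flag; the filename here is arbitrary:
# Write raw metric samples to a JSON file for offline analysis
k6 run -u 300 -i 300 --out json=results.json dsin100days_test.js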
The first part of the test displays the configuration options, the test scenario, and the list of users created for the test.
Test execution
Further results display the progress of the test. In this case, the login process, the creation of user pods, and the reading of the notebooks are displayed. Here is a snapshot of the output:
INFO[0027] loadtestuser25 Reading 1st notebook
INFO[0027] loadtestuser20 Reading 1st notebook
INFO[0027] loadtestuser220 Reading 1st notebook
INFO[0027] loadtestuser64 Reading 1st notebook
INFO[0027] loadtestuser98 Reading 1st notebook
INFO[0027] loadtestuser194 Reading 1st notebook
INFO[0028] loadtestuser273 Reading 1st notebook
INFO[0028] loadtestuser261 Reading 1st notebook
INFO[0028] loadtestuser218 Reading 1st notebook
INFO[0028] loadtestuser232 Reading 1st notebook
INFO[0028] loadtestuser52 Reading 1st notebook
INFO[0028] loadtestuser175 Reading 1st notebook
INFO[0028] loadtestuser281 Reading 1st notebook
INFO[0028] loadtestuser239 Reading 1st notebook
INFO[0028] loadtestuser112 Reading 1st notebook
INFO[0028] loadtestuser117 Reading 1st notebook
INFO[0028] loadtestuser159 Reading 1st notebook
INFO[0029] loadtestuser189 Reading 1st notebook
Final results
After the load test is completed, a summary of the test results is produced. This includes the time taken to complete the test and other statistics on the actual test.
Here is the final section of the results:
running (04m17.1s), 000/300 VUs, 300 complete and 0 interrupted iterations
default ✓ [======================================] 300 VUs 04m17.1s/10m0s 300/300 shared items
█ v1 Refactored Jupyter Hub load testing
█ login
✗ The login is successful..
↳ 89% — ✓ 267 / ✗ 33
█ Jupyter hub heart-beat
✗ Notebooks Availability…done
↳ 94% — ✓ 284 / ✗ 16
✓ heart-beat up..
█ get 01-Basic_data_types notebook
✓ Notebook loaded
✓ 01-Basic_data_types.. done
█ get 02-Lists_and_Nested_Lists notebook
✓ 02-Lists_and_Nested_Lists.. done
✓ Notebook loaded
█ get dealing-with-strings-and-dates notebook
✓ Notebook loaded
✓ dealing-with-strings-and-dates.. done
checks…………………: 97.97% ✓ 2367 ✗ 49
data_received…………..: 43 MB 166 kB/s
data_sent………………: 1.2 MB 4.8 kB/s
group_duration………….: avg=15.37s min=256.19ms med=491.52ms max=4m16s p(90)=38.11s p(95)=40.86s
http_req_blocked………..: avg=116.3ms min=2.54µs med=2.86µs max=1.79s p(90)=8.58µs p(95)=1.31s
http_req_connecting……..: avg=8.25ms min=0s med=0s max=98.98ms p(90)=0s p(95)=84.68ms
http_req_duration……….: avg=3.4s min=84.95ms med=453.65ms max=32.04s p(90)=13.88s p(95)=21.95s
http_req_receiving………: avg=42.37ms min=31.37µs med=135.32µs max=11.01s p(90)=84.24ms p(95)=84.96ms
http_req_sending………..: avg=66.1µs min=25.26µs med=50.03µs max=861.93µs p(90)=119.82µs p(95)=162.46µs
http_req_tls_handshaking…: avg=107.18ms min=0s med=0s max=1.68s p(90)=0s p(95)=1.23s
http_req_waiting………..: avg=3.35s min=84.83ms med=370.16ms max=32.04s p(90)=13.88s p(95)=21.95s
http_reqs………………: 3115 12.114161/s
iteration_duration………: avg=47.15s min=22.06s med=35.86s max=4m17s p(90)=55.03s p(95)=3m51s
iterations……………..: 300 1.166693/s
vus……………………: 1 min=1 max=300
vus_max………………..: 300 min=300 max=300
Interpreting Test Results
In the above results, the key metrics are:
- http_reqs: the number of requests per second; here, 12.11 req/s. This is relatively low because it includes first-time requests: during the initial run, code is synced with GitHub, which adds idle time, and initial server spawns add further delay. In other runs, there could be as many as 70 requests per second.
- vus_max: the maximum number of virtual users supported.
- iterations: 300 iterations; this could be scaled n-fold as well.
- http_req_waiting: 3.35 s average wait time during the round trip. Expectations like these can be enforced automatically, as sketched below.
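For instance, the options block can declare thresholds on these metrics so that k6 flags a run that misses them; the exact bounds below are illustrative assumptions, not our SLAs:
// Illustrative thresholds; the bounds are assumptions, not our SLAs
export let options = {
  thresholds: {
    http_req_waiting: ["avg<5000"],  // average wait time under 5 seconds
    checks: ["rate>0.95"]            // at least 95% of checks must pass
  }
};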
Running Individual Tests
The final step in this process is to test an individual user in order to capture key metrics around usage of the JupyterHub environment.
The key metrics include:
- Login-timed test
- Notebook roundtrip
- Notebook functions: start kernel, run all cells
This is performed using a headless browser tool. In our case, we use PhantomJS because we are familiar with it; other tools may perform as well or better.
Before we run the test, we must define the performance required from the page loads. The performance targets are:
- Load the notebook within 30 seconds.
- Basic Python code execution must complete within 30 seconds of starting.
- In exceptional cases, due to code complexity, execution must not exceed 3 minutes. This applies to core data science code; there may be further exceptions for advanced notebooks. A sketch of such a timed check follows.
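As a preview, a timed page-load check in PhantomJS might look like the following; the notebook URL is a placeholder and the script is a minimal sketch against the 30-second budget above, not our actual test:
// Minimal PhantomJS sketch: measure how long a notebook page takes to
// load and compare it against the 30-second budget. The URL is a placeholder.
var page = require('webpage').create();
var start = Date.now();
page.open('https://***.test.com/user/loadtestuser1/notebooks/01-Basic_data_types.ipynb', function (status) {
  var elapsed = (Date.now() - start) / 1000;
  if (status === 'success' && elapsed <= 30) {
    console.log('PASS: notebook loaded in ' + elapsed + 's');
  } else {
    console.log('FAIL: status=' + status + ', elapsed=' + elapsed + 's');
  }
  phantom.exit();
});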
In the next technical paper, we will explore how to run individual tests.
– Written by Manikandan Rangaswamy