Step into the world of boundless opportunities at our weekly and monthly events, designed to empower and equip students and professionals with cutting-edge skills in Business Intelligence and Analytics. Brace yourself for an awe-inspiring lineup, ranging from Power BI and Data Warehouse events, SQL Wednesday events, Qlik and Tableau events, and IPBC Saturday events to multiple sessions focused on helping students ace their coursework and mortgage projects.
Power BI Event (Monday, 7:30 pm CST)
Data Warehouse (ETL) Event (Monday, 7:30 pm CST)
Our Power BI and Data Warehouse event is an excellent opportunity for beginners and professionals to learn and improve their skills in creating effective data visualizations and building data warehouses. Our experienced trainers will provide a comprehensive overview of the latest tools and techniques to help you unlock the full potential of Power BI and Data Warehouse. Join us on Monday at 7:30 pm CST to learn more.
SQL Wednesday Event (2nd and 3rd Wednesday, 7:30 pm CST)
Our SQL Wednesday event is designed to help participants gain in-depth knowledge and understanding of the SQL programming language. The event is divided into two sessions, on the 2nd and 3rd Wednesday of every month, where we cover different topics related to SQL programming. Our experts will guide you through the nuances of SQL and teach you how to use the language to extract insights from large datasets.
Tableau Events (Thursday 7:30 pm CST)
Qlik Events (Thursday 7:30 pm CST)
Our Qlik and Tableau events are dedicated to helping participants master the art of data visualization using these powerful tools. Whether you are a beginner or an experienced professional, our trainers will provide you with valuable insights and best practices to create compelling data stories using Qlik and Tableau. Join us on Thursday to learn how to make sense of complex data and present it in an engaging and impactful way.
IPBC Saturday Event (Saturday at 10 am CST)
Our IPBC Saturday event is designed to provide participants with a broad understanding of the fundamentals of business analytics, including predictive analytics, descriptive analytics, and prescriptive analytics. Our trainers will provide hands-on experience with the latest tools and techniques, and demonstrate how to apply analytics to real-world business problems.
Mortgage Project Help (Monday, Wednesday, & Thursday at 7:30 pm CST)
For those students who need help with their mortgage projects, we have dedicated sessions on Monday, Wednesday, and Thursday at 7:30 pm CST. Our experts will guide you through the process of creating a successful mortgage project, and help you understand the key factors that contribute to a successful project.
Homework help (Wednesday, Thursday, Saturday)
We understand that students may face challenges in their coursework, and may need additional help to understand concepts or complete assignments. That’s why we offer dedicated homework help sessions on Wednesday at 8 pm CST, Thursday at 7:30 pm CST, and Saturday at 1:30 pm CST. Our tutors will provide personalized guidance and support to help you overcome any challenges you may face in your coursework.
CAP Competition Event (1st Wednesday of the Month at 7:30 pm CST)
We have our monthly CAP competition where students showcase their communication skills and compete in our Monthly Data Challenge. Open to all students, this event offers a chance to sharpen skills and showcase abilities in front of a live audience. The top three winners move on to the next level. The event is free, so come and support your fellow classmates on the 1st Wednesday of every month at 7:30 pm CST. We look forward to seeing you there!
The Good Life Event (1st Thursday of the Month at 10 am CST)
Join us for the Good Life event on the 1st Thursday of every month at 10 am CST. Successful alumni come to share their inspiring success stories and offer valuable advice to current students. Don’t miss this opportunity to gain insights and learn from those who have already achieved success. Mark your calendar and join us for the next Good Life event.
Data Talent Showcase Event (4th Thursday of Every Month at 4 pm CST)
Our Data Talent Showcase Event is the next level of the CAP Competition where the top three winners compete against each other. It’s an event where judges from the industry come to evaluate and select the winner based on the projects presented. This event is a great opportunity for students to showcase their skills and receive feedback from industry experts. Join us at the event and witness the competition among the best students, and see who comes out on top!
Discover the electrifying world of events that Colaberry organizes for students and alumni, aimed at fostering continuous growth and progress in the ever-evolving realm of Business Intelligence and Analytics. With an ever-changing landscape, our dynamic and captivating lineup of events ensures that you stay ahead of the curve and are continuously intrigued. Get ready to be swept off your feet by the exciting opportunities that await you!
The One Question to Ask Chat GPT to Excel in Any Job
Have you ever found yourself struggling to complete a task at work or unsure of what questions to ask to gain the skills you need? We’ve all been there, trying to know what we don’t know. But what if there was a simple solution that could help you become better at any job, no matter what industry you’re in?
At Colaberry, we’ve discovered the power of asking the right questions at the right time. Our one-year boot camp takes individuals with no experience in the field and transforms them into top-performing data analysts and developers. And one of the keys to our success is teaching our students how to use Chat GPT and how to ask the right questions.
Everyone’s talking about Chat GPT, but the key to mastering it lies in knowing how to ask the right question to find the answer you need. What if there was one question you could ask Chat GPT to become better at any job? This is a question that acts like a magic key: it unlocks a world of possibilities and can help you gain the skills you need to excel in your career.
Are you ready? The question is actually asking for more questions.
“What are 10 questions I should ask ChatGPT to help gain the skills needed to complete this requirement?”
By passing in any set of requirements or instructions for any project, Chat GPT can provide you with a list of questions you didn’t know you needed to ask.
In this example, we used “mowing a lawn”, something simple we all think we know how to do, right? But do we know how to do it like an expert?
Looking at the answers Chat GPT gave us helps us see factors we might not ever have thought of. Now instead of doing something “ok” using what we know and asking a pointed or direct question, we can unlock the knowledge of the entire world on the task!
And the best part? You can even ask Chat GPT for the answers.
Now, imagine you had a team of data analysts who were not only trained in how to think like this but how to be able to overcome any technical obstacle they met.
If you’re looking for talent that not only has a solid foundation in data analytics and knows how to integrate the newest technology, but also knows how to get the most out of both, then Colaberry is the perfect partner. We specialize in this kind of forward-thinking training: not just how to do something, but how to use every available tool to do it, to learn how to do it, and more. It is the real-life application of "smarter, not harder".
Our approach is built on learning a foundation of data knowledge that is fully integrated with the latest tech available, to speed up the learning process. We use Chat GPT and other AI tools to help our students become self-sufficient and teach them how to apply their skills to newer and more difficult problem sets.
But, they don’t do it alone. Our tightly knit alumni network consists of over 3,000 data professionals throughout the US, and many of Colaberry’s graduates have gone on to become Data leaders in their organization, getting promoted to roles such as Directors, VPs, and Managers. When you hire with Colaberry, you’re not just hiring one person – you’re hiring a network of highly skilled data professionals.
So why not take the first step toward unlocking the full potential of your data? Let Colaberry supply you with the data talent you need to take your company to the next level.
Contact us today to learn more about our services and how we can help you meet your unique business goals.
This blog post explores the applications of the Pivot and Unpivot data manipulation techniques within the context of the Oil and Gas industry. These powerful techniques are commonly used in data analytics and data science to summarize and transform data and can be especially useful for professionals in this industry who need to analyze large datasets. Through the use of SQL Server, the blog will demonstrate how these techniques can be effectively applied to the Oil and Gas industry to streamline data analysis and improve decision-making.
Pivot and Unpivot are powerful data manipulation techniques used to summarize and transform data. These techniques are widely used in the data analytics and data science fields. In this blog, we will discuss how these techniques can be applied to the Oil and Gas industry using SQL Server.
Understanding the Different Concept Types in Pivot and Unpivot
Pivot
Pivot is used to transform data from a row-based format to a column-based format. It allows data to be summarized and grouped based on columns.
Example: Consider the following table, which shows the sales of different products for different months, with each month stored as a separate column:

Product   | January | February | March
----------------------------------------
Product A | 100     | 200      | 300
Product B | 400     | 500      | 600

Unpivot

Unpivot performs the opposite transformation: it converts data from a column-based format back into a row-based format. The table above can be transformed using the UNPIVOT operator as follows:
SELECT Product, Month, Sales
FROM Sales
UNPIVOT (Sales FOR Month IN (January, February, March)) AS UnpivotTable;
The result will be as follows:
Product   | Month    | Sales
-----------------------------
Product A | January  | 100
Product A | February | 200
Product A | March    | 300
Product B | January  | 400
Product B | February | 500
Product B | March    | 600
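Pivot works in the opposite direction and restores the column-based layout. As a minimal sketch, assuming the row-based result above has been saved in a table named MonthlySales (a hypothetical name used only for this illustration) with columns Product, Month, and Sales:

-- Months become columns again; Product acts as the implicit grouping column
SELECT Product, [January], [February], [March]
FROM MonthlySales
PIVOT (SUM(Sales) FOR Month IN ([January], [February], [March])) AS PivotTable;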
Real-World Examples in the Oil & Gas Industry
Using SQL Server:
CREATE TABLE OilProduction (
    Country varchar(50),
    January int,
    February int,
    March int
);
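As an assumption for illustration only, a few hypothetical rows can be inserted so the queries below have data to work with:

-- Hypothetical sample data (values are illustrative, not real production figures)
INSERT INTO OilProduction (Country, January, February, March)
VALUES ('USA', 1000, 1100, 1200),
       ('Canada', 800, 850, 900),
       ('Norway', 600, 650, 700);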
1. Display the total oil production of each country for the first quarter of the year (January to March).
SELECT Country, SUM(January + February + March) AS TotalProduction FROM OilProduction GROUP BY Country;
2. Display the oil production of each country for each month.
SELECT Country, Month, Sales
FROM OilProduction
UNPIVOT (Sales FOR Month IN (January, February, March)) AS UnpivotTable;
3. Display the oil production of each country in a column-based format.
SELECT *
FROM (
    SELECT Country, Month, Sales
    FROM (SELECT Country, January, February, March FROM OilProduction) AS SourceTable
    UNPIVOT (Sales FOR Month IN (January, February, March)) AS UnpivotTable
) AS FinalTable
PIVOT (SUM(Sales) FOR Month IN ([January], [February], [March])) AS PivotTable;
Most Commonly Asked Interview Question in Pivot and Unpivot
Q: What is the difference between Pivot and Unpivot?
A: Pivot and Unpivot are data manipulation techniques used to summarize and transform data. The main difference between these techniques is the direction of transformation. Pivot is used to transform data from a row-based format to a column-based format. Unpivot is used to transform data from a column-based format to a row-based format.
I have used these techniques in a previous project where I was required to analyze the sales of different products for different months. I used Pivot to summarize the data and transform it into a column-based format. This allowed me to easily analyze the sales of each product for each month. I also used Unpivot to transform the data back into a row-based format so that I could perform further analysis.
Conclusion
In this blog, we discussed the basics of Pivot and Unpivot and how they can be applied to the Oil and Gas industry using SQL Server. We also looked at real-world examples and the most commonly asked interview questions in Pivot and Unpivot. These techniques are powerful tools for data manipulation and can be used to summarize and transform data in a variety of industries.
Interested in a career in Data Analytics? Book a call with our admissions team or visit training.colaberry.com to learn more.
Dynamic SQL is a powerful technique used in SQL Server that enables developers to generate and execute SQL statements during runtime. This approach provides greater flexibility and adaptability, as it allows developers to construct SQL statements based on user inputs or other conditions. By utilizing dynamic SQL, developers can build applications that are more responsive to user needs and provide a more personalized experience. In this blog, we will delve into the different types of dynamic SQL and provide examples from the Telecommunications industry, demonstrating how dynamic SQL can be used to create more efficient and effective database applications. Whether you are a seasoned developer or just starting with SQL Server, this blog will help you master dynamic SQL and unleash its full potential.
Dynamic SQL is a technique used in SQL Server where the SQL statement is generated and executed at runtime. This allows you to write code that is more flexible and adaptable, as you can construct and execute SQL statements based on user inputs or other conditions. In this blog, we’ll explore the different types of dynamic SQL and provide examples from the Telecommunications industry.
Types of Dynamic SQL
Dynamic SELECT Statement
A dynamic SELECT statement is used to generate a SELECT statement at runtime, based on the inputs or conditions. For example, in the Telecommunications industry, you may need to generate a SELECT statement to retrieve data for a specific customer based on their customer ID.
Here’s an example of how you would generate a dynamic SELECT statement in SQL Server:
DECLARE @customerID INT = 123;
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT * FROM Customers WHERE CustomerID = ' + CAST(@customerID AS NVARCHAR(10));
EXEC sp_executesql @sql;
In this example, the @customerID variable is set to 123, and the dynamic SELECT statement is generated using the @sql variable. The sp_executesql system stored procedure is used to execute the dynamic SQL statement.
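Note that sp_executesql can also accept parameters directly, which avoids casting and concatenating the value into the string. Here is a minimal sketch of the same query written that way (the @custID parameter name is illustrative):

DECLARE @customerID INT = 123;
DECLARE @sql NVARCHAR(MAX) = N'SELECT * FROM Customers WHERE CustomerID = @custID';
-- The value is passed as a typed parameter instead of being concatenated into the string
EXEC sp_executesql @sql, N'@custID INT', @custID = @customerID;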
Dynamic INSERT Statement
A dynamic INSERT statement is used to insert data into a table at runtime. For example, in the Telecommunications industry, you may need to insert data for a new customer into the Customers table.
Here’s an example of how you would generate a dynamic INSERT statement in SQL Server:
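A minimal sketch of such a statement, assuming the Customers table and the @firstName/@lastName variables described below, might look like this:

DECLARE @firstName NVARCHAR(50) = 'John';
DECLARE @lastName NVARCHAR(50) = 'Doe';
DECLARE @sql NVARCHAR(MAX);
-- Build the INSERT statement as a string and execute it at runtime
SET @sql = N'INSERT INTO Customers (FirstName, LastName) VALUES (''' + @firstName + ''', ''' + @lastName + ''')';
EXEC sp_executesql @sql;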
In this example, the @firstName and @lastName variables are set to ‘John’ and ‘Doe’, respectively, and the dynamic INSERT statement is generated using the @sql variable. The sp_executesql system stored procedure is used to execute the dynamic SQL statement.
Dynamic UPDATE Statement
A dynamic UPDATE statement is used to update data in a table at runtime. For example, in the Telecommunications industry, you may need to update the last name of a customer based on their customer ID.
Here’s an example of how you would generate a dynamic UPDATE statement in SQL Server:
DECLARE @customerID INT = 123;
DECLARE @lastName NVARCHAR(50) = 'Smith';
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'UPDATE Customers SET LastName = ''' + @lastName + ''' WHERE CustomerID = ' + CAST(@customerID AS NVARCHAR(10));
EXEC sp_executesql @sql;
Real-World Example Questions in Telecommunications Industry
1. Write a script to generate a table named Customers with columns CustomerID, FirstName, LastName, and PhoneNumber. Populate the table with sample data.
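One possible answer, sketched with hypothetical sample values chosen only for illustration, might be:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    PhoneNumber NVARCHAR(20)
);

-- Hypothetical sample records
INSERT INTO Customers (CustomerID, FirstName, LastName, PhoneNumber)
VALUES (123, 'John', 'Doe', '555-555-1212'),
       (124, 'Jane', 'Doe', '555-555-1213'),
       (125, 'Sam', 'Smith', '555-555-1214');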
2. Write a dynamic SQL statement to retrieve all customers with the last name Doe.
DECLARE @lastName NVARCHAR(50) = 'Doe';
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT * FROM Customers WHERE LastName = ''' + @lastName + '''';
EXEC sp_executesql @sql;
3. Write a dynamic SQL statement to update the phone number for customer with ID 123 to 555-555-1215.
DECLARE @customerID INT = 123;
DECLARE @phoneNumber NVARCHAR(20) = '555-555-1215';
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'UPDATE Customers SET PhoneNumber = ''' + @phoneNumber + ''' WHERE CustomerID = ' + CAST(@customerID AS NVARCHAR(10));
EXEC sp_executesql @sql;
Interview Question and Answer
Q: What is Dynamic SQL and how have you used it in a project?
A: Dynamic SQL is a technique used in SQL Server where the SQL statement is generated and executed at runtime. I have used dynamic SQL in a project where I was building a reporting system for a Telecommunications company. The system allowed users to generate reports based on various criteria such as customer information, call data, and billing data. To achieve this, I used dynamic SQL to generate SELECT statements based on the user inputs and then executed those statements to retrieve the data. This approach allowed me to write more flexible and adaptable code that could handle different reporting requirements.
Conclusion
In conclusion, dynamic SQL is a powerful technique in SQL Server that allows you to generate and execute SQL statements at runtime. By using dynamic SQL, you can write code that is more flexible and adaptable, making it easier to handle different scenarios and requirements. In this blog, we explored the different types of dynamic SQL and provided examples from the Telecommunications industry. We also provided real-world example questions and an interview question and answer to help you better understand the concept of dynamic SQL.
Interested in a career in Data Analytics? Book a call with our admissions team or visit training.colaberry.com to learn more.
A Comprehensive Guide to SQL Case Statement in Healthcare Industry
The SQL Case Statement is an essential feature of SQL that enables developers and data analysts to build conditional logic into their queries. By evaluating a set of conditions, the Case Statement returns a result that is based on the outcome of the evaluation. This functionality can be used to create complex, multi-level decision trees within a SQL query. With the Case Statement, data analysts can effectively analyze and extract specific data sets from vast data sources, making it an indispensable tool in the data analysis process. Overall, the SQL Case Statement is a powerful feature of SQL that provides developers and data analysts with greater flexibility and precision in their data analysis and decision-making capabilities.
SQL Case Statement is a conditional statement in SQL that returns a result based on the evaluation of a set of conditions. The Case Statement is used to implement conditional logic in SQL queries, making it a powerful tool for data analysis and decision-making.
Types of SQL Case Statement with Examples From The Healthcare Industry
Simple Case Statement
A Simple Case Statement is used to evaluate a single expression and return a corresponding result. For example, in the Healthcare Industry, you can use a Simple Case Statement to categorize patients based on their age.
SELECT PatientID, PatientName, Age,
       CASE
           WHEN Age < 18 THEN 'Child'
           WHEN Age BETWEEN 18 AND 64 THEN 'Adult'
           ELSE 'Senior'
       END AS PatientCategory
FROM Patients;
Searched Case Statement
A Searched Case Statement evaluates multiple conditions and returns a result based on the first matching condition. For example, in the Healthcare Industry, you can use a Searched Case Statement to calculate the co-pay amount for a patient based on their insurance plan.
SELECT PatientID, PatientName, InsurancePlan,
       CASE
           WHEN InsurancePlan = 'Plan A' THEN 50
           WHEN InsurancePlan = 'Plan B' THEN 40
           ELSE 30
       END AS CoPayAmount
FROM Patients;
Nested Case Statement
A Nested Case Statement is used to evaluate multiple conditions within another Case Statement. For example, in the Healthcare Industry, you can use a Nested Case Statement to categorize patients based on their age and insurance plan.
SELECT PatientID, PatientName, Age, InsurancePlan,
       CASE
           WHEN Age < 18 THEN 'Child'
           ELSE
               CASE
                   WHEN InsurancePlan = 'Plan A' THEN 'Adult with Plan A'
                   WHEN InsurancePlan = 'Plan B' THEN 'Adult with Plan B'
                   ELSE 'Senior'
               END
       END AS PatientCategory
FROM Patients;
Real-World Example Questions in the Healthcare Industry
Script to generate tables and records needed for the real-world example questions:
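A minimal sketch of such a script, assuming the table and column names used in the answers below (patients, patient_name, age, insurance_plan) and hypothetical sample rows, might be:

CREATE TABLE patients (
    patient_id INT PRIMARY KEY,
    patient_name VARCHAR(50),
    age INT,
    insurance_plan VARCHAR(10)
);

-- Hypothetical sample records
INSERT INTO patients (patient_id, patient_name, age, insurance_plan)
VALUES (1, 'Alice Brown', 12, 'Plan A'),
       (2, 'Bob Green', 35, 'Plan A'),
       (3, 'Carol White', 42, 'Plan B'),
       (4, 'David Black', 70, 'Plan B');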
1. What is the average age of patients with Plan A and Plan B insurance?
SELECT CASE
           WHEN insurance_plan = 'Plan A' THEN 'Plan A'
           WHEN insurance_plan = 'Plan B' THEN 'Plan B'
       END AS Insurance_Plan,
       AVG(age) AS Average_Age
FROM patients
GROUP BY insurance_plan;
2. What is the total number of patients for each insurance plan?
SELECT insurance_plan, COUNT(*) AS Total_Patients
FROM patients
GROUP BY insurance_plan;
3. List the patients and their age categories (Child, Adult with Plan A, Adult with Plan B, Senior) based on their age and insurance plan.
SELECT patient_name, age, insurance_plan,
       CASE
           WHEN age <= 18 THEN 'Child'
           WHEN age BETWEEN 18 AND 65 AND insurance_plan = 'Plan A' THEN 'Adult with Plan A'
           WHEN age BETWEEN 18 AND 65 AND insurance_plan = 'Plan B' THEN 'Adult with Plan B'
           WHEN age > 65 THEN 'Senior'
       END AS Age_Category
FROM patients;
Most Commonly Asked Interview Question and Answer
Q: How do you use the SQL Case Statement in a real-world scenario?
A: I have used the SQL Case Statement in several real-world projects, including in the Healthcare Industry. One specific example is when I was working on a project to categorize patients based on their age and insurance plan. To accomplish this, I used a Nested Case Statement that evaluated the patient’s age and insurance plan and returned the appropriate patient category. The final result was a table that displayed the patient’s information, including their age category, which was used for further analysis and decision-making. The use of the Case Statement made the process of categorizing patients much simpler and more efficient.
Conclusion
The SQL Case Statement is a versatile and powerful tool for data analysis and decision-making in SQL. With the different types of Case Statements, including Simple, Searched, and Nested, you can implement complex conditional logic and return results based on multiple evaluations. By using examples from the Healthcare Industry, you can see the practical applications of the Case Statement and how it can be used to improve your data analysis and decision-making processes.
Interested in a career in Data Analytics? Book a call with our admissions team or visit training.colaberry.com to learn more.
In our organization, Colaberry Inc, we provide professionals from various backgrounds and various levels of experience, with the platform and the opportunity to learn Data Analytics and Data Science. In order to teach Data Science, the Jupyter Notebook platform is one of the most important tools. A Jupyter Notebook is a document within an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
In this blog, we will learn the basic architecture of JupyterHub, the multi-user Jupyter Notebook platform, how it works, and finally how to set up Jupyter Notebooks to serve a large user base.
Why Jupyter Notebooks?
On our platform, refactored.ai, we provide users an opportunity to learn Data Science and AI through courses and lessons on Data Science and machine learning algorithms, the basics of the Python programming language, and topics such as data handling and data manipulation.
Our approach to teaching these topics is to provide an option to “Learn by doing”. In order to provide practical hands-on learning, the content is delivered using the Jupyter Notebooks technology.
Jupyter notebooks allow users to combine code, text, images, and videos in a single document. This also makes it easy for students to share their work with peers and instructors. Jupyter notebook also gives users access to computational environments and resources without burdening the users with installation and maintenance tasks.
Limitations
One of the limitations of the Jupyter Notebook server is that it is a single-user environment. When you are teaching a group of students learning data science, the basic Jupyter Notebook server falls short of serving all the users.
JupyterHub comes to our rescue when it comes to serving multiple users, with their own separate Jupyter Notebook servers seamlessly. This makes JupyterHub equivalent to a web application that could be integrated into any web-based platform, unlike the regular jupyter notebooks.
JupyterHub Architecture
The below diagram is a visual explanation of the various components of the JupyterHub platform. In the subsequent sections, we shall see what each component is and how the various components work together to serve multiple users with jupyter notebooks.
Components of JupyterHub
Notebooks
At the core of this platform are the Jupyter Notebooks. These are live documents that contain user code, write-up or documentation, and results of code execution in a single document. The contents of the notebook are rendered in the browser directly. They come with a file extension .ipynb. The figure below depicts how a jupyter notebook looks:
Notebook Server
As mentioned above, the notebook servers serve jupyter notebooks as .ipynb files. The browser loads the notebooks and then interacts with the notebook server via sockets. The code in the notebook is executed in the notebook server. These are single-user servers by design.
Hub
Hub is the architecture that supports serving jupyter notebooks to multiple users. In order to support multiple users, the Hub uses several components such as Authenticator, User Database, and Spawner.
Authenticator
This component is responsible for authenticating the user via one of the several authentication mechanisms. It supports OAuth, GitHub, and Google to name a few of the several available options. This component is responsible for providing an Auth Token after the user is successfully authenticated. This token is used to provide access for the corresponding user.
Refer to JupyterHub documentation for an exhaustive list of options. One of the notable options is using an identity aggregator platform such as Auth0 that supports several other options.
User Database
Internally, JupyterHub uses a user database to store user information. This information is used to spawn a separate user pod for each logged-in user and to serve the notebooks contained within that user’s pod.
Spawner
A spawner is a worker component that creates an individual server, or user pod, for each user allowed to access JupyterHub. This mechanism ensures multiple users are served simultaneously. Note that there is a predefined limit on the number of simultaneous first-time user-pod spawns, roughly 80 simultaneous users; however, this does not affect regular usage of the individual servers after the initial user pods are created.
How It All Works Together
The mechanism used by JupyterHub to authenticate multiple users and provide them with their own Jupyter Notebook servers is described below.
1. The user requests access to a Jupyter notebook via the JupyterHub (JH) server.
2. JupyterHub authenticates the user using one of the configured authentication mechanisms, such as OAuth, and returns an auth token that grants the user access to their user pod.
3. A separate Jupyter Notebook server is created, and the user is given access to it.
4. The requested notebook on that server is returned to the user’s browser.
5. The user writes code (or documentation text) in the notebook.
6. The code is executed on the notebook server, and the response is returned to the user’s browser.
Deployment and Scalability
The JupyterHub servers can be deployed in two different ways: on a cloud platform such as AWS or Google Cloud, using Docker and Kubernetes clusters to scale the servers to support thousands of users; or as a lightweight deployment on a single virtual instance to support a small set of users.
Scalability
In order to support a few thousand users and more, we use the Kubernetes cluster deployment on the Google Cloud platform. Alternatively, this could also have been done on the Amazon AWS platform to support a similar number of users.
This uses a Hub instance and multiple user instances each of which is known as a pod. (Refer to the architecture diagram above). This deployment architecture scales well to support a few thousand users seamlessly.
To learn more about how to set up your own JupyterHub instance, refer to the Zero to JupyterHub documentation.
Conclusion
JupyterHub is a scalable architecture of Jupyter Notebook servers that supports thousands of users in a maintainable cluster environment on popular cloud platforms.
This architecture suits several use cases with thousands of users and a large number of simultaneous users, for example, an online Data Science learning platform such as refactored.ai
Consider this scenario: you set up a JupyterHub environment (to learn more, go to the JupyterHub section below) so that over 1000 participants of your online workshop can access Jupyter notebooks in JupyterHub. How do you ensure that the workshop runs smoothly? How do you ensure that the cloud servers you allocated for this event are sufficient? You might first reference threads similar to this one:
Performance, scalability, and reliability of applications are key non-functional requirements of any product or service. This is especially true when a product is expected to be used by a large number of users.
This document overviews the Refactored platform and its JupyterHub environment, describes effective load tests for these types of systems, and addresses some of the main challenges the JupyterHub community faces when load testing in JupyterHub environments. The information presented in this document will benefit anyone interested in running load/stress tests in the JupyterHub environment.
Refactored
Refactored is an interactive, on-demand data training platform powered by AI. It provides a hands-on learning experience to accommodate various learning styles and levels of expertise. The platform consists of two core components:
Refactored website
JupyterHub environment.
JupyterHub
Jupyter is an open-source tool that provides an interface to create and share documents, including live code, the output of code execution, and visualizations. It also includes cells to create code or project documentation – all in a single document.
JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening the users with installation and maintenance tasks. Students, researchers, and data scientists can get their work done in their workspaces on shared resources, that can be managed efficiently by system administrators.
Running load tests on JupyterHub requires a unique approach. This tool differs significantly in the way it works as compared to a regular web application. Further, a modern authentication mechanism severely limits the options available to run seamless end-to-end tests.
Load Testing Tool
We use k6.io to perform load testing on Refactored in the JupyterHub environment. k6.io is a developer-centric, free, and open-source load testing tool built to ensure an effective and intuitive performance testing experience.
JupyterHub Testing
To start load testing the JupyterHub environment, we need to take care of the end-to-end flow. This includes the server configurations, login/authentication, serving notebooks, etc.
Since we use a cloud authentication provider on Refactored, we ran into issues testing the end-to-end flow due to severe restrictions on load testing cloud provider components. Generally, load testing such platforms is the responsibility of the cloud application provider; they are typically well tested for load and scalability. Hence, we decided to temporarily remove the authentication from the cloud provider and use dummy authentication for the JupyterHub environment.
To do that, we needed to change the configuration in k8s/config.yaml under the JupyterHub code.
Find the configuration entry that specifies authentication below:
In our case, we use a custom authenticator, so we changed it to the following dummy authenticator:
auth:
  type: dummy
  dummy:
    password: *******1234
  admin:
    users:
      - admin
GCP Configuration
In the GCP console under the Kubernetes cluster, take a look at the user pool as shown below:
Edit the user pool to reflect the number of nodes required to support load tests. There might be a calculation involved to figure out the number of nodes. In our case, we wanted to load test 300 user pods in JupyterHub. We created about 20 nodes as below:
k6 Configuration
k6.io is a tool for running load tests. It uses JavaScript (ES6 JS) as the base language for creating the tests.
First, install k6 from the k6.io website. There are two versions, a cloud version and an open-source version; we use the open-source version, downloaded into the testing environment.
In our case, we identified the key tests to be performed on the JupyterHub environment: 1. Login. 2. Heartbeat check. 3. Roundtrip for Jupyter notebooks.
To run tests on Refactored, we create the .js module that does the following via JS code.
1. Imports
import { check, group, sleep } from 'k6';
import http from 'k6/http';
2. Configuration options
We set up the configuration options ahead of the tests. These options include the duration of the tests, number of users, maximum users simulated, and other parameters such as shut-down grace time for the tests to complete.
Here is a sample set of options (the RTT threshold value shown is only illustrative):

export let options = {
    max_vus: 300,
    vus: 100,
    stages: [
        { duration: "30s", target: 10 },
        { duration: "4m", target: 100 },
        { duration: "30s", target: 0 }
    ],
    thresholds: {
        // illustrative threshold value
        "RTT": ["avg<500"]
    }
};
3. Actual tests
We created the actual tests as JS functions within group objects provided by the k6.io framework. We had various groups, including a login group, a heartbeat group, and other individual module check groups. Further groups can be chained within those groups.
Here is a sample set of groups to test our JupyterHub environment:
group('course aws deep racer Pre-Workshop – Introduction to Autonomous Vehicle', function() {
    let res = http.get(url_intro_autonmous);
    check(res, {
        "status is 200": (r) => r.status === 200,
        "Introduction to Autonomous Vehicle.. done": (r) => r.body.includes('<h1 class="main_heading">Introduction to Autonomous Vehicles</h1>')
    });
});

group('course aws deep racer Pre-Workshop – Introduction to Machine learning', function() {
    let res = http.get(url_intro_ml);
    check(res, {
        "status is 200": (r) => r.status === 200,
        "Introduction to Machine learning.. done": (r) => r.body.includes('<h1 class="main_heading">Introduction to Machine learning</h1>')
    });
});
Load Test Results
The results of the load test are displayed while running the tests.
Start of the test.
The test is run by using the following command:
root@ip-172-31-0-241:REFACTORED-SITE-STAGE:# k6 run -u 300 -i 300 dsin100days_test.js

The parameters 'u' and 'i' specify the number of virtual users to simulate and the number of iterations to perform, respectively.
The first part of the test displays the configuration options, the test scenario, and the list of users created for the test.
Test execution.
Further results display the progress of the test. In this case, the login process, the creation of user pods, and the reading of the notebooks are displayed. Here is a snapshot of the output:
INFO[0027] loadtestuser25 Reading 1st notebook
INFO[0027] loadtestuser20 Reading 1st notebook
INFO[0027] loadtestuser220 Reading 1st notebook
INFO[0027] loadtestuser64 Reading 1st notebook
INFO[0027] loadtestuser98 Reading 1st notebook
INFO[0027] loadtestuser194 Reading 1st notebook
INFO[0028] loadtestuser273 Reading 1st notebook
INFO[0028] loadtestuser261 Reading 1st notebook
INFO[0028] loadtestuser218 Reading 1st notebook
INFO[0028] loadtestuser232 Reading 1st notebook
INFO[0028] loadtestuser52 Reading 1st notebook
INFO[0028] loadtestuser175 Reading 1st notebook
INFO[0028] loadtestuser281 Reading 1st notebook
INFO[0028] loadtestuser239 Reading 1st notebook
INFO[0028] loadtestuser112 Reading 1st notebook
INFO[0028] loadtestuser117 Reading 1st notebook
INFO[0028] loadtestuser159 Reading 1st notebook
INFO[0029] loadtestuser189 Reading 1st notebook
Final results
After the load test is completed, a summary of the test results is produced. This includes the time taken to complete the test and other statistics on the actual test.
Here is the final section of the results:
running (04m17.1s), 000/300 VUs, 300 complete and 0 interrupted iterations
Http_reqs: the number of requests per second. In this case, it is 12.11 r/s because it includes first-time requests: during the initial run, the code is synced with GitHub, which adds idle time. Initial server spawns can also contribute. In other cases, there could be as many as 70 requests per second.
Vus_max: the maximum number of virtual users supported.
Iterations: 300 iterations. This could be scaled n-fold as well.
Http_req_waiting: an average wait time of 3.35 s during the round trip.
Running Individual Tests
The final step in this process is the testing of an individual user to see some key metrics around the usage of the JupyterHub environment.
The key metrics include:
Login-timed test
Notebook roundtrip
Notebook functions: start kernel, run all cells
This is performed using a headless browser tool. In our case, we use PhantomJS because we are familiar with it; other tools may perform as well or better.
Before we do the test, we must define the performance required from the page loads. The performance metrics include:
Load the notebook within 30 seconds.
Basic Python code execution must be completed within 30 seconds of the start of the execution.
In exceptional cases, due to the complexity of the code, it must not exceed 3 minutes of execution. This applies to the core data science code. There could be further exceptions to this rule depending on the notebook in advanced cases.
In the next technical paper, we will explore how to run individual tests.
It was a bright Friday afternoon in July in the lovely city of Boston, Massachusetts. We had close to 30 participants ready to race their cars. Sounds straight out of a Fast and Furious movie, right? What if I told you that these participants were actually racing autonomous vehicles on a virtual race track created on Amazon Sagemaker?
This project was the brainchild of Kristina, Pawan, and Sathwik. Earlier this year, they attended a workshop on reinforcement learning for autonomous vehicle racing conducted by Amazon. When they took the idea of conducting workshops in collaboration with Amazon Web Services (AWS) to Ram, the CEO of Colaberry, he was very enthusiastic. And that is what led me, along with a fellow intern, Manaswi, to spend quite a few days creating the Colaberry AWS DeepRacer course.
We found that DeepRacer can be an important tool to introduce people to the world of Machine Learning: they can see its real-world applications and use it in a fun way. Training relies on an ML reward function that rewards the autonomous vehicle for following the track properly and penalizes it for “bad” behaviors like going off track. The reward function is written in Python, and advanced participants in the AWS DeepRacer league even prepared their own code using an exhaustive list of input parameters. For those unfamiliar with coding, the platform also offered pre-created reward functions.
Since this was an introductory workshop, we refrained from using complex code and instead explained the pre-created reward functions. Advanced users, or those seeking to participate in the community race the next day, could use the advanced notebooks available. These notebooks provided a deeper understanding of hyperparameters, reward graphs, and other tips that can help a person succeed in the race.
Are you ready to race?
This post was written by Badri Yashwant Kuram, a Colaberry Intern.
Our Program
Colaberry has been providing one-of-a-kind, career-oriented training in data analytics and data science since 2012. We offer instructor-led onsite and online classes. Learn with us in person on our campus in Plano, Texas, or remotely from the comfort of your home. We have helped over 5,000 people to transform their lives with our immersive boot camp-style programs.
In-Demand Skills
Colaberry training programs equip you with in-demand tech and human skills. Our up-to-date lessons and carefully crafted curriculum set you up for success from day one. Throughout the training and the job search, our mentors will support and guide you as you transition into a fast-paced and exciting field of data analytics and data science.
Project-Based Learning
Our programs integrate projects that are based on real-world scenarios to help you master a new concept, tool, or skill. We work with you to build your portfolio to showcase your skills and achievements.
Award Winning Learning Platform
You will be learning using our homegrown technology platform Refactored AI which is recognized as the “Most Promising Work of the Future Solution” in global competition by MIT SOLVE. Our platform also received General Motors’ “Advanced Technology” prize and McGovern Foundation’s “Artificial Intelligence for Betterment of Humanity” prize.
Placement Assistance
Colaberry’s program, platform, and ecosystem empower you with skills that you need to succeed in your job interviews and transition into high-paying careers. Over 1/3rd of Colaberry graduates receive job offers after their first in-person interview. We provide you with continuous mentoring and guidance until you land your job, as well as post-placement support for twelve months so that you not only survive but thrive in your career.
Financial Aid
At Colaberry, we strive to create opportunities for all. We work with each individual to ensure the cost of the training does not hold them back from becoming future-ready. We offer various payment plans, payment options, and scholarships to work with the financial circumstances of our learners.
Military Scholarship
Colaberry is committed to supporting men and women who have served our country in uniform. As part of this commitment, we offer Military Scholarships to enable active-duty and retired military members to transition into civilian life. We have already helped numerous veterans by creating a pathway to rewarding and exciting careers in data science and data analytics. We hold alumni events and provide an extensive support system and a strong community of veterans to help our students succeed. Contact our enrollment team to find out more about how we can help you.
5 Ways To Ace Your Business Intelligence Interview – Part 5
Acing your next Business Intelligence (BI) interview isn’t easy but with the tips we’ve shared in the previous four mini-blogs in this series, you should be well on your way. Our last mini-blog covers the most important preparation you can make: completing Colaberry’s online data analytics courses.
5) Complete Colaberry’s Online Data Analytics Course
Colaberry’s online data analytics and data science courses allow you to focus on your future career while still being able to spend time with your family and loved ones. Expert instructors help you to get through real-world situations and make sure you are ready for your business intelligence career.
4 Benefits of Completing Colaberry’s Online Data Analytics Program
1) You can spend time with your family while pursuing your dreams.
3) You’ll gain access to a supportive student and alumni network that will always be there to assist you in preparing for your next Business Intelligence interview.
4) You can transition to a new career fast – and protect yourself from future obstacles like AI and automation.
Colaberry is completely focused on helping you achieve your dreams. Listen to Adie Mack, one of our successful graduates, who transitioned from teaching to an amazing career as a Business Intelligence Analyst:
Complete Colaberry’s Online Data Analytics Course
5 Ways To Ace Your Business Intelligence Interview – Part 4
Human skills are some of the most important (and often forgotten) tools to help you succeed in your business intelligence interviews. In this mini-blog, we’re sharing 4 core human skills you can master to improve the odds of you landing your next data science role.
In the age of automation, business information and intelligence, if utilized strategically, has the power to propel a business far above its competitors as well as exponentially boost brand awareness and profitability. This makes business intelligence roles extremely valuable to corporations.
The BI and analytics industry is expected to soar to a value of $26.50 billion by the end of 2021. Moreover, companies that use BI analytics are five times more likely to make swifter, more informed decisions. In this fourth post of our 5-part mini-blog series, we will show you how preparation will help you ace your next business intelligence interview.
According to Forbes, “dashboards, reporting, end-user self-service, advanced visualization, and data warehousing are the top five most important technologies and initiatives strategic to BI in 2018.” How do your skills line up with these initiatives? Are you ready to succeed in your next business intelligence interview?
5 Simple Ways To Ace Your Next Business Intelligence Interview
Career automation and AI are streamlining job roles and cutting out unnecessary positions, so job hunters looking to ace their business intelligence interview need to adapt to the Future of Work and present skills that are both practical and necessary. The Future of Work isn’t just about knowing tech or being able to code; the most necessary skills are those that are often overlooked by applicants when it comes time to interview. These four skills are critically important and will help you stand out and ace your next business intelligence interview:
Grit
Emotional Intelligence
Critical/Creative Thinking
Problem-Solving
Human skills show that you can “go the long haul” and not give up when situations get challenging. These important skills also show that you are flexible and can think outside of the box. Moreover, presenting these skills by describing how you work on projects and displaying your problem-solving skills is extremely important. Colaberry’s programs help you develop all of the aforementioned skills.
Stay tuned for the next mini-blog in this series. We will be releasing a short blog twice a week to help you ace your next business intelligence interview!