From Excel to Jupyter (with boxplots!) (2024)

Why Jupyter?

In my first job as a risk analyst, Excel was the go-to tool for analyzing data and building charts for technical reports.We would extract files from the central database, copy them into Excel files, do some data processing to build meaningfultables, and then we would use Excel’s predefined charts to create visualizations.

Excel is a pretty powerful tool and there are a lot of tasks where Excel is the right tool to use. However, in my dataexploration and reporting tasks, I always ended-up missing some key functionality that would make my work much harder.On the top of the list was the ability to plot boxplots (I still get chills when I remember this…).

As a refresh, a boxplot is a standard chart for visualising distributions. Given a group of observations from a certainvariable, it depicts the 25th, 50th and 75th percentiles as a box, and it depicts the maximum and minimum values (excludingoutliers) as whiskers. The following figure shows an example:

From Excel to Jupyter (with boxplots!) (1)

This chart is very useful to compare populations and was widely used in our technical reports. Unfortunately, Excel does nothave the feature of building boxplots from raw data, and thus we had to do some gymnastics to build them. This includedcomputing the needed percentiles and the maximum and minimum without outliers, doing some tricks to build the boxplot’sboxes as a staked bar chart and then color the bars individually to hide the bars that were “supporting” the boxes and to showthe bars that made the boxes. Finally, we would add the whiskers by overlapping some lines of correct size on top of thestaked bar chart.

It was an automation mess! And it was very prone to human errors. At the time, I started to wonder whether there was a betterway. And there was! Data Science with Python was booming at the time and Jupyter Notebooks were being used ever more oftento do data exploration and reporting. So, I dug into it and started to work with these tools, leaving my hacked boxplotstime behind me.

With this post, I wish to motivate people that use mainly Excel for reporting and data exploration to broaden their horizonsand try out JupyterLab and Jupyter Notebooks. It can be a learning curve, but one that is totally worth it on the long run.

As an illustrative example, we will read data from an Excel file with daily foreign exchanges rates between the US dollarand 22 currencies and build a boxplot to compare the exchange rates for the Scandinavian currencies. The data wasdownloaded from Kaggle and both the dataand the final notebook and be consulted here.

Installing Anaconda and starting JupyterLab

Before starting to use JupyterLab, you need to have a python installation in our computer. The easiest way of doing thisis to install Anaconda. Anaconda is an open-source distribution package for python and Rthat simplifies package management and installation. The main advantage is that because Anaconda has a focus on supportingData Science work, it already comes with mist of the data exploration and plotting packages you’ll need.

In their page, they have step-by-step guides on how to install it, either for macOSor Windows. Make sure you download the Python 3.7 version.

Once the installation is done, which can take some minutes, you can launch Anaconda. After the launch, you’ll see a screenwith a grid of applications you can use. Launch the JupyterLab application by clicking on dedicated the button:

From Excel to Jupyter (with boxplots!) (2)

Anaconda will start JupyterLab and a new session will open in your default browser, and you’ll see JupyterLab’s home:

From Excel to Jupyter (with boxplots!) (3)

JupyterLab is simply a web-based interface to run your Jupyter Notebooks and manage your data and folders. If you clickon the folder icon in the top-left, you will see your home directory:

From Excel to Jupyter (with boxplots!) (4)

You can then navigate in these folders until you find the folder in which you want to work (i.e., where you want to saveyou Jupyter Notebooks and data). For this example, I have an Excel file with some data I want to analyse, and so I go tothat directory and create a new notebook by clicking on Launcher:

From Excel to Jupyter (with boxplots!) (5)

And this is the result! A new clean notebook to start:

From Excel to Jupyter (with boxplots!) (6)

You can rename the notebook by clicking File -> Rename Notebook.

Using Jupyter Notebooks

Now that you have created your first notebook, I will quickly explain what are these notebooks and how they work.According to the Jupyter Project’s website:

The Jupyter Notebook is an open-source web application that allows you to create and share documents that containlive code, equations, visualizations and narrative text

In order words, with Jupyter notebook, we have human-readable documents where we can write rich text (such as text,figures, equations, etc.), run code and see its outputs directly in the notebook. This feature makes it a great tool fordata exploration and reporting since we can process the data, build the visualizations and write the report all in oneplace!

Jupyter notebooks are saved with the .ipynb extension and are simply a sequence of cells saved in a special JSON format.A cell is a multiline text input field and Jupyter has three main types - code cells, markdown cells, and raw cells. Themost important are the code cells, where you can write and execute code, and the markdown cells, where you can write therich text. Going into detail about Markdown is out of the scope of this post, however, if you want to know more, youcan check the Wikipedia page.

To execute a cell, either a code or markdown cell, you can simply use the keyboard shortcut Shift-Enter or use the Playbottom on the toolbar. By default, cells are code cells. However, you can change the cell’s type by using the drop-downmenu of the toolbar. You can see bellow how to do this:

From Excel to Jupyter (with boxplots!) (7)

To know more about how to work with Jupyter Notebooks, I suggest Jupyter’sdocumentation.

Finally, before we move on to the part of importing the data and building a boxplot, we’ll need to import the necessarypython packages to do these analyses. Thus, create a new code cell and run the following code:

123
import pandas as pdimport matplotlib.pyplot as pltimport seaborn as sns

This will import the 3 packages we’ll use until the end: pandas for reading and preparing the data and seaborn andmatplotlib for the visualization.

Loading and prepping the data with Pandas

Now we can use pandas to read the foreign exchange data from the Excel file. Recall that the data file in the samedirectory as the notebook. If you don’t create the notebook in the same directory, you’ll need to change the path toExcel file. pandas has an easy function to read Excel files, read_excel:

1234
# Read the excel data into a pandas dataframedata_df = pd.read_excel("data.xlsx", na_values=['ND'])# Print the first 51 rowsdata_df.head()

Create a new code cell and the code above. In the fist line, we are using pandas (which was imported as pd) to readthe Excel file. The variable data_df will store the pandas DataFrame that contains the data in the Excel file. DataFramesare one of the most important structures in Pandas as they can be used to store tabular data. In short, a DataFrame is tablewhere each column is a single variable and each row is a data point. It has a direct translation to the columns and rowsin Excel.

Note that the Excel file with the foreign exchange rates had some cells with the string “ND” instead of number. These werethe cases where the markets were closed and thus there is no quote for the exchange rates in that day. Because Excelrepresents null values in different ways, pandas gives us the option of specifying a list of strings that should beinterpreted as a null value. In our case, we only had the “ND” string.

Then, we use the data_df.head() command to print the first 5 rows of the DataFrame:

From Excel to Jupyter (with boxplots!) (8)

One of the advantages of using notebooks with pandas is that you just need to print a pandas DataFrame and Jupyter willautomatically format it in a nice table that you can easily investigate. Thus, with two lines of code you can importour data to pandas and have a quick view of it.

Now, we need to apply some transformations to the data in order to build the desired boxplot. The code is below:

 1 2 3 4 5 6 7 8 910111213
columns_to_keep = [ 'DENMARK - DANISH KRONE/US$', 'NORWAY - NORWEGIAN KRONE/US$', 'SWEDEN - KRONA/US$']new_column_names = ["Danish Krone", "Norwegian Krone", "Swedish Krone"]# Select columnsclean_data_df = data_df[columns_to_keep]# Change columns namesclean_data_df.columns = new_column_names# Unpivotclean_data_df = clean_data_df.melt(var_name="Currency", value_name="Rate")

Let’s analyze it step-by-step. We start by defining two lists, one with the columns we wish to keep (i.e. the threeScandinavian currencies) and one with the column names we want to appear in the boxplot. Then we create a new DataFramenamed clean_data_df which is set as the original DataFrame data_df with only the three desired columns. With the fourthcode command, we change the column names and, in the final command, we unpivot the DataFrame from a wide to a long format.In other words, in order to plot the boxplot with Seaborn, we need to have the exchange rates of all the currencies in asingle column (which will be named Rate) and one extra column indicating the currency (which will be named Currency).After running this code, we get the following result:

And now, we are ready to build our boxplot!

Plotting with Seaborn

Since we already have our DataFrame in the correct format, where the exchange rates are in the column Rate and thecurrency names are in the column Currency, we just need to use Seaborn’s function boxplot and indicate the name of thecolumn that should be used in the x-axis, the name of the column that should be used in the y-axis and the DataFrame thatshould be used to populate the plot. Then we just need to run the matplotlib command plt.show() in order to print the plot:

12
sns.boxplot(x="Currency", y="Rate", data=clean_data_df)plt.show()

After running these cells, you get the following output:

From Excel to Jupyter (with boxplots!) (9)

From Excel to Jupyter (with boxplots!) (2024)

References

Top Articles
Latest Posts
Article information

Author: Ms. Lucile Johns

Last Updated:

Views: 6288

Rating: 4 / 5 (41 voted)

Reviews: 88% of readers found this page helpful

Author information

Name: Ms. Lucile Johns

Birthday: 1999-11-16

Address: Suite 237 56046 Walsh Coves, West Enid, VT 46557

Phone: +59115435987187

Job: Education Supervisor

Hobby: Genealogy, Stone skipping, Skydiving, Nordic skating, Couponing, Coloring, Gardening

Introduction: My name is Ms. Lucile Johns, I am a successful, friendly, friendly, homely, adventurous, handsome, delightful person who loves writing and wants to share my knowledge and understanding with you.