Data visualization is an essential step in quantitative analysis with Python.
There are many tools at our disposal for data visualization and the topics we will cover in this guide include:
- Matplotlib
- Pandas
- Time Series Visualization
- Seaborn
- Plotly & Dash
This article is based on notes from this course on Python for Financial Analysis and Algorithmic Trading. This guide is not comprehensive for any single data visualization library; instead, it aims to provide an overview of each library's most important capabilities.
1. Matplotlib
Let's get started with the grandfather of data visualization libraries: matplotlib.
Matplotlib has established itself as the benchmark for data visualization and is a robust and reliable tool.
As this Python for Finance textbook describes:
It is both easy to use for standard plots and flexible when it comes to more complex plots and customizations. In addition, it is tightly integrated with NumPy and the data structures that it provides.
Matplotlib is modeled after MATLAB's plotting capabilities and can create static images of almost any plot type.
Let's look at a few of the main types of plots we can create with Matplotlib from their gallery:
Let's start with a simple example with 2 NumPy arrays.
In this example we're setting x as a linearly spaced NumPy array, with 10 numbers between 0 and 10 exclusive. We then set y to x**2.
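The snippet that defines these arrays is not shown here, so the following is only a minimal sketch of the setup. It assumes np.linspace() with endpoint=False to match the "exclusive" wording; by default, np.linspace() includes the stop value.

```python
import numpy as np

# 10 evenly spaced values starting at 0 and stopping short of 10
# (endpoint=False is an assumption; the default would include 10)
x = np.linspace(0, 10, 10, endpoint=False)

# y is the element-wise square of x
y = x ** 2
```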
We can plot with Matplotlib in two different ways:
- Functional method
- Object-oriented method
Functional Method
With the functional method we just call plt.plot() and then pass in x and y.
# functional method
plt.plot(x, y)
We can also create 2 subplots with plt.subplot():
# create 2 subplots
plt.subplot(1,2,1)
plt.plot(x,y,'r')
plt.subplot(1,2,2)
plt.plot(x,y,'b')
Object-Oriented Method
We can also create plots in Matplotlib in an object-oriented way.
To do this we first create a fig object, then we add axes to the canvas, and finally we plot on the axes.
# create a figure object
fig = plt.figure()
# add axes to the canvas
# left, bottom, width, height
axes = fig.add_axes([0.1,0.1,1,1])
# next we plot on the axes
axes.plot(x,y)
axes.set_xlabel('X Label')
axes.set_ylabel('Y Label')
axes.set_title('OOP Method')
We can also create a plot within our canvas by passing a list of values to fig.add_axes(). The list specifies left, bottom, width, and height, each as a fraction of the figure size.
# create plot within canvas
fig = plt.figure()
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])
axes2 = fig.add_axes([0.2, 0.4, 0.5, 0.4])
axes1.plot(x,y)
axes2.plot(y,x)
We can add a legend by passing a label to each ax.plot() call, which ax.legend() then uses for the legend entries.
# add legend
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.plot(x, x**2, label='X Squared')
ax.plot(x, x**3, label='X Cubed')
ax.legend()
Finally, we can save the figure by calling fig.savefig() and passing in the file path. The file extension determines the file type.
# save figure
fig.savefig('my_plot.png')
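As a small extension of the call above, here is a sketch of saving the same figure in two formats. The file names, the temporary directory, and the dpi value are illustrative choices, not part of the original example.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 10)
y = x ** 2

fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
ax.plot(x, y)
ax.set_title('Saved Figure')

# Write to a temporary directory so the example leaves the working folder clean
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'my_plot.png')
pdf_path = os.path.join(out_dir, 'my_plot.pdf')

# The extension selects the output format; dpi sets the raster resolution,
# and bbox_inches='tight' crops the output to the drawn content
fig.savefig(png_path, dpi=200, bbox_inches='tight')
fig.savefig(pdf_path, bbox_inches='tight')
plt.close(fig)
```

The bbox_inches='tight' option can be useful with axes such as fig.add_axes([0,0,1,1]), where tick labels sit outside the figure bounds and may otherwise be cut off in the saved file.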
Change Plot Appearance
We can change our plot's appearance in many ways, but here are a few examples:
# change color of plot
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.plot(x,y,color='green')
# change line width and line style
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.plot(x,y,color='purple', linewidth=10, linestyle='--')
That's it for our introduction to Matplotlib, but if you want to see more examples check out these tutorials.
2. Pandas
The main purpose of pandas is data analysis, but the library also has some data visualization capabilities.
Pandas uses Matplotlib on the backend through simple .plot calls. The plot method on Series and DataFrame is just a simple wrapper around plt.plot().
Pandas does, however, have a limited scope of plot types and they are all static.
Let's look at a few examples:
import pandas as pd
import numpy as np
# Basic Plotting
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
We can also easily create a histogram:
# histogram
ts.plot.hist(bins=30)
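Because the pandas plot method sits on top of Matplotlib, it returns a Matplotlib Axes object that can be customized with the same methods used in the Matplotlib section. The sketch below uses synthetic data, and the column names and labels are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic random-walk data, for illustration only
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((250, 2)).cumsum(axis=0),
    index=pd.date_range('2000-01-01', periods=250),
    columns=['A', 'B'],
)

# .plot() returns a Matplotlib Axes object
ax = df.plot(title='Random Walks')

# so the usual Matplotlib Axes methods apply
ax.set_xlabel('Date')
ax.set_ylabel('Cumulative Sum')
ax.legend(loc='upper left')
```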
3. Time Series Visualization
Before moving on to other libraries, let's take a look at time-series visualization with pandas and Matplotlib.
To demonstrate this, let's use one year of daily stock data for TSLA downloaded from Yahoo Finance.
When we read in our data with pd.read_csv(), we want to pass in index_col='Date' and parse_dates=True.
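The download itself is not shown, so here is a minimal sketch of the read step. It uses a few made-up rows in the usual Yahoo Finance column layout; in practice you would pass the path of the downloaded CSV file to pd.read_csv() instead of the in-memory text.

```python
import io

import pandas as pd

# Hypothetical rows in the Yahoo Finance column layout; the values are made up
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2019-01-02,61.0,63.0,60.0,62.0,62.0,11000000
2019-01-03,62.0,62.5,59.5,60.0,60.0,7000000
2019-01-04,61.0,63.5,60.5,63.0,63.0,7400000
"""

# index_col='Date' makes the Date column the index;
# parse_dates=True converts that index to datetimes
df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

print(type(df.index).__name__)  # DatetimeIndex
```

With a datetime index in place, pandas can label the x-axis with dates and supports date-based selection.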
It's important to note that we don't want to plot the entire DataFrame since the Volume column is on such a different scale than the other columns.
Let's instead plot the adjusted close and volume on their own with df['Adj Close'].plot() and df['Volume'].plot().
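One way to view both columns together without mixing their scales is to give each its own axes. This is a sketch using synthetic data in place of the TSLA file, so the values are made up and only the column names follow the text.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for a year of daily stock data (values are made up)
rng = np.random.default_rng(1)
dates = pd.date_range('2019-01-01', periods=252, freq='B')
df = pd.DataFrame(
    {
        'Adj Close': 60 + rng.standard_normal(252).cumsum(),
        'Volume': rng.integers(5_000_000, 15_000_000, size=252),
    },
    index=dates,
)

# Two stacked axes that share the date axis but keep separate y-scales
fig, (ax_price, ax_volume) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
df['Adj Close'].plot(ax=ax_price, title='Adj Close')
df['Volume'].plot(ax=ax_volume, title='Volume')
fig.tight_layout()
```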
We can plot just a specific month by setting the xlim argument to a list or tuple.
# plot January 2019
df['Adj Close'].plot(xlim=['2019-01-01', '2019-02-01'])
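Note that xlim only changes the visible window, so the y-axis is still scaled to the full series. An alternative, sketched here with synthetic data, is to slice the series first so that the axes scale to the selected month only.

```python
import numpy as np
import pandas as pd

# Synthetic daily series for illustration (values are made up)
rng = np.random.default_rng(2)
dates = pd.date_range('2019-01-01', '2019-12-31', freq='D')
adj_close = pd.Series(
    60 + rng.standard_normal(len(dates)).cumsum(),
    index=dates,
    name='Adj Close',
)

# Partial-string indexing on a DatetimeIndex selects all of January 2019
january = adj_close.loc['2019-01']

# Plotting the slice lets the y-axis scale to January's values only
ax = january.plot(title='Adj Close, January 2019')
```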
4. Seaborn
Another common visualization library is Seaborn, which is a Python data visualization library based on Matplotlib.
Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.
Here are a few examples from their Gallery:
Let's look at an example of visualizing linear relationships with regression.
Two main functions in Seaborn are used to visualize a linear relationship as determined through regression. These functions, regplot() and lmplot(), are closely related and share much of their core functionality.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
sns.regplot(x="total_bill", y="tip", data=tips);
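To illustrate how the two functions differ, here is a sketch that calls both on the same data. It uses a synthetic stand-in for the tips dataset so that nothing needs to be downloaded; the values and the slope used to generate them are made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (values are made up)
rng = np.random.default_rng(3)
total_bill = rng.uniform(5, 50, size=200)
data = pd.DataFrame({
    'total_bill': total_bill,
    'tip': 1 + 0.15 * total_bill + rng.normal(0, 1, size=200),
    'time': rng.choice(['Lunch', 'Dinner'], size=200),
})

# regplot is axes-level: it draws onto a single Matplotlib Axes and returns it
ax = sns.regplot(x='total_bill', y='tip', data=data)

# lmplot is figure-level: it creates its own figure and returns a FacetGrid,
# which allows one panel per category through col= (or row= / hue=)
grid = sns.lmplot(x='total_bill', y='tip', col='time', data=data)
```

In short, regplot() fits into a figure you are already building, while lmplot() is convenient when you want the same regression split across subsets of the data.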
5. Plotly & Dash
All of the plots we've seen so far are static: once you create them, you can't interact with them in any way. This is the problem Plotly solves.
Plotly is both a company and an open source library.
Plotly the company focuses on data visualization for business intelligence, and the open source library is a general data visualization library that specializes in interactive visualizations.
Plotly has libraries for JavaScript, React, R, and Python, but we'll stick with Python in this guide.
The Plotly Python library creates interactive plots as .html files.
Users can interact with these plots (zoom in, select, hover, etc.), but one limitation is that these plots can't be connected to changing data sources.
Once the plot is generated, the data is essentially locked-in at that time, and in order to regenerate a plot to see updates you need to re-run the .py script.
This is where Plotly's Dash comes in.
Often users want plots to be able to interact with each other, interact with components, or have the plot update in real time.
To do this, we need a dashboard.
Dash is an open-source library that lets you create a full dashboard with components, interactivity, and multiple plots.
Instead of creating a .html file, Dash produces a dashboard web application at your local host, which you can then visit and interact with.
Since Dash renders a full web app, we can also deploy it online.
Here's an example from their GitHub of a Dash app that's styled to look like a PDF report:
And here's an example of a Dash app for forex trading:
Summary: Data Visualization with Python
As we've seen, Python has many data visualization libraries including Matplotlib, Pandas, Seaborn, and Plotly.
Most of these are static visualization libraries, but the open-source library Plotly lets you create interactive plots, and Dash lets you create dashboard web applications.