How to Create Stunning Charts with Matplotlib

A comprehensive guide to using Matplotlib, Python's visualization library, to create polished charts with annotations and an embedded image


I am a data analyst, data scientist and Python machine learning engineer.

Introduction

The purpose of this article is to provide readers with a concise guide to using Matplotlib in Python for creating customized charts and plots, covering data handling, chart customization, and even image integration.

Prerequisites

Make sure you have the Python interpreter installed on your machine and possess a foundational understanding of Python.

What Plot Are We Making?

In this tutorial, we'll build a horizontal bar chart that compares the goal tallies of the top scorers in the English Premier League (EPL), the German Bundesliga, the Italian Serie A, the French Ligue 1 and the Spanish La Liga for the 2022/23 season.

top 20 goal scorers across europe's top 5 leagues

Why This Particular Chart?

Column and bar charts are excellent for comparing values, which makes them an optimal choice for this task. The graph can help settle football debates and answer questions like "Who was the top goalscorer in the top 5 leagues in the 2022/23 season?". Bear in mind that other metrics, such as the highest goal scorer across all of Europe, could also be considered for a more comprehensive analysis.

If the graph you intend to create isn't a bar chart, rest assured that what you learn in this tutorial can be applied to other graphs you build with Matplotlib.

Necessary Libraries

To build this bar chart we need four packages: pandas for data handling, manipulation and cleaning; matplotlib for drawing the chart; Pillow (PIL) for loading the image; and highlight_text for adding headings and subheadings to the figure.

If you are using Jupyter Notebook or the cloud-based Google Colab, you can install these libraries by running the command below in a terminal, or in a notebook cell by prefixing it with % or ! (for example, %pip install ...). Note that pandas' read_html() also needs an HTML parser such as lxml; if pandas reports a missing dependency, install lxml as well.

pip install pandas matplotlib pillow highlight_text
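
After installing, it can be handy to confirm that each library is importable before moving on. The sketch below is one possible check using only the standard library; the helper name missing_modules is hypothetical, and note that Pillow is imported under the name PIL.

```python
from importlib.util import find_spec

# Import names of the libraries used in this tutorial.
# Pillow is installed as "pillow" but imported as "PIL".
REQUIRED_MODULES = ["pandas", "matplotlib", "PIL", "highlight_text"]


def missing_modules(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

An empty list means everything needed for the rest of the tutorial is in place.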

Data Gathering and Cleaning

Information is crucial in the field of data science because you can't achieve much without it.

In this football-related project, finding the data we need isn't difficult. We could simply search online for the top goal scorers in Europe's top 5 leagues. However, to better show how data cleaning works, I decided to look at each league separately and gather information about their top goal scorers. Wikipedia is a good place to find this data.

After my research, I found the data on the Wikipedia pages for the Bundesliga, Premier League, Ligue 1, Serie A and La Liga. Each page has a table listing the league's top scorers and how many goals they scored. To work with this data, we load the tables with the read_html() function from the pandas library, which extracts HTML tables from a web page.

import pandas as pd

# Bundesliga
bun = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_Bundesliga")[12]

# English Premier League (EPL)
epl = pd.read_html('https://en.wikipedia.org/wiki/2022%E2%80%9323_Premier_League')[6]

# Ligue 1
lig1 = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_Ligue_1")[8]

# Serie A
ser = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_Serie_A")[10]

# La Liga
laliga = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_La_Liga")[7]

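The basic pattern is pd.read_html(link)[table_index]. read_html() returns a list containing every table found on the page as a DataFrame, and the index (counting from 0, as usual in Python) picks the table we want. These indices depend on the current layout of each Wikipedia page, so they may need adjusting if the pages are edited.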
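
Because a hard-coded table index can break when a page changes, one alternative is to pick the table by its column names. The sketch below is a minimal illustration, not the approach used in this article: pick_table is a hypothetical helper, and the small DataFrames with made-up values stand in for the list that pd.read_html() would return.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required names."""
    for table in tables:
        if set(required_columns).issubset(table.columns):
            return table
    raise ValueError(f"No table has the columns {required_columns}")


# Stand-in for the list returned by pd.read_html(url); the values are made up.
tables = [
    pd.DataFrame({"Pos": [1, 2], "Team": ["Club X", "Club Y"]}),
    pd.DataFrame({
        "Rank": [1, 2],
        "Player": ["Player A", "Player B"],
        "Club": ["Club X", "Club Y"],
        "Goals": [30, 25],
    }),
]

scorers = pick_table(tables, ["Rank", "Player", "Club"])
print(scorers)
```

read_html() also accepts a match argument that keeps only tables containing a given text or regular expression, which can narrow the list before indexing.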

There are three things to observe from the collected data:

  1. The data are all in separate data frames.

  2. They all have a Rank column, which can be dropped as it is not needed.

  3. The name of the goals column differs between data frames, so they cannot be concatenated cleanly as they are.

Please note that data will not always arrive this clean and ready to use, which is why a good knowledge of pandas is needed to clean it properly.

What then do we do?

We need to rename the goals column to a common name ('Goals' will do), remove the Rank column, and add a League column to each data frame. After that, we can join them into a single data frame.

The code below illustrates this:

# Remove the 'Rank' column from each DataFrame
bun = bun.drop(columns='Rank')
epl = epl.drop(columns='Rank')
ser = ser.drop(columns='Rank')
laliga = laliga.drop(columns='Rank')
lig1 = lig1.drop(columns='Rank')

# Add a 'League' column to each DataFrame
bun['League'] = 'Bundesliga'
epl['League'] = 'Premier League'
ser['League'] = 'Serie A'
laliga['League'] = 'La Liga'
lig1['League'] = 'Ligue 1'

# Define the new column order
new_columns = ['Player', 'Club', 'Goals', 'League']

# Create a list of the top 5 leagues' DataFrames
top5 = [bun, epl, lig1, ser, laliga]

# Rename the columns of each DataFrame
for i in top5:
    i.columns = new_columns

# Concatenate the DataFrames into a single DataFrame
required_data = pd.concat([epl, ser, laliga, lig1, bun], ignore_index=True)
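
Assigning column names by position works as long as every table has the same column order. A variation that renames only the goals column by name is sketched below; tidy_scorers is a hypothetical helper, and the column names and values in the two toy frames are invented for illustration.

```python
import pandas as pd


def tidy_scorers(df, goals_column, league):
    """Drop 'Rank', rename the goals column to 'Goals' and tag the league."""
    tidy = df.drop(columns="Rank").rename(columns={goals_column: "Goals"})
    tidy["League"] = league
    return tidy[["Player", "Club", "Goals", "League"]]


# Toy tables whose goals columns have different names (made-up values).
raw_a = pd.DataFrame({"Rank": [1], "Player": ["Player A"],
                      "Club": ["Club X"], "Goals[a]": [30]})
raw_b = pd.DataFrame({"Rank": [1], "Player": ["Player B"],
                      "Club": ["Club Y"], "Goals scored": [25]})

combined = pd.concat(
    [
        tidy_scorers(raw_a, "Goals[a]", "League A"),
        tidy_scorers(raw_b, "Goals scored", "League B"),
    ],
    ignore_index=True,
)
print(combined)
```

Renaming by name fails loudly at the column selection step if a table does not have the expected columns, which can make layout changes easier to spot.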

You may have noticed that this article uses three similar words (concat, join and merge). In pandas these are three different operations. Because we are stacking the data frames vertically, one on top of another, we use the concat() function. join() and merge() combine data frames side by side on an index or key column instead; see the pandas documentation to better understand when to use each of them.
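
To make the difference concrete, here is a small sketch with made-up data: concat() stacks rows from frames that share the same columns, while merge() adds columns by matching rows on a key.

```python
import pandas as pd

# Two frames with the same columns (made-up values).
scorers_a = pd.DataFrame({"Player": ["Player A"], "Goals": [30]})
scorers_b = pd.DataFrame({"Player": ["Player B"], "Goals": [25]})

# A lookup table that shares the 'Player' key.
clubs = pd.DataFrame({"Player": ["Player A", "Player B"],
                      "Club": ["Club X", "Club Y"]})

# concat: stack the rows vertically.
stacked = pd.concat([scorers_a, scorers_b], ignore_index=True)

# merge: bring in the 'Club' column by matching on 'Player'.
merged = stacked.merge(clubs, on="Player", how="left")

print(stacked)
print(merged)
```

DataFrame.join() behaves much like merge() but matches on the index by default.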

A glimpse at the data frame shows that the "Goals" column is not sorted. So we sort the data frame in ascending order of goals and extract the top 20 players, which makes plotting easier.

# Sort data by 'Goals' in ascending order.
sorted_req_data = required_data.sort_values(by='Goals', ascending=True)

# Get the top 20 goal scorers.
sorted_req_data = sorted_req_data.tail(20)

# Extract player names.
y = sorted_req_data['Player']

# Extract corresponding goal values.
x = sorted_req_data['Goals']

The first line sorts the data from the lowest to the highest number of goals, because the ascending parameter of sort_values() is set to True. We then use tail() to select the last 20 rows of the sorted DataFrame, which are the players with the most goals. Finally, we assign the 'Player' column to the variable y and the 'Goals' column to the variable x. With that, we can move on to making the plot.
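
The same sort-then-tail pattern is shown below on a five-row toy table (made-up values), keeping the top 3 instead of the top 20. The ascending order matters for the chart: a horizontal bar plot draws the first row at the bottom, so the highest scorer ends up at the top. As an aside, nlargest() is another way to get the same rows.

```python
import pandas as pd

data = pd.DataFrame({
    "Player": ["P1", "P2", "P3", "P4", "P5"],
    "Goals": [12, 30, 7, 25, 18],
})

# Sort ascending, then keep the last three rows (the three highest values).
top3 = data.sort_values(by="Goals", ascending=True).tail(3)
print(top3["Player"].tolist())  # ['P5', 'P4', 'P2']

# Alternative: take the three largest, then sort ascending for plotting.
top3_alt = data.nlargest(3, "Goals").sort_values(by="Goals", ascending=True)
print(top3_alt["Player"].tolist())  # ['P5', 'P4', 'P2']
```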

Making the Plot

After exploring and manipulating a dataset, keep in mind that a non-technical reader will barely follow what has been achieved from the raw tables alone; that's where visualization comes in. Many types of plots could be used, such as box plots, histograms, heatmaps, scatter plots, bar charts, pie charts and line charts. For this article, we will stick to the bar plot.

Before plotting, it helps to distinguish three terms that are commonly used in Matplotlib and are easy to confuse: Figure, Axes and Axis.

A Figure is the entire window or canvas in which a plot is displayed. An Axes, despite looking like the plural of axis, is a single plotting area within the figure: it holds the complete coordinate system used for plotting data, including the horizontal and vertical scales and their labels. An Axis is an individual x-axis or y-axis of that plotting area, along which values are marked. Understanding these terms is crucial for creating and customizing visualizations with Matplotlib.
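
The relationship between the three terms can be inspected directly. The sketch below uses plt.subplots(), a common shortcut that creates a Figure and one Axes together; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# One Figure (the canvas) holding one Axes (the plotting area).
fig, ax = plt.subplots(figsize=(4, 3))

print(type(fig).__name__)       # Figure
print(type(ax.xaxis).__name__)  # XAxis
print(type(ax.yaxis).__name__)  # YAxis

# The figure keeps a list of the Axes placed on it.
print(fig.axes == [ax])         # True
```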

With the terms explained, we create a figure and place an Axes on it to hold the horizontal bar plot by running the code below:

# Import the pyplot module of matplotlib
import matplotlib.pyplot as plt

# Create the figure by specifying the size and facecolor
fig = plt.figure(figsize=(8, 11), facecolor='#dbbba7')

# Create an Axes on the figure to hold the bar plot
ax = plt.subplot(1, 1, 1, facecolor='#4f49c9')

# Create a horizontal bar plot with the players on the y-axis and goals on the x-axis
ax.barh(y, x, height=0.7)

We created a figure (fig) with a size of 8x11 inches and set its background color to a light brownish tone using the hexadecimal color code '#dbbba7'. plt.subplot() is a function from Matplotlib's pyplot module that creates a subplot within a figure; the three integer arguments (1, 1, 1) specify the number of rows, the number of columns and the position of the current subplot. Passing facecolor='#4f49c9' sets the background of the Axes to a shade of blue. You can change either color to suit your taste; the Google Color Picker page can help with the selection. The output of the above code should look somewhat like this:

Where do we go from here?

Our plot sits inside a rectangular frame (its exact shape depends on the figsize), with small marks along the edges that label points on each axis. The frame lines are called spines and the small marks are called ticks.

Spines are the lines that border the plotting area and mark the boundaries of the data region. Ticks are the small marks along an axis that indicate specific values or intervals and serve as reference points for reading the data.

We can tidy up our plot by getting rid of both. A typical hand-drawn Cartesian graph shows only two axis lines, one horizontal (x-axis) and one vertical (y-axis), but a Matplotlib Axes has four spines by default: left, right, top and bottom. The Matplotlib documentation explains the difference between spines and ticks in more detail.

The code below removes both the spines and the tick marks:

# Remove the spine on each side
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# Remove ticks in the x and y axes
ax.tick_params(left=False, bottom=False)
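
Since all four spines are hidden, the four calls can also be written as a loop. The sketch below is an equivalent, more compact form on a toy chart with made-up values. Note that tick_params(left=False, bottom=False) hides only the tick marks; the tick labels (player names and numbers) stay visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.barh(["Player A", "Player B"], [25, 30])

# Hide every spine (top, right, left, bottom) in one loop.
for spine in ax.spines.values():
    spine.set_visible(False)

# Hide the tick marks on the left and bottom; labels are kept.
ax.tick_params(left=False, bottom=False)
```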

Next, we'll add labels to the bars. To do this, we'll go through each player in the data frame and place the "Goals" value next to the corresponding player's bar. Matplotlib provides the Axes.annotate() method for this purpose. The code with comments can be seen below:

# Loop through the sorted data to annotate the goals for each player's bar
for index, goals in enumerate(sorted_req_data["Goals"]):
    # Use the annotate function to add text to the plot
    ax.annotate(
        # Specify the position of the annotation (x, y)
        xy=(goals, index),
        # The text to be displayed (in this case, the number of goals)
        text=f"{goals}",
        # Offset the text position slightly to the right
        xytext=(10, 0),
        # Set the font size for the annotation
        size=13,
        # Specify the coordinate system for the text position (offset from the data point)
        textcoords="offset points",
        # Horizontal alignment of the text (centered in this case)
        ha="center",
        # Vertical alignment of the text (centered in this case)
        va="center",
        # Make the text bold
        weight="bold"
    )

This code iterates through the sorted data and adds annotations to the bars in a plot. It specifies the position, text, font size, and formatting for the annotations using the annotate() function from Matplotlib.
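
The loop works because Matplotlib places categorical bars at positions 0, 1, 2 and so on, which is why the enumerate() index can serve as the y coordinate. As an alternative, Matplotlib 3.4 and later provide Axes.bar_label(), which labels every bar in one call. The sketch below assumes such a version is installed and uses made-up values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

players = ["Player A", "Player B", "Player C"]
goals = [21, 27, 36]

fig, ax = plt.subplots()
bars = ax.barh(players, goals, height=0.7)

# Label each bar with its value, 4 points past the end of the bar.
labels = ax.bar_label(bars, padding=4, size=13, weight="bold")

print([label.get_text() for label in labels])  # ['21', '27', '36']
```

annotate() remains the more flexible choice when each label needs its own text or styling.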

From the plot, we can see that the tick labels on the x-axis are too close together. To adjust their spacing, pass a locator from Matplotlib's ticker module to the x-axis's set_major_locator() method. The code below uses MultipleLocator(10), so tick labels appear at every multiple of 10 along the x-axis.

# Change the x-axis tick label interval with a locator from the ticker module
import matplotlib.ticker as ticker

ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
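
To see the effect of the locator in isolation, the sketch below applies it to a toy chart with made-up values and prints the resulting tick positions. The exact list can include positions just outside the visible range, so only the spacing is of interest here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

fig, ax = plt.subplots()
ax.barh(["Player A", "Player B"], [27, 36])

# Place a major tick at every multiple of 10 on the x-axis.
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))

ticks = ax.get_xticks()
print(ticks)  # every value is a multiple of 10
```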

Adding Headings and Subheadings

We can add a main heading to the plot using fig_text from the highlight_text library, specifying its coordinates on the figure along with the font weight, size and color. We then add a sub-heading with the same function but a smaller font size, since it only provides additional information about the data being displayed.

#import fig_text to add and style headings and sub headings in the plot
from highlight_text import fig_text

# Add the main heading to the plot
fig_text(x=-0.02, y=0.92,
         s='<EUROPE TOP 5 LEAGUE 2022/2023 Season>',  # Main heading text
         weight='bold',  # Make the text bold
         size=20,  # Set the font size to 20
         highlight_textprops=[{"color": 'Black'}]  # Highlight the text in black
        )

# Add the sub-heading just below the main heading
fig_text(x=-0.02, y=0.89,
         s='<OVERALL GOALS | LEAGUE AND CUP MATCHES | TOP 20 PLAYERS ONLY>',  # Sub-heading text
         size=13,  # Set the font size to 13
         highlight_textprops=[{"color": 'Black'}]  # Highlight the text in black
        )
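
If highlight_text is not available, plain headings can also be placed with Matplotlib's own Figure.text() method, which takes coordinates as fractions of the figure size. The sketch below is a fallback under that assumption, not a replacement for fig_text: it cannot style parts of a string differently, and the x position of 0.02 is chosen here simply to keep the text inside the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 11), facecolor="#dbbba7")

# Main heading: bold, size 20, near the top-left of the figure.
title = fig.text(0.02, 0.92, "EUROPE TOP 5 LEAGUE 2022/2023 Season",
                 weight="bold", size=20, color="black")

# Sub-heading: smaller text just below the main heading.
subtitle = fig.text(0.02, 0.89,
                    "OVERALL GOALS | LEAGUE AND CUP MATCHES | TOP 20 PLAYERS ONLY",
                    size=13, color="black")
```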

Adding an Image

Adding an image to an existing plot involves the following steps:

Import the required library: We use Pillow, the maintained fork of the Python Imaging Library (PIL). We import its Image module, which can open, manipulate and save images in different formats.

Load the image: Load the image you want to add to your plot using the Image.open() function, passing the path to your image file.

Create an Axes: Create an Axes for the logo within the existing figure by specifying the position and size of the logo.

Display the image: Display the logo image on the created Axes.

Turn off the axis: This is crucial to let the image appear without any axis markings (i.e. ticks and spines).

from PIL import Image

# Path of the logo file
file = 'top5.png'

# Open the image
logo = Image.open(file)

# Create an Axes on the figure to hold the image: [left, bottom, width, height]
logo_ax = fig.add_axes([0.825, 0.825, 0.125, 0.125])

# Show the logo on the Axes
logo_ax.imshow(logo)

# Turn off the axis markings
logo_ax.axis("off")

top 20 goal scorers across europe's top 5 leagues
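
The four numbers passed to add_axes() are [left, bottom, width, height], each given as a fraction of the figure size, so [0.825, 0.825, 0.125, 0.125] places a small box near the top-right corner. The sketch below repeats the logo steps with a plain colored square generated by Pillow, so it runs without the 'top5.png' file; the square is only a stand-in for a real logo.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from PIL import Image

# Stand-in logo: a plain 64x64 square used in place of 'top5.png'.
logo = Image.new("RGB", (64, 64), color=(79, 73, 201))

fig = plt.figure(figsize=(8, 11), facecolor="#dbbba7")

# [left, bottom, width, height], each a fraction of the figure size.
logo_ax = fig.add_axes([0.825, 0.825, 0.125, 0.125])

# Draw the image and hide the ticks and spines around it.
logo_ax.imshow(logo)
logo_ax.axis("off")
```

Changing the first two numbers moves the logo, and changing the last two resizes it.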

The finished figure can be saved to disk with the plt.savefig() function. You pass it the file name, and the file extension determines the format in which it is saved.

# Save the current figure as a JPG file
plt.savefig('top 5 league chart.jpg')
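
savefig() also accepts options that are often useful for a chart like this one. The sketch below, on a toy chart with made-up values, sets the resolution with dpi and uses bbox_inches="tight", which should expand the saved area to include text placed slightly outside the figure, such as a heading at a negative x position. It writes to an in-memory buffer as PNG only to stay self-contained; a file name works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3), facecolor="#dbbba7")
ax.barh(["Player A", "Player B"], [27, 36])

# A heading that starts just left of the figure edge.
fig.text(-0.02, 0.95, "Heading placed at a negative x position")

# Save at 150 dpi and let Matplotlib fit the box around all the content.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
png_bytes = buffer.getvalue()

print(len(png_bytes) > 0)  # True
```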

Here is the full code for this use case:

import pandas as pd
import matplotlib.pyplot as plt
from highlight_text import fig_text
from PIL import Image
import matplotlib.ticker as ticker

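# Load the top-scorer table from each league's Wikipedia page
bun = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_Bundesliga")[12]
epl = pd.read_html('https://en.wikipedia.org/wiki/2022%E2%80%9323_Premier_League')[6]
lig1 = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_Ligue_1")[8]
ser = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_Serie_A")[10]
laliga = pd.read_html("https://en.wikipedia.org/wiki/2022%E2%80%9323_La_Liga")[7]

# Remove the 'Rank' column from each DataFrame
bun = bun.drop(columns='Rank')
epl = epl.drop(columns='Rank')
ser = ser.drop(columns='Rank')
laliga = laliga.drop(columns='Rank')
lig1 = lig1.drop(columns='Rank')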

bun['League'] = 'Bundesliga'
epl['League'] = 'Premier League'
ser['League'] = 'Serie A'
laliga['League'] = 'La Liga'
lig1['League'] = 'Ligue 1'

new_columns = ['Player', 'Club', 'Goals', 'League']

top5 = [bun, epl, lig1, ser, laliga]

for i in top5:
    i.columns = new_columns

required_data = pd.concat([epl, ser, laliga, lig1, bun], ignore_index=True)

sorted_req_data = required_data.sort_values(by='Goals', ascending=True)
sorted_req_data = sorted_req_data.tail(20)
y = sorted_req_data['Player']
x = sorted_req_data['Goals']

# Create plot
fig=plt.figure(figsize=(8,11),facecolor='#dbbba7')
ax=plt.subplot(1,1,1,facecolor='#4f49c9')
ax.barh(y,x,height=0.7)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)

# Annotate the bars
for index, goals in enumerate(sorted_req_data["Goals"]):
    ax.annotate(
        xy=(goals, index),
        text=f"{goals}",
        xytext=(10, 0),
        size=13,
        textcoords="offset points",
        ha="center",
        va="center",
        weight="bold"
    )

ax.xaxis.set_major_locator(ticker.MultipleLocator(10))

# Add the main heading
fig_text(x=-0.02, y=0.92,
         s='<EUROPE TOP 5 LEAGUE 2022/2023 Season>',
         weight='bold',
         size=20,
         highlight_textprops=[{"color": 'Black'}]
        )

# Add the sub-heading
fig_text(x=-0.02, y=0.89,
         s='<OVERALL GOALS | LEAGUE AND CUP MATCHES | TOP 20 PLAYERS ONLY>',
         size=13,
         highlight_textprops=[{"color": 'Black'}]
        )

# Add a Logo
file='top5.png'
logo=Image.open(file)
logo_ax=fig.add_axes([0.825,0.825,0.125,0.125])
logo_ax.imshow(logo)  
logo_ax.axis("off")


# Save the current figure as a JPG file
plt.savefig('top 5 league chart.jpg')

The code is also available here.

Conclusion

In this article, we've shown you how to make your data visualizations in Python more interesting and informative. We've explained how to add pictures, like logos or icons, to your plots, and even how to control their placement.

By following these steps, you can make your charts and graphs not only informative but also visually appealing. Combining data and images allows you to create more engaging visualizations for different purposes, whether it's analyzing data, making presentations, or creating reports. These techniques can help you improve your data visualization skills and make your charts stand out.

Thanks for reading. If you found this article helpful, please react, share and follow so we can explore the world of Python together.
