Basic data manipulation in Python…

In this post, we work with data from the European Centre for Disease Prevention and Control (ECDC) and explain basic data manipulation in Python with the Pandas package.

Today, data is available everywhere, in enormous volume and detail. Data science is an emerging field that touches every aspect of our lives and, lately, it has proved to be an extraordinary tool for predicting Covid-19 infections and organizing strategies to limit the damage.

To import the Pandas and Matplotlib packages, we write:

import pandas as pd
import matplotlib.pyplot as plt

We download the Excel file from the ECDC site and open it with the read_excel function of the Pandas library. In our case, the file is named data.xlsx.

df=pd.read_excel("data.xlsx", engine="openpyxl")

We can first explore the data and the columns of the dataframe df, for example with df.head() and df.columns.

Of the columns in the dataframe, we will use dateRep, cases and deaths. Additionally, the name of the country is stored in the column countriesAndTerritories.
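As a minimal sketch of this exploration step, the snippet below builds a tiny stand-in dataframe and inspects it. The rows are made-up placeholders, not ECDC figures, and the day, month and year columns are assumed to exist alongside the columns named above.

```python
import pandas as pd

# Hypothetical stand-in for the ECDC file: placeholder values, not real data.
df = pd.DataFrame({
    "dateRep": ["03/05/2020", "01/05/2020", "02/05/2020"],
    "day": [3, 1, 2],
    "month": [5, 5, 5],
    "year": [2020, 2020, 2020],
    "cases": [30, 10, 20],
    "deaths": [3, 1, 2],
    "countriesAndTerritories": ["Italy", "Italy", "Italy"],
})

print(df.head())    # first rows of the table
print(df.dtypes)    # type of each column

# List of column names, handy for checking the spelling before selecting.
column_names = df.columns.tolist()
print(column_names)
```

On the real file, the same three calls should give a quick overview of what was loaded before any filtering is done.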

We next select ‘Italy’ as the country under study. We create a new column named DateTime, of type datetime, where we store the date. We then create a new dataframe named df_italia_sorted, which is the same as the dataframe df_italia but sorted according to the column DateTime.

df_italia=df[df.countriesAndTerritories=='Italy'].copy() #copy, so that adding a column does not trigger SettingWithCopyWarning
df_italia['DateTime']=pd.to_datetime(df_italia['dateRep'],format="%d/%m/%Y")

#We sort according to DateTime
df_italia_sorted=df_italia.sort_values(by='DateTime')

df_italia_selected=df_italia_sorted[df_italia_sorted.month>4].copy()

We are interested in data after the month of April (May, June, July, August and so on), so we filter on the column month and create a new dataframe df_italia_selected. Because this filter looks only at the month number, it assumes that the data covers a single calendar year.
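The parse, sort and filter steps can be tried on a toy table first. This is only a sketch: the dataframe df_toy and its values are invented for illustration, and only the column names follow the ones used in this post.

```python
import pandas as pd

# Made-up placeholder rows, deliberately out of chronological order.
df_toy = pd.DataFrame({
    "dateRep": ["15/06/2020", "10/03/2020", "01/05/2020", "20/04/2020"],
    "month": [6, 3, 5, 4],
    "cases": [40, 10, 30, 20],
})

# Parse the day/month/year strings into real datetime values.
df_toy["DateTime"] = pd.to_datetime(df_toy["dateRep"], format="%d/%m/%Y")

# Sort chronologically, then keep only the rows after April.
df_toy_sorted = df_toy.sort_values(by="DateTime")
df_toy_selected = df_toy_sorted[df_toy_sorted["month"] > 4].copy()

selected_cases = df_toy_selected["cases"].tolist()
print(df_toy_selected[["DateTime", "cases"]])
```

Only the May and June rows survive the filter, and they come out in date order because the sort was done before the selection.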

Since the values in the columns cases and deaths can vary a lot from day to day, a moving average makes the trend easier to see. We choose a moving average of seven days and create two new columns (Moving Average Cases and Moving Average Deaths) where we store the averaged values of cases and deaths.

#Calculate moving average

df_italia_selected['Moving Average Cases']=df_italia_selected.cases.rolling(7,min_periods=1).mean()
df_italia_selected['Moving Average Deaths']=df_italia_selected.deaths.rolling(7,min_periods=1).mean()
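To see what min_periods=1 changes, here is a small sketch on an invented series of eight daily counts (the numbers are placeholders). With min_periods=1 the first rows are averaged over however many values are available so far; without it, the first six rows are NaN because a full seven-day window does not exist yet.

```python
import pandas as pd

# Placeholder daily counts, not real data.
cases = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

# Same call as in the post: average over up to 7 values, at least 1.
moving_average = cases.rolling(7, min_periods=1).mean()

# Default behaviour: a value only once 7 observations are available.
strict_average = cases.rolling(7).mean()

comparison = pd.DataFrame({
    "cases": cases,
    "min_periods=1": moving_average,
    "default": strict_average,
})
print(comparison)
```

Which variant is preferable depends on the plot: min_periods=1 avoids a gap at the start of the dashed line, at the price of averaging over fewer days there.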

We now plot the cases and deaths as functions of time. We choose red for cases and blue for deaths. It is useful to plot cases and deaths in the same figure with a common x-axis, so that any relation between the two is easier to spot. So, we use the subplots function and first create the figure fig and the axis ax1 (this will be the axis for the cases, shown on the left). We then create ax2 using the twinx function; the values for deaths are shown on this right axis. A dashed line is used for the moving averages.

#Figure

fig, ax1=plt.subplots()
color1='tab:red'

ax1.plot(df_italia_selected['DateTime'], df_italia_selected['cases'], color=color1)

ax1.plot(df_italia_selected['DateTime'], df_italia_selected['Moving Average Cases'], color=color1,linestyle='dashed')

ax1.set_xlabel('Date')

ax1.set_ylabel('Cases',color=color1)
ax1.tick_params(axis='y',labelcolor=color1)

locs, labels=plt.xticks()
plt.setp(labels,rotation=90)

ax2=ax1.twinx() #instantiate a second axes that shares the same x-axis
color2='tab:blue'

ax2.plot(df_italia_selected['DateTime'], df_italia_selected['deaths'], color=color2)
ax2.plot(df_italia_selected['DateTime'], df_italia_selected['Moving Average Deaths'], color=color2,linestyle='dashed')

ax2.set_ylabel('Deaths',color=color2)
ax2.tick_params(axis='y',labelcolor=color2)

fig.tight_layout() #otherwise the right y-label is slightly clipped
plt.show()

The figure below is the program output.

Cases and deaths as a function of date for Italy

Python animations with Matplotlib

In Python, plotting graphs is straightforward — you can use powerful libraries like Matplotlib. It happens, however, that you need to visualize the trend over time of some variables – that is, you need to animate the graphs.

Luckily, it’s just as easy to create animations as it is to create plots with Matplotlib.

Matplotlib

Matplotlib – as you can read on the official site – is a comprehensive library for creating static, animated, and interactive visualizations in Python. You can plot interactive graphs, histograms, bar charts, and so on.

How to Install Matplotlib

Installing Matplotlib is simple. Just open up your terminal and run:

pip install matplotlib

Numpy

Also, if you don’t have numpy, please install it so you can follow the examples in this tutorial:

pip install numpy

How to Plot with Matplotlib

Even though this tutorial is about animations in Matplotlib, first let’s create a simple static graph of a sine wave:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.1)
y = np.sin(x)

fig = plt.figure()
ax = plt.axes(xlim=(0, 10), ylim=(-1.1, 1.1))
diagram = plt.plot(x, y)
 
plt.show()

A basic sine wave

How to Animate with Matplotlib

To create an animation with Matplotlib, you use the FuncAnimation class from Matplotlib’s animation module.

For instance, let’s create an animation of a sine wave:

import numpy as np
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation

fig = plt.figure()
ax = plt.axes(xlim=(0, 5), ylim=(-1.5, 1.5))
line, = ax.plot([], [], lw=2)

def init():
    line.set_data([], [])
    return line,

def animate(i):
    x = np.linspace(0, 5, 500)
    y = np.sin(2 * np.pi * (x + 0.02 * i))
    line.set_data(x, y)
    return line,

sinewawe_animation = FuncAnimation(fig, animate, init_func=init, frames=200, interval=20, blit=True)
sinewawe_animation.save("Animation.gif")

plt.show()

Let’s go through the code above in a bit more detail to better understand how animations work with Matplotlib.

Lines 1–3

Here you import the required libraries. In particular, you import the FuncAnimation class, which is what creates the animation.

Lines 5–7

fig = plt.figure()
ax = plt.axes(xlim=(0, 5), ylim=(-1.5, 1.5))
line, = ax.plot([], [], lw=2)

Here you first create an empty window for the animation figure. Then you create an empty line object. This line is later modified to form the animation.

Lines 9–11

def init():
    line.set_data([], [])
    return line,

Here you create an init() function that sets the initial state for the animation.

Lines 13–17

You then create an animate() function. This is the function that gives rise to the sine wave. It takes the frame number i as its argument, then it creates a sine wave that is shifted according to the frame number (the bigger it is, the more the wave is shifted). Finally, it returns the updated line object. Now the animation framework updates the graph based on how the line has changed.

def animate(i):
    x = np.linspace(0, 5, 500)
    y = np.sin(2 * np.pi * (x + 0.02 * i))
    line.set_data(x, y)
    return line,
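The effect of the frame number can be checked without drawing anything. The sketch below uses a hypothetical helper, wave(i), that returns the y-values animate(i) would plot. Since the sine has period 1 in x and each frame shifts it by 0.02, the curve should be back at its starting shape after 50 frames.

```python
import numpy as np

def wave(i):
    """Return the y-values that animate(i) would draw for frame i."""
    x = np.linspace(0, 5, 500)
    return np.sin(2 * np.pi * (x + 0.02 * i))

# 50 frames * 0.02 = a shift of exactly one period, so the shapes match.
same_after_50 = np.allclose(wave(0), wave(50))

# 10 frames * 0.02 = a shift of 0.2, a visibly different curve.
differs_after_10 = not np.allclose(wave(0), wave(10))

print(same_after_50, differs_after_10)
```

This also suggests why 200 frames loop smoothly: 200 is a whole multiple of 50, so the last frame leads straight back into the first.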

Line 19

sinewawe_animation = FuncAnimation(fig, animate, init_func=init, frames=200, interval=20, blit=True)

This line of code puts it all together and creates the actual animation object:

  • fig, animate and init_func=init: the animation is rendered to the figure fig by repeatedly calling the animate() function, starting from the initial state defined by init().
  • frames=200: one round of the animation consists of 200 frames.
  • interval=20: the delay between two frames is 20 milliseconds (1000 ms / 20 ms = 50 FPS).
  • blit=True: only the parts of the plot that have changed are redrawn, which improves efficiency.
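The timing implied by these parameters is simple arithmetic, sketched below. The variable names are chosen for this example only. The numbers are the nominal values; if drawing a frame takes longer than the interval, the on-screen animation may run slower.

```python
frames = 200        # frames per round of the animation
interval_ms = 20    # delay between two frames, in milliseconds

fps = 1000 / interval_ms                      # nominal frames per second
round_seconds = frames * interval_ms / 1000   # nominal length of one round

print(f"{fps:.0f} FPS, {round_seconds:.0f} s per round")
```

So one round of the sine wave animation nominally lasts four seconds before it repeats.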

Line 20

sinewawe_animation.save("Animation.gif")

This line saves the animation as an animated GIF (the same one I used in this tutorial to show you the animation).
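If saving fails or picks an unexpected encoder, one option is to name the writer explicitly. The sketch below assumes the Pillow package is installed and uses Matplotlib's "pillow" writer. The file name short_animation.gif, the off-screen Agg backend and the reduced frame count are choices made for this example, not part of the original code.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
ax.set_xlim(0, 5)
ax.set_ylim(-1.5, 1.5)
line, = ax.plot([], [], lw=2)
x = np.linspace(0, 5, 100)

def animate(i):
    # Same shifted sine wave as in the tutorial, on a coarser grid.
    line.set_data(x, np.sin(2 * np.pi * (x + 0.02 * i)))
    return line,

# Only 10 frames, to keep the example quick and the file small.
short_animation = FuncAnimation(fig, animate, frames=10, interval=20, blit=True)

# Name the writer and frame rate explicitly instead of relying on defaults.
short_animation.save("short_animation.gif", writer="pillow", fps=50)
plt.close(fig)

gif_size = os.path.getsize("short_animation.gif")
print(gif_size, "bytes written")
```

Passing fps=50 here matches the 20 ms interval used on screen, so the saved GIF should play at roughly the same speed as the live animation.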

Line 22

You guessed it — this line shows the animation.