Time Series Analysis & Visualization in Python

27 July 2024


Every dataset has its own characteristics, and we use those characteristics as features to gain insight into the data. In this article, we will discuss an important kind of dataset: time series data.

What Is Time Series Data

Time series data is a sequence of data points recorded in consecutive time order, usually at successive, equally spaced points in time. Time series analysis consists of methods for analyzing such data in order to extract meaningful statistics, patterns, and other characteristics of the data.
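As a minimal illustration of this definition, the sketch below builds a small time series in pandas. The dates and values are made up for the example and are not taken from the dataset used later in the article.

```python
import pandas as pd

# Hypothetical example values: five daily observations
dates = pd.date_range(start="2024-01-01", periods=5, freq="D")
ts = pd.Series([10.0, 10.5, 9.8, 11.2, 11.0], index=dates)

# The index is a DatetimeIndex with a fixed daily frequency
print(type(ts.index).__name__)
print(ts.index.freqstr)
print(ts)
```

The equally spaced timestamps in the index are what distinguish a time series from an ordinary column of numbers.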

Time series analysis is becoming important in many industries, such as finance, pharmaceuticals, social media, web services, and research. To understand time series data, visualization is essential. In fact, hardly any data analysis is complete without visualization, because one good plot can reveal meaningful and interesting insights into the data.

Time Series Data Visualization using Python

We will use Python libraries for visualizing the data. The link for the dataset can be found here. We will perform the visualization step by step, as we would in any time series project.

Importing the Libraries

We will import all the libraries used throughout this article in one place, so that we do not have to import them each time we need them. This saves both time and effort.

NumPy – A Python library for numerical computation on multidimensional arrays (ndarray). It also provides a large collection of mathematical functions that operate on these arrays.
Pandas – A Python library built on top of NumPy for working with tabular data in DataFrames. It is used for data cleaning, merging, reshaping, and aggregation.
Matplotlib – A plotting library for 2D and 3D visualizations that can save figures in a variety of output formats.

Python3

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Loading The Dataset

To load the dataset into a DataFrame we will use the pandas read_csv() function, and then use the head() function to print the first five rows. Here we pass index_col="Date" together with the ‘parse_dates’ parameter, so that read_csv parses the ‘Date’ column and uses it as a DatetimeIndex. By default, dates in a CSV file are read as strings, which is not the right format for time series analysis.

Python3

# reading the dataset using read_csv
df = pd.read_csv("stock_data.csv", 
                 parse_dates=True, 
                 index_col="Date")
 
# displaying the first five rows of dataset
df.head()

Output:

            Unnamed: 0   Open   High    Low  Close    Volume  Name
Date                                                              
2006-01-03         NaN  39.69  41.22  38.79  40.91  24232729  AABA
2006-01-04         NaN  41.22  41.90  40.77  40.97  20553479  AABA
2006-01-05         NaN  40.93  41.73  40.85  41.53  12829610  AABA
2006-01-06         NaN  42.88  43.57  42.80  43.21  29422828  AABA
2006-01-09         NaN  43.10  43.66  42.82  43.42  16268338  AABA
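To see what ‘parse_dates’ changes, the following sketch reads a tiny in-memory CSV twice. The CSV text is a hypothetical stand-in for stock_data.csv, with only a few rows and columns, so that the example runs without the real file.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for stock_data.csv (hypothetical values)
csv_text = """Date,Open,Close
2006-01-03,39.69,40.91
2006-01-04,41.22,40.97
2006-01-05,40.93,41.53
"""

# Without parse_dates, the Date column is read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["Date"].dtype)

# With index_col and parse_dates=True, the index becomes a DatetimeIndex
parsed = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(type(parsed.index).__name__)
```

A DatetimeIndex is what later enables date-based operations such as resampling and selecting rows by year.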

Dropping Unwanted Columns

We will drop columns that are not important for our visualization. Here that is the ‘Unnamed: 0’ column, which holds no useful values in the rows shown above.

Python3

# deleting the column; drop() returns a new DataFrame,
# so we assign the result back to df
df = df.drop(columns='Unnamed: 0')
df.head()

Output:

             Open   High    Low  Close    Volume  Name
Date                                                  
2006-01-03  39.69  41.22  38.79  40.91  24232729  AABA
2006-01-04  41.22  41.90  40.77  40.97  20553479  AABA
2006-01-05  40.93  41.73  40.85  41.53  12829610  AABA
2006-01-06  42.88  43.57  42.80  43.21  29422828  AABA
2006-01-09  43.10  43.66  42.82  43.42  16268338  AABA
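One detail worth keeping in mind is that drop() does not modify the DataFrame it is called on unless the result is assigned back (or inplace=True is passed). The sketch below shows this on a small hypothetical frame rather than the stock dataset.

```python
import pandas as pd

# Hypothetical two-column frame with an unwanted column
frame = pd.DataFrame({"Unnamed: 0": [None, None], "Open": [39.69, 41.22]})

# drop() returns a new DataFrame; the original is left untouched
dropped = frame.drop(columns="Unnamed: 0")
print(list(frame.columns))    # still contains 'Unnamed: 0'
print(list(dropped.columns))

# Reassigning keeps the change
frame = frame.drop(columns="Unnamed: 0")
print(list(frame.columns))
```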

Plotting Line plot for Time Series data.

Python3

df['Volume'].plot()

Output:

Line Plot for Time Series Data

Here, we have plotted the ‘Volume’ column data.

Now let’s plot all other columns using a subplot.

Python3

df.plot(subplots=True, figsize=(4, 4))

Output:

Cumulative plot of all the features

The line plots used above are good for showing seasonality.

Seasonality: In time-series data, seasonality is the presence of variations that occur at specific regular time intervals less than a year, such as weekly, monthly, or quarterly.
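A simple way to check for a repeating pattern numerically is to average the values by position in the cycle. The sketch below uses a synthetic daily series, an assumption made purely for illustration, in which weekends are higher than weekdays, and groups it by day of the week.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (an assumption for illustration): 8 full weeks
# starting on a Monday, where weekends are higher than weekdays
idx = pd.date_range("2024-01-01", periods=56, freq="D")  # 2024-01-01 is a Monday
values = np.where(idx.dayofweek >= 5, 20.0, 10.0)
series = pd.Series(values, index=idx)

# Average value for each day of the week (0 = Monday ... 6 = Sunday)
weekday_mean = series.groupby(series.index.dayofweek).mean()
print(weekday_mean)
```

On real data the differences are rarely this clean, but a clear gap between the group means is a hint of weekly seasonality.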

Resampling: In time series analysis, resampling means changing the frequency of the observations, for example aggregating daily data into weekly or monthly values. Resampling to months or weeks and making bar plots is another simple and widely used way of finding seasonality. Here we are going to make a bar plot of the monthly data for 2016 and 2017.

Resample and Plot The Data

Python3

# Resampling to monthly 'M' (month-end) frequency; pandas 2.2+ prefers "ME".
# Only 'Volume' is kept because the text column 'Name' cannot be averaged.
df_month = df[["Volume"]].resample("M").mean()
 
# using subplot
fig, ax = plt.subplots(figsize=(6, 6))
 
# plotting bar graph
ax.bar(df_month['2016':].index, 
       df_month.loc['2016':, "Volume"], 
       width=25, align='center')

Output:

Bar plot of resampled data

There are 24 bars in the graph and each bar represents a month.
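To make the effect of resampling concrete, here is a small sketch on synthetic daily data (hypothetical values, not the stock dataset). It uses the month-start alias "MS" instead of the month-end alias in the code above; this is a choice made for the example, and only the bin labels differ.

```python
import pandas as pd

# Synthetic daily data for January and February 2024 (hypothetical values):
# every day in January is 1.0 and every day in February is 3.0
idx = pd.date_range("2024-01-01", "2024-02-29", freq="D")
daily = pd.Series([1.0 if d.month == 1 else 3.0 for d in idx], index=idx)

# Downsample from daily to monthly frequency; "MS" labels each bin by month start
monthly = daily.resample("MS").mean()
print(monthly)
```

The 60 daily values collapse into two monthly means, one per calendar month.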

Differencing: Differencing computes the difference between each value and the value a specified number of periods earlier. The interval defaults to one period, and we can specify a different value, as in the plots below, which use an interval of two. It is one of the most popular methods for removing a trend from the data.

Example 4:

Python3

df.Low.diff(2).plot(figsize=(6, 6))

Output:

Differencing Time Series values

Python3

df.High.diff(2).plot(figsize=(10, 6))

Output:

Differencing the ‘High’ values of Time Series data
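Why differencing removes a trend is easiest to see on a series that is nothing but trend. The sketch below uses a synthetic straight line, an assumption for illustration only, and differences it with lags of one and two.

```python
import pandas as pd

# Synthetic series with a pure linear trend: y = 2*t + 5
trend = pd.Series([2 * t + 5 for t in range(6)], dtype="float64")

# First-order difference with the default lag of 1: y[t] - y[t-1]
diff1 = trend.diff()

# Difference with a lag of 2: y[t] - y[t-2]
diff2 = trend.diff(2)

print(diff1.tolist())
print(diff2.tolist())
```

After differencing, the steadily rising values become a constant (apart from the leading NaN entries, where no earlier value exists), so the upward trend is gone.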

Trend In The Dataset

We can also look at the trend in our dataset. The trend shows whether the values we are considering are moving upward or downward in the long run.

Python code for Trend

Python3

# Finding the trend in the "Open"
# column using moving average method
window_size = 50
rolling_mean = df['Open'].rolling(window_size).mean()
rolling_mean.plot()

Output:

Trend in Time Series data
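The moving average is easier to follow on a handful of numbers. The sketch below applies a window of 3 to a short made-up series; the optional min_periods argument shown at the end is an addition for illustration and is not used in the code above.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each value is the mean of the current and the two preceding observations
rolling_3 = prices.rolling(3).mean()
print(rolling_3.tolist())

# min_periods=1 is an optional setting that averages whatever is available
rolling_partial = prices.rolling(3, min_periods=1).mean()
print(rolling_partial.tolist())
```

With the default settings the first window_size - 1 results are NaN, which is why the trend line for a window of 50 starts later than the raw data.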

Plotting the Changes in Data

We can also plot the changes that occurred in data over time. There are a few ways to plot changes in data.

Shift: The shift() function moves the data forward or backward by a specified number of periods. By default it shifts the data by one period, which for this daily data means each row receives the previous trading day’s value. This makes it easy to compare the previous day’s data and today’s data side by side.

In this code, the .div() function performs element-wise division. For example, df.div(6) divides each element in df by 6. The ‘shift()’ operation leaves a missing value (NaN) in the first row, because the first day has no previous day, so the first value of the result is also NaN.

Here we have used df.Close.div(df.Close.shift()), which divides each closing price by the previous day’s closing price. A value above 1 means the price rose compared with the previous day, and a value below 1 means it fell.

Python3

df['Change'] = df.Close.div(df.Close.shift())
df['Change'].plot(figsize=(10, 8), fontsize=16)

Output:

Change in close price of Time Series data
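The same calculation on three made-up closing prices shows what shift() and div() each contribute. The comparison with pct_change() at the end is an addition for illustration; it is not part of the code above.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0])

previous = close.shift()        # [NaN, 100.0, 110.0]
ratio = close.div(previous)     # today's close / previous close

# pct_change() gives the same information expressed as ratio - 1
pct = close.pct_change()

print(ratio.tolist())
print(pct.tolist())
```

The first entry is NaN in both results because there is no earlier price to compare against.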

We can also take a specific interval of time and plot to have a clearer look. Here we are plotting the data of only 2017.

Python3

df.loc['2017', 'Change'].plot(figsize=(10, 6))

Output:

Year Data Zooming of Time Series data
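Selecting a period by its date string relies on the DatetimeIndex. The sketch below shows two such selections with .loc on a synthetic frame; the values and the month-start frequency are assumptions for illustration.

```python
import pandas as pd

# Synthetic month-start observations spanning 2016 and 2017
idx = pd.date_range("2016-01-01", periods=24, freq="MS")
data = pd.DataFrame({"Change": range(24)}, index=idx)

# Partial string indexing with .loc: all rows from 2017
only_2017 = data.loc["2017", "Change"]

# A date range slice: the first quarter of 2017 (both endpoints inclusive)
q1_2017 = data.loc["2017-01":"2017-03", "Change"]

print(len(only_2017), len(q1_2017))
```

Narrower strings such as a single month work the same way, which makes it easy to zoom in on any part of the series before plotting.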

Box Plot in Time Series Dataset

We can also use a box plot to see the distribution of values in a specific column. Let’s take an example. Here we create a new column named Year from the datetime values in the ‘Date’ column, and then plot the ‘Open’ column on the Y-axis for each year.

Python3

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
 
# reading the dataset using read_csv
df = pd.read_csv("stock_data.csv")
df.drop(columns='Unnamed: 0', inplace=True)
 
# converting the 'Date' column from strings to datetime
df['Date'] = pd.to_datetime(df['Date'])
 
# extract year from date column
df["Year"] = df["Date"].dt.year
 
# box plot grouped by year
sns.boxplot(data=df, x="Year", y="Open")

Output:

Box Plot
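The boxes in such a plot are drawn from the quartiles of each group. As a rough sketch of what is being summarized, the code below computes those quartiles per year with pandas on a small hypothetical frame (the prices are chosen for easy arithmetic and are not from the stock dataset).

```python
import pandas as pd

# Hypothetical prices for two years (values chosen for easy arithmetic)
frame = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-04", "2016-06-01", "2016-12-30",
                            "2017-01-03", "2017-06-01", "2017-12-29"]),
    "Open": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
frame["Year"] = frame["Date"].dt.year

# Quartiles of 'Open' for each year, i.e. the numbers a box plot draws
quartiles = frame.groupby("Year")["Open"].describe()[["25%", "50%", "75%"]]
print(quartiles)
```

The ‘50%’ column is the median line inside each box, and the ‘25%’ and ‘75%’ columns are the lower and upper edges of the box.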
