Saturday, September 28, 2024

Time Series Analysis & Visualization in Python

Every dataset has its own characteristics, and we use those characteristics as features to gain insight into the data. In this article, we will discuss an important kind of dataset: time series data.

What Is Time Series Data

Time series data is a sequence of data points listed in consecutive time order, typically recorded at successive, equally spaced points in time. Time series analysis consists of methods for analyzing such data in order to extract meaningful insights and other valuable characteristics.
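
As a small illustration of the definition above, the sketch below builds a toy daily series and checks that its points are equally spaced. The start date and the values are made up for the example and are not taken from the dataset used in this article.

```python
import pandas as pd

# Five consecutive daily timestamps (hypothetical start date)
dates = pd.date_range(start="2024-01-01", periods=5, freq="D")

# Made-up observations indexed by those timestamps
ts = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0], index=dates)

# Gap between the first two points, and a check that every gap matches it
step = ts.index[1] - ts.index[0]
equally_spaced = (ts.index.to_series().diff().dropna() == step).all()

print(ts)
print("step:", step, "| equally spaced:", equally_spaced)
```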

Time series analysis is becoming important in many industries, such as finance, pharmaceuticals, social media, web services, and research. To understand time series data, visualization is essential. In fact, no data analysis is complete without visualization, because one good plot can reveal meaningful and interesting patterns in the data.

Time Series Data Visualization using Python

We will use Python libraries to visualize the data. The link for the dataset can be found here. We will perform the visualization step by step, as we would in any time series project.

Importing the Libraries

We will import all the libraries used throughout this article in one place, so that we do not have to import them again each time we use them. This saves both time and effort.

  • NumPy – A Python library for numerical computation on multidimensional arrays (ndarray). It also provides a large collection of mathematical functions that operate on these arrays.
  • Pandas – A Python library built on top of NumPy for working with tabular data in DataFrames. It is used for data cleaning, merging, reshaping, and aggregation.
  • Matplotlib – A Python library for creating 2D and 3D plots. It can save figures in a variety of output formats.

Python3




import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


Loading The Dataset

To load the dataset into a DataFrame, we will use the pandas read_csv() function, and then use the head() function to print the first five rows. We pass index_col="Date" together with parse_dates=True so that the ‘Date’ column becomes the index and is parsed into a DatetimeIndex. By default, dates are read as strings, which is not the right format for time series analysis.

Python3




# reading the dataset using read_csv
df = pd.read_csv("stock_data.csv",
                 parse_dates=True,
                 index_col="Date")
 
# displaying the first five rows of dataset
df.head()


Output:

            Unnamed: 0   Open   High    Low  Close    Volume  Name
Date
2006-01-03         NaN  39.69  41.22  38.79  40.91  24232729  AABA
2006-01-04         NaN  41.22  41.90  40.77  40.97  20553479  AABA
2006-01-05         NaN  40.93  41.73  40.85  41.53  12829610  AABA
2006-01-06         NaN  42.88  43.57  42.80  43.21  29422828  AABA
2006-01-09         NaN  43.10  43.66  42.82  43.42  16268338  AABA
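
To see what parse_dates changes, here is a minimal sketch that reads the same text twice, once without and once with date parsing. It assumes an in-memory stand-in for stock_data.csv built from three of the rows shown above (without the ‘Unnamed: 0’ column); the names csv_text, raw, and parsed are only for illustration.

```python
import io
import pandas as pd

# Three rows copied from the output above, used as an
# in-memory stand-in for stock_data.csv (assumption)
csv_text = """Date,Open,High,Low,Close,Volume,Name
2006-01-03,39.69,41.22,38.79,40.91,24232729,AABA
2006-01-04,41.22,41.90,40.77,40.97,20553479,AABA
2006-01-05,40.93,41.73,40.85,41.53,12829610,AABA
"""

# Without parse_dates, the index holds plain strings
raw = pd.read_csv(io.StringIO(csv_text), index_col="Date")

# With parse_dates=True, the index column is parsed into a DatetimeIndex
parsed = pd.read_csv(io.StringIO(csv_text), index_col="Date",
                     parse_dates=True)

print(type(raw.index).__name__, raw.index.dtype)
print(type(parsed.index).__name__, parsed.index.dtype)
```

Only the parsed version supports date-aware operations such as resampling or selecting a whole year by label.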

Dropping Unwanted Columns  

We will drop columns from the dataset that are not important for our visualization.

Python3




# dropping the unwanted column; drop() returns a new
# dataframe, so assign the result back to df
df = df.drop(columns='Unnamed: 0')
df.head()


Output:

             Open   High    Low  Close    Volume  Name
Date
2006-01-03  39.69  41.22  38.79  40.91  24232729  AABA
2006-01-04  41.22  41.90  40.77  40.97  20553479  AABA
2006-01-05  40.93  41.73  40.85  41.53  12829610  AABA
2006-01-06  42.88  43.57  42.80  43.21  29422828  AABA
2006-01-09  43.10  43.66  42.82  43.42  16268338  AABA
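
One detail worth checking when dropping columns: drop() returns a new DataFrame and leaves the original untouched unless the result is assigned back or inplace=True is passed. The sketch below shows this on a tiny made-up frame; demo and result are hypothetical names.

```python
import pandas as pd

# Tiny made-up frame with an unwanted column
demo = pd.DataFrame({"Unnamed: 0": [None, None],
                     "Open": [39.69, 41.22]})

# drop() returns a new frame; demo itself is unchanged
result = demo.drop(columns="Unnamed: 0")

print(list(demo.columns))    # still contains 'Unnamed: 0'
print(list(result.columns))  # only 'Open'
```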

Plotting a Line Plot for Time Series Data

Python3




df['Volume'].plot()


Output:

Line Plot for Time Series Data

Here, we have plotted the ‘Volume’ column data.

Now let’s plot all other columns using a subplot.

Python3




df.plot(subplots=True, figsize=(4, 4))


Output:

Subplots of all the numeric columns

The line plots used above are good for showing seasonality.

Seasonality: In time-series data, seasonality is the presence of variations that occur at specific regular time intervals less than a year, such as weekly, monthly, or quarterly. 
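
Besides reading a line plot, a seasonal pattern can be checked numerically by averaging the values for each position in the cycle. The sketch below does this for a weekly cycle on synthetic data in which weekends are higher by construction; the series and its values are assumptions for illustration, not part of the stock dataset.

```python
import numpy as np
import pandas as pd

# Four weeks of synthetic daily data starting on a Monday;
# Saturdays and Sundays are set higher on purpose
idx = pd.date_range("2024-01-01", periods=28, freq="D")
values = np.where(idx.dayofweek >= 5, 20.0, 10.0)
series = pd.Series(values, index=idx)

# Average value for each day of the week (0 = Monday ... 6 = Sunday)
weekly_profile = series.groupby(series.index.dayofweek).mean()
print(weekly_profile)
```

A flat profile would suggest no weekly seasonality, while clear differences between days suggest a repeating pattern.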

Resampling: In time series analysis, resampling means changing the frequency of the observations, for example aggregating daily values into weekly or monthly values. Resampling by month or week and making bar plots is another simple and widely used method of finding seasonality. Here we are going to make a bar plot of the monthly data for 2016 and 2017.

Resample and Plot The Data 

Python3




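# Resampling the time series data based on monthly 'M' frequency
# (pandas 2.2+ prefers the alias 'ME'); numeric_only=True skips
# the text column 'Name', which cannot be averaged
df_month = df.resample("M").mean(numeric_only=True)
 
# using subplot
fig, ax = plt.subplots(figsize=(6, 6))
 
# plotting bar graph
ax.bar(df_month.loc['2016':].index,
       df_month.loc['2016':, "Volume"],
       width=25, align='center')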


Output:

Bar plot of resampled data

There are 24 bars in the graph and each bar represents a month.
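
To make the monthly aggregation concrete, here is a minimal sketch on synthetic daily data covering January and February 2016. It assumes the month-start alias "MS", which labels each group by the first day of the month; this differs from the month-end labels used in the article's code but groups the same calendar months.

```python
import pandas as pd

# Synthetic daily values 0, 1, 2, ... for Jan and Feb 2016
idx = pd.date_range("2016-01-01", "2016-02-29", freq="D")
daily = pd.Series(range(len(idx)), index=idx, dtype="float64")

# One mean per calendar month, labelled by month start
monthly = daily.resample("MS").mean()
print(monthly)
```

Sixty daily points collapse into two monthly means, which is the same reduction that turns two years of daily stock data into 24 bars.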

Differencing: Differencing computes the difference between each value and the value a specified number of periods earlier. The default lag is one period, and we can specify a different lag when plotting. It is one of the most popular methods for removing trends from the data.

Example:

Python3




df.Low.diff(2).plot(figsize=(6, 6))


Output:

Differencing the ‘Low’ column with a lag of 2

Python3




df.High.diff(2).plot(figsize=(10, 6))


Output:

Differencing the ‘High’ column with a lag of 2
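
The effect of the lag argument is easier to see on a few numbers than on a plot. The sketch below uses small made-up series (s and trend are hypothetical names) to show lag-1 and lag-2 differences, and how differencing turns a straight-line trend into a constant.

```python
import pandas as pd

s = pd.Series([10, 12, 15, 19, 24])

# Lag 1: each value minus the previous one -> NaN, 2, 3, 4, 5
d1 = s.diff()

# Lag 2: each value minus the one two steps back -> NaN, NaN, 5, 7, 9
d2 = s.diff(2)

# A straight-line trend (slope 3) becomes a constant after differencing
trend = pd.Series([3.0 * t + 1.0 for t in range(6)])
detrended = trend.diff().dropna()

print(d1.tolist())
print(d2.tolist())
print(detrended.tolist())
```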

Trend In The Dataset 

We can also look at the trend in our dataset. The trend shows whether the values we are considering are moving upward or downward in the long run.

Python code for Trend 

Python3




# Finding the trend in the "Open"
# column using moving average method
window_size = 50
rolling_mean = df['Open'].rolling(window_size).mean()
rolling_mean.plot()


Output:

Trend in Time Series data
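
To show what the moving average computes, here is a minimal sketch with a window of 3 on a short made-up series. The first window_size - 1 positions have no complete window, so they are NaN; this is also why the trend line above starts later than the raw data.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Mean of each value together with the two values before it
rolling = s.rolling(3).mean()

# The first two entries are NaN, then 2.0, 3.0, 4.0, 5.0
print(rolling.tolist())
```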

Plotting the Changes in Data

We can also plot the changes that occurred in data over time. There are a few ways to plot changes in data.

Shift: The shift() function moves the data forward or backward by a specified number of periods. By default it shifts by one period, which is one trading day in this dataset, so each row receives the previous day’s value. This is helpful for viewing the previous day’s data and today’s data side by side.

In this code, .div() performs element-wise division. For example, df.div(6) divides each element in df by 6. It does not fill in missing values: the first row produced by shift() has no previous value, so it is NaN, and the first value of the result is NaN as well.

Here, df.Close.div(df.Close.shift()) divides each closing price by the previous day’s closing price. A value above 1 means the price rose from the previous day, and a value below 1 means it fell.

Python3




df['Change'] = df.Close.div(df.Close.shift())
df['Change'].plot(figsize=(10, 8), fontsize=16)


Output:

Change in close price of Time Series data 
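
The shift-and-divide step can be traced on three made-up closing prices. The sketch below is only an illustration (close, prev, ratio, and pct are hypothetical names); it also shows that the ratio minus 1 matches the pandas pct_change() method.

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0],
                  index=pd.date_range("2024-01-01", periods=3, freq="D"))

# Previous day's close; the first row has no predecessor, so it is NaN
prev = close.shift()

# Today's close divided by yesterday's close
ratio = close.div(prev)

# Fractional change from the previous day
pct = close.pct_change()

print(pd.DataFrame({"close": close, "prev": prev,
                    "ratio": ratio, "pct": pct}))
```

The first row of ratio stays NaN; it can be removed with dropna() if a later step cannot handle missing values.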

We can also take a specific interval of time and plot to have a clearer look. Here we are plotting the data of only 2017.

Python3




# df['2017'] no longer selects rows in recent pandas; use .loc
df.loc['2017', 'Change'].plot(figsize=(10, 6))


Output:

Zoomed-in view of the 2017 data
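
Selecting a period by label relies on partial string indexing through .loc on a DatetimeIndex. The sketch below demonstrates it on a short made-up series that crosses a year boundary; the dates and values are assumptions for illustration.

```python
import pandas as pd

# Five days spanning the end of 2016 and the start of 2017
idx = pd.date_range("2016-12-30", periods=5, freq="D")
s = pd.Series([1, 2, 3, 4, 5], index=idx)

# Every row whose timestamp falls in 2017
only_2017 = s.loc["2017"]

# An inclusive date range, given as strings
two_days = s.loc["2017-01-02":"2017-01-03"]

print(only_2017)
print(two_days)
```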

Box Plot in Time Series Dataset

We can also use a box plot to see the distribution of values in a specific column. Let’s take an example. Here we create a new column named ‘Year’ from the ‘Date’ column, put it on the x-axis, and put the ‘Open’ column on the y-axis.

Python3




import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
 
# reading the dataset using read_csv and
# parsing the 'Date' column as datetime
df = pd.read_csv("stock_data.csv", parse_dates=["Date"])
df.drop(columns='Unnamed: 0', inplace=True)
 
# extract year from date column
df["Year"] = df["Date"].dt.year
 
# box plot grouped by year
sns.boxplot(data=df, x="Year", y="Open")
plt.show()


Output:

Box Plot
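
Each box in such a plot summarizes one group by its quartiles: the box spans the 25th to 75th percentiles and the line inside marks the median. The numbers behind the boxes can be approximated with groupby() and describe(), as in the sketch below on a small made-up frame; demo and summary are hypothetical names and the values are not from the stock dataset.

```python
import pandas as pd

# Made-up opening prices for two years
demo = pd.DataFrame({
    "Year": [2016] * 5 + [2017] * 5,
    "Open": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

# Per-year summary; the '25%', '50%' and '75%' columns
# correspond to the edges and the middle line of each box
summary = demo.groupby("Year")["Open"].describe()
print(summary[["min", "25%", "50%", "75%", "max"]])
```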
