Conditioning Plot

27 July 2024


A conditioning plot (also called a co-plot or subset plot) is a scatter plot of two variables conditioned on a third variable, which is called the conditioning variable. The conditioning variable can be either continuous or categorical. For a continuous variable, the subsets are created by dividing its range into smaller intervals. For a categorical variable, each category forms a subset.

Let’s take three variables X, Y and Z, where Z is the conditioning variable that we divide into k groups. The groups can be formed in several ways:

By dividing the data into k groups of equal size.
By dividing the data into clusters identified from a scatter plot.
By dividing the range of the data into intervals of equal width.
Categorical data have a natural grouping: each category of the variable forms a group.
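Two of the grouping strategies above, equal-size groups and equal-width intervals, can be sketched with pandas. This is a minimal illustration on synthetic data: the series `z`, the seed, and the choice k = 4 are assumptions made for the example, not part of the original article.

```python
import numpy as np
import pandas as pd

# Synthetic conditioning variable Z (made-up values, for illustration only)
rng = np.random.default_rng(0)
z = pd.Series(rng.uniform(0, 100, size=200), name="Z")

k = 4  # number of groups

# Equal-size groups: quantile-based bins, each holding about the same number of points
equal_count = pd.qcut(z, q=k)

# Equal-width groups: the range of Z is split into k intervals of the same width
equal_width = pd.cut(z, bins=k)

print(equal_count.value_counts().sort_index())
print(equal_width.value_counts().sort_index())
```

Quantile-based bins keep every panel equally populated, while equal-width bins keep the interval boundaries easy to read but may leave some panels sparse.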

Then we plot a matrix of n rows and m columns, where n*m >= k. Each (row, column) cell holds an individual scatter plot with the following components:

Vertical Axis: Variable Y
Horizontal Axis: Variable X

where only the points in the group corresponding to row i and column j are used.
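One way to pick n and m so that n*m >= k, and to map each group to its (row, column) cell, is sketched below. The helper names `grid_shape` and `panel_position` are hypothetical, and the near-square layout is only one reasonable choice among many.

```python
import math


def grid_shape(k):
    """Return (n_rows, n_cols) with n_rows * n_cols >= k, kept close to square."""
    n_cols = math.ceil(math.sqrt(k))
    n_rows = math.ceil(k / n_cols)
    return n_rows, n_cols


def panel_position(group_index, n_cols):
    """Map a 0-based group index to its (row, column) cell, filling row by row."""
    return divmod(group_index, n_cols)


n_rows, n_cols = grid_shape(6)
print(n_rows, n_cols)             # grid for k = 6 groups
print(panel_position(4, n_cols))  # cell that holds the fifth group
```

When k is not a product of the chosen n and m, the trailing cells of the last row simply stay empty.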

The conditioning plot provides the answer to the following questions:

Is there any relationship between the two variables?
If there is a relationship, does its nature depend on the third variable?
Do different groups in the data behave similarly?
Are there any outliers in the data?
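The second question can also be checked numerically, as a complement to the plot, by comparing the X-Y correlation inside each group with the correlation over the pooled data. The sketch below uses synthetic data constructed so that the relationship flips sign between two groups; the column names, group labels and seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: Y rises with X in group "a" and falls with X in group "b"
rng = np.random.default_rng(1)
n = 150
df = pd.DataFrame({
    "X": rng.normal(size=2 * n),
    "Z": ["a"] * n + ["b"] * n,
})
sign = np.where(df["Z"] == "a", 1.0, -1.0)
df["Y"] = sign * df["X"] + rng.normal(scale=0.3, size=2 * n)

# Correlation of X and Y within each group of the conditioning variable
per_group_corr = {name: g["X"].corr(g["Y"]) for name, g in df.groupby("Z")}

# Correlation over the pooled data, ignoring the conditioning variable
overall_corr = df["X"].corr(df["Y"])

print(per_group_corr)
print(overall_corr)
```

Here the pooled correlation is close to zero even though each group shows a strong relationship, which is exactly the kind of structure a conditioning plot is meant to expose.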

Implementation

Python3

# Render plots inline (Jupyter/IPython only; remove this line in a plain script)
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load the training file of the Titanic dataset
titanic_dataset = pd.read_csv('train.csv')

# Preview the first rows of the dataset
titanic_dataset.head()

# Conditioning plots on categorical variables: one panel per category
sns.lmplot(x='Age', y='Fare', hue='Survived', col='Sex', data=titanic_dataset)
sns.lmplot(x='Age', y='Fare', hue='Survived', col='Pclass', data=titanic_dataset)

# Conditioning plot on a continuous variable: split Age into two ranges
df1 = titanic_dataset.loc[titanic_dataset['Age'] < 20]
df2 = titanic_dataset.loc[titanic_dataset['Age'] >= 20]

# Each lmplot call creates its own figure with a single panel
lm = sns.lmplot(x='Parch', y='Fare', hue='Survived', data=df1)
lm.axes[0, 0].set_title('Age < 20')

lm_2 = sns.lmplot(x='Parch', y='Fare', hue='Survived', data=df2)
lm_2.axes[0, 0].set_title('Age >= 20')

plt.show()
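As an alternative to splitting the dataframe by hand, the continuous variable can be binned into a categorical column and passed to `col`, so that both age ranges appear side by side in one figure. The sketch below uses a synthetic stand-in for the Titanic columns (not real passenger data); the `demo` frame and the `AgeGroup` column name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the Titanic columns used above (not real passenger data)
rng = np.random.default_rng(42)
n = 300
demo = pd.DataFrame({
    "Age": rng.uniform(1, 80, size=n),
    "Parch": rng.integers(0, 4, size=n),
    "Survived": rng.integers(0, 2, size=n),
})
demo["Fare"] = 10 + 8 * demo["Parch"] + rng.exponential(scale=15, size=n)

# Bin the continuous conditioning variable; right=False gives [0, 20) and [20, inf)
demo["AgeGroup"] = pd.cut(
    demo["Age"],
    bins=[0, 20, np.inf],
    right=False,
    labels=["Age < 20", "Age >= 20"],
)

# One panel per age group, drawn in a single figure
grid = sns.lmplot(data=demo, x="Parch", y="Fare", hue="Survived", col="AgeGroup")
grid.set_titles(col_template="{col_name}")
```

With the real dataset, rows whose Age is missing would receive no `AgeGroup` and would therefore not appear in either panel.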

Conditional Plot on the basis of Sex

Conditional Plot on the basis of Passenger_Class

Conditional Plot on the basis of Age

References:

NIST handbook
