Altair is a simple, easy-to-use statistical visualization library for Python. It provides many types of visualizations, ranging from simple bar charts to composite visualizations such as box plots. The scatter plot is one of the most useful visualizations in Altair for bivariate analysis, that is, for finding relationships between two data columns in a dataset.
Getting Started
Sometimes a simple scatter plot is not enough to gauge the relationships between the variables in a dataset. A more informative visualization plots two quantitative variables (data columns) against each other while distinguishing the points by a third variable. This third variable is usually a nominal (categorical) variable, and we can color the data points by it. Coloring the scatter plot makes it easy to see which data point belongs to which category of the third variable.
To color a scatter plot, simply map a nominal variable from the dataset to the color encoding.
Let us understand the importance of scatter plot coloring using an example:
The Iris dataset is one of the most popular datasets in data science and is available in most dataset libraries. It records measurements of iris flowers from three species. Its columns are sepalLength, sepalWidth, petalLength, petalWidth, and species. First, we will visualize this dataset with a simple scatter plot and then see what coloring the plot adds.
To make a simple scatter plot, we load the iris dataset from the vega_datasets library, pass it to a Chart object, and call the mark_point() method. Then we map the x- and y-axis encodings to the sepalLength and petalLength columns.
A simple scatter plot of the iris data without coloring:
# Python3 program to illustrate
# how to make a simple scatter plot
# using altair
# Importing altair and vega_datasets library
import altair as alt
from vega_datasets import data
# Selecting the iris dataset
iris = data.iris()
# Making the scatter plot
alt.Chart(iris).mark_point().encode(
    # Map the sepalLength to x-axis
    x='sepalLength',
    # Map the petalLength to y-axis
    y='petalLength',
)
Output:
As you can see, this scatter plot suggests that one group of points is linearly separable from the rest, but we cannot tell which data points correspond to which species or what relationships hold within each species. To make the plot more informative, we will color it by the species variable.
Code:
# Python3 program to illustrate
# how to color a scatter plot
# using altair
# Importing altair and vega_datasets library
import altair as alt
from vega_datasets import data
# Selecting the iris dataset
iris = data.iris()
# Making the scatter plot
alt.Chart(iris).mark_point().encode(
    # Map the sepalLength to x-axis
    x='sepalLength',
    # Map the petalLength to y-axis
    y='petalLength',
    # Coloring the scatter plot:
    # map the species to color
    color='species'
)
Output:
Altair automatically generates a legend showing which color represents which category of the color variable. With the points colored, we can see that setosa forms its own cluster: its petals are much shorter than its sepals and far shorter than those of the other two species. Versicolor has medium-length sepals and petals, while virginica has the longest sepals and petals, and these two species overlap somewhat.
As you can see, we can extract more information by coloring a scatter plot.
Customizing Colors
If you don’t like the colors Altair chooses for your scatter plot, you can customize them. The default colors are changed through the scale argument of the Color class, by passing it a Scale object. The available customizations are:
- Custom mapping of colors to discrete values: pass a list of values to the domain parameter of Scale and a list of colors to its range parameter; the i-th value is drawn in the i-th color.
- Color schemes: Vega provides many named color schemes. If you like dark colors, you can use the ‘dark2’ scheme, and if there are more than 10 categories, you can use the ‘category20’ scheme.
Example 1: Custom mapping of colors to discrete values:
# Python3 program to illustrate
# how to do custom mapping
# of colors to discrete values
# for scatter plot coloring
# using altair
# Importing altair and vega_datasets library
import altair as alt
from vega_datasets import data
# Selecting the cars dataset
cars = data.cars()
# Making two lists for
# values and colors respectively
dom = ['Europe', 'Japan', 'USA']
rng = ['red', 'green', 'black']
# Making the scatter plot
alt.Chart(cars).mark_point().encode(
    # Map Miles_per_Gallon to x-axis
    x='Miles_per_Gallon',
    # Map the Horsepower to y-axis
    y='Horsepower',
    # Coloring the scatter plot
    # using the Origin variable and
    # custom colors
    color=alt.Color('Origin', scale=alt.Scale(domain=dom, range=rng))
)
Output:
Example 2: Color schemes:
# Python3 program to illustrate
# how to select color schemes
# for scatter plot coloring
# using altair
# Importing altair and vega_datasets library
import altair as alt
from vega_datasets import data
# Selecting the cars dataset
cars = data.cars()
# Making the scatter plot
alt.Chart(cars).mark_point().encode(
    # Map Miles_per_Gallon to x-axis
    x='Miles_per_Gallon',
    # Map the Horsepower to y-axis
    y='Horsepower',
    # Coloring the scatter plot
    # using the Origin variable and
    # a color scheme
    color=alt.Color('Origin', scale=alt.Scale(scheme='dark2'))
)
Output: