Apply a 2D Max Pooling in PyTorch

22 July 2024


Pooling is a technique used in CNN models to down-sample the feature maps coming from the previous layer and produce new, summarised feature maps. In computer vision, it reduces the spatial dimensions of an image while retaining important features. The goal of pooling is to reduce the computational complexity of the model and make it less sensitive to small translations in the input image.

Types of Pooling

There are two main types of pooling used in deep learning: Max Pooling and Average Pooling.

Max Pooling: Max pooling selects the maximum value from each pooling window (a small region of the feature map) and passes this maximum value to the next layer. This helps to retain the most important feature information while reducing the size of the representation.

Average Pooling: Average pooling computes the average value of each pooling window and passes this average value to the next layer. This helps to retain a more general form of the feature information, but with a reduced spatial resolution.
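
The difference between the two types can be seen on a small tensor. The following is a minimal sketch, assuming a made-up 4×4 feature map (the values are illustrative and not from any real network) and using `nn.MaxPool2d` and `nn.AvgPool2d` with the same 2×2 window:

```python
import torch
import torch.nn as nn

# Illustrative 4x4 feature map, reshaped to (batch, channels, height, width)
x = torch.tensor(
    [[1., 3., 2., 0.],
     [5., 7., 4., 6.],
     [9., 8., 1., 1.],
     [2., 0., 3., 5.]]
).reshape(1, 1, 4, 4)

# Same window and stride for both layers, so only the aggregation differs
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

max_out = max_pool(x)  # keeps the largest value of each 2x2 window
avg_out = avg_pool(x)  # keeps the mean of each 2x2 window

print(max_out)
print(avg_out)
```

For the top-left window (1, 3, 5, 7), max pooling keeps 7 while average pooling keeps 4.0.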

Pooling is usually applied after a convolution operation and helps to reduce overfitting and improve the generalization performance of the model.

2d Max pooling

As the name suggests, 2D max pooling selects the maximum value in each pooling region of a 2D feature map and passes it on to the next layer. This helps to retain the most important feature information while discarding less important information. Because only the strongest activation in each region survives, max pooling is often used to detect the presence of a feature in an image.

Syntax :

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

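where,
kernel_size : Size of the pooling window, given as an int or a (height, width) tuple.
stride : Step by which the window shifts across the input feature map (default is kernel_size).
padding : Implicit padding added on both sides of each spatial dimension. The padded positions are treated as negative infinity, so they never win the max (default is 0).
dilation : Spacing between the elements inside the pooling window (default is 1).
return_indices : True or False (default is False). When True, the indices of the maximum values are returned along with the output.
ceil_mode : True or False (default is False). When True, ceil is used instead of floor to compute the output shape.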
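
The effect of the less obvious parameters is easiest to see on a small input. The sketch below assumes an illustrative 5×5 tensor filled with 0 to 24, chosen so that each value equals its own flat position in the feature map:

```python
import torch
import torch.nn as nn

# 5x5 input whose values equal their flat index (0..24)
x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# Default (floor): the last row and column cannot fill a 2x2 window and are dropped
floor_out = nn.MaxPool2d(kernel_size=2, stride=2)(x)

# ceil_mode=True: partial windows at the border are kept
ceil_out = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)(x)

# return_indices=True: also returns the flat position of each maximum
values, indices = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)(x)

# dilation=2: the 2x2 window samples every second element, spanning 3x3 inputs
dilated_out = nn.MaxPool2d(kernel_size=2, stride=1, dilation=2)(x)

print(floor_out.shape)    # torch.Size([1, 1, 2, 2])
print(ceil_out.shape)     # torch.Size([1, 1, 3, 3])
print(indices)
print(dilated_out.shape)  # torch.Size([1, 1, 3, 3])
```

The returned indices are the kind of information that `nn.MaxUnpool2d` consumes when a pooling step needs to be partially inverted.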

Let the input tensor be of a shape (n, c, h, w) and our kernel size is (k_h, k_w) then the output can be computed as:

$out(n_i, c_j, h, w) = \max_{m=0, \ldots, k_h-1} \;\; \max_{n=0, \ldots, k_w-1} \text{input}(n_i, c_j, \text{stride[0]} \times h + m,\; \text{stride[1]} \times w + n)$

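The output shape is then given by the following, where the square brackets denote rounding down (floor) and the dilation is assumed to be 1: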
$h_{out} = \left[\frac{h_{in} - \text{kernel\_size[0]}+ 2 * \text{padding[0]}}{\text{stride[0]}} + 1 \right]$

$w_{out} = \left[\frac{w_{in} - \text{kernel\_size[1]}+ 2 * \text{padding[1]}}{\text{stride[1]}} + 1 \right]$
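
The shape formula can be written as a small function and checked against the layer itself. This is a sketch only: `pool_output_size` is a hypothetical helper name, not part of PyTorch. It includes the dilation term from the general form of the formula, which reduces to the expression above when dilation is 1, and it covers only the default floor mode.

```python
import torch
import torch.nn as nn

def pool_output_size(size_in, kernel_size, stride=None, padding=0, dilation=1):
    """Hypothetical helper: output length along one spatial axis (floor mode only)."""
    if stride is None:
        stride = kernel_size  # same default as nn.MaxPool2d
    # With dilation == 1 this is (size_in - kernel_size + 2 * padding) // stride + 1
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Compare with an actual layer on an arbitrary 7x10 input
x = torch.zeros(1, 1, 7, 10)
pool = nn.MaxPool2d(kernel_size=(3, 2), stride=(2, 3), padding=(1, 0))

expected = (
    pool_output_size(7, kernel_size=3, stride=2, padding=1),
    pool_output_size(10, kernel_size=2, stride=3, padding=0),
)
actual = tuple(pool(x).shape[-2:])
print(expected, actual)
```

Integer division (`//`) plays the role of the floor in the formula.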

In this example, the input image is 4×4 and the max pooling operation is performed using a 2×2 pooling kernel with a stride of 2 in both directions. The stride defines how many pixels the window shifts across the input image at each step.

2D max pooling with a 2×2 kernel and stride 2

Here’s an example of how Max-pooling can be implemented in PyTorch:

Python

import torch 
import torch.nn as nn 
  
# Define the input tensor 
input_tensor = torch.tensor( 
    [ 
        [1, 1, 2, 4], 
        [5, 6, 7, 8], 
        [3, 2, 1, 0], 
        [1, 2, 3, 4] 
    ], dtype = torch.float32) 
  
# Reshape the input_tensor 
input_tensor = input_tensor.reshape(1, 1, 4, 4) 
  
# Initialize the Max-pooling layer with kernel 2X2 and stride 2 
pool = nn.MaxPool2d(kernel_size=2, stride=2) 
  
# Apply the Max-pooling layer to the input tensor 
output = pool(input_tensor) 
  
# Print the output tensor
print(output)

Output :

tensor([[[[6., 8.],
          [3., 4.]]]])

Mathematically, the Output shape can be calculated as:

$\begin{aligned} h_{out} & =\left[\frac{h_{in} - \text{kernel\_size[0]} + 2 * \text{padding[0]} }{\text{stride[0]}} + 1\right] \\&=\left [\frac{4 -2 + 2 * 0}{2} + 1\right] \\ &= \left[\frac{4 -2 + 0}{2} + 1\right] \\ &= \left[\frac{2}{2} + 1\right] \\ &= \left [1+1 \right] \\ &= 2 \end{aligned}$

$\begin{aligned} w_{out} & =\left[\frac{w_{in} - \text{kernel\_size[1]} + 2 * \text{padding[1]} }{\text{stride[1]}} + 1\right] \\&=\left [\frac{4 -2 + 2 * 0}{2} + 1\right] \\ &= \left[\frac{4 -2 + 0}{2} + 1\right] \\ &= \left[\frac{2}{2} + 1\right] \\ &= \left [1+1 \right] \\ &= 2 \end{aligned}$
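
The element-wise definition of max pooling given earlier can also be checked directly with explicit loops. The following is a minimal sketch, assuming no padding and no dilation; `naive_max_pool2d` is a hypothetical reference function written for illustration and is far slower than the built-in layer.

```python
import torch
import torch.nn as nn

def naive_max_pool2d(x, kernel_size, stride):
    """Hypothetical reference implementation: no padding, no dilation."""
    k_h, k_w = kernel_size
    s_h, s_w = stride
    n, c, h_in, w_in = x.shape
    h_out = (h_in - k_h) // s_h + 1
    w_out = (w_in - k_w) // s_w + 1
    out = torch.empty(n, c, h_out, w_out, dtype=x.dtype)
    for h in range(h_out):
        for w in range(w_out):
            # Window starting at (stride[0] * h, stride[1] * w)
            window = x[:, :, s_h * h:s_h * h + k_h, s_w * w:s_w * w + k_w]
            # Maximum over the two window axes, for every sample and channel
            out[:, :, h, w] = window.amax(dim=(2, 3))
    return out

# Same 4x4 input as in the example above
x = torch.tensor(
    [[1., 1., 2., 4.],
     [5., 6., 7., 8.],
     [3., 2., 1., 0.],
     [1., 2., 3., 4.]]
).reshape(1, 1, 4, 4)

naive = naive_max_pool2d(x, kernel_size=(2, 2), stride=(2, 2))
builtin = nn.MaxPool2d(kernel_size=2, stride=2)(x)
print(torch.equal(naive, builtin))
```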

Apply 2d max pooling on a real image

Python3

import torch 
from PIL import Image 
import torchvision.transforms as T 
  
# Read the image file 
image = Image.open('GFG.jpg') 
    
# convert input image to torch tensor 
Input = T.ToTensor()(image) 
    
# unsqueeze image to make 4D 
Input = Input.unsqueeze(0) 
print('Input Tensor :',Input.shape) 
    
# define 2d max pooling with a rectangular window
# (kernel_size=(5,3), stride=(3,2) and padding=(1,1))
pooling = torch.nn.MaxPool2d(kernel_size=(5,3),  
                             stride=(3,2),  
                             padding=(1,1), 
                             dilation=1) 
Output = pooling(Input) 
print('Output Tensor :',Output.shape) 
# squeeze image 
Out_img = Output.squeeze(0) 
  
# convert tensor to image 
Out_img = T.ToPILImage()(Out_img) 
Out_img

Output:

Input Tensor : torch.Size([1, 3, 561, 799])
Output Tensor : torch.Size([1, 3, 187, 400])

Output Image 2d max pooling

The output shape can be calculated as:

$\begin{aligned} h_{out} & =\left[\frac{h_{in} -\text{kernel\_size[0]}+ 2 * \text{padding[0]}}{\text{stride[0]}} + 1\right] \\&=\left [\frac{561 - 5 + 2 * 1 }{3} + 1\right] \\ &= \left[\frac{561 - 5+2}{3} + 1\right] \\ &= \left[\frac{561 - 3}{3} + 1\right] \\ &= \left[\frac{558}{3} + 1\right] \\ &= \left [186+1 \right] \\ &= 187 \end{aligned}$

$\begin{aligned} w_{out} & =\left[\frac{w_{in} -\text{kernel\_size[1]}+ 2 * \text{padding[1]}}{\text{stride[1]}} + 1\right] \\&=\left [\frac{799 - 3 + 2 * 1 }{2} + 1\right] \\ &= \left[\frac{799 -3+2}{2} + 1\right] \\ &= \left[\frac{799 - 1}{2} + 1\right] \\ &= \left[\frac{798}{2} + 1\right] \\ &= \left [399+1 \right] \\ &= 400 \end{aligned}$
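
If the image file is not at hand, the same shape calculation can be reproduced with a random tensor of the same size. This sketch assumes only the input shape reported above (1, 3, 561, 799); the pixel values are random and irrelevant to the shape.

```python
import torch

# Random stand-in for the image tensor, same shape as the real input
dummy = torch.rand(1, 3, 561, 799)

# Same layer configuration as in the image example
pooling = torch.nn.MaxPool2d(kernel_size=(5, 3),
                             stride=(3, 2),
                             padding=(1, 1),
                             dilation=1)

out_shape = pooling(dummy).shape
print('Output Tensor :', out_shape)  # torch.Size([1, 3, 187, 400])
```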

1D, 2D, 3D pooling

In PyTorch, the terms “1D,” “2D,” and “3D” pooling refer to the number of spatial dimensions in the input that are being reduced by the pooling operation.

1D Pooling is used to reduce the resolution of 1D signals, such as time series or audio signals. In 1D pooling, a window slides along the time axis, and the values in each window are aggregated into a single output value. The windows do not overlap when the stride equals the kernel size, which is the default.

2D Pooling is used to reduce the spatial resolution of 2D images or maps. In 2D pooling, the window slides along both the row and column axes, and the values in each window are aggregated into a single output value.

3D Pooling is used to reduce the spatial resolution of 3D signals, such as video sequences or volumetric data. In 3D pooling, the window slides along all three dimensions (depth, height, and width), and the values in each window are aggregated into a single output value.
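
The three variants differ mainly in the input shape they expect. The sketch below uses random tensors with arbitrarily chosen sizes to show `nn.MaxPool1d` and `nn.MaxPool3d` next to each other, plus a 1D case where the stride is smaller than the kernel so that the windows overlap:

```python
import torch
import torch.nn as nn

# 1D: (batch, channels, length), e.g. a two-channel signal of 16 samples
signal = torch.rand(1, 2, 16)
out_1d = nn.MaxPool1d(kernel_size=4, stride=4)(signal)

# 3D: (batch, channels, depth, height, width), e.g. 8 frames of 16x16
volume = torch.rand(1, 2, 8, 16, 16)
out_3d = nn.MaxPool3d(kernel_size=2, stride=2)(volume)

# Stride smaller than the kernel: neighbouring windows share samples
out_overlap = nn.MaxPool1d(kernel_size=4, stride=2)(signal)

print(out_1d.shape)       # torch.Size([1, 2, 4])
print(out_3d.shape)       # torch.Size([1, 2, 4, 8, 8])
print(out_overlap.shape)  # torch.Size([1, 2, 7])
```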
