Data binning, also called bucketing, is a data pre-processing method used to reduce the effect of small observation errors. The original data values are divided into small intervals known as bins, and each value is then replaced by a representative value calculated for its bin, such as the bin mean or median. This has a smoothing effect on the input data and may also reduce the chance of overfitting on small datasets.
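The replacement step can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `smooth_by_bin_means` is made up for this example, the bins are simply consecutive runs of the sorted values, and the bin mean is assumed as the representative value.

```python
def smooth_by_bin_means(values, n_bins):
    """Replace each value with the mean of the bin it falls into.

    Assumption: bins are consecutive, equally sized runs of the sorted
    values, and len(values) is divisible by n_bins.
    """
    ordered = sorted(values)
    size = len(ordered) // n_bins
    smoothed = []
    for start in range(0, size * n_bins, size):
        bin_values = ordered[start:start + size]
        bin_mean = sum(bin_values) / len(bin_values)
        # every member of the bin is replaced by the same representative value
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
smoothed = smooth_by_bin_means(data, 3)
print(smoothed)
```

With three bins, the twelve values collapse to three distinct values (one mean per bin), which is the smoothing effect described above.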
There are two methods of dividing data into bins:
- Equal Frequency Binning: each bin holds the same number of values.
- Equal Width Binning: each bin spans the same width w = (max - min) / (no of bins), so the bin boundaries are min + w, min + 2w, ..., min + nw, where n is the number of bins.
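As a quick check of the equal-width formula, the boundaries can be computed directly. This is a small sketch; the helper name `equal_width_edges` is hypothetical and is not part of any library.

```python
def equal_width_edges(values, n_bins):
    """Return the n_bins + 1 boundaries min, min + w, ..., min + n_bins * w."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / n_bins  # width of every bin
    return [lo + i * w for i in range(n_bins + 1)]


edges = equal_width_edges([5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215], 3)
print(edges)  # [5.0, 75.0, 145.0, 215.0]
```

Here w = (215 - 5) / 3 = 70, so the three bins cover 5 to 75, 75 to 145, and 145 to 215.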
Equal frequency:
Input: [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
Output: [5, 10, 11, 13] [15, 35, 50, 55] [72, 92, 204, 215]
Equal Width:
Input: [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
Output: [5, 10, 11, 13, 15, 35, 50, 55, 72] [92] [204, 215]
Code: Implementation of the binning technique:
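```python
# Equal-frequency binning: every bin receives the same number of values.
def equifreq(arr1, m):
    arr1 = sorted(arr1)  # bins are cut from the values in ascending order
    a = len(arr1)
    n = a // m  # number of values per bin
    for i in range(m):
        # The last bin also takes any leftover values when a is not divisible by m.
        end = (i + 1) * n if i < m - 1 else a
        print(arr1[i * n:end])


# Equal-width binning: every bin spans the same range of values.
def equiwidth(arr1, m):
    min1 = min(arr1)
    w = (max(arr1) - min1) / m  # bin width, kept as a float so no part of the range is lost
    # Bin boundaries: min, min + w, ..., min + m*w
    arr = [min1 + w * i for i in range(m + 1)]
    arri = []
    for i in range(m):
        # Half-open interval [arr[i], arr[i+1]) so a boundary value lands in one bin only;
        # the last bin has no upper check, so the maximum is always included.
        temp = [j for j in arr1 if j >= arr[i] and (j < arr[i + 1] or i == m - 1)]
        arri.append(temp)
    print(arri)


# data to be binned
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

# number of bins
m = 3

print("equal frequency binning")
equifreq(data, m)

print("\n\nequal width binning")
equiwidth(data, m)
```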
Output:
equal frequency binning
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]


equal width binning
[[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]
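In practice it is often more convenient to label each value with its bin index than to print the bins. The following sketch does this for equal-width bins using the standard-library `bisect` module; the function name `bin_labels` is an assumption made for this example.

```python
from bisect import bisect_right


def bin_labels(values, n_bins):
    """Return the equal-width bin index (0 .. n_bins - 1) of every value.

    Assumption: max(values) > min(values), so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    w = (hi - lo) / n_bins
    edges = [lo + i * w for i in range(n_bins + 1)]
    labels = []
    for x in values:
        # bisect_right counts the boundaries that are <= x; subtracting 1 gives
        # the bin index, and the maximum is clamped into the last bin
        labels.append(min(bisect_right(edges, x) - 1, n_bins - 1))
    return labels


data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
labels = bin_labels(data, 3)
print(labels)
```

For the sample data this assigns the first nine values to bin 0, the value 92 to bin 1, and the last two values to bin 2, which matches the equal-width output above.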