Sunday, November 17, 2024
Google search engine
HomeLanguagesReplace negative values with latest preceding positive value in Pandas DataFrame

Replace negative values with latest preceding positive value in Pandas DataFrame

In this article, we will discuss how to replace the negative value in Pandas DataFrame Column with the latest preceding positive value.

While doing this there may arise two situations – 

  • Value remains unmodified if no proceeding positive value exists
  • Value update to 0 if no proceeding positive value exists

Let’s discuss these cases in detail.

Case 1: Value remain unmodified if no proceeding positive value exists

A variable is declared to store the latest preceding positive value initialized with some large negative integer. An iteration of the data frame is then performed column-wise. 

  • In case the value is negative, it is replaced with the preceding value positive variable if it exists otherwise, it remains unmodified.
  • And, in case the value is positive, the preceding positive value variable is updated.

Example:

Python3




import pandas as pd
  
  
# creating a pandas dataframe
df = pd.DataFrame([[8, -2, 0, 3, 51, 2],
                   [6, -2, -5, -7, 0, -1],
                   [-1, -12, -5, 4, 5, 3]])
print("Original DataFrame : \n")
print(df)
  
# declaring a pre defined value
prec_val = -999
  
# iterate over columns
for i in range(df.shape[1]):
  
    # resetting value over each column
    prec_val = -999
  
    # iterate over rows
    for j in range(df.shape[0]):
  
        # accessing the cell value
        cell = df.at[j, i]
  
        # check if cell value is negative
        if(cell < 0):
  
            # check if prec_val is not default
            # set value
            if(prec_val != -999):
  
                # replace the cell value
                df.at[j, i] = prec_val
        else:
  
            # store the latest value in variable
            prec_val = df.at[j, i]
  
print("Modified DataFrame : ")
print(df)


Output:

Case 2: Value update to 0 if no proceeding positive value exists

This approach uses the concept of data frame masking in order to replace the negative values of the data frame. The values are traversed from left to right column-wise, in a top to bottom manner. In this approach, initially, all the values < 0 in the data frame cells are converted to NaN. 

Pandas dataframe.ffill() method is used to fill the missing values in the data frame. ‘ffill’ in this method stands for ‘forward fill’ and it propagates the last valid encountered observation forward. The ffill() function is used to fill the missing values along the index axis that is specified. This method has the following syntax : 

Syntax: DataFrame.ffill(axis=None, inplace=False)

Parameters: 

  • axis – {0, index 1, column}
  • inplace : If True, fill in place.

This is followed by the fillna() method to fill the NA/NaN values using the specified value. Here, we fill the NaN values by 0, since it is the lowest positive integer value possible. All the negative values are thus converted to positive ones. This approach can work on data frames, that don’t have any string values stored. In case there are no preceding positive values, then the negative value is replaced by 0. 

Python3




import pandas as pd
  
# creating a pandas dataframe
data_frame = pd.DataFrame({'col1': [8, -2, 0, 3, 51, 2],
                           'col2': [-1, -2, -5, -7, 0, -1],
                           'col3': [-1, -12, -5, 4, 5, 3]})
  
print("Original DataFrame")
print(data_frame)
  
# masking the data frame
data_frame = data_frame.mask(data_frame.lt(
    0)).ffill().fillna(0).astype('int32')
  
print("Modified DataFrame")
print(data_frame)


Output:

RELATED ARTICLES

Most Popular

Recent Comments