A SettingWithCopyWarning may occur when we try to modify the data in a Pandas DataFrame. The warning is raised when a single statement chains a get operation and a set operation.
In detail: Pandas does not guarantee whether the result of a get operation is a view or a copy. If it returns a view, the set operation modifies the original DataFrame. If it returns a copy, the set operation modifies only the copy and the original DataFrame remains unchanged. So we cannot be sure whether the change reached the original DataFrame.
DataFrame
Student Name | Percentage | Grade
---|---|---
Akhil | – | –
Sai | 65 | C
Rohit | 90 | O
Prasanth | 79 | B
Divya | 89 | A
Code that throws a warning
Python3
    # import necessary packages
    import pandas as pd

    # create a dataframe
    marks = pd.DataFrame({
        'Name': ['Akhil', 'Sai', 'Rohit', 'Prasanth', 'Divya'],
        'Percentage': ['-', 65, 90, 79, 89],
        'Grade': ['-', 'C', 'O', 'B', 'A']})

    # Assign Absent if percentage is not specified
    # (chained get and set: this line triggers the warning)
    marks[marks.Percentage == '-'].Grade = 'Ab'
Output
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py:5303: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[name] = value
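A quick way to see why the warning matters is to inspect the original DataFrame after the chained assignment. The following is a minimal sketch using the same data as above; the variable `grade_after` is introduced here only for illustration. A boolean filter returns a new object, so the assignment lands on a temporary DataFrame and the original row keeps its placeholder grade.

```python
import pandas as pd

marks = pd.DataFrame({
    'Name': ['Akhil', 'Sai', 'Rohit', 'Prasanth', 'Divya'],
    'Percentage': ['-', 65, 90, 79, 89],
    'Grade': ['-', 'C', 'O', 'B', 'A']})

# Chained form: the filter builds a temporary DataFrame and the
# assignment is applied to that temporary object (a warning may be shown).
marks[marks.Percentage == '-'].Grade = 'Ab'

# Read the grade of the first row (Akhil) from the original DataFrame.
grade_after = marks.loc[0, 'Grade']
print(grade_after)  # still '-', the original was not modified
```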
Solution
To solve this problem, use the loc method to select the required rows and columns and assign to them in a single operation, instead of chaining a selection with an assignment. Alternatively, use the copy method to store a copy of the selected data in another variable, so that the get and set operations are separated into two lines.
Example 1:
Use the above DataFrame, and use the loc method to select the required rows and columns in the same statement as the assignment.
Python3
    # import necessary packages
    import pandas as pd

    # create a dataframe
    marks = pd.DataFrame({
        'Name': ['Akhil', 'Sai', 'Rohit', 'Prasanth', 'Divya'],
        'Percentage': ['-', 65, 90, 79, 89],
        'Grade': ['-', 'C', 'O', 'B', 'A']})

    # Assign Absent if percentage is not specified
    marks.loc[marks.Percentage == '-', 'Grade'] = 'Ab'

    # modified content
    print(marks)
Output
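Explanation: Instead of chained indexing, which raises the warning, here we used the loc method, so the selection and the assignment happen in one operation on the original DataFrame.
Note: The loc method does not guarantee warning-free output in every case. If the DataFrame being modified was itself obtained by slicing or filtering another DataFrame, Pandas may still raise the warning. In that case, it is advised to create an explicit copy with the copy method and make modifications to that.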
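The case described in the note can be sketched as follows. This is an illustrative example, not part of the original article: the variable `present` and the grade `'A+'` are assumptions made for the demonstration. The commented-out lines show the pattern that may still warn on Pandas versions that emit SettingWithCopyWarning; the active lines show the explicit-copy alternative.

```python
import pandas as pd

marks = pd.DataFrame({
    'Name': ['Akhil', 'Sai', 'Rohit', 'Prasanth', 'Divya'],
    'Percentage': ['-', 65, 90, 79, 89],
    'Grade': ['-', 'C', 'O', 'B', 'A']})

# A DataFrame derived by filtering may still trigger the warning,
# even when the later write goes through .loc:
#     present = marks[marks.Percentage != '-']
#     present.loc[present.Name == 'Rohit', 'Grade'] = 'A+'

# Taking an explicit copy makes it clear that 'present' is independent.
present = marks[marks.Percentage != '-'].copy()
present.loc[present.Name == 'Rohit', 'Grade'] = 'A+'

print(present)
print(marks.loc[2, 'Grade'])  # the original still holds 'O' for Rohit
```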
Example 2:
Create a copy of part of a DataFrame with the loc and copy methods, and make changes to the copy.
Python3
    # import necessary packages
    import pandas as pd

    # create a dataframe
    marks = pd.DataFrame({
        'Name': ['Akhil', 'Sai', 'Rohit', 'Prasanth', 'Divya'],
        'Percentage': ['-', '-', 90, 79, 89],
        'Grade': ['-', '-', 'O', 'B', 'A']})

    # create a copy of the rows of the original DataFrame
    # whose percentage is not specified (absentees)
    Absent_Students = marks.loc[marks.Percentage == '-', :].copy()

    # Make their grade 'Ab', which indicates absent
    Absent_Students.Grade = 'Ab'

    # modified content
    print(Absent_Students)
Output
Explanation: Absent_Students is an independent copy that contains only the absent students' rows from the original DataFrame. The assignment changes only this copy, and the original DataFrame marks stays unchanged, so there is no confusion between view and copy.
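If the changes made on the copy should eventually reach the original DataFrame, one possible approach is to write them back by row label. This is a sketch under that assumption, not a step the example above prescribes; `grades_before` is a name introduced only for illustration. It relies on the copy keeping the row labels of the original.

```python
import pandas as pd

marks = pd.DataFrame({
    'Name': ['Akhil', 'Sai', 'Rohit', 'Prasanth', 'Divya'],
    'Percentage': ['-', '-', 90, 79, 89],
    'Grade': ['-', '-', 'O', 'B', 'A']})

# Independent copy of the absent students' rows, modified separately.
Absent_Students = marks.loc[marks.Percentage == '-', :].copy()
Absent_Students.Grade = 'Ab'

# The original is untouched at this point.
grades_before = marks['Grade'].tolist()
print(grades_before)  # ['-', '-', 'O', 'B', 'A']

# Optional write-back: the copy has the same row labels (0 and 1),
# so a single .loc assignment on the original updates those rows.
marks.loc[Absent_Students.index, 'Grade'] = Absent_Students['Grade']
print(marks['Grade'].tolist())  # ['Ab', 'Ab', 'O', 'B', 'A']
```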