Python

Pandas: Cleaning Empty Cells

grace21110 2023. 10. 29. 10:03
반응형
  • Remove Cells 

If there are some empty rosws, you would want to remove them. 

 

Example 

Return a new Data Frame with no empty cells.

import pandas as pd

df = pd.read_csv('hello.csv')

new_df = df.dropna()

print(new_df.to_string())

** dropna() method returns a new Dataframe, not changing the original one. **

 

When you want to keep the orignial dataframe, use the inplace = True argument. 

 

Example 

Remove all rows with NULL values. 

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace = True)

print(df.to_string())

** The dropna(inplace = True) will not return a new dataframe, but it will remove all rows containing NULL values from the original dataframe. 

 

  • Replace Empty Values 

Another way to remove empty cells is inserting new value instead. 

The fillna() method allows us to replace empty cells with a value.

 

Example 

Replace NULL values with the number 20. 

 

import pandas as pd 
df = pd.read_csv('may.csv')

df.fillna(20, inplace = True)

 

  • Replace Only For Specified Columns 

If you want to replace empty values for one column, specify the column name for the dataframe. 

 

Example 

Replace NULL values in the "Address" columns with the number 20. 

import pandas as pd 
df = pd.read_csv('may.csv')

df ["Address"].fillna(20, inplace = True)

 

  • Replace Using Mean, Median, or Mode 

There is another way that you can remove empty cells using mean(), median(), mode(). 

** mean is the average value (the sum of all values divided by number of values), median is the value in the middle, after you have sorted all values ascending, mode is the value that appears most frequently. ** 

 

Example 

Calculate the MEAN and replace any empty values with it.

import pandas as pd 
df = pd.read_csv("berry.csv')

a = df["Address"].mean()

df ["Address"].fillna(a, inplace = True)

 

Calculate the MODE and replace any empty values with it. 

import pandas as pd

df = pd.read_csv('hello.csv')

x = df["Address"].mode()[0]

df["Address"].fillna(x, inplace = True)

 

 

 

Exercise 

1. You are working with a dataset that contains information about the sales of products. The dataset includes a column called "ProductCategory." Your task is to calculate the mode (most frequent category) in the "ProductCategory" column and replace any empty (missing) values with this mode.

import pandas as pd 

df = pd.read_csv('productcategory.csv')

a = df["Productcategory"].mode()[0]

df["Productcategory"].fillna(a, inplace = True)

 

2. You are working with a dataset that contains information about students' exam scores. The dataset has columns for "StudentID," "Name," "MathScore," and "ScienceScore." Some of the students have missing scores. Your task is to clean the data by removing rows with missing scores.

import pandas as pd

df = pd.read_csv('studentid.csv')

studentid_df = df.dropna()

print(studentid_df)