Data Science Explorer

Pandas: Cleaning Data of Wrong Format

grace21110 2023. 10. 29. 10:37

Sometimes our data is not in the format we expect. For example, numbers might be stored as text, or dates might not be recognized as dates. We need to identify these issues and then either convert the values to the correct format or remove the rows that cannot be fixed.
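A quick way to spot such issues is to inspect each column's dtype: a column that should be numeric but is reported as a text dtype usually contains strings. The small DataFrame below is made-up sample data for illustration; the second step is one possible way (not the only one) to list the rows whose 'Age' cannot be read as a number.

```python
import pandas as pd

# Hypothetical sample data: 'Age' is stored as text and one entry is not a number
df = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cho'],
    'Age': ['25', '31', 'thirty'],
    'Date': ['29/10/2023', '30/10/2023', '31/10/2023'],
})

# Text columns are reported as object (or a string dtype in newer pandas versions)
print(df.dtypes)

# Rows whose 'Age' cannot be parsed as a number
bad_age = df[pd.to_numeric(df['Age'], errors='coerce').isna()]
print(bad_age)
```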

 

  • Converting to the Correct Format

Example 

Convert the 'Age' column to numbers.

df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

** The 'errors' parameter controls what happens when a value cannot be converted. By default (errors='raise') pandas raises an error; with errors='coerce', every value that cannot be parsed as a number is replaced with NaN ("Not a Number"), which pandas treats as a missing value. **
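To see the difference between the two settings, here is a small runnable sketch on hypothetical sample data. With errors='coerce' the unparseable entry becomes NaN, and the column becomes a float column because NaN is a floating-point value; with the default setting the same input raises a ValueError.

```python
import pandas as pd

# Hypothetical sample data with one value that cannot be parsed as a number
df = pd.DataFrame({'Name': ['Ann', 'Ben', 'Cho'], 'Age': ['25', '31', 'thirty']})

# 'thirty' cannot be converted, so it is replaced with NaN
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df)
print(df['Age'].dtype)  # a float dtype, since NaN is a float value

# With the default errors='raise', the same kind of input raises a ValueError
try:
    pd.to_numeric(pd.Series(['25', 'thirty']))
except ValueError as exc:
    print('Conversion failed:', exc)
```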

 

  • Removing Rows

Example 

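Remove rows with a missing (NaN) value in the "Age" column. After the conversion above, this also removes the rows whose "Age" could not be parsed as a number.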

df.dropna(subset=['Age'], inplace=True)

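** The 'subset' parameter limits the check to the listed columns, so rows with missing values in other columns are kept. inplace=True modifies df directly instead of returning a new DataFrame. **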
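Putting the two steps together, a minimal sketch on hypothetical sample data might look like this. The 'City' column is made up to show why 'subset' matters: Ann's row has no city but a valid age, so it survives because only 'Age' is checked.

```python
import pandas as pd

# Hypothetical sample data: one missing age, one unparseable age, one missing city
df = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cho', 'Dee'],
    'Age': ['25', None, 'thirty', '40'],
    'City': [None, 'Busan', 'Incheon', 'Daegu'],
})

# Step 1: invalid text (and None) becomes NaN
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

# Step 2: drop rows whose 'Age' is NaN; missing values in 'City' are ignored
df.dropna(subset=['Age'], inplace=True)

print(df)
```

The remaining rows keep their original index labels (0 and 3); calling df.reset_index(drop=True) afterwards renumbers them if that is needed.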

 

Exercise 

You have a dataset that contains dates in an incorrect format. The dates are currently stored as text in the format "DD/MM/YYYY", but you need to convert them into the standard "YYYY-MM-DD" format for analysis. Write Python code using Pandas to perform this conversion.

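import pandas as pd

# Load the data; the 'Date' column is read as text in DD/MM/YYYY form
df = pd.read_csv('date.csv')

# Parse the text as day/month/year, then write it back as YYYY-MM-DD text
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.strftime('%Y-%m-%d')

print(df)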

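** pd.to_datetime converts date and time strings (or other date-like objects) into Pandas datetime values. The format='%d/%m/%Y' argument tells it that the day comes first, so a value like "05/11/2023" is read as 5 November rather than 11 May. **

** The .dt.strftime() method then formats each datetime as a string in the desired "YYYY-MM-DD" format. Note that the result is text again; if you need real datetime values for sorting, filtering, or date arithmetic, keep the output of pd.to_datetime instead. **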
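Since the contents of 'date.csv' are not shown, the sketch below replaces it with a few hypothetical rows read through io.StringIO so it runs on its own. The 'Name' and 'Date_dt' columns are illustrative assumptions. It keeps both results: the formatted text the exercise asks for, and the parsed datetime values.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'date.csv' so the example runs without the file
csv_text = """Name,Date
Ann,29/10/2023
Ben,05/11/2023
Cho,31/12/2023
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse DD/MM/YYYY explicitly, so '05/11/2023' becomes 5 November 2023
parsed = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Text in YYYY-MM-DD form, as the exercise asks
df['Date'] = parsed.dt.strftime('%Y-%m-%d')

# Real datetime values, useful for date arithmetic and filtering
df['Date_dt'] = parsed

print(df)
print(df.dtypes)
```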

 
