Data Science Explorer
Pandas: Cleaning Data of Wrong Format
Sometimes our data is not in the format we expect. For example, numbers might be stored as text, or dates might not be recognized as dates. We need to identify these issues and convert the affected columns to the correct type before analysis.
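One way to spot wrongly formatted columns is to inspect the column dtypes. The sketch below uses a small made-up DataFrame (the values are hypothetical, not from the post's dataset) in which 'Age' and 'Date' are both stored as text.

```python
import pandas as pd

# Hypothetical sample data: 'Age' holds numbers stored as text,
# 'Date' holds dates stored as text.
df = pd.DataFrame({
    "Age": ["25", "31", "unknown"],
    "Date": ["01/02/2023", "15/03/2023", "30/04/2023"],
})

# Text columns are reported with an object/string dtype
# instead of a numeric or datetime dtype.
print(df.dtypes)

# The same check, done programmatically.
age_is_numeric = pd.api.types.is_numeric_dtype(df["Age"])
date_is_datetime = pd.api.types.is_datetime64_any_dtype(df["Date"])
print(age_is_numeric, date_is_datetime)
```

Both checks come back false here, which signals that the columns need converting before any numeric or date-based analysis.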
- Convert into a Correct Format
Example
Convert the 'Age' column to numbers.
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
** The 'errors' parameter controls what happens when a value cannot be converted. Setting it to 'coerce' replaces such values with NaN ("Not a Number"), the marker pandas uses for missing data, instead of raising an error. **
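To see the effect of 'coerce' on a value that cannot be parsed, here is a minimal sketch with made-up data (the "unknown" entry is an assumption added for illustration).

```python
import pandas as pd

# Hypothetical data: three valid numbers stored as text and one bad entry.
df = pd.DataFrame({"Age": ["25", "31", "unknown", "40"]})

# Values that cannot be parsed become NaN instead of raising an error.
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

# Count how many values were coerced to NaN.
n_missing = int(df["Age"].isna().sum())
print(df)
print("Unparseable values:", n_missing)
```

Because NaN is a floating-point value, the converted column ends up as a float column even though the valid entries were whole numbers.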
- Removing Rows
Example
Remove rows with a NULL value in the "Age" column.
df.dropna(subset=['Age'], inplace=True)
** The 'subset' parameter specifies which columns to check for missing values. Rows with missing values in other columns are kept. **
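The following sketch shows how 'subset' limits the check to a single column. The data is made up for illustration, and the result is assigned to a new variable rather than using inplace=True, which is simply an alternative style.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row is missing 'Age', another is missing 'Name'.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", None],
    "Age": [25.0, np.nan, 40.0],
})

# Only rows with a missing 'Age' are dropped;
# the row with a missing 'Name' survives.
cleaned = df.dropna(subset=["Age"])
print(cleaned)
```

Calling dropna() without 'subset' would drop both incomplete rows and leave only the first one.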
Exercise
You have a dataset that contains dates in the wrong format. The dates are currently stored as text in the format "DD/MM/YYYY", but you need them in the standard date format "YYYY-MM-DD" for analysis. Write Python code using Pandas to perform this conversion.
import pandas as pd
df = pd.read_csv('date.csv')
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.strftime('%Y-%m-%d')
print(df)
** pd.to_datetime converts date and time strings or objects into Pandas datetime objects. The 'format' argument tells it how the input text is laid out. **
** .dt.strftime() then formats those datetime values as text in the desired "YYYY-MM-DD" layout. **
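The same conversion can be tried without a CSV file. This is a minimal sketch using two made-up dates in place of the contents of 'date.csv'.

```python
import pandas as pd

# Hypothetical stand-in for the contents of 'date.csv'.
df = pd.DataFrame({"Date": ["25/12/2023", "01/02/2024"]})

# Step 1: parse the DD/MM/YYYY text into real datetime values.
parsed = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Step 2: format the datetimes as YYYY-MM-DD text.
df["Date"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```

Note that .dt.strftime() returns strings, so the 'Date' column is text again after step 2. If later steps need date arithmetic or sorting by date, keeping the intermediate `parsed` values may be the better choice.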