일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 | 31 |
- Github
- iterates
- PROJECT
- AS
- __init__
- line width
- continue
- For loops
- error
- matplotlib
- SQL
- polynomial regression
- PANDAS
- start exercise
- Text Analytics
- train/test
- Else
- break
- Python
- pie charts
- Default X points
- matplotlib.pyplot
- self parameter
- line color
- data distribution
- Text mining
- machine learning
- MySQL
- multiple lines
- variables
- Today
- Total
Data Science Explorer
How to duplicate each row by Python 본문
I was working on data cleaning and faced a struggle today. I had to duplicate each row and put it together and did not know how to deal with this. It took me a few hours to think and look up the ways to figure it out.
Luckily I found the way out so I am going to show you how I did it!
Step #1: You have to import the data.
import pandas as pd
ss = pd.read_csv('/content/총물량데이터.csv')
print (ss)
Step #2: Check what's inside of the file.
ss.info()
Step #3: Since the data that I imported included all the regions, I had to sort it out with only data of Seoul.
seoul_df = ss[(ss['SEND_CITY'] == '서울특별시') & (ss['REC_CITY'] == '서울특별시')]
print(seoul_df)
Step #4: Set the ID for each column
seoul_df = ss[(ss['SEND_CITY'] == '서울특별시') & (ss['REC_CITY'] == '서울특별시')].reset_index(drop=True)
# Add a new 'ID' column
seoul_df['ID'] = range(1, len(seoul_df) + 1)
# Display the resulting DataFrame
print(seoul_df.head(2))
Step #5: Place the location of the ID column
seoul_df = seoul_df[['ID'] + [col for col in ss.columns if col != 'ID']]
seoul_df.info()
Step #6: Make the same ID column double
# Make the same ID column double
seoul_df = pd.concat([seoul_df, seoul_df], ignore_index=True)
# Set ID column as a standard
seoul_df.sort_values(by=['ID'], inplace=True)
# Reset the index
seoul_df.reset_index(drop=True, inplace=True)
seoul_df.head(10)
Step #7: Insert the new column name and fill it with 0
seoul_df_copy = seoul_df.copy()
seoul_df_copy['일하는_구'] = 0
seoul_df_copy.head()
Step #8: Use for loops to sort the Sending region for even numbers, and odd numbers for Receiving region.
for idx, row in seoul_df_copy.iterrows():
if idx % 2 == 0:
seoul_df_copy.at[idx, '일하는_구'] = seoul_df_copy.at[idx, 'SEND_GU']
else:
seoul_df_copy.at[idx, '일하는_구'] = seoul_df_copy.at[idx, 'REC_GU']
seoul_df_copy = seoul_df_copy.drop(columns=['SEND_GU', 'REC_GU'])
seoul_df_copy.head()
Step #9: Rename 일하는_구 into English
seoul_df_copy.rename(columns={'일하는_구': 'WORK_GU'}, inplace=True)
print(seoul_df_copy)
Step #10: Get the sum in total sorted by the categories
items = ['FURNITURE/INTERIOR', 'ETC', 'BOOK/ALBUM',
'DIGITAL/HOME APP', 'LIFE/HEALTH', 'SPORTS/LEISURE', 'FOOD',
'CHILDBIRTH/PARENTING', 'FASHION/CLOTHES', 'FASHION/ACC',
'COSMETICS/BEAUTY']
result = seoul_df_copy.groupby(['DATE', 'YEAR', 'MONTH', 'DAY', 'DAY_OF', 'holiday', 'SEND_CITY', 'REC_CITY','WORK_GU'])[items].sum().reset_index()
result.loc[result['DAY_OF'] == 'DAY', 'holiday'] = 1
result
Step #11: Check if I have all columns I need
seoul_df_copy.columns
Step #12: Export it into excel file
from google.colab import files
result.to_csv("총물량데이터.csv", index=False)
files.download("총물량데이터.csv")
All set!

'Projects' 카테고리의 다른 글
How to export excel file from Google Colab (0) | 2023.11.03 |
---|---|
How to fix the Error: WARNING:matplotlib.font_manager:findfont: Font family 'NanumBarunGothic' not found. (0) | 2023.11.03 |
Personal Project: Hangman (0) | 2023.11.02 |