How to duplicate each row by Python

Notice

Recent Posts

Recent Comments

Link

« 2025/07 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Tags more

Archives

Today

Total

관리 메뉴

Data Science Explorer

How to duplicate each row by Python 본문

Projects

How to duplicate each row by Python

grace21110 2023. 11. 16. 21:03

I was working on data cleaning and faced a struggle today. I had to duplicate each row and put it together and did not know how to deal with this. It took me a few hours to think and look up the ways to figure it out.

Luckily I found the way out so I am going to show you how I did it!

Step #1: You have to import the data.

import pandas as pd 

ss = pd.read_csv('/content/총물량데이터.csv')
print (ss)

Step #2: Check what's inside of the file.

ss.info()

Step #3: Since the data that I imported included all the regions, I had to sort it out with only data of Seoul.

seoul_df = ss[(ss['SEND_CITY'] == '서울특별시') & (ss['REC_CITY'] == '서울특별시')]
print(seoul_df)

Step #4: Set the ID for each column

seoul_df = ss[(ss['SEND_CITY'] == '서울특별시') & (ss['REC_CITY'] == '서울특별시')].reset_index(drop=True)

# Add a new 'ID' column
seoul_df['ID'] = range(1, len(seoul_df) + 1)

# Display the resulting DataFrame
print(seoul_df.head(2))

Step #5: Place the location of the ID column

seoul_df = seoul_df[['ID'] + [col for col in ss.columns if col != 'ID']]
seoul_df.info()

Step #6: Make the same ID column double

# Make the same ID column double
seoul_df = pd.concat([seoul_df, seoul_df], ignore_index=True)

# Set ID column as a standard 
seoul_df.sort_values(by=['ID'], inplace=True)

# Reset the index 
seoul_df.reset_index(drop=True, inplace=True)

seoul_df.head(10)

Step #7: Insert the new column name and fill it with 0

seoul_df_copy = seoul_df.copy()
seoul_df_copy['일하는_구'] = 0
seoul_df_copy.head()

Step #8: Use for loops to sort the Sending region for even numbers, and odd numbers for Receiving region.

for idx, row in seoul_df_copy.iterrows():
  if idx % 2 == 0:
    seoul_df_copy.at[idx, '일하는_구'] = seoul_df_copy.at[idx, 'SEND_GU']
  else:
    seoul_df_copy.at[idx, '일하는_구'] = seoul_df_copy.at[idx, 'REC_GU']

seoul_df_copy = seoul_df_copy.drop(columns=['SEND_GU', 'REC_GU'])

seoul_df_copy.head()

Step #9: Rename 일하는_구 into English

seoul_df_copy.rename(columns={'일하는_구': 'WORK_GU'}, inplace=True)
print(seoul_df_copy)

Step #10: Get the sum in total sorted by the categories

items = ['FURNITURE/INTERIOR', 'ETC', 'BOOK/ALBUM',
       'DIGITAL/HOME APP', 'LIFE/HEALTH', 'SPORTS/LEISURE', 'FOOD',
       'CHILDBIRTH/PARENTING', 'FASHION/CLOTHES', 'FASHION/ACC',
       'COSMETICS/BEAUTY']
      
result = seoul_df_copy.groupby(['DATE', 'YEAR', 'MONTH', 'DAY', 'DAY_OF', 'holiday', 'SEND_CITY', 'REC_CITY','WORK_GU'])[items].sum().reset_index()
result.loc[result['DAY_OF'] == 'DAY', 'holiday'] = 1
result

Step #11: Check if I have all columns I need

seoul_df_copy.columns

Step #12: Export it into excel file

from google.colab import files
result.to_csv("총물량데이터.csv", index=False)
files.download("총물량데이터.csv")

All set!

'Projects' 카테고리의 다른 글

How to export excel file from Google Colab (0)	2023.11.03
How to fix the Error: WARNING:matplotlib.font_manager:findfont: Font family 'NanumBarunGothic' not found. (0)	2023.11.03
Personal Project: Hangman (0)	2023.11.02

'Projects' Related Articles

Data Science Explorer

How to duplicate each row by Python 본문

How to duplicate each row by Python

'Projects' 카테고리의 다른 글

티스토리툴바