Data Science Explorer

Pandas: Removing Duplicates

grace21110 2023. 10. 30. 20:34

To find duplicate rows in a dataset, use the duplicated() method. It returns a boolean value for every row: True if the row repeats an earlier row, False otherwise.

 

Example 

print(df.duplicated())
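
As a minimal sketch of what duplicated() returns, assuming a small made-up DataFrame (the column names and values here are illustrative and not taken from any real dataset):

```python
import pandas as pd

# Hypothetical sample data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "StudentName": ["Ann", "Ben", "Ann", "Cho"],
    "Score": [90, 85, 90, 70],
})

# By default (keep="first"), True marks a row that repeats an earlier row.
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False]
```

Only the second "Ann" row is flagged, because the first occurrence is treated as the original.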

 

  • Removing Duplicates 

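To remove duplicates, use the drop_duplicates() method. By default it keeps the first occurrence of each duplicated row and drops the rest. With inplace=True, the DataFrame itself is modified instead of a new one being returned.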

 

Example 

df.drop_duplicates(inplace=True)
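
The difference between calling drop_duplicates() with and without inplace=True can be sketched as follows, again assuming a small made-up DataFrame (names and values are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "StudentName": ["Ann", "Ben", "Ann", "Cho"],
    "Score": [90, 85, 90, 70],
})

# Without inplace, a new DataFrame is returned and df is left unchanged.
deduped = df.drop_duplicates()
print(len(df), len(deduped))  # 4 3

# With inplace=True, df itself is modified and the call returns None.
result = df.drop_duplicates(inplace=True)
print(result, len(df))  # None 3
```

Assigning the returned DataFrame to a new name is often easier to follow than modifying df in place, but both approaches remove the same rows.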

 

Exercises 

Explain the following code.

# Count duplicate student names
duplicate_count = df[df.duplicated(subset=["StudentName"], keep=False)].shape[0]

# Display the count
print("Count of duplicate student names:", duplicate_count)
  • duplicated(): finds the duplicates and returns a boolean value for every row.
  • subset=["StudentName"]: only the "StudentName" column is compared when checking for duplicates; the other columns are ignored.
  • keep=False: every occurrence of a duplicated name is marked True, including the first one.
  • .shape[0]: returns the number of rows in the filtered DataFrame, that is, the number of rows whose student name appears more than once (not the number of distinct duplicated names).
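
To see how keep=False changes the count, here is a small sketch that runs the exercise code on made-up data (the names and scores are hypothetical) and compares it with two related counts:

```python
import pandas as pd

# Hypothetical sample data: "Ann" appears 3 times, "Ben" twice, "Cho" once.
df = pd.DataFrame({
    "StudentName": ["Ann", "Ben", "Ann", "Cho", "Ben", "Ann"],
    "Score": [90, 85, 72, 70, 60, 88],
})

# keep=False flags every row whose name occurs more than once.
all_rows = df.duplicated(subset=["StudentName"], keep=False)
duplicate_count = df[all_rows].shape[0]
print(duplicate_count)  # 5 (3 Ann rows + 2 Ben rows)

# The default keep="first" leaves the first occurrence unflagged.
extra_rows = int(df.duplicated(subset=["StudentName"]).sum())
print(extra_rows)  # 3

# Number of distinct names that are duplicated.
distinct_names = df.loc[all_rows, "StudentName"].nunique()
print(distinct_names)  # 2
```

Which of the three numbers counts as "the number of duplicates" depends on the question being asked, so it helps to state which one is meant.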
