df.unique() on a whole DataFrame based on a column
Question
I have a DataFrame df filled with rows and columns in which some Ids are duplicated:
Index Id Type
0 a1 A
1 a2 A
2 b1 B
3 b3 B
4 a1 A
...
When I use:
uniqueId = df["Id"].unique()
I get a list of unique IDs.
However, how can I apply this filtering to the whole DataFrame so that it keeps its structure but the duplicates (based on "Id") are removed?
Solution
It seems you need DataFrame.drop_duplicates with the parameter subset, which specifies the columns in which to test for duplicates:
# keep the first occurrence of each duplicated Id (the default)
df_first = df.drop_duplicates(subset=['Id'])
print(df_first)
       Id Type
Index
0      a1    A
1      a2    A
2      b1    B
3      b3    B
# keep the last occurrence of each duplicated Id
df_last = df.drop_duplicates(subset=['Id'], keep='last')
print(df_last)
       Id Type
Index
1      a2    A
2      b1    B
3      b3    B
4      a1    A
# remove every row whose Id occurs more than once
df_none = df.drop_duplicates(subset=['Id'], keep=False)
print(df_none)
       Id Type
Index
1      a2    A
2      b1    B
3      b3    B
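For anyone who wants to try the three variants side by side, here is a minimal self-contained sketch. It assumes the frame contains only the five rows shown in the question (the rows elided by "..." are not reconstructed), and the variable names `first`, `last`, `no_dups` and `mask_first` are made up for illustration. Each call starts from the original `df` rather than overwriting it, so the results do not depend on one another.

```python
import pandas as pd

# Assumption: only the five rows visible in the question; the "..." rows are left out.
df = pd.DataFrame(
    {"Id": ["a1", "a2", "b1", "b3", "a1"], "Type": ["A", "A", "B", "B", "A"]}
)
df.index.name = "Index"

# Each call works on the original df and returns a new DataFrame.
first = df.drop_duplicates(subset=["Id"])                # keep first occurrence (default)
last = df.drop_duplicates(subset=["Id"], keep="last")    # keep last occurrence
no_dups = df.drop_duplicates(subset=["Id"], keep=False)  # drop every duplicated Id

# The default behaviour written as a boolean mask with DataFrame.duplicated.
mask_first = df[~df.duplicated(subset=["Id"])]

print(first.index.tolist())    # [0, 1, 2, 3]
print(last.index.tolist())     # [1, 2, 3, 4]
print(no_dups.index.tolist())  # [1, 2, 3]
```

The mask form can be handy when you want to inspect which rows would be dropped (`df[df.duplicated(subset=["Id"])]`) before actually removing them.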