Pandas drop_duplicates method not working on a DataFrame that contains lists
Problem description
I am trying to use the drop_duplicates method on my DataFrame, but I am getting an error. See the following:
error: TypeError: unhashable type: 'list'
The code I am using:
df = db.drop_duplicates()
My DB is huge and contains strings, floats, dates, NaNs, booleans, integers... Any help is appreciated.
Solution
drop_duplicates won't work with lists in your DataFrame, as the error message implies: lists are unhashable. However, you can drop duplicates on a copy of the DataFrame cast to str, and then use the index of the result to extract the rows from the original df.
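Packaged as a reusable function, the approach might look like the following minimal sketch. The name drop_duplicates_unhashable and its keep parameter (passed straight through to DataFrame.drop_duplicates) are hypothetical additions, not part of pandas. It assumes the index labels are unique, since df.loc selects by label, and it treats two cells as equal whenever their str() forms match, so a list [1, 2] and the string "[1, 2]" would count as duplicates.

```python
import pandas as pd


def drop_duplicates_unhashable(df: pd.DataFrame, keep="first") -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows even when some cells hold lists.

    Duplicates are detected on a string copy of the frame; the surviving
    index labels are then used to select rows from the original frame,
    so list cells keep their original type.
    """
    # Cast every cell to its string representation, which is hashable.
    kept_labels = df.astype(str).drop_duplicates(keep=keep).index
    # Select by label from the original frame (assumes unique index labels).
    return df.loc[kept_labels]


df = pd.DataFrame({"A": [[1, 2], [1, 2], [3]], "B": ["x", "x", "y"]})
result = drop_duplicates_unhashable(df)
print(result)
```

Row 1 is dropped as a copy of row 0, and result.loc[0, "A"] is still the list [1, 2], not its string form.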
Setup
import pandas as pd

df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
                   'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
                   'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})
# Calling drop_duplicates directly raises the same error
df.drop_duplicates()
Traceback (most recent call last):
...
TypeError: unhashable type: 'list'
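This is the same TypeError plain Python raises when asked to hash a list. In a large frame with many column types, a small check can show which columns hold such cells. The helpers is_hashable and unhashable_columns below are hypothetical, not pandas functions; the sketch visits every cell, so it costs one full pass over the data.

```python
import pandas as pd


def is_hashable(value) -> bool:
    """Return True if Python can hash value (lists, dicts and sets cannot be hashed)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def unhashable_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: names of the columns holding at least one unhashable cell."""
    return [name for name, col in df.items() if not col.map(is_hashable).all()]


df = pd.DataFrame({"Keyword": ["apply", "apply", "apply", "terms", "terms"],
                   "X": [[1, 2], [1, 2], "xy", "xx", "yy"],
                   "Y": ["yy", "yy", "yx", "ix", "xi"]})
print(unhashable_columns(df))  # ['X']
```

NaN, strings, numbers, booleans and timestamps are all hashable, so in a frame like the one described in the question only the list-bearing columns should be reported.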
Solution
# Convert the df to str type, drop duplicates, and then select the surviving rows from the original df.
df.loc[df.astype(str).drop_duplicates().index]
Out[205]:
  Keyword       X   Y
0   apply  [1, 2]  yy
2   apply      xy  yx
3   terms      xx  ix
4   terms      yy  xi
# The list elements are still lists in the final result.
df.loc[df.astype(str).drop_duplicates().index].loc[0,'X']
Out[207]: [1, 2]
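A different technique, not the one used in this answer, is to replace each list cell with a tuple, which is hashable, and run drop_duplicates on that copy. Unlike the str cast, this keeps element types, so the list [1, 2] and the string "[1, 2]" stay distinct. The helper name lists_to_tuples is hypothetical, and the sketch only converts top-level lists: a list nested inside a list, or a dict or set cell, is still unhashable and would raise the same TypeError.

```python
import pandas as pd


def lists_to_tuples(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy of df with every top-level list cell turned into a tuple."""
    # Series.map applies the lambda cell by cell; non-list values pass through unchanged.
    return df.apply(lambda col: col.map(lambda v: tuple(v) if isinstance(v, list) else v))


df = pd.DataFrame({"Keyword": ["apply", "apply", "apply", "terms", "terms"],
                   "X": [[1, 2], [1, 2], "xy", "xx", "yy"],
                   "Y": ["yy", "yy", "yx", "ix", "xi"]})

# Tuples are hashable, so drop_duplicates works on the converted copy...
kept = lists_to_tuples(df).drop_duplicates().index
# ...and selecting by label from df keeps the original list objects.
result = df.loc[kept]
print(result.loc[0, "X"])  # [1, 2]
```

As with the str approach, selecting with df.loc assumes the index labels are unique.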
Edit: replaced iloc with loc. In this particular case both work, because the index labels match the row positions, but that does not hold in general.
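The difference between loc and iloc shows up as soon as the integer index labels stop matching the row positions. In this sketch the data and the reversed index are made up for illustration: drop_duplicates returns index labels, loc reads them as labels, and iloc reads the same numbers as positions, silently selecting different rows.

```python
import pandas as pd

# Hypothetical data: the index runs 3, 2, 1, 0, so labels differ from positions.
df = pd.DataFrame({"X": [[1, 2], [1, 2], "xy", "xx"],
                   "Y": ["yy", "yy", "yx", "ix"]},
                  index=[3, 2, 1, 0])

# Label 2 duplicates label 3, so the surviving labels are 3, 1 and 0.
labels = df.astype(str).drop_duplicates().index

by_label = df.loc[labels]      # rows labelled 3, 1, 0: the intended result
by_position = df.iloc[labels]  # rows at positions 3, 1, 0, i.e. labels 0, 2, 3

print(by_label["Y"].tolist())     # ['yy', 'yx', 'ix']
print(by_position["Y"].tolist())  # ['ix', 'yy', 'yy']
```

With iloc the duplicate row (label 2) comes back and the "yx" row is lost, without any error being raised.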