Intersection of sets as a column in pandas

2022-01-17 00:00:00 python pandas set intersection

Problem description

I have a df such as:

import pandas as pd
# DataFrame.from_items was removed in pandas 1.0, so build the same frame from a dict
df = pd.DataFrame({'i': [set([1, 2, 3, 4]) for _ in range(4)],
                   'j': [set([2, 3]), set([1]), set([4]), set([3, 4])]})

which looks like:

>>> df
              i       j
0  {1, 2, 3, 4}  {2, 3}
1  {1, 2, 3, 4}     {1}
2  {1, 2, 3, 4}     {4}
3  {1, 2, 3, 4}  {3, 4}

I would like to compute the row-wise intersection of columns i and j (conceptually, df.i.intersection(df.j)) and assign it to a new column k. That is, I want this:

df['k'] = [df.i.iloc[t].intersection(df.j.iloc[t]) for t in range(4)]
>>> df.k
0    {2, 3}
1       {1}
2       {4}
3    {3, 4}
Name: k, dtype: object

Is there a df.apply() way to do this? The actual df has millions of rows.

Solution

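Working with sets, lists and dicts in pandas is a bit problematic, because pandas works best with scalar values. A plain Python list comprehension over the two zipped columns is usually the simplest option: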

df['k'] = [x[0] & x[1] for x in zip(df['i'], df['j'])]
print(df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}
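If the same operation is needed for other column pairs, the comprehension can be wrapped in a small function. This is a minimal sketch, assuming every cell in both columns holds a Python set; the name `row_intersection` and its parameters are hypothetical, not part of pandas. Returning a `pd.Series` that carries `df.index` keeps the row labels attached, so the result can also be used on its own or joined back later.

```python
import pandas as pd

def row_intersection(df: pd.DataFrame, left: str, right: str) -> pd.Series:
    """Hypothetical helper: element-wise intersection of two set-valued columns."""
    # zip walks both columns in lockstep; `a & b` intersects the sets of one row
    values = [a & b for a, b in zip(df[left], df[right])]
    # reuse the original row labels so the result aligns with df on assignment
    return pd.Series(values, index=df.index)

# non-default index to show that the labels are preserved
df = pd.DataFrame({'i': [{1, 2, 3, 4} for _ in range(4)],
                   'j': [{2, 3}, {1}, {4}, {3, 4}]},
                  index=[10, 20, 30, 40])
df['k'] = row_intersection(df, 'i', 'j')
print(df)
```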

Or, equivalently, with the set.intersection method:

df['k'] = [x[0].intersection(x[1]) for x in zip(df['i'], df['j'])]
print(df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}
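The two spellings are not fully interchangeable: the `&` operator requires both operands to be sets (or frozensets), while `set.intersection` accepts any iterable as its argument. The sketch below assumes a hypothetical column `j_list` that stores lists instead of sets, which is a case where only the method form works.

```python
import pandas as pd

df = pd.DataFrame({'i': [{1, 2, 3, 4}, {1, 2, 3, 4}],
                   'j_list': [[2, 3], [1, 5]]})  # hypothetical list-valued column

# .intersection() accepts any iterable, so list cells work directly
df['k'] = [a.intersection(b) for a, b in zip(df['i'], df['j_list'])]

# `&` only works between set-like operands; a list on the right raises TypeError
try:
    _ = {1, 2, 3, 4} & [2, 3]
    operator_ok = True
except TypeError:
    operator_ok = False

print(df)
print("set & list allowed:", operator_ok)
```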

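Solution with apply (this answers the question directly, but apply with axis=1 is typically slower than the list comprehensions above on large frames):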

df['k'] = df.apply(lambda x: x['i'].intersection(x['j']), axis=1)
print(df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}
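Since the real frame has millions of rows, it may be worth timing the options on your own data. This is a rough benchmarking sketch: the row count `n`, the generated data, and the function names are arbitrary assumptions, and the timings depend on machine and pandas version, so no numbers are claimed here. It also includes `map(set.intersection, ...)`, which calls the unbound method on each pair; that variant assumes every cell of the first column is a plain `set` (a `frozenset` there would raise `TypeError`).

```python
import timeit
import pandas as pd

n = 20_000  # arbitrary size; increase it to approach the real workload
df = pd.DataFrame({'i': [{1, 2, 3, 4} for _ in range(n)],
                   'j': [{t % 5, (t + 1) % 5} for t in range(n)]})

def with_comprehension(df):
    # plain Python loop over the two columns, no per-row pandas objects
    return [a & b for a, b in zip(df['i'], df['j'])]

def with_map(df):
    # set.intersection used as a two-argument function on each (i, j) pair
    return list(map(set.intersection, df['i'], df['j']))

def with_apply(df):
    # builds a Series for every row before calling the lambda
    return df.apply(lambda r: r['i'].intersection(r['j']), axis=1).tolist()

for fn in (with_comprehension, with_map, with_apply):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

All three functions return the same list of sets; only their speed differs.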
