Convert a dataframe to a dictionary of lists of tuples
Problem description
I have a dataframe that looks like the following:
                                       user                             item  rating
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson       1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia       2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West       1
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson       1
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters       1
and I want to build the following structure:
dict-> list of tuples
user-> (item, rating)
b80344d063b5ccb3212f76538f3d9e43d87dca9e -> list((The Cove - Jack Johnson, 1), ... , )
I am able to do:
item_set = dict((user, set(items)) for user, items in
data.groupby('user')['item'])
But that only gets me halfway. How do I get the corresponding "rating" value from the groupby?
Solution
Set user as the index, convert each row to a tuple using df.apply, group by the index using df.groupby(level=0), collect each group's tuples into a list using dfGroupBy.agg, and convert the result to a dictionary using to_dict:
In [1417]: df
Out[1417]:
                                       user                             item  rating
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson       1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia       2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West       2
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson       2
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters       2
In [1418]: df.set_index('user').apply(tuple, axis=1).groupby(level=0).agg(lambda x: list(x.values)).to_dict()
Out[1418]:
{'b80344d063b5ccb3212f76538f3d9e43d87dca9e': [('The Cove - Jack Johnson', 1),
('Entre Dos Aguas - Paco De Lucia', 2),
('Stronger - Kanye West', 2),
('Constellations - Jack Johnson', 2),
('Learn To Fly - Foo Fighters', 2)]}
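For readers who want to try this outside an IPython session, here is a minimal self-contained sketch. It rebuilds the sample frame from the output above and uses a different route than the answer: a dict comprehension over df.groupby('user') that zips the item and rating columns. The variable names user and result are only illustrative, and this is one possible alternative rather than the answer's own method.

```python
import pandas as pd

# Rebuild the sample frame shown in Out[1417] (values copied from the answer).
user = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
df = pd.DataFrame({
    "user": [user] * 5,
    "item": [
        "The Cove - Jack Johnson",
        "Entre Dos Aguas - Paco De Lucia",
        "Stronger - Kanye West",
        "Constellations - Jack Johnson",
        "Learn To Fly - Foo Fighters",
    ],
    "rating": [1, 2, 2, 2, 2],
})

# For each user group, pair every item with its rating in row order.
result = {
    u: list(zip(group["item"], group["rating"]))
    for u, group in df.groupby("user")
}

print(result[user][0])
```

Because the pairs are built with zip, each user's list keeps the original row order, and further columns could be added by passing more of them to zip.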