Evaluating multiple conditions in an if-then-else block in a Pandas DataFrame
Problem description
I want to create a new column in a Pandas DataFrame by evaluating multiple conditions in an if-then-else block.
if events.hour <= 6:
    events['time_slice'] = 'night'
elif events.hour <= 12:
    events['time_slice'] = 'morning'
elif events.hour <= 18:
    events['time_slice'] = 'afternoon'
elif events.hour <= 23:
    events['time_slice'] = 'evening'
When I run this, I get the error below:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I tried to solve this by adding .any() to each condition, as shown below:
if (events.hour <= 6).any():
    events['time_slice'] = 'night'
elif (events.hour <= 12).any():
    events['time_slice'] = 'morning'
elif (events.hour <= 18).any():
    events['time_slice'] = 'afternoon'
elif (events.hour <= 23).any():
    events['time_slice'] = 'evening'
Now I do not get any error, but when I check the unique values of time_slice, it only shows 'night':
np.unique(events.time_slice)
array(['night'], dtype=object)
How can I solve this? My data contains samples that should get 'morning', 'afternoon' or 'evening'. Thanks!
Solution
You can use the pd.cut() method to categorize your data:
Demo:
In [66]: events = pd.DataFrame(np.random.randint(0, 23, 10), columns=['hour'])
In [67]: events
Out[67]:
   hour
0     5
1    17
2    12
3     2
4    20
5    22
6    20
7    11
8    14
9     8
In [71]: events['time_slice'] = pd.cut(events.hour, bins=[-1, 6, 12, 18, 23], labels=['night','morning','afternoon','evening'])
In [72]: events
Out[72]:
   hour time_slice
0     5      night
1    17  afternoon
2    12    morning
3     2      night
4    20    evening
5    22    evening
6    20    evening
7    11    morning
8    14  afternoon
9     8    morning
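For reference, here is a self-contained sketch of the same idea as a plain script. The sample hours are made up for illustration (they are not the asker's data), and the np.select variant and the time_slice_select column name are additions of mine, not part of the answer above. It also shows why the .any() attempt only ever produced 'night': .any() collapses the whole column to a single True/False, so the first branch runs once and assigns one value to every row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample hours, chosen to hit every bin edge.
events = pd.DataFrame({"hour": [0, 6, 7, 12, 13, 18, 19, 23]})

# A comparison on a Series is element-wise; .any() reduces it to ONE bool.
# With at least one hour <= 6, the first `if` branch is taken for the whole frame.
first_branch = bool((events["hour"] <= 6).any())

labels = ["night", "morning", "afternoon", "evening"]

# pd.cut uses right-closed intervals by default:
# (-1, 6], (6, 12], (12, 18], (18, 23]
events["time_slice"] = pd.cut(
    events["hour"], bins=[-1, 6, 12, 18, 23], labels=labels
)

# Alternative with np.select: conditions are checked in order and the
# first match wins, which mirrors the intended if/elif chain row by row.
conditions = [
    events["hour"] <= 6,
    events["hour"] <= 12,
    events["hour"] <= 18,
    events["hour"] <= 23,
]
events["time_slice_select"] = np.select(conditions, labels, default="unknown")

print(events)
```

pd.cut returns a categorical column, which is convenient for grouping and keeps the labels ordered; np.select returns plain strings and may be easier to extend if the conditions are not simple numeric ranges.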