Evaluating multiple conditions in an if-then-else block in a Pandas DataFrame

2022-01-21 00:00:00 python pandas dataframe conditional

Problem description

I want to create a new column in a Pandas DataFrame by evaluating multiple conditions in an if-then-else block.

if events.hour <= 6:
    events['time_slice'] = 'night'
elif events.hour <= 12:
    events['time_slice'] = 'morning'
elif events.hour <= 18:
    events['time_slice'] = 'afternoon'
elif events.hour <= 23:
    events['time_slice'] = 'evening'

When I run this, I get the error below:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So I tried to solve this by adding .any() to each condition, as shown below:

if (events.hour <= 6).any():
    events['time_slice'] = 'night'
elif (events.hour <= 12).any():
    events['time_slice'] = 'morning'
elif (events.hour <= 18).any():
    events['time_slice'] = 'afternoon'
elif (events.hour <= 23).any():
    events['time_slice'] = 'evening'

Now I do not get any error, but when I check the unique values of time_slice, it only shows 'night':

np.unique(events.time_slice)

array(['night'], dtype=object)

How can I solve this? My data contains samples that should get 'morning', 'afternoon' or 'evening'. Thanks!

Solution

You can use the pd.cut() method to categorize your data:

Demo:

In [66]: events = pd.DataFrame(np.random.randint(0, 23, 10), columns=['hour'])

In [67]: events
Out[67]:
   hour
0     5
1    17
2    12
3     2
4    20
5    22
6    20
7    11
8    14
9     8

In [71]: events['time_slice'] = pd.cut(events.hour, bins=[-1, 6, 12, 18, 23], labels=['night','morning','afternoon','evening'])

In [72]: events
Out[72]:
   hour time_slice
0     5      night
1    17  afternoon
2    12    morning
3     2      night
4    20    evening
5    22    evening
6    20    evening
7    11    morning
8    14  afternoon
9     8    morning
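
For comparison, here is a minimal sketch of why the .any() attempt produced only 'night', along with a row-wise equivalent of the original if/elif chain. It uses np.select, which is not part of the answer above but is a swapped-in alternative. The hour values are copied from the demo; the names mask, first_branch_taken, conditions and choices, and the 'unknown' default label, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same sample hours as in the demo above
events = pd.DataFrame({"hour": [5, 17, 12, 2, 20, 22, 20, 11, 14, 8]})

# The comparison yields one boolean per row, so a plain `if` cannot pick a branch.
mask = events["hour"] <= 6

# .any() reduces the mask to a single bool. It is True here (some hours are <= 6),
# so the first branch ran and assigned 'night' to the whole column.
first_branch_taken = bool(mask.any())

# np.select evaluates the conditions in order for every row, like an if/elif chain:
# each row gets the choice of the first condition that is True for it.
conditions = [
    events["hour"] <= 6,
    events["hour"] <= 12,
    events["hour"] <= 18,
    events["hour"] <= 23,
]
choices = ["night", "morning", "afternoon", "evening"]

# 'unknown' is a hypothetical fallback label for hours outside 0..23
events["time_slice"] = np.select(conditions, choices, default="unknown")
print(events)
```

On this sample both approaches should give the same labels. pd.cut is more compact when the conditions are contiguous numeric ranges; its bins are right-closed by default, which is why the first edge is -1 so that hour 0 falls into 'night'. np.select is the more general tool when the conditions are arbitrary boolean expressions.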
