Pandas boolean comparison on a dataframe
Problem description
I get an error when I make a comparison on a single element in a dataframe, but I don't understand why.
I have a dataframe df with time series data for a number of customers, with some null values in it:
df.head()
                     8143511  8145987  8145997  8146001  8146235  8147611
2012-07-01 00:00:00      NaN      NaN      NaN      NaN      NaN      NaN
2012-07-01 00:30:00    0.089      NaN    0.281    0.126    0.190    0.500
2012-07-01 01:00:00    0.090      NaN    0.323    0.141    0.135    0.453
2012-07-01 01:30:00    0.061      NaN    0.278    0.097    0.093    0.424
2012-07-01 02:00:00    0.052      NaN    0.278    0.158    0.170    0.462
In my script, the line
if pd.isnull(df[[customer_ID]].loc[ts]):
generates an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
However, if I put a breakpoint on that line and, when the script stops, type this into the console:
pd.isnull(df[[customer_ID]].loc[ts])
the output is:
8143511 True
Name: 2012-07-01 00:00:00, dtype: bool
If I allow the script to continue from that point, the error is generated immediately.
If the boolean expression can be evaluated and has the value True, why does it generate an error in the if expression? This makes no sense to me.
Solution
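The second set of [] was returning a Series, which I mistook for a single value. The console output above is in fact a one-element boolean Series, not a bare True, and pandas refuses to convert a Series to a single truth value in an if statement. The simplest solution is to remove the extra []: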
if pd.isnull(df[customer_ID].loc[ts]):
    pass
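To make the difference concrete, here is a minimal sketch that reproduces both behaviours. The two-row frame and the values chosen for customer_ID and ts are made up for illustration; they only imitate the shape of the data in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame from the question (illustrative values only).
index = pd.to_datetime(["2012-07-01 00:00:00", "2012-07-01 00:30:00"])
df = pd.DataFrame(
    {8143511: [np.nan, 0.089], 8145987: [np.nan, np.nan]},
    index=index,
)

customer_ID = 8143511
ts = index[0]

# Double brackets select a one-column DataFrame; .loc[ts] then returns a Series.
as_series = df[[customer_ID]].loc[ts]
# Single brackets select a Series; .loc[ts] then returns a scalar.
as_scalar = df[customer_ID].loc[ts]

series_result = pd.isnull(as_series)  # boolean Series with one element
scalar_result = pd.isnull(as_scalar)  # single boolean value

# Using the Series in an `if` raises ValueError, even with only one element.
try:
    if series_result:
        pass
    raised = False
except ValueError:
    raised = True

# The scalar works directly in an `if`; df.at is another way to fetch one cell.
if scalar_result:
    print("value is null")
same_cell_is_null = pd.isnull(df.at[ts, customer_ID])
```

If the one-column DataFrame form has to stay for some reason, reducing the Series explicitly, for example with series_result.all() or series_result.item(), should also give a single boolean that an if statement accepts.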