ValueError: endog must be in the unit interval

2022-02-26 00:00:00 python statsmodels regression

Problem description

While using statsmodels, I got this strange error: ValueError: endog must be in the unit interval. Can someone tell me more about what this error means? Google was no help.

The code that produces the error:

"""
Multiple regression with dummy variables. 
"""

import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np

df = pd.read_csv('cost_data.csv')
df.columns = ['Cost', 'R(t)', 'Day of Week']
dummy_ranks = pd.get_dummies(df['Day of Week'], prefix='days')
cols_to_keep = ['Cost', 'R(t)']
data = df[cols_to_keep].join(dummy_ranks.loc[:, 'days_2':])
data['intercept'] = 1.0

print(data)

train_cols = data.columns[1:]
logit = sm.Logit(data['Cost'], data[train_cols])

result = logit.fit()

print(result.summary())

And the traceback:

Traceback (most recent call last):
  File "multiple_regression_dummy.py", line 20, in <module>
    logit = sm.Logit(data['Cost'], data[train_cols])
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/statsmodels/discrete/discrete_model.py", line 404, in __init__
    raise ValueError("endog must be in the unit interval.")
ValueError: endog must be in the unit interval.

Solution

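I got this error when my target column contained values greater than 1. Logistic regression requires the target (endog) to lie between 0 and 1, so make sure your target column is in that range and try again. For example, if the target column takes the values 1 to 5, treat 4 and 5 as the positive class and 1, 2, and 3 as the negative class. Hope this helps.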
