混合效应Logistic回归

2022-04-19 00:00:00 python logistic-regression mixed-models

问题描述

我正在尝试在Python语言中实现混合效果逻辑回归。作为比较，我在R中使用lme4包中的glmer函数。

我发现statsmodels模块有一个BinomialBayesMixedGLM应该能够适应这样的模型。但是，我遇到了一些问题：

我发现statsmodels函数的文档不完全有帮助或不清楚，因此我不完全确定如何正确使用该函数。
到目前为止，我的尝试还没有产生与我在R中使用glmer拟合模型时得到的结果相同的结果。
我希望BinomialBayesMixedGLM函数不计算p值，因为它是贝叶斯函数，但我似乎想不出如何访问参数的完整后验分布。

作为测试用例，我使用可用的泰坦尼克型数据集here。

import os
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))

r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()
fit.summary()

#           Type    Post. Mean  Post. SD       SD SD (LB) SD (UB)
# Intercept M           3.1623    0.3616            
# Age       M          -0.0380    0.0061            
# Pclass    V           0.0754    0.5669    1.078   0.347   3.351

但是，除了年龄的斜率之外，这似乎与我在R中得到的glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial")：

不匹配

Random effects:
 Groups Name        Variance Std.Dev.
 Pclass (Intercept) 0.8563   0.9254  
Number of obs: 887, groups:  Pclass, 3

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.961780   0.573402   1.677   0.0935 .  
Age         -0.038708   0.006243  -6.200 5.65e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

那么，我在使用python创建模型时犯了什么错误呢？而且，一旦解决了这个问题，我如何提取后验或p值呢？最后，他们是否有更类似于R中的实现的混合效果逻辑回归的Python实现？

解决方案

只需对Python执行类似的操作，如评论Pymer4所示似乎提供了一种合适的方法(尤其是如果您熟悉R的话)。使用问题：

中提到的示例数据集‘tianic’

from pymer4.models import Lmer

model = Lmer("Survived  ~ Age  + (1|Pclass)",
             data=titanic, family = 'binomial')

print(model.fit())

输出：

Formula: Survived~Age+(1|Pclass)

Family: binomial     Inference: parametric

Number of observations: 887  Groups: {'Pclass': 3.0}

Log-likelihood: -525.812     AIC: 1057.624

Random effects:

               Name    Var    Std
Pclass  (Intercept)  0.856  0.925

No random effect correlations specified

Fixed effects:

             Estimate  2.5_ci  97.5_ci     SE     OR  OR_2.5_ci  OR_97.5_ci  
(Intercept)     0.962  -0.162    2.086  0.573  2.616       0.85       8.050   
Age            -0.039  -0.051   -0.026  0.006  0.962       0.95       0.974   

              Prob  Prob_2.5_ci  Prob_97.5_ci  Z-stat  P-val  Sig  
(Intercept)  0.723        0.460         0.889   1.677  0.093    .  
Age          0.490        0.487         0.493  -6.200  0.000  ***

作为附加注释(抱歉偏离了主要问题)，我使用Python 3.8.8在Ubuntu 20.04机器上运行了这段代码。不确定是否有其他人遇到此问题，但当使用Pymer4运行上面的模型时，包抛出一个错误(当我尝试从Pymer4文档here中复制类似的模型时出现相同的错误)：

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

引用/pymer4/Models/Lmer.py程序包文件中的此行：

--> 444         if design_matrix:

我通过将其更改为(不确定这是最优雅还是最安全的方法，很高兴在这里更正)解决了这个问题：

if design_matrix.any():

这似乎使程序包运行，并在我测试的少数情况下提供R等价结果。

以下评论中建议的更好方法可能是：

if design_matrix is not None:

感谢Giacomo Petrillo指出这一点，但我没有测试过这一点。

相关文章