Pareto distribution: different results in R and Python

2022-03-02 00:00:00 python scipy r data-science

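Question

I am trying to replicate the results of R's fitdist() in Python using scipy.stats (the R code is the reference and cannot be modified). The results are completely different. Does anyone know why? How can I replicate R's results in Python?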

data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]

R code:

library(fitdistrplus)
library(actuar)

fitdist(data, 'pareto', "mle")$estimate

R result:

       shape        scale 
    0.760164 10066.274196

Python code:

st.pareto.fit(data, floc=0, scale=1)

Python result:

(0.4019785013487883, 0, 1399.0339889072732)

Solution

The main reason for the difference is that the two fits use different pdfs.

Python

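In Python, st.pareto.fit() uses the Pareto distribution defined by this pdf:

f(x; b) = b / x^(b + 1), for x >= 1 and b > 0

Here loc and scale shift and rescale x, so with loc = 0 the support starts at x = scale.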

import scipy.stats as st
data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]
print(st.pareto.fit(data, floc = 0, scale = 1))

# (0.4019785013487883, 0, 1399.0339889072732)
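For this pdf with loc fixed at 0, the maximum-likelihood estimates have a well-known closed form: the scale is the smallest observation and the shape is n / sum(log(x_i / scale)). The sketch below is a hand check of the numbers above, not a description of what scipy does internally; the names scale_hat and shape_hat are introduced here for illustration.

```python
import math

data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]

# With loc = 0 the likelihood grows with scale until scale reaches
# the smallest observation, so the MLE of scale is min(data).
scale_hat = min(data)

# Given that scale, the MLE of the shape is n / sum(log(x / scale)).
shape_hat = len(data) / sum(math.log(x / scale_hat) for x in data)

print(shape_hat, scale_hat)
```

The printed values should agree with the scipy result to about four decimal places, which suggests that scipy's answer is simply the Pareto type I fit and not an optimizer failure.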

R

Your R code, by contrast, uses a Pareto with this pdf:

f(x) = shape * scale^shape / (x + scale)^(shape + 1), for x > 0

library(fitdistrplus)
library(actuar)
data <- c(2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8)
fitdist(data, 'pareto', "mle")$estimate

#    shape        scale 
#    0.760164 10066.274196

Making R mirror Python

To make R fit the same distribution as st.pareto.fit(), use actuar::dpareto1(), that is, the 'pareto1' distribution:

library(fitdistrplus)
library(actuar)
data <- c(2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8)
fitdist(data, 'pareto1', "mle")$estimate

#     shape          min 
#   0.4028921 1399.0284977

Making Python mirror R

Here is one way to approximate your R code in Python:

import numpy as np
from scipy.optimize import minimize

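def dpareto(x, shape, scale):
    # pdf of the distribution fitted by the R code (support x > 0)
    return shape * scale**shape / (x + scale)**(shape + 1)

def negloglik(x):
    # x[0] is the shape, x[1] is the scale; returns the negative log-likelihood
    data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]
    return -np.sum([np.log(dpareto(i, x[0], x[1])) for i in data])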

res = minimize(negloglik, (1, 1), method='Nelder-Mead', tol=2.220446e-16)
print(res.x)

# [7.60082820e-01 1.00691719e+04]
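As a possible shortcut, the density in dpareto() above is the Pareto type II (Lomax) density, which scipy exposes as scipy.stats.lomax. The sketch below first checks that the two densities agree at the R estimates, then tries st.lomax.fit() with the location fixed at 0. Treat the fit as an assumption to verify on your side: it should land near R's estimates, but scipy's optimizer settings differ from R's, so the digits are not expected to match exactly.

```python
import numpy as np
import scipy.stats as st

def dpareto(x, shape, scale):
    # Same density as in the answer above (Pareto type II / Lomax)
    return shape * scale**shape / (x + scale)**(shape + 1)

data = np.array([2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8])
shape, scale = 0.760164, 10066.274196  # estimates reported by R

# st.lomax with loc=0 and the same scale evaluates the same pdf
same_pdf = np.allclose(st.lomax.pdf(data, shape, loc=0, scale=scale),
                       dpareto(data, shape, scale))
print(same_pdf)

# Fix the location at 0 so that only shape and scale are estimated
fitted_shape, fitted_loc, fitted_scale = st.lomax.fit(data, floc=0)
print(fitted_shape, fitted_scale)
```

If the fitted values differ noticeably from R's, the hand-written minimize() approach above remains the more transparent option, since its starting point and tolerance are explicit.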
