Power BI 中 Python 可视化中时间序列的最佳数据格式是什么?

2022-01-10 00:00:00 python powerbi

问题描述

截至 2018 年 8 月 9 日,Power BI 支持 Python 可视化.他们以前支持 R 可视化,但我仍然觉得这些集成有点尴尬.让我告诉你我的意思:

<小时>

假设您有一个包含时间序列数据的表,其中第一行包含名称日期"和值",内容分别是格式为 yyyy-mm-dd 的日期和一个数字:

日期、数值2017-01-12,12017-01-13,42017-01-14,22017-01-15,42017-01-16,22017-01-17,22017-01-18,22017-01-19,52017-01-20,52017-01-21,52017-01-22,52017-01-23,62017-01-24,32017-01-25,62017-01-26,62017-01-27,52017-01-28,82017-01-29,42017-01-30,2

如果您将该数据集存储为像 timerseries.csv 这样的文本文件并使用 Get Data | 导入它.文本/CSV,你会得到一个比 VISUALIZATIONS |字段,像这样:

您可以使用 VISUALIZATIONS | 检查您的表表 并获取:

有了这个设置,你应该认为你已经准备好使用这个漂亮的新功能来释放 Py VISUALIZATION 的力量了:

如果你点击它,你会得到这个:

你被告知

<块引用>

将字段拖入可视化"窗格中的值"区域以开始脚本

如果你从 Value 开始,你会在编辑器中得到这个默认设置:

如果您按照 Power BI 团队在

但这就是我目前的结局.

如果编辑器中的默认数据框共享标准数据框的功能,您应该能够引用该数据框中的列并使用此代码段轻松绘制图表:

import matplotlib.pyplot as pltplt.plot(数据集['值'])plt.show()

但是当你运行它时,它只会返回一个错误:

至少可以说细节很详细.

我也尝试过同时导入 DatesValues,并尝试使用 dataset.plot(),但似乎没有任何效果.我还尝试通过这种方式将日期层次结构分解为简单的日期:

那么,对数据格式、导入方法和/或代码片段有什么想法吗?

感谢您的任何建议!

编辑 1 - 按照 Foxan Ng 的回答:

在值"字段中添加两列:

这仍然返回一个错误

<块引用>

TypeError: from_bounds() 接受 4 个位置参数,但给出了 6 个

解决方案

我没有遇到你提到的错误.您是否将两列都放入 Values?

import matplotlib.pyplot as pltplt.plot(数据集['日期'],数据集['值'])plt.show()

<小时>

使用 M 查询更新:

让Source = Csv.Document(File.Contents("C:your-directory..	imerseries.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),#"PromoteHeaders" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"Value", Int64.Type}})在#改变类型"

As of today, August 9 2018, Power BI supports Python Visualizations. They've had support for R Visualizations before, but I still find these integrations to be a bit awkward. Let me show you what I mean:


Let's say that you have a table with time series data, where the top row containts the names 'Date' and 'Value', and the contents are dates of the form yyyy-mm-dd and a number, respectively:

Date,Value
2017-01-12,1
2017-01-13,4
2017-01-14,2
2017-01-15,4
2017-01-16,2
2017-01-17,2
2017-01-18,2
2017-01-19,5
2017-01-20,5
2017-01-21,5
2017-01-22,5
2017-01-23,6
2017-01-24,3
2017-01-25,6
2017-01-26,6
2017-01-27,5
2017-01-28,8
2017-01-29,4
2017-01-30,2

If you store that dataset as a textfile like timerseries.csv and import it using Get Data | Text/CSV, you get a table uner VISUALIZATIONS | FIELDS, like this:

You can inspect your table using VISUALIZATIONS | Table and get:

With this setup, one should think that you were all set for unleashing the power of a Py VISUALIZATION using this beautiful new feature:

If you click that, you get this:

And you're told to

Drag fields into the Values area in the Visualization pane to start scripting

If you start with Value, you get this default setup in the editor:

And if you follow the instructions given by the Power BI team in the August 2018 feature summary you should be able to make a matplotlib plot quite easily.

But this is where it ends for me at the time being.

If the default dataframe in the editor shares the features of a standard dataframe, you should be able to reference a column in that dataframe and easily make a plot with this snippet:

import matplotlib.pyplot as plt
plt.plot(dataset['Value'])
plt.show()

But when you run it, it onlu returns an error:

And the details are elaborate to say the least.

I've also tried to import both Dates and Values, and I've tried plotting the dataframe directly with dataset.plot(), but nothing seems to be working. I've also tried stripping the date hierarchy down to simple dates this way:

So, any ideas on the dataformat, import method and/or the snippet?

Thank you for any suggestions!

EDIT 1 - Following the answer from Foxan Ng:

Add both columns in the Value field:

This still returns an error edning with:

TypeError: from_bounds() takes 4 positional arguments but 6 were given

解决方案

I didn't encounter errors that you've mentioned. Have you dropped in both columns into Values?

import matplotlib.pyplot as plt
plt.plot(dataset['Date'], dataset['Value'])
plt.show()


UPDATED with M query:

let
    Source = Csv.Document(File.Contents("C:your-directory..	imerseries.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"Value", Int64.Type}})
in
    #"Changed Type"

相关文章