How to make a reproducible data sample in Power BI using Python?

2022-01-10 00:00:00 python random powerbi

Problem description

This is a self-answered post. Why? Because many Power BI questions go unanswered for lack of data samples. Also, many people seem to wonder how to edit data tables in Power BI using Python. And, of course, the world needs more widespread use of Python in Power BI. Some think that you have to apply a Python snippet to an existing table loaded from elsewhere. My answer to this post will show you how to build a (fairly big) data sample with a few lines of code in an otherwise empty Power BI file.

So, how can you build a data sample and make changes to it using Python in Power BI?

Solution

I'll show you how to build a dataset of 10000 rows that contains both categorical and numerical values. I'm using the Python libraries numpy and pandas for the data generation and table operations, respectively. The snippet below draws a random element from each of two lists 10000 times to build two columns with a few street and city names, and adds a list of random integers to the mix. Then I use pandas to organize the data in a dataframe. When you use Python in the Power Query Editor in Power BI, your input has to be a table, and your output has to be a pandas dataframe.

Python snippet:

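import numpy as np
import pandas as pd

# Fix the seed so every run produces the same sample
np.random.seed(123)
streets = ['Broadway', 'Bowery', 'Houston Street']
cities = ['New York', 'Chicago', 'Baltimore']

# Number of rows in the sample
rows = 10000

# Draw one random city, street, and number (0 to 99) per row
lst_cities = np.random.choice(cities, rows).tolist()
lst_streets = np.random.choice(streets, rows).tolist()
lst_numbers = np.random.randint(low=0, high=100, size=rows).tolist()

# Organize the columns in a dataframe
df_dataset = pd.DataFrame({'City': lst_cities,
                           'Street': lst_streets,
                           'ID': lst_numbers})

# One-row dataframe holding the shape (rows, columns) of df_dataset
df_metadata = pd.DataFrame([df_dataset.shape])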
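
As a quick check outside Power BI, the sketch below illustrates why seeding makes the sample reproducible: rebuilding the data with the same seed gives an identical dataframe. The helper build_sample and the second seed 456 are hypothetical names and values introduced only for this illustration; they are not part of the snippet above.

```python
import numpy as np
import pandas as pd


def build_sample(seed, rows=10000):
    """Hypothetical helper: rebuild the sample for a given seed."""
    np.random.seed(seed)  # reset the global NumPy random state
    cities = ['New York', 'Chicago', 'Baltimore']
    streets = ['Broadway', 'Bowery', 'Houston Street']
    return pd.DataFrame({
        'City': np.random.choice(cities, rows),
        'Street': np.random.choice(streets, rows),
        'ID': np.random.randint(low=0, high=100, size=rows),
    })


first = build_sample(123)
second = build_sample(123)   # same seed as the first call
other = build_sample(456)    # different seed (arbitrary value)

# Same seed: the two dataframes match element by element
print(first.equals(second))
# Different seed: the draws differ, so this comparison should fail
print(first.equals(other))
```

Anyone who pastes the seeded snippet into their own Power BI file should therefore end up with the same table, which is the point of a reproducible sample.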

Power BI:

In Power BI Desktop, click Enter Data to go to the Power Query Editor. In the dialog window that follows, do absolutely nothing but click OK. The result is an empty table and two steps under Applied steps.

Now, use Transform > Run Python Script, insert the snippet above, and click OK.

You now have a preliminary table with 2 columns and 3 rows. This is a pretty neat detail of how Python is implemented in Power BI: the three rows are three different datasets made available to you after running your snippet. dataset is constructed by default, but it is empty since we started out with an empty table. Had we started out with some other data, dataset would hold it, as the first line of the Run Python Script dialog explains: # 'dataset' holds the input data for this script. It is constructed in the form of a pandas dataframe. The last table, df_metadata, is only a brief description of the dataset we're really interested in, df_dataset; I've added it to the mix to illustrate that all dataframes you make in your snippet will be available to you. You choose which table to continue working on by clicking Table next to its name.
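
To make the three rows concrete, here is a rough approximation, runnable outside Power BI, of the dataframes a script like this leaves behind. The empty dataset is a stand-in for the input Power BI supplies, df_dataset is cut down to three rows, and the available dictionary and preview table are hypothetical names used only for this sketch; Power BI builds its own Name/Value listing.

```python
import pandas as pd

# Stand-in for the input table: we started from an empty table, so it is empty
dataset = pd.DataFrame()

# A tiny version of the sample, just to have something to describe
df_dataset = pd.DataFrame({'City': ['New York', 'Chicago', 'Baltimore'],
                           'Street': ['Broadway', 'Bowery', 'Houston Street'],
                           'ID': [1, 2, 3]})

# One-row dataframe holding the shape of df_dataset
df_metadata = pd.DataFrame([df_dataset.shape])

# Hypothetical listing: one row per dataframe, similar in spirit to the
# preliminary table with three rows described above
available = {'dataset': dataset,
             'df_dataset': df_dataset,
             'df_metadata': df_metadata}
preview = pd.DataFrame({'Name': list(available),
                        'Shape': [df.shape for df in available.values()]})
print(preview)
```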

And that's it! You now have a table of mixed datatypes to keep working on, using either Python or Power BI itself.

From here you can:

Keep working on your table using any menu option
Insert another Python script
Duplicate your original dataframe and keep working on another version: right-click Table under Queries and create a Reference.
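
As an illustration of the second option, a follow-up script could look something like the sketch below. Inside Power BI, dataset would already hold the table from the previous step, so the first block is only a stand-in that lets the example run on its own. The name df_counts and the column name Rows are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Stand-in for the table Power BI passes in as 'dataset'.
# Inside Power BI this block would be left out.
np.random.seed(123)
rows = 10000
dataset = pd.DataFrame({
    'City': np.random.choice(['New York', 'Chicago', 'Baltimore'], rows),
    'Street': np.random.choice(['Broadway', 'Bowery', 'Houston Street'], rows),
    'ID': np.random.randint(low=0, high=100, size=rows),
})

# The part that would go into the next Run Python Script step:
# count how many rows fall into each City/Street combination
df_counts = (dataset.groupby(['City', 'Street'])
             .size()
             .reset_index(name='Rows'))
print(df_counts)
```

Since df_counts is a pandas dataframe, it should show up as one of the tables to choose from after the script runs, alongside dataset.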
