What's the difference between pd.merge and df.merge?
In my volunteer role providing online technical support for www.dataquest.io, I come across numerous questions that let me dig deeper into interesting topics I would usually skim past.
Today, the question is: What's the difference between left_df.merge(right_df) and pd.merge(left_df, right_df)?
The short answer is that left_df.merge() calls pd.merge().
The former is used because it allows method chaining, analogous to the %>% pipe operator in R, which lets you write and read data processing code from left to right, such as left_df.merge(right_df).merge(right_df2). With pd.merge() you get a wrapping style rather than a chaining style, which ends up as an ugly pd.merge(pd.merge(left_df, right_df), right_df2), if you see where this is going.
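To make the two styles concrete, here is a minimal sketch. The three frames and their "key" column are made up for illustration, and I pass on="key" explicitly to keep the join unambiguous.

```python
import pandas as pd

# Hypothetical toy frames that share a "key" column.
left_df = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right_df = pd.DataFrame({"key": [1, 2, 3], "b": [10, 20, 30]})
right_df2 = pd.DataFrame({"key": [1, 2, 3], "c": [0.1, 0.2, 0.3]})

# Chaining style: reads left to right.
chained = left_df.merge(right_df, on="key").merge(right_df2, on="key")

# Wrapping style: the first merge is nested inside the second.
wrapped = pd.merge(pd.merge(left_df, right_df, on="key"), right_df2, on="key")

# Both styles should produce the same DataFrame.
print(chained.equals(wrapped))
```

The results are the same either way; the only thing that changes is how the code reads.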
Now let’s go down the rabbit hole to see what’s going on.
First, when you see pd.merge, it actually means pandas.merge, which means you have done import pandas as pd. When you import something, the __init__.py file of that package (pandas in this question) is run.
The main purpose of all these __init__.py files is to organize the API and to let you type shorter import code by importing the middle packages for you. That way you can write pandas.merge() once you import pandas, rather than needing from pandas.core.reshape.merge import merge before you can use the merge() function.
Now let's see what I mean by "import the middle packages for you". If you open https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/__init__.py#L129-L143, you will see that it imports many things. One of them is a from pandas.core.reshape.api import block (Figure 1), and merge is among the names imported there.
Figure 1
This is what allows you to call pd.merge directly, but let's get to the bottom of this. Going into pandas.core.reshape.api at https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/core/reshape/api.py, you see from pandas.core.reshape.merge import merge (Figure 2).
Figure 2
Now you see where the merge imported from pandas.core.reshape.api in the previous step came from.
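You can check this chain of re-exports from a Python session. The sketch below assumes your installed pandas still uses the same internal module layout as v0.25.1 (pandas.core is internal, so the paths could move in other releases).

```python
import pandas as pd

# Internal import paths taken from the v0.25.1 source discussed above.
from pandas.core.reshape.api import merge as api_merge
from pandas.core.reshape.merge import merge as deep_merge

# If each __init__.py / api.py only re-exports the name, all three
# names are bound to the very same function object.
same_function = pd.merge is api_merge is deep_merge
print(same_function)
```

The is operator compares object identity, so this is checking that the three names refer to one function, not just three functions that behave alike.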
Finally, let's get to the source. Going into pandas.core.reshape.merge at https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/core/reshape/merge.py#L53, you see def merge (Figure 4).
Figure 4
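If you would rather not click through GitHub, the standard library's inspect module can tell you which file a function was defined in. This is only a sketch: the exact paths depend on where pandas is installed, and I use inspect.unwrap in case a given pandas version wraps these functions in a decorator.

```python
import inspect
from pathlib import Path

import pandas as pd

# inspect.unwrap follows any functools.wraps-style wrappers to the original function.
function_file = Path(inspect.getsourcefile(inspect.unwrap(pd.merge))).as_posix()
method_file = Path(inspect.getsourcefile(inspect.unwrap(pd.DataFrame.merge))).as_posix()

print(function_file)  # expected to end with pandas/core/reshape/merge.py
print(method_file)    # expected to end with pandas/core/frame.py
```

From there, inspect.getsource can print the function body itself, which is the same text you would read on GitHub for your installed version.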
Now let's see what the chaining style, left_df.merge, is doing. From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html, click source to go to https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/core/frame.py#L7304-L7335, where you see def merge(self (Figure 5). The self tells you this is a method of a class (DataFrame in this case). The method body then runs from pandas.core.reshape.merge import merge and passes all your parameters on to that merge, the only difference being that it automatically passes self as the left parameter for you.
Figure 5
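The delegation pattern is easier to see in miniature. The sketch below is not pandas code: ToyFrame and this merge function are hypothetical stand-ins I made up, and the join logic is deliberately naive. The only part that mirrors pandas is the shape of the method, which hands self to the module-level function as left.

```python
def merge(left, right, on):
    """Toy stand-in for the module-level merge: inner join on one key."""
    rows = [
        {**left_row, **right_row}
        for left_row in left.rows
        for right_row in right.rows
        if left_row[on] == right_row[on]
    ]
    return ToyFrame(rows)


class ToyFrame:
    """Hypothetical miniature 'DataFrame' that holds a list of dict rows."""

    def __init__(self, rows):
        self.rows = rows

    def merge(self, right, on):
        # Inside the method body, the bare name `merge` resolves to the
        # module-level function, not to this method. self becomes `left`.
        return merge(self, right, on)


left = ToyFrame([{"key": 1, "a": "x"}, {"key": 2, "a": "y"}])
right = ToyFrame([{"key": 1, "b": 10}, {"key": 3, "b": 30}])

via_method = left.merge(right, on="key").rows
via_function = merge(left, right, on="key").rows
print(via_method == via_function)
```

Because the method adds nothing beyond filling in left, any behaviour change in the module-level function shows up in both calling styles at once.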
You can compare the two def merge signatures, the one behind left_df.merge here and the one from the earlier pd.merge discussion, to see that they take the same parameters (apart from self standing in for left) and end up in exactly the same merge.
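The comparison can also be done programmatically with inspect.signature. I am only certain the two parameter lists line up in the v0.25.1 source linked above, so treat the last line as something to check on your own version rather than a guarantee.

```python
import inspect

import pandas as pd

# Parameter names of the module-level function and of the DataFrame method.
function_params = list(inspect.signature(pd.merge).parameters)
method_params = list(inspect.signature(pd.DataFrame.merge).parameters)

print(function_params)  # starts with "left"
print(method_params)    # starts with "self"

# Drop the first parameter of each (left vs. self) and compare the rest.
print(function_params[1:] == method_params[1:])
```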
How did I even know to start searching from the keyword merge? Actually, I began the search from the source code of left_df.merge, but I felt it was better to explain the lowest-level code first and then introduce the idea of self substituting for the left parameter, so that more complex ideas build on simpler ones.
I hope this article has inspired others not to fear the source, and has encouraged curiosity about how things really work under the hood and how the API is designed, which may even lead to contributing to pandas in the future.
Translated from: https://towardsdatascience.com/whats-the-difference-between-pd-merge-and-df-merge-ab387bc20a2e
Original post: https://blog.csdn.net/weixin_26711425/article/details/108935594