How can I organize multiple Python files into a single module without it behaving like a package?

2022-01-13 00:00:00 python module package

Question

Is there a way to use __init__.py to organize multiple files into a module?

Reason: Modules are easier to use than packages, because they don't have as many layers of namespace.

Normally this makes a package; that much I get. The problem with a package is that 'import thepackage' gives me an empty namespace. Users must then either use 'from thepackage import *' (frowned upon) or know exactly what it contains and manually pull it out into a usable namespace.

What I want is for the user to do 'import thepackage' and get a nice clean namespace that looks like this, exposing the functions and classes relevant to the project:

current_module

  doit_tools/
  
   - (class) _hidden_resource_pool
   - (class) JobInfo
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (Fn) util_a
   - (Fn) util_b
   - (Fn) gather_stuff
   - (Fn) analyze_stuff

The maintainer's job would be to avoid defining the same name in different files, which should be easy when the project is as small as mine.

It would also be nice if people could do 'from doit_tools import JobInfo' and have it retrieve the class, rather than a module containing the class.

This is easy if all my code is in one gigantic file, but I like to organize when things start getting big. What I have on disk looks roughly like this:

place_in_my_python_path/
  doit_tools/
    __init__.py
    JobInfo.py
      - class JobInfo:
    NetworkAccessors.py
      - class _hidden_resource_pool:
      - class CachedLookup:
      - class ThreadedWorker:
    utility_functions.py
      - def util_a()
      - def util_b()
    data_functions.py
      - def gather_stuff()
      - def analyze_stuff()

I only separate them so my files aren't huge and unnavigable. They are all related, though someone (possibly me) may want to use the classes by themselves without importing everything.

I've read a number of suggestions in various threads. Here's what happens for each suggestion I could find on how to do this:

If I do not use an __init__.py, I cannot import anything, because Python doesn't descend into the folder from sys.path.

If I use a blank __init__.py, then when I import doit_tools it's an empty namespace with nothing in it. None of my files are imported, which makes it more difficult to use.

If I list the submodules in __all__, I can use the (frowned upon?) 'from thing import *' syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use 'from x import *' instead of 'import x', and (2) manually reshuffle classes until they can reasonably obey line-width style constraints.

If I add 'from thatfile import X' statements to __init__.py, I get closer, but I have namespace conflicts (?) and extra namespaces for things I didn't want in there. In the example below, you'll see that:

  1. The class JobInfo overwrote the module object named JobInfo because their names were the same. Somehow Python can figure this out, because JobInfo is of type <class 'doit_tools.JobInfo.JobInfo'>. (doit_tools.JobInfo is a class, but doit_tools.JobInfo.JobInfo is that same class... this is tangled and seems very bad, but doesn't seem to break anything.)
  2. Each filename made its way into the doit_tools namespace, which makes the module's contents more confusing to look through. I want doit_tools/utility_functions.py to hold some code, not define a new namespace.

current_module

  doit_tools/
  
   - (module) JobInfo
      
       - (class) JobInfo
   - (class) JobInfo
   - (module) NetworkAccessors
      
       - (class) CachedLookup
       - (class) ThreadedWorker
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (module) utility_functions
      
       - (Fn) util_a
       - (Fn) util_b
   - (Fn) util_a
   - (Fn) util_b
   - (module) data_functions
      
       - (Fn) gather_stuff
       - (Fn) analyze_stuff
   - (Fn) gather_stuff
   - (Fn) analyze_stuff
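
The doubled-up listing above can be reproduced with a small experiment. This is only a sketch under stated assumptions: the package name demo_tools and its contents are hypothetical stand-ins written to a temporary directory, not the real doit_tools project. It shows that a 'from .module import name' line in __init__.py binds both the imported name and the submodule itself in the package namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (demo_tools is a hypothetical name).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_tools"
pkg.mkdir()
(pkg / "utility_functions.py").write_text("def util_a():\n    return 'a'\n")
(pkg / "__init__.py").write_text("from .utility_functions import util_a\n")

# Make the temporary directory importable and import the package.
sys.path.insert(0, str(root))
demo_tools = importlib.import_module("demo_tools")

# Collect the public (non-underscore) names the package exposes.
public_names = sorted(n for n in dir(demo_tools) if not n.startswith("_"))
print(public_names)  # ['util_a', 'utility_functions']
```

Importing a submodule always sets it as an attribute on its parent package, which is why utility_functions shows up next to util_a.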

Also, someone importing just the data abstraction class would get something different from what they expect when they do 'from doit_tools import JobInfo':

current_namespace

 JobInfo (module)
  
   -JobInfo (class)

instead of:

current_namespace

 - JobInfo (class)
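
Which of the two listings you actually get can be checked directly. The sketch below assumes a hypothetical package demo_jobs whose __init__.py contains 'from .JobInfo import JobInfo'; with a blank __init__.py the result would differ. In this setup the class binding replaces the submodule attribute of the same name, and the module object itself stays reachable through sys.modules.

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path

# Build a throwaway package whose module and class share a name
# (demo_jobs is a hypothetical name used only for this illustration).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_jobs"
pkg.mkdir()
(pkg / "JobInfo.py").write_text("class JobInfo:\n    pass\n")
(pkg / "__init__.py").write_text("from .JobInfo import JobInfo\n")

sys.path.insert(0, str(root))
importlib.import_module("demo_jobs")

from demo_jobs import JobInfo

# The name resolves to the class, not to the submodule.
print(inspect.isclass(JobInfo))  # True
print(JobInfo.__module__)        # demo_jobs.JobInfo

# The submodule still exists and holds the very same class object.
submodule = sys.modules["demo_jobs.JobInfo"]
print(submodule.JobInfo is JobInfo)  # True
```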

So, is this just a wrong way to organize Python code? If not, what is a correct way to split related code up but still collect it in a module-like way?

Maybe the best-case scenario is that doing 'from doit_tools import JobInfo' is a little confusing for someone using the package?

Maybe a Python file called 'api', so that people using the code do the following?

import doit_tools.api
from doit_tools.api import JobInfo

=============================================

Examples in response to comments:

Take the following package contents, inside a folder 'foo' which is on the Python path.

foo/__init__.py

__all__ = ['doit', 'dataholder', 'getSomeStuff', 'hold_more_data', 'SpecialCase']
# Explicit relative imports, so this also works on Python 3.
from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
from .specialcase import SpecialCase

foo/specialcase.py

class SpecialCase:
    pass

foo/descriptive_name.py

def getSomeStuff():
    pass

class hold_more_data(object):
    pass

foo/another_class.py

def doit():
    print("I'm a function.")

class dataholder(object):
    pass

Doing this:

>>> import foo
>>> for thing in dir(foo): print(thing)
... 
SpecialCase
__builtins__
__doc__
__file__
__name__
__package__
__path__
another_class
dataholder
descriptive_name
doit
getSomeStuff
hold_more_data
specialcase

another_class and descriptive_name are there cluttering things up, and they also hold extra copies of e.g. doit() underneath their own namespaces.

If I have a class named Data inside a file named Data.py, then when I do 'from Data import Data' I get a namespace conflict, because Data is a class in the current namespace, and it lives inside the module Data, which is somehow also in the current namespace. (But Python seems to be able to handle this.)


Answer

You can sort of do it, but it's not really a good idea, and you're fighting against the way Python modules/packages are supposed to work. By importing the appropriate names in __init__.py you can make them accessible in the package namespace. By deleting the module names you can make them inaccessible. (For why you need to delete them, see this question.) So you can get close to what you want with something like this (in __init__.py):

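# Re-export the public names (explicit relative imports also work on Python 3).
from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
# Importing from a submodule also binds the submodule name here; remove it.
del another_class, descriptive_name
__all__ = ['doit', 'dataholder', 'getSomeStuff', 'hold_more_data']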

However, this will break subsequent attempts to import package.another_class. In general, you can't import anything from a package.module without making package.module accessible as an importable reference to that module (although with __all__ you can keep the module out of 'from package import *').
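
A minimal sketch of the effect of the del trick, assuming a hypothetical package demo_hidden built in a temporary directory (none of these names come from the original project). It only checks what is certain: the re-exported function works, the submodule attribute is gone from the package, and the module object is still registered in sys.modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that hides its submodule (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_hidden"
pkg.mkdir()
(pkg / "another_class.py").write_text(
    "def doit():\n    return \"I'm a function.\"\n"
)
(pkg / "__init__.py").write_text(
    "from .another_class import doit\n"
    "del another_class\n"
    "__all__ = ['doit']\n"
)

sys.path.insert(0, str(root))
demo_hidden = importlib.import_module("demo_hidden")

# The re-exported function is available at the top level.
result = demo_hidden.doit()
print(result)  # I'm a function.

# The submodule attribute was deleted from the package namespace...
attribute_present = hasattr(demo_hidden, "another_class")
print(attribute_present)  # False

# ...but the module object itself is still cached by the import system.
still_cached = "demo_hidden.another_class" in sys.modules
print(still_cached)  # True
```

Because the module stays cached while its attribute is missing, code that later writes demo_hidden.another_class.doit would hit an AttributeError, which is the kind of surprise the answer warns about.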

More generally, by splitting up your code by class/function you are working against the Python package/module system. A Python module should generally contain stuff you want to import as a unit. It's not uncommon to import submodule components directly into the top-level package namespace for convenience, but the reverse (trying to hide the submodules and allow access to their contents only through the top-level package namespace) is going to lead to problems. In addition, there is nothing to be gained by trying to "cleanse" the package namespace of the modules. Those modules are supposed to be in the package namespace; that's where they belong.
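
The convenience pattern mentioned above (re-exporting names at the top level while leaving the submodules in place) might look like the following sketch. The package demo_api and its functions are hypothetical, chosen only to mirror the question's layout. With __all__ set, a star import brings in just the listed names, and the submodule remains importable for anyone who wants it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that re-exports two functions (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_api"
pkg.mkdir()
(pkg / "utility_functions.py").write_text(
    "def util_a():\n    return 'a'\n\n"
    "def util_b():\n    return 'b'\n"
)
(pkg / "__init__.py").write_text(
    "from .utility_functions import util_a, util_b\n"
    "__all__ = ['util_a', 'util_b']\n"
)

sys.path.insert(0, str(root))
demo_api = importlib.import_module("demo_api")

# Run a star import in a fresh namespace to see what it pulls in.
namespace = {}
exec("from demo_api import *", namespace)
star_names = sorted(n for n in namespace if not n.startswith("__"))
print(star_names)  # ['util_a', 'util_b']

# The submodule is left alone and stays reachable as an attribute.
print(demo_api.utility_functions.util_b())  # b
```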
