Which CPU cores are my Python processes running on?
Problem description
The setup
I have written a pretty complex piece of software in Python (on a Windows PC). My software basically starts two Python interpreter shells. The first shell starts up (I suppose) when you double-click the main.py
file. Within that shell, other threads are started in the following way:
# Start TCP_thread
TCP_thread = threading.Thread(name = 'TCP_loop', target = TCP_loop, args = (TCPsock,))
TCP_thread.start()
# Start UDP_thread
UDP_thread = threading.Thread(name = 'UDP_loop', target = UDP_loop, args = (UDPsock,))
UDP_thread.start()
The Main_thread
starts a TCP_thread
and a UDP_thread
. Although these are separate threads, they all run within one single Python shell.
The Main_thread
also starts a subprocess. This is done in the following way:
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
From the Python documentation, I understand that this subprocess is running simultaneously (!) in a separate Python interpreter session/shell. The Main_thread
in this subprocess is completely dedicated to my GUI. The GUI starts a TCP_thread
for all its communications.
I know that things get a bit complicated. Therefore I have summarized the whole setup in this figure:
I have several questions concerning this setup. I will list them here:
Question 1 [Solved]
Is it true that a Python interpreter uses only one CPU core at a time to run all the threads? In other words, will the Python interpreter session 1
(from the figure) run all 3 threads (Main_thread
, TCP_thread
and UDP_thread
) on one CPU core?
Answer: yes, this is true. The GIL (Global Interpreter Lock) ensures that all threads run on one CPU core at a time.
Question 2 [Not yet solved]
Do I have a way to track which CPU core it is?
Question 3 [Partly solved]
For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter instance. Is this correct?
Answer: Yes this is correct. At first there was some confusion about whether the following code would create a new Python interpreter instance:
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
The issue has been clarified. This code indeed starts a new Python interpreter instance.
Will Python be smart enough to make that separate Python interpreter instance run on a different CPU core? Is there a way to track which one, perhaps with some sporadic print statements as well?
Question 4 [New question]
The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
Solution
Q: Is it true that a Python interpreter uses only one CPU core at a time to run all the threads?
No. The GIL and CPU affinity are unrelated concepts. The GIL can be released during blocking I/O operations, and in any case during long CPU-intensive computations inside a C extension.
If a thread is blocked on the GIL, it is probably not running on any CPU core, so it is fair to say that pure-Python multithreaded code may use only one CPU core at a time on the CPython implementation.
Q: In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
I don't think CPython manages CPU affinity implicitly. It likely relies on the OS scheduler to choose where to run a thread. Python threads are implemented on top of real OS threads.
Q: Or is the Python interpreter able to spread them over multiple cores?
To find out the number of usable CPUs (os.sched_getaffinity() is available on Linux):
>>> import os
>>> len(os.sched_getaffinity(0))
16
Again, whether or not threads are scheduled on different CPUs does not depend on the Python interpreter.
Q: Suppose that the answer to Question 1 is 'multiple cores', do I have a way to track on which core each thread is running, perhaps with some sporadic print statements? If the answer to Question 1 is 'only one core', do I have a way to track which one it is?
I imagine that the specific CPU may change from one time slot to another. You could look at something like /proc/<pid>/task/<tid>/status
on old Linux kernels. On my machine, task_cpu
can be read from /proc/<pid>/stat
or /proc/<pid>/task/<tid>/stat
:
>>> open("/proc/{pid}/stat".format(pid=os.getpid()), 'rb').read().split()[-14]
'4'
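The one-liner above counts fields from the end of the stat line, which assumes a kernel that emits the expected total number of fields. Counting from the front, after the comm field, is robust across kernel versions. A small Linux-only helper sketch (the name current_cpu is mine, not from the answer):

```python
import os

def current_cpu(pid=None, tid=None):
    """Return the CPU a Linux task last ran on, i.e. the 'processor'
    field (39th) of /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat."""
    pid = os.getpid() if pid is None else pid
    path = (f"/proc/{pid}/task/{tid}/stat" if tid is not None
            else f"/proc/{pid}/stat")
    with open(path, "rb") as f:
        data = f.read()
    # The comm field (2nd) may contain spaces or parentheses, so split
    # after its closing ')'; the remaining fields start at field 3.
    fields = data.rsplit(b")", 1)[1].split()
    return int(fields[39 - 3])  # 'processor' is field 39 overall

print(current_cpu())
```

Keep in mind the result is only a snapshot; the scheduler may migrate the task to another core at any moment.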
For a current portable solution, see whether psutil
exposes such info.
You could restrict the current process to a set of CPUs:
os.sched_setaffinity(0, {0}) # current process on 0-th core
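For example (Linux-only; this sketch picks a core from the process's current affinity set rather than hard-coding core 0, so it also works in a container that restricts which CPUs are usable):

```python
import os

allowed = os.sched_getaffinity(0)   # cores the current process may use
one = {min(allowed)}                # pick a single core from that set
os.sched_setaffinity(0, one)        # pin the process to it
assert os.sched_getaffinity(0) == one
os.sched_setaffinity(0, allowed)    # restore the original affinity
```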
Q: For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter session/shell. Is this correct?
Yes. The subprocess
module creates new OS processes. If you run the python
executable then it starts a new Python interpreter. If you run a bash script then no new Python interpreter is created, i.e., running the bash
executable does not start a new Python interpreter/session/etc.
Q: Supposing that it is correct, will Python be smart enough to make that separate interpreter session run on a different CPU core? Is there a way to track this, perhaps with some sporadic print statements as well?
See above (i.e., OS decides where to run your thread and there could be OS API that exposes where the thread is run).
multiprocessing.Process(target=foo, args=(q,)).start()
multiprocessing.Process
also creates a new OS process (that runs a new Python interpreter).
In reality, my subprocess is another file. So this example won't work for me.
Python uses modules to organize the code. If your code is in another_file.py
then import another_file
in your main module and pass another_file.foo
to multiprocessing.Process
.
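A minimal sketch of that pattern — with foo defined inline here instead of in another_file.py, so the snippet is self-contained; the queue carries the child's result back to the parent:

```python
import multiprocessing as mp

def foo(q):
    # Runs in a separate OS process, i.e. a separate Python interpreter.
    q.put("hello from the child process")

def run_child():
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    msg = q.get()   # blocks until the child puts its message
    p.join()
    return msg

if __name__ == "__main__":
    print(run_child())  # hello from the child process
```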
Nevertheless, how would you compare it to p = subprocess.Popen(..)? Does it matter if I start the new process (or should I say 'Python interpreter instance') with subprocess.Popen(..) versus multiprocessing.Process(..)?
multiprocessing.Process()
is likely implemented on top of subprocess.Popen()
. multiprocessing
provides an API similar to the threading
API, and it abstracts away the details of communication between Python processes (how Python objects are serialized to be sent between processes).
If there are no CPU-intensive tasks then you could run your GUI and I/O threads in a single process. If you have a series of CPU-intensive tasks then, to utilize multiple CPUs at once, either use multiple threads with C extensions such as lxml
, regex
, numpy
(or your own, created using Cython) that can release the GIL during long computations, or offload the tasks into separate processes (a simple way is to use a process pool such as the one provided by concurrent.futures
).
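A minimal sketch of the process-pool route (the function and numbers here are illustrative, not from the question):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # Stand-in for a CPU-bound task; each call runs in a worker process.
    return sum(i * i for i in range(n))

def run_pool(jobs):
    # By default the pool starts one worker per CPU core.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_heavy, jobs))

if __name__ == "__main__":
    print(run_pool([10_000, 20_000, 30_000]))
```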
Q: The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
"Approach 1(a)" is wrong on POSIX (though it may work on Windows). For portability, use "Approach 1(b)" unless you know you need cmd.exe
(pass a string in this case, to make sure that the correct command-line escaping is used).
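For example, Approach 1(b) with an inline -c child in place of the question's mySubprocessPath, so the snippet runs on its own:

```python
import subprocess
import sys

def launch():
    # sys.executable is the interpreter running this script, so the
    # child uses the same Python even if 'python' is not on PATH.
    result = subprocess.run(
        [sys.executable, "-c", "print('child interpreter says hi')"],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(launch())  # child interpreter says hi
```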
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
subprocess
creates new processes, any processes; e.g., you could run a bash script. multiprocessing
is used to run Python code in another process. It is more flexible to import a Python module and run its function than to run it as a script. See Call python script with input with in a python script using subprocess.