Python interpreter blocks multithreaded DNS requests?
Problem description
I was playing around with Python and threads and noticed that DNS requests block even in a multithreaded script. Consider the following script:
from threading import Thread
import socket


class Connection(Thread):
    def __init__(self, name, url):
        Thread.__init__(self)
        self._url = url
        self._name = name

    def run(self):
        print("Connecting... %s" % self._name)
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setblocking(0)
            # connect() has to resolve the hostname before it can do anything else
            s.connect((self._url, 80))
        except socket.gaierror:
            pass  # not interested in it
        print("finished %s" % self._name)


if __name__ == '__main__':
    conns = []
    # all invalid addresses to see how they fail / check times
    conns.append(Connection("conn1", "www.2eg11erdhrtj.com"))
    conns.append(Connection("conn2", "www.e2ger2dh2rtj.com"))
    conns.append(Connection("conn3", "www.eg2de3rh1rtj.com"))
    conns.append(Connection("conn4", "www.ege2rh4rd1tj.com"))
    conns.append(Connection("conn5", "www.ege52drhrtj1.com"))
    for conn in conns:
        conn.start()
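I don't know exactly how long the timeout is, but when I run this, the following happens:
- All threads start and I get my printouts
- Every xx seconds, one thread reports that it has finished, instead of all of them at once
- The threads finish sequentially, not all at once (the timeout is the same for all of them!)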
So my only guess is that this has to do with the GIL? The threads clearly do not perform their task concurrently; only one connection is attempted at a time.
Does anyone know a way around this?
(asyncore doesn't help, and I'd prefer not to use Twisted for now.) Isn't it possible to get this simple little thing done with Python?
Greetings, Tom
I am on Mac OS X. I just had a friend run this on Linux, and he does get the results I was hoping for. His socket.connect() calls return immediately, even in a non-threaded environment. And even when he sets the sockets to blocking with a timeout of 10 seconds, all his threads finish at the same time.
Can anyone explain this?
Solution
On some systems, getaddrinfo is not thread-safe. Python treats FreeBSD, OpenBSD, NetBSD, OS X, and VMS as such systems. On those systems, Python maintains a lock specifically for the netdb functions (i.e. getaddrinfo and friends), so lookups started from different threads run one at a time.
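To check whether lookups are being serialized on a given machine, one option is to time them from a common start. The following is a minimal sketch, not part of the original answer: the names `resolve` and `timed_lookups` are made up for illustration, and the `resolver` parameter exists only so the harness can be tried with a stand-in function instead of real DNS.

```python
import socket
import threading
import time


def resolve(host):
    """Resolve host with getaddrinfo; return True on success, False on failure."""
    try:
        socket.getaddrinfo(host, 80)
        return True
    except socket.gaierror:
        return False


def timed_lookups(hosts, resolver=resolve):
    """Call resolver(host) in one thread per host.

    Returns (total_seconds, {host: seconds_until_that_lookup_finished}),
    with every time measured from the same starting point.
    """
    finished = {}
    guard = threading.Lock()  # protects the shared dict
    start = time.time()

    def worker(host):
        resolver(host)
        elapsed = time.time() - start
        with guard:
            finished[host] = elapsed

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start, finished
```

If the per-host times come out as a staircase (roughly t, 2t, 3t, ...), the lookups ran one after another, which is what the question describes on Mac OS X. If they all cluster around the same value, the lookups ran in parallel, as on the friend's Linux machine.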
So if you can't switch operating systems, you'll have to use a different (thread-safe) resolver library, such as Twisted's.
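If pulling in another resolver library is not an option, a different workaround (not the one suggested above) may be to do each lookup in its own process, on the assumption that the lock is held per interpreter process and so is not shared between processes. The sketch below uses only the standard library; `resolve_all` and `_CHILD` are hypothetical names chosen for this example.

```python
import subprocess
import sys

# Program run by each child: resolve argv[1] and print its IPv4 address.
_CHILD = "import socket, sys; print(socket.gethostbyname(sys.argv[1]))"


def resolve_all(hosts):
    """Resolve each host in a separate Python process.

    Returns {host: ip_string or None}. Every child is started before any
    result is collected, so the lookups overlap in time.
    """
    children = {}
    for host in hosts:
        children[host] = subprocess.Popen(
            [sys.executable, "-c", _CHILD, host],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )

    results = {}
    for host, child in children.items():
        out, _ = child.communicate()
        if child.returncode == 0:
            results[host] = out.decode("ascii").strip()
        else:
            results[host] = None  # lookup failed in the child (e.g. socket.gaierror)
    return results
```

Starting an interpreter per hostname is expensive, so this only makes sense for a handful of lookups. The resolved addresses can then be passed to socket.connect() from ordinary threads, since connecting to a numeric address needs no name lookup.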