Closing a urllib2 connection

2022-01-09 00:00:00 python connection urllib2 ftp

Problem description

I'm using urllib2 to load files from FTP and HTTP servers.

Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection instantly. Look at the example program.

from urllib2 import urlopen
from time import sleep

url = 'ftp://user:pass@host/big_file.ext'

def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))

load_file(url)
load_file(url)

The code loads two files (here, the same file twice) from an FTP server that supports only one connection. It prints the following log:

loaded 463675266
Traceback (most recent call last):
  File "conection_test.py", line 20, in <module>
    load_file(url)
  File "conection_test.py", line 7, in load_file
    f = urlopen(url)
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
    fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
  File "/usr/lib/python2.6/urllib.py", line 854, in __init__
    self.init()
  File "/usr/lib/python2.6/urllib.py", line 860, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python2.6/ftplib.py", line 134, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
    raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>

So the first file is loaded and the second fails because the first connection was not closed.

But when I use sleep(1) after f.close(), the error does not occur:

loaded 463675266
loaded 463675266

Is there any way to force close the connection so that the second download would not fail?
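The sleep(1) observation can at least be turned into a stopgap: pause and retry when the server answers 421. The following is only a minimal sketch under assumptions, not a real fix for the lingering connection. The helper name `open_with_retry` and its parameters are hypothetical, and matching the text "421" in the error message is a crude heuristic.

```python
import time


def open_with_retry(opener, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call opener(url); pause and retry while the server reports 421.

    `opener` is any callable such as urllib2.urlopen (hypothetical wiring).
    """
    for attempt in range(attempts):
        try:
            return opener(url)
        except IOError as exc:  # urllib2.URLError is a subclass of IOError
            # Re-raise anything that is not a 421, and the final failure.
            if "421" not in str(exc) or attempt == attempts - 1:
                raise
            sleep(delay)  # give the server time to drop the old connection
```

In the example program this would replace the direct call, e.g. `f = open_with_retry(urlopen, url)`.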

Solution

The cause is indeed a file descriptor leak. We also found that the problem is much more obvious with Jython than with CPython. A colleague proposed this solution:

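    import urllib2

    # Note: the second positional argument of Request is the request body (data);
    # pass headers=header instead if header is a dict of HTTP headers.
    req = urllib2.Request(url, header)
    try:
        fdurl = urllib2.urlopen(req, timeout=self.timeout)
    except urllib2.URLError, e:
        print "urlopen exception", e
    else:
        # Grab the "real" socket through private attributes so that it can
        # be closed explicitly; this chain is an implementation detail.
        realsock = fdurl.fp._sock.fp._sock
        # ... read from fdurl here ...
        realsock.close()
        fdurl.close()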

The fix is ugly, but it does the job: no more "too many connections" errors.
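The same idea can be written a little more defensively, so that a missing private attribute or an exception during the read does not skip the cleanup. This is only a sketch: the helper names `close_response` and `count_bytes` are made up for illustration, and the `fp._sock.fp._sock` chain is taken from the fix above. It relies on private internals that may differ between Python versions and protocols.

```python
def close_response(resp):
    """Close a urlopen() response and, if reachable, its underlying socket."""
    # Walk the private chain resp.fp._sock.fp._sock from the fix above.
    # If any link is missing, fall back to closing the response only.
    sock = resp
    for name in ("fp", "_sock", "fp", "_sock"):
        sock = getattr(sock, name, None)
        if sock is None:
            break
    try:
        if sock is not None:
            sock.close()  # close the "real" socket first, as in the fix
    finally:
        resp.close()


def count_bytes(resp, chunk_size=1024):
    """Read resp to the end and return the byte count; always clean up."""
    loaded = 0
    try:
        while True:
            data = resp.read(chunk_size)
            if not data:  # empty str/bytes signals end of stream
                break
            loaded += len(data)
    finally:
        close_response(resp)
    return loaded
```

With urllib2 this would be used as `count_bytes(urllib2.urlopen(req, timeout=...))`. The try/finally ensures the close calls run even if the read raises.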
