python-urlparse

2023-01-31 01:01:45 python

http://docs.python.org/2/library/urlparse.html?highlight=urlparse#urlparse

The main functions are as follows:

1. urlparse

#!/usr/bin/python
import urlparse
webURL = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
# urlsplit() is the 5-part alternative (no separate params field)
#parseTuple = urlparse.urlsplit(webURL)
parseTuple = urlparse.urlparse(webURL)
print parseTuple

The output is:

ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='hl=en&q=python&btnG=Google+Search', fragment='')

As we can see, the result is a tuple with 6 parts: (scheme, netloc, path, params, query, fragment)
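Because the result is a named tuple, each part can be read by attribute name as well as by index. The following is a minimal sketch using the same example URL. The try/except import is an addition, not part of the original script, so that the snippet also runs on Python 3, where the module lives at urllib.parse:

```python
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse      # Python 2 location, as used in this post

web_url = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
parts = urlparse(web_url)

# Each field is available by name or by tuple index.
scheme = parts.scheme   # same value as parts[0]
netloc = parts.netloc   # same value as parts[1]
path = parts.path       # same value as parts[2]
query = parts.query     # same value as parts[4]

print(scheme)
print(netloc)
print(path)
print(query)
```

Attribute access tends to be easier to read than numeric indexes, especially when only one or two fields are needed.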

2. urlparse.urlunparse(parts)

#!/usr/bin/python
import urlparse
URLschema = "ftp"
webURL = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
#parseTuple = urlparse.urlsplit(webURL)
parseTuple = urlparse.urlparse(webURL)
print parseTuple
# Rebuild the URL from its 6 parts, swapping in the new scheme and an empty fragment
u = urlparse.urlunparse((URLschema,parseTuple.netloc,parseTuple.path,parseTuple.params,parseTuple.query,''))
print u

The result is as follows:

The parts have been reassembled into a new URL

ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='hl=en&q=python&btnG=Google+Search', fragment='') 
ftp://www.google.com/search?hl=en&q=python&btnG=Google+Search
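An alternative way to change a single component is to use the named tuple's _replace() method and then rebuild the URL. This is only a sketch of that approach, not the method used in the script above. The try/except import is an addition so the snippet also runs on Python 3:

```python
try:
    from urllib.parse import urlparse, urlunparse  # Python 3 location
except ImportError:
    from urlparse import urlparse, urlunparse      # Python 2 location

web_url = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
parts = urlparse(web_url)

# _replace() returns a new result with only the named field changed.
ftp_parts = parts._replace(scheme="ftp")

# urlunparse() accepts any 6-item sequence, including the result object itself.
ftp_url = urlunparse(ftp_parts)

# geturl() on the result object produces the same string.
ftp_url_again = ftp_parts.geturl()

# Parsing and unparsing without changes gives back the original URL here.
round_trip = urlunparse(parts)

print(ftp_url)
```

This avoids listing all six fields by hand when only one of them changes.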

3. urlparse.urlsplit(urlstring[, scheme[, allow_fragments]])

This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier). It is similar to urlparse(), but it does not split the params from the path.
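The difference between the 5-tuple and the 6-tuple only becomes visible when the path carries ";" parameters. The sketch below compares the two functions on a made-up URL (www.example.com and its path are assumptions for illustration). The try/except import is an addition so the snippet also runs on Python 3:

```python
try:
    from urllib.parse import urlparse, urlsplit  # Python 3 location
except ImportError:
    from urlparse import urlparse, urlsplit      # Python 2 location

# Hypothetical URL whose last path segment carries ";" parameters.
url = "http://www.example.com/search;type=a?hl=en#top"

six = urlparse(url)   # (scheme, netloc, path, params, query, fragment)
five = urlsplit(url)  # (scheme, netloc, path, query, fragment)

print(six.path)     # params are split off into their own field
print(six.params)
print(five.path)    # params stay attached to the path
```

With urlparse() the path is "/search" and params is "type=a", while urlsplit() leaves the path as "/search;type=a".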

4. urlparse.urljoin(base, url[, allow_fragments])

Its main purpose is to join a base URL with another (usually relative) URL

#-*- coding:utf-8 -*-
import urlparse
# Test 1: relative URL that goes up one directory
base_url = "http://motor.blog.51cto.com/blog/addblog.php"
relative_url = "../blog/test.php"
abs_url = urlparse.urljoin(base_url, relative_url)
print abs_url
# Test 2: base URL ends with a file name
base_url_2 = "http://motor.blog.51cto.com/blog/addblog.php"
relative_url_2 = "test.php"
abs_url_2 = urlparse.urljoin(base_url_2, relative_url_2)
print abs_url_2
# Test 3: base URL ends with a trailing slash
base_url_3 = "http://motor.blog.51cto.com/blog/"
relative_url_3 = "test.php"
abs_url_3 = urlparse.urljoin(base_url_3, relative_url_3)
print abs_url_3
# Test 4: base URL has no trailing slash
base_url_4 = "http://motor.blog.51cto.com/blog"
relative_url_4 = "test.php"
abs_url_4 = urlparse.urljoin(base_url_4, relative_url_4)
print abs_url_4

The result is:

http://motor.blog.51cto.com/blog/test.php 
http://motor.blog.51cto.com/blog/test.php 
http://motor.blog.51cto.com/blog/test.php 
http://motor.blog.51cto.com/test.php
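These four results follow from one rule: a relative URL replaces everything after the last "/" in the base path. In test 4 the base has no trailing slash, so "blog" is treated as the last segment and is replaced. Two further cases are worth knowing, sketched below with the same base URL; the second target URL is simply borrowed from the earlier examples. The try/except import is an addition so the snippet also runs on Python 3:

```python
try:
    from urllib.parse import urljoin  # Python 3 location
except ImportError:
    from urlparse import urljoin      # Python 2 location

base_url = "http://motor.blog.51cto.com/blog/addblog.php"

# A URL starting with "/" replaces the whole base path.
from_root = urljoin(base_url, "/test.php")

# A full absolute URL ignores the base entirely.
absolute = urljoin(base_url, "http://www.google.com/search")

print(from_root)
print(absolute)
```

Here from_root is "http://motor.blog.51cto.com/test.php" and absolute is "http://www.google.com/search".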
