python-urlparse
2023-01-31 01:01:45
python
Reference: http://docs.python.org/2/library/urlparse.html?highlight=urlparse#urlparse
The main functions are:
1. urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
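- #!/usr/bin/python
- # Python 2 (in Python 3 this module is urllib.parse)
- import urlparse
- webURL = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
- # alternative: urlsplit() returns 5 fields instead of 6
- #parseTuple = urlparse.urlsplit(webURL)
- parseTuple = urlparse.urlparse(webURL)
- print parseTuple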
Output:
- ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='hl=en&q=python&btnG=Google+Search', fragment='')
As you can see, the URL is split into 6 parts, returned as the tuple (scheme, netloc, path, params, query, fragment).
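In Python 3 the urlparse module was renamed to urllib.parse, and the functions kept their names. A minimal Python 3 sketch of the same call, using the URL from the example above, showing that the result is a named tuple whose six fields can be read either by attribute or by position:

```python
from urllib.parse import urlparse

web_url = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
parts = urlparse(web_url)

# ParseResult is a named tuple: fields are available by name...
print(parts.scheme)   # http
print(parts.netloc)   # www.google.com
print(parts.query)    # hl=en&q=python&btnG=Google+Search

# ...and by position, in the order (scheme, netloc, path, params, query, fragment)
print(parts[2])       # /search
print(len(parts))     # 6
```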
2. urlparse.urlunparse(parts)
- #!/usr/bin/python
- import urlparse
- URLschema = "ftp"
- webURL = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
- #parseTuple = urlparse.urlsplit(webURL)
- parseTuple = urlparse.urlparse(webURL)
- print parseTuple
- # rebuild the URL from 6 parts, swapping in the new scheme and leaving the fragment empty
- u = urlparse.urlunparse((URLschema, parseTuple.netloc, parseTuple.path, parseTuple.params, parseTuple.query, ''))
- print u
Result: the parts have been reassembled into a new URL, now with the ftp scheme:
- ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='hl=en&q=python&btnG=Google+Search', fragment='')
- ftp://www.google.com/search?hl=en&q=python&btnG=Google+Search
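Rebuilding the whole 6-tuple by hand is not strictly necessary: because the result is a named tuple, its `_replace()` method can swap a single field. A minimal Python 3 sketch (urllib.parse) of the same scheme change; `geturl()` on a ParseResult gives the same string as calling `urlunparse()` on it:

```python
from urllib.parse import urlparse, urlunparse

web_url = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
parts = urlparse(web_url)

# _replace() returns a new ParseResult with only the scheme changed
ftp_parts = parts._replace(scheme="ftp")

# urlunparse() reassembles the six fields into a URL string
ftp_url = urlunparse(ftp_parts)
print(ftp_url)             # ftp://www.google.com/search?hl=en&q=python&btnG=Google+Search
print(ftp_parts.geturl())  # same string
```

The original `parts` is left untouched, since named tuples are immutable.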
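3. urlparse.urlsplit(urlstring[, scheme[, allow_fragments]])
This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier). It is similar to urlparse(), but does not split params out of the path. For the same URL as above (using the commented-out urlsplit line in the first example), the output is:
- SplitResult(scheme='http', netloc='www.google.com', path='/search', query='hl=en&q=python&btnG=Google+Search', fragment='')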
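For the Google URL the two functions only differ in tuple length, because the path has no `;params`. A minimal Python 3 sketch with an illustrative URL (example.com and `;type=a` are made up for the demonstration) where the difference is visible:

```python
from urllib.parse import urlparse, urlsplit, urlunsplit

url = "http://example.com/files/readme.txt;type=a?lang=en#top"

p = urlparse(url)   # 6 fields: params are split off the last path segment
s = urlsplit(url)   # 5 fields: params stay inside the path

print(p.path, p.params)  # /files/readme.txt type=a
print(s.path)            # /files/readme.txt;type=a

# urlunsplit() is the inverse of urlsplit()
print(urlunsplit(s) == url)  # True
```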
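4. urlparse.urljoin(base, url[, allow_fragments])
Its main purpose is joining URLs: it builds an absolute URL by combining a base URL with another, possibly relative, URL.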
- #-*- coding:utf-8 -*-
- import urlparse
- # Test 1: relative path containing ../
- base_url = "http://motor.blog.51cto.com/blog/addblog.php"
- relative_url = "../blog/test.php"
- abs_url = urlparse.urljoin(base_url, relative_url)
- print abs_url
- # Test 2: base URL ends in a file name
- base_url_2 = "http://motor.blog.51cto.com/blog/addblog.php"
- relative_url_2 = "test.php"
- abs_url_2 = urlparse.urljoin(base_url_2, relative_url_2)
- print abs_url_2
- # Test 3: base URL ends with a slash
- base_url_3 = "http://motor.blog.51cto.com/blog/"
- relative_url_3 = "test.php"
- abs_url_3 = urlparse.urljoin(base_url_3, relative_url_3)
- print abs_url_3
- # Test 4: base URL without a trailing slash
- base_url_4 = "http://motor.blog.51cto.com/blog"
- relative_url_4 = "test.php"
- abs_url_4 = urlparse.urljoin(base_url_4, relative_url_4)
- print abs_url_4
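Results (in test 4 the base URL has no trailing slash, so urljoin() treats "blog" as a file name and replaces it):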
- http://motor.blog.51cto.com/blog/test.php
- http://motor.blog.51cto.com/blog/test.php
- http://motor.blog.51cto.com/blog/test.php
- http://motor.blog.51cto.com/test.php
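Test 4 is the usual pitfall. A minimal Python 3 sketch of a hypothetical helper, `join_under()` (not part of the standard library), that appends the missing slash to the base path before joining; it assumes the base URL names a directory, not a file. The last line also shows that a relative URL starting with `/` resets the path to the site root:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_under(base, relative):
    """Hypothetical helper: treat `base` as a directory, then call urljoin()."""
    parts = urlsplit(base)
    if not parts.path.endswith("/"):
        # add the missing trailing slash so the last segment is kept
        parts = parts._replace(path=parts.path + "/")
    return urljoin(urlunsplit(parts), relative)


base = "http://motor.blog.51cto.com/blog"
print(urljoin(base, "test.php"))         # http://motor.blog.51cto.com/test.php
print(join_under(base, "test.php"))      # http://motor.blog.51cto.com/blog/test.php
print(urljoin(base + "/", "/test.php"))  # http://motor.blog.51cto.com/test.php
```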