
2022-01-24 00:00:00 connection java jsoup


I'm creating a class using jsoup that will do the following:

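  1. The constructor opens a connection to a URL.
  2. I have a method that checks the status of the page, i.e. 200, 404, etc.
  3. I have a method that parses the page and returns a list of URLs.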


Below is a rough version of what I'm trying to do. Note that it's very rough, as I've been trying a lot of different things:

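import java.io.IOException;
import java.util.ArrayList;

import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ParsePage {
    private String path;
    Connection.Response response = null;

    private ParsePage(String langLocale) {
        try {
            // fetch the page once and keep the response
            response = Jsoup.connect(path)
                    .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                    .execute();
        } catch (IOException e) {
            System.out.println("io - " + e);
        }
    }

    public int getSitemapStatus() {
        int statusCode = response.statusCode();
        return statusCode;
    }

    public ArrayList<String> getUrls() {
        ArrayList<String> urls = new ArrayList<String>();
        // parsing the already-fetched page goes here (this is the part in question)
        return urls;
    }
}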



As you can see, I can get the page status, but I don't know how to get the document to parse using the connection already opened in the constructor. I tried:

Document doc = connection.get();


But that's a no go. Any suggestions? Or better ways to go about this?



As stated in the jsoup documentation for the Connection.Response type, there is a parse() method that parses the response's body as a Document and returns it. It works on the body that was already downloaded, so no second request is made. Once you have the Document, you can do whatever you want with it.


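import java.io.IOException;
import java.util.ArrayList;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParsePage {
   private String path; // must be set before connecting (how it is built from langLocale is not shown)
   Connection.Response response = null;

   private ParsePage(String langLocale) {
      try {
         response = Jsoup.connect(path)
            .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
            // without this, execute() throws HttpStatusException on e.g. a 404 instead of returning the response
            .ignoreHttpErrors(true)
            .execute();
      } catch (IOException e) {
         System.out.println("io - " + e);
      }
   }

   public int getSitemapStatus() {
      int statusCode = response.statusCode();
      return statusCode;
   }

   public ArrayList<String> getUrls() throws IOException {
      ArrayList<String> urls = new ArrayList<String>();
      // parse the body that was already downloaded; no second request is made
      Document doc = response.parse();
      // do whatever you want, for example retrieving the <url> entries from the sitemap
      for (Element url : doc.select("url")) {
         urls.add(url.select("loc").text()); // assumes the standard <url><loc>...</loc></url> layout
      }
      return urls;
   }
}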
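If the page is always an XML sitemap, the `<loc>` values can also be read without jsoup's HTML parser. Below is a minimal sketch that uses only the JDK's built-in DOM parser (a swapped-in technique, not jsoup's API), assuming the standard sitemap layout `<urlset><url><loc>…</loc></url></urlset>`. The class `SitemapLocs` and the method `extractLocs` are hypothetical names. The method takes the XML as a string, such as the text returned by jsoup's `response.body()`, so the page is still fetched only once.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts every <loc> value from sitemap XML text.
public class SitemapLocs {

    public static List<String> extractLocs(String sitemapXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Sitemaps do not need a DOCTYPE, so refuse one (guards against XXE).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)));

        // The factory is not namespace-aware (the default), so the unprefixed
        // tag name "loc" matches even though sitemaps declare a default namespace.
        NodeList locs = doc.getElementsByTagName("loc");
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < locs.getLength(); i++) {
            urls.add(locs.item(i).getTextContent().trim());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>https://example.com/</loc></url>"
                + "<url><loc>https://example.com/about</loc></url>"
                + "</urlset>";
        // prints [https://example.com/, https://example.com/about]
        System.out.println(extractLocs(xml));
    }
}
```

With a namespace-aware factory you would switch to `getElementsByTagNameNS` instead; keeping the default avoids that for this simple case.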
