How to extract links and titles from a .html page?

2022-01-01 00:00:00 string web-crawler hyperlink php html


I'd like to add a new feature to my website.


I would like users to be able to upload their bookmarks backup file (from any browser, if possible) so I can import it into their profile and they don't have to enter every bookmark manually.

The only part I'm missing is extracting the title and URL of each bookmark from the uploaded file. Can anyone give me a clue about where to start or what to read?

I used the search and found "How to extract data from a raw HTML file?", which is the most closely related question, but it doesn't cover this.

I really don't mind whether the solution uses jQuery or PHP.





$html = file_get_contents('bookmarks.html');
//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML. The @ is used to suppress any parsing warnings
//that will be raised if the $html string isn't well-formed markup.
@$dom->loadHTML($html);

//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

//Iterate over the extracted links and display their text and URLs
foreach ($links as $link){
    //Show the anchor text, then the "href" attribute.
    echo $link->nodeValue, ': ';
    echo $link->getAttribute('href'), '<br>';
}
This prints the anchor text and the href of every link in the .html file.
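
Since the goal is to save the bookmarks to a profile rather than print them, the same approach could collect each title and URL into an array first. The following is only a minimal sketch: the function name extractBookmarks, the array keys 'title' and 'url', and the rule of keeping only http(s) links are assumptions made for illustration, not something the question specifies.

```php
<?php
// Hypothetical helper: returns a list of ['title' => ..., 'url' => ...] pairs.
function extractBookmarks(string $html): array
{
    $dom = new DOMDocument;

    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bookmarks = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $url = trim($link->getAttribute('href'));

        // Assumption: only http(s) targets are worth importing, so entries
        // such as "place:" or "javascript:" links are skipped.
        if (!preg_match('#^https?://#i', $url)) {
            continue;
        }

        $bookmarks[] = [
            'title' => trim($link->textContent),
            'url'   => $url,
        ];
    }

    return $bookmarks;
}

// Usage with a small inline sample instead of an uploaded file.
$sample = '<DL><p><DT><A HREF="https://example.com/">Example</A>'
        . '<DT><A HREF="place:sort=8">Most Visited</A></DL>';
print_r(extractBookmarks($sample));
```

With an uploaded file, the string passed in would come from file_get_contents() on the upload's temporary path, and each returned pair could then be validated and stored however the profile data is kept.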

