How to extract links and titles from a .html page?
For my website, I'd like to add a new feature.
I would like users to be able to upload their bookmarks backup file (from any browser, if possible) so I can import it into their profile and they don't have to enter every bookmark manually.
The only part I'm missing is extracting the title and URL of each bookmark from the uploaded file. Can anyone give me a clue about where to start or what to read?
I used the search option, and "How to extract data from a raw HTML file?" is the most closely related question I found, but it doesn't cover this.
I really don't mind whether the solution uses jQuery or PHP.
Thanks a lot.
Recommended answer
Thank you everyone, I got it!
Final code:
// Read the uploaded bookmarks file into a string
$html = file_get_contents('bookmarks.html');
// Create a new DOM document
$dom = new DOMDocument;
// Parse the HTML. The @ suppresses the warnings that loadHTML()
// raises when the $html string is not well-formed markup.
@$dom->loadHTML($html);
// Get all links. You could also use any other tag name here,
// like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');
// Iterate over the extracted links and display their text and URLs
foreach ($links as $link) {
    // Show the anchor text, then the "href" attribute.
    // Both are escaped because they come from an uploaded file.
    echo htmlspecialchars($link->nodeValue), ': ';
    echo htmlspecialchars($link->getAttribute('href')), '<br>';
}
This shows the anchor text and the href of every link in a .html file.
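Since the goal was to store the bookmarks in a profile rather than print them, the same loop can also collect the results into an array. What follows is only a minimal sketch under a few assumptions: the function name extractBookmarks() and the 'title'/'url' array keys are made up for illustration, the sample markup is a hand-written stand-in for a real browser export, and libxml_use_internal_errors() is swapped in for the @ operator to keep parse warnings quiet.

```php
<?php
// Hypothetical helper (not from the original answer): returns a list of
// ['title' => ..., 'url' => ...] pairs for every <a href="..."> found.
function extractBookmarks(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml parse warnings internally instead of using @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bookmarks = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $url = trim($link->getAttribute('href'));
        if ($url === '') {
            continue; // skip anchors that have no target
        }
        $bookmarks[] = [
            'title' => trim($link->textContent),
            'url'   => $url,
        ];
    }
    return $bookmarks;
}

// Hand-written sample input, standing in for an uploaded bookmarks file.
$sample = '<DL><DT><A HREF="https://example.com/">Example</A>'
        . '<DT><A HREF="https://www.php.net/">PHP</A></DL>';

print_r(extractBookmarks($sample));
```

Each returned pair can then be validated and saved however the site stores profile data; that part depends on the application and is not shown here.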
Thanks again.