neo4j实现PageRank算法
一、什么是PageRank
PageRank,中文一般叫佩奇排名或网页排名,是利用网页简单的超链接来计算网页的分值,从而给网页进行排名的一种算法,以Google公司创办人Larry Page之姓来命名。Google用它来体现网页的相关性和重要性,在搜索引擎优化操作中是经常被用来评估网页优化的成效因素之一。
它的思想是模拟一个悠闲的上网者,上网者首先随机选择一个网页打开,然后在这个网页上呆了几分钟后,跳转到该网页所指向的链接,这样无所事事、漫无目的地在网页上跳来跳去,PageRank就是估计这个悠闲的上网者分布在各个网页上的概率。
二、neo4j实现PageRank算法例子:创建一个属性图
CREATE
(home:Page {name:'Home'}),
(about:Page {name:'About'}),
(product:Page {name:'Product'}),
(links:Page {name:'Links'}),
(a:Page {name:'Site A'}),
(b:Page {name:'Site B'}),
(c:Page {name:'Site C'}),
(d:Page {name:'Site D'}),
(home)-[:LINKS {weight: 0.2}]->(about),
(home)-[:LINKS {weight: 0.2}]->(links),
(home)-[:LINKS {weight: 0.6}]->(product),
(about)-[:LINKS {weight: 1.0}]->(home),
(product)-[:LINKS {weight: 1.0}]->(home),
(a)-[:LINKS {weight: 1.0}]->(home),
(b)-[:LINKS {weight: 1.0}]->(home),
(c)-[:LINKS {weight: 1.0}]->(home),
(d)-[:LINKS {weight: 1.0}]->(home),
(links)-[:LINKS {weight: 0.8}]->(home),
(links)-[:LINKS {weight: 0.05}]->(a),
(links)-[:LINKS {weight: 0.05}]->(b),
(links)-[:LINKS {weight: 0.05}]->(c),
(links)-[:LINKS {weight: 0.05}]->(d);
1:属性图如下 在这里插入图片描述
2:给这个属性图起一个名字
CALL gds.graph.create(
'myGraph',
'Page',
'LINKS',
{
relationshipProperties: 'weight'
})
3:可以评估一下这个属性图所占内存
CALL gds.pageRank.write.estimate('myGraph', {
writeProperty: 'pageRank',
maxIterations: 20,
dampingFactor: 0.85})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
4:现在可以调用GDS中的PageRank算法
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
5:还有stats、mutate和write执行模式,此处不在赘述 6:下面执行带权重的算法计算
CALL gds.pageRank.stream('myGraph', {
maxIterations: 20,
dampingFactor: 0.85,
relationshipWeightProperty: 'weight'})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
7:这里约束条件tolerance设为0.1,低于此值,算法结束,结果返回
CALL gds.pageRank.stream('myGraph', {
maxIterations: 20,
dampingFactor: 0.85,
tolerance: 0.1})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
8:阻尼系数dampingFactor:也就是概率值改变会引起中心度值不同,这里变为0.05,看起来和默认的0.85差不多,源于图节点比较少,还不足以引起质变
CALL gds.pageRank.stream('myGraph', {
maxIterations: 20,
dampingFactor: 0.05})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
9:个性化PageRank算法,其实就是制定一个特定的起点,重点关注这个指定点的中心度
MATCH (siteA:Page {name: 'Site A'})
CALL gds.pageRank.stream('myGraph', {
maxIterations: 20,
dampingFactor: 0.85,
sourceNodes: [siteA]})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
来源:https://mp.weixin.qq.com/s/iRnkUHo7D9Hxwri4waSG1g
相关文章