比较标签组以查找与 PHP/MySQL 的相似性/分数
如何将一组标签与我数据库中另一个帖子的标签进行比较以获取相关帖子?
我要做的是将帖子上的一组标签与另一个帖子的标签进行比较,而不是单独比较每个标签.因此,假设您想根据帖子中的标签获取真正相关的项目,然后从最相关到最不相关显示它们.无论关系级别如何,每次都必须显示三个相关项目.
What I'm trying to do is compare a group of tags on a post to another post's tags, but not each tag individually. So say you wanted to get truly related items based on tags from a post and then show them from the most related to the least related. Each time there have to be three related items shown, no matter the relationship level.
帖子 A 的标签有:建筑"、木材"、现代"、瑞士"
帖子 B 的标签有:建筑"、木材"、现代"
Post C 有标签:建筑"、现代"、石头"
帖子 D 有标签:建筑"、房子"、住宅"
Post A has the tags: "architecture", "wood", "modern", "switzerland"
Post B has the tags: "architecture", "wood", "modern"
Post C has the tags: "architecture", "modern", "stone"
Post D has the tags: "architecture", "house", "residence"
帖子 B 与帖子 A 的相关度为 75%(3 个相关标签)
帖子 C 与帖子 A 的相关度为 50%(2 个相关标签)
帖子 D 与帖子 A 的相关度为 25%(1 个相关标签)
Post B is related to post A by 75% (3 related tags)
Post C is related to post A by 50% (2 related tags)
Post D is related to post A by 25% (1 related tag)
我该怎么做?我目前正在使用 3-tables.
How can I do that? I'm currently using a 3-tables.
posts
> id
> image
> date
post_tags
> post_id
> tag_id
tags
> id
> name
我已经搜索了 Internet 和 Stack Overflow 以了解如何执行此操作.我最接近的发现是 如何查找相关项目"在 PHP 中,但它实际上并没有为我解决太多问题.
I have searched the Internet and Stack Overflow to find out how to do this. My closest find was How to find "related items" in PHP, but it actually didn't solve much for me.
推荐答案
注意:此解决方案仅适用于 MySQL,因为 MySQL 对 GROUP BY 有自己的解释
NOTE: This solution is MySQL only, as MySQL has its own interpretation of GROUP BY
我还使用了我自己的相似度计算方法.我将相同标签的数量除以帖子 A 和帖子 B 中的平均标签数.因此,如果帖子 A 有 4 个标签,而帖子 B 有 2 个与 A 共享的标签,则相似度为 66%.
I've also used my own calculation of similarity. I've taken the number of identical tags and divided it by the average tag count in post A and post B. So if post A has 4 tags, and post B has 2 tags which are both shared with A, the similarity is 66%.
(SHARED:2/((A:4 + B:2)/2)
或 (SHARED:2)/(AVG:3)
如果您想要/需要更改公式应该很容易......
It should be easy to change the formula if you want/need to...
SELECT
sourcePost.id,
targetPost.id,
/* COUNT NUMBER OF IDENTICAL TAGS */
/* REF GROUPING OF sourcePost.id and targetPost.id BELOW */
COUNT(targetPost.id) /
(
(
/* TOTAL TAGS IN SOURCE POST */
(SELECT COUNT(*) FROM post_tags WHERE post_id = sourcePost.id)
+
/* TOTAL TAGS IN TARGET POST */
(SELECT COUNT(*) FROM post_tags WHERE post_id = targetPost.id)
) / 2 /* AVERAGE TAGS IN SOURCE + TARGET */
) as similarity
FROM
posts sourcePost
LEFT JOIN
post_tags sourcePostTags ON (sourcePost.id = sourcePostTags.post_id)
INNER JOIN
post_tags targetPostTags ON (sourcePostTags.tag_id = targetPostTags.tag_id
AND
sourcePostTags.post_id != targetPostTags.post_id)
LEFT JOIN
posts targetPost ON (targetPostTags.post_id = targetPost.id)
GROUP BY
sourcePost.id, targetPost.id
相关文章