Python: sorting an array of strings by similarity

2022-03-11 00:00:00 strings arrays similarity
"""
Author: 皮蛋编程 (https://www.pidancode.com)
Created: 2022/3/18
Modified: 2022/3/18
Description: Python: sort an array of strings by similarity
"""


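def sortByGroup(lst, percent=75):
    """Group strings that match a group's first member position by position."""
    groups = []
    for item in lst:
        match = False
        for group in groups:
            parent = group[0]
            # An empty parent cannot be scored (division by zero), and an item
            # shorter than the parent cannot be compared at every index.
            if not parent or len(item) < len(parent):
                continue
            # Count the positions where parent and item hold the same character.
            points = sum(1 for a, b in zip(parent, item) if a == b)
            if (points / len(parent)) * 100 >= percent:
                group.append(item)
                group.sort()  # sorting may change which member acts as the parent
                match = True
                break  # stop at the first matching group so the item is added once
        if not match:
            groups.append([item])
    return groups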


# Test example
random = [
    'pidancode.com/',
    'frank2',
    'pidancode.com/1',
    'joe2',
    'frank1',
    'pidancode.com/2',
    'joe1',
    'joe3'
]
groups = sortByGroup(random, percent=75)
for g in groups:
    for i in g:
        print(i)
    print('-' * 30)

Output:

pidancode.com/
pidancode.com/1
pidancode.com/2
------------------------------
frank1
frank2
------------------------------
joe1
joe2
joe3
------------------------------
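
To see why these groups form, the score that sortByGroup computes can be isolated into a small helper. This is only a sketch: the name positional_score is hypothetical and does not appear in the listing above. It assumes, as the listing does in effect, that an item shorter than the parent (or an empty parent) never counts as a match.

```python
def positional_score(parent, item):
    """Percentage of positions in `parent` whose character equals the one in `item`."""
    # Hypothetical helper: treat unscorable pairs as a 0% match.
    if not parent or len(item) < len(parent):
        return 0.0
    matches = sum(1 for a, b in zip(parent, item) if a == b)
    return matches / len(parent) * 100


print(positional_score('joe2', 'joe1'))                       # 75.0, exactly at the threshold
print(positional_score('pidancode.com/', 'pidancode.com/1'))  # 100.0
print(positional_score('frank2', 'frank1'))                   # about 83.33
print(positional_score('frank2', 'joe2'))                     # 0.0, item is shorter than parent
```

With percent=75, 'joe1' joins the 'joe2' group only because the comparison is >= rather than >. Note also that the score is measured against the length of the parent, so extra trailing characters in the item (such as the '1' in 'pidancode.com/1') do not lower it.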

The sortByGroup listing above was tested under Python 3.9.
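
The position-by-position comparison only groups strings whose shared characters sit at the same indexes. As a possible alternative, and not the method used in this article, the grouping can be sketched with difflib.SequenceMatcher from the standard library, whose ratio() returns a similarity between 0 and 1. The function name group_by_ratio is hypothetical, and comparing each item against the first member of a group is an assumption carried over from the listing above.

```python
from difflib import SequenceMatcher


def group_by_ratio(lst, percent=75):
    """Group strings whose SequenceMatcher ratio to a group's first member reaches `percent`."""
    groups = []
    for item in lst:
        for group in groups:
            # Assumption: the first member of each group stands in for the whole group.
            ratio = SequenceMatcher(None, group[0], item).ratio() * 100
            if ratio >= percent:
                group.append(item)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([item])
    # Sort only at the end so the reference member stays fixed while grouping.
    return [sorted(g) for g in groups]


names = ['pidancode.com/', 'frank2', 'pidancode.com/1', 'joe2',
         'frank1', 'pidancode.com/2', 'joe1', 'joe3']
for g in group_by_ratio(names):
    print(g)
```

On the sample list this sketch yields the same three groups as the output shown earlier. Because ratio() is based on matching blocks rather than fixed indexes, it can also match strings that differ by inserted or removed characters, at the cost of more work per comparison.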
