How do you implement an association rule algorithm with a heap in Python?
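Association rule mining (Association Rules) is a method for finding interesting relationships in large data sets, and is commonly used for product association analysis in marketing and recommender systems. In Python, a heap can be used when implementing it, here to rank items by support and rules by confidence. The steps are as follows: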
- Import the heapq library, which provides the heap implementation.
import heapq
- Define a data set. Each element of the data set is an itemset (one transaction), and each element of an itemset is a product.
dataset = [
    {'pidancode.com', '皮蛋编程', 'Python教程'},
    {'pidancode.com', 'Python教程', '深度学习', '机器学习'},
    {'皮蛋编程', 'Python教程', '机器学习'},
    {'pidancode.com', '皮蛋编程', '机器学习'},
    {'pidancode.com', 'Python教程', '机器学习'},
    {'皮蛋编程', '深度学习', '机器学习'},
    {'pidancode.com', 'Python教程'},
]
- Compute the support of each product, i.e. the fraction of all itemsets in which that product appears.
total_transactions = len(dataset)
min_support = 2  # minimum support as an absolute count: a product must appear in at least two itemsets
counts = {}
for transaction in dataset:
    for item in transaction:
        counts[item] = counts.get(item, 0) + 1
# Keep only the frequent products and store their support as a fraction of all itemsets.
supports = {item: count / total_transactions
            for item, count in counts.items() if count >= min_support}
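- Put the products whose support meets the minimum support into a heap ordered by support. heapq implements a min-heap, so each support is negated to make the largest support pop first.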
support_heap = [(-support, item) for item, support in supports.items()]
heapq.heapify(support_heap)
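- For each itemset, derive the association rules between the products it contains, and put the rules into a heap ordered by confidence. The confidence of a rule A => B is the fraction of the itemsets containing A that also contain B.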
from itertools import permutations
confidence_min = 0.5  # minimum confidence: keep rules whose confidence is at least 50%
# Count the itemsets that contain each ordered pair of frequent products.
pair_counts = {}
for transaction in dataset:
    items = [item for item in transaction if item in supports]
    for pair in permutations(items, 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1
# Each rule has a single product on each side: lhs => rhs.
rules_heap = []
for (lhs, rhs), pair_count in pair_counts.items():
    # confidence(lhs => rhs) = count(lhs and rhs) / count(lhs)
    confidence = pair_count / counts[lhs]
    if confidence >= confidence_min:
        # Negate the confidence so the min-heap pops the highest confidence first.
        rules_heap.append((-confidence, (lhs, rhs)))
heapq.heapify(rules_heap)
- Pop the products from the heap in order of decreasing support and the rules in order of decreasing confidence, and print them.
print("Frequent items:")
while support_heap:
    support, item = heapq.heappop(support_heap)
    # Supports were stored negated, so negate again before printing.
    print(f"{item}: {-support:.2%}")
print("\nAssociation rules:")
while rules_heap:
    confidence, (lhs, rhs) = heapq.heappop(rules_heap)
    # Confidences were stored negated, so negate again before printing.
    print(f"{lhs} => {rhs}: {-confidence:.2%}")
With the data set above, the code prints the following (entries with equal scores are ordered by comparing the product names):
Frequent items:
Python教程: 71.43%
pidancode.com: 71.43%
机器学习: 71.43%
皮蛋编程: 57.14%
深度学习: 28.57%

Association rules:
深度学习 => 机器学习: 100.00%
Python教程 => pidancode.com: 80.00%
pidancode.com => Python教程: 80.00%
皮蛋编程 => 机器学习: 75.00%
Python教程 => 机器学习: 60.00%
pidancode.com => 机器学习: 60.00%
机器学习 => Python教程: 60.00%
机器学习 => pidancode.com: 60.00%
机器学习 => 皮蛋编程: 60.00%
深度学习 => Python教程: 50.00%
深度学习 => pidancode.com: 50.00%
深度学习 => 皮蛋编程: 50.00%
皮蛋编程 => Python教程: 50.00%
皮蛋编程 => pidancode.com: 50.00%
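If only the few strongest rules are needed, the steps above can be packaged into one function that asks the heap for the top k entries instead of popping everything. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `top_pair_rules`, its parameters, and the `baskets` sample data are hypothetical and introduced only for illustration. It is limited to rules with a single product on each side, and it relies on `heapq.nlargest`, which returns the k largest elements in descending order.

```python
import heapq
from collections import Counter
from itertools import permutations


def top_pair_rules(transactions, min_count=2, min_confidence=0.5, k=3):
    """Return the k highest-confidence rules as (confidence, lhs, rhs) tuples.

    Hypothetical helper for illustration; rules have one product per side.
    """
    # Number of transactions that contain each product.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, n in item_counts.items() if n >= min_count}

    # Number of transactions that contain each ordered pair of frequent products.
    pair_counts = Counter()
    for t in transactions:
        pair_counts.update(permutations(sorted(set(t) & frequent), 2))

    # confidence(lhs => rhs) = count(lhs and rhs) / count(lhs)
    rules = [
        (n / item_counts[lhs], lhs, rhs)
        for (lhs, rhs), n in pair_counts.items()
        if n / item_counts[lhs] >= min_confidence
    ]
    # nlargest selects the k largest tuples, so no negation trick is needed here.
    return heapq.nlargest(k, rules)


# Toy data, assumed for this example only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
for confidence, lhs, rhs in top_pair_rules(baskets, k=3):
    print(f"{lhs} => {rhs}: {confidence:.2%}")
```

On the toy baskets, the top-ranked rule is butter => bread with a confidence of 100%, because both baskets containing butter also contain bread. Compared with heapifying every rule and popping them all, `heapq.nlargest` is a reasonable choice when k is small relative to the number of candidate rules.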