如何使用 Python 堆实现 Top K 问题？

2023-04-11 00:00:00 python 如何使用 TOP

使用 Python 堆可以非常方便地解决 Top K 问题。以下是详细的解释和示例代码：

导入 heapq 模块，它是 Python 内置的堆实现。如果需要使用字符串作为测试数据，可以定义一个字符串列表：

import heapq

strings = ['pidancode.com', 'hello', 'world', 'python', 'heapq', 'code', '皮蛋编程']
k = 3

调用 heapq.nlargest 函数，可以轻松地找出列表中前 k 个最大值。传递的参数包括 k、列表以及一个关键字函数，它用于指定以何种方式比较元素大小。对于字符串，可以使用 len 函数：

topk = heapq.nlargest(k, strings, key=len)
print(topk)  # ['pidancode.com', '皮蛋编程', 'python']

如果要自己实现一个堆，可以使用 heapq 对象的函数，如 heappush 和 heappop。这里只需要定义一个空列表，然后将元素插入堆中，直到堆的大小达到 k，然后弹出堆顶元素，直到堆中只有 k 个元素为止：

heap = []
for s in strings:
    heapq.heappush(heap, s)
    if len(heap) > k:
        heapq.heappop(heap)
print(heap)  # ['world', 'heapq', 'python']

这段代码首先定义了一个空列表 heap，然后遍历字符串列表并将元素插入堆中。如果堆的大小超过了 k，就弹出堆顶元素，直到堆中只有 k 个元素。最后的结果就是 top k 元素组成的列表。注意，这个实现并不具有普适性，只能在找出最大/最小的 k 个元素时使用。

完整代码如下：

import heapq

strings = ['pidancode.com', 'hello', 'world', 'python', 'heapq', 'code', '皮蛋编程']
k = 3

# 使用 heapq.nlargest
topk = heapq.nlargest(k, strings, key=len)
print(topk)  # ['pidancode.com', '皮蛋编程', 'python']

# 使用自己实现的堆
heap = []
for s in strings:
    heapq.heappush(heap, s)
    if len(heap) > k:
        heapq.heappop(heap)
print(heap)  # ['world', 'heapq', 'python']

输出结果：

['pidancode.com', '皮蛋编程', 'python']
['world', 'heapq', 'python']

相关文章