How can std::make_heap be implemented while making at most 3N comparisons?
I looked into the C++0x standard and found the requirement that make_heap should do no more than 3*N comparisons.
I.e., heapifying an unordered collection can be done in O(N).
/* @brief Construct a heap over a range using comparison functor.
Why is this?
The source gives me no clues (g++ 4.4.3)
The while (true) + __parent == 0 are not clues but rather a guess for O(N) behaviour
template<typename _RandomAccessIterator, typename _Compare>
  void
  make_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
            _Compare __comp)
  {
    const _DistanceType __len = __last - __first;
    _DistanceType __parent = (__len - 2) / 2;
    while (true)
      {
        _ValueType __value = _GLIBCXX_MOVE(*(__first + __parent));
        std::__adjust_heap(__first, __parent, __len, _GLIBCXX_MOVE(__value),
                           __comp);
        if (__parent == 0)
          return;
        __parent--;
      }
  }
__adjust_heap looks like a log N method:
while ( __secondChild < (__len - 1) / 2)
  {
    __secondChild = 2 * (__secondChild + 1);
That is a bog-standard log N loop to me.
template<typename _RandomAccessIterator, typename _Distance,
         typename _Tp, typename _Compare>
  void
  __adjust_heap(_RandomAccessIterator __first, _Distance __holeIndex,
                _Distance __len, _Tp __value, _Compare __comp)
  {
    const _Distance __topIndex = __holeIndex;
    _Distance __secondChild = __holeIndex;
    while (__secondChild < (__len - 1) / 2)
      {
        __secondChild = 2 * (__secondChild + 1);
        if (__comp(*(__first + __secondChild),
                   *(__first + (__secondChild - 1))))
          __secondChild--;
        *(__first + __holeIndex) = _GLIBCXX_MOVE(*(__first + __secondChild));
        __holeIndex = __secondChild;
      }
    if ((__len & 1) == 0 && __secondChild == (__len - 2) / 2)
      {
        __secondChild = 2 * (__secondChild + 1);
        *(__first + __holeIndex) = _GLIBCXX_MOVE(*(__first
                                                   + (__secondChild - 1)));
        __holeIndex = __secondChild - 1;
      }
    std::__push_heap(__first, __holeIndex, __topIndex,
                     _GLIBCXX_MOVE(__value), __comp);
  }
Any clues as to why this takes at most 3N comparisons would be appreciated.
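Experimental results:
This actual implementation uses
- <2N comparisons for heapifying heaps
- <1.5N comparisons for heapifying heaps in the reverse order.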
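An experiment of this kind can be reproduced by handing std::make_heap a comparator that counts its own invocations. The following is only a sketch: the helper names (count_make_heap_comparisons, ascending, descending, shuffled) are made up for illustration, and the exact counts depend on the standard library in use, so none are quoted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of the standard library): builds a max-heap
// with std::make_heap and returns how often the comparator was invoked.
std::size_t count_make_heap_comparisons(std::vector<int> values) {
    std::size_t comparisons = 0;
    std::make_heap(values.begin(), values.end(),
                   [&comparisons](int a, int b) {
                       ++comparisons;  // one call == one comparison
                       return a < b;
                   });
    assert(std::is_heap(values.begin(), values.end()));
    return comparisons;
}

// Illustrative inputs of size n: 0, 1, ..., n-1 in ascending order.
std::vector<int> ascending(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}

// Descending order, which is already a valid max-heap layout.
std::vector<int> descending(std::size_t n) {
    std::vector<int> v = ascending(n);
    std::reverse(v.begin(), v.end());
    return v;
}

// A reproducible random permutation (the seed is an arbitrary choice).
std::vector<int> shuffled(std::size_t n, unsigned seed) {
    std::vector<int> v = ascending(n);
    std::mt19937 rng(seed);
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}
```

Comparing the returned count against 3 * n for each kind of input gives a quick check of the bound on a particular toolchain.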
Recommended answer
A binary heap over n elements can be created in O(n) time using a clever algorithm and a clever analysis. In what follows I'm just going to talk about how this works assuming that you have explicit nodes and explicit left and right child pointers, but this analysis is still perfectly valid once you compress it into an array.
The algorithm works as follows. Start off by taking about half of the nodes and treating them as singleton max-heaps - since there's only one element, the tree containing just that element must automatically be a max-heap. Now, take these trees and pair them off with one another. For each pair of trees, take one of the values that you haven't used yet and execute the following algorithm:
Make the new node the root of the heap, having its left and right child pointers refer to the two max-heaps.
While this node has a child that's larger than it, swap the node with its larger child.
My claim is that this procedure ends up producing a new max-heap containing the elements of the two input max-heaps, and it does so in time O(h), where h is the height of the two heaps. The proof is an induction on the height of the heaps. As a base case, if the subheaps have size zero, then the algorithm terminates immediately with a singleton max-heap, and it does so in O(1) time. For the inductive step, assume that for some h, this procedure works on any subheaps of height h and consider what happens when you execute it on two heaps of height h + 1. When we add a new root to join together two subtrees of height h + 1, there are three possibilities:
The new root is larger than the roots of both subtrees. Then in this case we have a new max-heap, since the root is larger than any of the nodes in either subtree (by transitivity)
The new root is larger than one child and smaller than the other. Then we swap the root with the larger child and recursively execute this procedure again, using the old root and the child's two subtrees, each of which is of height h. By the inductive hypothesis, this means that the subtree we swapped into is now a max-heap. Thus the overall heap is a max-heap, since the new root is larger than everything in the subtree we swapped with (since it's larger than the node we added and was already larger than everything in that subtree), and it's also larger than everything in the other subtree (since it's larger than the root and the root was larger than everything in the other subtree).
The new root is smaller than both its children. Then using a slightly modified version of the above analysis, we can show that the resulting tree is indeed a heap.
Moreover, since at each step the heights of the child heaps decrease by one, the overall runtime for this algorithm must be O(h).
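The joining procedure analysed above can be sketched with explicit nodes and child pointers. This is only an illustration under assumptions: the Node type and the functions join_heaps and is_max_heap are hypothetical names, and values are swapped between nodes rather than relinking pointers, which keeps the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical explicit-node representation; all names are illustrative.
struct Node {
    int value;
    Node* left;
    Node* right;
    explicit Node(int v, Node* l = nullptr, Node* r = nullptr)
        : value(v), left(l), right(r) {}
};

// Makes `root` the parent of two max-heaps, then sifts its value down.
// Returns the number of value comparisons made (at most two per level).
std::size_t join_heaps(Node* root, Node* left, Node* right) {
    root->left = left;
    root->right = right;
    std::size_t comparisons = 0;
    Node* current = root;
    while (true) {
        // Pick the larger child, if the node has any children at all.
        Node* larger = current->left;
        if (larger == nullptr) {
            larger = current->right;
        } else if (current->right != nullptr) {
            ++comparisons;
            if (current->right->value > larger->value) larger = current->right;
        }
        if (larger == nullptr) break;                // reached a leaf
        ++comparisons;
        if (larger->value <= current->value) break;  // heap property holds
        std::swap(current->value, larger->value);    // move one level down
        current = larger;
    }
    return comparisons;
}

// Checks the max-heap property recursively (for verification only).
bool is_max_heap(const Node* node) {
    if (node == nullptr) return true;
    if (node->left && node->left->value > node->value) return false;
    if (node->right && node->right->value > node->value) return false;
    return is_max_heap(node->left) && is_max_heap(node->right);
}
```

Each loop iteration descends one level, so the work is proportional to the height of the heaps being joined, matching the O(h) claim.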
At this point, we have a simple algorithm for making a heap:
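- Take about half of the nodes and create singleton heaps. (You can compute explicitly how many nodes are needed here, but it's about half.)
- Pair those heaps off, then merge each pair together using one of the unused nodes and the procedure above.
- Repeat the previous step until only a single heap remains.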
Since at each step we know that the heaps we have so far are valid max-heaps, eventually this produces a valid overall max-heap. If we're clever with how we pick how many singleton heaps to make, this will end up creating a complete binary tree as well.
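Once the tree is compressed into an array, the construction described above is commonly written as a loop that sifts down every non-leaf position, from the last parent back to the root. The sketch below assumes the usual layout in which the children of index i sit at 2i + 1 and 2i + 2; the names sift_down and build_max_heap are illustrative, and this is not the libstdc++ implementation quoted in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sifts the element at index `hole` down within heap[0, n).
// Returns the number of element comparisons made (at most two per level).
std::size_t sift_down(std::vector<int>& heap, std::size_t hole, std::size_t n) {
    std::size_t comparisons = 0;
    while (true) {
        std::size_t child = 2 * hole + 1;            // left child
        if (child >= n) break;                       // hole is a leaf
        if (child + 1 < n) {
            ++comparisons;
            if (heap[child] < heap[child + 1]) ++child;  // right child is larger
        }
        ++comparisons;
        if (!(heap[hole] < heap[child])) break;      // heap property holds
        std::swap(heap[hole], heap[child]);
        hole = child;
    }
    return comparisons;
}

// Bottom-up construction: the leaves are the singleton heaps, and each
// internal node joins the two heaps below it. Returns total comparisons.
std::size_t build_max_heap(std::vector<int>& values) {
    const std::size_t n = values.size();
    std::size_t comparisons = 0;
    for (std::size_t i = n / 2; i-- > 0;) {          // last parent down to root
        comparisons += sift_down(values, i, n);
    }
    return comparisons;
}
```

The returned count makes it easy to compare the work done against 3n for any particular input.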
However, it seems like this should run in O(n lg n) time, since we do O(n) merges, each of which runs in O(h), and in the worst case the height of the trees we're merging is O(lg n). But this bound is not tight and we can do a lot better by being more precise with the analysis.
In particular, let's think about how deep all the trees we merge are. About half the heaps have depth zero, then half of what's left has depth one, then half of what's left has depth two, etc. If we sum this up, we get the sum
$$0 \cdot \frac{n}{2} + 1 \cdot \frac{n}{4} + 2 \cdot \frac{n}{8} + \dots + k \cdot \frac{n}{2^{k+1}} + \dots = \sum_{k=0}^{\lceil \log n \rceil} \frac{nk}{2^{k+1}} = n \sum_{k=0}^{\lceil \log n \rceil} \frac{k}{2^{k+1}}$$
This upper-bounds the number of swaps made. Each swap requires at most two comparisons. Therefore, if we multiply the above sum by two, we get the following summation, which upper-bounds the number of comparisons made:
$$n \sum_{k=0}^{\infty} \frac{k}{2^k}$$
The summation here is $0/2^0 + 1/2^1 + 2/2^2 + 3/2^3 + \dots$. This is a famous summation that can be evaluated in multiple different ways. One way to evaluate it is given in these lecture slides, slides 45-47. The series itself sums to exactly 2, so the bound comes out to exactly 2n, which means that the number of comparisons that end up getting made is certainly bounded from above by 3n.
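For readers without access to the slides, here is one short way to evaluate that series. The symbol $S$ is shorthand introduced only for this derivation; it subtracts half of the series from itself so that a geometric series remains.

```latex
\begin{aligned}
S &= \sum_{k=0}^{\infty} \frac{k}{2^{k}}, \\
\frac{S}{2} &= \sum_{k=0}^{\infty} \frac{k}{2^{k+1}}
             = \sum_{k=1}^{\infty} \frac{k-1}{2^{k}}, \\
S - \frac{S}{2} &= \sum_{k=1}^{\infty} \frac{k-(k-1)}{2^{k}}
                 = \sum_{k=1}^{\infty} \frac{1}{2^{k}} = 1, \\
S &= 2 .
\end{aligned}
```

Multiplying by n gives the 2n figure for the comparison bound.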
Hope this helps!