Throwing the fattest people off of an overloaded airplane.

2022-01-07 00:00:00 algorithm sorting c++ stl

Let's say you've got an airplane, and it is low on fuel. Unless the plane drops 3000 pounds of passenger weight, it will not be able to reach the next airport. To save the maximum number of lives, we would like to throw the heaviest people off of the plane first.

And oh yeah, there are millions of people on the airplane, and we would like an optimal algorithm to find the heaviest passengers, without necessarily sorting the entire list.

This is a proxy problem for something I'm trying to code in C++. I would like to do a "partial_sort" on the passenger manifest by weight, but I don't know how many elements I'm going to need. I could implement my own "partial_sort" algorithm ("partial_sort_accumulate_until"), but I'm wondering if there's any easier way to do this using standard STL.

Recommended answer

One way would be to use a min heap (in C++, std::priority_queue with a std::greater-style comparator, since it is a max heap by default). Here's how you'd do it, assuming you had a MinHeap class. (Yes, my example is in C#. I think you get the idea.)

int targetTotal = 3000;
int totalWeight = 0;
// this creates an empty heap!
var myHeap = new MinHeap<Passenger>(/* need comparer here to order by weight */);
foreach (var pass in passengers)
{
    if (totalWeight < targetTotal)
    {
        // unconditionally add this passenger
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
    else if (pass.Weight > myHeap.Peek().Weight)
    {
        // If this passenger is heavier than the lightest
        // passenger already on the heap,
        // then remove the lightest passenger and add this one
        var oldPass = myHeap.RemoveFirst();
        totalWeight -= oldPass.Weight;
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
}

// At this point, the heaviest people are on the heap,
// but there might be too many of them.
// Remove the lighter people until we have the minimum necessary
while ((totalWeight - myHeap.Peek().Weight) >= targetTotal)
{
    var oldPass = myHeap.RemoveFirst();
    totalWeight -= oldPass.Weight;
}
// The heap now contains the passengers who will be thrown overboard.
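
Since the question asks about C++, the same loop can be sketched with std::priority_queue. This is a minimal sketch rather than a drop-in implementation: the Passenger struct, the HeavierThan comparator, and the selectHeaviest function are names invented here for illustration, and weights are assumed to be positive integers. The final trimming loop uses >= so that it stops as soon as the remaining passengers still add up to the target.

```cpp
#include <queue>
#include <vector>

// Hypothetical passenger record; only the weight matters here.
struct Passenger {
    int id;
    int weight;  // pounds, assumed positive
};

// std::priority_queue is a max heap by default. Ordering by "greater weight"
// puts the lightest passenger at top(), which turns it into a min heap.
struct HeavierThan {
    bool operator()(const Passenger& a, const Passenger& b) const {
        return a.weight > b.weight;
    }
};

// Returns the heaviest passengers whose combined weight reaches targetTotal,
// lightest first (everyone, if the whole manifest weighs less than the target).
std::vector<Passenger> selectHeaviest(const std::vector<Passenger>& passengers,
                                      int targetTotal) {
    std::priority_queue<Passenger, std::vector<Passenger>, HeavierThan> heap;
    long long totalWeight = 0;

    for (const Passenger& p : passengers) {
        if (totalWeight < targetTotal) {
            // Not enough weight collected yet: add unconditionally.
            heap.push(p);
            totalWeight += p.weight;
        } else if (!heap.empty() && p.weight > heap.top().weight) {
            // Heavier than the lightest passenger on the heap: swap them.
            totalWeight -= heap.top().weight;
            heap.pop();
            heap.push(p);
            totalWeight += p.weight;
        }
    }

    // Drop the lightest passengers while the rest still reach the target.
    while (!heap.empty() && totalWeight - heap.top().weight >= targetTotal) {
        totalWeight -= heap.top().weight;
        heap.pop();
    }

    // Drain the heap into a vector (ascending by weight).
    std::vector<Passenger> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

Calling selectHeaviest(manifest, 3000) on a std::vector<Passenger> would return the passengers to remove. The heap never holds more passengers than were needed to reach the target during the initial fill, so memory use stays small even for a very long manifest.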

According to the standard references, running time should be proportional to n log k, where n is the number of passengers and k is the maximum number of items on the heap. If we assume that passengers' weights will typically be 100 lbs or more, then it's unlikely that the heap will contain more than 30 items at any time.

The worst case would be if the passengers are presented in order from lowest weight to highest. That would require that every passenger be added to the heap, and every passenger be removed from the heap. Still, with a million passengers and assuming that the lightest weighs 100 lbs, the n log k works out to a reasonably small number.
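
As a rough worked estimate of that figure, assuming every passenger weighs at least 100 lbs and reading the logarithm as base 2 (both are assumptions made here for the arithmetic):

```latex
k \le \frac{3000}{100} = 30, \qquad
n \log_2 k \le 10^{6} \cdot \log_2 30 \approx 10^{6} \cdot 4.9 = 4.9 \times 10^{6}
```

So even in the worst case the work is on the order of five million elementary heap steps, up to a constant factor for the add and the remove.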

If you get the passengers' weights randomly, performance is much better. I use something quite like this for a recommendation engine (I select the top 200 items from a list of several million). I typically end up with only 50,000 or 70,000 items actually added to the heap.

I suspect that you'll see something quite similar: the majority of your candidates will be rejected because they're lighter than the lightest person already on the heap. And Peek is an O(1) operation.
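
One way to check this on your own data is to count how many items actually reach the heap. The sketch below is an illustration only: TopKStats and topKWithStats are hypothetical names, and it selects the k largest ints rather than passengers. Every item that is not pushed was rejected after a single comparison with top().

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Result of a top-k pass: the selected values plus a count of how many
// inputs were actually pushed onto the heap.
struct TopKStats {
    std::vector<int> values;  // the k largest values, ascending
    std::size_t pushes;       // number of heap insertions performed
};

TopKStats topKWithStats(const std::vector<int>& items, std::size_t k) {
    // Min heap: top() is the smallest of the values kept so far.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    std::size_t pushes = 0;

    for (int x : items) {
        if (heap.size() < k) {
            // Heap not full yet: add unconditionally.
            heap.push(x);
            ++pushes;
        } else if (k > 0 && x > heap.top()) {
            // Larger than the smallest kept value: replace it.
            heap.pop();
            heap.push(x);
            ++pushes;
        }
        // Otherwise the item is rejected after one O(1) comparison.
    }

    TopKStats out{{}, pushes};
    while (!heap.empty()) {
        out.values.push_back(heap.top());
        heap.pop();
    }
    return out;
}
```

Comparing pushes with items.size() shows how much of the input was rejected outright. Input that arrives in ascending order is the bad case, since every item then goes through the heap.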

For more information about the performance of heap select and quick select, see "When theory meets practice". Short version: if you're selecting fewer than 1% of the total number of items, then heap select is a clear winner over quick select. Above 1%, use quick select or a variant like Introselect.
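
For the selection-based alternative, the standard library's std::nth_element is the usual starting point; it is commonly implemented as an introselect-style algorithm, though the standard only specifies its complexity. The sketch below assumes k is known in advance and that the whole list fits in memory, which is not quite the airplane problem (where the count is unknown), and topKBySelection is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the k largest values in unspecified order.
// Works on a copy of the input (taken by value) and rearranges it with
// std::nth_element, which runs in linear time on average.
std::vector<int> topKBySelection(std::vector<int> items, std::size_t k) {
    if (k >= items.size()) {
        return items;  // everything is selected
    }
    // With std::greater, the k largest values end up in the first k slots.
    std::nth_element(items.begin(),
                     items.begin() + static_cast<std::ptrdiff_t>(k),
                     items.end(),
                     std::greater<int>());
    items.resize(k);
    return items;
}
```

Unlike the heap approach, this touches and rearranges the entire list, which is part of why it tends to pay off only when a larger fraction of the items is being selected.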
