`std::list<>::sort()` - why the sudden switch to a top-down strategy?
I remember that since the beginning of time the most popular approach to implementing std::list<>::sort()
was the classic Merge Sort algorithm implemented in bottom-up fashion (see also What makes the gcc std::list sort implementation so fast?).
I remember seeing someone aptly refer to this strategy as "onion chaining" approach.
At least that's the way it is in GCC's implementation of the C++ standard library (see, for example, here). And this is how it was in the old Dinkumware STL in the MSVC version of the standard library, as well as in all versions of MSVC all the way to VS2013.
However, the standard library supplied with VS2015 suddenly no longer follows this sorting strategy. The library shipped with VS2015 uses a rather straightforward recursive implementation of top-down Merge Sort. This strikes me as strange, since top-down approach requires access to the mid-point of the list in order to split it in half. Since std::list<>
does not support random access, the only way to find that mid-point is to literally iterate through half of the list. Also, at the very beginning it is necessary to know the total number of elements in the list (which was not necessarily an O(1) operation before C++11).
Nevertheless, std::list<>::sort()
in VS2015 does exactly that. Here's an excerpt from that implementation that locates the mid-point and performs recursive calls
...
iterator _Mid = _STD next(_First, _Size / 2);
_First = _Sort(_First, _Mid, _Pred, _Size / 2);
_Mid = _Sort(_Mid, _Last, _Pred, _Size - _Size / 2);
...
As you can see, they just nonchalantly use std::next
to walk through the first half of the list and arrive at _Mid
iterator.
What could be the reason behind this switch, I wonder? All I see is a seemingly obvious inefficiency of repetitive calls to std::next
at each level of recursion. Naive logic says that this is slower. If they are willing to pay this kind of price, they probably expect to get something in return. What are they getting then? I don't immediately see this algorithm as having better cache behavior (compared to the original bottom-up approach). I don't immediately see it as behaving better on pre-sorted sequences.
Granted, since C++11 std::list<>
is basically required to store its element count, which makes the above slightly more efficient, since we always know the element count in advance. But that still does not seem to be enough to justify the sequential scan on each level of recursion.
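For concreteness, the top-down strategy being described can be sketched as follows. This is a simplified, self-contained illustration (not the actual VS2015 source, which works through internal interfaces); it shows the std::next call whose cost is at issue, repeated at every level of recursion:

```cpp
#include <iterator>
#include <list>

// Simplified top-down merge sort for std::list. Finding the midpoint
// requires walking half the list with std::next at each recursion level.
template <typename T>
void TopDownSort(std::list<T>& ll)
{
    if (ll.size() < 2)
        return;
    // the sequential scan in question: O(n/2) at every level
    auto mid = std::next(ll.begin(), ll.size() / 2);
    std::list<T> right;
    right.splice(right.begin(), ll, mid, ll.end()); // split in half
    TopDownSort(ll);     // sort left half
    TopDownSort(right);  // sort right half
    ll.merge(right);     // merge halves (splices nodes, no copies)
}
```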
(Admittedly, I haven't tried to race the implementations against each other. Maybe there are some surprises there.)
Solution

Note this answer has been updated to address all of the issues mentioned in the comments below and after the question, by making the same change from an array of lists to an array of iterators, while retaining the faster bottom-up merge sort algorithm, and eliminating the small chance of stack overflow due to recursion with the top-down merge sort algorithm.
The reason I didn't originally consider iterators was the VS2015 change to top down, which led me to believe there was some issue with trying to change the existing bottom-up algorithm to use iterators, requiring a switch to the slower top-down algorithm. It was only when I tried to analyze the switch to iterators myself that I realized there was a solution for the bottom-up algorithm.
In @sbi's comment, he asked the author of the top down approach, Stephan T. Lavavej, why the change was made. Stephan's response was "to avoid memory allocation and default constructing allocators". VS2015 introduced non-default-constructible and stateful allocators, which presents an issue when using the prior version's array of lists, as each instance of a list allocates a dummy node, and a change would be needed to handle no default allocator.
Lavavej's solution was to switch to using iterators to keep track of run boundaries within the original list instead of an internal array of lists. The merge logic was changed to use 3 iterator parameters, 1st parameter is iterator to start of left run, 2nd parameter is iterator to end of left run == iterator to start of right run, 3rd parameter is iterator to end of right run. The merge process uses std::list::splice to move nodes within the original list during merge operations. This has the added benefit of being exception safe. If a caller's compare function throws an exception, the list will be re-ordered, but no loss of data will occur (assuming splice can't fail). With the prior scheme, some (or most) of the data would be in the internal array of lists if an exception occurred, and data would be lost from the original list.
However, the switch to top-down merge sort was not needed. Initially, thinking there was some reason unknown to me for VS2015's switch to top down, I focused on using the internal interfaces in the same manner as std::list::splice. I later decided to investigate switching bottom up to use an array of iterators. I realized the order of runs stored in the internal array was newest (array[0] = rightmost) to oldest (array[last] = leftmost), and that it could use the same iterator-based merge logic as VS2015's top-down approach.
For bottom up merge sort, array[i] is an iterator to the start of a sorted sub-list with 2^i nodes, or it is empty (using std::list::end to indicate empty). The end of each sorted sub-list will be the start of a sorted sub-list in the next prior non-empty entry in the array, or if at the start of the array, in a local iterator (it points to end of newest run). Similar to the top down approach, the array of iterators is only used to keep track of sorted run boundaries within the original linked list, while the merge process uses std::list::splice to move nodes within the original linked list.
If a linked list is large and the nodes scattered, there will be a lot of cache misses. Bottom up will be about 30% faster than top down (equivalent to stating that top down is about 42% slower than bottom up). Then again, if there's enough memory, it would usually be faster to move the list to an array or vector, sort the array or vector, then create a new list from the sorted array or vector.
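A minimal sketch of that list-to-vector round trip (assuming elements are cheap to copy; this variant reuses the existing nodes via assign rather than building a brand-new list):

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Copy the list into contiguous storage, sort there (cache friendly),
// then write the sorted values back over the existing list nodes.
template <typename T>
void SortViaVector(std::list<T>& ll)
{
    std::vector<T> v(ll.begin(), ll.end());
    std::sort(v.begin(), v.end());
    ll.assign(v.begin(), v.end());
}
```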
Example C++ code:
#define ASZ 32
template <typename T>
void SortList(std::list<T> &ll)
{
if (ll.size() < 2) // return if nothing to do
return;
    typename std::list<T>::iterator ai[ASZ]; // array of iterators
    typename std::list<T>::iterator mi;      // middle iterator (end lft, bgn rgt)
    typename std::list<T>::iterator ei;      // end iterator
size_t i;
for (i = 0; i < ASZ; i++) // "clear" array
ai[i] = ll.end();
// merge nodes into array
for (ei = ll.begin(); ei != ll.end();) {
mi = ei++;
for (i = 0; (i < ASZ) && ai[i] != ll.end(); i++) {
mi = Merge(ll, ai[i], mi, ei);
ai[i] = ll.end();
}
if(i == ASZ)
i--;
ai[i] = mi;
}
// merge array into single list
ei = ll.end();
for(i = 0; (i < ASZ) && ai[i] == ei; i++);
mi = ai[i++];
while(1){
for( ; (i < ASZ) && ai[i] == ei; i++);
if (i == ASZ)
break;
mi = Merge(ll, ai[i++], mi, ei);
}
}
template <typename T>
typename std::list<T>::iterator Merge(std::list<T> &ll,
typename std::list<T>::iterator li,
typename std::list<T>::iterator mi,
typename std::list<T>::iterator ei)
{
    typename std::list<T>::iterator ni = (*mi < *li) ? mi : li;
while(1){
if(*mi < *li){
ll.splice(li, ll, mi++);
if(mi == ei)
return ni;
} else {
if(++li == mi)
return ni;
}
}
}
Example replacement code for VS2019's std::list::sort() (the merge logic was made into a separate internal function, since it's now used in two places).
private:
template <class _Pr2>
iterator _Merge(_Pr2 _Pred, iterator _First, iterator _Mid, iterator _Last){
iterator _Newfirst = _First;
for (bool _Initial_loop = true;;
_Initial_loop = false) { // [_First, _Mid) and [_Mid, _Last) are sorted and non-empty
if (_DEBUG_LT_PRED(_Pred, *_Mid, *_First)) { // consume _Mid
if (_Initial_loop) {
_Newfirst = _Mid; // update return value
}
splice(_First, *this, _Mid++);
if (_Mid == _Last) {
return _Newfirst; // exhausted [_Mid, _Last); done
}
}
else { // consume _First
++_First;
if (_First == _Mid) {
return _Newfirst; // exhausted [_First, _Mid); done
}
}
}
}
template <class _Pr2>
void _Sort(iterator _First, iterator _Last, _Pr2 _Pred,
size_type _Size) { // order [_First, _Last), using _Pred, return new first
// _Size must be distance from _First to _Last
if (_Size < 2) {
return; // nothing to do
}
const size_t _ASZ = 32; // array size
iterator _Ai[_ASZ]; // array of iterators to runs
iterator _Mi; // middle iterator
iterator _Li; // last (end) iterator
size_t _I; // index to _Ai
for (_I = 0; _I < _ASZ; _I++) // "empty" array
_Ai[_I] = _Last; // _Ai[] == _Last => empty entry
// merge nodes into array
for (_Li = _First; _Li != _Last;) {
_Mi = _Li++;
for (_I = 0; (_I < _ASZ) && _Ai[_I] != _Last; _I++) {
_Mi = _Merge(_Pass_fn(_Pred), _Ai[_I], _Mi, _Li);
_Ai[_I] = _Last;
}
if (_I == _ASZ)
_I--;
_Ai[_I] = _Mi;
}
// merge array runs into single run
for (_I = 0; _I < _ASZ && _Ai[_I] == _Last; _I++);
_Mi = _Ai[_I++];
while (1) {
for (; _I < _ASZ && _Ai[_I] == _Last; _I++);
if (_I == _ASZ)
break;
_Mi = _Merge(_Pass_fn(_Pred), _Ai[_I++], _Mi, _Last);
}
}
The remainder of this answer is historical.
I was able to reproduce the issue (old sort fails to compile, new one works) based on a demo from @IgorTandetnik:
#include <iostream>
#include <list>
#include <memory>
template <typename T>
class MyAlloc : public std::allocator<T> {
public:
MyAlloc(T) {} // suppress default constructor
template <typename U>
MyAlloc(const MyAlloc<U>& other) : std::allocator<T>(other) {}
template< class U > struct rebind { typedef MyAlloc<U> other; };
};
int main()
{
std::list<int, MyAlloc<int>> l(MyAlloc<int>(0));
l.push_back(3);
l.push_back(0);
l.push_back(2);
l.push_back(1);
l.sort();
return 0;
}
I noticed this change back in July, 2016 and emailed P.J. Plauger about this change on August 1, 2016. A snippet of his reply:
Interestingly enough, our change log doesn't reflect this change. That probably means it was "suggested" by one of our larger customers and got by me on the code review. All I know now is that the change came in around the autumn of 2015. When I reviewed the code, the first thing that struck me was the line:
iterator _Mid = _STD next(_First, _Size / 2);
which, of course, can take a very long time for a large list.
The code looks a bit more elegant than what I wrote in early 1995(!), but definitely has worse time complexity. That version was modeled after the approach by Stepanov, Lee, and Musser in the original STL. They are seldom found to be wrong in their choice of algorithms.
I'm now reverting to our latest known good version of the original code.
I don't know if P.J. Plauger's reversion to the original code dealt with the new allocator issue, or if or how Microsoft interacts with Dinkumware.
For a comparison of the top down versus bottom up methods, I created a linked list with 4 million elements, each consisting of one 64 bit unsigned integer, assuming I would end up with a doubly linked list of nearly sequentially ordered nodes (even though they would be dynamically allocated), filled them with random numbers, then sorted them. The nodes don't move, only the linkage is changed, but now traversing the list accesses the nodes in random order. I then filled those randomly ordered nodes with another set of random numbers and sorted them again. I compared the 2015 top down approach with the prior bottom up approach modified to match the other changes made for 2015 (sort() now calls sort() with a predicate compare function, rather than having two separate functions). These are the results. update - I added a node pointer based version and also noted the time for simply creating a vector from list, sorting vector, copy back.
sequential nodes: 2015 version 1.6 seconds, prior version 1.5 seconds
random nodes: 2015 version 4.0 seconds, prior version 2.8 seconds
random nodes: node pointer based version 2.6 seconds
random nodes: create vector from list, sort, copy back 1.25 seconds
For sequential nodes, the prior version is only a bit faster, but for random nodes, the prior version is 30% faster, and the node pointer version 35% faster, and creating a vector from the list, sorting the vector, then copying back is 69% faster.
Below is the first replacement code for std::list::sort() I used to compare the prior bottom-up small-array (_BinList[]) method versus VS2015's top-down approach. I wanted the comparison to be fair, so I modified a copy of < list >.
void sort()
{ // order sequence, using operator<
sort(less<>());
}
template<class _Pr2>
void sort(_Pr2 _Pred)
{ // order sequence, using _Pred
if (2 > this->_Mysize())
return;
const size_t _MAXBINS = 25;
_Myt _Templist, _Binlist[_MAXBINS];
while (!empty())
{
// _Templist = next element
_Templist._Splice_same(_Templist.begin(), *this, begin(),
++begin(), 1);
// merge with array of ever larger bins
size_t _Bin;
for (_Bin = 0; _Bin < _MAXBINS && !_Binlist[_Bin].empty();
++_Bin)
_Templist.merge(_Binlist[_Bin], _Pred);
// don't go past end of array
if (_Bin == _MAXBINS)
_Bin--;
// update bin with merged list, empty _Templist
_Binlist[_Bin].swap(_Templist);
}
// merge bins back into caller's list
for (size_t _Bin = 0; _Bin < _MAXBINS; _Bin++)
if(!_Binlist[_Bin].empty())
this->merge(_Binlist[_Bin], _Pred);
}
I made some minor changes. The original code kept track of the actual maximum bin in a variable named _Maxbin, but the overhead in the final merge is small enough that I removed the code associated with _Maxbin. During the array build, the original code's inner loop merged into a _Binlist[] element, followed by a swap into _Templist, which seemed pointless. I changed the inner loop to just merge into _Templist, only swapping once an empty _Binlist[] element is found.
Below is a node pointer based replacement for std::list::sort() I used for yet another comparison. This eliminates allocation related issues. If a compare exception is possible and one occurred, all the nodes in the array and temp list (pNode) would have to be appended back to the original list, or alternatively a compare exception could be treated as a less-than result.
void sort()
{ // order sequence, using operator<
sort(less<>());
}
template<class _Pr2>
void sort(_Pr2 _Pred)
{ // order sequence, using _Pred
const size_t _NUMBINS = 25;
_Nodeptr aList[_NUMBINS]; // array of lists
_Nodeptr pNode;
_Nodeptr pNext;
_Nodeptr pPrev;
if (this->size() < 2) // return if nothing to do
return;
this->_Myhead()->_Prev->_Next = 0; // set last node ->_Next = 0
pNode = this->_Myhead()->_Next; // set ptr to start of list
size_t i;
for (i = 0; i < _NUMBINS; i++) // zero array
aList[i] = 0;
while (pNode != 0) // merge nodes into array
{
pNext = pNode->_Next;
pNode->_Next = 0;
for (i = 0; (i < _NUMBINS) && (aList[i] != 0); i++)
{
pNode = _MergeN(_Pred, aList[i], pNode);
aList[i] = 0;
}
if (i == _NUMBINS)
i--;
aList[i] = pNode;
pNode = pNext;
}
pNode = 0; // merge array into one list
for (i = 0; i < _NUMBINS; i++)
pNode = _MergeN(_Pred, aList[i], pNode);
this->_Myhead()->_Next = pNode; // update sentinel node links
pPrev = this->_Myhead(); // and _Prev pointers
while (pNode)
{
pNode->_Prev = pPrev;
pPrev = pNode;
pNode = pNode->_Next;
}
pPrev->_Next = this->_Myhead();
this->_Myhead()->_Prev = pPrev;
}
template<class _Pr2>
_Nodeptr _MergeN(_Pr2 &_Pred, _Nodeptr pSrc1, _Nodeptr pSrc2)
{
_Nodeptr pDst = 0; // destination head ptr
_Nodeptr *ppDst = &pDst; // ptr to head or prev->_Next
if (pSrc1 == 0)
return pSrc2;
if (pSrc2 == 0)
return pSrc1;
while (1)
{
if (_DEBUG_LT_PRED(_Pred, pSrc2->_Myval, pSrc1->_Myval))
{
*ppDst = pSrc2;
pSrc2 = *(ppDst = &pSrc2->_Next);
if (pSrc2 == 0)
{
*ppDst = pSrc1;
break;
}
}
else
{
*ppDst = pSrc1;
pSrc1 = *(ppDst = &pSrc1->_Next);
if (pSrc1 == 0)
{
*ppDst = pSrc2;
break;
}
}
}
return pDst;
}