How does Intel TBB's scalable allocator work?

2022-01-07 multithreading malloc c++ stl tbb

What does the tbb::scalable_allocator in Intel Threading Building Blocks actually do under the hood?

It can certainly be effective. I've just used it to take 25% off an app's execution time (and seen CPU utilization rise from ~200% to 350% on a 4-core system) by changing a single std::vector<T> to std::vector<T, tbb::scalable_allocator<T> >. On the other hand, in another app I've seen it double an already large memory consumption and send things to swap city.
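
For illustration, here is a minimal sketch of the kind of one-line change described above. The container name and element type are placeholders of my own; it assumes TBB is installed and the program links against tbbmalloc (e.g. -ltbbmalloc).

    // Only the second template argument changes; the vector is used exactly
    // as before, but its element storage now comes from the TBB scalable
    // allocator instead of the default std::allocator.
    #include <vector>
    #include <tbb/scalable_allocator.h>

    int main() {
        // Before: std::vector<double> samples;
        std::vector<double, tbb::scalable_allocator<double>> samples;

        samples.reserve(1000000);
        for (int i = 0; i < 1000000; ++i)
            samples.push_back(i * 0.5);
        return 0;
    }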

Intel's own documentation doesn't give a lot away (e.g. a short section at the end of this FAQ). Can anyone tell me what tricks it uses before I go and dig into its code myself?

UPDATE: Using TBB 3.0 for the first time, I've seen my best speedup from scalable_allocator yet. Changing a single vector<int> to a vector<int, scalable_allocator<int> > reduced the runtime of something from 85s to 35s (Debian Lenny, Core2, with TBB 3.0 from testing).
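
If changing container types is inconvenient, the same allocator can also be reached through its C-level entry points, scalable_malloc and scalable_free (declared in tbb/scalable_allocator.h). A small usage sketch, again assuming tbbmalloc is linked:

    #include <cstdio>
    #include <tbb/scalable_allocator.h>

    int main() {
        // Grab a scratch buffer directly from the scalable allocator.
        double* buf = static_cast<double*>(scalable_malloc(1024 * sizeof(double)));
        if (!buf) return 1;
        for (int i = 0; i < 1024; ++i)
            buf[i] = static_cast<double>(i);
        std::printf("buf[42] = %f\n", buf[42]);
        // Memory from scalable_malloc must be released with scalable_free,
        // never with plain free().
        scalable_free(buf);
        return 0;
    }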

Recommended answer

There is a good paper on the allocator: "The Foundations for Scalable Multi-core Software in Intel Threading Building Blocks".

My limited experience: I overloaded the global new/delete with tbb::scalable_allocator for my AI application, but there was little change in the time profile. I didn't compare memory usage, though.
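
For context, "overloading the global new/delete" can look roughly like the sketch below, built on the scalable_malloc/scalable_free entry points. This is only an outline (the array, nothrow and sized-delete overloads are omitted), not the exact code used in the answer.

    #include <cstddef>
    #include <new>
    #include <tbb/scalable_allocator.h>

    // Route every plain new/delete in the program through tbbmalloc.
    void* operator new(std::size_t size) {
        if (void* p = scalable_malloc(size ? size : 1))
            return p;
        throw std::bad_alloc();
    }

    void operator delete(void* p) noexcept {
        if (p) scalable_free(p);
    }

An alternative that requires no source changes is TBB's tbbmalloc_proxy library, which, when linked or preloaded, replaces malloc/free and new/delete for the whole process.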
