What does the tbb::scalable_allocator in Intel Threading Building Blocks actually do under the hood?
It can certainly be effective. I've just used it to take 25% off an app's execution time (and saw CPU utilization increase from ~200% to ~350% on a 4-core system) by changing a single std::vector<T> to std::vector<T,tbb::scalable_allocator<T> >. On the other hand, in another app I've seen it double an already large memory consumption and send things to swap city.
Intel's own documentation doesn't give a lot away (e.g. a short section at the end of this FAQ). Can anyone tell me what tricks it uses before I go and dig into its code myself?
UPDATE: Just using TBB 3.0 for the first time, and I've seen my best speedup from scalable_allocator yet. Changing a single vector<int> to a vector<int,scalable_allocator<int> > reduced the runtime of something from 85s to 35s (Debian Lenny, Core2, with TBB 3.0 from testing).
Answer
There is a good paper on the allocator: The Foundations for Scalable Multi-core Software in Intel Threading Building Blocks
My limited experience: I overloaded the global new/delete with the tbb::scalable_allocator for my AI application, but there was little change in the time profile. I didn't compare the memory usage, though.