C++11 thread vs. async performance (VS2013)

I feel like I'm missing something here...

I slightly altered some code to change from using std::thread to std::async and noticed a substantial performance increase. I wrote up a simple test which I assume should run nearly identically using std::thread as it does using std::async.

std::atomic<int> someCount{0};
const int THREADS = 200;
std::vector<std::thread> threadVec(THREADS);
std::vector<std::future<void>> futureVec(THREADS);
auto lam = [&]()
{
    for (int i = 0; i < 100; ++i)
        someCount++;
};

for (int i = 0; i < THREADS; ++i)
    threadVec[i] = std::thread(lam);
for (int i = 0; i < THREADS; ++i)
    threadVec[i].join();

for (int i = 0; i < THREADS; ++i)
    futureVec[i] = std::async(std::launch::async, lam);
for (int i = 0; i < THREADS; ++i)
    futureVec[i].get();

I didn't get too deep into analysis, but some preliminary results made it seem that the std::async code ran around 10X faster! Results varied slightly with optimizations off; I also tried switching the execution order.

Is this some Visual Studio compiler issue? Or is there some deeper implementation issue I'm overlooking that would account for this performance difference? I thought that std::async was a wrapper around the std::thread calls?

Also, considering these differences, I'm wondering what would be the way to get the best performance here? (There are more ways to create threads than just std::thread and std::async.)

What about if I wanted detached threads? (std::async can't do that as far as I'm aware)
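
For illustration (this sketch is not part of the original question), detaching is straightforward with std::thread; there is no direct std::async equivalent, since the std::future returned by std::async(std::launch::async, ...) blocks in its destructor until the task has finished:

#include <chrono>
#include <thread>

int main()
{
    // Fire-and-forget worker: once detached, it can no longer be joined.
    std::thread([] {
        // ... background work ...
    }).detach();

    // Crude placeholder so main() doesn't exit before the worker runs;
    // a real program would synchronize through some other mechanism.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}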

Answer

When you're using async you are not creating new threads; instead, you reuse the ones available in a thread pool, at least in Visual Studio's implementation. Creating and destroying threads is a very expensive operation that requires about 200,000 CPU cycles on Windows. On top of that, remember that having a number of threads much bigger than the number of CPU cores means that the operating system needs to spend more time creating them and scheduling them to use the available CPU time on each of the cores.
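
As a rough illustration (this timing sketch is mine, not from the original post), the two approaches from the question can be timed back to back with std::chrono::steady_clock; the task count and counting lambda mirror the question's snippet. A one-shot measurement like this is sensitive to run order and warm-up, so the numbers are only indicative:

#include <atomic>
#include <chrono>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    std::atomic<int> someCount{0};
    const int TASKS = 200;
    auto lam = [&] {
        for (int i = 0; i < 100; ++i)
            someCount++;
    };

    // Time 200 raw threads, each created and joined exactly once.
    auto t0 = std::chrono::steady_clock::now();
    {
        std::vector<std::thread> threads;
        for (int i = 0; i < TASKS; ++i)
            threads.emplace_back(lam);
        for (auto& t : threads)
            t.join();
    }
    auto t1 = std::chrono::steady_clock::now();

    // Time 200 std::async tasks running the same lambda.
    {
        std::vector<std::future<void>> futures;
        for (int i = 0; i < TASKS; ++i)
            futures.push_back(std::async(std::launch::async, lam));
        for (auto& f : futures)
            f.get();
    }
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "std::thread: " << ms(t1 - t0).count() << " ms\n"
              << "std::async : " << ms(t2 - t1).count() << " ms\n";
}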

UPDATE: To see that the number of threads used by std::async is a lot smaller than with std::thread, I modified the testing code to count the number of unique thread ids used when running either way, as below. The results on my PC were:

Number of threads used running std::threads = 200
Number of threads used to run std::async = 4

The number of threads used to run std::async varies from 2 to 4 on my PC. It basically means that std::async will reuse threads instead of creating new ones every time. Curiously, if I increase the computing time of the lambda by replacing 100 with 1000000 iterations in the for loop, the number of async threads increases to 9, but using raw threads it always gives 200. It's worth keeping in mind that "once a thread has finished, the value of std::thread::id may be reused by another thread".

Here is the testing code:

#include <atomic>
#include <vector>
#include <future>
#include <thread>
#include <mutex>
#include <unordered_set>
#include <iostream>

int main()
{
    std::atomic<int> someCount{0};
    const int THREADS = 200;
    std::vector<std::thread> threadVec(THREADS);
    std::vector<std::future<void>> futureVec(THREADS);

    std::unordered_set<std::thread::id> uniqueThreadIdsAsync;
    std::unordered_set<std::thread::id> uniqueThreadsIdsThreads;
    std::mutex mutex;

    auto lam = [&](bool isAsync)
    {
        for (int i = 0; i < 100; ++i)
            someCount++;

        auto threadId = std::this_thread::get_id();
        if (isAsync)
        {
            std::lock_guard<std::mutex> lg(mutex);
            uniqueThreadIdsAsync.insert(threadId);
        }
        else
        {
            std::lock_guard<std::mutex> lg(mutex);
            uniqueThreadsIdsThreads.insert(threadId);
        }
    };

    for (int i = 0; i < THREADS; ++i)
        threadVec[i] = std::thread(lam, false); 

    for (int i = 0; i < THREADS; ++i)
        threadVec[i].join();
    std::cout << "Number of threads used running std::threads = " << uniqueThreadsIdsThreads.size() << std::endl;

    for (int i = 0; i < THREADS; ++i)
        futureVec[i] = std::async(lam, true);
    for (int i = 0; i < THREADS; ++i)
        futureVec[i].get();
    std::cout << "Number of threads used to run std::async = " << uniqueThreadIdsAsync.size() << std::endl;
}
