std::mutex performance compared to win32 CRITICAL_SECTION

2022-01-07 00:00:00 thread-safety synchronization mutex c++ stl

How does the performance of std::mutex compare to CRITICAL_SECTION? Is it on par?

I need a lightweight synchronization object (it doesn't need to be an interprocess object). Is there any STL class that comes close to CRITICAL_SECTION, other than std::mutex?

Recommended answer

Please see my updates at the end of the answer: the situation has changed dramatically since Visual Studio 2015. The original answer is below.

I made a very simple test, and according to my measurements std::mutex is around 50-70x slower than CRITICAL_SECTION.

std::mutex:       18140574us
CRITICAL_SECTION: 296874us

After some more tests, it turned out that the result depends on the number of threads (contention) and the number of CPU cores. Generally, std::mutex is slower, but by how much depends on the use case. The following are updated test results (tested on a MacBook Pro with a Core i5-4258U, running Windows 10 under Bootcamp):

Iterations: 1000000
Thread count: 1
std::mutex:       78132us
CRITICAL_SECTION: 31252us
Thread count: 2
std::mutex:       687538us
CRITICAL_SECTION: 140648us
Thread count: 4
std::mutex:       1031277us
CRITICAL_SECTION: 703180us
Thread count: 8
std::mutex:       86779418us
CRITICAL_SECTION: 1634123us
Thread count: 16
std::mutex:       172916124us
CRITICAL_SECTION: 3390895us

The following is the code that produced this output. It was compiled with Visual Studio 2012, default project settings, Win32 release configuration. Please note that this test may not be perfectly correct, but it made me think twice before switching my code from CRITICAL_SECTION to std::mutex.

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <cmath>    // sqrt
#include <iostream>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

void sharedFunc( int i )
{
    if ( i % 2 == 0 )
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

void threadFuncCritSec() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        EnterCriticalSection( &g_critSec );
        sharedFunc(i);
        LeaveCriticalSection( &g_critSec );
    }
}

void threadFuncMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us\n";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncCritSec ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us\n";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection( &g_critSec );

    std::cout << "Iterations: " << g_cRepeatCount << "\n";

    for (int i = 1; i <= g_cThreadCount; i = i*2) {
        std::cout << "Thread count: " << i << "\n";
        testRound(i);
        Sleep(1000);
    }

    DeleteCriticalSection( &g_critSec );

    // Added 10/27/2017 to try to prevent the compiler from completely
    // optimizing out the code around g_shmem, which could happen if the
    // variable were not used anywhere.
    std::cout << "Shared variable value: " << g_shmem << std::endl;
    getchar();
    return 0;
}

Update 10/27/2017 (1): Some answers suggest that this is not a realistic test or does not represent a "real world" scenario. That's true: this test tries to measure the overhead of std::mutex; it is not trying to prove that the difference is negligible for 99% of applications.

Update 10/27/2017 (2): It seems the situation has changed in favor of std::mutex since Visual Studio 2015 (VC140). I used the VS2017 IDE, exactly the same code as above, x64 release configuration, optimizations disabled, and simply switched the "Platform Toolset" for each test. The results are very surprising, and I am really curious what has changed in VC140.

Update 02/25/2020 (3): I reran the test with Visual Studio 2019 (Toolset v142), and the situation is still the same: std::mutex is two to three times faster than CRITICAL_SECTION.
