In Visual Studio, the destructor of a `thread_local` variable is not called when used with std::async. Is this a bug?

The following code

#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;

struct Foo {
    Foo() {
        std::unique_lock<std::mutex> lock{m};
        std::cout << "Foo Created in thread " << std::this_thread::get_id() << "\n";
    }

    ~Foo() {
        std::unique_lock<std::mutex> lock{m};
        std::cout << "Foo Deleted in thread " << std::this_thread::get_id() << "\n";
    }

    void proveMyExistance() {
        std::unique_lock<std::mutex> lock{m};
        std::cout << "Foo this = " << this << "\n";
    }
};

int threadFunc() {
    static thread_local Foo some_thread_var;

    // Prove the variable initialized
    some_thread_var.proveMyExistance();

    // The thread runs for some time
    std::this_thread::sleep_for(std::chrono::milliseconds{100}); 

    return 1;
}

int main() {
    auto a1 = std::async(std::launch::async, threadFunc);
    auto a2 = std::async(std::launch::async, threadFunc);
    auto a3 = std::async(std::launch::async, threadFunc);

    a1.wait();
    a2.wait();
    a3.wait();

    std::this_thread::sleep_for(std::chrono::milliseconds{1000});        

    return 0;
}

Compiled and run with clang on macOS:

clang++ test.cpp -std=c++14 -pthread
./a.out

I got this result:

Foo Created in thread 0x70000d9f2000
Foo Created in thread 0x70000daf8000
Foo Created in thread 0x70000da75000
Foo this = 0x7fd871d00000
Foo this = 0x7fd871c02af0
Foo this = 0x7fd871e00000
Foo Deleted in thread 0x70000daf8000
Foo Deleted in thread 0x70000da75000
Foo Deleted in thread 0x70000d9f2000

Compiled and run in Visual Studio 2015 Update 3:

Foo Created in thread 7180
Foo this = 00000223B3344120
Foo Created in thread 8712
Foo this = 00000223B3346750
Foo Created in thread 11220
Foo this = 00000223B3347E60

The destructors are not called.

Is this a bug or some undefined grey zone?

P.S.

If the final sleep, std::this_thread::sleep_for(std::chrono::milliseconds{1000});, is not long enough, you may sometimes not see all 3 "Deleted" messages.

When using std::thread instead of std::async, the destructors get called on both platforms, and all 3 "Deleted" messages are always printed.
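For comparison, here is a minimal sketch of the std::thread variant described above. The names `Counted`, `created`, `destroyed` and `run_on_threads` are made up for this illustration; the counters stand in for the printed messages of the original program.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counters, used only to observe construction and destruction.
std::atomic<int> created{0};
std::atomic<int> destroyed{0};

struct Counted {
    Counted() { ++created; }
    ~Counted() { ++destroyed; }
};

void task() {
    // One instance per thread; it is destroyed when that thread exits.
    thread_local Counted per_thread;
    (void)per_thread;
}

// Starts n threads that each run task(), then joins all of them.
void run_on_threads(int n) {
    assert(n >= 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(task);
    for (auto& t : threads) t.join();
}
```

After `run_on_threads(3)` returns, both counters should read 3, because each thread has exited and join() has returned by then.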

Solution

Introductory Note: I have now learned a lot more about this and have therefore re-written my answer. Thanks to @super, @M.M and (latterly) @DavidHaim and @NoSenseEtAl for putting me on the right track.

tl;dr Microsoft's implementation of std::async is non-conformant, but they have their reasons and what they have done can actually be useful, once you understand it properly.

For those who don't want that, it is not too difficult to code up a drop-in replacement for std::async which works the same way on all platforms. I have posted one here.
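To give a rough idea of what such a replacement could look like, here is a minimal sketch. It is not the implementation linked above, and it is not a complete substitute: `async_on_new_thread` is a hypothetical name, it only accepts a callable taking no arguments, and unlike a future returned by std::async, the returned future does not block in its destructor.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper (not part of the standard library): runs f on a
// brand-new thread and returns a future for its result.
template <class F>
auto async_on_new_thread(F f) -> std::future<decltype(f())> {
    using R = decltype(f());
    std::packaged_task<R()> task(std::move(f));
    std::future<R> result = task.get_future();
    assert(result.valid());
    // make_ready_at_thread_exit() stores the result, but the future only
    // becomes ready once the thread has exited and its thread_local
    // objects have been destroyed.
    std::thread([t = std::move(task)]() mutable {
        t.make_ready_at_thread_exit();
    }).detach();
    return result;
}

// Example use: the value is computed on a fresh thread.
int example() {
    auto fut = async_on_new_thread([] { return 6 * 7; });
    return fut.get();
}
```

Because every call starts its own thread, no thread is recycled and thread-local state never carries over from one task to the next. The cost is one thread creation per task.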

Edit: Wow, how open MS are being these days, I like it, see: https://github.com/MicrosoftDocs/cpp-docs/issues/308


Let's begin at the beginning. cppreference has this to say (emphasis and strikethrough mine):

The template function async runs the function f asynchronously (potentially [struck through] optionally [emphasized] in a separate thread which may be part of a thread pool).

However, the C++ standard says this:

If launch::async is set in policy, [std::async] calls [the function f] as if in a new thread of execution ...

So which is correct? The two statements have very different semantics as the OP has discovered. Well of course the standard is correct, as both clang and gcc show, so why does the Windows implementation differ? And like so many things, it comes down to history.

The (oldish) link that M.M dredged up has this to say, amongst other things:

... Microsoft has its implementation of [std::async] in the form of PPL (Parallel Pattern Library) ... [and] I can understand the eagerness of those companies to bend the rules and make these libraries accessible through std::async, especially if they can dramatically improve performance...

... Microsoft wanted to change the semantics of std::async when called with launch_policy::async. I think this was pretty much ruled out in the ensuing discussion ... (rationale follows, if you want to know more then read the link, it's well worth it).

And PPL is based on Windows' built-in support for ThreadPools, so @super was right.

So what does the Windows thread pool do and what is it good for? Well, it's intended to manage frequently-scheduled, short-running tasks in an efficient way, so point 1 is don't abuse it, but my simple tests show that if this is your use-case then it can offer significant efficiencies. It does, essentially, two things:

  • It recycles threads, rather than having to always start a new one for each asynchronous task you launch.
  • It limits the total number of background threads it uses, after which a call to std::async will block until a thread becomes free. On my machine, this number is 768.

So knowing all that, we can now explain the OP's observations:

  1. A new thread is created for each of the three tasks started by main() (because none of them terminates immediately).

  2. Each of these three threads creates a new thread-local variable Foo some_thread_var.

  3. These three tasks all run to completion but the threads they are running on remain in existence (sleeping).

  4. The program then sleeps for a short while and then exits, leaving the 3 thread-local variables un-destructed.

I ran a number of tests and in addition to this I found a few key things:

  • When a thread is recycled, the thread-local variables are re-used. Specifically, they are not destroyed and then re-created (you have been warned!).
  • If all the asynchronous tasks complete and you wait long enough, the thread pool terminates all the associated threads and the thread-local variables are then destroyed. (No doubt the actual rules are more complex than that but that's what I observed).
  • As new asynchronous tasks are submitted, the thread pool limits the rate at which new threads are created, in the hope that one will become free before it needs to perform all that work (creating new threads is expensive). A call to std::async might therefore take a while to return (up to 300ms in my tests). In the meantime, it's just hanging around, hoping that its ship will come in. This behaviour is documented but I call it out here in case it takes you by surprise.
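The re-use described in the first bullet can be probed with a sketch along these lines. `Probe`, `constructions` and `count_constructions` are hypothetical names, and this is only one way to observe the effect, not the test I actually ran.

```cpp
#include <atomic>
#include <cassert>
#include <future>

// Hypothetical counter: how many thread-local Probe objects were built.
std::atomic<int> constructions{0};

struct Probe {
    Probe() { ++constructions; }
};

int task() {
    thread_local Probe probe;  // constructed once per distinct thread
    (void)probe;
    return 1;
}

// Runs n tasks strictly one after another and returns how many Probe
// objects were constructed along the way.
int count_constructions(int n) {
    assert(n >= 0);
    constructions = 0;
    for (int i = 0; i < n; ++i) {
        std::async(std::launch::async, task).get();
    }
    return constructions.load();
}
```

If every task really runs as if in a new thread, the function returns n. A result smaller than n suggests that a thread, and with it the thread-local object, was carried over from an earlier task.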

Conclusions:

  1. Microsoft's implementation of std::async is non-conformant but it is clearly designed with a specific purpose, and that purpose is to make good use of the Win32 ThreadPool API. You can beat them up for blatantly flouting the standard but it's been this way for a long time and they probably have (important!) customers who rely on it. I will ask them to call this out in their documentation. Not doing that is criminal.

  2. It is not safe to use thread_local variables in std::async tasks on Windows. Just don't do it, it will end in tears.
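One way to follow that advice, sketched here under the assumption that the state is only needed for the duration of a single task, is to hold it in an ordinary local object instead of a thread_local one. `TaskState`, `live_states` and `run_tasks` are hypothetical names used for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <vector>

// Hypothetical counter, present only to observe object lifetime.
std::atomic<int> live_states{0};

struct TaskState {
    TaskState() { ++live_states; }
    ~TaskState() { --live_states; }
    int scratch = 0;
};

int task(int input) {
    TaskState state;        // constructed per task, not per thread
    state.scratch = input * 2;
    return state.scratch;   // state is destroyed before the result is published
}

// Launches n tasks and returns the sum of their results.
int run_tasks(int n) {
    assert(n >= 0);
    std::vector<std::future<int>> futures;
    for (int i = 0; i < n; ++i)
        futures.push_back(std::async(std::launch::async, task, i));
    int sum = 0;
    for (auto& f : futures) sum += f.get();
    return sum;
}
```

Since the object lives on the task's own stack, its destructor runs when the task returns, whether or not the underlying thread is kept alive and recycled afterwards.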
