How do stackless coroutines differ from stackful coroutines?

Background:

I'm asking because I currently have an application with many threads (hundreds to thousands). Most of those threads are idle much of the time, waiting for work items to be placed in a queue. When a work item becomes available, it is processed by calling some arbitrarily complex existing code. On some operating system configurations, the application bumps up against kernel parameters governing the maximum number of user processes, so I'd like to experiment with ways to reduce the number of worker threads.

My proposed solution:

It seems that a coroutine-based approach, where I replace each worker thread with a coroutine, would help accomplish this. I could then have a work queue backed by a pool of actual (kernel) worker threads. When an item is placed in a particular coroutine's queue for processing, an entry would also be placed in the thread pool's queue. A pool thread would then resume the corresponding coroutine, let it process its queued data, and suspend it again, freeing the worker thread to do other work.
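A minimal sketch of that dispatch scheme, under loud assumptions: it is single-threaded, the names LogicalWorker, Dispatcher, post and run_one are made up for illustration, work items are plain ints, and each logical worker handles one item to completion per resume. In a real program, several kernel threads would call run_one() with proper locking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical logical worker: a private item queue plus a handler that
// stands in for the "arbitrarily complex existing code".
struct LogicalWorker {
    std::deque<int> items;
    std::function<void(int)> handle;
};

// Hypothetical dispatcher. `ready_` stands in for the thread pool's queue.
class Dispatcher {
public:
    // Registers a logical worker and returns its id.
    std::size_t add_worker(std::function<void(int)> handler) {
        workers_.push_back(LogicalWorker{{}, std::move(handler)});
        return workers_.size() - 1;
    }

    // Queues an item for one logical worker and marks that worker runnable.
    void post(std::size_t worker, int item) {
        assert(worker < workers_.size());
        workers_[worker].items.push_back(item);
        ready_.push_back(worker);
    }

    // Resumes one runnable worker for exactly one item, then "suspends" it.
    // Returns false when nothing is runnable.
    bool run_one() {
        if (ready_.empty()) return false;
        const std::size_t id = ready_.front();
        ready_.pop_front();
        LogicalWorker& w = workers_[id];
        const int item = w.items.front();
        w.items.pop_front();
        w.handle(item);
        return true;
    }

private:
    std::vector<LogicalWorker> workers_;
    std::deque<std::size_t> ready_;
};
```

Each post() adds one entry to the ready queue, so a worker with three pending items is resumed three times. A caller drains the queue with `while (dispatcher.run_one()) {}`.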

Implementation details:

In thinking about how I would do this, I'm having trouble understanding the functional differences between stackless and stackful coroutines. I have some experience with stackful coroutines through the Boost.Coroutine library. I find them relatively easy to comprehend at a conceptual level: for each coroutine, the library maintains a copy of the CPU context and a stack, and when you switch to a coroutine, it switches to that saved context (just as a kernel-mode scheduler would).

What is less clear to me is how a stackless coroutine differs from this. In my application, the overhead associated with the queuing of work items described above is very important. Most implementations that I've seen, like the new CO2 library, suggest that stackless coroutines provide much lower-overhead context switches.

Therefore, I'd like to understand the functional differences between stackless and stackful coroutines more clearly. Specifically, I have these questions:

  • References like this one suggest that the distinction lies in where you can yield/resume in a stackful vs. a stackless coroutine. Is this the case? Is there a simple example of something that I can do in a stackful coroutine but not in a stackless one?

  • Are there any limitations on the use of automatic storage variables (i.e. variables "on the stack")?

  • Are there any limitations on what functions I can call from a stackless coroutine?

  • If there is no saving of stack context for a stackless coroutine, where do automatic storage variables go when the coroutine is running?

Recommended answer

First, thank you for taking a look at CO2 :)

The Boost.Coroutine documentation describes the advantages of stackful coroutines well:

Stackfulness

In contrast to a stackless coroutine, a stackful coroutine can be suspended from within a nested stack frame. Execution resumes at exactly the same point in the code where it was suspended before. With a stackless coroutine, only the top-level routine may be suspended. Any routine called by that top-level routine may not itself suspend. This prohibits providing suspend/resume operations in routines within a general-purpose library.

First-class continuation

A first-class continuation can be passed as an argument, returned by a function, and stored in a data structure to be used later. In some implementations (for instance C# yield) the continuation cannot be directly accessed or directly manipulated.

Without stackfulness and first-class semantics, some useful execution control flows cannot be supported (for instance cooperative multitasking or checkpointing).

What does that mean for you? For example, imagine you have a function that takes a visitor:

template<class Visitor>
void f(Visitor& v);

If you want to turn it into an iterator, with a stackful coroutine you can:

asymmetric_coroutine<T>::pull_type pull_from([](asymmetric_coroutine<T>::push_type& yield)
{
    f(yield);
});

But with a stackless coroutine, there's no way to do so:

generator<T> pull_from()
{
    // yield can only be used here, cannot pass to f
    f(???);
}
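For contrast, one fallback when f cannot be suspended from inside is to run it to completion and buffer what it visits. The sketch below is only that: an eager stand-in, not a coroutine. The body of f, the Collector visitor, and the int element type are assumptions made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical visitor-driven producer standing in for f: it calls the
// visitor once per element and only returns when it is done.
template <class Visitor>
void f(Visitor& v) {
    for (int i = 1; i <= 3; ++i) v(i * 10);
}

// Visitor that records every value it is handed.
struct Collector {
    std::vector<int> values;
    void operator()(int x) { values.push_back(x); }
};

// Eager stand-in for pull_from(): runs f to completion, then exposes the
// buffered values for iteration. Nothing is produced on demand.
std::vector<int> pull_from() {
    Collector c;
    f(c);
    return c.values;
}

// Example use: the caller iterates only after f has already finished.
void example() {
    const std::vector<int> got = pull_from();
    assert(got.size() == 3 && got.front() == 10 && got.back() == 30);
}
```

The cost is that every element exists in memory before the first one is consumed, and an unbounded producer cannot be handled this way. The stackful version avoids both by suspending inside f.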

In general, a stackful coroutine is more powerful than a stackless one. So why would we want stackless coroutines? Short answer: efficiency.

A stackful coroutine typically needs to allocate a certain amount of memory to accommodate its runtime stack (which must be large enough), and its context switch is more expensive than a stackless one. For example, Boost.Coroutine takes 40 cycles while CO2 takes just 7 cycles on average on my machine, because the only thing a stackless coroutine needs to restore is the program counter.
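To make the "program counter" remark concrete, here is a minimal hand-written sketch of a stackless coroutine in plain C++. It is not how CO2 or any compiler implements coroutines; Counter, resume and drain are hypothetical names. The integer state plays the role of the saved program counter, and the loop variable lives in the object because it is used across suspension points.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stackless coroutine that yields 0, 1, ..., limit - 1.
struct Counter {
    int state = 0;  // resume point, i.e. the saved "program counter"
    int i = 0;      // local that must survive suspension
    int limit;

    explicit Counter(int n) : limit(n) { assert(n >= 0); }

    // Resumes the coroutine. Returns true and sets `out` when a value is
    // yielded; returns false once the coroutine has finished.
    bool resume(int& out) {
        switch (state) {
        case 0:
            for (i = 0; i < limit; ++i) {
                out = i;
                state = 1;
                return true;  // suspend: nothing but `state` and `i` is kept
        case 1:;              // resume lands here, inside the loop body
            }
            state = 2;
            return false;
        default:
            return false;     // already finished
        }
    }
};

// Drives a Counter to completion and collects everything it yields.
std::vector<int> drain(Counter c) {
    std::vector<int> values;
    int v = 0;
    while (c.resume(v)) values.push_back(v);
    return values;
}
```

Resuming is an ordinary function call followed by a jump on state, with no register set or stack pointer to swap, which is consistent with the lower switch cost described above.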

That said, with language support, a stackful coroutine could probably also take advantage of a compiler-computed maximum stack size, as long as there is no recursion in the coroutine, so its memory usage could be improved as well.

Speaking of stackless coroutines, bear in mind that the name doesn't mean there is no runtime stack at all. It only means that the coroutine uses the same runtime stack as the host side, so you can call recursive functions as well; all of the recursion simply happens on the host's runtime stack. In contrast, when a stackful coroutine calls recursive functions, the recursion happens on the coroutine's own stack.
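A small sketch of that point, again hand-written and with hypothetical names (factorial, FactorialSeq): the coroutine's own state is two integers, while the recursive call it makes uses the stack of whoever called resume() and has fully returned before the coroutine suspends.

```cpp
#include <cassert>

// Ordinary recursive function. Its frames live on the stack of whichever
// thread called resume(), and are gone before the coroutine suspends.
unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// Hypothetical stackless coroutine yielding 1!, 2!, ..., limit!.
struct FactorialSeq {
    unsigned next = 1;  // state that survives suspension
    unsigned limit;

    explicit FactorialSeq(unsigned n) : limit(n) {
        assert(n <= 20);  // 21! does not fit in 64 bits
    }

    // Returns true and sets `out` when a value is yielded, false when done.
    bool resume(unsigned long long& out) {
        if (next > limit) return false;
        out = factorial(next++);  // recursion runs on the caller's stack
        return true;              // suspend: only `next` and `limit` persist
    }
};
```

What a stackless coroutine cannot do is suspend from inside factorial itself, which is the nested-frame restriction quoted from the Boost.Coroutine documentation.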

Answering your questions:

  • Are there any limitations on the use of automatic storage variables (i.e. variables "on the stack")?

No. The restriction you may have seen is an emulation limitation of CO2. With language support, the automatic storage variables visible to the coroutine will be placed in the coroutine's internal storage. Note my emphasis on "visible to the coroutine": if the coroutine calls a function that uses automatic storage variables internally, those variables will be placed on the runtime stack. More specifically, a stackless coroutine only has to preserve the variables and temporaries that can be used after it is resumed.

To be clear, you can use automatic storage variables in CO2's coroutine body as well:

auto f() CO2_RET(co2::task<>, ())
{
    int a = 1; // not ok: a is still in scope across the awaits below
    CO2_AWAIT(co2::suspend_always{});
    {
        int b = 2; // ok
        doSomething(b);
    }
    CO2_AWAIT(co2::suspend_always{});
    int c = 3; // ok
    doSomething(c);
} CO2_END

As long as the definition does not precede an await within the same scope.

  • Are there any limitations on what functions I can call from a stackless coroutine?

No.

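  • If there is no saving of stack context for a stackless coroutine, where do automatic storage variables go when the coroutine is running?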

Answered above: a stackless coroutine doesn't care about the automatic storage variables used in the functions it calls; they are simply placed on the normal runtime stack.

If you have any doubts, just check CO2's source code; it may help you understand the mechanics under the hood ;)
