Is it possible to use hardware demultiplexing for high-load network servers?
For example, with asynchronous I/O over TCP/IP (using POSIX poll/select or the more advanced epoll, kqueue, poll_set, IOCP), the network driver is started by an interrupt on different CPU cores (hardware demultiplexer), receives messages, and dumps them into a single buffer at the kernel level (multiplexer). Then our acceptor thread uses epoll / kqueue / poll_set / IOCP to receive from this single buffer a list of descriptors of the sockets on which messages arrived, and scatters them again (demultiplexer) across threads in a thread pool running on different CPU cores.
In short, the scheme looks like: hardware interrupt (hardware demultiplexer) -> network driver in kernel space (multiplexer) -> user's acceptor in user space using epoll / kqueue / poll_set / IOCP (demultiplexer).
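To make the last link of that chain concrete, here is a minimal sketch of the user-space part only: one acceptor thread waits for readiness and scatters the ready sockets across a thread pool. It rests on several assumptions that are not in the question: it uses Python's `selectors` module as a stand-in for epoll/kqueue, `socket.socketpair()` as a stand-in for real TCP connections, and the names `handle`, `run_acceptor` and `demo` are made up for illustration.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Worker: read one message from a ready socket and reply upper-cased."""
    data = conn.recv(4096)
    conn.sendall(data.upper())
    return data


def run_acceptor(conns, workers=4):
    """Single acceptor: collect ready descriptors, scatter them to a pool."""
    # DefaultSelector picks epoll on Linux and kqueue on BSD/macOS.
    sel = selectors.DefaultSelector()
    for conn in conns:
        sel.register(conn, selectors.EVENT_READ)
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sel.get_map():
            ready = sel.select(timeout=1.0)
            if not ready:
                break  # nothing arrived within the timeout; stop the demo
            for key, _events in ready:
                # Unregister first so the same socket is not dispatched twice.
                sel.unregister(key.fileobj)
                futures.append(pool.submit(handle, key.fileobj))
    sel.close()
    return sorted(f.result() for f in futures)


def demo():
    """Three fake connections, each carrying one small message."""
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"msg{i}".encode())
    served = run_acceptor([server for _client, server in pairs])
    replies = [client.recv(4096) for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    return served, replies


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only to show where the second demultiplexing step happens: every ready descriptor passes through the single `sel.select()` call before it reaches a worker.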
Wouldn't it be easier and faster to get rid of the last two links and use only the "hardware demultiplexer"?
An example: if a network packet arrives, the network card interrupts the CPU. On most systems today, these interrupts are distributed across cores, i.e. this mechanism acts as a hardware demultiplexer. After receiving such an interrupt, we could immediately process the network message and wait for the next interrupt. All the demultiplexing work is done at the hardware level, by means of CPU interrupts.
In Cortex-A5 MPCore: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0434b/CCHDBEBE.html
Is this approach feasible on Linux in general and on real-time *nix systems such as QNX, and are there public projects where it is used, maybe nginx?
UPDATE:
Simple answer to my question: yes, I can use hardware demultiplexing by using /proc/irq/<N>/smp_affinity: http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
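As a rough illustration of what goes into that file: smp_affinity holds a hexadecimal bitmask in which bit K set means the IRQ may be delivered to core K. The sketch below builds such a mask and writes it. The function names and the `proc_root` parameter are my own, it assumes core indices below 32 (wider masks use a comma-separated group format that is not handled here), and the write needs root privileges on a real system.

```python
from pathlib import Path


def affinity_mask(cores):
    """Return the hex bitmask string for a collection of CPU core indices."""
    mask = 0
    for core in cores:
        if not 0 <= core < 32:
            # Assumption of this sketch: a single 32-bit group only.
            raise ValueError(f"core index out of range: {core}")
        mask |= 1 << core
    return format(mask, "x")


def set_irq_affinity(irq, cores, proc_root="/proc"):
    """Write the mask to <proc_root>/irq/<irq>/smp_affinity (needs root)."""
    path = Path(proc_root) / "irq" / str(irq) / "smp_affinity"
    path.write_text(affinity_mask(cores))
    return path


if __name__ == "__main__":
    # Only computes masks; nothing is written to /proc here.
    print(affinity_mask([0]), affinity_mask([2, 3]))
```

For example, `affinity_mask([2, 3])` gives `c`, which would restrict an IRQ to cores 2 and 3.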
But a second note: this is not such a good thing, because different parts of one packet can be handled by different cores, and synchronizing the caches (L1(CoreX) -> L3 -> L1(CoreY)) for cache coherency can take time: http://www.alexonlinux.com/why-interrupt-affinity-with-multiple-cores-is-not-such-a-good-thing
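Solutions:
- Hard-bind different Ethernet adapters (their IRQs) to different single CPU cores
- Use large packets and small messages, so that a packet usually contains a whole message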
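Both points can be sketched in a few lines. Everything named here is hypothetical: the IRQ numbers, the helper names `bind_plan` and `fits_in_one_packet`, and the assumption of plain IPv4 and TCP headers without options (20 bytes each). The first helper only computes which single-core mask each adapter IRQ would get; the second checks whether a message fits into one packet for a given MTU.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def bind_plan(adapter_irqs, cores):
    """Pair each adapter IRQ with one distinct core; returns {irq: hex mask}."""
    if len(set(cores)) != len(cores):
        raise ValueError("cores must be distinct")
    if len(adapter_irqs) > len(cores):
        raise ValueError("not enough cores to give each adapter its own core")
    return {irq: format(1 << core, "x")
            for irq, core in zip(adapter_irqs, cores)}


def fits_in_one_packet(message_len, mtu=1500):
    """True if a message of message_len bytes fits in one TCP segment."""
    return message_len <= mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(bind_plan([40, 41], [2, 3]))         # hypothetical IRQ numbers
    print(fits_in_one_packet(1200))            # standard Ethernet MTU
    print(fits_in_one_packet(8000, mtu=9000))  # jumbo frames
```

With a standard 1500-byte MTU this leaves 1460 bytes of payload per packet, so messages up to that size stay within a single packet under these assumptions.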
QUESTION: But maybe there are better solutions, for example using soft IRQs (without hardware IRQs), where we manually receive a batch of network packets from the network adapter. Are there any?
Recommended answer
Simple answer to my question: yes, I can use hardware demultiplexing by using /proc/irq/<N>/smp_affinity: http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
But a second note: this is not such a good thing, because different parts of one packet can be handled by different cores, and synchronizing the caches (L1(CoreX) -> L3 -> L1(CoreY)) for cache coherency can take time: http://www.alexonlinux.com/why-interrupt-affinity-with-multiple-cores-is-not-such-a-good-thing
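Solutions:
- Hard-bind different Ethernet adapters (their IRQs) to different single CPU cores
- Use large packets and small messages, so that a packet usually contains a whole message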