使用 Boost.Lockfree 队列比使用互斥锁慢
直到现在我在我的项目中使用 std::queue
.我测量了此队列上的特定操作所需的平均时间.
Until now I was using std::queue
in my project. I measured the average time which a specific operation on this queue requires.
时间是在 2 台机器上测量的:我的本地 Ubuntu VM 和远程服务器.使用 std::queue
,两台机器上的平均值几乎相同:~750 微秒.
The times were measured on 2 machines: My local Ubuntu VM and a remote server.
Using std::queue
, the average was almost the same on both machines: ~750 microseconds.
然后我将std::queue
升级"到boost::lockfree::spsc_queue
,这样我就可以摆脱保护队列的互斥锁.在我的本地 VM 上,我可以看到巨大的性能提升,现在平均为 200 微秒.然而,在远程机器上,平均值上升到 800 微秒,比以前慢.
Then I "upgraded" the std::queue
to boost::lockfree::spsc_queue
, so I could get rid of the mutexes protecting the queue. On my local VM I could see a huge performance gain, the average is now on 200 microseconds. On the remote machine however, the average went up to 800 microseconds, which is slower than it was before.
首先我认为这可能是因为远程机器可能不支持无锁实现:
First I thought this might be because the remote machine might not support the lock-free implementation:
来自 Boost.Lockfree 页面:
并非所有硬件都支持相同的原子指令集.如果它在硬件中不可用,则可以使用防护在软件中进行模拟.然而,这有一个明显的缺点,即失去了无锁属性.
Not all hardware supports the same set of atomic instructions. If it is not available in hardware, it can be emulated in software using guards. However this has the obvious drawback of losing the lock-free property.
为了查明是否支持这些指令,boost::lockfree::queue
有一个名为 bool is_lock_free(void) const;
的方法.但是,boost::lockfree::spsc_queue
没有这样的功能,对我来说,这意味着它不依赖于硬件,并且始终是无锁的 - 在任何机器上.
To find out if these instructions are supported, boost::lockfree::queue
has a method called bool is_lock_free(void) const;
.
However, boost::lockfree::spsc_queue
does not have a function like this, which, for me, implies that it does not rely on the hardware and that is is always lockfree - on any machine.
性能下降的原因可能是什么?
What could be the reason for the performance loss?
// c++11 compiler and boost library required
#include <iostream>
#include <cstdlib>
#include <chrono>
#include <async>
#include <thread>
/* Using blocking queue:
* #include <mutex>
* #include <queue>
*/
#include <boost/lockfree/spsc_queue.hpp>
boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024>> queue;
/* Using blocking queue:
* std::queue<int> queue;
* std::mutex mutex;
*/
int main()
{
auto producer = std::async(std::launch::async, [queue /*,mutex*/]()
{
// Producing data in a random interval
while(true)
{
/* Using the blocking queue, the mutex must be locked here.
* mutex.lock();
*/
// Push random int (0-9999)
queue.push(std::rand() % 10000);
/* Using the blocking queue, the mutex must be unlocked here.
* mutex.unlock();
*/
// Sleep for random duration (0-999 microseconds)
std::this_thread::sleep_for(std::chrono::microseconds(rand() % 1000));
}
}
auto consumer = std::async(std::launch::async, [queue /*,mutex*/]()
{
// Example operation on the queue.
// Checks if 1234 was generated by the producer, returns if found.
while(true)
{
/* Using the blocking queue, the mutex must be locked here.
* mutex.lock();
*/
int value;
while(queue.pop(value)
{
if(value == 1234)
return;
}
/* Using the blocking queue, the mutex must be unlocked here.
* mutex.unlock();
*/
// Sleep for 100 microseconds
std::this_thread::sleep_for(std::chrono::microseconds(100));
}
}
consumer.get();
std::cout << "1234 was generated!" << std::endl;
return 0;
}
推荐答案
无锁算法通常比基于锁的算法性能更差.这是它们几乎没有被频繁使用的一个关键原因.
Lock free algorithms generally perform more poorly than lock-based algorithms. That's a key reason they're not used nearly as frequently.
无锁算法的问题在于它们通过允许竞争线程继续竞争来最大化竞争.锁通过取消对竞争线程的调度来避免争用.无锁算法,大致来说,应该只在不可能对竞争线程进行调度时使用.这很少适用于应用程序级代码.
The problem with lock free algorithms is that they maximize contention by allowing contending threads to continue to contend. Locks avoid contention by de-scheduling contending threads. Lock free algorithms, to a first approximation, should only be used when it's not possible to de-schedule contending threads. That only rarely applies to application-level code.
让我给你一个非常极端的假设.想象一下,四个线程正在典型的现代双核 CPU 上运行.线程 A1 和 A2 正在操作集合 A.线程 B1 和 B2 正在操作集合 B.
Let me give you a very extreme hypothetical. Imagine four threads are running on a typical, modern dual-core CPU. Threads A1 and A2 are manipulating collection A. Threads B1 and B2 are manipulating collection B.
首先,让我们假设集合使用锁.这意味着如果线程 A1 和 A2(或 B1 和 B2)尝试同时运行,其中一个将被锁阻塞.因此,很快,一个 A 线程和一个 B 线程将运行.这些线程将运行得非常快并且不会竞争.任何时候线程试图竞争,冲突的线程都会被取消调度.耶.
First, let's imagine the collection uses locks. That will mean that if threads A1 and A2 (or B1 and B2) try to run at the same time, one of them will get blocked by the lock. So, very quickly, one A thread and one B thread will be running. These threads will run very quickly and will not contend. Any time threads try to contend, the conflicting thread will get de-scheduled. Yay.
现在,假设集合不使用锁.现在,线程 A1 和 A2 可以同时运行.这将导致持续的争用.集合的缓存线将在两个核心之间来回切换.核心间总线可能会饱和.性能会很糟糕.
Now, imagine the collection uses no locks. Now, threads A1 and A2 can run at the same time. This will cause constant contention. Cache lines for the collection will ping-pong between the two cores. Inter-core buses may get saturated. Performance will be awful.
再说一次,这太夸张了.但是你明白了.您想避免争用,而不是尽可能多地受苦.
Again, this is highly exaggerated. But you get the idea. You want to avoid contention, not suffer through as much of it as possible.
但是,现在再次运行这个思想实验,其中 A1 和 A2 是整个系统上仅有的线程.现在,无锁集合可能更好(尽管您可能会发现在这种情况下最好只有一个线程!).
However, now run this thought experiment again where A1 and A2 are the only threads on the entire system. Now, the lock free collection is probably better (though you may find that it's better just to have one thread in that case!).
几乎每个程序员都经历过这样一个阶段,他们认为锁是坏的,而避免锁会使代码运行得更快.最终,他们意识到是争用使事情变慢,锁定,如果使用得当,可以最大限度地减少争用.
Almost every programmer goes through a phase where they think that locks are bad and avoiding locks makes code go faster. Eventually, they realize that it's contention that makes things slow and locks, used correctly, minimize contention.
相关文章