使用布尔值向量是否比动态位集慢?
使用布尔值向量是否比动态位集慢?
Is using a vector of boolean values slower than a dynamic bitset?
我刚刚听说了 boost 的动态位集,我想知道它是否值得麻烦.我可以只使用布尔值向量吗?
I just heard about boost's dynamic bitset, and I was wondering is it worth the trouble. Can I just use vector of boolean values instead?
推荐答案
这里很大程度上取决于您使用的布尔值的数量.
A great deal here depends on how many Boolean values you're working with.
bitset 和 vector
通常使用打包表示,其中布尔值仅存储为单个位.
Both bitset and vector<bool>
normally use a packed representation where a Boolean is stored as only a single bit.
一方面,这会以位操作的形式强加一些开销来访问单个值.
On one hand, that imposes some overhead in the form of bit manipulation to access a single value.
另一方面,这也意味着更多的布尔值将适合您的缓存.
On the other hand, that also means many more of your Booleans will fit in your cache.
如果您使用大量布尔值(例如,实现 Eratosthenes 筛),将更多的布尔值放入缓存中几乎总能获得净收益.内存使用的减少将比位操作带来的损失更多.
If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.
大多数反对 std::vector
的论点都归结为它不是标准容器(即它不满足容器的要求)这一事实.IMO,这主要是一个期望问题――因为它说 vector
,许多人希望它是一个容器(其他类型的向量是),他们经常对 vector
不是容器.
Most of the arguments against std::vector<bool>
come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector
, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector<bool>
isn't a container.
如果您以某种方式真正需要将矢量用作容器,那么您可能希望使用其他组合――deque
或 vector<;char>
可以正常工作.在你这样做之前考虑 - 有很多(糟糕的,IMO)建议通常应该避免vector
,很少或根本没有解释为什么它应该完全避免,或者在什么情况下它会对你产生真正的影响.
If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque<bool>
or vector<char>
can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector<bool>
should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.
是的,在某些情况下,其他方法会更好.如果您处于其中一种情况,使用其他方法显然是个好主意.但是,首先要确保您确实处于其中一种情况.任何告诉您(例如)Herb 说您应该使用 vector
"而没有对所涉及的权衡进行大量解释的人都不应该被信任.
Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector<char>
" without a lot of explanation about the tradeoffs involved should not be trusted.
让我们举一个真实的例子.既然在评论中提到了,那么我们来考虑一下 Eratosthenes 的筛子:
Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:
#include <vector>
#include <iostream>
#include <iterator>
#include <chrono>
unsigned long primes = 0;
template <class bool_t>
unsigned long sieve(unsigned max) {
std::vector<bool_t> sieve(max, false);
sieve[0] = sieve[1] = true;
for (int i = 2; i < max; i++) {
if (!sieve[i]) {
++primes;
for (int temp = 2 * i; temp < max; temp += i)
sieve[temp] = true;
}
}
return primes;
}
// Warning: auto return type will fail with older compilers
// Fine with g++ 5.1 and VC++ 2015 though.
//
template <class F>
auto timer(F f, int max) {
auto start = std::chrono::high_resolution_clock::now();
primes += f(max);
auto stop = std::chrono::high_resolution_clock::now();
return stop - start;
}
int main() {
using namespace std::chrono;
unsigned number = 100000000;
auto using_bool = timer(sieve<bool>, number);
auto using_char = timer(sieve<char>, number);
std::cout << "ignore: " << primes << "
";
std::cout << "Time using bool: " << duration_cast<milliseconds>(using_bool).count() << "
";
std::cout << "Time using char: " << duration_cast<milliseconds>(using_char).count() << "
";
}
我们使用了一个足够大的数组,我们可以预期它的很大一部分会占用主内存.为了确保在一个调用和另一个调用之间发生变化的唯一事情是使用 vector
与 >vector
.以下是一些结果.首先是 VC++ 2015:
We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector<char>
vs. vector<bool>
. Here are some results. First with VC++ 2015:
ignore: 34568730
Time using bool: 2623
Time using char: 3108
...然后是使用 g++ 5.1 的时间:
...then the time using g++ 5.1:
ignore: 34568730
Time using bool: 2359
Time using char: 3116
显然,vector
在这两种情况下都获胜――VC++ 的优势约为 15%,gcc 的优势超过 30%.另请注意,在这种情况下,我选择了以非常有利的方式显示 vector
的大小.例如,如果我将 number
从 100000000
减少到 10000000
,时间差会大:>
Obviously, the vector<bool>
wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector<char>
in quite favorable light. If, for example, I reduce number
from 100000000
to 10000000
, the time differential becomes much larger:
ignore: 3987474
Time using bool: 72
Time using char: 249
虽然我没有做很多工作来确认,但我猜想在这种情况下,使用 vector
的版本节省了足够的空间,数组完全适合缓存,而 vector
大到足以溢出缓存,并涉及大量的主内存访问.
Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector<bool>
is saving enough space that the array fits entirely in the cache, while the vector<char>
is large enough to overflow the cache, and involve a great deal of main memory access.
相关文章