C++ 元组与结构
使用 std::tuple
和只使用数据的 struct
有什么区别吗?
Is there is any difference between using a std::tuple
and a data-only struct
?
typedef std::tuple<int, double, bool> foo_t;
struct bar_t {
int id;
double value;
bool dirty;
}
从网上查到的,发现主要有两个区别:struct
可读性更强,而tuple
有很多通用函数可以使用.是否应该有任何显着的性能差异?另外,数据布局是否相互兼容(可互换铸造)?
From what I have found online, I found that there are two major differences: the struct
is more readable, while the tuple
has many generic functions that can be used.
Should there be any significant performance difference?
Also, is the data layout compatible with each other (interchangeably casted)?
推荐答案
我们对元组和结构有类似的讨论,我在一位同事的帮助下编写了一些简单的基准测试,以确定元组之间在性能方面的差异和结构.我们首先从默认结构和元组开始.
We have a similar discussion about tuple and struct and I write some simple benchmarks with the help from one of my colleague to identify the differences in term of performance between tuple and struct. We first start with a default struct and a tuple.
struct StructData {
int X;
int Y;
double Cost;
std::string Label;
bool operator==(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
bool operator<(const StructData &rhs) {
return X < rhs.X || (X == rhs.X && (Y < rhs.Y || (Y == rhs.Y && (Cost < rhs.Cost || (Cost == rhs.Cost && Label < rhs.Label)))));
}
};
using TupleData = std::tuple<int, int, double, std::string>;
然后我们使用 Celero 来比较我们的简单结构和元组的性能.以下是使用 gcc-4.9.2 和 clang-4.0.0 收集的基准代码和性能结果:
We then use Celero to compare the performance of our simple struct and tuple. Below is the benchmark code and performance results collected using gcc-4.9.2 and clang-4.0.0:
std::vector<StructData> test_struct_data(const size_t N) {
std::vector<StructData> data(N);
std::transform(data.begin(), data.end(), data.begin(), [N](auto item) {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, N);
item.X = dis(gen);
item.Y = dis(gen);
item.Cost = item.X * item.Y;
item.Label = std::to_string(item.Cost);
return item;
});
return data;
}
std::vector<TupleData> test_tuple_data(const std::vector<StructData> &input) {
std::vector<TupleData> data(input.size());
std::transform(input.cbegin(), input.cend(), data.begin(),
[](auto item) { return std::tie(item.X, item.Y, item.Cost, item.Label); });
return data;
}
constexpr int NumberOfSamples = 10;
constexpr int NumberOfIterations = 5;
constexpr size_t N = 1000000;
auto const sdata = test_struct_data(N);
auto const tdata = test_tuple_data(sdata);
CELERO_MAIN
BASELINE(Sort, struct, NumberOfSamples, NumberOfIterations) {
std::vector<StructData> data(sdata.begin(), sdata.end());
std::sort(data.begin(), data.end());
// print(data);
}
BENCHMARK(Sort, tuple, NumberOfSamples, NumberOfIterations) {
std::vector<TupleData> data(tdata.begin(), tdata.end());
std::sort(data.begin(), data.end());
// print(data);
}
使用 clang-4.0.0 收集的性能结果
Performance results collected with clang-4.0.0
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 196663.40000 | 5.08 |
Sort | tuple | Null | 10 | 5 | 0.92471 | 181857.20000 | 5.50 |
Complete.
以及使用 gcc-4.9.2 收集的性能结果
And performance results collected using gcc-4.9.2
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 219096.00000 | 4.56 |
Sort | tuple | Null | 10 | 5 | 0.91463 | 200391.80000 | 4.99 |
Complete.
从上面的结果我们可以清楚地看到
From the above results we can clearly see that
元组比默认结构体更快
clang 生成的二进制文件具有比 gcc 更高的性能.clang-vs-gcc 不是本次讨论的目的,所以我不会深入研究细节.
Binary produce by clang has higher performance that that of gcc. clang-vs-gcc is not the purpose of this discussion so I won't dive into the detail.
我们都知道写一个 == 或 <或 > 运算符为每个结构定义将是一项痛苦和错误的任务.让我们使用 std::tie 替换我们的自定义比较器并重新运行我们的基准测试.
We all know that writing a == or < or > operator for every single struct definition will be a painful and buggy task. Let replace our custom comparator using std::tie and rerun our benchmark.
bool operator<(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 200508.20000 | 4.99 |
Sort | tuple | Null | 10 | 5 | 0.90033 | 180523.80000 | 5.54 |
Complete.
现在我们可以看到使用 std::tie 使我们的代码更优雅,更难出错,但是,我们会损失大约 1% 的性能.我现在将继续使用 std::tie 解决方案,因为我还收到有关将浮点数与自定义比较器进行比较的警告.
Now we can see that using std::tie makes our code more elegant and it is harder to make mistake, however, we will loose about 1% performance. I will stay with the std::tie solution for now since I also receive a warning about comparing floating point numbers with the customized comparator.
到目前为止,我们还没有任何解决方案可以让我们的结构体代码运行得更快.让我们看看交换函数并重写它,看看我们是否可以获得任何性能:
Until now we have not has any solution to make our struct code run faster yet. Let take a look at the swap function and rewrite it to see if we can gain any performance:
struct StructData {
int X;
int Y;
double Cost;
std::string Label;
bool operator==(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
void swap(StructData & other)
{
std::swap(X, other.X);
std::swap(Y, other.Y);
std::swap(Cost, other.Cost);
std::swap(Label, other.Label);
}
bool operator<(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
};
使用 clang-4.0.0 收集的性能结果
Performance results collected using clang-4.0.0
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 176308.80000 | 5.67 |
Sort | tuple | Null | 10 | 5 | 1.02699 | 181067.60000 | 5.52 |
Complete.
以及使用gcc-4.9.2收集的性能结果
And the performance results collected using gcc-4.9.2
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 198844.80000 | 5.03 |
Sort | tuple | Null | 10 | 5 | 1.00601 | 200039.80000 | 5.00 |
Complete.
现在我们的结构体比元组略快(使用 clang 大约 3%,使用 gcc 不到 1%),但是,我们确实需要为所有结构体编写自定义交换函数.
Now our struct is slightly faster than that of a tuple now (around 3% with clang and less than 1% with gcc), however, we do need to write our customized swap function for all of our structs.
相关文章