Is ndarray access faster than recarray access?
Problem description
I was able to copy my recarray data to an ndarray, do some calculations, and return the ndarray with updated values.
Then I discovered the append_fields() function in numpy.lib.recfunctions and thought it would be a lot smarter to simply append 2 fields to my original recarray to hold my calculated values.
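For reference, a minimal sketch of that approach. The field names and values below are made up for illustration and are not the asker's data. One detail that may be relevant here: append_fields() returns a masked array by default (usemask=True), so the result is a plain recarray only if usemask=False is passed.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical input: a small recarray with two made-up fields.
rec = np.rec.fromarrays(
    [np.arange(5), np.linspace(0.0, 1.0, 5)],
    names="node_id,x",
)

# Calculated values to attach (illustrative only).
y = rec.x * 2.0
z = rec.x ** 2

# append_fields returns a NEW array; the input array is left unchanged.
# usemask=False asks for an unmasked result, asrecarray=True for a recarray.
extended = rfn.append_fields(
    rec, names=["y", "z"], data=[y, z], usemask=False, asrecarray=True
)

# With the defaults (usemask=True) the result is a masked array instead.
masked_default = rfn.append_fields(rec, "y", y)

print(type(extended).__name__, extended.dtype.names)
print(type(masked_default).__name__)
```

Because a new array is allocated on every call, calling append_fields() inside a loop is likely to be costly; appending once and then filling the new fields is usually the cheaper pattern.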
When I did this, I found the operation was much, much slower. I didn't even have to time it: the ndarray-based process takes a few seconds, compared to more than a minute with the recarray, and my test arrays are small (fewer than 10,000 rows).
Is this typical? Is ndarray access really that much faster than recarray access? I expected some performance degradation due to access by field name, but not this much.
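Solution
Updated 15-Nov-2018
I expanded my timing tests to clarify the performance differences among ndarray, structured array, recarray and masked array (a type of record array?). There are subtle differences between them. See the discussion here: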
numpy-discussion:structured-arrays-recarrays-and-record-arrays
Here are the results of my performance tests. I built a very simple example (using one of my HDF5 data sets) to compare performance with the same data stored in 4 types of arrays: ndarray, structured array, recarray and masked array. After the arrays are constructed, each is passed to a function that simply loops through every row and extracts 12 values from it. The functions are called from timeit with a single pass (number=1). This test measures only the array read and avoids all other calculations.
Results for 9,000 rows, in seconds:
for ndarray: 0.034137165047070615
for structured array: 0.1306827116913577
for recarray: 0.446010040784266
for masked array: 31.33269560998199
Based on this test, access performance decreases with each type. The structured array and the recarray are roughly 4x and 13x slower than the ndarray, respectively (though all three finish in a fraction of a second). Masked array access, however, is roughly 900x slower than ndarray access. That explains the seconds-versus-minutes difference I see in my complete example. Hopefully this data is useful to others who encounter this issue.
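The test described above could be sketched roughly as follows. This is not the original benchmark: it uses synthetic random data in place of the HDF5 data set, hypothetical field names f0..f11, and a smaller row count so it runs quickly. Absolute timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

N_ROWS = 1_000   # the test above used 9,000 rows; reduced here for speed
N_COLS = 12      # 12 values are read from each row

# Synthetic stand-in for the HDF5 data (assumption: all-float columns).
rng = np.random.default_rng(0)
plain = rng.random((N_ROWS, N_COLS))

# Store the same values in the other three array types.
names = [f"f{i}" for i in range(N_COLS)]          # hypothetical field names
structured = np.zeros(N_ROWS, dtype=[(n, "f8") for n in names])
for i, n in enumerate(names):
    structured[n] = plain[:, i]
record = structured.view(np.recarray)
masked = np.ma.masked_array(structured)

def read_all(arr, keys):
    """Loop over every row and read one value per key, nothing else."""
    total = 0.0
    for row in arr:
        for k in keys:
            total += row[k]
    return float(total)

# ndarray rows are indexed by column position, the others by field name.
cases = {
    "ndarray": (plain, range(N_COLS)),
    "structured array": (structured, names),
    "recarray": (record, names),
    "masked array": (masked, names),
}

results = {}
for label, (arr, keys) in cases.items():
    # Single pass, as in the test above (number=1).
    results[label] = timeit.timeit(lambda: read_all(arr, keys), number=1)
    print(f"for {label}: {results[label]}")
```

Row-by-row iteration is the worst case for all four types. Where the calculation allows it, reading a whole field at once (for example structured["f0"]) avoids creating a Python object per row.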