Fastest way to count the all-TRUE rows of a LogicalMatrix in R/C++/Rcpp

2022-06-06 00:00:00 matrix performance r c++ rcpp

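I need to count the number of rows in a LogicalMatrix that are all `TRUE`.

Because I need to do this between 100 and 250 million times on a fairly regular basis, speed really matters.

My current best:

What I believe to be the most efficient/fastest single-process way to do this is the Rcpp function `hm2` shown below.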

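My limited profiling suggests that most of the time is spent in the `if(r_ttl == xcols){...` check. I can't think of another algorithm that would be faster than this (I tried breaking out of the inner loop as soon as a FALSE is found, but that was much slower).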
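For reference, the early-exit idea mentioned above might look like the following sketch. It is written in plain C++ rather than Rcpp so that it stands alone; the function name `count_all_true_early_exit` is hypothetical, and the flat column-major `std::vector<int>` is an assumption that mirrors how R stores a logical matrix. Whether the early `break` pays off depends on the data and on how the compiler handles the extra branch; the measurements in this post found it slower.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post).
// `data` is assumed to be column-major: element (row, col) is stored at
// data[row + col * nrow]. Entries are assumed to be 0 (FALSE) or 1 (TRUE);
// NA values are not considered.
int count_all_true_early_exit(const std::vector<int>& data, int nrow, int ncol) {
    const std::size_t stride = static_cast<std::size_t>(nrow);
    int n_all_true = 0;
    for (int row = 0; row < nrow; ++row) {
        bool all_true = true;
        for (int col = 0; col < ncol; ++col) {
            const std::size_t idx =
                static_cast<std::size_t>(row) + static_cast<std::size_t>(col) * stride;
            if (data[idx] == 0) {
                all_true = false;
                break;  // first FALSE found: skip the rest of this row
            }
        }
        if (all_true) {
            ++n_all_true;
        }
    }
    return n_all_true;
}
```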

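Details that can be assumed:

I can assume that:

The matrix will always have fewer than 10 million rows.
All output matrices from upstream will have the same number of columns (for a given session/process/thread).
No matrix will ever have more than 2326 columns.

Minimal example: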

m <- matrix(sample(c(T,F),50000*10, replace = T),ncol = 10L)
head(m)
#>       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
#> [1,] FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
#> [2,] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
#> [3,] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
#> [4,]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE
#> [5,]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
#> [6,] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int hm(const LogicalMatrix& x){
  const int xrows = x.nrow();
  const int xcols = x.ncol();
  int n_all_true = 0;

  for(size_t row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(size_t col = 0; col < xcols; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == xcols){
      n_all_true++;
    }
  }
  return n_all_true;
}

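I don't understand why, but on my machine it is faster if I hard-code the number of columns (it would also be great if someone could explain why that is):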

// [[Rcpp::export]]
int hm2(const LogicalMatrix& x){
  const int xrows = x.nrow();
  // const int xcols = x.ncol();
  int n_all_true = 0;

  for(size_t row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(size_t col = 0; col < 10; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == 10){
      n_all_true += 1;
    }
  }
  return n_all_true;
}

Timings:

microbenchmark(hm(m), hm2(m), times = 1000)
#>  Unit: microseconds
#>   expr     min       lq     mean  median       uq      max neval
#>  hm(m) 597.828 599.0995 683.3482 605.397 643.8655 1659.711  1000
#> hm2(m) 236.847 237.6565 267.8787 238.748 253.5280  683.221  1000

Solution

Here is your function, along with the output from compiling it via cppFunction:

require(Rcpp)
cppFunction('int hm(const LogicalMatrix& x)
{
  const int xrows = x.nrow();
  const int xcols = x.ncol();
  int n_all_true = 0;

  for(size_t row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(size_t col = 0; col < xcols; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == xcols){
      n_all_true++;
    }
  }
  return n_all_true;
}')
# file.*.cpp: In function ‘int hm(const LogicalMatrix&)’:
# file.*.cpp:12:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
#    for(size_t row = 0; row < xrows; row++) {
#                              ^
# file.*.cpp:14:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
#      for(size_t col = 0; col < xcols; col++) {
#                                ^

Note the warnings. By using int instead of size_t for row and col, I get some improvement. Beyond that, I can't find much room for improvement.
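The warning comes from comparing a `size_t` loop index with an `int` bound: in such a comparison the `int` operand is converted to an unsigned type first. The standalone sketch below (plain C++; both function names are hypothetical and not part of the code in this answer) makes that conversion explicit so the difference is visible with a negative bound.

```cpp
#include <cstddef>

// What `size_t_index < int_bound` does implicitly: the int bound is
// converted to std::size_t before the comparison takes place.
bool less_than_mixed(std::size_t index, int bound) {
    return index < static_cast<std::size_t>(bound);
}

// Both operands are signed, so no conversion happens.
bool less_than_signed(int index, int bound) {
    return index < bound;
}
```

Because `nrow()` and `ncol()` are never negative, the mixed comparison in `hm` still yields the right result; switching the indices to `int` simply removes the conversion and the warning.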

Here is my code, benchmark, and a reproducible example:

require(Rcpp)
require(microbenchmark)

cppFunction('int hm_jmu(const LogicalMatrix& x)
{
  const int xrows = x.nrow();
  const int xcols = x.ncol();
  int n_all_true = 0;

  for(int row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(int col = 0; col < xcols; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == xcols){
      n_all_true++;
    }
  }
  return n_all_true;
}')

hm3 <- function(m) {
  nc <- ncol(m)
  sum(rowSums(m) == nc)
}

set.seed(21)
m <- matrix(sample(c(T,F),50000*10, replace = T),ncol = 10L)
microbenchmark(hm(m), hm3(m), hm_jmu(m), times=1000)
# Unit: microseconds
#       expr      min        lq   median        uq       max neval
#      hm(m)  578.844  594.1460  607.357  636.4410   858.347  1000
#     hm3(m) 6389.014 6452.9595 6476.197 6735.5465 33720.870  1000
#  hm_jmu(m)  409.920  415.0395  424.401  449.0075   650.127  1000
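On the question of why hard-coding the column count helped: one plausible explanation, which the benchmarks above do not confirm, is that a trip count known at compile time makes it easier for the compiler to unroll or vectorize the inner loop. The sketch below shows one way to keep that property without writing a literal `10` into the function body, by passing the column count as a template parameter. It is plain C++ rather than Rcpp; the name `count_all_true_fixed` and the flat column-major `std::vector<int>` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post).
// NCOL is a compile-time constant, so the inner loop has a fixed trip count.
// `data` is assumed to be column-major: element (row, col) is stored at
// data[row + col * nrow], with entries 0 (FALSE) or 1 (TRUE).
template <int NCOL>
int count_all_true_fixed(const std::vector<int>& data, int nrow) {
    const std::size_t stride = static_cast<std::size_t>(nrow);
    int n_all_true = 0;
    for (int row = 0; row < nrow; ++row) {
        int r_ttl = 0;
        for (int col = 0; col < NCOL; ++col) {
            r_ttl += data[static_cast<std::size_t>(row) +
                          static_cast<std::size_t>(col) * stride];
        }
        if (r_ttl == NCOL) {
            ++n_all_true;
        }
    }
    return n_all_true;
}
```

A call such as `count_all_true_fixed<10>(data, nrow)` would correspond to the hard-coded `hm2`. Any speed-up would need to be measured on the target machine and compiler.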
