cv::remap segfaults with std::thread

I am getting a segfault with the following simple code:

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"

#include <iostream>
#include <thread>
#include <unistd.h>

void run() {
    sleep(1);  // see below
    cv::Mat source(10, 10, CV_32FC1, -1);

    cv::Mat result(10, 10, CV_32FC1);
    cv::Mat trX(result.rows, result.cols, CV_32FC1, 5);
    cv::Mat trY(result.rows, result.cols, CV_32FC1, 5);

    cv::remap(source, result, trX, trY, cv::INTER_LINEAR, cv::BORDER_TRANSPARENT);
    std::cout << "done" << std::endl;
}

int main(int argc, char* argv[]) {
    std::thread t1(run);
    t1.join();
    std::thread t2(run);
    t2.join();
    return 0;
}

If I call run() twice directly from main(), without using threads at all, it works well. If I swap t1.join(); and std::thread t2(run); (that is, start the second thread before the first finishes; this is where sleep becomes important), it also runs well.

Moreover, if I change main to

int main(int argc, char* argv[]) {
    std::thread t1(run);
    std::thread t2(run);
    t1.join();
    t2.join();
    std::thread t3(run);
    t3.join();
    return 0;
}

it segfaults in the third thread, but (strangely) not always: roughly one run in every 2-3 completes successfully. However, I have never had a successful run of the two-thread program above.

It seems that particular values in source, trX and trY are not important.

The big program that I was working on was running properly in December, after which I did not have time to work on it, but have updated the system several times. Now the big program fails with exactly the same segfault, so I think it should be something with newer versions of opencv and/or g++ and/or libstdc++.

Is it a problem with my system or my code? Or is it a known problem? And where would be the best place to report it?

I'm running up-to-date Ubuntu 16.10, with g++ 6.2.0 (I've also tried 4.9 with the same result). The specific versions of packages that may be of interest are:

$ dpkg-query -W -f='${binary:Package}\t${Version}\n' | grep -E '(g++|c++|opencv)'
g++     4:6.1.1-1ubuntu2
g++-4.9 4.9.4-2ubuntu1
g++-5   5.4.1-2ubuntu2
g++-6   6.2.0-5ubuntu12
lib32stdc++6    6.2.0-5ubuntu12
libflac++6v5:amd64      1.3.1-4
libopencv-calib3d2.4v5:amd64    2.4.9.1+dfsg-2.1
libopencv-contrib2.4v5:amd64    2.4.9.1+dfsg-2.1
libopencv-core-dev:amd64        2.4.9.1+dfsg-2.1
libopencv-core2.4v5:amd64       2.4.9.1+dfsg-2.1
libopencv-features2d2.4v5:amd64 2.4.9.1+dfsg-2.1
libopencv-flann2.4v5:amd64      2.4.9.1+dfsg-2.1
libopencv-highgui-dev:amd64     2.4.9.1+dfsg-2.1
libopencv-highgui2.4-deb0:amd64 2.4.9.1+dfsg-2.1
libopencv-imgproc-dev:amd64     2.4.9.1+dfsg-2.1
libopencv-imgproc2.4v5:amd64    2.4.9.1+dfsg-2.1
libopencv-legacy2.4v5:amd64     2.4.9.1+dfsg-2.1
libopencv-ml2.4v5:amd64 2.4.9.1+dfsg-2.1
libopencv-objdetect2.4v5:amd64  2.4.9.1+dfsg-2.1
libopencv-video2.4v5:amd64      2.4.9.1+dfsg-2.1
libsigc++-2.0-0v5:amd64 2.8.0-2
libstdc++-4.9-dev:amd64 4.9.4-2ubuntu1
libstdc++-5-dev:amd64   5.4.1-2ubuntu2
libstdc++-6-dev:amd64   6.2.0-5ubuntu12
libstdc++6:amd64        6.2.0-5ubuntu12
libstdc++6:i386 6.2.0-5ubuntu12

I use the following command to build the code:

g++ --std=c++14 test.cpp -lpthread -lopencv_highgui -lopencv_core -lopencv_imgproc -o test

Valgrind output:

==18499== Thread 2:
==18499== Invalid read of size 8
==18499==    at 0x690F0BA: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
==18499==    by 0x690F18A: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
==18499==    by 0x6910CE7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
==18499==    by 0x690F691: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
==18499==    by 0x690A01F: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
==18499==    by 0x6908164: tbb::internal::allocate_root_with_context_proxy::allocate(unsigned long) const (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
==18499==    by 0x51D9E21: cv::parallel_for_(cv::Range const&, cv::ParallelLoopBody const&, double) (in /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4.9)
==18499==    by 0x55AE8A1: cv::remap(cv::_InputArray const&, cv::_OutputArray const&, cv::_InputArray const&, cv::_InputArray const&, int, int, cv::Scalar_<double> const&) (in /usr/lib/x86_64-linux-gnu/libopencv_imgproc.so.2.4.9)
==18499==    by 0x1094AC: run() (in /home/petr/osm/draw/test/test)
==18499==    by 0x10A360: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (in /home/petr/osm/draw/test/test)
==18499==    by 0x10A2ED: std::_Bind_simple<void (*())()>::operator()() (in /home/petr/osm/draw/test/test)
==18499==    by 0x10A2BD: std::thread::_State_impl<std::_Bind_simple<void (*())()> >::_M_run() (in /home/petr/osm/draw/test/test)
==18499==  Address 0xfffffffffffffff7 is not stack'd, malloc'd or (recently) free'd

Recommended answer

OpenCV provides a parallel_for_ function that makes it easy to parallelize a portion of code using whichever parallel framework (Intel TBB, Pthreads, etc.) is available on the machine.

It seems that in your case, the version of OpenCV you have is 2.4.9 with TBB used as the default parallel_for_ backend.

Here is another issue with TBB being called from multiple threads. One solution could be to disable TBB and use Pthreads instead (disable TBB and enable Pthreads in CMake) when building OpenCV from source.

Your solution should also be fine. For setNumThreads(0), the documentation says:

If threads == 0, OpenCV will disable threading optimizations and run all it’s functions sequentially.

I guess that setNumThreads(1) should be fine also?

Unfortunately, I cannot determine exactly the source of the issue:

  • Is cv::remap not thread-safe in general, or only with TBB?
  • Is it an issue with the version of TBB? (not sure if this is related)
  • Is it an issue with the version of OpenCV used?

I did two tests:

  • build OpenCV 3.2 from source on Ubuntu 16.04 with Pthreads used as the parallel_for_ backend
  • build OpenCV 3.2 from source on Ubuntu 16.04 and use TBB instead

In both cases, I did not run into any issues. So I hope it has been fixed in a newer OpenCV version, or perhaps in a newer TBB version.

Note: you can use

std::cout << "getBuildInformation: " << cv::getBuildInformation() << std::endl;

to print the OpenCV build information. In my second test, I get:

Parallel framework: TBB (ver 4.4 interface 9003)
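
The full build information is long, so a small helper can pull out just the parallel backend. This is a minimal sketch that assumes the relevant line is labelled "Parallel framework:" as in my output; parallelFramework is a hypothetical name of mine, not an OpenCV function.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the text that follows "Parallel framework:"
// in the build information, or an empty string if no such line exists.
std::string parallelFramework(const std::string& buildInfo) {
    const std::string key = "Parallel framework:";
    std::istringstream lines(buildInfo);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t pos = line.find(key);
        if (pos == std::string::npos) {
            continue;  // not the line we are looking for
        }
        // Skip the padding between the label and the value.
        const std::size_t start =
            line.find_first_not_of(" \t\r", pos + key.size());
        if (start == std::string::npos) {
            return "";  // label present but no value
        }
        // Drop trailing whitespace.
        const std::size_t end = line.find_last_not_of(" \t\r");
        return line.substr(start, end - start + 1);
    }
    return "";
}
```

With the OpenCV core header included, it would be called as parallelFramework(cv::getBuildInformation()).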
