How to estimate a camera pose with 3D-to-2D point correspondences (using OpenCV)

Hello, my goal is to develop head-tracking functionality to be used in an aircraft (simulator) cockpit, in order to provide AR that supports civilian pilots in landing and flying under poor visual conditions.

My approach is to detect characteristic points (LEDs in the dark simulator) whose 3D coordinates I know, and then compute the estimated pose [R|t] (rotation concatenated with translation) of the head-worn camera.

The problem I have is that the estimated pose always seems to be wrong, and a projection of my 3D points (which I also used to estimate the pose) does not overlap with the 2D image points (or is not visible at all).

My questions are:

How can I estimate the camera pose with a given set of 2D-to-3D point correspondences?

Why does it not work the way I try it, and where might the sources of error be?

How accurate must the measurements (of the 3D and 2D points and the camera matrix) be for the theoretical solution to work in a real-life environment?

Will the approach work in theory for coplanar points (varying only along the x and y axes)?

The hardware I use is the Epson BT-200.

In the aircraft I defined a fixed origin relative to which I expect translations and rotations as the result of my program. The program detects the image coordinates of the (unique) LEDs and matches them to their corresponding 3D coordinates. With a camera matrix I obtained using the OpenCV sample Android code (https://github.com/Itseez/opencv/tree/master/samples/android/camera-calibration), I try to estimate the pose using solvePnP.

My camera matrix and distortion coefficients vary slightly between calibration runs. Here are some values I received from the procedure. I made sure that the circle distance of my printed circle pattern is the same as written in the source code (measured in meters).

Here are some examples and how I create the OpenCV Mat from them.

//  protected final double[] DISTORTION_MATRIX_VALUES = new double[]{
//          /*This matrix should have 5 values*/
//          0.04569467373955304,
//          0.1402980385369059,
//          0,
//          0,
//          -0.2982135315849994
//  };

//  protected final double[] DISTORTION_MATRIX_VALUES = new double[]{
//          /*This matrix should have 5 values*/
//          0.08245931646421553,
//          -0.9893762277047577,
//          0,
//          0,
//          3.23553287438898
//  };

//  protected final double[] DISTORTION_MATRIX_VALUES = new double[]{
//          /*This matrix should have 5 values*/
//          0.07444480392067945,
//          -0.7817175834131075,
//          0,
//          0,
//          2.65433773093283
//  };
    protected final double[] DISTORTION_MATRIX_VALUES = new double[]{
            /*This matrix should have 5 values*/
            0.08909941096327206,
            -0.9537960457721699,
            0,
            0,
            3.449728790843752
    };

    protected final double[][] CAMERA_MATRIX_VALUES = new double[][]{
            /*This matrix should have 3x3 values*/
//          {748.6595405553738, 0, 319.5},
//          {0, 748.6595405553738, 239.5},
//          {0, 0, 1}
//          {698.1744297982436, 0, 320},
//          {0, 698.1744297982436, 240},
//          {0, 0, 1}
//          {707.1226937511951, 0, 319.5},
//          {0, 707.1226937511951, 239.5},
//          {0, 0, 1}
            {702.1458656346429, 0, 319.5},
            {0, 702.1458656346429, 239.5},
            {0, 0, 1}
    };

    private void initDestortionMatrix(){
        distortionMatrix = new MatOfDouble();
        distortionMatrix.fromArray(DISTORTION_MATRIX_VALUES);
    }

    private void initCameraMatrix(){
        cameraMatrix = new Mat(new Size(3,3), CvType.CV_64F);
        for(int i=0;i<CAMERA_MATRIX_VALUES.length; i++){
            cameraMatrix.put(i, 0, CAMERA_MATRIX_VALUES[i]);
        }
    }

To estimate the camera pose I use solvePnP (and solvePnPRansac) as described in several locations (1, 2, 3, 4). I use the result of solvePnP as input for the projection (Calib3d.projectPoints). The inverse of the concatenated result [R|t] I use as the estimated pose.
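The inversion step above (turning the world-to-camera transform returned by solvePnP into the camera pose in world coordinates) can be sketched with plain NumPy. The rotation matrix and translation below are made-up example values; in the real pipeline, R would come from applying Calib3d.Rodrigues to the rvec returned by solvePnP:

```python
import numpy as np

def invert_pose(R, t):
    """Invert [R|t]: camera pose in world coordinates from the
    world-to-camera transform returned by a PnP solver."""
    R_inv = R.T            # the inverse of a rotation matrix is its transpose
    t_inv = -R_inv @ t     # translate back into the world frame
    return R_inv, t_inv

# Example: 90-degree rotation about z plus a translation
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([1.0, 2.0, 3.0])

R_inv, t_inv = invert_pose(R, t)

# Round-trip check: transforming a point and then applying the
# inverse must recover the original point
p = R @ np.array([0.5, -0.5, 1.0]) + t
p_back = R_inv @ p + t_inv
print(np.allclose(p_back, [0.5, -0.5, 1.0]))  # True
```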

Because my results in the productive environment were too bad, I created a testing environment. In that environment I place the camera (which, because of its 3D shape (it's a glass), is slightly rotated downwards) at a table's edge. This edge I use as the origin of the world coordinate system. I searched for how the OpenCV coordinate system might be oriented and found different answers (one on Stack Overflow and one in an official YouTube talk about OpenCV). Anyway, I tested whether I got the coordinate system right by projecting 3D points (described in that coordinate system) onto an image and checking that the given world shape stays constant.

So I came up with z pointing forward, y downward, and x to the right.

To get closer to my solution I estimated the pose in my testing environment. The translation vector output and Euler angle output refer to the inverse of [R|t]. The Euler angles might not be displayed correctly (they might be swapped or wrong, if we take the order into account) because I compute them with the conventional equations (I assume referring to the airplane coordinate system), while using an OpenCV coordinate system. (The computation happens in the class Pose, which I will attach.) But anyway, even the translation vector (of the inverse) appeared to be wrong (in my simple test).
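As a cross-check for that Euler-angle step, here is a minimal NumPy sketch of extracting yaw/pitch/roll under the aerospace Z-Y-X (yaw-pitch-roll) convention. The convention is an assumption on my part, and as suspected above, a different rotation order does produce swapped or wrong angles:

```python
import numpy as np

def euler_zyx_from_R(R):
    """Extract yaw (about z), pitch (about y), roll (about x) in degrees,
    assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (aerospace Z-Y-X convention).
    Valid for |pitch| < 90 degrees (no gimbal lock)."""
    pitch = np.arcsin(-R[2, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([yaw, pitch, roll])

# Elementary rotations used to build a test matrix from known angles
def Rz(a): c, s = np.cos(a), np.sin(a); return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
def Ry(a): c, s = np.cos(a), np.sin(a); return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
def Rx(a): c, s = np.cos(a), np.sin(a); return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

yaw, pitch, roll = np.radians([30.0, 10.0, -20.0])
R = Rz(yaw) @ Ry(pitch) @ Rx(roll)
print(euler_zyx_from_R(R))  # approximately [30., 10., -20.]
```

Building R from known angles and recovering them, as done here, is a cheap way to verify that the extraction matches whatever convention the rest of the code assumes.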

In one test with that image I had a roll (which might be pitch in airplane coordinates) of 30° and a translation upwards of 50 cm. That appeared to be more reasonable. So I assumed that, because my points are coplanar, I might get ambiguous results. So I ran another test with a point that changed along the z-axis. But in this test even the projection failed.

For solvePnP I tried all the different solving-algorithm flags and different parameters for the RANSAC algorithm.

Maybe you can somehow help me find my mistake, or show me a good path to solve my initial problem. I am also going to attach my debugging source code, with many println statements, and the debugging images. This code contains my point measurements.

Thanks for your help in advance.

Class Main.java: Class Pose.java: 0.png

1.png

EDIT 22.03.2015: Finally I have been able to find the mistakes I made.

  1. I modified a Mat object inside a for-loop. Because OpenCV works a lot with call by reference, and I was not careful enough here, the tvec and rvec used for the reprojection were not right.
  2. One of my points in the testing environment was tagged with wrong image coordinates, due to an axis-direction confusion.

So my approach in general was right. I am now at least (often) receiving valid reprojections in my test dataset.

Unfortunately, the OpenCV PnP algorithms "ITERATIVE", "P3P", and "EPNP" return varying results, and even with a very inaccurate but close intrinsic guess, the results are only sometimes correct. The P3P algorithm is supposed to provide up to 3 solutions, but OpenCV only provides one. EPNP is supposed to return good results, but with EPNP OpenCV returns the worst results, as evaluated by my human observation.

The problem now is how to filter out the inaccurate values, or to ensure that the OpenCV function returns valid ones. (Maybe I should modify the native code to receive all 3 PnP solutions.)
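One practical filter, sketched here under the assumption of a plain pinhole model with distortion ignored (Calib3d.projectPoints would do the projection including distortion), is to reproject the 3D points with each candidate pose and reject any solution whose RMS pixel error exceeds a threshold. The 3-pixel threshold and all numeric values below are illustrative only:

```python
import numpy as np

def reprojection_rms(points3d, points2d, K, R, t):
    """RMS pixel error after projecting points3d with pose [R|t] and
    intrinsics K (simple pinhole model, distortion ignored for brevity)."""
    cam = (R @ points3d.T).T + t          # world -> camera coordinates
    proj = (K @ cam.T).T
    uv = proj[:, :2] / proj[:, 2:3]       # perspective divide
    return np.sqrt(np.mean(np.sum((uv - points2d) ** 2, axis=1)))

def pose_is_valid(points3d, points2d, K, R, t, max_rms_px=3.0):
    """Accept a PnP solution only if its reprojection error is small.
    The 3.0 px threshold is an arbitrary illustration value."""
    return reprojection_rms(points3d, points2d, K, R, t) < max_rms_px

# Tiny synthetic check: points generated with the true pose must pass
K = np.array([[702.15, 0, 319.5], [0, 702.15, 239.5], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
pts3d = np.array([[0.1, 0.0, 0.5], [-0.1, 0.1, 0.4], [0.0, -0.1, 0.6]])
cam = (R @ pts3d.T).T + t
proj = (K @ cam.T).T
pts2d = proj[:, :2] / proj[:, 2:3]
print(pose_is_valid(pts3d, pts2d, K, R, t))  # True
```

With solvePnPRansac the inlier count serves a similar purpose, but an explicit reprojection-error check works uniformly across the ITERATIVE, P3P, and EPNP flags.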

The compressed images here (37 MB) show my current results (with the ITERATIVE PnP solver), with an intrinsic guess of zero rotation and 75 cm upwards. The print-out has the x-axis forward, the y-axis to the left, and the z-axis down, with corresponding roll, pitch, and yaw angles.

Solution

One thing that I've learned while trying to implement my head-tracking system is that you should start from a simple problem and then move to more complicated ones. Your question is quite long, and unfortunately I don't have time to analyze it and search for a bug or logical mistake in your code, so at least I will try to give you some hints and working examples.

Here is an OpenCV tutorial for finding object translation and rotation. It's written in Python; if that is a problem, here is part of my old C++ project.
My project performs the same task using the solvePnP or solvePnPRansac function (you can change the mode). Note that my code is part of an old "playground" project, so even after the cleaning I did, it's quite messy. When you run it, show the printed chessboard to the camera, press 'p' to start position and rotation estimation, 'm' to change the mode (0-ransac, 1-pnp, 2-posit, which seems not to work...) or 'd' to turn the use of distortion coefficients on/off.
Both projects rely on finding a chessboard pattern, but it should be easy to modify them to use other objects.

Camera calibration - while I've been working on my head-tracking system, I've never managed to calibrate the camera twice with the same results... So I decided to use a calibration file which I found on GitHub, and it worked well - here you can find a little more information about that and a link to the file.

edit:

Try to start with the simplest possible solution that gives good results in some (even simple) situation. A good point to start, in my opinion, is to replace the sheet of paper in your testing environment with the printed chessboard from the tutorial (this one) and make that work. Moving from there to your problem will be much easier than beginning with your problem. Try to make any working solution in any programming language - consider using the Python or C++ version of OpenCV - there are many more tutorials/examples than for the Java version, and comparing results from your code with results from some working code will make things much easier. When you have a working solution, try to modify it to work with your testing environment. There are a lot of things which may cause it to not work right now - not enough points, a bug in your code or even in the OpenCV Java wrapper, bad interpretation of results, etc...

edit2:

Using points from your code, I managed to get the following results:

rvec = [[-158.56293283], [ 1.46777938], [ -17.32569125]]
tvec = [[ -36.23910413], [ -82.83704819], [ 266.03157578]]

Unfortunately, for me it's hard to say whether the results are good or not... The only thing that looks wrong to me is that 2 angles are different from 0 (or 180). But if you change the last row of points2d from (355,37), (353,72), (353,101) to

(355,37), (355,72), (355,101)

(I guess it's your mistake, not a correct result) you will get:

rvec = [[-159.34101842], [ 1.04951033], [ -11.43731376]]
tvec = [[ -25.74308282], [ -82.58461674], [ 268.12321097]]

which might be much closer to the correct result. Changing the camera matrix changes the results a lot, so consider testing values from this post.

Note that all rvec values are multiplied by 180.0/3.14 - in C++ and Python, the rvec vector returned by solvePnPRansac contains angles in radians.
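That scaling can be written out explicitly. Note that rvec is a Rodrigues axis-angle vector, so its Euclidean norm is the total rotation angle in radians (cv2.Rodrigues converts it to a rotation matrix); the rvec values below are made-up example numbers, not the ones from this answer:

```python
import numpy as np

# rvec as returned by solvePnP / solvePnPRansac: Rodrigues axis-angle form,
# whose Euclidean norm is the rotation angle in radians
rvec_rad = np.array([[-2.767], [0.018], [-0.302]])  # made-up example values

rvec_deg = rvec_rad * 180.0 / np.pi                # the 180.0/3.14 scaling above
angle_deg = np.degrees(np.linalg.norm(rvec_rad))   # total rotation angle in degrees

print(rvec_deg.ravel(), angle_deg)
```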
