Camera position in world coordinates from cv::solvePnP

2021-12-10 00:00:00 opencv opengl computer-vision c++ pose-estimation

I have a calibrated camera (intrinsic matrix and distortion coefficients), and I want to find the camera position given some 3D points and their corresponding 2D points in the image.

I know that cv::solvePnP could help me, and after reading this and this I understand that the outputs of solvePnP, rvec and tvec, are the rotation and translation of the object in the camera coordinate system.

So I need to find the camera rotation/translation in the world coordinate system.

From the links above, the code seems straightforward in Python:

    found, rvec, tvec = cv2.solvePnP(object_3d_points, object_2d_points, camera_matrix, dist_coefs)
    rotM = cv2.Rodrigues(rvec)[0]
    cameraPosition = -np.matrix(rotM).T * np.matrix(tvec)

I don't know Python/NumPy (I'm using C++), but this does not make a lot of sense to me:

rvec and tvec output by solvePnP are 3x1 matrices (3-element vectors)

cv2.Rodrigues(rvec) is a 3x3 matrix

cv2.Rodrigues(rvec)[0] is a 3x1 matrix (a 3-element vector)

cameraPosition is a 3x1 * 1x3 matrix multiplication, which is a... 3x3 matrix. How can I use this in OpenGL with simple glTranslatef and glRotate calls?

Recommended answer

If by "world coordinates" you mean "object coordinates", you have to take the inverse of the transformation returned by the PnP algorithm.

There is a trick for inverting transformation matrices that saves you the general inversion operation, which is usually expensive, and it explains the Python code. Given a transformation [R|t], we have inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. So you can write (not tested):

    cv::Mat rvec, tvec;
    solvePnP(..., rvec, tvec, ...);
    // rvec is 3x1, tvec is 3x1

    cv::Mat R;
    cv::Rodrigues(rvec, R); // R is 3x3

    R = R.t();        // rotation of inverse
    tvec = -R * tvec; // translation of inverse

    cv::Mat T = cv::Mat::eye(4, 4, R.type()); // T is 4x4
    T( cv::Range(0,3), cv::Range(0,3) ) = R * 1; // copies R into T
    T( cv::Range(0,3), cv::Range(3,4) ) = tvec * 1; // copies tvec into T

    // T is a 4x4 matrix with the pose of the camera in the object frame
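On the dimension question: in the Python snippet, cv2.Rodrigues returns a tuple whose first element is the 3x3 rotation matrix, so the [0] selects that matrix rather than a vector. The product is therefore 3x3 times 3x1, which gives a 3x1 camera position. The sketch below shows the same computation in plain C++ without OpenCV, assuming R and t map object coordinates to camera coordinates (x_cam = R * x_obj + t). The names Vec3, Mat3, and cameraPosition are made up for illustration.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major: R[row][col]

// Camera centre in the object frame: C = -R^T * t.
// Assumes R and t map object coordinates to camera coordinates.
Vec3 cameraPosition(const Mat3& R, const Vec3& t) {
    Vec3 c{};  // zero-initialised 3-element result
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            c[i] -= R[j][i] * t[j];  // R[j][i] is element (i, j) of R^T
        }
    }
    return c;
}
```

A quick sanity check is that R * C + t should come out as the zero vector, since the camera centre sits at the origin of the camera frame.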

Update: Later, to use T with OpenGL, keep in mind that the axes of the camera frame differ between OpenCV and OpenGL.

OpenCV uses the convention common in computer vision: X points to the right, Y down, and Z to the front (as in this image). The camera frame in OpenGL is: X points to the right, Y up, and Z to the back (as in the left-hand side of this image). So you need to apply a 180-degree rotation around the X axis. The formula for this rotation matrix is on Wikipedia.

    // T is your 4x4 matrix in the OpenCV frame
    cv::Mat RotX = ...; // 4x4 matrix with a 180 deg rotation around X
    cv::Mat Tgl = T * RotX; // OpenGL camera in the object frame
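For reference, a rotation of 180 degrees around X has cos = -1 and sin = 0, so the matrix reduces to diag(1, -1, -1, 1). The following is a minimal sketch of that step in plain C++ rather than cv::Mat. The names Mat4, rotX180, multiply, and toOpenGLCamera are hypothetical helpers, not OpenCV or OpenGL functions.

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major: M[row][col]

// 180-degree rotation about X: cos(180) = -1 and sin(180) = 0,
// so the Y and Z axes are both flipped.
Mat4 rotX180() {
    return {{{1.0, 0.0, 0.0, 0.0},
             {0.0, -1.0, 0.0, 0.0},
             {0.0, 0.0, -1.0, 0.0},
             {0.0, 0.0, 0.0, 1.0}}};
}

// Plain 4x4 matrix product A * B.
Mat4 multiply(const Mat4& A, const Mat4& B) {
    Mat4 C{};  // zero-initialised accumulator
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Hypothetical helper: camera pose T (OpenCV axes) re-expressed
// with OpenGL camera axes, following Tgl = T * RotX.
Mat4 toOpenGLCamera(const Mat4& T) {
    return multiply(T, rotX180());
}
```

Multiplying on the right by this matrix negates the second and third columns of T (the camera's Y and Z axes) and leaves the translation column unchanged.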

These transformations are always confusing and I may have gotten a step wrong, so take this with a grain of salt.

Finally, take into account that matrices in OpenCV are stored in row-major order in memory, while OpenGL matrices are stored in column-major order.
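In practice the storage-order difference means the 16 values have to be transposed before being handed to OpenGL. A small sketch of that conversion in plain C++, assuming the 4x4 matrix is available as 16 contiguous doubles in row-major order (toColumnMajor is a made-up name):

```cpp
#include <array>
#include <cstddef>

// Convert 16 row-major values (OpenCV layout) into the column-major
// order that OpenGL expects; this is the same as transposing the matrix.
std::array<double, 16> toColumnMajor(const std::array<double, 16>& rowMajor) {
    std::array<double, 16> colMajor{};
    for (std::size_t row = 0; row < 4; ++row) {
        for (std::size_t col = 0; col < 4; ++col) {
            colMajor[col * 4 + row] = rowMajor[row * 4 + col];
        }
    }
    return colMajor;
}
```

The result could then be passed via .data() to a legacy call such as glLoadMatrixd or glMultMatrixd, both of which take a pointer to 16 column-major doubles. After the conversion the translation ends up in elements 12, 13, and 14.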
