Fast and robust image stitching algorithm for many images in Python?
Problem description
I have a stationary camera that rapidly takes photos of a continuously moving product, always from the same fixed position and angle (so the motion between frames is a translation). I need to stitch all the images into one panoramic picture. I tried the Stitcher class. It worked, but it took a long time to compute. I also tried another method: detecting features with SIFT, matching them with FlannBasedMatcher, finding the homography, and then warping the images. This works fine with only two images, but with multiple images it still does not stitch them properly. Does anyone know the best and fastest image stitching algorithm for this case?
This is my code, which uses the Stitcher class.
import time
import cv2
import os
import numpy as np
import sys

def main():
    # read input images
    imgs = []
    path = 'pics_rotated/'
    i = 0
    for (root, dirs, files) in os.walk(path):
        images = [f for f in files]
        print(images)
        for i in range(0, len(images)):
            curImg = cv2.imread(path + images[i])
            imgs.append(curImg)

    stitcher = cv2.Stitcher.create(mode=0)
    status, result = stitcher.stitch(imgs)
    if status != cv2.Stitcher_OK:
        print("Can't stitch images, error code = %d" % status)
        sys.exit(-1)
    cv2.imwrite("imagesout/output.jpg", result)
    cv2.waitKey(0)

if __name__ == '__main__':
    start = time.time()
    main()
    end = time.time()
    print("Time --->>>>>", end - start)
    cv2.destroyAllWindows()
Solution
Briefing
Although the OpenCV Stitcher class provides many methods and options for stitching, I find it hard to use because of its complexity. Therefore, I will try to provide a minimal and fast way to perform stitching. If you are interested in more sophisticated approaches, such as exposure compensation, I highly recommend looking at the detailed sample code. As a side note, I would be grateful if someone could convert the following functions to use the Stitcher class.
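In order to combine multiple images into the same perspective, the following operations are needed:
- Detect and match features.
- Compute the homography (the perspective transformation between frames).
- Warp one image onto the perspective of the other.
- Combine the base and warped images while keeping track of the shift in origin.
- Given the combination pattern, stitch multiple images.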
What are features? They are distinguishable parts, like the corners of a square, that are preserved across images. Different algorithms have been proposed for obtaining these characteristic points, such as Harris, ORB, SIFT, and SURF. See cv::Feature2D for the full list. I will use SIFT because it is accurate and sufficiently fast.
A feature consists of a KeyPoint, which is the location in the image, and a descriptor, which is a set of numbers (e.g. a 128-D vector) that represents the properties of the feature.
After finding distinct points in the images, we need to match the corresponding point pairs. See cv::DescriptorMatcher. I will use the Flann-based descriptor matcher.
First, we initialize the descriptor and matcher classes.
import cv2 as cv
import numpy as np

descriptor = cv.SIFT_create()
matcher = cv.DescriptorMatcher_create(cv.DescriptorMatcher_FLANNBASED)
Then, we find the features in each image.
(kps, desc) = descriptor.detectAndCompute(image, mask=None)
Now we find the corresponding point pairs.
matches = []
if desc1 is not None and desc2 is not None and len(desc1) >= 2 and len(desc2) >= 2:
    # desc2 is the query set, desc1 is the train set
    rawMatch = matcher.knnMatch(desc2, desc1, k=2)
    # ensure the distance is within a certain ratio of each other (i.e. Lowe's ratio test)
    ratio = 0.75
    for m in rawMatch:
        if len(m) == 2 and m[0].distance < m[1].distance * ratio:
            # (keypoint index in image 1, keypoint index in image 2)
            matches.append((m[0].trainIdx, m[0].queryIdx))
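To make the ratio test concrete without needing OpenCV, here is a minimal NumPy sketch that mimics `knnMatch(desc2, desc1, k=2)` by brute force on toy 2-D descriptors. The function name `ratio_test_matches` and the toy data are assumptions made for illustration; real SIFT descriptors are 128-D and FLANN uses an approximate search instead.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.75):
    """Return (idx1, idx2) pairs that pass Lowe's ratio test.

    Brute-force stand-in for knnMatch(desc2, desc1, k=2): for each
    descriptor in desc2 (query), find its two nearest rows in desc1 (train).
    """
    desc1 = np.asarray(desc1, dtype=np.float64)
    desc2 = np.asarray(desc2, dtype=np.float64)
    if len(desc1) < 2 or len(desc2) < 2:
        return []
    # pairwise Euclidean distances, shape (len(desc2), len(desc1))
    dists = np.linalg.norm(desc2[:, None, :] - desc1[None, :, :], axis=2)
    matches = []
    for query_idx, row in enumerate(dists):
        best, second = np.argsort(row)[:2]
        # keep the match only if it is clearly better than the runner-up
        if row[best] < ratio * row[second]:
            matches.append((int(best), query_idx))
    return matches

# toy 2-D "descriptors" (hypothetical data)
desc1 = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
desc2 = [[0.1, 0.0],   # clearly closest to desc1[0], so it is kept
         [5.0, 0.0]]   # equally far from desc1[0] and desc1[1], so it is rejected
matches = ratio_test_matches(desc1, desc2)
print(matches)
```

The second query point is ambiguous between two candidates, which is exactly the kind of match the ratio test is meant to discard.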
Homography computation
A homography is the perspective transformation from one view to another. Lines that are parallel in one view may not be parallel in another, like the edges of a road running toward the sunset. We need at least 4 corresponding point pairs. Any additional pairs are redundant data that must be reconciled by a fit or filtered out as outliers.
The homography matrix transforms a point in the initial view to its warped position. It is a 3x3 matrix computed by the Direct Linear Transform algorithm. It has 8 degrees of freedom because it is only defined up to scale, so the last element of the matrix is normalized to 1. In homogeneous coordinates:
[pt2] = H * [pt1]
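As a rough illustration of what this product means in practice, the NumPy sketch below lifts points to homogeneous coordinates, multiplies by H, and divides by the third coordinate. The helper name `apply_homography` and the example matrix are assumptions; the pure translation was chosen because it is the motion described in the question.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=np.float64)
    # lift to homogeneous coordinates: (x, y) -> (x, y, 1)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    # divide by the third coordinate to return to pixel coordinates
    return mapped[:, :2] / mapped[:, 2:3]

# hypothetical homography: a pure translation by (30, -5)
H_shift = np.array([[1.0, 0.0, 30.0],
                    [0.0, 1.0, -5.0],
                    [0.0, 0.0, 1.0]])
moved = apply_homography(H_shift, [[0, 0], [100, 50]])
print(moved)

# multiplying H by any non-zero factor gives the same mapping,
# which is why only 8 of the 9 entries are free
same = apply_homography(2.5 * H_shift, [[0, 0], [100, 50]])
print(np.allclose(moved, same))
```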
Now that we have corresponding point matches, we compute the homography. The method we use to handle redundant data is RANSAC, which repeatedly picks 4 random point pairs, fits a homography to them, and keeps the result that agrees with the most matches. See cv::findHomography for more options.
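if len(matches) >= 4:
    pts1 = np.float32([kps1[i].pt for (i, _) in matches])  # kps1/kps2: assumed keypoint lists of images 1 and 2
    pts2 = np.float32([kps2[j].pt for (_, j) in matches])
    (H, status) = cv.findHomography(pts1, pts2, cv.RANSAC)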
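For intuition about what happens inside such a call, here is a simplified NumPy sketch of a Direct Linear Transform fit wrapped in a RANSAC loop. It is not OpenCV's implementation (which also normalizes coordinates and refines the result); the function names, the iteration count, the 3-pixel threshold, and the synthetic point set are all assumptions for the demo.

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct Linear Transform from >= 4 point pairs (no coordinate normalization)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the solution is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    if abs(H[2, 2]) < 1e-12:
        return None  # degenerate sample
    return H / H[2, 2]  # fix the scale so the last element is 1

def project(H, pts):
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, n_iters=200, thresh=3.0, seed=0):
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # fit a candidate homography to 4 random pairs
        sample = rng.choice(len(src), size=4, replace=False)
        H = dlt_homography(src[sample], dst[sample])
        if H is None:
            continue
        # count how many pairs the candidate explains within thresh pixels
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on all inliers of the best candidate
    return dlt_homography(src[best_inliers], dst[best_inliers]), best_inliers

# synthetic matches: a pure translation by (40, 2) plus two wrong matches
src = np.array([(10, 20), (60, 15), (120, 30), (180, 10), (25, 70), (75, 95),
                (130, 80), (190, 100), (15, 150), (80, 140), (140, 160), (185, 145)],
               dtype=np.float64)
dst = src + [40.0, 2.0]
dst[3] += [25.0, -30.0]
dst[7] += [-60.0, 15.0]

H, inliers = ransac_homography(src, dst)
print(np.round(H, 3))
print(inliers)
```

Because the two corrupted pairs disagree with every candidate built from correct pairs, they end up outside the inlier mask and do not influence the final fit.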
Warping to perspective
By computing the homography, we know which point in the source image corresponds to which point in the destination image. In order not to lose information from the source image, we need to pad the destination image by the amount by which the transformed points fall into negative coordinates. At the same time, we need to keep track of how far the origin has shifted, so that multiple images can be stitched.
# find the ROI of a transformation result
def warpRect(rect, H):
    x, y, w, h = rect
    corners = np.float32([[x, y], [x, y + h - 1], [x + w - 1, y], [x + w - 1, y + h - 1]])
    # cv.perspectiveTransform expects an (N, 1, 2) float array and applies the perspective divide
    H = np.asarray(H, dtype=np.float64)
    extremum = cv.perspectiveTransform(corners.reshape(-1, 1, 2), H).reshape(-1, 2)
    minx, miny = np.min(extremum[:, 0]), np.min(extremum[:, 1])
    maxx, maxy = np.max(extremum[:, 0]), np.max(extremum[:, 1])
    xo = int(np.floor(minx))
    yo = int(np.floor(miny))
    wo = int(np.ceil(maxx - minx))
    ho = int(np.ceil(maxy - miny))
    outrect = (xo, yo, wo, ho)
    return outrect

# homography matrix is translated to fit in the screen
def coverH(rect, H):
    # obtain bounding box of the result
    x, y, _, _ = warpRect(rect, H)
    # shift amount to the first quadrant
    xpos = int(-x if x < 0 else 0)
    ypos = int(-y if y < 0 else 0)
    # correct the homography matrix so that no point is thrown out
    T = np.array([[1, 0, xpos], [0, 1, ypos], [0, 0, 1]])
    H_corr = T.dot(H)
    return (H_corr, (xpos, ypos))

# pad image to cover ROI, return the shift amount of origin
def addBorder(img, rect):
    x, y, w, h = rect
    tl = (x, y)
    br = (x + w, y + h)
    top = int(-tl[1] if tl[1] < 0 else 0)
    bottom = int(br[1] - img.shape[0] if br[1] > img.shape[0] else 0)
    left = int(-tl[0] if tl[0] < 0 else 0)
    right = int(br[0] - img.shape[1] if br[0] > img.shape[1] else 0)
    img = cv.copyMakeBorder(img, top, bottom, left, right, cv.BORDER_CONSTANT, value=[0, 0, 0])
    orig = (left, top)
    return img, orig

# convert an image shape (rows, cols, ...) to a rect (x, y, w, h)
def size2rect(size):
    return (0, 0, size[1], size[0])
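The bookkeeping above can be checked by hand on a small case. The NumPy-only sketch below, with a hypothetical helper `warped_bounds` and a made-up homography that moves a 640x480 image 120 px left and 8 px up, shows how the negative part of the bounding box becomes the origin shift and how the corrected matrix brings every corner back to non-negative coordinates.

```python
import numpy as np

def warped_bounds(width, height, H):
    """Bounding box (min_x, min_y, max_x, max_y) of the image corners after applying H."""
    corners = np.array([[0, 0, 1],
                        [width - 1, 0, 1],
                        [0, height - 1, 1],
                        [width - 1, height - 1, 1]], dtype=np.float64)
    mapped = corners @ H.T
    mapped = mapped[:, :2] / mapped[:, 2:3]   # perspective divide
    return (*mapped.min(axis=0), *mapped.max(axis=0))

# hypothetical homography: shift 120 px to the left and 8 px up
H = np.array([[1.0, 0.0, -120.0],
              [0.0, 1.0, -8.0],
              [0.0, 0.0, 1.0]])
min_x, min_y, max_x, max_y = warped_bounds(640, 480, H)

# translate by the negative part of the bounding box so nothing is cropped
shift = (max(0.0, -min_x), max(0.0, -min_y))
T = np.array([[1.0, 0.0, shift[0]],
              [0.0, 1.0, shift[1]],
              [0.0, 0.0, 1.0]])
H_cover = T @ H

print(shift)                              # origin shift to remember for later images
print(warped_bounds(640, 480, H_cover))   # bounding box now starts at (0, 0)
```

The shift is the value that has to be carried along to the next stitching step, since all later coordinates are expressed relative to the moved origin.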
Warping function
def warpImage(img, H):
    # tweak the homography matrix to move the result to the first quadrant
    H_cover, pos = coverH(size2rect(img.shape), H)
    # find the bounding box of the output
    x, y, w, h = warpRect(size2rect(img.shape), H_cover)
    width, height = x + w, y + h
    # warp the image using the corrected homography matrix
    warped = cv.warpPerspective(img, H_cover, (width, height))
    # make the external boundary solid black, useful for masking
    warped = np.ascontiguousarray(warped, dtype=np.uint8)
    gray = cv.cvtColor(warped, cv.COLOR_RGB2GRAY)
    _, bw = cv.threshold(gray, 1, 255, cv.THRESH_BINARY)
    # https://stackoverflow.com/a/55806272/12447766
    major = cv.__version__.split('.')[0]
    if major == '3':
        _, cnts, _ = cv.findContours(bw, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)
    else:
        cnts, _ = cv.findContours(bw, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)
    warped = cv.drawContours(warped, cnts, 0, [0, 0, 0], lineType=cv.LINE_4)
    return (warped, pos)
Combining warped and destination images
This is the step where image enhancement, such as exposure compensation, comes in. To keep things simple, we will use mean value blending. The easiest solution would be to overwrite the existing data in the destination image, but averaging costs little extra.
# only the non-zero pixels are weighted to the average
def mean_blend(img1, img2):
    assert(img1.shape == img2.shape)
    locs1 = np.where(cv.cvtColor(img1, cv.COLOR_RGB2GRAY) != 0)
    blended1 = np.copy(img2)
    blended1[locs1[0], locs1[1]] = img1[locs1[0], locs1[1]]
    locs2 = np.where(cv.cvtColor(img2, cv.COLOR_RGB2GRAY) != 0)
    blended2 = np.copy(img1)
    blended2[locs2[0], locs2[1]] = img2[locs2[0], locs2[1]]
    blended = cv.addWeighted(blended1, 0.5, blended2, 0.5, 0)
    return blended

def warpPano(prevPano, img, H, orig):
    # correct homography matrix
    T = np.array([[1, 0, -orig[0]], [0, 1, -orig[1]], [0, 0, 1]])
    H_corr = H.dot(T)
    # warp the image and obtain shift amount of origin
    result, pos = warpImage(prevPano, H_corr)
    xpos, ypos = pos
    # zero pad the result
    rect = (xpos, ypos, img.shape[1], img.shape[0])
    result, _ = addBorder(result, rect)
    # mean value blending
    idx = np.s_[ypos : ypos + img.shape[0], xpos : xpos + img.shape[1]]
    result[idx] = mean_blend(result[idx], img)
    # crop extra paddings
    x, y, w, h = cv.boundingRect(cv.cvtColor(result, cv.COLOR_RGB2GRAY))
    result = result[y : y + h, x : x + w]
    # return the resulting image with shift amount
    return (result, (xpos - x, ypos - y))
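To see the effect of mean value blending in isolation, here is a single-channel NumPy sketch of the same idea. The name `mean_blend_gray` and the tiny test arrays are assumptions; like the function above, it treats a pixel value of 0 as "no data", so genuinely black pixels in an overlap are ignored, and unlike `cv.addWeighted` it truncates rather than rounds odd sums.

```python
import numpy as np

def mean_blend_gray(a, b):
    """Average two same-sized single-channel images, treating 0 as 'no data'."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    has_a, has_b = a != 0, b != 0
    out = np.zeros_like(a)
    both = has_a & has_b
    out[both] = (a[both] + b[both]) / 2        # overlap: average the two values
    out[has_a & ~has_b] = a[has_a & ~has_b]    # only a has data: copy it
    out[~has_a & has_b] = b[~has_a & has_b]    # only b has data: copy it
    return out.astype(np.uint8)

# one-row toy images: a covers the left two pixels, b covers the right two
a = np.array([[100, 100, 0]], dtype=np.uint8)
b = np.array([[0, 200, 50]], dtype=np.uint8)
blended = mean_blend_gray(a, b)
print(blended)
```

Only the middle pixel, where both images have data, is averaged; the outer pixels are copied from whichever image covers them.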
Stitching multiple images given a combination pattern
# base image is the last image in each iteration
def blend_multiple_images(images, homographies):
    N = len(images)
    assert(N >= 2)
    assert(len(homographies) == N - 1)
    pano = np.copy(images[0])
    pos = (0, 0)
    for i in range(N - 1):
        img = images[i + 1]
        # get homography matrix
        H = homographies[i]
        # warp pano onto image
        pano, pos = warpPano(pano, img, H, pos)
    return (pano, pos)
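One consequence of this sequential loop is that the first image ends up transformed by every pairwise homography in turn. The NumPy sketch below illustrates that composition under the assumption that `homographies[i]` maps the frame of image i into the frame of image i + 1; the helper names and the translation values are made up for the example.

```python
import numpy as np
from functools import reduce

def translation(tx, ty):
    """3x3 homography for a pure shift by (tx, ty)."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def chain_to_last(homographies):
    """Compose H_0, H_1, ... so the result maps image 0 into the last image's frame."""
    # H_i maps frame i to frame i + 1, so later matrices multiply from the left
    return reduce(lambda acc, H: H @ acc, homographies, np.eye(3))

# hypothetical pairwise shifts between four consecutive frames
homographies = [translation(50, 0), translation(45, 2), translation(52, -1)]
H_total = chain_to_last(homographies)
print(H_total[:2, 2])   # total shift of image 0 in the last frame
```

With many frames, small errors in each pairwise estimate accumulate along this chain, which is one reason the choice of combination pattern matters.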
blend_multiple_images warps the previously combined image, called pano, onto the next image, one image after another. A pattern, however, may have junction points that give the best stitching view.
For example, for the layout

1 2 3
4 5 6

the best pattern to combine these images is

1 -> 2 <- 3
     |
     V
4 -> 5 <- 6
Therefore, we need one last function to combine 1 & 2 with 2 & 3, or 1235 with 456 at node 5.
from operator import sub

# no warping here, useful for combining two different stitched images
# the image at given origin coordinates must be the same
def patchPano(img1, img2, orig1=(0,0), orig2=(0,0)):
    # bottom right points
    br1 = (img1.shape[1] - 1, img1.shape[0] - 1)
    br2 = (img2.shape[1] - 1, img2.shape[0] - 1)
    # distance from orig to br
    diag2 = tuple(map(sub, br2, orig2))
    # possible pano corner coordinates based on img1
    # (cv.boundingRect needs int32 or float32 points)
    extremum = np.array([(0, 0), br1,
                         tuple(map(sum, zip(orig1, diag2))),
                         tuple(map(sub, orig1, orig2))], dtype=np.int32)
    bb = cv.boundingRect(extremum)
    # patch img1 to img2
    pano, shift = addBorder(img1, bb)
    orig = tuple(map(sum, zip(orig1, shift)))
    idx = np.s_[orig[1] : orig[1] + img2.shape[0] - orig2[1],
                orig[0] : orig[0] + img2.shape[1] - orig2[0]]
    subImg = img2[orig2[1] : img2.shape[0], orig2[0] : img2.shape[1]]
    pano[idx] = mean_blend(pano[idx], subImg)
    return (pano, orig)
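The geometry behind this patching step can be worked through with plain arithmetic. The sketch below is a simplified stand-in, not the function above: the hypothetical `patch_canvas` only computes the rectangle, in img1 coordinates, that would hold both images in full once the shared image is aligned at `orig1` and `orig2`.

```python
def patch_canvas(size1, size2, orig1, orig2):
    """Canvas (x, y, w, h) in img1 coordinates that holds both images.

    Sizes are (width, height); orig1 and orig2 locate the same shared
    image inside img1 and img2 respectively.
    """
    # where img2's top-left corner lands in img1 coordinates
    off_x, off_y = orig1[0] - orig2[0], orig1[1] - orig2[1]
    x0, y0 = min(0, off_x), min(0, off_y)
    x1 = max(size1[0], off_x + size2[0])
    y1 = max(size1[1], off_y + size2[1])
    return (x0, y0, x1 - x0, y1 - y0)

# img2 continues to the right of img1: the canvas only grows in width
canvas_right = patch_canvas((300, 100), (250, 100), orig1=(200, 0), orig2=(0, 0))
print(canvas_right)

# img2 extends to the left and above img1: the canvas origin becomes negative,
# which is the amount img1 would have to be padded on its left and top
canvas_left = patch_canvas((300, 100), (250, 120), orig1=(0, 0), orig2=(150, 10))
print(canvas_left)
```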
For a quick demo, you can run the Python code on GitHub. If you want to use the above methods in C++, you can have a look at the Stitch library. Any PR or edit to this post is welcome.