Detect image orientation angle based on text direction
Problem description
I am working on an OCR task to extract information from multiple ID proof documents. One challenge is the orientation of the scanned image: the scan of a PAN card, Aadhaar card, driving license, or any other ID proof needs to be rotated to the correct orientation.
I have already tried the approaches suggested on Stack Overflow and other forums, such as OpenCV minAreaRect, Hough line transform, FFT, homography, and Tesseract OSD with psm 0. None of them work.
The logic should return the angle of the text direction: 0, 90, or 270 degrees. Attached are images at 0, 90, and 270 degrees. This is not about determining skew.
Solution
Here's an approach based on the assumption that the majority of the text is skewed toward one side of the document. The idea is that we can determine the angle from where the major text region is located.
- Convert image to grayscale and Gaussian blur
- Adaptive threshold to get a binary image
- Find contours and filter using contour area
- Draw filtered contours onto mask
- Split image horizontally or vertically based on orientation
- Count number of pixels in each half
After converting to grayscale and applying a Gaussian blur, we apply an adaptive threshold to obtain a binary image.
From here we find contours and filter them by contour area to remove small noise particles and the large border. We draw every contour that passes this filter onto a mask.
To determine the angle, we split the image in half based on its dimensions. If width > height, it must be a horizontal image, so we cut it with a vertical line into left and right halves. If height > width, it must be a vertical image, so we cut it with a horizontal line into top and bottom halves.
Now that we have two halves, we can use cv2.countNonZero() to count the white pixels in each half. Here's the logic to determine the angle:
if horizontal
    if left >= right
        degree -> 0
    else
        degree -> 180
if vertical
    if bottom >= top
        degree -> 90
    else
        degree -> 270
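The decision rules can be tried in isolation on a synthetic mask, without running the OpenCV preprocessing first. The following is a minimal sketch, assuming the mask is a 2-D NumPy array whose nonzero pixels mark text; the function name `angle_from_mask` and the toy 4x8 mask are made up for illustration, and `np.count_nonzero` stands in for `cv2.countNonZero`.

```python
import numpy as np

def angle_from_mask(mask):
    """Return 0, 90, 180 or 270 from a 2-D mask whose nonzero pixels mark text."""
    h, w = mask.shape
    if w > h:
        # Landscape: compare the left and right halves.
        left = np.count_nonzero(mask[:, :w // 2])
        right = np.count_nonzero(mask[:, w // 2:])
        return 0 if left >= right else 180
    # Portrait (or square): compare the top and bottom halves.
    top = np.count_nonzero(mask[:h // 2, :])
    bottom = np.count_nonzero(mask[h // 2:, :])
    return 90 if bottom >= top else 270

# Hypothetical 4x8 landscape mask with all "text" pixels in the left half.
demo = np.zeros((4, 8), dtype=np.uint8)
demo[:, :3] = 255

print(angle_from_mask(demo))           # 0
print(angle_from_mask(demo[:, ::-1]))  # 180 (mirrored, text now on the right)
```

Because only the two half counts matter, the rule is cheap, but it carries no information when the text is spread evenly across the document.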
left 9703
right 3975
Therefore the image is at 0 degrees. Here are the results for the other orientations:
left 3975
right 9703
We can conclude that the image is flipped 180 degrees.
Here are the results for a vertical image. Note that since it's a vertical image, we compare the top and bottom halves:
top 3947
bottom 9550
Therefore the result is 90 degrees.
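Since the method rests on the text being concentrated on one side, it may help to look at how lopsided the two counts actually are. The sketch below is only an illustration: `dominant_share` is a hypothetical helper, not part of the original answer, and it is applied to the pixel counts reported above.

```python
def dominant_share(count_a, count_b):
    """Fraction of the white pixels that fall in the heavier half (0.5 to 1.0)."""
    total = count_a + count_b
    if total == 0:
        return 0.5  # empty mask: no evidence either way
    return max(count_a, count_b) / total

print(round(dominant_share(9703, 3975), 3))  # 0.709 (left/right example)
print(round(dominant_share(3947, 9550), 3))  # 0.708 (top/bottom example)
```

In both examples roughly 71% of the white pixels sit in one half. A share close to 0.5 would suggest that the one-sided-text assumption does not hold for that document and that the returned angle should be treated with caution.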
import cv2
import numpy as np

def detect_angle(image):
    mask = np.zeros(image.shape, dtype=np.uint8)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    adaptive = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,15,4)

    # findContours returns 2 values in OpenCV 2.x/4.x and 3 values in OpenCV 3.x
    cnts = cv2.findContours(adaptive, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]

    # Keep contours that are neither tiny noise nor the large border
    for c in cnts:
        area = cv2.contourArea(c)
        if 20 < area < 45000:
            cv2.drawContours(mask, [c], -1, (255,255,255), -1)

    mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
    h, w = mask.shape

    # Horizontal: compare left and right halves
    if w > h:
        left = mask[0:h, 0:w//2]
        right = mask[0:h, w//2:]
        left_pixels = cv2.countNonZero(left)
        right_pixels = cv2.countNonZero(right)
        return 0 if left_pixels >= right_pixels else 180
    # Vertical: compare top and bottom halves
    else:
        top = mask[0:h//2, 0:w]
        bottom = mask[h//2:, 0:w]
        top_pixels = cv2.countNonZero(top)
        bottom_pixels = cv2.countNonZero(bottom)
        return 90 if bottom_pixels >= top_pixels else 270

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle = detect_angle(image)
    print(angle)
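Once the angle is known, the scan still has to be rotated back before OCR. The following is a rough sketch of that step using `np.rot90`. The helper `undo_rotation` is hypothetical, and the mapping from label to rotation direction is an assumption inferred from the half comparisons above: if upright text is left-heavy, a bottom-heavy image (label 90) was turned counter-clockwise, so a clockwise quarter turn restores it. Verify the direction against your own samples.

```python
import numpy as np

def undo_rotation(image, angle):
    """Rotate `image` back to 0 degrees, given a label of 0, 90, 180 or 270."""
    # np.rot90 turns counter-clockwise for positive k.
    # Assumed mapping: label 90 means the image was turned 90 degrees
    # counter-clockwise, so k=-1 (clockwise) restores it; 270 is the mirror case.
    quarter_turns = {0: 0, 90: -1, 180: 2, 270: 1}
    return np.rot90(image, k=quarter_turns[angle])

upright = np.arange(12).reshape(3, 4)
turned = np.rot90(upright, k=1)  # counter-clockwise: left-side content moves to the bottom
restored = undo_rotation(turned, 90)
print(np.array_equal(restored, upright))  # True
```

With OpenCV, `cv2.rotate` with `cv2.ROTATE_90_CLOCKWISE`, `cv2.ROTATE_180`, or `cv2.ROTATE_90_COUNTERCLOCKWISE` should do the same job for the three non-zero labels.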