Good way to identify similar images?

I've developed a simple and fast algorithm in PHP to compare images for similarity.

It's fast to hash (~40 per second for 800x600 images), and an unoptimised search algorithm can go through 3,000 images in 22 minutes, comparing each one against the others (3/sec).

The basic overview is that you get an image, rescale it to 8x8 and then convert those pixels to HSV. The hue, saturation and value are then truncated to 4 bits each and concatenated into one big hex string.
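The hashing step described above could look roughly like the following. This is a minimal sketch in Python rather than the PHP used in the question. It assumes the image has already been rescaled to 8x8 and is passed in as 64 `(r, g, b)` tuples. The function name `image_hash` and the component order (hue, saturation, value per pixel) are assumptions made for illustration.

```python
import colorsys


def image_hash(pixels):
    """Hash 64 (r, g, b) pixels (0-255 each) from an already-rescaled 8x8 image."""
    if len(pixels) != 64:
        raise ValueError("expected 64 pixels from an 8x8 image")
    digits = []
    for r, g, b in pixels:
        # colorsys expects floats in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for component in (h, s, v):
            # Truncate to 4 bits: scale to 0..15 and clamp the 1.0 edge case.
            digits.append(min(int(component * 16), 15))
    # One hex digit per component: 64 pixels * 3 components = 192 characters.
    return "".join(format(d, "x") for d in digits)
```

With this layout every pixel contributes three hex digits, so a solid red image hashes to `0ff` repeated 64 times.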

Comparing images basically walks along two strings and adds up the differences it finds. If the total is below 64, it's the same image. Different images are usually around 600 - 800. Below 20, the images are extremely similar.
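A comparison along those lines might be sketched as below. It assumes the "difference" is the absolute difference between corresponding hex digits, which the question does not spell out. The names `hash_distance` and `classify` are hypothetical, while the thresholds 20 and 64 come from the text.

```python
def hash_distance(hash_a, hash_b):
    """Sum of absolute differences between corresponding hex digits."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(abs(int(a, 16) - int(b, 16)) for a, b in zip(hash_a, hash_b))


def classify(distance):
    """Map a distance to the rough categories given in the question."""
    if distance < 20:
        return "extremely similar"
    if distance < 64:
        return "same image"
    return "different"
```

Note that this treats hue as a linear value. Hue is really an angle, so a wrap-around difference may be worth trying, but that goes beyond what the question describes.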

Are there any improvements to this model I can use? I haven't looked at how relevant the different components (hue, saturation and value) are to the comparison. Hue is probably quite important, but what about the others?

To speed up searches I could probably split the 4 bits from each part in half and put the most significant bits first, so that if they fail the check, the least significant bits don't need to be checked at all. I don't know an efficient way to store bits like that while still allowing them to be searched and compared easily.
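One possible reading of that idea, sketched in Python: store the top two bits of every hex digit separately from the bottom two, and use the top halves alone to compute a lower bound on the full distance. If the top halves differ by `d >= 1`, the full 4-bit values differ by at least `4*d - 3`, so a pair whose bound already reaches the threshold can be rejected without reading the low bits. The function names and the list-based storage are assumptions. This does not answer the question of compact storage.

```python
def split_msb_first(hex_hash):
    """Return (coarse, fine): the top 2 bits and bottom 2 bits of each hex digit."""
    values = [int(c, 16) for c in hex_hash]
    coarse = [v >> 2 for v in values]
    fine = [v & 0b11 for v in values]
    return coarse, fine


def coarse_lower_bound(coarse_a, coarse_b):
    """Smallest full distance that is consistent with the coarse halves alone."""
    return sum(max(0, 4 * abs(a - b) - 3) for a, b in zip(coarse_a, coarse_b))


def distance_with_early_exit(split_a, split_b, threshold=64):
    """Return the full distance, or None if the coarse bits already rule out a match."""
    coarse_a, fine_a = split_a
    coarse_b, fine_b = split_b
    if coarse_lower_bound(coarse_a, coarse_b) >= threshold:
        return None  # the low bits are never read for this pair
    return sum(
        abs(((ca << 2) | fa) - ((cb << 2) | fb))
        for ca, fa, cb, fb in zip(coarse_a, fine_a, coarse_b, fine_b)
    )
```

Because the bound never overestimates, any pair it rejects would also have failed the full comparison at the same threshold.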

I've been using a dataset of 3,000 photos (mostly unique) and there haven't been any false positives. It's completely immune to resizes and fairly resistant to brightness and contrast changes.

Recommended answer

What you want to use is:

  1. Feature extraction
  2. Hashing
  3. Locality-aware bloom hashing

  1. Most people use SIFT features, although I've had better experiences with features that are not scale-invariant. Basically, you use an edge detector to find interesting points and then center your image patches around those points. That way you can also detect sub-images.

  2. What you implemented is a hash method. There are tons of others to try, but yours should work fine :)

  3. The crucial step to making it fast is to hash your hashes. You convert your values into a unary representation and then take a random subset of the bits as the new hash. Do that with 20-50 random samples and you get 20-50 hash tables. If any feature matches in 2 or more out of those 50 hash tables, the feature will be very similar to one you have already stored. This allows you to convert the abs(x-y)
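The "hash your hashes" step could be sketched as follows, under several assumptions: the input is a hex-string hash like the one in the question, each 4-bit value becomes 15 unary bits, and each table samples 32 bit positions. `UnaryLSHIndex`, `bits_per_key` and the other names are hypothetical and not from the answer. The sketch relies on a known property of unary codes: the number of differing bits between the codes of `x` and `y` equals `abs(x - y)`.

```python
import random
from collections import defaultdict

BITS_PER_VALUE = 15  # a 4-bit value 0..15 is written as that many 1s, padded with 0s


def to_unary(hex_hash):
    """Expand each hex digit v into v ones followed by (15 - v) zeros."""
    bits = []
    for c in hex_hash:
        v = int(c, 16)
        bits.extend([1] * v + [0] * (BITS_PER_VALUE - v))
    return bits


class UnaryLSHIndex:
    def __init__(self, hash_length, num_tables=20, bits_per_key=32, seed=0):
        rng = random.Random(seed)
        total_bits = hash_length * BITS_PER_VALUE
        # One random subset of bit positions per hash table.
        self.samples = [rng.sample(range(total_bits), bits_per_key)
                        for _ in range(num_tables)]
        self.tables = [defaultdict(set) for _ in range(num_tables)]

    def _keys(self, hex_hash):
        bits = to_unary(hex_hash)
        return [tuple(bits[i] for i in sample) for sample in self.samples]

    def add(self, name, hex_hash):
        for table, key in zip(self.tables, self._keys(hex_hash)):
            table[key].add(name)

    def candidates(self, hex_hash, min_tables=2):
        """Names whose sampled bits match the query in at least min_tables tables."""
        hits = defaultdict(int)
        for table, key in zip(self.tables, self._keys(hex_hash)):
            for name in table.get(key, ()):
                hits[name] += 1
        return {name for name, count in hits.items() if count >= min_tables}
```

A lookup then touches one bucket per table instead of every stored hash. The candidates it returns can still be confirmed with a full comparison. The number of tables and bits per key trade recall against bucket size and would need tuning on real data.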

Hope it helps. If you'd like to try out my self-developed image similarity search, drop me a mail at hajo at spratpix
