How can I use multithreading to speed up face detection?

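Problem description

I have some code that takes a list of image URLs from a CSV file, runs face detection on those images, then loads some models and makes predictions on the detected faces.

I did some load testing and found that the get_face function accounts for most of the time needed to produce results; the remaining time goes to the pickle files created for prediction.

Question: is it possible to reduce the time by running these steps in threads, and how can this be implemented in a multithreaded way?

Here is the code sample: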

from __future__ import division
import numpy as np

from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize

df = pd.read_csv('/home/instaurls.csv')
detector = dlib.get_frontal_face_detector()
img_width, img_height = 139, 139
confidence = 0.8

def get_face():
    output = None
    data1 = []
    for row in df.itertuples():
        img = io.imread(row[1])
        dets = detector(img, 1)
        for i, d in enumerate(dets):
            img = img[d.top():d.bottom(), d.left():d.right()]
            img = resize(img, (img_width, img_height))
            output = np.expand_dims(img, axis=0)
            break
        data1.append(output)
    data1 = np.concatenate(data1)
    return data1

get_face()

CSV sample

data
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg

Solution

Here is how you can try to run it in parallel:

from __future__ import division
import numpy as np

from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize
from csv import DictReader

df = DictReader(open('/home/instaurls.csv')) # DictReader is iterable
detector = dlib.get_frontal_face_detector() 
img_width, img_height = 139, 139
confidence = 0.8

def get_face(row):
    """
    Here row is a dictionary whose keys are the CSV header names
    and whose values come from the current CSV row.
    """
    output = None

    # DictReader rows are dicts, so index by the header name ('data'), not by position
    img = io.imread(row['data'])
    dets = detector(img, 1)
    for i, d in enumerate(dets):
        img = img[d.top():d.bottom(), d.left():d.right()]
        img = resize(img, (img_width, img_height))
        output = np.expand_dims(img, axis=0)
        break

    return output

if __name__ == '__main__':
    pool = Pool()  # defaults to the number of CPU cores
    data = list(pool.imap(get_face, df))  # results come back in row order
    print(np.concatenate(data))

Note get_face, the argument it takes, and what it returns. This is what I mean by a small chunk of work: get_face now processes a single row of the CSV.

When this script runs, pool refers to an instance of Pool, and get_face is then called once for each row yielded by df (the DictReader).
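To make the row-by-row dispatch concrete, here is a minimal sketch that needs neither dlib nor network access. The in-memory CSV, the example.com URLs, and the describe_row worker are hypothetical stand-ins for the real file and for get_face. It uses multiprocessing.pool.ThreadPool, which offers the same imap interface as Pool, only so that the snippet runs anywhere; for the actual detection work you would probably still want Pool as above.

```python
from csv import DictReader
from io import StringIO
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool

# Hypothetical in-memory CSV standing in for /home/instaurls.csv.
CSV_TEXT = (
    "data\n"
    "http://example.com/a.jpg\n"
    "http://example.com/b.jpg\n"
    "http://example.com/c.jpg\n"
)


def describe_row(row):
    """Stand-in for get_face: receives one CSV row as a dict."""
    url = row['data']  # the key is the CSV header name
    return url.rsplit('/', 1)[-1]  # keep only the file name


def run(csv_text, workers=2):
    rows = DictReader(StringIO(csv_text))
    pool = ThreadPool(workers)
    try:
        # imap calls describe_row once per row and yields results in input order.
        return list(pool.imap(describe_row, rows))
    finally:
        pool.close()
        pool.join()


names = run(CSV_TEXT)
```

Each worker call sees exactly one row, and the results line up with the order of the rows in the CSV even though the calls may finish out of order.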

Once everything has finished, data holds the processed results, and np.concatenate is then applied to them.
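One thing to watch for: get_face returns None when the detector finds no face, and np.concatenate raises an error if the list contains None. A minimal sketch of a guard, assuming each successful result has shape (1, height, width, channels) as produced by np.expand_dims above; the helper name stack_faces is hypothetical:

```python
import numpy as np


def stack_faces(results):
    """Concatenate per-image arrays along axis 0, skipping None entries.

    Returns None when no image produced a face.
    """
    faces = [r for r in results if r is not None]
    if not faces:
        return None
    return np.concatenate(faces)


# Simulated output of pool.imap: two detected faces and one image with no face.
results = [np.zeros((1, 139, 139, 3)), None, np.ones((1, 139, 139, 3))]
batch = stack_faces(results)
```

Note that dropping the None entries means the rows of the batch no longer line up one-to-one with the rows of the CSV, so keep the URLs alongside the results if you need to map predictions back to images.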
