TensorFlow Detection API: slow processing on the GPU

The TensorFlow Object Detection API runs slowly on the GPU.

TensorFlow sees the GPU and allocates video memory, but processing a single photo takes about 7 seconds.

The code is from the ssd_mobilenet_v1_coco example:

```python
import numpy as np
import tensorflow as tf
import cv2 as cv

# Read the graph.
with tf.gfile.FastGFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Session() as sess:
    # Restore session
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')

    # Read and preprocess an image.
    img = cv.imread('example.jpg')
    rows = img.shape[0]
    cols = img.shape[1]
    inp = cv.resize(img, (300, 300))
    inp = inp[:, :, [2, 1, 0]]  # BGR2RGB

    # Run the model
    out = sess.run([sess.graph.get_tensor_by_name('num_detections:0'),
                    sess.graph.get_tensor_by_name('detection_scores:0'),
                    sess.graph.get_tensor_by_name('detection_boxes:0'),
                    sess.graph.get_tensor_by_name('detection_classes:0')],
                   feed_dict={'image_tensor:0': inp.reshape(1, inp.shape[0], inp.shape[1], 3)})

    # Visualize detected bounding boxes.
    num_detections = int(out[0][0])
    for i in range(num_detections):
        classId = int(out[3][0][i])
        score = float(out[1][0][i])
        bbox = [float(v) for v in out[2][0][i]]
        if score > 0.3:
            x = bbox[1] * cols
            y = bbox[0] * rows
            right = bbox[3] * cols
            bottom = bbox[2] * rows
            cv.rectangle(img, (int(x), int(y)), (int(right), int(bottom)),
                         (125, 255, 51), thickness=2)
```

System specs:

Windows 10

i7 8700

GTX 1060

Where could the problem be?
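A likely cause is that the ~7 s figure includes one-off costs rather than pure inference: importing the graph, creating the session, and the first sess.run(), which triggers graph optimization and CUDA kernel loading. If the script above is rerun (or a new session is created) for every photo, that cost is paid every time. A minimal, framework-agnostic timing sketch that discards warm-up calls before measuring (here `run_fn` is a hypothetical zero-argument closure, e.g. `lambda: sess.run(outputs, feed_dict=...)`):

```python
import time

def mean_latency(run_fn, warmup=3, iters=10):
    """Average latency of run_fn per call, excluding slow warm-up calls.

    In TF1, the first sess.run() can take seconds (graph optimization,
    CUDA kernel loading); steady-state SSD-MobileNet inference is far
    faster. run_fn is a zero-arg closure around the actual work.
    """
    for _ in range(warmup):
        run_fn()                      # discard one-off initialization cost
    start = time.perf_counter()
    for _ in range(iters):
        run_fn()
    return (time.perf_counter() - start) / iters
```

If the steady-state number turns out to be small while only the first call is slow, the fix is to create the session once and reuse it for all images, rather than paying the startup cost per photo.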

Trying to modularise OpenCV detection algorithms

I have built a project that uses OpenCV to detect faces. The project is set to grow considerably over the course of this year, so the fact that I am failing to modularise my code into something cleaner and easier to read is worrying.

I detect faces in a camera feed using Haar cascades, so OpenCV has to run its analysis frame by frame in a loop, like this:

```python
import cv2
from os.path import dirname, join

project_dir = dirname(dirname(__file__))
face_cascade_path = join(project_dir, "haarcascades/haarcascade_frontalface_default.xml")

face_cascade = cv2.CascadeClassifier(face_cascade_path)

while camera.view.isOpened():
    ret_val, frame = camera.view.read()
    frame = cv2.resize(frame, (camera.width, camera.height))

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.05, 6)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi_gray = gray[y:y + h, x:x + w]
        roi_color = frame[y:y + h, x:x + w]

    cv2.imshow(camera.name, frame)
    cv2.startWindowThread()
    cv2.namedWindow(camera.name, cv2.WINDOW_NORMAL)

    # q to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
```

As may be evident here, I am already trying to modularise this as much as possible by wrapping the OpenCV camera object in a class (camera.view) and giving that class some other parameters (such as camera.width and camera.height). However, the fact that OpenCV must run the algorithm in a while loop for every frame of video makes me feel extremely limited. For example, if the code is also to include eye detection, it would look something like this:

```python
import cv2
from os.path import dirname, join

project_dir = dirname(dirname(__file__))
face_cascade_path = join(project_dir, "haarcascades/haarcascade_frontalface_default.xml")
eye_cascade_path = join(project_dir, "haarcascades/haarcascade_eye.xml")

face_cascade = cv2.CascadeClassifier(face_cascade_path)
eye_cascade = cv2.CascadeClassifier(eye_cascade_path)

while camera.view.isOpened():
    ret_val, frame = camera.view.read()
    frame = cv2.resize(frame, (camera.width, camera.height))

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.05, 6)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi_gray = gray[y:y + h, x:x + w]
        roi_color = frame[y:y + h, x:x + w]

        eyes = eye_cascade.detectMultiScale(roi_gray)
        for (ex, ey, ew, eh) in eyes:
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)
```

It seems to me that if I add analysis for further objects, or any other transformations of the frame, the code will only get messier and harder to use. Is there any way to modularise this, separating the code to make it more maintainable?
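One common way to keep the loop flat is to put each cascade behind a small detector object and run a list of them over every frame, so that adding eye (or any other) detection means registering one more detector rather than nesting more code in the loop. A minimal sketch in plain Python (the class names here are hypothetical; in the real project each detector would wrap a cv2.CascadeClassifier and call detectMultiScale):

```python
class Detector:
    """One detection step; subclasses return a list of (x, y, w, h) boxes."""
    name = "detector"

    def detect(self, gray):
        raise NotImplementedError

class FrameProcessor:
    """Runs every registered detector on a frame and collects the results.

    The while loop then shrinks to something like:
        results = processor.process(gray)
    with drawing handled separately from detection.
    """
    def __init__(self, detectors):
        self.detectors = list(detectors)

    def process(self, gray):
        # Map each detector's name to the boxes it found in this frame.
        return {d.name: d.detect(gray) for d in self.detectors}

# Hypothetical stand-in: a real version would hold a CascadeClassifier
# and return face_cascade.detectMultiScale(gray, 1.05, 6).
class StubFaceDetector(Detector):
    name = "faces"

    def detect(self, gray):
        return [(0, 0, 10, 10)]
```

With this split, the capture/resize/display plumbing stays in one place, and each detection concern lives in its own class that can be tested and swapped independently.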