OpenCV  4.5.0
Open Source Computer Vision
How to run deep networks in browser

Prev Tutorial: YOLO DNNs

Next Tutorial: Custom deep learning layers support

Introduction

This tutorial shows how to run deep learning models with OpenCV.js directly in a browser. It walks through a sample that chains a face detection model and a face recognition model into one pipeline.

Face detection

The face detection network takes a BGR image as input and produces a set of bounding boxes that might contain faces. All we need to do is keep the boxes with a high confidence.
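
The selection step can be sketched in plain JavaScript without OpenCV.js. The sketch assumes the network output is a flat array of 7-value records in which index 2 holds the confidence and indices 3 to 6 hold the left, top, right and bottom box edges as fractions of the image size, which is how the sample code later in this tutorial reads it. The function name `selectBoxes`, the mock data and the default threshold are illustrative and not part of OpenCV.js.

```javascript
// Hypothetical helper: keep only confident, non-degenerate boxes.
// `data` is a flat array of 7-value records; only indices 2..6 are read.
function selectBoxes(data, imgWidth, imgHeight, minConfidence = 0.5) {
  const clamp = (value, max) => Math.min(Math.max(0, value), max);
  const boxes = [];
  for (let i = 0; i + 7 <= data.length; i += 7) {
    const confidence = data[i + 2];
    // Scale relative coordinates to pixels and keep them inside the image.
    const left = clamp(data[i + 3] * imgWidth, imgWidth - 1);
    const top = clamp(data[i + 4] * imgHeight, imgHeight - 1);
    const right = clamp(data[i + 5] * imgWidth, imgWidth - 1);
    const bottom = clamp(data[i + 6] * imgHeight, imgHeight - 1);
    if (confidence > minConfidence && left < right && top < bottom) {
      boxes.push({x: left, y: top, width: right - left, height: bottom - top});
    }
  }
  return boxes;
}

// Mock output with two records: one confident box and one weak box.
const mockOutput = [
  0, 1, 0.9, 0.25, 0.25, 0.75, 0.75,
  0, 1, 0.2, 0.10, 0.10, 0.20, 0.20,
];
console.log(selectBoxes(mockOutput, 200, 100));
```

Only the first record passes the 0.5 threshold, so the weak detection is dropped. Raising `minConfidence` trades missed faces for fewer false positives.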

Face recognition

The network is called OpenFace (project https://github.com/cmusatyalab/openface). The face recognition model receives an RGB face image of size 96x96 and returns a 128-dimensional unit vector that represents the input face as a point on the unit multidimensional sphere. The difference between two faces is therefore the angle between their two output vectors.
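
Because the output vectors have unit length, their dot product equals the cosine of the angle between them, so comparing two faces reduces to a single dot product. The sketch below illustrates this with plain arrays. The names `dot` and `angleBetween` and the 2-dimensional vectors are illustrative stand-ins for the 128-dimensional network output.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Angle in radians between two unit vectors.
// The dot product is clamped to [-1, 1] to absorb rounding error before acos.
function angleBetween(a, b) {
  return Math.acos(Math.min(1, Math.max(-1, dot(a, b))));
}

const reference = [1, 0];
const orthogonal = [0, 1];
const opposite = [-1, 0];

// Identical vectors: similarity 1, angle 0.
console.log(dot(reference, reference), angleBetween(reference, reference));
// Orthogonal vectors: similarity 0, angle pi / 2.
console.log(dot(reference, orthogonal), angleBetween(reference, orthogonal));
// Opposite vectors: similarity -1, angle pi.
console.log(dot(reference, opposite), angleBetween(reference, opposite));
```

The similarity therefore ranges from -1 to 1. The recognition step of the sample accepts a match only when the score exceeds 0.5, which corresponds to an angle below 60 degrees, since cos(60°) = 0.5.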

Sample

The whole sample is an HTML page with JavaScript code that uses OpenCV.js functionality. The page is embedded below. Press the Start button to begin the demo. Press Add a person to give a name to a person who is recognized as unknown. Next we discuss the main parts of the code.

  1. Run the face detection network to detect faces in the input image.
    function detectFaces(img) {
      // Build a 192x144 input blob and subtract the per-channel mean (no R/B swap, no crop).
      var blob = cv.blobFromImage(img, 1, {width: 192, height: 144}, [104, 117, 123, 0], false, false);
      netDet.setInput(blob);
      var out = netDet.forward();

      var faces = [];
      // Each detection is a record of 7 floats; indices 2..6 hold the confidence and box edges.
      for (var i = 0, n = out.data32F.length; i < n; i += 7) {
        var confidence = out.data32F[i + 2];
        // Box edges are relative to the image size, so scale them to pixels.
        var left = out.data32F[i + 3] * img.cols;
        var top = out.data32F[i + 4] * img.rows;
        var right = out.data32F[i + 5] * img.cols;
        var bottom = out.data32F[i + 6] * img.rows;
        // Clamp the box to the image bounds.
        left = Math.min(Math.max(0, left), img.cols - 1);
        right = Math.min(Math.max(0, right), img.cols - 1);
        bottom = Math.min(Math.max(0, bottom), img.rows - 1);
        top = Math.min(Math.max(0, top), img.rows - 1);
        if (confidence > 0.5 && left < right && top < bottom) {
          faces.push({x: left, y: top, width: right - left, height: bottom - top});
        }
      }
      // Release the memory held by the intermediate matrices.
      blob.delete();
      out.delete();
      return faces;
    }
    You can vary the input blob size to balance detection quality against speed: the larger the input blob, the smaller the faces that can be detected.
  2. Run the face recognition network to obtain a 128-dimensional unit feature vector for an input face image.
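    function face2vec(face) {
      // Scale pixels to [0, 1], resize to 96x96 and swap R and B (BGR to RGB).
      var blob = cv.blobFromImage(face, 1.0 / 255, {width: 96, height: 96}, [0, 0, 0, 0], true, false);
      netRecogn.setInput(blob);
      var vec = netRecogn.forward();
      blob.delete();
      return vec;  // The caller is responsible for calling vec.delete().
    }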
  3. Perform the recognition.
    function recognize(face) {
      var vec = face2vec(face);

      var bestMatchName = 'unknown';
      // Scores range from -1 to 1; the initial value 0.5 acts as the acceptance threshold.
      var bestMatchScore = 0.5;
      // Declare the loop variable locally to avoid creating an implicit global.
      for (var name in persons) {
        var personVec = persons[name];
        // For unit vectors the dot product is the cosine of the angle between them.
        var score = vec.dot(personVec);
        if (score > bestMatchScore) {
          bestMatchScore = score;
          bestMatchName = name;
        }
      }
      vec.delete();
      return bestMatchName;
    }
    This function matches a new feature vector against the registered ones and returns the name of the best-matching person, or 'unknown' if no score exceeds the threshold.
  4. The main loop.
    var isRunning = false;
    const FPS = 30;  // Target number of frames processed per second.
    function captureFrame() {
      var begin = Date.now();
      cap.read(frame);  // Read a frame from the camera.
      cv.cvtColor(frame, frameBGR, cv.COLOR_RGBA2BGR);

      var faces = detectFaces(frameBGR);
      faces.forEach(function(rect) {
        cv.rectangle(frame, {x: rect.x, y: rect.y}, {x: rect.x + rect.width, y: rect.y + rect.height}, [0, 255, 0, 255]);

        var face = frameBGR.roi(rect);
        var name = recognize(face);
        face.delete();  // Release the ROI matrix to avoid leaking memory on every frame.
        cv.putText(frame, name, {x: rect.x, y: rect.y}, cv.FONT_HERSHEY_SIMPLEX, 1.0, [0, 255, 0, 255]);
      });

      cv.imshow(output, frame);

      // Loop this function.
      if (isRunning) {
        // Wait for whatever is left of this frame's time budget.
        var delay = 1000 / FPS - (Date.now() - begin);
        setTimeout(captureFrame, delay);
      }
    }
    The main loop of the application receives frames from a camera and runs recognition on every face detected in the frame. We start this function once, after OpenCV.js has been initialized and the deep learning models have been downloaded.
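
The delay passed to `setTimeout` in the main loop is the frame budget (1000 / FPS milliseconds) minus the time already spent processing the frame. As a minimal sketch of that arithmetic, the hypothetical helper `nextDelay` below also clamps the result at zero. The sample passes the raw value, which browsers treat as zero when it is negative, so the clamp only makes the intent explicit.

```javascript
// Hypothetical helper: milliseconds to wait before processing the next frame.
function nextDelay(fps, elapsedMs) {
  const budgetMs = 1000 / fps;  // Time available per frame at the target rate.
  return Math.max(0, budgetMs - elapsedMs);
}

// At 25 FPS the budget is 40 ms per frame.
console.log(nextDelay(25, 10));  // 10 ms used, so wait the remaining 30 ms.
console.log(nextDelay(25, 55));  // Budget exceeded, so schedule immediately (0 ms).
```

When detection and recognition regularly take longer than the budget, the loop simply runs below the target rate. Lowering `FPS` or shrinking the detection blob are the two knobs the sample exposes for that case.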