L-1015e.A1 phyCORE-i.MX 8M Plus AI Kit Guide


L-1015e.Ax phyCORE-i.MX 8M Plus AI Kit Guide Head
Document Title: L-1015e.Ax phyCORE-i.MX 8M Plus AI Kit Guide Head
Document Type: Software Guide
Article Number: L-1015e.Ax
Release Date: XXXX/XX/XX
Is Branch of:


This guide describes the tools and provides a few key elements to get started with your phyBOARD-Pollux AI kit. These include:

  • creating a celebrity look-alike demo and running inference on an NPU
  • quantizing the deep learning model to run on an NPU


If you want to build our BSP please continue to read here: Build the Phytec BSP

If you want to know how the celebrity demo is built, need help with your own development, or want to get inspired, start reading here: Build the Demo from Scratch

If you have your own model and need to quantize it to run inference on the phyBOARD-Pollux, continue reading here: How to Quantize a Model

The main topics and other articles can also be found on Towards Data Science if you prefer reading them there.

All articles can be found on Jan Werth's landing page.


For further information on the phyBOARD-Pollux, go to  https://www.phytec.de/produkte/system-on-modules/phycore-imx-8m-plus/#downloads/. There you will find the following documentation:

  • QS Guide: A short guide on how to set up and boot a phyCORE board along with brief information on building a BSP, the device tree, and accessing peripherals.
  • Hardware Manual:  A detailed description of the System on Module and accompanying carrier board. 
  • Yocto Guide:  A comprehensive guide for the Yocto version the phyCORE uses. This guide contains an overview of Yocto; introducing, installing, and customizing the PHYTEC BSP; how to work with programs like Poky and Bitbake; and much more.
  • BSP Manual:  A manual specific to the BSP version of the phyCORE. Information such as how to build the BSP, booting, updating software, device tree, and accessing peripherals can be found here.
  • Development Environment Guide:  This guide shows how to work with the Virtual Machine (VM) Host PHYTEC has developed and prepared to run various Development Environments. There are detailed step-by-step instructions for Eclipse and Qt Creator, which are included in the VM. There are instructions for running demo projects for these programs on a phyCORE product as well. Information on how to build a Linux host PC yourself is also a part of this guide.
  • Pinout Table:  phyCORE SOMs have an accompanying pin table (in Excel format). This table will show the complete default signal path, from processor to carrier board. The default device tree muxing option will also be included. This gives a developer all the information needed in one location to make muxing changes and design options when developing a specialized carrier board or adapting a PHYTEC phyCORE SOM to an application. 

On top of these standard manuals and guides, PHYTEC will also provide Product Change Notifications, Application Notes, and Technical Notes. These will be done on a case-by-case basis. 

For more information or details regarding the phyCORE-i.MX 8M Plus / phyBOARD-Pollux, please go to our phyCORE i.MX 8M Plus Product page or contact the PHYTEC Sales department.

phyBOARD-Pollux Quickstart

For instructions on how to connect, boot up, and begin the demo on your kit, head to: L-1016e.A0 phyCORE-i.MX 8M Plus AI Kit Quickstart.


Before you begin with this manual, there are a few requirements that must be met.


Your PC will need to be using:


There are a few files that can be downloaded to help with running the phyBOARD-Pollux AI kit:

  • The environment file to clone the environment can be found here (latest: TF2.3envfile.yml).
  • You can find the model and installation instructions here.
  • The demo code, as described here, can be found here.
  • The demo as preinstalled on the device, running optimized in a GUI, can be found here.

Building the PHYTEC BSP

The BSP shipped with the phyCORE-i.MX 8M Plus AI Kit is based on the standard PHYTEC BSP of the phyCORE-i.MX 8M Plus. This means building the BSP with the help of our phyLinux script is quite similar to the standard BSP.


This kit ships with a special SD card image, which might differ from the SD card image of our general-purpose evaluation kit that is based on the standard PHYTEC BSP. If you want to use any of the other manuals provided for this kit, you might need to download the standard PHYTEC image to ensure that the functions described in those manuals are available.

For the documentation of our phyLinux script, please check our PHYTEC Yocto Manual. (COMING SOON).


  • Create a fresh project directory:
host$ mkdir ~/yocto
  • Download and run the phyLinux script on the manifest file:
host$ cd ~/yocto
host$ wget https://download.phytec.de/Software/Linux/Yocto/Tools/phyLinux
host$ chmod +x phyLinux
host$ MACHINE=phyboard-pollux-imx8mp-1 DISTRO=yogurt-vendor-xwayland ./phyLinux init -p topic -r PD-BSP-Yocto-CelebrityFaceMatch-i.MX8MP-v0.2

Start the Build

After you have downloaded all the metadata with phyLinux init, you have to set up the shell environment variables. This needs to be done every time you open a new shell for starting builds. We use the shell script provided by Poky in its default configuration. From the root of your project directory, type:

host$ source sources/poky/oe-init-build-env

The abbreviation for the source command is a single dot.

host$ . sources/poky/oe-init-build-env

The current working directory of the shell should change to build/. Before building for the first time, you should take a look at the main configuration file:

host$ vim conf/local.conf

Your local modifications for the current build are stored here. For the phyCORE-i.MX 8M Plus AI Kit you need to accept the GPU and VPU binary license agreements for Freescale/NXP processors. To do so you have to uncomment the corresponding line.

# Uncomment to accept NXP EULA                                                   
# EULA can be found under ../sources/meta-freescale/EULA                         

Now you are ready to build the image.

host$ bitbake phytec-facematch-image

The first compile process takes about 40 minutes on a modern Intel Core i7. All subsequent builds will use the filled caches and should take about 3 minutes.

For more information and documentation of the BSP and our Yocto Distribution please take a look at the following two manuals:

  • Yocto Manual (COMING SOON)
  • i.MX 8MP BSP Manual (COMING SOON)

The Celebrity Face Match Demo

Celebrity Face Match Demo Visual

The idea of the demo is to find images of celebrities who look similar to yourself based on facial features. If you look into the block diagram below, you will see the three different blocks the demo is made of:

  • Preparation
  • Live Stream
  • After-button Press

Block Diagram

Each block has a different function. The Preparation block is needed to set up the demo. The Live Stream and After-button Press are part of the running demo.

For more general information on the process we used, please check the section Further Reading.


Preparation Flow

As shown in the block diagram above, we are using a pre-trained network, because the task of facial recognition has already been solved very well by different research groups. We used the network from Refik Can Malli (rcmalli), which was originally trained by Q. Cao on the VGGFace2 dataset. As rcmalli's model was written with TensorFlow 1.14.0 and Keras 2.2.4, we updated it to TensorFlow version 2.2.0. You can find the updated model here. However, we are still using the weights from the original model.

What are Embeddings

To identify a human face, we need to identify specific facial features such as the length of the nose, the distance between the eyes, the angle between nose and mouth, etc.

People are very good at recognizing these features and subconsciously identify millions of them. However, programming that by hand would be impossible. Therefore, we are using rcmalli's ResNet50 network to find a good representation of those facial features and their combinations, which can be projected into a lower-dimensional space. This lower-dimensional space, which condenses the high-dimensional facial feature information, is called an embedding.

More information about embedding can be found here.

Creating Embeddings

The pre-trained network we are using gives us output values for 8631 classes, as it was trained on this number of classes.

ResNet50 Network Output

If we ran this model on one of our own images now, we would get the best-fitting predictions from the 8631 classes. However, our goal is to find the celebrity image that looks most similar to a given face. To do this, we need to use a truncated network. The idea is that the network has learned how a face is composed from over 3.3 million faces. Generally, each layer of a network extracts more detailed information.

Below, you see the input image we used with our network and the [1,10,50,101,150,170] layer output. The output is organized in blocks of 3x3 images per layer, showing different filter/neuron outputs per image.

Input Image to rcmalli's ResNet50

[1,10,50,101,150,170] layer output as a block of six images, each block showing the 1st to 6th filter/neuron output

As you can see, the information gets more detailed the deeper you go into the network. The following layer inputs are composed of combinations of the previous layer outputs. In the last layers (170 of 176), you can see that the information has reached almost pixel-level detail.

When cutting the network, we remove the last part of the network which is responsible for the final prediction. We do not want a prediction, however, but low-dimensional information about a face. The new output of the model now has 2048 outputs instead of 8631 classes. These outputs are our embedding. For each image of a new face we put into the network, we get 2048 values describing the face.

With this truncated network, we can now create a library of embeddings of celebrity faces or faces of our known employees and compare them to new faces later.


You can install the rcmalli model as described in their Git repository. It works the same way as the following method. However, you would have to work with a TensorFlow version < 1.15.3 and install Keras v2.2.4. We will continue with our updated version.

Install the updated version using:

pip install git+https://github.com/JanderHungrige/tf.keras-vggface

The model and pre-processing libraries can be imported and loaded with:

from keras_vggface_TF.vggfaceTF import VGGFace
from keras_vggface_TF import utils

# pooling can be None, 'avg', or 'max'
pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')


Quantize your Model to int8


If you are not planning to run your model on an embedded device, you can proceed to section Create a Database.

More details on quantization can be found in the section Quantize Your Deep Learning Model to Run on an NPU.

For our model, we are using the NPU of NXP's i.MX 8M Plus. The NXP NPU requires the model to be a TensorFlow Lite (TFLite) or PyTorch model. Either must be fully quantized to int8.

As we created a TensorFlow model, we have to convert it to a TensorFlow Lite model and perform a full int8 quantization at the same time. If you load the model with the right weights, you can directly quantize the model as follows:

1   from keras_vggface_TF.vggfaceTF import VGGFace
2   from keras_vggface_TF.utils import preprocess_input
3   import tensorflow as tf
4   import numpy as np
6   tfVersion = tf.version.VERSION.replace(".", "")  # can be used as part of the save name
7   print(tf.version.VERSION)
9   pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
11  converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
13  folderpath = './All_croped_images/'
15  def prepare(img):
16      img = np.expand_dims(img, 0).astype(np.float32)
17      img = preprocess_input(img, version=2)
18      return img
20  repDatagen = tf.keras.preprocessing.image.ImageDataGenerator(preprocessing_function=prepare)
21  datagen = repDatagen.flow_from_directory(folderpath, target_size=(224, 224), batch_size=1)
23  def representative_dataset_gen():
24      for _ in range(10):
25          img = next(datagen)  # (images, labels) batch
26          yield [img[0]]
28  converter.optimizations = [tf.lite.Optimize.DEFAULT]
29  converter.representative_dataset = representative_dataset_gen
30  converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
31  converter.experimental_new_converter = True
33  converter.target_spec.supported_types = [tf.int8]
34  quantized_tflite_model = converter.convert()
36  open('quant_model.tflite', "wb").write(quantized_tflite_model)

Post Training Quant

In line 11, we set the converter to use a tf.keras model for conversion. In lines 15 to 26, we create a generator function that loads images from our data set (which we create below). The converter needs this generator in line 29 to see what typical input images look like. It uses them to calibrate the (min, max) value ranges of the model, which reduces the error introduced during quantization. In lines 28 to 34, we set the quantization to int8 and convert the model.
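The calibration to (min, max) values can be illustrated with a small NumPy sketch. It is a simplified picture of affine int8 quantization for a single tensor, not the converter's actual implementation; the function names and the sample values are assumptions made for this illustration.

```python
import numpy as np

def int8_quant_params(x_min, x_max):
    """Derive scale and zero point for an affine int8 mapping of [x_min, x_max]."""
    # The range must contain 0 so that the value 0.0 stays representable.
    x_min, x_max = min(float(x_min), 0.0), max(float(x_max), 0.0)
    scale = (x_max - x_min) / 255.0  # 256 int8 levels span the calibrated range
    zero_point = int(round(-128 - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical values observed while running the representative images
calibration_values = np.array([-1.0, -0.25, 0.0, 0.5, 3.0], dtype=np.float32)
scale, zero_point = int8_quant_params(calibration_values.min(), calibration_values.max())
quantized = quantize(calibration_values, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)
max_error = float(np.max(np.abs(restored - calibration_values)))
print(scale, zero_point, max_error)
```

A representative data set that covers the real input range keeps the scale small, so the rounding error per value stays small. Values outside the calibrated range would be clipped.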

As mentioned above, the converter needs a representative data set. We use the same face images from which we later create the embeddings for comparison.

For more details on quantization, on using different TensorFlow versions, or on the inputs and outputs to be quantized, go to Quantize Your Deep Learning Model to Run on an NPU.

Create a Database

Do you need license free images?

If you are looking to use this commercially, please read this article to learn how to tune a license-free dataset to perform better.

For the database, we need to collect around 10k images of the top 1k celebrities from IMDb. You could collect any type of image for facial comparison purposes, for example, employee images. A simple Google scraper can be used for the database. First, get a .csv file with the top celebrity names by opening https://www.imdb.com/list/ls058011111/ and using the export option on the site. Then you can run your Google or Bing crawler.

import pandas as pd
from icrawler.builtin import GoogleImageCrawler  # alternative crawler, not used below
from icrawler.builtin import BingImageCrawler

mitFilter = True  # True: apply the license and type filters below

# Set the filter to a Creative Commons license and set whether the image is
# a photo, face, clipart, linedrawing, or animated
filters = dict(
    type='photo',
    license='commercial,modify',
)
howmany = 10  # images per celebrity
names = pd.read_csv('Top 1000 Actors and Actresses.csv', encoding="ISO-8859-1")
subset = names.Name

for keyword in subset:
    # One output folder per celebrity
    crawler = BingImageCrawler(
        parser_threads=5,
        downloader_threads=5,
        storage={'root_dir': 'Celebs/{}'.format(keyword)}
    )
    if mitFilter:
        crawler.crawl(keyword=keyword, filters=filters, max_num=howmany, min_size=(500, 500))
    else:
        crawler.crawl(keyword=keyword, max_num=howmany, min_size=(500, 500))

Image Crawler with Bing


Be aware that you should only scrape images labeled for reuse or free of any license if you are planning a commercial use. Use the filter setting:

license='commercial,modify'

If you want to make sure you have a good, license-free dataset, go to How to Create and Tune Your Own Data Set for Facial Recognition Using Neural Networks.

Now we have 10 images per celebrity, which adds up to about 10k images.

Prepare the Dataset and Create an "Only Faces Dataset"

The model expects 224x224 sized images. We are also looking for facial embeddings, so we first extract the faces from our dataset and resize them to 224x224. You can use any facial detection algorithm. A good example is MTCNN. As we will later use OpenCV for facial detection, we will also use an OpenCV variant with a Haar cascade classifier here. You can find and download the classifier here.

import cv2
from pathlib import Path
import os

# Load the face detection classifier
face_cascade = cv2.CascadeClassifier('./haarcascade_frontalface_alt.xml')

# Set file paths
path = Path.cwd() / 'Celebs'
savefolder = Path.cwd() / 'All_croped_images'
savefolder.mkdir(parents=True, exist_ok=True)

# Set variables
p = 50  # padding in pixels around the detected face before cropping
width = 224
height = width

Folderlist = next(os.walk(path))[1]  # get all folder names

for celeb in Folderlist:  # go through all folders
    filelist = next(os.walk(path / celeb))[2]
    print(celeb)
    for f in filelist:  # all files in this celebrity folder
        img = cv2.imread(str(path / celeb / f), cv2.IMREAD_COLOR)  # read each image
        print(f)
        if img is None:  # not a readable image file
            print('could not read image')
            continue

        # Detect faces
        faces_detected = face_cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=4)
        if len(faces_detected) != 0:  # only continue if the cascade detected a face
            (x, y, w, h) = faces_detected[0]  # coordinates of the box around the first face
            # Create a folder structure with one folder per celebrity
            croppedpath = savefolder / celeb
            croppedpath.mkdir(parents=True, exist_ok=True)
            filename = str(croppedpath / f)
            # Crop the image to the padded face box
            top, left = y - p + 1, x - p + 1
            inside = top >= 0 and left >= 0  # negative indices would wrap around
            face = img[top:y + h + p, left:x + w + p] if inside else None
            if inside and face.shape[0] > height and face.shape[1] > width:
                # Resize the crop to the desired dimensions (224x224) and save it
                face = cv2.resize(face, (width, height), interpolation=cv2.INTER_LINEAR)
                cv2.imwrite(filename, face)
            else:
                print('image too small or face box out of image')
        else:
            print('no face detected')

Create Cropped Faces

Now we have a data set of cropped faces with dimensions of 224x224. This dataset can also be used by the representative_dataset_gen() function in the quantization step described earlier.

Create Embeddings of Each Image in Your Database

For the final part of the preparation, we need to create an embedding for each of the 10k facial images. We can use the quantized model we created earlier so that the embeddings are calculated with the same model we will use later for the live stream analysis.

The following code block creates an embedding for each image in our dataset and saves the embeddings together with the celebrity name and the filename in a CSV file and a JSON file. On the embedded device, we later read the JSON file with Python's built-in json module, because most embedded systems do not include pandas in their BSP. On a PC, reading either file with pandas is an easy way to work with the embeddings further.

from pathlib import Path
import os
import sys
import cv2
import pandas as pd
import numpy as np
import tensorflow as tf
from keras_vggface_TF.utils import preprocess_input

model = 'quant_model'  # name of the .tflite file written by the quantization step, without extension
modelpath = './'  # if you have saved the model in a subfolder, add this folder path here
modelpath = modelpath + model + '.tflite'
PFAD = Path.cwd() / 'All_croped_images'
CelebFolders = next(os.walk(PFAD))[1]
rows = []  # one entry per image: name, file, embedding
ce = 0
np.set_printoptions(threshold=sys.maxsize)  # needed to avoid an ellipsis in the CSV output

# Depending on the version of TF running, check where lite is located
print(tf.__version__)
if tf.__version__.startswith('1.'):
    print('lite in dir(tf.contrib)? ' + str('lite' in dir(tf.contrib)))
elif tf.__version__.startswith('2.'):
    print('lite in dir(tf)? ' + str('lite' in dir(tf)))

# Load the TFLite model. Both models work.
try:
    interpreter = tf.lite.Interpreter(str(modelpath))
except ValueError as e:
    print("Error: Model file could not be found. Check if you are in the correct working directory. Error message: " + str(e))
    sys.exit(1)

# Prepare the TFLite model and get the input and output tensors
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

for celeb in CelebFolders:
    ce += 1  # just for printing
    print('-------------')
    print(str(celeb) + ' ' + str(ce) + ' of ' + str(len(CelebFolders)) + ' (' + str(ce*100/len(CelebFolders)) + '%)')
    print('')
    filelist = next(os.walk(PFAD / celeb))[2]  # all files in this celebrity folder
    for n, f in enumerate(filelist, start=1):
        img = cv2.imread(str(PFAD / celeb / f), cv2.IMREAD_COLOR).astype('float32')
        # Pre-process the image the same way as the VGGFace2 training data
        img = np.expand_dims(img, axis=0)  # add the batch dimension
        img = preprocess_input(img, version=2)
        interpreter.set_tensor(input_details[0]['index'], img)  # set the input tensor
        interpreter.invoke()
        features = np.ravel(interpreter.get_tensor(output_details[0]['index']))  # the embedding
        # Collect the embedding with the filename and the celebrity name
        rows.append({'Name': celeb, 'File': f, 'Embedding': features})
        print('finished ' + str(n) + ' of ' + str(len(filelist)))

EMBEDDINGS = pd.DataFrame(rows, columns=['Name', 'File', 'Embedding'])
filename_csv = 'EMBEDDINGS_' + model + '.csv'
filename_json = 'EMBEDDINGS_' + model + '.json'
EMBEDDINGS.to_csv(Path.cwd() / filename_csv, index=False)
EMBEDDINGS.to_json(Path.cwd() / filename_json)

Create Embeddings Database

This completes the preparations. To run the demo, we copy the embeddings, the quantized model, the cascade classifier file, the face images dataset, and the demo code (which we will create later) to the embedded device.

  1. Boot your device.
  2. Connect a monitor, mouse, and keyboard.
  3. Open a console and get your IP address with the Linux command:

    ip a
  4. Use ssh to log in to your device, or scp to copy data to it.
  5. On your PC, go into the folder where your files are stored, then use the scp command to copy the files to the device. ip_address is the address you retrieved, and <folder> is a placeholder for the folder you want to copy.

    scp -r <folder> user@ip_address:./

    This will copy the entire folder (due to -r) to the user's home directory on the device.

Live Stream Analysis

Live Stream Analysis Flow

After the preparation, we can combine everything into the look-alike demo.

This will involve the following steps:

  • Read in your live stream
  • Detect any face in the live stream
  • If a trigger is hit, check whether a face was detected, and pick the face closest to the center if more than one was detected
  • Crop the face and create a 224x224 image
  • Use your model to create the embedding of the detected and cropped face
  • Compute the Euclidean distance between this embedding and each of the previously saved embeddings
  • Find the minimum distance and return that value, the corresponding celebrity image index, and the celebrity name
  • Plot everything
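
The comparison steps in this list can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the stored embeddings fit into one array; the function name find_closest and the toy data are hypothetical and not part of the demo code.

```python
import numpy as np

def find_closest(new_embedding, stored_embeddings, names):
    """Return (index, name, distance) of the stored embedding nearest to new_embedding."""
    # Euclidean distance between the new embedding and every stored embedding
    distances = np.linalg.norm(stored_embeddings - new_embedding, axis=1)
    best = int(np.argmin(distances))
    return best, names[best], float(distances[best])

# Toy data: 4 values per embedding instead of the 2048 values the model produces
stored = np.array([[0.0, 0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0],
                   [5.0, 5.0, 5.0, 5.0]])
names = ['celebrity_a', 'celebrity_b', 'celebrity_c']

index, name, distance = find_closest(np.array([0.9, 1.1, 1.0, 1.0]), stored, names)
print(index, name, distance)
```

The returned index can be used to look up the matching image file, because names, files, and embeddings are stored in the same order.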

First, check if you are on an x86 (PC) setup or an ARM system (embedded device) to set some variables like data paths or video devices accordingly.

# Check the system architecture
import subprocess

Architekture = subprocess.check_output("lscpu | grep Architecture", shell=True).strip().decode()
if "aarch64" in Architekture:
    Runningsystem = 'ARM'
elif 'x86_64' in Architekture or 'x86_32' in Architekture:
    Runningsystem = 'PC'

Check Architecture

Before we analyze the live stream, there is some setup:

  1. Load the embeddings from JSON
  2. Load the model
  3. Load the cascade classifier
  4. Define the pre-processing function
  5. Split your data
  6. Set the video pipeline

Load the Embeddings from JSON

On a PC, you could use pandas to read the embeddings. However, pandas is more difficult to include in a Yocto Linux image for the embedded system, whereas Python's json library is already available:

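import json
import numpy as np

# embeddingpath and embeddingsfile are set earlier, depending on the system
with open(embeddingpath + embeddingsfile, 'r') as f:
    ImportedData = json.load(f)

# The JSON file stores one dictionary per column, keyed by the row index as a string
number_of_entries = len(ImportedData['Name'])
dataE = [np.array(ImportedData['Embedding'][str(i)]) for i in range(number_of_entries)]
dataN = [np.array(ImportedData['Name'][str(i)]) for i in range(number_of_entries)]
dataF = [np.array(ImportedData['File'][str(i)]) for i in range(number_of_entries)]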


Load the Model

import sys
import tensorflow as tf

try:
    interpreter = tf.lite.Interpreter(model_path)
except ValueError as e:
    print("Error: Model file could not be found. Check if you are in the correct working directory. Error message: " + str(e))
    # Depending on the version of TF running, check where lite is located
    if tf.__version__.startswith('1.'):
        print('lite in dir(tf.contrib)? ' + str('lite' in dir(tf.contrib)))
    elif tf.__version__.startswith('2.'):
        print('lite in dir(tf)? ' + str('lite' in dir(tf)))
    sys.exit(1)  # without an interpreter, the following calls would fail

interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

Load the Model

Point to the Cascade Classifier

#face_cascade = cv2.CascadeClassifier(cascaderpath + 'haarcascade_frontalface_alt.xml')
face_cascade = cv2.CascadeClassifier(cascaderpath + 'lbpcascade_frontalface_improved.xml')


We use the local binary pattern (LBP) classifier. If you stay with OpenCV, you can also choose the Haar classifier, which detects faces better but demands more resources. For an embedded device, we recommend the LBP version. However, an optimized Haar cascade can also run smoothly on our system.

Set Pre-Processing Function

The pre-processing function used by rcmalli centers the pixel values around the mean. On your PC, you can import this function. However, if you are planning to run this on an embedded device, we recommend writing the function out so that there is less trouble including the function in your board support package (BSP).

The function simply subtracts the mean of the training data from the new input image.

def preprocess_input(x, data_format=None):  # choose the same version as in "2-Create embeddings database.py" or the Jupyter notebook
    x_temp = np.copy(x)  # x must be a float array
    if data_format is None:
        data_format = tf.keras.backend.image_data_format()
    assert data_format in {'channels_last', 'channels_first'}

    if data_format == 'channels_first':
        x_temp = x_temp[:, ::-1, ...]  # reverse the channel order
        x_temp[:, 0, :, :] -= 91.4953
        x_temp[:, 1, :, :] -= 103.8827
        x_temp[:, 2, :, :] -= 131.0912
    else:
        x_temp = x_temp[..., ::-1]  # reverse the channel order
        x_temp[..., 0] -= 91.4953
        x_temp[..., 1] -= 103.8827
        x_temp[..., 2] -= 131.0912

    return x_temp


Split Your Data

To parallelize the comparison of our embedding with the celebrity embeddings, we can split the data into chunks that separate threads can process. The distances computed by those threads then have to be collected.

import numpy as np

def splitDataFrameIntoSmaller(df, chunkSize):
    # Split df into consecutive chunks of at most chunkSize entries
    listOfDf = list()
    numberChunks = int(np.ceil(len(df) / chunkSize))  # avoids an empty trailing chunk
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf

def faceembedding(YourFace, CelebDaten):
    # Distances between YourFace and every embedding in a DataFrame
    Dist = []
    for i in range(len(CelebDaten.File)):
        Celebs = np.array(CelebDaten.Embedding[i])
        Dist.append(np.linalg.norm(YourFace - Celebs))
    return Dist

def faceembeddingNP(YourFace, CelebDaten):
    # Same as faceembedding, but for a plain list of embeddings
    Dist = []
    for i in range(len(CelebDaten)):
        Celebs = np.array(CelebDaten[i])
        Dist.append(np.linalg.norm(YourFace - Celebs))
    return Dist

# Split the data for threading
celeb_embeddings = splitDataFrameIntoSmaller(dataE, int(np.ceil(len(dataE)/4)))

Split Embeddings
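
How the chunks could be processed in threads and the results collected can be sketched as follows. This is an illustration under assumptions, not the demo's own threading code: the helper names chunk_distances and closest_match are hypothetical, and the sketch uses Python's concurrent.futures. Whether threads actually speed up the comparison depends on the data size and the device.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def chunk_distances(your_face, chunk):
    """Euclidean distances between your_face and every embedding in one chunk."""
    return np.linalg.norm(np.asarray(chunk) - your_face, axis=1)

def closest_match(your_face, chunks):
    """Compare your_face against all chunks in parallel and return (index, distance)."""
    chunks = [c for c in chunks if len(c)]  # skip empty chunks
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # map() keeps the chunk order, so concatenating restores the original indices
        results = list(pool.map(lambda c: chunk_distances(your_face, c), chunks))
    distances = np.concatenate(results)
    best = int(np.argmin(distances))
    return best, float(distances[best])

# Toy data: 10 embeddings with 4 values each, split into chunks of 3
embeddings = np.arange(40, dtype=np.float64).reshape(10, 4)
chunks = [embeddings[i:i + 3] for i in range(0, len(embeddings), 3)]

best_index, best_distance = closest_match(embeddings[7] + 0.1, chunks)
print(best_index, best_distance)
```

Because the order of the chunks is preserved, the index of the minimum distance refers directly to the position in the original list of embeddings.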

Setting the Video Pipeline

If you are not on an embedded system, you can read in your webcam stream via OpenCV.

import sys
import cv2

cap = cv2.VideoCapture(0)  # 0 selects the first camera

if not cap.isOpened():
    print('Error: VideoCapture not opened')
    sys.exit(1)

Read Webcam

On an ARM system using a MIPI camera, as included on the phyBOARD-Pollux, you have to read the stream via OpenCV and a Gstreamer pipeline, which converts the Bayer image to RGB and sets the size of the image.

We check the Gstreamer support, set the camera state, and create a pipeline:

import os
import subprocess
import sys
import cv2

videodev = 'video0'
buildinfo = cv2.getBuildInformation()
if buildinfo.find("GStreamer") < 0:
    print('no GStreamer support in OpenCV')
    sys.exit(1)

# Check that the camera driver is available
path = os.path.join('/sys/bus/i2c/devices', '2-0010', 'driver')
if not os.path.exists(path):
    print('camera driver not found')
    sys.exit(1)

# Configure the media pipeline and the camera
camera_setup = [
    'media-ctl -V "31:0[fmt:SGRBG8_1X8/1280x800 (4,4)/1280x800]"',
    'media-ctl -V "22:0[fmt:SGRBG8_1X8/1280x800]"',
    'v4l2-ctl -d0 -v width=1280,height=800,pixelformat=GRBG',  # set size and format
    'v4l2-ctl -d0 -c vertical_flip=1',  # if the image is flipped
    'v4l2-ctl -c horizontal_blanking=2500',
    'v4l2-ctl -c digital_gain_red=1400',  # color correction, if needed
    'v4l2-ctl -c digital_gain_blue=1700',  # color correction, if needed
]
for cmd in camera_setup:
    ret = subprocess.call(cmd, shell=True)

pipeline = 'v4l2src device=/dev/{video} ! appsink'.format(video=videodev)


Start the Live Stream

Call the Video Pipeline

We can now call the video pipeline with OpenCV and have a constant video stream. As the video stream from the MIPI camera is in Bayer format, it has to be converted to RGB. You could also do this conversion in the GStreamer pipeline. However, we have seen that the conversion in OpenCV is much faster.

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

while True:
    ret, frame = cap.read()
    if not ret:  # no frame received
        break
    frame = cv2.cvtColor(frame, cv2.COLOR_BAYER_GB2RGB)  # convert the Bayer frame to a color image
    cv2.namedWindow('frame', cv2.WND_PROP_FULLSCREEN)
    cv2.setWindowProperty('frame', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
    cv2.imshow('frame', frame)

Get Live Stream

Find Faces in the Live Stream

Each frame is then analyzed for faces:

    # Continues inside the while loop; p is the padding around the face box
    faces_detected = face_cascade.detectMultiScale(frame, scaleFactor=1.2, minNeighbors=5)
    for (x, y, w, h) in faces_detected:
        # Draw a green rectangle around each detected face
        rechteck = cv2.rectangle(frame, (x-p, y-p+2), (x+w+p, y+h+p+2), (0, 255, 0), 2)
        cv2.imshow('frame', rechteck)

Analyzed Faces

After-button Press

Finding and Cropping Middle Faces

As soon as the button is pressed, we find the face closest to the center of the frame, crop it, and resize it to 224x224:

    key = cv2.waitKey(1)
    if key == 27:  # Esc key: stop the stream
        cap.release()
        cv2.destroyAllWindows()
        break
    if key == 32:  # space bar: analyze the current frame
        mittleres_Gesicht_X = ()
        mitte = ()
        if len(faces_detected) != 0:  # only if the cascade detected a face, otherwise error
            start1 = time()
            for (x, y, w, h) in faces_detected:
                mitte = np.append(mitte, (x + w/2))  # horizontal center of each face
            mittleres_Gesicht_X = (np.abs(mitte - framemitte)).argmin()
            print('detect middle face ', time()-start1)
            start2 = time()
            #print(faces_detected[mittleres_Gesicht_X])
            (x, y, w, h) = faces_detected[mittleres_Gesicht_X]
            # use only the detected face; crop it by 2 px to remove the drawn box
            img = frame[y-p+2:y+h+p-2, x-p+2:x+w+p-2]
            if img.size != 0:  # a face reaching out of the frame can give an empty crop
                print('detect face ', time()-start2)
                # CROP IMAGE
                start3 = time()
                if img.shape[0] >= height:  # downsampling
                    img_small = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)
                else:  # upsampling
                    img_small = cv2.resize(img, (width, height), interpolation=cv2.INTER_CUBIC)
                cv2.imshow('frame', img_small)
                cv2.waitKey(1)  # give the window 1 ms to refresh
                end3 = time()
                print('face crop', end3-start3)

Face Crop
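The selection and cropping logic can also be looked at in isolation. The following is a minimal sketch in plain Python, not the demo's code: the helper names `pick_middle_face` and `clamped_crop_box` and the example detections are assumptions made for illustration. It picks the face whose horizontal center is closest to the frame center and clips the padded crop box to the frame, because negative slice indices in NumPy wrap around instead of raising an error.

```python
def pick_middle_face(faces, frame_width):
    """Return the index of the face whose horizontal center is closest to the frame center."""
    frame_center = frame_width / 2
    centers = [x + w / 2 for (x, y, w, h) in faces]
    return min(range(len(faces)), key=lambda i: abs(centers[i] - frame_center))


def clamped_crop_box(face, padding, frame_width, frame_height):
    """Grow the face box by `padding` pixels and clip it to the frame.

    Returns (x0, y0, x1, y1), usable as frame[y0:y1, x0:x1].
    """
    x, y, w, h = face
    x0 = max(x - padding, 0)
    y0 = max(y - padding, 0)
    x1 = min(x + w + padding, frame_width)
    y1 = min(y + h + padding, frame_height)
    return x0, y0, x1, y1


# Three hypothetical detections (x, y, w, h) in a 1280x800 frame.
faces = [(10, 50, 200, 200), (560, 300, 180, 180), (1100, 20, 170, 170)]
middle = pick_middle_face(faces, frame_width=1280)
box = clamped_crop_box(faces[0], padding=20, frame_width=1280, frame_height=800)
print(middle, box)
```

Clipping the box keeps a face at the border of the frame usable instead of discarding it, at the cost of a crop that is no longer square.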

Further Pre-Processing Found Face

After we crop the face, we have to do the same preprocessing as done on the training data and then feed it to the model to create the embedding:

                start4 = time()
                if inputtype == 'int':
                    samples = np.expand_dims(img_small, axis=0)
                    # data_format: None, 'channels_last' or 'channels_first'; if None, it is taken from the backend
                    samples = preprocess_input(samples, data_format=None, version=3).astype('int8')
                else:
                    pixels = img_small.astype('float32')
                    samples = np.expand_dims(pixels, axis=0)
                    samples = preprocess_input(samples, data_format=None, version=2)
                # now using the TFLite model
                print('preprocess data for model', time()-start4)
                if Loadtype == 'armNN':
                    prep = time()
                    input_tensors = ann.make_input_tensors([input_binding_info], [samples])
                    # Get output binding information for an output layer by using the layer name.
                    output_binding_info = parser.GetNetworkOutputBindingInfo(0, 'model/output')
                    output_tensors = ann.make_output_tensors([output_binding_info])
                    runtime.EnqueueWorkload(0, input_tensors, output_tensors)
                    print('ANN preparation ', time()-prep)
                    start42 = time()
                    EMBEDDINGS = ann.workload_tensors_to_ndarray(output_tensors)
                elif Loadtype == 'TL':
                    prep = time()
                    input_shape = input_details[0]['shape']
                    input_data = samples
                    interpreter.set_tensor(input_details[0]['index'], input_data)
                    interpreter.invoke()
                    print('ANN preparation ', time()-prep)
                    start42 = time()
                    EMBEDDINGS = interpreter.get_tensor(output_details[0]['index'])
                print('create face embeddings', time()-start42)

Preprocess 2

Compare Embeddings

With the resulting embedding, we can now compare it to the existing celebrity embeddings. The stored embeddings are split into four parts, and each part is compared in its own thread so that all four cores of our device can be used. This is one of the most demanding tasks in this demo.

                start_EU = time()
                EuDist = []
                # one worker per part of celeb_embeddings
                with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
                    ergebniss_1 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[0]))
                    ergebniss_2 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[1]))
                    ergebniss_3 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[2]))
                    ergebniss_4 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[3]))
                # leaving the with block waits until all four tasks have finished
                if ergebniss_1.done() and ergebniss_2.done() and ergebniss_3.done() and ergebniss_4.done():
                    EuDist.extend(ergebniss_1.result())
                    EuDist.extend(ergebniss_2.result())
                    EuDist.extend(ergebniss_3.result())
                    EuDist.extend(ergebniss_4.result())
                print('Create_EuDist', time()-start_EU)
                start_Min = time()
                EuDist = np.asarray(EuDist)
                idx = np.argpartition(EuDist, 5)[:5]  # indices of the five smallest distances, unordered
                idx = idx[np.argsort(EuDist[idx])]  # order them so that idx[0] is the best match
                folder_idx = dataN[idx[0]]
                image_idx = dataF[idx[0]]
                print('find minimum for facematch', time()-start_Min)

Get Minimum
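The split, compare, and merge pattern can be sketched independently of the demo. The example below is a minimal illustration using only the Python standard library; `distances_to_chunk`, `top_matches`, and the two-dimensional toy embeddings are assumptions made for this sketch and stand in for the demo's `faceembeddingNP` and the real embeddings. Note that pure-Python work like this does not run in parallel across threads; the sketch only shows the structure, including an explicit sort so that the first result really is the closest match.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def distances_to_chunk(query, chunk):
    """Euclidean distance from `query` to every embedding in `chunk`."""
    return [math.dist(query, emb) for emb in chunk]


def top_matches(query, chunks, k=5):
    """Return the k (distance, index) pairs with the smallest distance, best first."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as executor:
        # executor.map keeps the chunk order, so indices stay aligned with the labels.
        per_chunk = list(executor.map(lambda c: distances_to_chunk(query, c), chunks))
    all_distances = [d for chunk in per_chunk for d in chunk]
    return heapq.nsmallest(k, ((d, i) for i, d in enumerate(all_distances)))


# Toy data: one query and three chunks of two-dimensional embeddings.
query = (0.0, 0.0)
chunks = [
    [(3.0, 4.0), (1.0, 0.0)],
    [(0.0, 2.0), (6.0, 8.0)],
    [(0.0, 0.5), (5.0, 12.0)],
]
best = top_matches(query, chunks, k=3)
print(best)
```

The index in each pair refers to the position in the concatenated list of all chunks, which is how it can be used to look up the matching name and file name.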

Plot Results

Finally, we stitch our face together with the best match and display the result. You can also implement a GUI here. For simplicity and better understanding, we have done a more "raw" version:

                start6 = time()
                path = Path.cwd()
                if Gesichter == False:
                    pfad = str(Path.cwd() / 'Data/sizeceleb_224_224' / str(folder_idx) / str(image_idx))
                elif Gesichter == True:
                    pfad = str(Path.cwd() / 'Data/Celebs_faces' / str(folder_idx) / str(image_idx))
                Beleb = cv2.imread(pfad)
                # bring the celebrity image to the size of the cropped face; cv2.resize expects (width, height)
                if Beleb.shape[:2] != img_small.shape[:2]:
                    Beleb = cv2.resize(Beleb, (img_small.shape[1], img_small.shape[0]), interpolation=cv2.INTER_AREA)
                if largeImg == True:
                    larg = time()
                    img_small2 = cv2.resize(img_small, ImgSize, interpolation=cv2.INTER_LINEAR)
                    Beleb2 = cv2.resize(Beleb, (img_small2.shape[1], img_small2.shape[0]), interpolation=cv2.INTER_LINEAR)
                    print('images upscaled ', time()-larg)
                    numpy_horizontal2 = np.hstack((img_small2, Beleb2))
                numpy_horizontal = np.hstack((img_small, Beleb))
                cv2.namedWindow('ItsYou', cv2.WND_PROP_FULLSCREEN)
                cv2.setWindowProperty('ItsYou', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
                #Text=str(dataN[np.argmin(EuDist)])+ ' EuDist: ' + str(np.argmin(EuDist))
                Text = str(dataN[idx[0]]) + ' EuDist: ' + str(EuDist[idx[0]].round(2))
                # FONT_HERSHEY_SIMPLEX = 0, //!< normal size sans-serif font
                # FONT_HERSHEY_PLAIN = 1, //!< small size sans-serif font
                # FONT_HERSHEY_DUPLEX = 2, //!< normal size sans-serif font (more complex than FONT_HERSHEY_SIMPLEX)
                # FONT_HERSHEY_COMPLEX = 3, //!< normal size serif font
                # FONT_HERSHEY_TRIPLEX = 4, //!< normal size serif font (more complex than FONT_HERSHEY_COMPLEX)
                font = 2  # FONT_HERSHEY_DUPLEX
                org = (5, 17)
                fontScale = 0.5
                # Text color in BGR; (0, 0, 1) is almost black
                # color = (116, 161, 142) #orig Demo
                color = (0, 0, 1)
                thickness = 1
                numpy_horizontal = cv2.putText(numpy_horizontal, Text, org, font, fontScale, color, thickness, cv2.LINE_AA)
                if largeImg == True:
                    cv2.imshow('ItsYou', numpy_horizontal2)
                # else:
                cv2.imshow('ItsYou', numpy_horizontal)
                print('print found image', time()-start6)
                print('-------------------------------------')
                print('time after keypress', time()-start1)
                #print('totaltime ', time()-start1)
                print('-------------------------------------')
                print('Distance value: ', EuDist[idx[0]].round(2), ' | ', 'Name: ', str(dataN[idx[0]]), ' | ', ' Filename: ', str(dataF[idx[0]]))
                print('Top five celeb images: ')
                for i in range(5):
                    print(dataN[idx[i]], 'Values: ', EuDist[idx[i]].round(2))

Plot Results

This finishes the demo description. On the platform itself, this demo is implemented with a GUI and the use of object-oriented programming. However, the basics are exactly the same.

Porting to Embedded Hardware

If you use a phyBOARD-Pollux kit, the needed software from NXP is already included in the BSP. NXP created eIQ, which facilitates the connection between the onboard NPU and the peripheral components. At its core, eIQ uses a tuned Google NNAPI, which is capable of understanding TensorFlow Lite models and PyTorch models. Therefore, after converting your model to TFLite and quantizing it, eIQ takes over.
If you have created your own AI model, you only have to copy the model and any other files your application requires onto the board. We suggest using SSH for this.

If you want to include your AI application or specific libraries into our BSP using Yocto Linux, that is of course also possible. 

Quantize Your Deep Learning Model to Run on an NPU

In this section, we explain which steps you have to take to transform and quantize your model with different TensorFlow versions. We only look at post-training quantization. The phyBOARD-Pollux incorporates an i.MX 8M Plus, which features a dedicated neural network accelerator IP from VeriSilicon (Vivante VIP8000).

NXP i.MX 8M Plus Block Diagram

As the neural processing unit (NPU) from NXP needs a fully int8 quantized model, we have to look into the full int8 quantization of a TensorFlow Lite or PyTorch model. Both libraries are supported by the eIQ library from NXP. This manual covers only the TensorFlow variant. The general overview of how to do the post-training quantization can be found on the TensorFlow website.

Why does the NPU utilize int8 when most ANNs are trained in float32?

Floating-point operations are more complex than integer operations (arithmetic, avoiding overflow, etc.). Restricting the NPU to int8 makes it possible to use only the much simpler and smaller integer arithmetic units instead of the larger floating-point units.

The physical space needed for a float32 operation is much larger than for an int8 operation. Using int8 therefore results in:

  • Lower power consumption
  • Less heat development
  • Room for more calculation units, which decreases inference time

Post-training Quantization with TensorFlow Version 2.x


Before you begin, make sure that you have met all of the Prerequisites needed.

After you have created and trained a model via tf.keras, there are three possible ways of quantizing the model.

Method One - Directly Quantizing a Trained Model

The trained TensorFlow model has to be converted into a TFLite model and can be directly quantized as described in the following code block. For the trained model, we explicitly used the updated tf.keras_vggface model based on the work of rcmalli. The transformation starts at line 28.

1   from keras_vggface_TF.vggfaceTF import VGGFace
2   from keras_vggface_TF.utils import preprocess_input
3   import tensorflow as tf
4   import numpy as np
5   tfVersion=tf.version.VERSION.replace(".", "")# can be used as savename
6   print(tf.version.VERSION)
9   pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg') # pooling: None, avg or max
12  folderpath='./All_croped_images/'
14  def prepare(img):
15      img = np.expand_dims(img,0).astype(np.float32)
16      img = preprocess_input(img, version=2)
17      return img
19  repDatagen=tf.keras.preprocessing.image.ImageDataGenerator(preprocessing_function=prepare)
20  datagen=repDatagen.flow_from_directory(folderpath,target_size=(224,224),batch_size=1)
22  def representative_dataset_gen():
23      for _ in range(10):
24          img = next(datagen)
25          yield [img[0]]
28  converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
30  converter.optimizations = [tf.lite.Optimize.DEFAULT]
31  converter.representative_dataset = representative_dataset_gen
32  converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
33  converter.experimental_new_converter = True
35  converter.target_spec.supported_types = [tf.int8]
36  converter.inference_input_type = tf.int8 
37  converter.inference_output_type = tf.int8 
38  quantized_tflite_model = converter.convert()
41  open('quant_model.tflite' , "wb").write(quantized_tflite_model)

Post Training Quant 2

After loading/training your model, you first have to create a representative data set. The representative data set is used by the converter to get the min and max values to be able to estimate the scaling factor. This limits the error introduced by the quantization from float32 to intX. The error arises because the float32 value range has to be mapped onto the integer values between -128 and 127. Calibrating the model on the dynamic range of the input limits this error.
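The relation between the calibrated min/max range and the scaling factor can be sketched in a few lines of plain Python. This is a minimal illustration of the affine mapping real = scale * (q - zero_point) described in the TensorFlow Lite quantization specification, not the converter's actual implementation. The helper names and the calibration values are made up for this example, and the sketch assumes that the calibration data is not constant zero.

```python
def quantization_params(real_min, real_max, qmin=-128, qmax=127):
    """Derive scale and zero point for an asymmetric int8 mapping."""
    # The range must contain 0.0 so that zero is exactly representable.
    real_min = min(real_min, 0.0)
    real_max = max(real_max, 0.0)
    scale = (real_max - real_min) / (qmax - qmin)
    zero_point = int(round(qmin - real_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to its int8 code, clamping to the int8 range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 code back to the float value it represents."""
    return (q - zero_point) * scale


# Hypothetical values observed while running the representative data set.
calibration = [-1.0, -0.25, 0.0, 0.6, 1.2, 2.0]
scale, zero_point = quantization_params(min(calibration), max(calibration))
errors = [
    abs(x - dequantize(quantize(x, scale, zero_point), scale, zero_point))
    for x in calibration
]
print(scale, zero_point, max(errors))
```

The wider the calibrated range, the larger the scale and therefore the larger the rounding error per value. This is why the representative data set should cover the range the model actually sees and not much more.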

Here you can simply loop through your images or create a generator as in our example. We used the tf.keras.preprocessing.image.ImageDataGenerator() to yield images and do the necessary preprocessing on the images. As a generator, you can of course also use tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices(). Just keep in mind to do the same preprocessing on your data here as you did on the data you trained your network with (normalization, resizing, de-noising, etc.). This can all be packed into the preprocessing_function call of the generator (line 19).

The conversion starts at line 28. A simple TensorFlow Lite conversion would look like this:

1   converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
2   tflite_model = converter.convert()
4   open('model.tflite' , "wb").write(tflite_model)

Convert Model to tflite

The quantization part fits between line 1 and line 4:

1   converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
3   converter.optimizations = [tf.lite.Optimize.DEFAULT]
4   converter.representative_dataset = representative_dataset_gen
5   converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
6   converter.experimental_new_converter = True
8   converter.target_spec.supported_types = [tf.int8]
9   converter.inference_input_type = tf.int8 
10  converter.inference_output_type = tf.int8 
11  quantized_tflite_model = converter.convert()
13  open('quant_model.tflite' , "wb").write(quantized_tflite_model)

Conversion with Quantization

  • Line 3:  Optimizations other than the default are deprecated. No other options are available at the moment (as of 2020).
  • Line 4:  Here we set the representative data set.
  • Line 5:  Here we make sure that we have a full conversion to int8. Without this option, only weights and biases would be converted but not the activations. That is sufficient when we only want to reduce the model size. However, our NPU needs full int8 quantization. A model whose activations are still in floating-point remains a floating-point model overall and could not run on the NPU.
  • Line 6:  Enables MLIR-based conversion instead of TOCO conversion, which enables RNN support, easier error tracking, and more.
  • Line 8:  Sets the internal constant value to int8. The target_spec corresponds with the TFLITE_BUILTINS from line 5.
  • Lines 9 and 10:  Set the input and output types to int8. This is fully available from TF2.3.

If we now convert the model using TF2.3 with:

  • experimental_new_converter=True
  • inference_input_type=tf.int8
  • inference_output_type=tf.int8

we receive the following model:

TF2.3 Converted Model

However, if we do not set the inference_input_type and inference_output_type, our model changes to:

TF2.3 Converted Model without Set Inference

These two options determine which data type the model accepts as input and returns as output. This can be important if you work with an embedded camera, such as the one included with your phyBOARD-Pollux AI kit. The MIPI camera returns 8-bit values, so if you want to spare a conversion to float32, an int8 input can be useful. Be aware that if you use a model without prediction layers to obtain embeddings, an int8 output will result in very poor performance. In that case, we recommend a float32 output. This shows that each problem needs a specific solution.
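One plausible contributor to the poor embedding results with an int8 output is the coarse resolution of 256 levels. The following sketch is an illustration under stated assumptions, not a measurement on the demo model: the output scale of 0.1, the zero point of -128, and the two three-dimensional embeddings are made up. It shows two embeddings that are distinguishable in float32 but collapse onto the same int8 codes, so their Euclidean distance becomes zero.

```python
import math


def quantize(values, scale, zero_point=-128):
    """Map floats to int8 codes with q = round(x / scale) + zero_point."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point=-128):
    """Map int8 codes back to the float values they represent."""
    return [(q - zero_point) * scale for q in codes]


scale = 0.1  # hypothetical output scale
emb_a = [0.52, 1.01, 3.30]  # hypothetical embedding of face A
emb_b = [0.49, 0.98, 3.33]  # hypothetical embedding of a similar face B

float_distance = math.dist(emb_a, emb_b)
int8_distance = math.dist(
    dequantize(quantize(emb_a, scale), scale),
    dequantize(quantize(emb_b, scale), scale),
)
print(float_distance, int8_distance)
```

Because the look-a-like search ranks candidates by small differences in distance, losing these differences changes the ranking. Keeping the output in float32 avoids this effect.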

Method Two and Three - Quantize a Saved Model from *.h5 or *.pb Files

If you already have your model, you most likely have it saved somewhere either as a Keras .h5 file or a TensorFlow protocol buffer .pb. We can quickly save our model using TF2.3:

import tensorflow as tf
from keras_vggface_TF.vggfaceTF import VGGFace
pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg') # pooling: None, avg or max
# Saving as a protocol buffer (the leading ! runs a shell command in a Jupyter notebook)
!mkdir -p saved_model
pretrained_model.save('saved_model/my_model')
# Saving as an h5 file (if your TensorFlow version is < 2.0, use this)
!mkdir -p keras_model
pretrained_model.save('keras_model/my_model.h5')

Saving Model as .pb tf2.3

The conversion and quantization are very similar to Method One. The only difference is how we load the model into the converter. Either load the model first (see code block below) and then continue as in Method One, or let the converter load the file directly.

# Load the h5 model (adjust the path to where you saved it)
pretrained_model = tf.keras.models.load_model('keras_model/my_model.h5')
# Then use the converter as before
converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)

Loading .h5 and Converting

You can also load the .h5 file directly in the converter. With TensorFlow 2.0 and above, you have to use the compat.v1 converter for this:

# Or load the h5 model from file; use only the variant that matches your TensorFlow version
# TensorFlow version >= 2.0
converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + h5_modelname)
# TensorFlow version < 2.0
converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + h5_modelname)

Converting Directly from .h5

If you load from a TensorFlow .pb file, use:

# Or load the pb file from the model folder
# TensorFlow version >= 2.0
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) # the folder contains the pb file and an assets and a variables folder

Converting from .pb

Converting with TensorFlow Versions Below 2.0

It is possible to convert a model written in a TensorFlow version < 1.15.3 using Keras. However, not all options are available for TFLite conversion and quantization. The best way is to save the model with the TensorFlow version it was created in (for example, rcmalli keras-vggface was trained in TF 1.13.2). We suggest not using the "saving and freeze graph" method to create a .pb file, as the .pb files differ between TF1 and TF2. TFLiteConverter.from_saved_model then does not work, which creates several problems for quantization.

A better approach is to use Method Two or Three with Keras:

import keras
...
pretrained_model.save('my_model.h5')

Then, convert and quantize your model with a TensorFlow version of 1.15.3 or newer, which has many functions that were added in preparation for TF2. We suggest using the latest version, which will result in the same models that were presented earlier.

Next Steps

PHYTEC provides several guides when it comes to customizing your software:

PHYTEC BSP Manual:  Provides information on the standard PHYTEC BSP. This can be used to make changes, reconfigure, or add functionality to your Linux BSP.
PHYTEC Yocto Manual:  Provides information on the Yocto configuration and version used in our BSPs.
PHYTEC Development Environment Guide:  Provides information on the applications that come with our products. If you are interested in developing your own application, this manual will give you a good start.

All PHYTEC manuals, as well as other information, can be found at https://www.phytec.de/produkte/development-kits/phyboard-pollux-ki-kit/#downloads/

Further Reading

How to Perform Face Recognition With VGGFace2 in Keras - Jason Brownlee

VGGFace2: A dataset for recognizing faces across pose and age - Qiong Cao

Neural Network Embeddings Explained - Will Koehrsen

Keras ImageDataGenerator methods: An easy guide - Ashish Verma

Revision History


Version Numbers | Changes in this Manual
Manual L-1015e.A0 | Preliminary Edition
Manual L-1015e.A1 | Upgraded to Regular Version; Added PHYTEC documentation information; PDF Version