This is a MicroPython binding for ESP-DL (Deep Learning) models that enables face detection, face recognition, human detection, cat detection, general object detection (COCO), and image classification on ESP32 devices.
Donate
I spent a lot of time and effort to make this. If you find this project useful, please consider donating to support my work.
Available Models
FaceDetector: Detects faces in images and provides bounding boxes and facial features
FaceRecognizer: Recognizes enrolled faces and manages a face database
HumanDetector: Detects people in images and provides bounding boxes
CatDetector: Detects cats in images and provides bounding boxes
ImageNet: Classifies images into predefined categories
CocoDetector: Detects objects in images using COCO dataset categories
Installation & Building
Requirements
ESP-IDF:
Version 5.4.2 with MicroPython >=1.26.0
Make sure you have the complete ESP32 build environment set up
Precompiled Images
Precompiled firmware images are available in two ways:
In the repository's Actions section, as artifacts of passed workflow runs
By forking the repository and starting the workflow manually
Build the firmware: There are two ways to enable the different models:
a) Using mpconfigvariant files (recommended): The models can be enabled in the board’s mpconfigvariant files (e.g., mpconfigvariant_FLASH_16M.cmake). The following flags are available:
MP_DL_FACE_DETECTOR_ENABLED
MP_DL_FACE_RECOGNITION_ENABLED
MP_DL_PEDESTRIAN_DETECTOR_ENABLED
MP_DL_IMAGENET_CLS_ENABLED
MP_DL_COCO_DETECTOR_ENABLED
MP_DL_CAT_DETECTOR_ENABLED
b) Using command line flags: You can enable models directly through the idf.py command using -D flags:
```bash
idf.py -D MP_DL_FACE_RECOGNITION_ENABLED=1 -D MP_DL_CAT_DETECTOR_ENABLED=1 [other flags…]
```
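The flag names do not always match the class names (HumanDetector, for instance, appears to be controlled by MP_DL_PEDESTRIAN_DETECTOR_ENABLED), so it can help to keep the mapping in one place on the build host. The following is a minimal, hypothetical Python helper: MODEL_FLAGS and idf_define_args are not part of this project, and the class-to-flag pairing is inferred from the names listed above rather than taken from the build files.

```python
# Hypothetical host-side helper. The class-to-flag mapping is an assumption
# inferred from the flag names; check it against the build files before use.
MODEL_FLAGS = {
    "FaceDetector": "MP_DL_FACE_DETECTOR_ENABLED",
    "FaceRecognizer": "MP_DL_FACE_RECOGNITION_ENABLED",
    "HumanDetector": "MP_DL_PEDESTRIAN_DETECTOR_ENABLED",
    "ImageNet": "MP_DL_IMAGENET_CLS_ENABLED",
    "CocoDetector": "MP_DL_COCO_DETECTOR_ENABLED",
    "CatDetector": "MP_DL_CAT_DETECTOR_ENABLED",
}


def idf_define_args(models):
    """Return the idf.py -D arguments that enable the given model classes.

    Duplicates are ignored and the output is sorted so it is reproducible.
    """
    args = []
    for name in sorted(set(models)):
        if name not in MODEL_FLAGS:
            raise ValueError("unknown model: " + name)
        args += ["-D", MODEL_FLAGS[name] + "=1"]
    return args


if __name__ == "__main__":
    cmd = ["idf.py"] + idf_define_args(["FaceRecognizer", "CatDetector"]) + ["build"]
    print(" ".join(cmd))
```

Run as a script, this prints idf.py -D MP_DL_CAT_DETECTOR_ENABLED=1 -D MP_DL_FACE_RECOGNITION_ENABLED=1 build. Board and port options still have to be added as usual.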
All models support various input pixel formats including RGB888 (default), RGB565, and others supported by ESP-DL. You can use mp_jpeg to decode camera images to the correct format.
The pixel format can be set through the constructor’s pixel_format parameter. This value matches the ESP-DL image format definitions.
Pixel Formats
espdl.RGB888 (default)
espdl.RGB565
espdl.GRAYSCALE
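The formats differ in how many bytes each pixel takes: RGB888 stores 3 bytes per pixel, RGB565 packs a pixel into 16 bits (5 bits red, 6 bits green, 5 bits blue), and GRAYSCALE typically uses 1 byte. The pure-Python sketch below converts an RGB888 buffer to RGB565 to show that layout. The byte order ESP-DL expects for RGB565 input is not stated here, so the hypothetical big_endian flag offers both; rgb888_to_rgb565 and rgb888_buffer_to_rgb565 are illustrative helpers, not part of espdl.

```python
# Illustrative sketch only: a per-pixel Python loop is slow on an ESP32, so
# decoding straight into the target format is preferable in practice.
# The RGB565 byte order expected by ESP-DL is an assumption; both are offered.

def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit R, G, B values into one 16-bit RGB565 value."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def rgb888_buffer_to_rgb565(buf, big_endian=True):
    """Convert a packed RGB888 buffer (3 bytes/pixel) to RGB565 (2 bytes/pixel)."""
    if len(buf) % 3:
        raise ValueError("RGB888 buffer length must be a multiple of 3")
    pixels = len(buf) // 3
    out = bytearray(pixels * 2)
    for i in range(pixels):
        value = rgb888_to_rgb565(buf[3 * i], buf[3 * i + 1], buf[3 * i + 2])
        hi, lo = value >> 8, value & 0xFF
        if big_endian:
            out[2 * i], out[2 * i + 1] = hi, lo
        else:
            out[2 * i], out[2 * i + 1] = lo, hi
    return out
```

A detector built for this format would then be created with the constructor's pixel_format parameter set to espdl.RGB565.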
FaceDetector
The FaceDetector module detects faces in images and can optionally provide facial feature points.
CocoDetector
The CocoDetector module detects objects in images using COCO dataset categories.
model (int, optional): COCO detection model to use. Default: CONFIG_DEFAULT_COCO_DETECT_MODEL
Methods
run(framebuffer): Detects objects in the provided image.
Parameters:
framebuffer: image data in the pixel format set in the constructor
Returns: List of dictionaries with detection results, each containing:
score: Detection confidence
box: Bounding box coordinates [x1, y1, x2, y2]
category: Detected object class id
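Because run() returns plain Python lists of dictionaries, post-processing needs no extra library. The following sketch assumes only the layout described above (score, box as [x1, y1, x2, y2], and category); filter_detections, box_size and count_by_category are hypothetical helpers, and the 0.5 threshold is an arbitrary example value, not a library default.

```python
# Sketch of post-processing detector output. Assumes each result is a dict
# with "score", "box" ([x1, y1, x2, y2]) and, for object detectors, "category".

def filter_detections(results, min_score=0.5):
    """Keep detections scoring at least min_score, highest score first."""
    kept = [r for r in (results or []) if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


def box_size(box):
    """Return (width, height) of an [x1, y1, x2, y2] bounding box."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


def count_by_category(results):
    """Count detections per category id."""
    counts = {}
    for r in results:
        counts[r["category"]] = counts.get(r["category"], 0) + 1
    return counts
```

On the device this could be used as filter_detections(coco_detector.run(framebuffer)), where coco_detector is a hypothetical CocoDetector instance. The results or [] guard also covers a run() that returns nothing.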
Usage Examples
Face Detection Example
```python
from espdl import FaceDetector
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
face_detector = FaceDetector()

# Capture and process image
img = cam.capture()
framebuffer = decoder.decode(img)  # Convert to RGB888
results = face_detector.run(framebuffer)

if results:
    for face in results:
        print(f"Face detected with confidence: {face['score']}")
        print(f"Bounding box: {face['box']}")
        if face['features']:
            print(f"Facial features: {face['features']}")
```
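A common next step after detection is to cut the face region out of the frame, for example to save or display it. The sketch below assumes that the decoded framebuffer is a tightly packed, row-major RGB888 buffer (3 bytes per pixel, no row padding), that the frame width and height are known from the camera configuration, and that box coordinates are pixels with x2 and y2 treated as exclusive. crop_rgb888 is a hypothetical helper, not part of espdl.

```python
# Sketch: crop a bounding box out of a packed, row-major RGB888 buffer.
# Assumptions: no row padding, box = [x1, y1, x2, y2] in pixels, x2/y2 exclusive.

def crop_rgb888(buf, width, height, box):
    """Return (crop_bytes, crop_width, crop_height), clamping the box to the image."""
    x1, y1, x2, y2 = [int(v) for v in box]
    x1, x2 = max(0, min(width, x1)), max(0, min(width, x2))
    y1, y2 = max(0, min(height, y1)), max(0, min(height, y2))
    cw, ch = max(0, x2 - x1), max(0, y2 - y1)
    row_bytes = cw * 3
    out = bytearray(row_bytes * ch)
    for row in range(ch):
        # Byte offset of the first cropped pixel in this source row.
        src = ((y1 + row) * width + x1) * 3
        out[row * row_bytes:(row + 1) * row_bytes] = buf[src:src + row_bytes]
    return out, cw, ch
```

For a QVGA frame this would be called as crop_rgb888(framebuffer, 320, 240, face['box']). Clamping keeps boxes that extend past the frame edge from reading outside the buffer.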
Face Recognition Example
```python
from espdl import FaceRecognizer
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
recognizer = FaceRecognizer(db_path="/faces.db")

# Enroll a face
img = cam.capture()
framebuffer = decoder.decode(img)
face_id = recognizer.enroll(framebuffer, name="John")
print(f"Enrolled face with ID: {face_id}")

# Later, recognize faces
img = cam.capture()
framebuffer = decoder.decode(img)
results = recognizer.run(framebuffer)
if results:
    for face in results:
        if face['person']:
            print(f"Recognized {face['person']['name']} (ID: {face['person']['id']})")
            print(f"Similarity: {face['person']['similarity']}")
```
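When several faces are in view, an application often needs a single answer. The sketch below assumes the result layout used in the example above (a person entry with id, name and similarity, or an empty value when nobody matched); best_match is a hypothetical helper, and the 0.5 similarity threshold is an illustrative value to tune for your setup, not a library default.

```python
# Sketch: pick the most similar recognized person from FaceRecognizer.run() output.
# Assumes results like [{"person": {"id": ..., "name": ..., "similarity": ...}}, ...].

def best_match(results, min_similarity=0.5):
    """Return the "person" dict with the highest similarity, or None."""
    best = None
    for face in results or []:
        person = face.get("person")
        if not person or person["similarity"] < min_similarity:
            continue  # unrecognized face or match too weak
        if best is None or person["similarity"] > best["similarity"]:
            best = person
    return best
```

For example, person = best_match(recognizer.run(framebuffer)) yields None when no face clears the threshold.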
Benchmark Results
The following table shows the frame rate, in frames per second (fps), achieved by FaceDetector and HumanDetector at different camera frame sizes. The results were measured with a 2 MP camera on an ESP32-S3.
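| Frame Size | FaceDetector (fps) | HumanDetector (fps) |
|------------|--------------------|---------------------|
| QQVGA      | 14.5               | 6.6                 |
| R128x128   | 21                 | 6.6                 |
| QCIF       | 19.7               | 6.5                 |
| HQVGA      | 18                 | 6.3                 |
| R240X240   | 16.7               | 6.1                 |
| QVGA       | 15.2               | 6.6                 |
| CIF        | 13                 | 5.5                 |
| HVGA       | 11.9               | 5.3                 |
| VGA        | 8.2                | 4.4                 |
| SVGA       | 6.2                | 3.8                 |
| XGA        | 4.1                | 2.8                 |
| HD         | 3.6                | 2.6                 |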
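To compare these figures with your own hardware, a simple timing loop is enough. How the table was measured is not documented, so the sketch below is an assumption: measure_fps and frame_time_ms are hypothetical helpers, and the clock is injectable because time.time() may only have one-second resolution on some MicroPython ports.

```python
import time


def measure_fps(run_once, frames=20, clock=time.time):
    """Call run_once() `frames` times and return the average frames per second.

    `clock` must return the current time in seconds.
    """
    if frames <= 0:
        raise ValueError("frames must be positive")
    start = clock()
    for _ in range(frames):
        run_once()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("clock did not advance; use more frames or a finer clock")
    return frames / elapsed


def frame_time_ms(fps):
    """Average processing time per frame, in milliseconds, for a given fps."""
    return 1000.0 / fps
```

On the device, measure_fps(lambda: face_detector.run(decoder.decode(cam.capture())), clock=lambda: time.ticks_ms() / 1000) times capture, decoding and inference together (the ticks-based clock ignores wrap-around, which is acceptable for short runs). Whether the table above includes capture and decoding is not stated. As a reading aid, frame_time_ms(8.2) shows that FaceDetector at VGA spends roughly 122 ms per frame.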
Notes & Best Practices
Image Format: Always make sure input images are in the pixel format the model was constructed with (RGB888 by default). Use mp_jpeg to decode JPEG frames from the camera.
Memory Management:
Close/delete detector objects when no longer needed
Consider memory constraints when choosing image dimensions
Face Recognition:
Enroll faces in good lighting conditions
Multiple enrollments of the same person can improve recognition
Use validate=True during enrollment to avoid duplicates
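To put the memory advice into numbers, the decoded frame alone needs width × height × bytes-per-pixel bytes, before the model's own working memory is counted. The sketch below picks the largest frame size whose decoded buffer fits a given budget. The frame dimensions are the esp32-camera sizes as we understand them and should be treated as an assumption, and framebuffer_bytes and largest_frame_within are hypothetical helpers.

```python
# Sketch: choose a frame size whose decoded framebuffer fits a memory budget.
# Frame dimensions are assumed esp32-camera sizes; row padding is ignored.
FRAME_SIZES = {
    "R128x128": (128, 128),
    "QQVGA": (160, 120),
    "QVGA": (320, 240),
    "HVGA": (480, 320),
    "VGA": (640, 480),
    "SVGA": (800, 600),
    "XGA": (1024, 768),
    "HD": (1280, 720),
}

BYTES_PER_PIXEL = {"RGB888": 3, "RGB565": 2, "GRAYSCALE": 1}


def framebuffer_bytes(frame_size, pixel_format="RGB888"):
    """Size in bytes of one decoded frame."""
    w, h = FRAME_SIZES[frame_size]
    return w * h * BYTES_PER_PIXEL[pixel_format]


def largest_frame_within(budget_bytes, pixel_format="RGB888"):
    """Return the largest listed frame size that fits budget_bytes, or None."""
    best, best_size = None, 0
    for name in FRAME_SIZES:
        size = framebuffer_bytes(name, pixel_format)
        if size <= budget_bytes and size > best_size:
            best, best_size = name, size
    return best
```

Leave generous headroom below the free heap, since inference needs memory of its own. When a detector is no longer needed, del followed by gc.collect() lets MicroPython reclaim its memory.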