ESP-DL on the ESP32-CAM

https://github.com/cnadler86/mp_esp_dl_models

ESP-DL MicroPython Binding

This is a MicroPython binding for ESP-DL (Deep Learning) models that enables face detection, face recognition, human detection, cat detection, and image classification on ESP32 devices.

Donate

I spent a lot of time and effort to make this. If you find this project useful, please consider donating to support my work. 

Available Models

  • FaceDetector: Detects faces in images and provides bounding boxes and facial features
  • FaceRecognizer: Recognizes enrolled faces and manages a face database
  • HumanDetector: Detects people in images and provides bounding boxes
  • CatDetector: Detects cats in images and provides bounding boxes
  • ImageNet: Classifies images into predefined categories
  • CocoDetector: Detects objects in images using COCO dataset categories

Installation & Building

Requirements

  • ESP-IDF:
    • Version 5.4.2 with MicroPython >=1.26.0
  • Make sure you have the complete ESP32 build environment set up

Precompiled Images

You can find precompiled images in two ways:

  1. In the repository's Actions tab, under the artifacts of passing workflow runs
  2. By forking the repo and manually starting the action

Building from Source

  1. Clone the required repositories:

git clone --recursive https://github.com/cnadler86/mp_esp_dl_models.git
git clone https://github.com/cnadler86/micropython-camera-API.git
git clone https://github.com/cnadler86/mp_jpeg.git

  2. Build the firmware. There are two ways to enable the different models:

a) Using mpconfigvariant files (recommended): The models can be enabled in the board’s mpconfigvariant files (e.g., mpconfigvariant_FLASH_16M.cmake). The following flags are available:

  • MP_DL_FACE_DETECTOR_ENABLED
  • MP_DL_FACE_RECOGNITION_ENABLED
  • MP_DL_PEDESTRIAN_DETECTOR_ENABLED
  • MP_DL_IMAGENET_CLS_ENABLED
  • MP_DL_COCO_DETECTOR_ENABLED
  • MP_DL_CAT_DETECTOR_ENABLED

b) Using command line flags: You can enable models directly through the idf.py command using -D flags:

idf.py -D MP_DL_FACE_RECOGNITION_ENABLED=1 -D MP_DL_CAT_DETECTOR_ENABLED=1 [other flags…]

Basic build command:

cd mp_esp_dl_models/boards/
idf.py -D MICROPY_DIR=<micropython-dir> -D MICROPY_BOARD=<BOARD_NAME> -D MICROPY_BOARD_VARIANT=<BOARD_VARIANT> -B build-<your-build-name> build
cd build-<your-build-name>
python ~/micropython/ports/esp32/makeimg.py sdkconfig bootloader/bootloader.bin partition_table/partition-table.bin micropython.bin firmware.bin micropython.uf2

Module Usage

Common Requirements

All models support various input pixel formats including RGB888 (default), RGB565, and others supported by ESP-DL. You can use mp_jpeg to decode camera images to the correct format.

The pixel format can be set through the constructor’s pixel_format parameter. This value matches the ESP-DL image format definitions.

Pixel Formats

  • espdl.RGB888 (default)
  • espdl.RGB565
  • espdl.GRAYSCALE
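
For example, a minimal sketch (the 240x240 size is only illustrative) that configures a detector for RGB565 input instead of the default RGB888:

import espdl
from espdl import FaceDetector

# Accept RGB565 frames directly (e.g. raw RGB565 camera output),
# so no conversion to RGB888 is needed
detector = FaceDetector(width=240, height=240, pixel_format=espdl.RGB565)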

FaceDetector

The FaceDetector module detects faces in images and can optionally provide facial feature points.

Constructor

FaceDetector(width=320, height=240, pixel_format=espdl.RGB888, features=True)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888
  • features (bool, optional): Whether to return facial feature points. Default: True

Methods

  • run(framebuffer)
    Detects faces in the provided image.
    Parameters:
    • framebuffer: image data (required)
    Returns: List of dictionaries with detection results, each containing:
    • score: Detection confidence (float)
    • box: Bounding box coordinates [x1, y1, x2, y2]
    • features: Facial feature points [(x, y) coordinates for: left eye, right eye, nose, left mouth, right mouth] if enabled, None otherwise

FaceRecognizer

The FaceRecognizer module manages a database of faces and can recognize previously enrolled faces.

Constructor

FaceRecognizer(width=320, height=240, pixel_format=espdl.RGB888, features=True, db_path="face.db", model=None)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888
  • features (bool, optional): Whether to return facial feature points. Default: True
  • db_path (str, optional): Path to the face database file. Default: "face.db"
  • model (str, optional): Feature extraction model to use ("MBF" or "MFN"). Default: None (uses the default model)

Methods

  • run(framebuffer)
    Detects and recognizes faces in the provided image.
    Parameters:
    • framebuffer: image data (required)
    Returns: List of dictionaries with recognition results, each containing:
    • score: Detection confidence
    • box: Bounding box coordinates [x1, y1, x2, y2]
    • features: Facial feature points (if enabled)
    • person: Recognition result containing:
      • id: Face ID
      • similarity: Match confidence (0-1)
      • name: Person name (if provided during enrollment)
  • enroll(framebuffer, validate=False, name=None)
    Enrolls a new face in the database.
    Parameters:
    • framebuffer: image data
    • validate (bool, optional): Check whether the face is already enrolled. Default: False
    • name (str, optional): Name to associate with the face. Default: None
    Returns: ID of the enrolled face
  • delete_face(id)
    Deletes a face from the database.
    Parameters:
    • id (int): ID of the face to delete
  • print_database()
    Prints the contents of the face database.
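
A short sketch of the database-management calls above (the face ID is illustrative):

from espdl import FaceRecognizer

recognizer = FaceRecognizer(db_path="/faces.db")
recognizer.print_database()  # list the enrolled faces
recognizer.delete_face(1)    # remove the face enrolled with ID 1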

HumanDetector and CatDetector

The HumanDetector module detects people in images; the CatDetector does the same for cats. Both modules provide bounding boxes for detected objects.

Constructor

HumanDetector(width=320, height=240, pixel_format=espdl.RGB888)  # For cats, use CatDetector

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888

Methods

  • run(framebuffer)
    Detects people (or cats, for CatDetector) in the provided image.
    Parameters:
    • framebuffer: image data
    Returns: List of dictionaries with detection results, each containing:
    • score: Detection confidence
    • box: Bounding box coordinates [x1, y1, x2, y2]
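
A minimal usage sketch, using the same camera and mp_jpeg setup as the usage examples further down:

from espdl import HumanDetector
import camera
from jpeg import Decoder

cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
detector = HumanDetector()

framebuffer = decoder.decode(cam.capture())
results = detector.run(framebuffer)
if results:
    for person in results:
        print(f"Person at {person['box']} (score: {person['score']})")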

ImageNet

The ImageNet module classifies images into predefined categories.

Constructor

ImageNet(width=320, height=240, pixel_format=espdl.RGB888)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888

Methods

  • run(framebuffer)
    Classifies the provided image.
    Parameters:
    • framebuffer: image data
    Returns: List alternating between class names and confidence scores: [class1, score1, class2, score2, ...]
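
Because the result list alternates names and scores, pairing adjacent elements yields (class, score) tuples. A minimal sketch, using the same camera and mp_jpeg setup as the usage examples further down:

from espdl import ImageNet
import camera
from jpeg import Decoder

cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
classifier = ImageNet()

framebuffer = decoder.decode(cam.capture())
results = classifier.run(framebuffer)
if results:
    # Pair up the alternating [class1, score1, class2, score2, ...] list
    for name, score in zip(results[::2], results[1::2]):
        print(f"{name}: {score}")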

COCODetector

The COCODetector module detects objects in images using the COCO dataset categories.

Constructor

COCODetector(width=320, height=240, pixel_format=espdl.RGB888, model=CONFIG_DEFAULT_COCO_DETECT_MODEL)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888
  • model (int, optional): COCO detection model to use. Default: CONFIG_DEFAULT_COCO_DETECT_MODEL

Methods

  • run(framebuffer)
    Detects objects in the provided image.
    Parameters:
    • framebuffer: image data
    Returns: List of dictionaries with detection results, each containing:
    • score: Detection confidence
    • box: Bounding box coordinates [x1, y1, x2, y2]
    • category: Detected object class ID
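
A minimal usage sketch (same camera and mp_jpeg setup as the examples below; the category field is a numeric COCO class ID):

from espdl import COCODetector
import camera
from jpeg import Decoder

cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
detector = COCODetector()

framebuffer = decoder.decode(cam.capture())
results = detector.run(framebuffer)
if results:
    for obj in results:
        print(f"Category {obj['category']} at {obj['box']} (score: {obj['score']})")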

Usage Examples

Face Detection Example

from espdl import FaceDetector
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
face_detector = FaceDetector()

# Capture and process image
img = cam.capture()
framebuffer = decoder.decode(img)  # Convert to RGB888
results = face_detector.run(framebuffer)

if results:
    for face in results:
        print(f"Face detected with confidence: {face['score']}")
        print(f"Bounding box: {face['box']}")
        if face['features']:
            print(f"Facial features: {face['features']}")

Face Recognition Example

from espdl import FaceRecognizer
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
recognizer = FaceRecognizer(db_path="/faces.db")

# Enroll a face
img = cam.capture()
framebuffer = decoder.decode(img)
face_id = recognizer.enroll(framebuffer, name="John")
print(f"Enrolled face with ID: {face_id}")

# Later, recognize faces
img = cam.capture()
framebuffer = decoder.decode(img)
results = recognizer.run(framebuffer)
if results:
    for face in results:
        if face['person']:
            print(f"Recognized {face['person']['name']} (ID: {face['person']['id']})")
            print(f"Similarity: {face['person']['similarity']}")

Benchmark results

The following table shows the frames per second (fps) for different image sizes and models. The results are based on a test with a 2MP camera and an ESP32-S3.

Frame Size   FaceDetector   HumanDetector
QQVGA        14.5           6.6
R128x128     21             6.6
QCIF         19.7           6.5
HQVGA        18             6.3
R240X240     16.7           6.1
QVGA         15.2           6.6
CIF          13             5.5
HVGA         11.9           5.3
VGA          8.2            4.4
SVGA         6.2            3.8
XGA          4.1            2.8
HD           3.6            2.6

Notes & Best Practices

  1. Image Format: Always ensure input images are in the correct format. Use mp_jpeg to decode JPEG frames from the camera.
  2. Memory Management:
    • Close/delete detector objects when no longer needed (see the sketch after this list)
    • Consider memory constraints when choosing image dimensions
  3. Face Recognition:
    • Enroll faces in good lighting conditions
    • Multiple enrollments of the same person can improve recognition
    • Use validate=True during enrollment to avoid duplicates
  4. Storage:
    • Face database is persistent across reboots
    • Consider backing up the face database file
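
For point 2, a minimal cleanup sketch in MicroPython (how much memory is reclaimed depends on the model):

import gc
from espdl import FaceDetector

detector = FaceDetector()
# ... run detections ...

del detector  # drop the last reference to the detector
gc.collect()  # let the garbage collector reclaim the model's buffers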

Installing whisper.cpp for Real-Time Speech Recognition on the Allwinner T733 [unverified]

The following procedure was freshly verified in July 2025 and runs on Arm64 boards in the "4×A55, 2 GB RAM" class (Allwinner T733/A733); it can be followed as-is.


  1. Board prerequisites

# Using Armbian (Debian 12 bookworm) as an example
sudo apt update && sudo apt install -y \
  git cmake build-essential gcc g++ \
  wget ffmpeg alsa-utils
  2. Clone whisper.cpp and build it in one step (Arm NEON is enabled automatically)

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# Build the Arm64 executables and library
make -j$(nproc)

When the build finishes you should have three executables: main, stream, and bench; on the A733 the whole build takes about 3 minutes.

  3. Download a quantized model (balancing file size against memory)

# The q5_0 quantization is recommended: ~140 MB file, < 600 MB peak RAM
bash ./models/download-ggml-model.sh small-q5_0
# The model ends up at ./models/ggml-small-q5_0.bin
  4. Offline batch transcription (file → text)

# First convert to 16 kHz mono
ffmpeg -i meeting.mp3 -ar 16000 -ac 1 -c:a pcm_s16le meeting.wav
# Run the transcription
./main \
  -m models/ggml-small-q5_0.bin \
  -f meeting.wav \
  -l zh --output-txt
# Output: meeting.wav.txt (Simplified Chinese); on the A733, 1 h of audio takes ≈ 25 min.
  5. Real-time streaming transcription (microphone → screen)

# Requires ALSA. Flags: -t 4 = 4 threads (use 2 to save power); --step 500 = emit text every 0.5 s
./stream \
  -m models/ggml-small-q5_0.bin \
  -t 4 \
  --step 500 \
  -l zh --print-colors
  • Latency: 0.6–0.8 s measured with the on-board MEMS microphone
  • Load: 65 % CPU (all four cores), 580 MB RAM
  • Runs 2 h continuously without dropping frames or running out of memory.
  6. Further speed-ups / lower power
  • Switching to the tiny-q5_0 model cuts latency to 0.4 s and CPU load to 35 %, at the cost of roughly 2 % more transcription errors.
  • Disabling HDMI and capping the clock at 1.0 GHz drops power from 2.3 W to 1.5 W while still sustaining 1.2× real time.
  7. Common pitfalls
  • On 2 GB boards, be sure to enable swap (swapon); compilation peaks at 1.8 GB of RAM.
  • If you see "Illegal instruction", the build enabled ARMv8.2 optimizations; rebuild with: make clean && make CMAKE_FLAGS="-DCMAKE_C_FLAGS='-march=armv8-a'"
  • For real-time use, add a small fan; otherwise thermal throttling at 85 °C causes sudden stutters.

Conclusion

The A733 can run whisper.cpp offline with no extra dependencies: file transcription at roughly 1× real time and streaming latency under 1 s, with no GPU/NPU, at around 2 W of sustained power. That covers meeting notes, local subtitles, and similar use cases. For higher accuracy, switch to the medium-q5_0 model (~300 MB), which still runs at roughly 0.5× real time. Good luck with the deployment!

Unsloth and vLLM Installation

pip install unsloth

That is all the installation takes. For vLLM:

pip install vllm  # if using pip

Running vLLM with Tensor Parallelism

Qwen2.5-14B-Instruct Deployment

Start it as an OpenAI-compatible API service.

On a single machine with two GPUs, set the CUDA_VISIBLE_DEVICES environment variable:

export CUDA_VISIBLE_DEVICES=0,1

Setting HF_HUB_OFFLINE=1 prevents any HTTP calls to the Hugging Face Hub, which speeds up load time; it is especially useful when the server has no external network access:

export HF_HUB_OFFLINE=1

Start the service:

vllm serve Qwen/Qwen2.5-14B-Instruct \
  --served-model-name qwen2.5-14b-instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len=32768 \
  --tensor-parallel-size 2 \
  --port 8000

--tensor-parallel-size 2

Passing --tensor-parallel-size 2 tells vLLM to distribute the model across two GPUs using tensor parallelism, a distributed deep-learning technique for serving large models.

With the value set to 2, the model's parameters and computation are split into two parts, each handled by one GPU. This reduces the memory usage on each GPU, making it possible to load and run larger models, and it can also speed up inference to some extent, since the GPUs process different parts of the model in parallel.

Tensor parallelism is particularly useful for large language models such as Qwen2.5-14B-Instruct, which are typically too large to fit entirely into a single GPU's memory.
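
Since the server exposes an OpenAI-compatible API, a standard OpenAI client can talk to it. A minimal sketch using the openai Python package (the URL matches the --port above; the API key is a placeholder, since vLLM does not check it by default):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="qwen2.5-14b-instruct",  # matches --served-model-name above
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)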

ComfyUI: Installing ComfyUI-Manager and Easy-Use

Installing ComfyUI-Manager

  1. Go to the ComfyUI/custom_nodes directory (in a CMD shell)
  2. Run: git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager
  3. Restart ComfyUI

Method 1: search for "easy use" in the Manager and install it from there.


Installing Easy-Use (Method 2, without ComfyUI-Manager)

Go to the ComfyUI/custom_nodes directory (in a CMD shell), then:

git clone https://github.com/yolain/ComfyUI-Easy-Use.git
cd ComfyUI-Easy-Use
pip install -r requirements.txt

Installing Easy-Use (Method 3, without ComfyUI-Manager)

Go to the ComfyUI/custom_nodes directory (in a CMD shell), then:

git clone https://github.com/yolain/ComfyUI-Easy-Use

Then install the dependencies by double-clicking install.bat.