Monthly Archive: September 2025

Voice Simulation

VoxCPM 0.5B supports multiple dialects, in both Chinese and English.

Feed it a reference audio clip and it can imitate that voice.

https://www.modelscope.cn/models/OpenBMB/VoxCPM-0.5B

# 1) Direct synthesis (a single piece of text)
voxcpm --text "Hello VoxCPM" --output out.wav

# 2) Voice cloning (reference audio + its transcript)
voxcpm --text "Hello" \
  --prompt-audio path/to/voice.wav \
  --prompt-text "reference transcript" \
  --output out.wav \
  --denoise

# 3) Batch processing (one piece of text per line)
voxcpm --input examples/input.txt --output-dir outs
# (optional) batch + cloning
voxcpm --input examples/input.txt --output-dir outs \
  --prompt-audio path/to/voice.wav \
  --prompt-text "reference transcript" \
  --denoise

# 4) Inference parameters (quality/speed trade-off)
voxcpm --text "..." --output out.wav \
  --cfg-value 2.0 --inference-timesteps 10 --normalize

# 5) Model loading
# Prefer a local path
voxcpm --text "..." --output out.wav --model-path /path/to/VoxCPM_model_dir
# Or auto-download/cache from Hugging Face
voxcpm --text "..." --output out.wav \
  --hf-model-id openbmb/VoxCPM-0.5B --cache-dir ~/.cache/huggingface --local-files-only

# 6) Denoiser control
voxcpm --text "..." --output out.wav \
  --no-denoiser --zipenhancer-path iic/speech_zipenhancer_ans_multiloss_16k_base

# 7) Show help
voxcpm --help
python -m voxcpm.cli --help

On Windows (CMD), replace `voxcpm` in the commands above with `python -m voxcpm.cli`.

Voice cloning example:

python -m voxcpm.cli --text "你好你在干什么啊" --prompt-audio a.wav --prompt-text "你好 现在是几点钟了 明天又是什么时候呢 大家 都在上班还是上学" --output out.wav --denoise
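
If you prefer Python over the CLI, VoxCPM also ships a Python API. Below is a minimal sketch assuming the `from_pretrained`/`generate` interface shown in the project README; verify the exact parameter names and output sample rate against your installed version.

# Minimal voice-cloning sketch via the VoxCPM Python API.
# Parameter names follow the upstream README; treat them as assumptions.
import soundfile as sf
from voxcpm import VoxCPM

model = VoxCPM.from_pretrained("openbmb/VoxCPM-0.5B")

wav = model.generate(
    text="你好你在干什么啊",
    prompt_wav_path="a.wav",            # reference voice
    prompt_text="你好 现在是几点钟了",   # transcript of the reference audio
    cfg_value=2.0,                       # guidance strength
    inference_timesteps=10,              # quality/speed trade-off
)

sf.write("out.wav", wav, 16000)  # 16 kHz is an assumption; check the model card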

Web test page

Go to: GitHub – OpenBMB/VoxCPM (VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning)

After downloading, enter the project directory and run `python app.py`; the web UI will then be available at:

http://localhost:7860

LLM OCR, Continued

qwen2.5-vl 7B can perform OCR and return JSON that includes position information.

To guard against hallucinations: no prompt text is used.

Instead, add `response_format` to the request parameters to force the output into the required format with the corresponding content:

response_format={"type": "json_object"}
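
For reference, here is a minimal sketch of such a request against an OpenAI-compatible endpoint (e.g., a local vLLM server). The base URL, model name, and image path are placeholders; only the image is sent, consistent with the no-prompt approach above.

# Hedged sketch: OCR with qwen2.5-vl via an OpenAI-compatible API,
# forcing JSON output with response_format. Endpoint and model name are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="qwen2.5-vl-7b-instruct",
    response_format={"type": "json_object"},  # constrain output to valid JSON
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)  # JSON with recognized text and positions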

Fantasy 50% Complete: A Desktop Companion

https://github.com/moeru-ai/airi

Project AIRI

A model-driven container of souls, a desktop pet that can do a bit of everything: let virtual companions like Neuro-sama become part of our world!


Deeply inspired by Neuro-sama

Warning

Note: we have not issued any cryptocurrency or token associated with this project. Please verify any such claims and proceed with caution.

Note

We have a dedicated organization, @proj-airi, for all the sub-projects born from Project AIRI. Come take a look!

RAG (retrieval-augmented generation), memory systems, embedded databases, icons, Live2D utilities, and more!

Have you ever dreamed of having a cyber life (a net wife/husband, a digital desktop pet), or an AI companion that can play and talk with you?

With the power of modern large language models like ChatGPT and the famous Claude, getting an LLM to role-play and chat with us is already trivially easy; anyone can do it. Platforms like Character.ai (a.k.a. c.ai) and JanitorAI, as well as local applications like SillyTavern, are already decent solutions for chat-based or text-adventure-style experiences.

But what about giving them the ability to play games? To see the code you are writing? To not only chat while playing a game, but also watch videos and do many other things?

You may already know Neuro-sama. She is currently the best AI VTuber/companion that can play games, chat, and interact with you and the audience (in the VTuber community); some call such an entity a "digital human". Sadly, she is not open source, and once she goes offline from a stream, you can no longer interact with her.

So this project, AIRI, offers another possibility: easily own your own digital life, your own cyber life, anytime and anywhere.

What makes this project special?

Unlike other open-source AI- and LLM-driven VTuber projects, アイリ VTuber has supported a wide range of Web technologies from the very first day of development, covering APIs such as WebGPU, WebAudio, Web Workers, WebAssembly, and WebSocket, some already widely adopted and some still highly experimental.

This means アイリ VTuber can run in modern browsers and on modern devices, even on mobile (PWA support is already done). That gives us (the contributors) far more room to build and extend アイリ VTuber's external capabilities, while keeping configuration flexible: features that require a TCP connection or other non-Web technologies can be enabled selectively per device, such as joining a Discord voice channel to play together, or playing Minecraft and Factorio with friends.

Note

アイリ VTuber is still in early development, and we welcome talented developers to join us and make it a reality.

You don't need to be familiar with Vue.js, TypeScript, or the other required development tooling. Artists, designers, and operations/planning folks are welcome too; you could even become the first streamer to go live with アイリ VTuber.

It's also fine if you use React, Svelte, or even Solid: you can create your own sub-directory and add the features you would like to see in アイリ VTuber, or features you want to experiment with.

We would especially love to see people from the following fields join:

  • Live2D modelers
  • VRM modelers
  • VRChat model designers
  • Computer vision (CV)
  • Reinforcement learning (RL)
  • Speech recognition
  • Speech synthesis
  • ONNX inference runtime
  • Transformers.js
  • vLLM
  • WebGPU
  • Three.js
  • WebXR (also check out this other project of ours under the @moeru-ai organization)

If you're already interested, why not drop by and say hi? Would you like to join us and build AIRI?

Current progress

  • Thinking abilities
    • Play Minecraft
    • Play Factorio
    • Chat on Telegram
    • Chat on Discord
    • Memory
      • Pure in-browser database support (based on DuckDB WASM or SQLite)
      • Alaya memory layer (work in progress)
    • Pure in-browser local inference (WebGPU-based)
  • Speech understanding
    • Browser audio input
    • Discord audio input
    • Client-side speech recognition
    • Client-side talking detection
  • Language abilities
  • Body movement
    • VRM support
      • Control VRM models
    • VRM model animation
      • Auto blink
      • Auto look-at
      • Idle eye movement
    • Live2D support
      • Control Live2D models
    • Live2D model animation
      • Auto blink
      • Auto look-at
      • Idle eye movement

Development

For detailed instructions on developing this project, see CONTRIBUTING.md.

pnpm i
pnpm dev

Web version (i.e., the one served at airi.moeru.ai)

pnpm dev:web

Desktop version (a.k.a. the Tamagotchi build, your digital pet)

pnpm dev:tamagotchi

Documentation site

pnpm -F @proj-airi/docs dev

List of natively supported LLM API providers (powered by xsai)

Sub-projects born from this project

  • unspeech: a proxy-server implementation for /audio/transcriptions and /audio/speech, like LiteLLM but aimed at any ASR and TTS
  • hfup: tools to help with deploying and bundling to HuggingFace Spaces
  • @proj-airi/drizzle-duckdb-wasm: Drizzle ORM driver for DuckDB WASM
  • @proj-airi/duckdb-wasm: easy-to-use wrapper for @duckdb/duckdb-wasm
  • @proj-airi/lobe-icons: Iconify JSON wrapper for the beautiful lobe-icons AI & LLM icon set, with Tailwind and UnoCSS support
  • AIRI Factorio: let AIRI play Factorio
  • Factorio RCON API: RESTful API wrapper around the Factorio headless server console
  • autorio: Factorio automation library
  • tstl-plugin-reload-factorio-mod: hot-reload Factorio mods during development
  • 🥺 SAD: docs and notes on self-hosted and browser-run LLMs
  • Velin: write easy-to-use prompts for LLMs with Vue SFC and Markdown files
  • demodel: easily speed up pulling/downloading models and datasets from various inference runtimes and model downloaders
  • inventory: centralized model catalog and public API service for default provider configurations
  • MCP Launcher: an easy-to-use MCP launcher for all possible MCP servers, just like Ollama is for model inference!
  • @proj-airi/elevenlabs: type definitions for the ElevenLabs API


Similar projects

Open-source projects

Non-open-source projects

Project status

Acknowledgements

Star History

ESP DL ESP32-CAM

https://github.com/cnadler86/mp_esp_dl_models

ESP DL MicroPython Binding

This is a MicroPython binding for ESP-DL (Deep Learning) models that enables face detection, face recognition, human detection, cat detection, and image classification on ESP32 devices.

Donate

I spent a lot of time and effort to make this. If you find this project useful, please consider donating to support my work. 

Available Models

  • FaceDetector: Detects faces in images and provides bounding boxes and facial features
  • FaceRecognizer: Recognizes enrolled faces and manages a face database
  • HumanDetector: Detects people in images and provides bounding boxes
  • CatDetector: Detects cats in images and provides bounding boxes
  • ImageNet: Classifies images into predefined categories
  • CocoDetector: Detects objects in images using COCO dataset categories

Installation & Building

Requirements

  • ESP-IDF:
    • Version 5.4.2 with MicroPython >=1.26.0
  • Make sure you have the complete ESP32 build environment set up

Precompiled Images

You can find precompiled images in two ways:

  1. In the Actions section, under the artifacts of passing workflow runs
  2. By forking the repo and manually starting the action

Building from Source

  1. Clone the required repositories:

git clone --recursive https://github.com/cnadler86/mp_esp_dl_models.git
git clone https://github.com/cnadler86/micropython-camera-API.git
git clone https://github.com/cnadler86/mp_jpeg.git

  2. Build the firmware: There are two ways to enable the different models:

a) Using mpconfigvariant files (recommended): The models can be enabled in the board’s mpconfigvariant files (e.g., mpconfigvariant_FLASH_16M.cmake). The following flags are available:

  • MP_DL_FACE_DETECTOR_ENABLED
  • MP_DL_FACE_RECOGNITION_ENABLED
  • MP_DL_PEDESTRIAN_DETECTOR_ENABLED
  • MP_DL_IMAGENET_CLS_ENABLED
  • MP_DL_COCO_DETECTOR_ENABLED
  • MP_DL_CAT_DETECTOR_ENABLED

b) Using command line flags: You can enable models directly through the idf.py command using -D flags:

idf.py -D MP_DL_FACE_RECOGNITION_ENABLED=1 -D MP_DL_CAT_DETECTOR_ENABLED=1 [other flags…]

Basic build command:

cd mp_esp_dl_models/boards/
idf.py -D MICROPY_DIR=<micropython-dir> -D MICROPY_BOARD=<BOARD_NAME> -D MICROPY_BOARD_VARIANT=<BOARD_VARIANT> -B build-<your-build-name> build
cd build-<your-build-name>
python ~/micropython/ports/esp32/makeimg.py sdkconfig bootloader/bootloader.bin partition_table/partition-table.bin micropython.bin firmware.bin micropython.uf2

Module Usage

Common Requirements

All models support various input pixel formats including RGB888 (default), RGB565, and others supported by ESP-DL. You can use mp_jpeg to decode camera images to the correct format.

The pixel format can be set through the constructor’s pixel_format parameter. This value matches the ESP-DL image format definitions.

Pixel Formats

  • espdl.RGB888 (default)
  • espdl.RGB565
  • espdl.GRAYSCALE
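
For example, if your camera pipeline produces RGB565, both the decoder and the model should be configured for it. A short sketch (whether your mp_jpeg build supports RGB565 output is an assumption to verify):

import espdl
from jpeg import Decoder

# The decoder's output format and the model's pixel_format must match
decoder = Decoder(pixel_format="RGB565")                  # assumption: RGB565 supported by mp_jpeg
detector = espdl.FaceDetector(pixel_format=espdl.RGB565)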

FaceDetector

The FaceDetector module detects faces in images and can optionally provide facial feature points.

Constructor

FaceDetector(width=320, height=240, pixel_format=espdl.RGB888, features=True)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888
  • features (bool, optional): Whether to return facial feature points. Default: True

Methods

  • run(framebuffer): Detects faces in the provided image.
    Parameters:
    • framebuffer: image data (required)
    Returns: List of dictionaries with detection results, each containing:
    • score: Detection confidence (float)
    • box: Bounding box coordinates [x1, y1, x2, y2]
    • features: Facial feature points [(x, y) coordinates for left eye, right eye, nose, left mouth corner, right mouth corner] if enabled, None otherwise

FaceRecognizer

The FaceRecognizer module manages a database of faces and can recognize previously enrolled faces.

Constructor

FaceRecognizer(width=320, height=240, pixel_format=espdl.RGB888, features=True, db_path="face.db", model=None)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888
  • features (bool, optional): Whether to return facial feature points. Default: True
  • db_path (str, optional): Path to the face database file. Default: "face.db"
  • model (str, optional): Feature extraction model to use ("MBF" or "MFN"). Default: None (uses the default model)

Methods

  • run(framebuffer): Detects and recognizes faces in the provided image.
    Parameters:
    • framebuffer: image data (required)
    Returns: List of dictionaries with recognition results, each containing:
    • score: Detection confidence
    • box: Bounding box coordinates [x1, y1, x2, y2]
    • features: Facial feature points (if enabled)
    • person: Recognition result containing:
      • id: Face ID
      • similarity: Match confidence (0-1)
      • name: Person name (if provided during enrollment)
  • enroll(framebuffer, validate=False, name=None): Enrolls a new face in the database.
    Parameters:
    • framebuffer: image data
    • validate (bool, optional): Check whether the face is already enrolled. Default: False
    • name (str, optional): Name to associate with the face. Default: None
    Returns:
    • ID of the enrolled face
  • delete_face(id): Deletes a face from the database.
    Parameters:
    • id (int): ID of the face to delete
  • print_database(): Prints the contents of the face database.

HumanDetector and CatDetector

The HumanDetector module detects people in images; the CatDetector does the same for cats. Both modules provide bounding boxes for detected objects.

Constructor

HumanDetector(width=320, height=240, pixel_format=espdl.RGB888)  # For cats, use CatDetector

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888

Methods

  • run(framebuffer): Detects people (or cats) in the provided image.
    Parameters:
    • framebuffer: image data
    Returns: List of dictionaries with detection results, each containing:
    • score: Detection confidence
    • box: Bounding box coordinates [x1, y1, x2, y2]
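
A short usage sketch in the same style as the examples further below (camera capture plus mp_jpeg decoding; CatDetector is a drop-in replacement):

import camera
from espdl import HumanDetector
from jpeg import Decoder

# Capture a frame, decode it to RGB888, and run person detection
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
detector = HumanDetector()  # use CatDetector() for cats

framebuffer = decoder.decode(cam.capture())
for person in detector.run(framebuffer) or []:
    print(f"Person detected, score {person['score']}, box {person['box']}")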

ImageNet

The ImageNet module classifies images into predefined categories.

Constructor

ImageNet(width=320, height=240, pixel_format=espdl.RGB888)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888

Methods

  • run(framebuffer): Classifies the provided image.
    Parameters:
    • framebuffer: image data
    Returns: List alternating between class names and confidence scores: [class1, score1, class2, score2, ...] (see the pairing sketch below)
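
Because the result alternates class names and scores, you can pair them up like this:

from espdl import ImageNet

imagenet = ImageNet()
results = imagenet.run(framebuffer)  # framebuffer: a decoded camera image, as above
if results:
    # [class1, score1, class2, score2, ...] -> (name, score) pairs
    for name, score in zip(results[::2], results[1::2]):
        print(f"{name}: {score}")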

COCODetector

The COCODetector module detects objects in images using the COCO dataset categories.

Constructor

COCODetector(width=320, height=240, pixel_format=espdl.RGB888, model=CONFIG_DEFAULT_COCO_DETECT_MODEL)

Parameters:

  • width (int, optional): Input image width. Default: 320
  • height (int, optional): Input image height. Default: 240
  • pixel_format (int, optional): Input image pixel format. Default: espdl.RGB888
  • model (int, optional): COCO detection model to use. Default: CONFIG_DEFAULT_COCO_DETECT_MODEL

Methods

  • run(framebuffer): Detects objects in the provided image.
    Parameters:
    • framebuffer: image data
    Returns: List of dictionaries with detection results, each containing:
    • score: Detection confidence
    • box: Bounding box coordinates [x1, y1, x2, y2]
    • category: Detected object class id
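
A quick sketch of using it; note that category is a numeric COCO class id, and mapping ids to label strings is left to your application:

from espdl import COCODetector

detector = COCODetector()
for obj in detector.run(framebuffer) or []:  # framebuffer: a decoded camera image
    print(f"class id {obj['category']}, score {obj['score']}, box {obj['box']}")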

Usage Examples

Face Detection Example

from espdl import FaceDetector
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
face_detector = FaceDetector()

# Capture and process image
img = cam.capture()
framebuffer = decoder.decode(img)  # Convert to RGB888
results = face_detector.run(framebuffer)

if results:
    for face in results:
        print(f"Face detected with confidence: {face['score']}")
        print(f"Bounding box: {face['box']}")
        if face['features']:
            print(f"Facial features: {face['features']}")

Face Recognition Example

from espdl import FaceRecognizer
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder(pixel_format="RGB888")
recognizer = FaceRecognizer(db_path="/faces.db")

# Enroll a face
img = cam.capture()
framebuffer = decoder.decode(img)
face_id = recognizer.enroll(framebuffer, name="John")
print(f"Enrolled face with ID: {face_id}")

# Later, recognize faces
img = cam.capture()
framebuffer = decoder.decode(img)
results = recognizer.run(framebuffer)
if results:
    for face in results:
        if face['person']:
            print(f"Recognized {face['person']['name']} (ID: {face['person']['id']})")
            print(f"Similarity: {face['person']['similarity']}")

Benchmark results

The following table shows the frames per second (fps) for different image sizes and models. The results are based on a test with a 2MP camera and an ESP32-S3.

Frame Size | FaceDetector (fps) | HumanDetector (fps)
QQVGA      | 14.5               | 6.6
R128x128   | 21                 | 6.6
QCIF       | 19.7               | 6.5
HQVGA      | 18                 | 6.3
R240X240   | 16.7               | 6.1
QVGA       | 15.2               | 6.6
CIF        | 13                 | 5.5
HVGA       | 11.9               | 5.3
VGA        | 8.2                | 4.4
SVGA       | 6.2                | 3.8
XGA        | 4.1                | 2.8
HD         | 3.6                | 2.6

Notes & Best Practices

  1. Image Format: Always ensure input images are in the right format. Use mp_jpeg to decode JPEG images from the camera.
  2. Memory Management:
    • Close/delete detector objects when no longer needed
    • Consider memory constraints when choosing image dimensions
  3. Face Recognition:
    • Enroll faces in good lighting conditions
    • Multiple enrollments of the same person can improve recognition
    • Use validate=True during enrollment to avoid duplicates
  4. Storage:
    • Face database is persistent across reboots
    • Consider backing up the face database file
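
One simple way to back up the database is a byte-for-byte copy on the device's filesystem. A sketch; the default path "face.db" matches the FaceRecognizer constructor above:

# Copy the face database to a backup file (works on the MicroPython filesystem)
def backup_db(src="face.db", dst="face.db.bak"):
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(512)
            if not chunk:
                break
            fout.write(chunk)

backup_db()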