flash-attention only supports CUDA on x86; it cannot be installed on Apple's MPS backend!!!
Retreat!
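Before attempting the flash-attention install, it can help to confirm which backend PyTorch actually sees on the machine. A minimal sketch, assuming only a working torch install (nothing from the Wan2.1 repo):

```python
import torch

# flash-attention requires an NVIDIA GPU (CUDA); MPS is Apple's Metal backend
# and is not supported by flash-attention.
if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
elif torch.backends.mps.is_available():
    print("MPS available, but flash-attention will not install here.")
else:
    print("CPU only; flash-attention is not usable.")
```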
The main steps are: download the code repository, install the dependencies, download the model, and run it.
- Download Wan2.1 from GitHub:
git clone https://github.com/Wan-Video/Wan2.1.git
- Enter the freshly cloned Wan2.1 directory and install the dependencies there:
pip install -r requirements.txt
Pay attention to the flash-attention installation here. To install through the Tsinghua mirror:
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
- Switch the Hugging Face endpoint to the hf-mirror mirror:
export HF_ENDPOINT=https://hf-mirror.com
Then download Wan-AI/Wan2.1-T2V-1.3B from hf-mirror (a Python alternative is sketched after this list):
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
- Go into the cloned Wan2.1 directory and run a test:
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
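As mentioned in the download step above, the same mirror download can also be driven from Python instead of the CLI. A minimal sketch, assuming the huggingface_hub package is installed and that HF_ENDPOINT is set before the library is imported:

```python
import os

# Point huggingface_hub at the hf-mirror endpoint before importing it.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import snapshot_download

# Download the 1.3B text-to-video checkpoints into ./Wan2.1-T2V-1.3B.
snapshot_download(
    repo_id="Wan-AI/Wan2.1-T2V-1.3B",
    local_dir="./Wan2.1-T2V-1.3B",
)
```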

The 1.3B model only does text-to-video, not image-to-video, and tops out at 480P. The 14B models support image-to-video at up to 720P.
The following is the official deployment information from Wan2.1.
Installation
Clone the repo:
git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
Install dependencies:
# Ensure torch >= 2.4.0
pip install -r requirements.txt
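Since the requirements assume torch >= 2.4.0, it is worth verifying the installed version before running anything. A small check, assuming only that torch is importable:

```python
import torch

# Wan2.1 expects torch >= 2.4.0; warn if the installed build is older.
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
if (major, minor) < (2, 4):
    print(f"torch {torch.__version__} is too old; upgrade to >= 2.4.0")
else:
    print(f"torch {torch.__version__} satisfies the >= 2.4.0 requirement")
```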
Model Download
| Models | Download Link | Notes |
| --- | --- | --- |
| T2V-14B | 🤗 Huggingface 🤖 ModelScope | Supports both 480P and 720P |
| I2V-14B-720P | 🤗 Huggingface 🤖 ModelScope | Supports 720P |
| I2V-14B-480P | 🤗 Huggingface 🤖 ModelScope | Supports 480P |
| T2V-1.3B | 🤗 Huggingface 🤖 ModelScope | Supports 480P |
💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.
Download models using 🤗 huggingface-cli:
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
Download models using 🤖 modelscope-cli:
pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-1.3B --local_dir ./Wan2.1-T2V-1.3B
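The ModelScope download can also be done from Python rather than the CLI. A minimal sketch, assuming the modelscope package is installed; note that cache_dir keeps ModelScope's own directory layout rather than the flat ./Wan2.1-T2V-1.3B layout the CLI's --local_dir produces:

```python
from modelscope import snapshot_download

# Downloads Wan-AI/Wan2.1-T2V-1.3B into the given cache directory and
# returns the local path of the snapshot.
model_dir = snapshot_download(
    "Wan-AI/Wan2.1-T2V-1.3B",
    cache_dir="./modelscope_cache",
)
print("Checkpoints downloaded to:", model_dir)
```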
Run Text-to-Video Generation
This repository supports two Text-to-Video models (1.3B and 14B) and two resolutions (480P and 720P). The parameters and configurations for these models are as follows:
| Task | 480P | 720P | Model |
| --- | --- | --- | --- |
| t2v-14B | ✔️ | ✔️ | Wan2.1-T2V-14B |
| t2v-1.3B | ✔️ | ❌ | Wan2.1-T2V-1.3B |
(1) Without Prompt Extension
To facilitate implementation, we will start with a basic version of the inference process that skips the prompt extension step.
Single-GPU inference:
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
If you encounter OOM (Out-of-Memory) issues, you can use the --offload_model True and --t5_cpu options to reduce GPU memory usage. For example, on an RTX 4090 GPU:
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
💡Note: If you are using the T2V-1.3B model, we recommend setting the parameter --sample_guide_scale 6. The --sample_shift parameter can be adjusted within the range of 8 to 12 based on the performance.
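One way to wire the OOM advice above into a repeatable run is to choose the low-memory flags based on how much GPU memory is present. A sketch, assuming a CUDA device and that roughly 24 GB (an RTX 4090-class card) is the cutoff below which --offload_model True and --t5_cpu are worth adding; the threshold itself is an assumption, the flags come from the note above:

```python
import subprocess
import torch

cmd = [
    "python", "generate.py",
    "--task", "t2v-1.3B",
    "--size", "832*480",
    "--ckpt_dir", "./Wan2.1-T2V-1.3B",
    "--sample_shift", "8",
    "--sample_guide_scale", "6",
    "--prompt", "Two anthropomorphic cats in comfy boxing gear and bright "
                "gloves fight intensely on a spotlighted stage.",
]

# Assumption: on cards with ~24 GB or less (e.g. an RTX 4090), enable the
# memory-saving flags recommended in the note above.
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
if total_gb <= 24:
    cmd += ["--offload_model", "True", "--t5_cpu"]

subprocess.run(cmd, check=True)
```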
Multi-GPU inference using FSDP + xDiT USP:
pip install "xfuser>=0.4.1"
torchrun --nproc_per_node=8 generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --dit_fsdp --t5_fsdp --ulysses_size 8 --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."