MLX CoT Training: LoRA SFT Fine-Tuning

Training example on GitHub: https://github.com/jbarnes850/deepseek-r1-finetune

Training data: https://hf-mirror.com/datasets/FreedomIntelligence/medical-o1-reasoning-SFT
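The dataset link goes through hf-mirror.com, a mirror of the Hugging Face Hub. One common way to make `datasets` download through such a mirror is the `HF_ENDPOINT` environment variable, which `huggingface_hub` reads when it is imported. A minimal sketch, assuming `datasets` is installed; `load_medical_dataset` is a hypothetical helper name:

```python
import os

# Point huggingface_hub (used by datasets) at the mirror. This must run
# before datasets/huggingface_hub is imported, because the endpoint is read
# at import time. setdefault keeps any value already exported in the shell.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")


def load_medical_dataset(config="en"):
    """Load medical-o1-reasoning-SFT through whatever HF_ENDPOINT is set."""
    from datasets import load_dataset  # imported lazily, after HF_ENDPOINT is set

    return load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", config)
```

Running `export HF_ENDPOINT=https://hf-mirror.com` in the shell before starting Python has the same effect.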

Original record format:
{ "Question": "Description of the patient's specific medical question", "Complex_CoT": "Detailed step-by-step reasoning", "Response": "Final answer" }
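Before formatting, a quick schema check catches records that lack one of the three fields. A minimal sketch using only the standard `json` module; `parse_record` and `REQUIRED_FIELDS` are hypothetical names, and treating every field as a required string is an assumption:

```python
import json

# The three fields every raw record is expected to carry (from the format above).
REQUIRED_FIELDS = ("Question", "Complex_CoT", "Response")


def parse_record(line):
    """Parse one raw JSON record and check that the expected fields are strings."""
    record = json.loads(line)
    missing = [f for f in REQUIRED_FIELDS if not isinstance(record.get(f), str)]
    if missing:
        raise ValueError(f"record is missing or has non-string fields: {missing}")
    return record


raw = '{"Question": "Q?", "Complex_CoT": "step 1; step 2", "Response": "A."}'
print(parse_record(raw)["Response"])  # prints A.
```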

Format after processing:

Please reason step by step:

Question: {the sample's Question field}

Let's solve this step by step:
{the sample's Complex_CoT field}

Final Answer: {the sample's Response field}
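The template above can be written as a plain Python function that turns one raw record into one training string. A minimal sketch; `PROMPT_TEMPLATE` and `format_sample` are hypothetical names, and the template text is copied from the format above:

```python
# Template copied from the processed format; field values are substituted verbatim.
PROMPT_TEMPLATE = (
    "Please reason step by step:\n\n"
    "Question: {question}\n\n"
    "Let's solve this step by step:\n"
    "{cot}\n\n"
    "Final Answer: {response}"
)


def format_sample(sample):
    """Render one raw record (Question / Complex_CoT / Response) as training text."""
    return PROMPT_TEMPLATE.format(
        question=sample["Question"],
        cot=sample["Complex_CoT"],
        response=sample["Response"],
    )


example = {"Question": "What is 2 + 2?", "Complex_CoT": "2 + 2 = 4.", "Response": "4"}
print(format_sample(example))
```

Because `str.format` does not re-parse substituted values, braces inside a question or reasoning text are passed through unchanged.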

Example:

{
  "Question": "A 45-year-old patient presents with sudden onset chest pain, shortness of breath, and anxiety. The pain is described as sharp and worsens with deep breathing. What is the most likely diagnosis and what immediate tests should be ordered?",
  "Complex_CoT": "The patient's symptoms suggest possible acute coronary syndrome, pulmonary embolism, or pneumothorax. Given the sharp chest pain worsened by deep breathing, pulmonary embolism is a strong consideration. Immediate tests should include ECG, troponin, D-dimer, and chest X-ray.",
  "Response": "The most likely diagnosis is pulmonary embolism. Immediate tests should include ECG, troponin, D-dimer, and chest X-ray."
}

After processing:
Please reason step by step:

Question: A 45-year-old patient presents with sudden onset chest pain, shortness of breath, and anxiety. The pain is described as sharp and worsens with deep breathing. What is the most likely diagnosis and what immediate tests should be ordered?

Let's solve this step by step:
The patient's symptoms suggest possible acute coronary syndrome, pulmonary embolism, or pneumothorax. Given the sharp chest pain worsened by deep breathing, pulmonary embolism is a strong consideration. Immediate tests should include ECG, troponin, D-dimer, and chest X-ray.

Final Answer: The most likely diagnosis is pulmonary embolism. Immediate tests should include ECG, troponin, D-dimer, and chest X-ray.

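The processed data is a Hugging Face `Dataset` object; after the formatting step it holds a single `text` column.

To export it for LoRA training, write it as JSONL with one `text` field per record, for example:

{ "text": "Please reason step by step:\n\nQuestion: {Question}\n\nLet's solve this step by step:\n{Complex_CoT}\n\nFinal Answer: {Response}" }

That is, one line of `text` per training sample.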
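The same JSONL layout can be produced with only the standard library. A minimal sketch; `write_jsonl` and `read_jsonl` are hypothetical helpers. `json.dumps` escapes the newlines inside each text as `\n`, which is what keeps every record on one physical line, and `ensure_ascii=False` writes Chinese characters as-is instead of `\uXXXX` escapes:

```python
import json
import os
import tempfile


def write_jsonl(texts, path):
    """Write one {"text": ...} object per line, UTF-8, non-ASCII kept as-is."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # json.dumps escapes embedded newlines, so each record stays on one line
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage: round-trip two samples through a temporary file
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(["Question: 1 + 1?\n\nFinal Answer: 2", "问题：示例"], path)
rows = read_jsonl(path)
```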

Related post (MLX data format): https://el.psy.congroo.com/wp-admin/post.php?post=983

Code for converting the SFT data above to JSONL (untested):

import os

from datasets import load_dataset


def prepare_dataset(tokenizer):
    """Prepare the medical reasoning dataset and export it to JSONL."""
    # Load the raw dataset (English config)
    dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en")

    # Subsample and split: 5% of the rows for training, 1% for testing
    dataset = dataset["train"].train_test_split(
        train_size=0.05,
        test_size=0.01,
        seed=42
    )

    # Render one raw record as a single training string
    def format_instruction(sample):
        return f"""Please reason step by step:

Question: {sample['Question']}

Let's solve this step by step:
{sample['Complex_CoT']}

Final Answer: {sample['Response']}"""

    # Create formatted text datasets that keep only a "text" column
    text_train = dataset["train"].map(
        lambda x: {"text": format_instruction(x)},
        remove_columns=dataset["train"].column_names,
        num_proc=os.cpu_count()
    )

    text_test = dataset["test"].map(
        lambda x: {"text": format_instruction(x)},
        remove_columns=dataset["test"].column_names,
        num_proc=os.cpu_count()
    )

    # Export to JSONL, one {"text": ...} object per line (key addition)
    text_train.to_json(
        "medical_train.jsonl",
        orient="records",
        lines=True,
        force_ascii=False  # keep non-ASCII characters (e.g. Chinese) unescaped
    )

    text_test.to_json(
        "medical_test.jsonl",
        orient="records",
        lines=True,
        force_ascii=False
    )

    # Tokenization (kept from the original pipeline; mlx_lm.lora tokenizes the
    # JSONL text itself, so this is only needed for a Transformers-style trainer)
    if tokenizer.pad_token is None:
        # padding="max_length" fails when the tokenizer has no pad token
        tokenizer.pad_token = tokenizer.eos_token
    train_dataset = text_train.map(
        lambda x: tokenizer(
            x["text"],
            truncation=True,
            padding="max_length",
            max_length=1024,
            return_tensors=None,
        ),
        remove_columns=["text"],
        num_proc=os.cpu_count()
    )

    print("\nJSONL files written:")
    print(f"- medical_train.jsonl ({len(text_train)} samples)")
    print(f"- medical_test.jsonl ({len(text_test)} samples)")

    return train_dataset
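To train on these files with mlx-lm's LoRA trainer (`mlx_lm.lora`), note that its `--data` option expects a directory containing `train.jsonl` and `valid.jsonl` (plus an optional `test.jsonl`), so `medical_train.jsonl` and `medical_test.jsonl` need those names. A stdlib-only sketch, assuming the exported files exist; `arrange_for_mlx` and `mlx_lora_command` are hypothetical helpers, reusing the test split as the validation split is an assumption, and flag names can vary between mlx-lm versions (check `python -m mlx_lm.lora --help`):

```python
import shutil
import sys
from pathlib import Path


def arrange_for_mlx(train_file, valid_file, data_dir="data"):
    """Copy exported JSONL files to the file names mlx_lm.lora reads from --data."""
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(train_file, out / "train.jsonl")
    # Assumption: the 1% test split doubles as the validation set
    shutil.copyfile(valid_file, out / "valid.jsonl")
    return out


def mlx_lora_command(model, data_dir, iters=1000, batch_size=4,
                     learning_rate=1e-5, adapter_path="adapters"):
    """Build (without running) an mlx_lm.lora LoRA training command."""
    return [
        sys.executable, "-m", "mlx_lm.lora",
        "--model", model,                 # local path or Hugging Face repo id
        "--train",                        # run training
        "--data", str(data_dir),          # directory with train.jsonl / valid.jsonl
        "--iters", str(iters),            # training iterations, not epochs
        "--batch-size", str(batch_size),
        "--learning-rate", str(learning_rate),
        "--adapter-path", adapter_path,   # where the LoRA adapter weights are saved
    ]
```

For example, `arrange_for_mlx("medical_train.jsonl", "medical_test.jsonl")` followed by `subprocess.run(mlx_lora_command("/path/to/pretrained/model", "data"), check=True)`. LoRA is the default fine-tuning type in recent mlx-lm versions, so no extra flag is set for it here.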

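MLX CLI training command: SFT with a supervised loss function

Note: `mlx-cli` is not a command provided by MLX or mlx-lm. The flags below (`--stage`, `--do_train`, `--model_name_or_path`, `--finetuning_type`, `--per_device_train_batch_size`) follow the argument names of LLaMA-Factory's `llamafactory-cli train`, and `--loss_function` does not appear to be an option of either tool.

# --stage sft                        fine-tuning stage: SFT (supervised fine-tuning)
# --do_train                         run training
# --model_name_or_path               path to the pretrained model
# --dataset                          name or path of the SFT dataset
# --finetuning_type lora             use LoRA fine-tuning
# --output_dir                       output directory
# --learning_rate 5e-5               learning rate
# --num_train_epochs 3               number of training epochs
# --per_device_train_batch_size 8    training batch size per device
# --loss_function cross_entropy      use a cross-entropy loss
mlx-cli train \
    --stage sft \
    --do_train \
    --model_name_or_path /path/to/pretrained/model \
    --dataset your_dataset_name \
    --finetuning_type lora \
    --output_dir ./output \
    --learning_rate 5e-5 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --loss_function cross_entropy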

~~In this command, the `--loss_function` parameter specifies the supervision function, which makes the training supervised.~~
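The struck-through claim does not hold because supervised fine-tuning already optimizes a token-level cross-entropy loss on the training text: the supervision comes from the data, not from a flag. A pure-Python sketch of that loss for one sequence, with an optional 0/1 mask for scoring only some positions (for example only the answer tokens); `sft_cross_entropy` and its list-of-logits inputs are hypothetical, not an MLX API:

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sft_cross_entropy(logits_per_step, targets, mask=None):
    """Mean negative log-likelihood of the target tokens.

    logits_per_step[t] holds the model's logits at position t,
    targets[t] is the id of the true next token, and mask[t] (0 or 1)
    decides whether position t counts toward the loss.
    """
    if mask is None:
        mask = [1] * len(targets)
    total, count = 0.0, 0
    for logits, target, keep in zip(logits_per_step, targets, mask):
        if keep:
            total -= log_softmax(logits)[target]
            count += 1
    return total / count if count else 0.0
```

With uniform logits over a vocabulary of 4, every target has probability 1/4, so the loss is ln 4; masking out a position simply removes it from the average.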

GPUs supported by Ollama (NVIDIA, by compute capability):

| Compute Capability | Family | Cards |
| --- | --- | --- |
| 9.0 | NVIDIA | `H100` |
| 8.9 | GeForce RTX 40xx | `RTX 4090` `RTX 4080 SUPER` `RTX 4080` `RTX 4070 Ti SUPER` `RTX 4070 Ti` `RTX 4070 SUPER` `RTX 4070` `RTX 4060 Ti` `RTX 4060` |
|  | NVIDIA Professional | `L4` `L40` `RTX 6000` |
| 8.6 | GeForce RTX 30xx | `RTX 3090 Ti` `RTX 3090` `RTX 3080 Ti` `RTX 3080` `RTX 3070 Ti` `RTX 3070` `RTX 3060 Ti` `RTX 3060` `RTX 3050 Ti` `RTX 3050` |
|  | NVIDIA Professional | `A40` `RTX A6000` `RTX A5000` `RTX A4000` `RTX A3000` `RTX A2000` `A10` `A16` `A2` |
| 8.0 | NVIDIA | `A100` `A30` |
| 7.5 | GeForce GTX/RTX | `GTX 1650 Ti` `TITAN RTX` `RTX 2080 Ti` `RTX 2080` `RTX 2070` `RTX 2060` |
|  | NVIDIA Professional | `T4` `RTX 5000` `RTX 4000` `RTX 3000` `T2000` `T1200` `T1000` `T600` `T500` |
|  | Quadro | `RTX 8000` `RTX 6000` `RTX 5000` `RTX 4000` |
| 7.0 | NVIDIA | `TITAN V` `V100` `Quadro GV100` |
| 6.1 | NVIDIA TITAN | `TITAN Xp` `TITAN X` |
|  | GeForce GTX | `GTX 1080 Ti` `GTX 1080` `GTX 1070 Ti` `GTX 1070` `GTX 1060` `GTX 1050 Ti` `GTX 1050` |
|  | Quadro | `P6000` `P5200` `P4200` `P3200` `P5000` `P4000` `P3000` `P2200` `P2000` `P1000` `P620` `P600` `P500` `P520` |
|  | Tesla | `P40` `P4` |
| 6.0 | NVIDIA | `Tesla P100` `Quadro GP100` |
| 5.2 | GeForce GTX | `GTX TITAN X` `GTX 980 Ti` `GTX 980` `GTX 970` `GTX 960` `GTX 950` |
|  | Quadro | `M6000 24GB` `M6000` `M5000` `M5500M` `M4000` `M2200` `M2000` `M620` |
|  | Tesla | `M60` `M40` |
| 5.0 | GeForce GTX | `GTX 750 Ti` `GTX 750` `NVS 810` |
|  | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |

EL PSY CONGROO

All of this is the choice of SteinsGate

Monthly archive: April 2025

