Fine-tuning DeepSeek-R1-Distill-Qwen-1.5B with LLaMA-Factory

琴生, 2025-07-14

Framework: LLaMA-Factory
Method: LoRA
Base model: DeepSeek-R1-Distill-Qwen-1.5B
GPU: single RTX 4090

Installing LLaMA-Factory

Clone the repository and enter it

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

cd LLaMA-Factory

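Configure conda to keep its package cache and virtual environments on the data disk (/root/autodl-tmp) instead of the system disk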

mkdir -p /root/autodl-tmp/conda/pkgs 
conda config --add pkgs_dirs /root/autodl-tmp/conda/pkgs 
mkdir -p /root/autodl-tmp/conda/envs 
conda config --add envs_dirs /root/autodl-tmp/conda/envs

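Create the conda virtual environment

conda create -n llama-factory python=3.10

Activate the environment

conda activate llama-factory

Install LLaMA-Factory and its dependencies inside the environment (an editable install with the torch and metrics extras)

pip install -e ".[torch,metrics]"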
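Before launching the web UI, it can be worth confirming that the freshly installed PyTorch actually sees the GPU. The following is a minimal sketch, not part of LLaMA-Factory: `gpu_report` is a hypothetical helper built only on standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_properties`, `torch.cuda.is_bf16_supported`), and it simply reports `torch_installed: False` if the install step did not bring in torch.

```python
import importlib.util


def gpu_report() -> dict:
    """Summarise what PyTorch can see (hypothetical helper, not a LLaMA-Factory API)."""
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "device_name": None,
        "total_memory_gb": None,
        "bf16_supported": None,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # torch is missing: the pip install step did not pull it in
    import torch

    report["torch_installed"] = True
    if not torch.cuda.is_available():
        return report  # e.g. a CPU-only torch build or a driver problem
    props = torch.cuda.get_device_properties(0)
    report.update(
        cuda_available=True,
        device_name=torch.cuda.get_device_name(0),
        total_memory_gb=round(props.total_memory / 1024**3, 1),
        bf16_supported=torch.cuda.is_bf16_supported(),
    )
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Run it inside the llama-factory environment. On a single RTX 4090 you would expect `cuda_available: True`, roughly 24 GB of memory and `bf16_supported: True`; `cuda_available: False` usually points to a CPU-only torch build or a driver problem.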

Launch LLaMA-Factory's web UI for fine-tuning (LLaMA Board)

llamafactory-cli webui

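Download the base model from Hugging Face

Create one directory to hold all base models (the same path that HF_HOME points to below)

mkdir -p /root/autodl-tmp/Hugging-Face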

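Point Hugging Face downloads at the hf-mirror.com mirror

export HF_ENDPOINT=https://hf-mirror.com

Change the default download location (both exports only apply to the current shell session)

export HF_HOME=/root/autodl-tmp/Hugging-Face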

Install the official Hugging Face download tool

pip install -U huggingface_hub

Run the download

huggingface-cli download --resume-download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

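Preparing the training dataset

Prepare the fine-tuning dataset java_method_name.json in the Alpaca format (instruction / input / output), for example: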

[
  {
    "instruction": "Please generate the Java method name according to the description below: get the role of this object",
    "input": "",
    "output": "get accessible role"
  },
  {
    "instruction": "Please generate the Java method name according to the description below: get the state of this object",
    "input": "",
    "output": "get accessible state set"
  }
]
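Formatting mistakes in the JSON file typically surface only when LLaMA-Factory loads the dataset at training time, so a quick check beforehand can save a round trip. The sketch below encodes an assumption about what a clean record looks like, based on the two examples above (all three keys present, string values, non-empty instruction and output); it is not LLaMA-Factory's own validation logic, and the function names are hypothetical.

```python
import json
import sys
from pathlib import Path

# Keys used by the example records above (Alpaca-style instruction data).
REQUIRED_KEYS = ("instruction", "input", "output")
# Keys that must also be non-empty; "input" may legitimately be "".
NON_EMPTY_KEYS = ("instruction", "output")


def validate_alpaca_records(records) -> list[str]:
    """Return human-readable problems; an empty list means every record looks well-formed."""
    if not isinstance(records, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if key not in rec:
                problems.append(f"record {i}: missing '{key}'")
            elif not isinstance(rec[key], str):
                problems.append(f"record {i}: '{key}' is not a string")
            elif key in NON_EMPTY_KEYS and not rec[key].strip():
                problems.append(f"record {i}: empty '{key}'")
    return problems


def validate_file(path: str) -> list[str]:
    """Load the dataset as UTF-8 (prompts may contain non-ASCII text) and check it."""
    with Path(path).open(encoding="utf-8") as f:
        return validate_alpaca_records(json.load(f))


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = validate_file(sys.argv[1])
    print("\n".join(issues) if issues else f"{sys.argv[1]}: OK")
```

For example, `python check_dataset.py data/java_method_name.json` (the script name is hypothetical) prints `OK` or one line per problem.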

Register the dataset by adding the following entry to data/dataset_info.json:


"java_method_name": {
  "file_name": "java_method_name.json"
},

Place java_method_name.json in LLaMA-Factory's data directory, next to dataset_info.json

Set the training parameters and start fine-tuning. The main parameters are:

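Learning Rate: how far the weights move on each update. Too large and training can overshoot the optimum; too small and it learns slowly or gets stuck in a local optimum.
Epochs: how many full passes are made over the training set. Too few underfits (the model has not learned enough); too many overfits (it has memorized the training data).
Max Gradient Norm: when the overall L2 norm of the gradients exceeds this value, the gradients are scaled down to it (gradient clipping), which prevents exploding gradients.
Max Samples: the maximum number of examples taken from each dataset; examples beyond this limit are not used.
Computation Type: the numeric precision used during training, commonly fp32, fp16 or bf16 (the RTX 4090 supports bf16). Lower precision uses less memory and runs faster at some cost in numerical precision, so this is a trade-off between speed and accuracy.
Truncation Length: the maximum sequence length in tokens; anything beyond it is truncated, which bounds GPU memory use and helps avoid out-of-memory errors.
Batch Size: because GPU memory is limited, the training data is fed to the model in batches; the batch size is the number of samples in each batch.
Gradient Accumulation: by default the model updates its parameters once after every batch. With gradient accumulation, gradients from several consecutive batches are accumulated and a single update is made afterwards, so the effective batch size becomes batch size × accumulation steps.
Validation Set Proportion: the fraction of the dataset held out as a validation set. The training set is used for learning; the validation set is used to check how well the model has learned.
Learning Rate Scheduler: adjusts the learning rate automatically over the course of training, for example with linear or cosine decay.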
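Several of these settings combine into simple arithmetic worth checking before starting a run. The sketch below is plain Python with hypothetical numbers, not LLaMA-Factory code: the effective batch size is batch size × gradient accumulation (× GPUs), the validation proportion shrinks the training set, the cosine curve has the same shape as transformers' `get_cosine_schedule_with_warmup`, and gradient clipping rescales all gradients by one common factor when their global L2 norm exceeds the max gradient norm, as `torch.nn.utils.clip_grad_norm_` does. Step counts are approximations; the trainer's exact rounding may differ slightly.

```python
import math

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone (no activations, gradients or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Samples that contribute to one optimizer update."""
    return per_device * grad_accum * num_gpus


def split_counts(num_samples: int, val_ratio: float) -> tuple[int, int]:
    """Approximate (train, validation) sizes for a given validation proportion."""
    n_val = round(num_samples * val_ratio)
    return num_samples - n_val, n_val


def optimizer_steps(n_train: int, eff_batch: int, epochs: float) -> int:
    """Approximate total number of optimizer updates over all epochs."""
    return math.ceil(math.ceil(n_train / eff_batch) * epochs)


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def clip_by_global_norm(grads: list[float], max_norm: float) -> list[float]:
    """Scale all gradients together if their combined L2 norm exceeds max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads]


if __name__ == "__main__":
    # Hypothetical settings for illustration only, not the values used in this post.
    n_total, val_ratio = 10_000, 0.1
    batch, accum, epochs, peak = 4, 8, 3, 5e-5
    n_train, n_val = split_counts(n_total, val_ratio)
    eff = effective_batch_size(batch, accum)
    steps = optimizer_steps(n_train, eff, epochs)
    print(f"train={n_train} val={n_val} effective_batch={eff} total_steps={steps}")
    for s in (0, steps // 4, steps // 2, steps):
        print(f"step {s:>4}: lr={cosine_lr(s, steps, peak):.2e}")
    print("clipped:", clip_by_global_norm([3.0, 4.0], max_norm=1.0))
    # Nominal 1.5B parameters taken from the model name, weights only.
    print(f"fp32 weights ~{weight_memory_gb(1.5e9, 'fp32'):.1f} GB, bf16 ~{weight_memory_gb(1.5e9, 'bf16'):.1f} GB")
```

With these made-up numbers, raising gradient accumulation from 8 to 16 halves the number of optimizer steps while keeping per-step GPU memory the same, which is why it is the usual lever on a single 24 GB card.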