Running the Fun-ASR-Nano-2512 Model Locally
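Model overview
Fun-ASR-Nano-2512 is a large end-to-end speech recognition model trained on tens of millions of hours of real-world speech data. It supports low-latency real-time transcription and covers 31 languages. Its core capabilities:
- Accurate recognition of far-field speech in high-noise environments.
- Support for 7 Chinese dialects and 26 regional accents.
- Support for 31 languages, with free switching between languages and mixed-language recognition.
- Lyrics recognition over background music.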
Environment setup
Base environment:
- Ubuntu 24.04.2 LTS
- NVIDIA-SMI 550.120
- CUDA Version: 12.4
- Python 3.12.9
- ffmpeg
Dependency versions:
- transformers==4.57.3
- modelscope==1.33.0
- torch==2.9.1
- torchaudio==2.9.1
- torchcodec==0.9.1
- funasr==1.2.9
Steps
Install the dependencies:
pip install transformers==4.57.3 modelscope==1.33.0 torch==2.9.1 torchaudio==2.9.1 torchcodec==0.9.1 funasr==1.2.9
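After the install finishes, it can help to confirm that the environment actually holds the pinned versions. The following is a minimal sketch using only the standard library. The helper name `find_mismatches` is hypothetical, and stripping a local build tag such as `+cu124` before comparing is an assumption about how your torch wheel reports its version.

```python
from importlib import metadata

# Pinned versions copied from the dependency list in this post.
PINNED = {
    "transformers": "4.57.3",
    "modelscope": "1.33.0",
    "torch": "2.9.1",
    "torchaudio": "2.9.1",
    "torchcodec": "0.9.1",
    "funasr": "1.2.9",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build tags such as "+cu124" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

The script prints nothing when every package matches its pin.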
Download the remote code file (model.py):
wget https://raw.githubusercontent.com/FunAudioLLM/Fun-ASR/main/model.py
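If `wget` is not available, the same download can be done from Python. This is a sketch rather than part of the original setup. `fetch_if_missing` is a hypothetical helper that skips the download when the file already exists, and the `download` parameter exists only so the function can be exercised without network access.

```python
import os
import urllib.request

# Same URL as the wget command above.
MODEL_PY_URL = "https://raw.githubusercontent.com/FunAudioLLM/Fun-ASR/main/model.py"


def fetch_if_missing(url, dest, download=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists.

    Returns True if a download was attempted, False if dest was already there.
    """
    if os.path.exists(dest):
        return False
    download(url, dest)
    return True


# Usage (needs network access):
#   fetch_if_missing(MODEL_PY_URL, "model.py")
```

Saving the file as `model.py` next to `run.py` matters because the script refers to it by the relative path `./model.py`.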
Write the inference script
run.py:
from funasr import AutoModel


def main():
    model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"
    wav_path = "别来无恙.m4a"
    # Load the ASR model plus an FSMN VAD front end that splits long audio.
    model = AutoModel(
        model=model_dir,
        trust_remote_code=True,
        vad_model="fsmn-vad",
        vad_kwargs={"max_single_segment_time": 30000},  # milliseconds
        remote_code="./model.py",  # file downloaded in the previous step
        device="cuda:0",
    )
    # generate() returns one result dict per input file.
    res = model.generate(input=[wav_path], cache={}, batch_size=1)
    text = res[0]["text"]
    print(text)


if __name__ == "__main__":
    main()
Upload an audio file, such as 别来无恙.m4a, then run the script:
python run.py
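In the run shown in this post the .m4a file was read directly, so no conversion was needed. If a file fails to decode on your machine, one option is to convert it to WAV first with the ffmpeg binary listed in the base environment. The sketch below is an assumption, not a step from the original setup: the helper names are hypothetical, and the 16 kHz mono target is a guess based on the "16k" in the VAD model name that appears in the log.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts src to mono audio at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file
        "-ac", "1",                # one audio channel (mono)
        "-ar", str(sample_rate),   # output sample rate in Hz
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_cmd(src, dst, sample_rate), check=True)
    return dst
```

For example, `convert_to_wav("别来无恙.m4a", "input.wav")` would write `input.wav`, which can then be assigned to `wav_path` in run.py.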
Output:
funasr version: 1.2.9.
Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
You are using the latest version of funasr-1.2.9
Downloading Model from https://www.modelscope.cn to directory: /home/andy/.cache/modelscope/hub/models/FunAudioLLM/Fun-ASR-Nano-2512
2025-12-29 11:43:38,250 - modelscope - INFO - Got 1 files, start to download ...
Downloading [config.yaml]: 100%|████████████████████████████████████████████████| 3.07k/3.07k [00:00<00:00, 11.6kB/s]
Processing 1 items: 100%|█████████████████████████████████████████████████████████| 1.00/1.00 [00:00<00:00, 3.66it/s]
2025-12-29 11:43:38,524 - modelscope - INFO - Download model 'FunAudioLLM/Fun-ASR-Nano-2512' successfully.
WARNING:root:trust_remote_code: True
Loading remote code successfully: ./model.py
Downloading Model from https://www.modelscope.cn to directory: /home/andy/.cache/modelscope/hub/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch
WARNING:root:trust_remote_code: False
rtf_avg: 0.011: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.54it/s]
0%| | 0/3 [00:00<?, ?it/s]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
{'load_data': '0.000', 'extract_feat': '0.002', 'forward': '0.340', 'batch_size': '1', 'rtf': '0.113'}, : 33%|▎| 1/3The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
{'load_data': '0.000', 'extract_feat': '0.002', 'forward': '0.202', 'batch_size': '1', 'rtf': '0.064'}, : 67%|▋| 2/3The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
rtf_avg: 0.062: 100%|██████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.35it/s]
rtf_avg: 0.061, time_speech: 36.928, time_escape: 2.235: 100%|████████████████████████| 1/1 [00:02<00:00, 2.39s/it]
千辛万苦,忘不了你的模样,忘不了的遍体鳞伤,成为我的力量,忘不了眼神里的光,常在我心上,当我又回头张望,提醒我坚强,用真遗忘他。 的人常在我。 心伤。
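The final progress line of the log reports `rtf_avg: 0.061, time_speech: 36.928, time_escape: 2.235`. Reading `time_speech` as the audio duration in seconds and `time_escape` as the elapsed processing time (an interpretation that the numbers are consistent with), the real-time factor is simply their ratio. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Values copied from the final progress line of the log above.
time_speech = 36.928  # seconds of audio
time_escape = 2.235   # seconds spent processing

rtf = real_time_factor(time_escape, time_speech)
print(f"rtf = {rtf:.3f}")  # rtf = 0.061
```

An RTF of about 0.06 means the roughly 37-second clip was transcribed in a little over 2 seconds on this GPU.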
Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. When reposting, please credit 后端学习手记.











