whisper-api
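Introduction
Wraps OpenAI's open-source Whisper speech recognition model in an API compatible with the OpenAI (ChatGPT) interface.
Software architecture
Built on the open-source libraries uvicorn, FastAPI and openai-whisper to provide a high-performance API.
More details: https://blog.csdn.net/weixin_40986713/article/details/138712293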
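Usage
- Download the code
- Install ffmpeg: https://ffmpeg.org/download.html
- Install the dependencies by running this command in the project root
pip install -r requirements.txt
- Start the service by running this command in the project root
python main.py
Once started, the service listens on http://0.0.0.0:3003, which is the connection address. 0.0.0.0 binds all network interfaces, so clients connect through http://localhost:3003 or the host's IP address.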
Startup code (main.py)
import json
import os
import tempfile
import time

import uvicorn
from fastapi import FastAPI, UploadFile, File, Security, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

from whisper_script import WhisperHandler

app = FastAPI()
security = HTTPBearer()
# Read the token at import time. uvicorn loads "main:app" as a separate module,
# so an override made only inside the __main__ block would never reach the app.
env_bearer_token = os.getenv("ACCESS_TOKEN", "sk-tarzan")
model_size = os.getenv("MODEL_SIZE", "base")
language = os.getenv("LANGUAGE", "Chinese")


def cleanup_temp_file(path):
    if path and os.path.exists(path):
        os.remove(path)


with open('options.json', 'r') as options:
    # Read and parse the file contents with json.load()
    load_options = json.load(options)


# Speech recognition
@app.post("/v1/audio/transcriptions")
async def transcribe(file: UploadFile = File(...),
                     credentials: HTTPAuthorizationCredentials = Security(security)):
    if env_bearer_token is not None and credentials.credentials != env_bearer_token:
        raise HTTPException(status_code=401, detail="Invalid token")
    file_bytes = await file.read()
    return {"text": audio_to_text(file_bytes, 'transcribe')}


# Speech translation
@app.post("/v1/audio/translations")
async def translate(file: UploadFile = File(...),
                    credentials: HTTPAuthorizationCredentials = Security(security)):
    if env_bearer_token is not None and credentials.credentials != env_bearer_token:
        raise HTTPException(status_code=401, detail="Invalid token")
    file_bytes = await file.read()
    return {"text": audio_to_text(file_bytes, 'translate')}


def audio_to_text(file_bytes, task):
    start_time = time.time()
    max_file_size = 500 * 1024 * 1024
    if len(file_bytes) > max_file_size:
        raise HTTPException(status_code=400, detail="File is too large")
    temp_path = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_audio:
            temp_audio.write(file_bytes)
            temp_path = temp_audio.name
        model_size = load_options.get("model_size")
        language = load_options.get("language")
        prompts = {
            "verbose": load_options.get("verbose"),
            "temperature": load_options.get("temperature"),
            "compression_ratio_threshold": load_options.get("compression_ratio_threshold"),
            "logprob_threshold": load_options.get("logprob_threshold"),
            "no_speech_threshold": load_options.get("no_speech_threshold"),
            "condition_on_previous_text": load_options.get("condition_on_previous_text"),
            "initial_prompt": load_options.get("initial_prompt"),
            "word_timestamps": load_options.get("word_timestamps"),
            "prepend_punctuations": load_options.get("prepend_punctuations"),
            "append_punctuations": load_options.get("append_punctuations")
        }
        print('temp_path', temp_path)
        handler = WhisperHandler(temp_path, model_size=model_size, language=language,
                                 task=task, prompt=prompts)
        result = handler.transcribe()
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    finally:
        # Delete the temp file right away instead of deferring it to process exit,
        # so files do not pile up while the server keeps running.
        cleanup_temp_file(temp_path)
    end_time = time.time()
    print(f"audio to text took {end_time - start_time:.2f} seconds")
    return result['text']


if __name__ == "__main__":
    try:
        uvicorn.run("main:app", reload=True, host="0.0.0.0", port=3003)
    except Exception as e:
        print(f"API failed to start!\nError:\n{e}")
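The temporary-file handling in audio_to_text can also be packaged as a context manager, so the file is removed as soon as the work on it finishes. The following is only a sketch of that pattern; temp_audio_file is a hypothetical helper name and is not part of the project.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_audio_file(file_bytes, suffix=".wav"):
    """Write bytes to a named temp file, yield its path, then delete the file."""
    temp_path = None
    try:
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as temp_audio:
            temp_path = temp_audio.name  # remember the path before writing
            temp_audio.write(file_bytes)
        # The file is closed here, so other code can open it by path.
        yield temp_path
    finally:
        if temp_path and os.path.exists(temp_path):
            os.remove(temp_path)


if __name__ == "__main__":
    with temp_audio_file(b"fake audio bytes") as path:
        print("exists inside the block:", os.path.exists(path))
    print("exists after the block:", os.path.exists(path))
```

Inside audio_to_text, the handler call would then sit in the body of the with block, and the finally clause would no longer need its own cleanup step.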
Source repository
Project repository: https://gitee.com/taisan/whisper-api
Docker
- Build the image
docker build -t whisper .
- Start the container with docker run
GPU mode
docker run -itd --name whisper-api -p 3003:3003 --gpus all --restart=always whisper
- Default: ACCESS_TOKEN=sk-tarzan
CPU mode
docker run -itd --name whisper-api -p 3003:3003 --restart=always whisper
- Default: ACCESS_TOKEN=sk-tarzan
Authentication mode
docker run -itd --name whisper-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --gpus all --restart=always whisper
docker run -itd --name whisper-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --restart=always whisper
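- Replace yourtoken with the authentication token you want to use, and pass it in the request header when calling the API. With the default token the header is:
Authorization: Bearer sk-tarzan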
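To illustrate how the header is used, here is a minimal client sketch built only on the Python standard library. It assumes the service is reachable at http://localhost:3003 and uses the default token; build_transcription_request and transcribe are hypothetical helper names. The form field is named file to match the file parameter of the endpoint, and the response is expected to be a JSON object with a text key.

```python
import json
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename="audio.wav",
                                base_url="http://localhost:3003",
                                token="sk-tarzan"):
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, **kwargs):
    """Send the audio file at `path` and return the recognized text."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Calling transcribe("sample.wav", token="yourtoken") against a running container would return the recognized text; swapping the path for /v1/audio/translations would target the translation endpoint instead.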
Viewing Docker logs
docker logs -f [container ID or container name]
Configuration file
options.json
{ "model_size": "base", "language": "Chinese" }
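With only model_size and language present, every other option that main.py reads through load_options.get(...) comes back as None. The sketch below assumes you would rather drop unset options, so the library defaults apply, than forward None values. build_decode_options and OPTION_KEYS are hypothetical names, and whether WhisperHandler already filters the options this way is not shown in this document.

```python
import json

# Keys that main.py forwards to the handler as decoding options.
OPTION_KEYS = (
    "verbose", "temperature", "compression_ratio_threshold",
    "logprob_threshold", "no_speech_threshold", "condition_on_previous_text",
    "initial_prompt", "word_timestamps", "prepend_punctuations",
    "append_punctuations",
)


def build_decode_options(load_options):
    """Return only the decoding options that are actually set in options.json."""
    return {key: load_options[key] for key in OPTION_KEYS
            if load_options.get(key) is not None}


if __name__ == "__main__":
    options = json.loads(
        '{"model_size": "base", "language": "Chinese", "temperature": 0}'
    )
    print(build_decode_options(options))  # {'temperature': 0}
```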
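- The service can be combined with one-api and plugged into open-source RAG projects such as FastGPT. See the following tutorial:
"Integrating a local Whisper model into FastGPT for voice input" (《Fastgpt接入Whisper本地模型实现语音输入》)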