whisper-api: a high-performance open-source speech recognition and speech translation project compatible with the OpenAI API protocol

Published: 2024-07-19 01:59

whisper-api

Introduction

This project wraps Whisper, OpenAI's open-source speech recognition model, in an interface compatible with the OpenAI ChatGPT API.

Software Architecture

The high-performance API is built on open-source libraries including uvicorn, fastapi, and openai-whisper.

Usage

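1. Download the code.
2. Install ffmpeg: https://ffmpeg.org/download.html
3. Install the dependencies by running pip install -r requirements.txt in the project root.
4. Start the service by running python main.py in the project root.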

Once started, the service listens on http://0.0.0.0:3003, which is the connection address.
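As an illustration of calling the service, the sketch below builds a multipart/form-data request for /v1/audio/transcriptions using only the Python standard library. The form field name file, the default token sk-tarzan, and the {"text": ...} response shape come from the startup code in this article. The helper names, the 127.0.0.1 host (0.0.0.0 is a bind address, so a local client would normally connect to 127.0.0.1), and the file name sample.wav are assumptions made for the example.

```python
import io
import json
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename="sample.wav",
                                base_url="http://127.0.0.1:3003",
                                token="sk-tarzan",
                                endpoint="/v1/audio/transcriptions"):
    """Build (but do not send) a multipart/form-data POST for the API."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # The server reads the upload from a form field named "file".
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        base_url + endpoint,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe_file(path, **kwargs):
    """Send an audio file to the service and return the recognized text."""
    with open(path, "rb") as audio:
        request = build_transcription_request(audio.read(), **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Against a running instance, transcribe_file("sample.wav") should return the recognized text, and passing endpoint="/v1/audio/translations" targets the translation route instead.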

Startup Code

import json
import os
import tempfile
import time

import uvicorn
from fastapi import FastAPI, UploadFile, File, Security, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

from whisper_script import WhisperHandler

app = FastAPI()
security = HTTPBearer()
# Read the token at import time: uvicorn loads "main:app" as its own module,
# so a value assigned only under __main__ would not reach the running app.
env_bearer_token = os.getenv("ACCESS_TOKEN", "sk-tarzan")
model_size = os.getenv("MODEL_SIZE", "base")
language = os.getenv("LANGUAGE", "Chinese")


def cleanup_temp_file(path):
    # path is None when the temporary file was never created
    if path and os.path.exists(path):
        os.remove(path)


with open('options.json', 'r') as options:
    # Read and parse the file contents with json.load()
    load_options = json.load(options)


# Speech recognition
@app.post("/v1/audio/transcriptions")
async def transcribe(file: UploadFile = File(...), credentials: HTTPAuthorizationCredentials = Security(security)):
    if env_bearer_token is not None and credentials.credentials != env_bearer_token:
        raise HTTPException(status_code=401, detail="Invalid token")
    file_bytes = await file.read()
    return {"text": audio_to_text(file_bytes, 'transcribe')}


# Speech translation
@app.post("/v1/audio/translations")
async def translate(file: UploadFile = File(...), credentials: HTTPAuthorizationCredentials = Security(security)):
    if env_bearer_token is not None and credentials.credentials != env_bearer_token:
        raise HTTPException(status_code=401, detail="Invalid token")
    file_bytes = await file.read()
    return {"text": audio_to_text(file_bytes, 'translate')}


def audio_to_text(file_bytes, task):
    start_time = time.time()
    max_file_size = 500 * 1024 * 1024
    if len(file_bytes) > max_file_size:
        raise HTTPException(status_code=400, detail="File is too large")
    temp_path = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_audio:
            temp_audio.write(file_bytes)
            temp_path = temp_audio.name
        model_size = load_options.get("model_size")
        language = load_options.get("language")
        # Whisper transcription options read from options.json (None if absent)
        prompts = {
            "verbose": load_options.get("verbose"),
            "temperature": load_options.get("temperature"),
            "compression_ratio_threshold": load_options.get("compression_ratio_threshold"),
            "logprob_threshold": load_options.get("logprob_threshold"),
            "no_speech_threshold": load_options.get("no_speech_threshold"),
            "condition_on_previous_text": load_options.get("condition_on_previous_text"),
            "initial_prompt": load_options.get("initial_prompt"),
            "word_timestamps": load_options.get("word_timestamps"),
            "prepend_punctuations": load_options.get("prepend_punctuations"),
            "append_punctuations": load_options.get("append_punctuations")
        }
        print('temp_path', temp_path)
        handler = WhisperHandler(temp_path, model_size=model_size, language=language, task=task, prompt=prompts)
        result = handler.transcribe()
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    finally:
        # Remove the temporary file once transcription has finished
        cleanup_temp_file(temp_path)
    end_time = time.time()
    print(f"audio to text took {end_time - start_time:.2f} seconds")
    return result['text']


if __name__ == "__main__":
    try:
        uvicorn.run("main:app", reload=True, host="0.0.0.0", port=3003)
    except Exception as e:
        print(f"API failed to start!\nError:\n{e}")

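The startup code imports WhisperHandler from whisper_script, which is not reproduced in this article. The sketch below is a hypothetical stand-in showing one way such a handler could forward its arguments to the openai-whisper library through whisper.load_model and model.transcribe. The class name WhisperHandlerSketch and the helper drop_unset are assumptions, not the project's code. Because load_options.get(...) returns None for keys missing from options.json, the sketch drops None values so they do not override the library's own defaults.

```python
def drop_unset(options):
    """Return a copy of options without keys whose value is None."""
    return {key: value for key, value in options.items() if value is not None}


class WhisperHandlerSketch:
    """Hypothetical stand-in for whisper_script.WhisperHandler."""

    def __init__(self, audio_path, model_size="base", language=None,
                 task="transcribe", prompt=None):
        self.audio_path = audio_path
        # Fall back to "base" when options.json has no model_size entry.
        self.model_size = model_size or "base"
        self.language = language
        self.task = task
        self.options = drop_unset(prompt or {})

    def transcribe(self):
        # Imported lazily so the option handling above can be exercised
        # without openai-whisper installed.
        import whisper

        model = whisper.load_model(self.model_size)
        # language and task are passed through as decoding options;
        # the result is a dict that includes a "text" entry.
        return model.transcribe(
            self.audio_path,
            language=self.language,
            task=self.task,
            **self.options,
        )
```

Loading the model on every call keeps the sketch short; a long-running service would more likely cache the loaded model between requests.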
Source Repository

Project repository: https://gitee.com/taisan/whisper-api

docker

1. Build the docker image

docker build -t whisper .

2. Start the container with docker

GPU mode

docker run -itd --name whisper-api -p 3003:3003 --gpus all --restart=always whisper

Default: ACCESS_TOKEN=sk-tarzan

CPU mode

docker run -itd --name whisper-api -p 3003:3003 --restart=always whisper

Default: ACCESS_TOKEN=sk-tarzan

Authentication mode

docker run -itd --name whisper-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --gpus all --restart=always whisper
docker run -itd --name whisper-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --restart=always whisper

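Replace yourtoken with the authentication token you want to use. API calls pass the token in the request header as Authorization: Bearer followed by the token (Authorization: Bearer sk-tarzan with the default).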
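To make the header format concrete, here is a minimal sketch of how an Authorization: Bearer header value can be checked against the configured token. The project itself relies on FastAPI's HTTPBearer and a plain != comparison; the function name is_authorized and the constant-time comparison via hmac.compare_digest are assumptions added for illustration.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True if the header value is 'Bearer <token>' with the expected token."""
    if not authorization_header:
        return False
    # Split "Bearer sk-tarzan" into the scheme and the token itself.
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

For example, is_authorized("Bearer sk-tarzan", "sk-tarzan") accepts the default token, while a missing header or a different scheme is rejected.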

View the docker logs

docker logs -f [container ID or container name]

Configuration File

options.json

{
  "model_size": "base",
  "language": "Chinese"
}
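The startup code reads options.json with json.load and looks up each setting with .get(), so any key missing from the file comes back as None. The sketch below shows one possible way to merge the file with defaults instead. The default values are taken from the sample file above; the function name load_options_with_defaults and the fallback to defaults when the file is missing are assumptions, not the project's behavior.

```python
import json

# Defaults mirror the sample options.json shown above.
DEFAULT_OPTIONS = {"model_size": "base", "language": "Chinese"}


def load_options_with_defaults(path="options.json"):
    """Read options.json and fill in defaults for any missing keys."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            loaded = json.load(handle)
    except FileNotFoundError:
        # Assumption: a missing file means "use the defaults".
        loaded = {}
    # Values from the file take precedence over the defaults.
    return {**DEFAULT_OPTIONS, **loaded}
```

Extra keys such as temperature or initial_prompt pass through unchanged, so the same dictionary can still feed the per-request options in the startup code.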

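The service can be combined with one-api and connected to open-source RAG projects such as FastGPT. A tutorial is available here:
"Connecting FastGPT to a local Whisper model for voice input" (《Fastgpt接入Whisper本地模型实现语音输入》)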
