[Artificial Intelligence] Pipeline in Transformers (Overview): A Minimal Way to Use 300,000+ Large Models

Published: 2024-07-13 23:11

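1. Introduction

2. The pipeline library

2.1 Overview

2.2 Instantiating a pipeline object from a task

2.2.1 Instantiating automatic speech recognition from a task

2.2.2 Task list

2.2.3 Default models for tasks

2.3 Instantiating a pipeline object from a model

2.3.1 Instantiating automatic speech recognition from a model

2.3.2 Looking up which task a model belongs to

3. Summary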

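1. Introduction

pipeline is an abstraction in the Hugging Face transformers library that offers a minimal way to run inference with large models. It groups models into four broad categories, Audio, Computer Vision, Natural Language Processing (NLP), and Multimodal, which are further divided into 28 tasks. In total it covers about 320,000 models.

This article gives an overall introduction to pipeline. Later articles in this column will each take one task as their topic and explain how to use it.

2. The pipeline library

2.1 Overview

A pipeline is a simple and convenient way to use a model for inference. Pipelines are objects that abstract away most of the library's complex code and expose a simple API dedicated to a number of tasks, including named entity recognition, masked language modeling, sentiment analysis, feature extraction, and question answering. There are two main ways to use them:

Instantiate a pipeline object from a task
Instantiate a pipeline object from a model

2.2 Instantiating a pipeline object from a task

2.2.1 Instantiating automatic speech recognition from a task

The task name for automatic speech recognition is automatic-speech-recognition: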

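import os

# Set the environment variables before importing transformers so they take effect.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # download through a mirror of the Hugging Face Hub
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # expose only GPU 2 to this process

from transformers import pipeline

speech_file = "./output_video_enhanced.mp3"
# No model is given, so the default model for the task is used.
pipe = pipeline(task="automatic-speech-recognition")
result = pipe(speech_file)
print(result)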
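
The automatic speech recognition pipeline returns a dict with a "text" key for a single input and a list of such dicts for a list of inputs. The sketch below pulls out just the transcription. The helper names extract_text and transcribe are hypothetical and only for illustration. transformers is imported lazily inside transcribe, so the first helper can be tried without any model download. Reading an audio file from a path also typically requires ffmpeg to be installed.

```python
def extract_text(result):
    """Return the transcription string(s) from an ASR pipeline result."""
    if isinstance(result, dict):
        # Single input: {"text": "..."}
        return result["text"]
    # List of inputs: [{"text": "..."}, ...]
    return [item["text"] for item in result]


def transcribe(audio_paths, **pipeline_kwargs):
    """Run the ASR pipeline on one path or a list of paths (hypothetical helper)."""
    # Imported lazily: needs transformers, a backend such as PyTorch, and network access.
    from transformers import pipeline

    pipe = pipeline(task="automatic-speech-recognition", **pipeline_kwargs)
    return extract_text(pipe(audio_paths))


if __name__ == "__main__":
    # Works offline: only the result-handling helper is exercised here.
    print(extract_text({"text": "hello world"}))
```

Calling transcribe("./output_video_enhanced.mp3") would then return only the recognized text instead of the full result dict.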

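2.2.2 Task list

There are 28 tasks in total, listed below in roughly alphabetical order ("translation_xx_to_yy" is the language-pair form of "translation", which is why the list has 29 entries). To use one, simply replace the task passed to pipeline in the code from 2.2.1: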

"audio-classification": returns an AudioClassificationPipeline.
"automatic-speech-recognition": returns an AutomaticSpeechRecognitionPipeline.
"depth-estimation": returns a DepthEstimationPipeline.
"document-question-answering": returns a DocumentQuestionAnsweringPipeline.
"feature-extraction": returns a FeatureExtractionPipeline.
"fill-mask": returns a FillMaskPipeline.
"image-classification": returns an ImageClassificationPipeline.
"image-feature-extraction": returns an ImageFeatureExtractionPipeline.
"image-segmentation": returns an ImageSegmentationPipeline.
"image-to-image": returns an ImageToImagePipeline.
"image-to-text": returns an ImageToTextPipeline.
"mask-generation": returns a MaskGenerationPipeline.
"object-detection": returns an ObjectDetectionPipeline.
"question-answering": returns a QuestionAnsweringPipeline.
"summarization": returns a SummarizationPipeline.
"table-question-answering": returns a TableQuestionAnsweringPipeline.
"text2text-generation": returns a Text2TextGenerationPipeline.
"text-classification" (alias "sentiment-analysis" available): returns a TextClassificationPipeline.
"text-generation": returns a TextGenerationPipeline.
"text-to-audio" (alias "text-to-speech" available): returns a TextToAudioPipeline.
"token-classification" (alias "ner" available): returns a TokenClassificationPipeline.
"translation": returns a TranslationPipeline.
"translation_xx_to_yy": returns a TranslationPipeline.
"video-classification": returns a VideoClassificationPipeline.
"visual-question-answering": returns a VisualQuestionAnsweringPipeline.
"zero-shot-classification": returns a ZeroShotClassificationPipeline.
"zero-shot-image-classification": returns a ZeroShotImageClassificationPipeline.
"zero-shot-audio-classification": returns a ZeroShotAudioClassificationPipeline.
"zero-shot-object-detection": returns a ZeroShotObjectDetectionPipeline.
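
To make the alias notes and the translation_xx_to_yy form concrete, here is a minimal sketch of how a task string could be reduced to its canonical name. It only mirrors the list above and is not the library's internal implementation. TASK_ALIASES and normalize_task are hypothetical names, and the pattern assumes lowercase language codes such as en or fr.

```python
import re

# Aliases taken from the task list above.
TASK_ALIASES = {
    "sentiment-analysis": "text-classification",
    "ner": "token-classification",
    "text-to-speech": "text-to-audio",
}

# Assumption: language codes are plain lowercase letters, e.g. "en", "fr".
_TRANSLATION_PATTERN = re.compile(r"^translation_([a-z]+)_to_([a-z]+)$")


def normalize_task(task):
    """Map an alias or translation_xx_to_yy name to (canonical task, language pair)."""
    task = TASK_ALIASES.get(task, task)
    match = _TRANSLATION_PATTERN.match(task)
    if match:
        return "translation", (match.group(1), match.group(2))
    return task, None


if __name__ == "__main__":
    for name in ("ner", "translation_en_to_fr", "fill-mask"):
        print(name, "->", normalize_task(name))
```

In practice the aliases can be passed straight to pipeline(task=...), so this normalization is useful mainly when keeping your own per-task configuration.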

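2.2.3 Default models for tasks

pipeline configures a default model for every task. The defaults can be seen in the SUPPORTED_TASKS dictionary in the pipeline source code: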

SUPPORTED_TASKS = {
    "audio-classification": {
        "impl": AudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForAudioClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "372e048")}},
        "type": "audio",
    },
    "automatic-speech-recognition": {
        "impl": AutomaticSpeechRecognitionPipeline,
        "tf": (),
        "pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
        "type": "multimodal",
    },
    "text-to-audio": {
        "impl": TextToAudioPipeline,
        "tf": (),
        "pt": (AutoModelForTextToWaveform, AutoModelForTextToSpectrogram) if is_torch_available() else (),
        "default": {"model": {"pt": ("suno/bark-small", "645cfba")}},
        "type": "text",
    },
    "feature-extraction": {
        "impl": FeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased", "935ac13"),
                "tf": ("distilbert/distilbert-base-cased", "935ac13"),
            }
        },
        "type": "multimodal",
    },
    "text-classification": {
        "impl": TextClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
                "tf": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
            },
        },
        "type": "text",
    },
    "token-classification": {
        "impl": TokenClassificationPipeline,
        "tf": (TFAutoModelForTokenClassification,) if is_tf_available() else (),
        "pt": (AutoModelForTokenClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
                "tf": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
            },
        },
        "type": "text",
    },
    "question-answering": {
        "impl": QuestionAnsweringPipeline,
        "tf": (TFAutoModelForQuestionAnswering,) if is_tf_available() else (),
        "pt": (AutoModelForQuestionAnswering,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
                "tf": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
            },
        },
        "type": "text",
    },
    "table-question-answering": {
        "impl": TableQuestionAnsweringPipeline,
        "pt": (AutoModelForTableQuestionAnswering,) if is_torch_available() else (),
        "tf": (TFAutoModelForTableQuestionAnswering,) if is_tf_available() else (),
        "default": {
            "model": {
                "pt": ("google/tapas-base-finetuned-wtq", "69ceee2"),
                "tf": ("google/tapas-base-finetuned-wtq", "69ceee2"),
            },
        },
        "type": "text",
    },
    "visual-question-answering": {
        "impl": VisualQuestionAnsweringPipeline,
        "pt": (AutoModelForVisualQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("dandelin/vilt-b32-finetuned-vqa", "4355f59")},
        },
        "type": "multimodal",
    },
    "document-question-answering": {
        "impl": DocumentQuestionAnsweringPipeline,
        "pt": (AutoModelForDocumentQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("impira/layoutlm-document-qa", "52e01b3")},
        },
        "type": "multimodal",
    },
    "fill-mask": {
        "impl": FillMaskPipeline,
        "tf": (TFAutoModelForMaskedLM,) if is_tf_available() else (),
        "pt": (AutoModelForMaskedLM,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilroberta-base", "ec58a5b"),
                "tf": ("distilbert/distilroberta-base", "ec58a5b"),
            }
        },
        "type": "text",
    },
    "summarization": {
        "impl": SummarizationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            "model": {"pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"), "tf": ("google-t5/t5-small", "d769bba")}
        },
        "type": "text",
    },
    # This task is a special case as it's parametrized by SRC, TGT languages.
    "translation": {
        "impl": TranslationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "ro"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        },
        "type": "text",
    },
    "text2text-generation": {
        "impl": Text2TextGenerationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        "type": "text",
    },
    "text-generation": {
        "impl": TextGenerationPipeline,
        "tf": (TFAutoModelForCausalLM,) if is_tf_available() else (),
        "pt": (AutoModelForCausalLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("openai-community/gpt2", "6c0e608"), "tf": ("openai-community/gpt2", "6c0e608")}},
        "type": "text",
    },
    "zero-shot-classification": {
        "impl": ZeroShotClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
            "config": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
        },
        "type": "text",
    },
    "zero-shot-image-classification": {
        "impl": ZeroShotImageClassificationPipeline,
        "tf": (TFAutoModelForZeroShotImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForZeroShotImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("openai/clip-vit-base-patch32", "f4881ba"),
                "tf": ("openai/clip-vit-base-patch32", "f4881ba"),
            }
        },
        "type": "multimodal",
    },
    "zero-shot-audio-classification": {
        "impl": ZeroShotAudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("laion/clap-htsat-fused", "973b6e5"),
            }
        },
        "type": "multimodal",
    },
    "image-classification": {
        "impl": ImageClassificationPipeline,
        "tf": (TFAutoModelForImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "5dca96d"),
                "tf": ("google/vit-base-patch16-224", "5dca96d"),
            }
        },
        "type": "image",
    },
    "image-feature-extraction": {
        "impl": ImageFeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "3f49326"),
                "tf": ("google/vit-base-patch16-224", "3f49326"),
            }
        },
        "type": "image",
    },
    "image-segmentation": {
        "impl": ImageSegmentationPipeline,
        "tf": (),
        "pt": (AutoModelForImageSegmentation, AutoModelForSemanticSegmentation) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50-panoptic", "fc15262")}},
        "type": "multimodal",
    },
    "image-to-text": {
        "impl": ImageToTextPipeline,
        "tf": (TFAutoModelForVision2Seq,) if is_tf_available() else (),
        "pt": (AutoModelForVision2Seq,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("ydshieh/vit-gpt2-coco-en", "65636df"),
                "tf": ("ydshieh/vit-gpt2-coco-en", "65636df"),
            }
        },
        "type": "multimodal",
    },
    "object-detection": {
        "impl": ObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50", "2729413")}},
        "type": "multimodal",
    },
    "zero-shot-object-detection": {
        "impl": ZeroShotObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForZeroShotObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google/owlvit-base-patch32", "17740e1")}},
        "type": "multimodal",
    },
    "depth-estimation": {
        "impl": DepthEstimationPipeline,
        "tf": (),
        "pt": (AutoModelForDepthEstimation,) if is_torch_available() else (),
        "default": {"model": {"pt": ("Intel/dpt-large", "e93beec")}},
        "type": "image",
    },
    "video-classification": {
        "impl": VideoClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForVideoClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("MCG-NJU/videomae-base-finetuned-kinetics", "4800870")}},
        "type": "video",
    },
    "mask-generation": {
        "impl": MaskGenerationPipeline,
        "tf": (),
        "pt": (AutoModelForMaskGeneration,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/sam-vit-huge", "997b15")}},
        "type": "multimodal",
    },
    "image-to-image": {
        "impl": ImageToImagePipeline,
        "tf": (),
        "pt": (AutoModelForImageToImage,) if is_torch_available() else (),
        "default": {"model": {"pt": ("caidas/swin2SR-classical-sr-x2-64", "4aaedcb")}},
        "type": "image",
    },
}
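
Reading defaults out of a dictionary of this shape is straightforward. The sketch below uses a small hand-copied excerpt of the entries above, with the "impl", "tf" and "pt" class entries left out so that it runs without transformers installed. TASK_DEFAULTS and default_checkpoint are hypothetical names for illustration, not part of the library.

```python
# Hand-copied excerpt with the same "default" layout as SUPPORTED_TASKS.
TASK_DEFAULTS = {
    "automatic-speech-recognition": {
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
        "type": "multimodal",
    },
    "summarization": {
        "default": {
            "model": {
                "pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
                "tf": ("google-t5/t5-small", "d769bba"),
            }
        },
        "type": "text",
    },
    "translation": {
        "default": {
            ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        },
        "type": "text",
    },
}


def default_checkpoint(tasks, task, framework="pt", language_pair=None):
    """Return (model_id, revision) for a task, or None if no default is listed."""
    defaults = tasks[task]["default"]
    if "model" not in defaults:
        # Translation-style entry: defaults are keyed by (source, target) language.
        defaults = defaults.get(language_pair)
        if defaults is None:
            return None
    return defaults["model"].get(framework)


if __name__ == "__main__":
    print(default_checkpoint(TASK_DEFAULTS, "automatic-speech-recognition"))
    print(default_checkpoint(TASK_DEFAULTS, "translation", language_pair=("en", "fr")))
```

Each default is a pair of model id and revision hash. If your installed transformers version exposes the dictionary as transformers.pipelines.SUPPORTED_TASKS (an assumption to verify against your version), it could be passed in place of the excerpt.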

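2.3 Instantiating a pipeline object from a model

2.3.1 Instantiating automatic speech recognition from a model

If you do not want to use the task's default model, you can specify a model hosted on Hugging Face: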

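import os

# Set the environment variables before importing transformers so they take effect.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # download through a mirror of the Hugging Face Hub
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # expose only GPU 2 to this process

from transformers import pipeline

speech_file = "./output_video_enhanced.mp3"
# Alternative: pass the task and the model together.
# transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium")
# With only a model given, the task is inferred from the model.
pipe = pipeline(model="openai/whisper-medium")
result = pipe(speech_file)
print(result)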
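
The two call styles above (model only, or task plus model) can be wrapped in a small helper that also pins a revision, in the same spirit as the pinned revisions of the default models. This is a sketch under stated assumptions: pipeline_kwargs and build_pipeline are hypothetical names, and the early check simply reflects that pipeline needs at least a task or a model.

```python
def pipeline_kwargs(task=None, model=None, revision=None):
    """Collect keyword arguments for transformers.pipeline, skipping unset ones."""
    if task is None and model is None:
        raise ValueError("specify at least one of task or model")
    candidates = {"task": task, "model": model, "revision": revision}
    return {name: value for name, value in candidates.items() if value is not None}


def build_pipeline(task=None, model=None, revision=None):
    """Create a pipeline from the collected arguments (hypothetical helper)."""
    # Imported lazily: needs transformers, a backend such as PyTorch, and network access.
    from transformers import pipeline

    return pipeline(**pipeline_kwargs(task=task, model=model, revision=revision))


if __name__ == "__main__":
    # Works offline: only the argument handling is exercised here.
    print(pipeline_kwargs(model="openai/whisper-medium"))
    print(pipeline_kwargs(task="automatic-speech-recognition", model="openai/whisper-medium"))
```

Passing the task together with the model keeps the intent explicit when a model could serve more than one task.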

2.3.2 Looking up which task a model belongs to

The mapping between models and tasks can be browsed at https://huggingface.co/tasks.
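
The same lookup can also be done in code. The sketch below assumes the huggingface_hub package is installed and reads the pipeline_tag field that the Hub records for a model. task_for_model is a hypothetical helper name, and the fetch_info parameter exists only so the function can be tried without network access.

```python
def task_for_model(model_id, fetch_info=None):
    """Return the task tag recorded on the Hub for a model, or None if it has none."""
    if fetch_info is None:
        # Imported lazily: needs huggingface_hub and network access.
        from huggingface_hub import model_info

        fetch_info = model_info
    info = fetch_info(model_id)
    return getattr(info, "pipeline_tag", None)


if __name__ == "__main__":
    # Offline demonstration with a stand-in for the Hub response.
    class FakeInfo:
        pipeline_tag = "automatic-speech-recognition"

    print(task_for_model("openai/whisper-medium", fetch_info=lambda model_id: FakeInfo()))
```

Calling task_for_model("openai/whisper-medium") without fetch_info queries the Hub and returns whatever tag is stored there for that model.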

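3. Summary

This is article 0 of the column on pipeline in transformers. Each later article will cover one task, walking through the usage of 28+ tasks in total. By learning how to use the pipeline for these 28 tasks, you can put to work the 300,000+ open-source models on Hugging Face spanning audio, computer vision, natural language processing, multimodal, and even reinforcement learning, a solid step toward becoming an expert in large models.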
