How to count word frequencies in a text with Python

Published: 2024-07-13 17:49

Word frequencies in a text can be counted with Python in the following steps:

Open the text file and read its contents.

with open("text.txt", "r") as file:
    text = file.read()
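The call above relies on the platform's default text encoding, which can differ between systems. The following is a minimal sketch, assuming the file is UTF-8 encoded (an assumption, not something the original states); it creates a throwaway file so it can run on its own, and the sample sentence is purely illustrative.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("café au lait\n")

    # Assumption: the file is UTF-8; passing the encoding explicitly
    # avoids depending on the platform default
    with open(path, "r", encoding="utf-8") as file:
        text = file.read()

print(text)
```

If the real file uses a different encoding, pass that name to `encoding` instead.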

Split the text into words.

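import re

# Remove punctuation (every character that is neither a word character nor whitespace)
text = re.sub(r'[^\w\s]', '', text)
# Split the text into a list of words on whitespace
words = text.split()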
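To make the effect of this step concrete, here is a small sketch that applies the same two calls to a short string. The names `sample`, `cleaned`, and `tokens` are illustrative and not part of the original code.

```python
import re

sample = "Hello, world! Hello again."

# Drop every character that is neither a word character nor whitespace
cleaned = re.sub(r"[^\w\s]", "", sample)

# Split on runs of whitespace
tokens = cleaned.split()

print(tokens)  # ['Hello', 'world', 'Hello', 'again']
```

Two consequences of this approach are worth knowing: apostrophes are removed along with other punctuation, so "don't" becomes "dont", and `split()` only separates on whitespace, so it will not segment text in languages written without spaces between words.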

Count how many times each word occurs.

from collections import Counter

word_count = Counter(words)
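As a quick illustration of how `Counter` behaves, the sketch below counts a small hand-written list (the list is made up for the example). A `Counter` acts like a dictionary that maps each item to its count.

```python
from collections import Counter

# Hypothetical word list standing in for the result of text.split()
words = ["a", "b", "a", "c", "a", "b"]
word_count = Counter(words)

print(word_count["a"])  # 3
print(word_count["z"])  # 0, a missing key is reported as zero rather than raising KeyError
```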

Sort the words by frequency, most frequent first, and print the results.

for word, count in word_count.most_common():
    print(word, count)
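For long texts it is often enough to look at the most frequent words only. `most_common` accepts an optional count for this; the sketch below uses a made-up word list and asks for the top two entries.

```python
from collections import Counter

# Hypothetical word list used only for this illustration
word_count = Counter(["to", "be", "or", "not", "to", "be", "to"])

# Passing n limits the result to the n most frequent items
top_two = word_count.most_common(2)

print(top_two)  # [('to', 3), ('be', 2)]
```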

The complete code is as follows:

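import re
from collections import Counter

# Read the whole file into a string
with open("text.txt", "r") as file:
    text = file.read()

# Remove punctuation, then split the text into words on whitespace
text = re.sub(r'[^\w\s]', '', text)
words = text.split()

# Count the occurrences of each word
word_count = Counter(words)

# Print the words from most to least frequent
for word, count in word_count.most_common():
    print(word, count)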
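The script above counts "The" and "the" as different words. One possible variation, sketched here, wraps the same steps in a function and lowercases the text first. The function name `count_words` and its `lowercase` parameter are hypothetical, and lowercasing is an added assumption rather than part of the original approach.

```python
import re
from collections import Counter


def count_words(text, lowercase=True):
    """Return a Counter of the whitespace-separated words in text."""
    if lowercase:
        # Assumption: case differences should not produce separate entries
        text = text.lower()
    # Same punctuation removal and whitespace split as in the script above
    text = re.sub(r"[^\w\s]", "", text)
    return Counter(text.split())


counts = count_words("The cat saw the other cat; the end.")
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]
```

Taking a string rather than a file name keeps the function easy to test; the file-reading step can stay outside it, for example `count_words(file.read())`.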