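Several algorithms can be used to compute string similarity. Common choices include edit distance (Levenshtein distance) and Jaccard similarity.
The following example code computes string similarity using edit distance: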
def levenshtein_distance(s1, s2):
    # Keep s1 as the longer string so each row has len(s2) + 1 entries.
    if len(s1) < len(s2):
        return levenshtein_distance(s2, s1)
    if len(s2) == 0:
        return len(s1)

    # previous_row[j] is the distance between the s1 prefix processed so far and s2[:j].
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


def similarity(s1, s2):
    max_length = max(len(s1), len(s2))
    if max_length == 0:
        return 1.0  # two empty strings are identical; avoids division by zero
    distance = levenshtein_distance(s1, s2)
    return 1 - distance / max_length


s1 = "hello"
s2 = "hallo"
similarity_score = similarity(s1, s2)
print(f"The similarity score between '{s1}' and '{s2}' is {similarity_score}")
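This code computes the similarity between the strings "hello" and "hallo". The two strings differ by a single substitution, so the edit distance is 1 and the longer length is 5, giving 1 - 1/5 = 0.8. The output is: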
The similarity score between 'hello' and 'hallo' is 0.8
You can modify the code as needed to compute the similarity of other strings.
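Jaccard similarity, the other measure mentioned at the start, compares two sets as the size of their intersection divided by the size of their union. The answer does not say how a string should be turned into a set, so the sketch below assumes sets of character n-grams (bigrams by default). The function name `jaccard_similarity`, the parameter `n`, and the handling of empty and very short strings are all illustrative assumptions, not part of the original answer.

```python
def jaccard_similarity(s1: str, s2: str, n: int = 2) -> float:
    """Jaccard similarity over sets of character n-grams.

    Assumption: strings are tokenized into overlapping n-grams; other
    tokenizations (single characters, words) are equally valid choices.
    """
    def ngrams(s: str) -> set:
        # A non-empty string shorter than n contributes itself as one token.
        if len(s) < n:
            return {s} if s else set()
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    a, b = ngrams(s1), ngrams(s2)
    if not a and not b:
        return 1.0  # assumed convention: two empty strings are identical
    return len(a & b) / len(a | b)


print(jaccard_similarity("hello", "hallo"))
```

For "hello" and "hallo" the bigram sets share "ll" and "lo" out of six distinct bigrams, so the score is 2/6, roughly 0.33. That is noticeably lower than the edit-distance score of 0.8, because one changed character affects two bigrams. Which measure fits better depends on the kind of differences you want to penalize.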