Blog of Samperson

String Similarity Comparison: JavaWuzzy Documentation

2018-05-09

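JavaWuzzy is a Java port of FuzzyWuzzy. It computes how closely two strings match and returns the result as an integer score from 0 to 100.
GitHub repositories: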
[1] JavaWuzzy: https://github.com/xdrop/fuzzywuzzy
[2] FuzzyWuzzy: https://github.com/seatgeek/fuzzywuzzy

Adding it to a Maven project

<dependency>
    <groupId>me.xdrop</groupId>
    <artifactId>fuzzywuzzy</artifactId>
    <version>1.1.10</version>
</dependency>

Basic usage

[1] FuzzySearch.ratio(String s1, String s2)
Full match: compares the two strings as a whole. Sensitive to word order.


[2] FuzzySearch.partialRatio(String s1, String s2)
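Partial (substring) match: looks for the best match of the shorter string inside the longer one. Sensitive to word order.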
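To make the difference between a full match and a partial match concrete, here is a minimal sketch in plain Java. It is not JavaWuzzy's implementation: the class name RatioSketch is hypothetical, the score is assumed to be a plain Levenshtein distance normalized by the longer length, and the partial match simply tries every window of the longer string. The library's own formulas may differ, so the numbers will not necessarily agree with FuzzySearch.

```java
final class RatioSketch {
    // Classic two-row Levenshtein edit distance (insert, delete, substitute all cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Full match (assumed formula): 100 * (1 - distance / longer length), rounded.
    static int ratio(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100;
        }
        return (int) Math.round(100.0 * (longest - levenshtein(a, b)) / longest);
    }

    // Partial match (assumed strategy): best full-match score of the shorter
    // string against every same-length window of the longer string.
    static int partialRatio(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        if (shorter.isEmpty()) {
            return longer.isEmpty() ? 100 : 0;
        }
        int best = 0;
        for (int start = 0; start + shorter.length() <= longer.length(); start++) {
            String window = longer.substring(start, start + shorter.length());
            best = Math.max(best, ratio(shorter, window));
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(ratio("york", "new york giants"));        // low: most of the long string is unmatched
        System.out.println(partialRatio("york", "new york giants")); // 100: "york" occurs verbatim
    }
}
```

Both scores in this sketch drop when the words are swapped, which is the order sensitivity described for methods [1] and [2].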


[3] FuzzySearch.tokenSortRatio(String s1, String s2)
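Sorts the words first, then does a full match. Insensitive to word order (that is, the similarity stays high even after the words are rearranged).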
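The "sort first" step can be pictured with a few lines of plain Java. This is only a sketch of the idea, not the library's code: the class name TokenSortSketch is hypothetical, and lowercasing plus splitting on whitespace are assumptions about the preprocessing.

```java
import java.util.Arrays;
import java.util.Locale;

final class TokenSortSketch {
    // Normalizes a string by lowercasing it, splitting it into words,
    // and re-joining the words in alphabetical order.
    static String sortTokens(String s) {
        String[] tokens = s.trim().toLowerCase(Locale.ROOT).split("\\s+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // Both inputs normalize to the same string, so any full match
        // performed afterwards sees two identical strings.
        System.out.println(sortTokens("New York Mets"));
        System.out.println(sortTokens("Mets New York"));
    }
}
```

Once both inputs are normalized this way, a full match no longer depends on the original word order.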


[4] FuzzySearch.tokenSortPartialRatio(String s1, String s2)
Sorts the words first, then does a partial (substring) match. Insensitive to word order.


[5] FuzzySearch.tokenSetRatio(String s1, String s2)
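Builds a set of the words first (dropping duplicates), then does a full match. Insensitive to word order. If the second string contains every word of the first, the score is 100.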
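The set step and the "contained, so 100" case can be illustrated in plain Java. Again this is a sketch under assumptions, not JavaWuzzy's code: TokenSetSketch and coversAllTokens are hypothetical names, and words are assumed to be lowercased and split on whitespace.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class TokenSetSketch {
    // Splits a string into lowercase words and keeps each word once, in sorted order.
    static Set<String> tokenSet(String s) {
        return new TreeSet<>(Arrays.asList(s.trim().toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // True when every word of the first string also appears in the second.
    // This is the situation in which the set-based score is described as 100.
    static boolean coversAllTokens(String first, String second) {
        return tokenSet(second).containsAll(tokenSet(first));
    }

    public static void main(String[] args) {
        System.out.println(tokenSet("fuzzy fuzzy was a bear"));                     // duplicates removed
        System.out.println(coversAllTokens("new york", "York Giants new new"));     // true
        System.out.println(coversAllTokens("new york jets", "new york"));           // false
    }
}
```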


[6] FuzzySearch.tokenSetPartialRatio(String s1, String s2)
Builds a set of the words first, then does a partial (substring) match. Insensitive to word order.


[7] FuzzySearch.weightedRatio(String s1, String s2)
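Uses a different algorithm: a weighted combination of several of the ratios above. Word order still influences the score, though typically less than with ratio.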

Advanced usage

[1] FuzzySearch.extractOne(String s, String[] list)
Returns the single best-matching entry.

Example: FuzzySearch.extractOne("cowboys", ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"])
(string: Dallas Cowboys, score: 90, index: 3)



[2] FuzzySearch.extractTop(String s, String[] list, int num)
Returns the num best-matching entries.

Example: FuzzySearch.extractTop("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"], 3)
[(string: google, score: 83, index: 0), (string: googleplus, score: 63, index: 5), (string: plexoogl, score: 43, index: 7)]



[3] FuzzySearch.extractAll(String s, String[] list)
Scores every string in list and returns the results in the original list order.

Example: FuzzySearch.extractAll("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"]);
[(string: google, score: 83, index: 0), (string: bing, score: 20, index: 1), (string: facebook, score: 29, index: 2), (string: linkedin, score: 29, index: 3), (string: twitter, score: 15, index: 4), (string: googleplus, score: 63, index: 5), (string: bingnews, score: 29, index: 6), (string: plexoogl, score: 43, index: 7)]



[4] FuzzySearch.extractAll(String s, String[] list, int score)
Scores every string in list and returns only those scoring score or higher.

Example: FuzzySearch.extractAll("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"], 40)
[(string: google, score: 83, index: 0), (string: googleplus, score: 63, index: 5), (string: plexoogl, score: 43, index: 7)]



[5] FuzzySearch.extractSorted(String s, String[] list)
Scores every string in list and returns the results sorted by score, highest first.

Example: FuzzySearch.extractSorted("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"]);
[(string: google, score: 83, index: 0), (string: googleplus, score: 63, index: 5), (string: plexoogl, score: 43, index: 7), (string: facebook, score: 29, index: 2), (string: linkedin, score: 29, index: 3), (string: bingnews, score: 29, index: 6), (string: bing, score: 20, index: 1), (string: twitter, score: 15, index: 4)]



[6] FuzzySearch.extractSorted(String s, String[] list, int score)
Scores every string in list, keeps only those scoring score or higher, and returns them sorted by score, highest first.

Example: FuzzySearch.extractSorted("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"], 40);
[(string: google, score: 83, index: 0), (string: googleplus, score: 63, index: 5), (string: plexoogl, score: 43, index: 7)]
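All of the extract methods follow one pattern: score every candidate, optionally drop the ones below a cutoff, then sort or truncate. The sketch below shows that pattern in plain Java without the library. Everything in it is hypothetical (the class ExtractSketch, the Scored holder, and the deliberately crude toyScore function), and treating the cutoff as inclusive is an assumption. With JavaWuzzy you would call FuzzySearch directly instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.ToIntBiFunction;

final class ExtractSketch {
    // One scored candidate: the string, its score, and its position in the input list.
    static final class Scored {
        final String string;
        final int score;
        final int index;

        Scored(String string, int score, int index) {
            this.string = string;
            this.score = score;
            this.index = index;
        }

        @Override
        public String toString() {
            return "(string: " + string + ", score: " + score + ", index: " + index + ")";
        }
    }

    // Stand-in scorer for illustration only: 100 for an exact match,
    // 50 when the candidate contains the query, 0 otherwise (case-insensitive).
    static int toyScore(String query, String choice) {
        String q = query.toLowerCase(Locale.ROOT);
        String c = choice.toLowerCase(Locale.ROOT);
        if (c.equals(q)) {
            return 100;
        }
        return c.contains(q) ? 50 : 0;
    }

    // Scores every choice, keeps those at or above the cutoff, sorts by score descending.
    // The sort is stable, so equal scores keep their original list order.
    static List<Scored> extractSorted(String query, List<String> choices, int cutoff,
                                      ToIntBiFunction<String, String> scorer) {
        List<Scored> results = new ArrayList<>();
        for (int i = 0; i < choices.size(); i++) {
            int score = scorer.applyAsInt(query, choices.get(i));
            if (score >= cutoff) {
                results.add(new Scored(choices.get(i), score, i));
            }
        }
        results.sort(Comparator.comparingInt((Scored r) -> r.score).reversed());
        return results;
    }

    // Returns at most `limit` of the best-scoring choices.
    static List<Scored> extractTop(String query, List<String> choices, int limit,
                                   ToIntBiFunction<String, String> scorer) {
        List<Scored> sorted = extractSorted(query, choices, 0, scorer);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> teams = Arrays.asList(
                "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys");
        System.out.println(extractTop("cowboys", teams, 1, ExtractSketch::toyScore));
    }
}
```

Passing the scorer as a parameter keeps the ranking logic separate from the similarity measure, so any of the basic ratios could be plugged in.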

References

https://blog.csdn.net/sunyao_123/article/details/76942809