find_similar_str
pyhelpers.text.find_similar_str(x, lookup_list, n=1, ignore_punctuation=True, engine='difflib', **kwargs)
Find n strings that are similar to x from among a sequence of candidates.
- Parameters:
x (str) – a string-type variable
lookup_list (Iterable) – a sequence of strings for lookup
n (int | None) – number of similar strings to return, defaults to 1; when n=None, the function returns a sorted lookup_list (in descending order of similarity)
ignore_punctuation (bool) – whether to ignore punctuation when searching for similar texts, defaults to True
engine (str | Callable) – options include 'difflib' (default) and 'rapidfuzz' (or simply 'fuzz'); if engine='difflib', the function relies on difflib.get_close_matches; if engine='rapidfuzz' (or engine='fuzz'), the function relies on rapidfuzz.fuzz.QRatio
kwargs – [optional] parameters of difflib.get_close_matches (e.g. cutoff=0.6) or rapidfuzz.fuzz.QRatio, depending on engine
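To illustrate how the 'difflib' engine and the cutoff keyword interact, the sketch below calls difflib.get_close_matches directly. It is not the pyhelpers implementation: normalise and closest_matches are hypothetical helpers, and lower-casing plus stripping ASCII punctuation is only an assumed reading of ignore_punctuation=True.

```python
import difflib
import string

# Assumed normalisation: lower-case and strip ASCII punctuation before comparing.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def normalise(text):
    """Hypothetical helper; pyhelpers may normalise differently."""
    return text.lower().translate(_PUNCT_TABLE)


def closest_matches(x, candidates, n=1, cutoff=0.6):
    """Return up to n candidates similar to x (illustrative only)."""
    # Map each normalised form back to the first original that produced it.
    originals = {}
    for candidate in candidates:
        originals.setdefault(normalise(candidate), candidate)
    # get_close_matches keeps candidates scoring at least `cutoff`, best first.
    hits = difflib.get_close_matches(normalise(x), list(originals), n=n, cutoff=cutoff)
    return [originals[hit] for hit in hits]


lookup_lst = ['Anglia', 'East Coast', 'East Midlands', 'North and East',
              'London North Western', 'Scotland', 'South East', 'Wales',
              'Wessex', 'Western']

print(closest_matches('angle', lookup_lst))           # ['Anglia']
print(closest_matches('x', lookup_lst))               # []
print(closest_matches('x', lookup_lst, cutoff=0.25))  # ['Wessex']
```

With the default cutoff of 0.6, 'x' matches nothing because its best score (against 'Wessex') is about 0.29; lowering the cutoff to 0.25 lets that candidate through.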
- Returns:
the string in lookup_list that is most similar to (or the same as) x; a list of such strings when several matches are returned; None when no match is found
- Return type:
str | list | None
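Because the return type is str | list | None, calling code may find it convenient to coerce the result to a list before iterating. The helper below is a minimal sketch; as_list is a hypothetical name and not part of pyhelpers.

```python
def as_list(result):
    """Coerce a str | list | None result into a list of strings."""
    if result is None:
        return []            # no match found
    if isinstance(result, str):
        return [result]      # a single match
    return list(result)      # already a sequence of matches


print(as_list(None))                 # []
print(as_list('Anglia'))             # ['Anglia']
print(as_list(['Anglia', 'Wales']))  # ['Anglia', 'Wales']
```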
Examples:
>>> from pyhelpers.text import find_similar_str
>>> lookup_lst = ['Anglia',
...               'East Coast',
...               'East Midlands',
...               'North and East',
...               'London North Western',
...               'Scotland',
...               'South East',
...               'Wales',
...               'Wessex',
...               'Western']

>>> y = find_similar_str(x='angle', lookup_list=lookup_lst)
>>> y
'Anglia'
>>> y = find_similar_str(x='angle', lookup_list=lookup_lst, n=2)
>>> y
['Anglia', 'Wales']

>>> y = find_similar_str(x='angle', lookup_list=lookup_lst, engine='fuzz')
>>> y
'Anglia'
>>> y = find_similar_str('angle', lookup_lst, n=2, engine='fuzz')
>>> y
['Anglia', 'Wales']

>>> y = find_similar_str(x='x', lookup_list=lookup_lst)
>>> y is None
True
>>> y = find_similar_str(x='x', lookup_list=lookup_lst, cutoff=0.25)
>>> y
'Wessex'
>>> y = find_similar_str(x='x', lookup_list=lookup_lst, n=2, cutoff=0.25)
>>> y
'Wessex'

>>> y = find_similar_str(x='x', lookup_list=lookup_lst, engine='fuzz')
>>> y
'Wessex'
>>> y = find_similar_str(x='x', lookup_list=lookup_lst, n=2, engine='fuzz')
>>> y
['Wessex', 'Western']
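None of the examples above passes n=None, which is documented as returning lookup_list sorted in descending order of similarity. A rough sketch of that kind of ranking, using difflib.SequenceMatcher from the standard library, might look as follows. The function rank_by_similarity is hypothetical, the comparison is assumed to be case-insensitive, and the ordering pyhelpers produces may differ.

```python
from difflib import SequenceMatcher


def rank_by_similarity(x, candidates):
    """Sort candidates from most to least similar to x (illustrative only)."""
    def score(candidate):
        # ratio() is a similarity score in [0, 1]; compare case-insensitively.
        return SequenceMatcher(None, x.lower(), candidate.lower()).ratio()
    return sorted(candidates, key=score, reverse=True)


lookup_lst = ['Anglia', 'East Coast', 'East Midlands', 'North and East',
              'London North Western', 'Scotland', 'South East', 'Wales',
              'Wessex', 'Western']

ranked = rank_by_similarity('angle', lookup_lst)
print(ranked[0])  # 'Anglia'
```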