find_similar_str

pyhelpers.text.find_similar_str(x, lookup_list, n=1, ignore_punctuation=True, engine='difflib', **kwargs)

Find n strings that are similar to x from among a sequence of candidates.

Parameters
  • x (str) – the string to find matches for

  • lookup_list (Iterable) – a sequence of strings for lookup

  • n (int or None) – number of similar strings to return, defaults to 1; when n=None, the function returns the whole lookup_list sorted in descending order of similarity

  • ignore_punctuation (bool) – whether to ignore punctuation when searching for similar strings, defaults to True

  • engine (str or Callable) – the matching engine; options include 'difflib' (default) and 'fuzzywuzzy'

  • kwargs – [optional] parameters of difflib.get_close_matches (e.g. cutoff=0.6) or fuzzywuzzy.fuzz.token_set_ratio, depending on engine

Returns

the string (or list of strings) from lookup_list most similar to (or the same as) x; None if no match is found

Return type

str or list or None

Note

  • By default, the function uses the standard-library module difflib. When engine='fuzzywuzzy', it relies on FuzzyWuzzy, which is not a required dependency of pyhelpers and must be installed separately (e.g. with pip or conda).
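
To make the role of ignore_punctuation and cutoff more concrete, here is a minimal sketch of how a difflib-based lookup could work. It is not the pyhelpers implementation: the helpers normalise and closest_matches are hypothetical, and the lower-casing step is an assumption inferred from the examples (where 'angle' matches 'Anglia'). Only difflib.get_close_matches is a real standard-library call.

```python
import difflib
import string


def normalise(text, ignore_punctuation=True):
    """Lower-case text and optionally strip ASCII punctuation (assumed behaviour)."""
    text = text.lower()
    if ignore_punctuation:
        text = text.translate(str.maketrans('', '', string.punctuation))
    return text


def closest_matches(x, lookup_list, n=1, ignore_punctuation=True, cutoff=0.6):
    """Return up to n entries of lookup_list that best match x (hypothetical helper)."""
    # Map each normalised form back to the first original candidate that produced it.
    originals = {}
    for candidate in lookup_list:
        originals.setdefault(normalise(candidate, ignore_punctuation), candidate)

    # difflib drops every candidate whose similarity ratio is below cutoff.
    hits = difflib.get_close_matches(
        normalise(x, ignore_punctuation), list(originals), n=n, cutoff=cutoff)
    return [originals[hit] for hit in hits]


regions = ['Anglia', 'South East', 'Wales', 'Wessex']
print(closest_matches('angle', regions))
print(closest_matches('x', regions))               # nothing reaches the default cutoff
print(closest_matches('x', regions, cutoff=0.25))
```

Unlike find_similar_str, this sketch always returns a list, so an empty list plays the role of None.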

Examples:

>>> from pyhelpers.text import find_similar_str

>>> lookup_lst = ['Anglia',
...               'East Coast',
...               'East Midlands',
...               'North and East',
...               'London North Western',
...               'Scotland',
...               'South East',
...               'Wales',
...               'Wessex',
...               'Western']

>>> y = find_similar_str(x='angle', lookup_list=lookup_lst)
>>> y
'Anglia'
>>> y = find_similar_str(x='angle', lookup_list=lookup_lst, n=2)
>>> y
['Anglia', 'Wales']

>>> y = find_similar_str(x='angle', lookup_list=lookup_lst, engine='fuzzywuzzy')
>>> y
'Anglia'
>>> y = find_similar_str('angle', lookup_lst, n=2, engine='fuzzywuzzy')
>>> y
['Anglia', 'Wales']

>>> y = find_similar_str(x='x', lookup_list=lookup_lst)
>>> y is None
True
>>> y = find_similar_str(x='x', lookup_list=lookup_lst, cutoff=0.25)
>>> y
'Wessex'
>>> y = find_similar_str(x='x', lookup_list=lookup_lst, n=2, cutoff=0.25)
>>> y
'Wessex'

>>> y = find_similar_str(x='x', lookup_list=lookup_lst, engine='fuzzywuzzy')
>>> y
'Wessex'
>>> y = find_similar_str(x='x', lookup_list=lookup_lst, n=2, engine='fuzzywuzzy')
>>> y
['Wessex', 'Western']
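
The results for x='x' above follow from the similarity score that difflib assigns. The sketch below, which uses the hypothetical helpers similarity and rank_by_similarity and assumes case-insensitive comparison, shows why 'Wessex' is rejected at the default cutoff of 0.6 but accepted at cutoff=0.25, and how a full ranking comparable to n=None could be produced.

```python
import difflib


def similarity(a, b):
    """Case-insensitive difflib ratio in [0, 1] (hypothetical helper)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_by_similarity(x, lookup_list):
    """Return lookup_list sorted from most to least similar to x."""
    return sorted(lookup_list, key=lambda s: similarity(s, x), reverse=True)


regions = ['Anglia', 'Wales', 'Wessex', 'Western']

# 'x' shares one character with 'Wessex': ratio = 2 * 1 / (6 + 1), about 0.29,
# which is below the default cutoff of 0.6 but above cutoff=0.25.
print(round(similarity('Wessex', 'x'), 2))

# An ordering comparable to what n=None is documented to return.
print(rank_by_similarity('angle', regions))
```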