count_words

pyhelpers.text.count_words(text, lowercase=False, ignore_punctuation=False, stop_words=None, **kwargs)

Count the occurrences of each word in the given text.

Parameters:
  • text (str) – The input text from which words will be counted.

  • lowercase (bool) – Whether to convert the text to lowercase before counting; defaults to False.

  • ignore_punctuation (bool) – Whether to exclude punctuation marks from the text; defaults to False.

  • stop_words (list[str] | bool | None) –

    Words to exclude from the word count:

    • If stop_words=None (default), no words are excluded.

    • If stop_words=True, NLTK’s built-in stopwords are used.

    • If stop_words is a list of strings, the listed words are excluded.

  • kwargs – [Optional] Additional keyword arguments for the function nltk.word_tokenize().

Returns:

A dictionary whose keys are the unique tokens in the text and whose values are their respective counts. Punctuation marks are counted as tokens unless ignore_punctuation=True.

Return type:

dict
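
The way the parameters above interact can be approximated without NLTK. The following is a minimal sketch and not the pyhelpers implementation: the name count_words_sketch is hypothetical, a simple regular expression stands in for nltk.word_tokenize(), stop words are matched case-sensitively (an assumption), and stop_words=True is not supported because the sketch does not bundle NLTK's stopword list.

```python
import re
import string
from collections import Counter


def count_words_sketch(text, lowercase=False, ignore_punctuation=False, stop_words=None):
    """Hypothetical stand-in for count_words(); illustration only."""
    if stop_words is True:
        # Assumption: NLTK's stopword list is out of scope for this sketch.
        raise NotImplementedError("stop_words=True requires NLTK's stopword list")

    if lowercase:
        text = text.lower()

    # Assumption: runs of word characters and single non-space symbols
    # approximate the tokens produced by nltk.word_tokenize().
    tokens = re.findall(r"\w+|[^\w\s]", text)

    if ignore_punctuation:
        punctuation = set(string.punctuation)
        tokens = [t for t in tokens if t not in punctuation]

    if stop_words:
        excluded = set(stop_words)
        tokens = [t for t in tokens if t not in excluded]

    # Counter keeps first-seen order, so the dict follows the text order.
    return dict(Counter(tokens))


text = 'This is an apple. That is a pear. Hello world!'
result = count_words_sketch(text, lowercase=True, ignore_punctuation=True, stop_words=['is'])
```

For the sample sentence used on this page, the sketch yields the same dictionaries as the documented examples for the default call and for the list-based stop_words call; other inputs may tokenize differently from NLTK.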

Examples:

>>> from pyhelpers.text import count_words
>>> text = 'This is an apple. That is a pear. Hello world!'
>>> count_words(text)
{'This': 1,
 'is': 2,
 'an': 1,
 'apple': 1,
 '.': 2,
 'That': 1,
 'a': 1,
 'pear': 1,
 'Hello': 1,
 'world': 1,
 '!': 1}
>>> count_words(text, lowercase=True)
{'this': 1,
 'is': 2,
 'an': 1,
 'apple': 1,
 '.': 2,
 'that': 1,
 'a': 1,
 'pear': 1,
 'hello': 1,
 'world': 1,
 '!': 1}
>>> count_words(text, lowercase=True, ignore_punctuation=True)
{'this': 1,
 'is': 2,
 'an': 1,
 'apple': 1,
 'that': 1,
 'a': 1,
 'pear': 1,
 'hello': 1,
 'world': 1}
>>> count_words(text, lowercase=True, ignore_punctuation=True, stop_words=['is'])
{'this': 1,
 'an': 1,
 'apple': 1,
 'that': 1,
 'a': 1,
 'pear': 1,
 'hello': 1,
 'world': 1}
>>> count_words(text, lowercase=True, ignore_punctuation=True, stop_words=True)
{'apple': 1, 'pear': 1, 'hello': 1, 'world': 1}
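
Because the return value is a plain dict, standard-library tools can be applied to it directly. As a brief sketch, the snippet below ranks words by frequency with collections.Counter; it uses a hand-copied literal of the output shown above for lowercase=True, ignore_punctuation=True, so it runs without pyhelpers or NLTK installed. The variable names are illustrative.

```python
from collections import Counter

# Literal copy of the documented output of
# count_words(text, lowercase=True, ignore_punctuation=True).
word_counts = {
    'this': 1, 'is': 2, 'an': 1, 'apple': 1, 'that': 1,
    'a': 1, 'pear': 1, 'hello': 1, 'world': 1,
}

# Most frequent word first; each entry is a (word, count) pair.
ranked = Counter(word_counts).most_common(3)

# Total number of counted tokens.
total = sum(word_counts.values())
```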