count_words
- pyhelpers.text.count_words(text, lowercase=False, ignore_punctuation=False, stop_words=None, **kwargs)
Count the occurrences of each word in the given text.
- Parameters:
text (str) – The input text from which words will be counted.
lowercase (bool) – Whether to convert the text to lowercase before counting; defaults to False.
ignore_punctuation (bool) – Whether to exclude punctuation marks from the count; defaults to False.
stop_words (list[str] | bool | None) – Words to be excluded from the word count. If stop_words=None (default), no words are excluded. If stop_words=True, NLTK's built-in stopwords are used. If a list is given, the words in it are excluded.
kwargs – [Optional] Additional parameters for the function nltk.word_tokenize().
- Returns:
A dictionary where keys are unique words and values are their respective counts.
- Return type:
dict
Examples:
>>> from pyhelpers.text import count_words
>>> text = 'This is an apple. That is a pear. Hello world!'
>>> count_words(text)
{'This': 1, 'is': 2, 'an': 1, 'apple': 1, '.': 2, 'That': 1, 'a': 1, 'pear': 1, 'Hello': 1, 'world': 1, '!': 1}
>>> count_words(text, lowercase=True)
{'this': 1, 'is': 2, 'an': 1, 'apple': 1, '.': 2, 'that': 1, 'a': 1, 'pear': 1, 'hello': 1, 'world': 1, '!': 1}
>>> count_words(text, lowercase=True, ignore_punctuation=True)
{'this': 1, 'is': 2, 'an': 1, 'apple': 1, 'that': 1, 'a': 1, 'pear': 1, 'hello': 1, 'world': 1}
>>> count_words(text, lowercase=True, ignore_punctuation=True, stop_words=['is'])
{'this': 1, 'an': 1, 'apple': 1, 'that': 1, 'a': 1, 'pear': 1, 'hello': 1, 'world': 1}
>>> count_words(text, lowercase=True, ignore_punctuation=True, stop_words=True)
{'apple': 1, 'pear': 1, 'hello': 1, 'world': 1}
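The options documented above can be approximated with the standard library alone, which may help when reasoning about how they combine. The following is a minimal sketch, not the pyhelpers implementation: count_words_sketch and DEMO_STOP_WORDS are hypothetical names, a regular expression stands in for nltk.word_tokenize(), the small stop-word set stands in for NLTK's stopwords corpus, and the case-insensitive stop-word comparison is an assumption.

```python
import re
import string
from collections import Counter

# Hypothetical stand-in for NLTK's English stop-word list (illustration only).
DEMO_STOP_WORDS = {'this', 'is', 'an', 'that', 'a'}

PUNCTUATION = set(string.punctuation)


def count_words_sketch(text, lowercase=False, ignore_punctuation=False, stop_words=None):
    """Approximate word counting; a regex replaces nltk.word_tokenize()."""
    if lowercase:
        text = text.lower()

    # Tokens are runs of word characters or single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)

    if ignore_punctuation:
        tokens = [t for t in tokens if t not in PUNCTUATION]

    if stop_words is True:
        excluded = DEMO_STOP_WORDS
    elif stop_words:
        # Assumption: stop words are matched case-insensitively.
        excluded = {w.lower() for w in stop_words}
    else:
        excluded = set()

    tokens = [t for t in tokens if t.lower() not in excluded]

    # Counter keeps first-seen order, so the dict lists words as they appear.
    return dict(Counter(tokens))


text = 'This is an apple. That is a pear. Hello world!'
print(count_words_sketch(text, lowercase=True, ignore_punctuation=True, stop_words=['is']))
```

On this sample sentence the sketch yields the same counts as the documented examples, but the regex tokenizer will diverge from NLTK on contractions, hyphenated words and similar cases.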