calculate_tfidf
- pyhelpers.text.calculate_tfidf(documents, **kwargs)
Calculate TF-IDF (Term Frequency-Inverse Document Frequency) for the given textual documents.
TF (Term Frequency) measures how frequently a term appears in a document relative to its length. IDF (Inverse Document Frequency) measures how rare a term is across the entire corpus of documents, so a term that occurs in most documents receives a low weight.
- Parameters:
documents (Iterable | Sequence) – A sequence of textual data.
kwargs – [Optional] Additional parameters for the function calculate_idf(); also refer to count_words().
- Returns:
TF-IDF values for the input textual data, represented as a dictionary.
- Return type:
dict
Examples:
>>> from pyhelpers.text import calculate_tfidf
>>> documents = [
...     'This is an apple.',
...     'That is a pear.',
...     'It is human being.',
...     'Hello world!']
>>> tfidf = calculate_tfidf(documents)
>>> tfidf
{'this': 0.6931471805599453,
 'is': 0.0,
 'an': 0.6931471805599453,
 'apple': 0.6931471805599453,
 'that': 0.6931471805599453,
 'a': 0.6931471805599453,
 'pear': 0.6931471805599453,
 'it': 0.6931471805599453,
 'human': 0.6931471805599453,
 'being': 0.6931471805599453,
 'hello': 0.6931471805599453,
 'world': 0.6931471805599453}
>>> tfidf = calculate_tfidf(documents, lowercase=False)
>>> tfidf
{'This': 0.6931471805599453,
 'is': 0.0,
 'an': 0.6931471805599453,
 'apple': 0.6931471805599453,
 'That': 0.6931471805599453,
 'a': 0.6931471805599453,
 'pear': 0.6931471805599453,
 'It': 0.6931471805599453,
 'human': 0.6931471805599453,
 'being': 0.6931471805599453,
 'Hello': 0.6931471805599453,
 'world': 0.6931471805599453}
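To see where values such as 0.0 for 'is' and 0.693... for the other terms can come from, the following is a minimal standalone sketch. It is not the pyhelpers implementation. It assumes that TF is the raw count of a term over all documents and that IDF is log(N / (1 + df)), where N is the number of documents and df is the number of documents containing the term; that formula is an assumption chosen only because it is consistent with the example output above. The name sketch_tfidf and the regex-based tokenisation are hypothetical.

```python
import math
import re
from collections import Counter


def sketch_tfidf(documents, lowercase=True):
    """Illustrative TF-IDF; hypothetical helper, not part of pyhelpers."""
    # Split each document into word tokens, dropping punctuation.
    tokenised = []
    for doc in documents:
        text = doc.lower() if lowercase else doc
        tokenised.append(re.findall(r"\w+", text))

    n_docs = len(tokenised)

    # Assumed TF: raw count of each term over the whole corpus.
    tf = Counter(token for tokens in tokenised for token in tokens)

    # Document frequency: number of documents that contain each term.
    df = Counter(token for tokens in tokenised for token in set(tokens))

    # Assumed IDF: log(N / (1 + df)).
    return {
        term: count * math.log(n_docs / (1 + df[term]))
        for term, count in tf.items()
    }


documents = [
    'This is an apple.',
    'That is a pear.',
    'It is human being.',
    'Hello world!',
]
scores = sketch_tfidf(documents)
print(scores['is'])     # 0.0, since 'is' occurs in 3 of 4 documents: log(4 / 4)
print(scores['apple'])  # 0.6931471805599453, i.e. log(4 / 2)
```

Under these assumptions, a term found in three of the four documents gets an IDF of log(4 / 4) = 0, which cancels its higher count, while a term found in a single document gets log(4 / 2) ≈ 0.693.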