calculate_tf_idf
- pyhelpers.text.calculate_tf_idf(raw_documents, rm_punc=False)
Calculate term frequency–inverse document frequency (tf-idf).
- Parameters
raw_documents (Iterable or Sequence) – a sequence of textual data
rm_punc (bool) – whether to remove punctuation from the input textual data; defaults to False
- Returns
tf-idf value of each token in the input textual data
- Return type
dict
Examples:
>>> from pyhelpers.text import calculate_tf_idf

>>> raw_doc = [
...     'This is an apple.',
...     'That is a pear.',
...     'It is human being.',
...     'Hello world!']

>>> docs_tf_idf_ = calculate_tf_idf(raw_documents=raw_doc)
>>> docs_tf_idf_
{'This': 0.6931471805599453,
 'is': 0.0,
 'an': 0.6931471805599453,
 'apple': 0.6931471805599453,
 '.': 0.0,
 'That': 0.6931471805599453,
 'a': 0.6931471805599453,
 'pear': 0.6931471805599453,
 'It': 0.6931471805599453,
 'human': 0.6931471805599453,
 'being': 0.6931471805599453,
 'Hello': 0.6931471805599453,
 'world': 0.6931471805599453,
 '!': 0.6931471805599453}

>>> docs_tf_idf_ = calculate_tf_idf(raw_documents=raw_doc, rm_punc=True)
>>> docs_tf_idf_
{'This': 0.6931471805599453,
 'is': 0.0,
 'an': 0.6931471805599453,
 'apple': 0.6931471805599453,
 'That': 0.6931471805599453,
 'a': 0.6931471805599453,
 'pear': 0.6931471805599453,
 'It': 0.6931471805599453,
 'human': 0.6931471805599453,
 'being': 0.6931471805599453,
 'Hello': 0.6931471805599453,
 'world': 0.6931471805599453}
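
Note: The values in the example are consistent with a smoothed inverse document frequency, idf = ln(N / (1 + df)), multiplied by the token's count within a document, where N is the number of documents and df is the number of documents containing the token. With N = 4, a token found in one document scores ln(4 / 2) ≈ 0.693, and 'is' and '.', each found in three documents, score ln(4 / 4) = 0. This formula is inferred from the output only and is not confirmed against the library's source. The sketch below uses only the standard library; sketch_tf_idf and its tokenising regular expressions are hypothetical and not part of pyhelpers.

```python
import math
import re
from collections import Counter


def sketch_tf_idf(raw_documents, rm_punc=False):
    """Illustrative tf-idf; NOT the pyhelpers implementation."""
    # Assumption: words and single punctuation marks are separate tokens;
    # with rm_punc=True only word tokens are kept.
    pattern = r"\w+" if rm_punc else r"\w+|[^\w\s]"

    # Term frequency: raw token counts per document.
    docs_tf = [Counter(re.findall(pattern, doc)) for doc in raw_documents]
    n_docs = len(docs_tf)

    # Document frequency: number of documents in which each token occurs.
    doc_freq = Counter(token for tf in docs_tf for token in tf)

    # Assumption: idf = ln(N / (1 + df)). If a token occurs in several
    # documents, the value from the last such document is kept.
    tf_idf = {}
    for tf in docs_tf:
        for token, count in tf.items():
            tf_idf[token] = count * math.log(n_docs / (1 + doc_freq[token]))
    return tf_idf
```

Under these assumptions, sketch_tf_idf(raw_doc) gives ln 2 for tokens that occur in a single document and 0.0 for 'is' and '.', and passing rm_punc=True drops the punctuation keys.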