Manipulation of textual data.

Textual data (pre)processing

get_acronym(text[, only_capitals, ...])

Get an acronym (in capital letters) of an input text.

remove_punctuation(x[, rm_whitespace])

Remove punctuation from string-type data.
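
As a rough illustration of the idea, punctuation can be stripped with the standard library alone. The helper name `strip_punctuation` and its `collapse_whitespace` flag are hypothetical and are not the library's implementation or parameters.

```python
import string


def strip_punctuation(text, collapse_whitespace=False):
    # Hypothetical helper for illustration only.
    # Delete every ASCII punctuation character listed in string.punctuation.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    if collapse_whitespace:
        # Assumption: squeeze runs of whitespace into single spaces and trim the ends.
        cleaned = " ".join(cleaned.split())
    return cleaned
```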

extract_words1upper(x[, join_with])

Extract words from a string by splitting it at each occurrence of an uppercase letter.
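
A minimal sketch of this kind of split, using a regular expression. The name `split_at_uppercase` is hypothetical, and the behaviour for runs of capitals (each capital starts a new piece) is an assumption rather than the library's documented behaviour.

```python
import re


def split_at_uppercase(text, join_with=None):
    # Hypothetical helper for illustration only.
    # Each piece is either an uppercase letter followed by non-uppercase
    # characters, or a leading run of non-uppercase characters.
    pieces = re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", text)
    if join_with is None:
        return pieces
    return join_with.join(pieces)
```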


Convert a number written in English words into its equivalent numerical value represented in Arabic numerals.
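
The conversion can be sketched as a single pass over the words that accumulates a running group and flushes it at each scale word. The function `words_to_number` and its lookup tables are hypothetical. The sketch assumes well-formed input up to the millions and is not the library's implementation.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text):
    # Hypothetical helper for illustration only.
    total, current = 0, 0
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue  # "and" carries no numeric value
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            # A bare "hundred" is read as "one hundred".
            current = max(current, 1) * 100
        elif word in SCALES:
            # Flush the current group into the total at this scale.
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"Unrecognised number word: {word!r}")
    return total + current
```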


Count the number of occurrences of each distinct word.
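
A word count of this kind can be sketched with `collections.Counter`. The helper name `count_words_sketch` and the lower-casing and whitespace tokenisation are assumptions made for illustration.

```python
from collections import Counter


def count_words_sketch(raw_text):
    # Hypothetical helper for illustration only.
    # Assumption: words are lower-cased and separated by whitespace.
    return dict(Counter(raw_text.lower().split()))
```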

calculate_idf(raw_documents[, rm_punc])

Calculate inverse document frequency.
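
For reference, a common unsmoothed definition is idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The sketch below assumes that definition and whitespace tokenisation. The library may use a different variant (for example, with smoothing), and `inverse_document_frequency` is a hypothetical name.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    # Hypothetical helper for illustration only.
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}
```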

calculate_tf_idf(raw_documents[, rm_punc])

Calculate term frequency–inverse document frequency (TF-IDF).
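
One way to picture TF-IDF: each term's raw count in a document is weighted by log(N / df(t)), so terms that appear in every document score zero. The sketch below assumes raw counts as the term frequency and the unsmoothed logarithm. `tf_idf_sketch` is a hypothetical name, and the library's weighting may differ.

```python
import math
from collections import Counter


def tf_idf_sketch(documents):
    # Hypothetical helper for illustration only.
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))

    # One {term: score} mapping per document.
    scores = []
    for tokens in tokenised:
        term_freq = Counter(tokens)
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores
```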

Textual data analysis

euclidean_distance_between_texts(txt1, txt2)

Compute the Euclidean distance between two sentences.
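
A distance between two texts needs a vector representation first. The sketch below assumes bag-of-words count vectors over the combined vocabulary, which may not be the representation the library uses. `euclidean_distance_sketch` is a hypothetical name.

```python
import math
from collections import Counter


def euclidean_distance_sketch(txt1, txt2):
    # Hypothetical helper for illustration only.
    counts1 = Counter(txt1.lower().split())
    counts2 = Counter(txt2.lower().split())
    vocabulary = set(counts1) | set(counts2)
    # A Counter returns 0 for words it has not seen.
    return math.sqrt(sum((counts1[w] - counts2[w]) ** 2 for w in vocabulary))
```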

cosine_similarity_between_texts(txt1, txt2)

Calculate the cosine similarity between two sentences.
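
Cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below again assumes bag-of-words count vectors, and it returns 0.0 for an empty text by convention. `cosine_similarity_sketch` is a hypothetical name, not the library's implementation.

```python
import math
from collections import Counter


def cosine_similarity_sketch(txt1, txt2):
    # Hypothetical helper for illustration only.
    counts1 = Counter(txt1.lower().split())
    counts2 = Counter(txt2.lower().split())
    # Only words shared by both texts contribute to the dot product.
    dot = sum(counts1[w] * counts2[w] for w in set(counts1) & set(counts2))
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: an empty text is similar to nothing
    return dot / (norm1 * norm2)
```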

find_matched_str(x, lookup_list)

Find all strings in a sequence that match a given string.

find_similar_str(x, lookup_list[, n, ...])

Find n strings that are similar to x from among a sequence of candidates.
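
A comparable lookup is available in the standard library through `difflib.get_close_matches`. The wrapper below is a hypothetical sketch of the idea and makes no claim about which matching engine the library uses.

```python
from difflib import get_close_matches


def find_similar_sketch(x, lookup_list, n=1, cutoff=0.6):
    # Hypothetical helper for illustration only.
    # Returns up to n candidates, best match first; candidates scoring
    # below the cutoff ratio are dropped.
    return get_close_matches(x, lookup_list, n=n, cutoff=cutoff)
```

For example, `find_similar_sketch("appel", ["ape", "apple", "peach", "puppy"])` ranks "apple" first.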