text

Manipulation of textual data.

Textual data (pre)processing

remove_punctuation(x[, rm_whitespace])

Remove punctuation and optionally whitespace from textual data.
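
As an illustration of the idea only, the following minimal sketch strips ASCII punctuation with the standard library. The name `remove_punctuation_sketch` is hypothetical, and the assumption that `rm_whitespace` collapses and trims whitespace is a guess at the behaviour, not the library's documented implementation.

```python
import string

# Translation table mapping each ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def remove_punctuation_sketch(x, rm_whitespace=False):
    """Hypothetical stand-in: replace punctuation with spaces."""
    text = x.translate(_PUNCT_TO_SPACE)
    if rm_whitespace:
        # Assumed behaviour: collapse whitespace runs and trim both ends.
        text = " ".join(text.split())
    return text


print(remove_punctuation_sketch("Hello, world!", rm_whitespace=True))  # Hello world
```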

get_acronym(text[, only_capitals, ...])

Generate an acronym (in capital letters) from textual data.
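
One plausible way to build such an acronym is sketched below. The function name is hypothetical, and the reading of `only_capitals` (keep only letters that are already uppercase) is an assumption rather than the documented semantics.

```python
import re


def get_acronym_sketch(text, only_capitals=False):
    """Hypothetical stand-in: build an uppercase acronym from text."""
    if only_capitals:
        # Assumed meaning: keep only the letters that are already uppercase.
        return "".join(ch for ch in text if ch.isupper())
    # Otherwise take the first letter of every alphabetic word.
    words = re.findall(r"[A-Za-z]+", text)
    return "".join(word[0].upper() for word in words)


print(get_acronym_sketch("natural language processing"))  # NLP
```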

extract_words1upper(x[, join_with])

Extract words from a string by splitting it at occurrences of uppercase letters.
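
The splitting rule can be sketched with a regular expression, as below. This is not the library's code: the function name is hypothetical, runs of capitals are treated as one word (an assumption), and digits are simply dropped.

```python
import re

# A word is either a run of capitals not followed by a lowercase letter
# (e.g. "HTTP"), or an optional capital followed by lowercase letters.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def extract_words1upper_sketch(x, join_with=None):
    """Hypothetical stand-in: split a string at uppercase letters."""
    words = _WORD.findall(x)
    # Return a list by default, or one joined string if a separator is given.
    return words if join_with is None else join_with.join(words)


print(extract_words1upper_sketch("NetworkWaymarks"))       # ['Network', 'Waymarks']
print(extract_words1upper_sketch("NetworkWaymarks", " "))  # Network Waymarks
```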

numeral_english_to_arabic(x)

Convert a number written in English words into its equivalent value in Arabic numerals.
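
To make the conversion concrete, here is a small sketch that handles whole numbers below one billion. The function name, the supported vocabulary, and the choice to raise `ValueError` on unknown words are all assumptions made for illustration.

```python
UNITS = {word: i for i, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * i for i, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def numeral_english_to_arabic_sketch(x):
    """Hypothetical stand-in: parse e.g. 'twenty-three' into 23."""
    total = current = 0
    for word in x.lower().replace("-", " ").split():
        if word == "and":
            continue  # filler word, as in "one hundred and five"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            # Close the current group and start a new one.
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"Unrecognised numeral word: {word!r}")
    return total + current


print(numeral_english_to_arabic_sketch("one hundred and twenty-three"))  # 123
```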

count_words(text[, lowercase, ...])

Count the occurrences of each word in the given text.
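
Word counting of this kind is commonly done with `collections.Counter`. The sketch below assumes a simple tokenisation rule (runs of letters, digits, and apostrophes); the function name and that rule are illustrative, not taken from the library.

```python
import re
from collections import Counter


def count_words_sketch(text, lowercase=True):
    """Hypothetical stand-in: map each word to its number of occurrences."""
    if lowercase:
        text = text.lower()
    # Assumed tokenisation: runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    return dict(Counter(tokens))


print(count_words_sketch("The cat saw the other cat."))
# {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}
```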

calculate_idf(documents[, lowercase, ...])

Calculate Inverse Document Frequency (IDF) for a sequence of textual documents.
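
IDF has several common variants. The sketch below uses the plain form, idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. Whether the library uses this variant or a smoothed one is not stated here, so treat the formula, the tokeniser, and the function names as assumptions.

```python
import math
import re


def _tokenize(document):
    # Assumed tokenisation: lowercase runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", document.lower())


def calculate_idf_sketch(documents):
    """Hypothetical stand-in: unsmoothed IDF, ln(N / df), per term."""
    n_docs = len(documents)
    doc_freq = {}
    for document in documents:
        # Count each term at most once per document.
        for term in set(_tokenize(document)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


idf = calculate_idf_sketch(["the cat sat", "the dog barked"])
print(idf["the"])  # 0.0, because "the" occurs in every document
```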

calculate_tfidf(documents, **kwargs)

Calculate TF-IDF (Term Frequency-Inverse Document Frequency) for the given textual documents.
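
As a rough sketch of the computation, the example below multiplies raw term counts by an unsmoothed IDF, ln(N / df). Other weighting schemes (normalised TF, smoothed IDF) are equally common, so the scheme, the tokeniser, and the function name are assumptions rather than the library's definition.

```python
import math
import re
from collections import Counter


def calculate_tfidf_sketch(documents):
    """Hypothetical stand-in: one {term: tf * idf} dict per document."""
    # Assumed tokenisation: lowercase runs of letters, digits, apostrophes.
    tokenized = [re.findall(r"[a-z0-9']+", doc.lower()) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    # Term frequency here is the raw count within the document.
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


scores = calculate_tfidf_sketch(["the cat sat on the mat", "the dog barked"])
print(scores[0]["the"])  # 0.0, a term found in every document carries no weight
```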

Textual data analysis

euclidean_distance_between_texts(txt1, txt2)

Compute the Euclidean distance between two sentences.
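
One common way to define a distance between texts is to compare their bag-of-words count vectors. The sketch below takes that approach; the vectorisation, the tokeniser, and the function name are assumptions, and the library may represent texts differently.

```python
import math
import re
from collections import Counter


def euclidean_distance_sketch(txt1, txt2):
    """Hypothetical stand-in: Euclidean distance between word-count vectors."""
    counts1 = Counter(re.findall(r"[a-z0-9']+", txt1.lower()))
    counts2 = Counter(re.findall(r"[a-z0-9']+", txt2.lower()))
    vocabulary = set(counts1) | set(counts2)
    # A Counter returns 0 for words it has not seen.
    return math.sqrt(sum((counts1[w] - counts2[w]) ** 2 for w in vocabulary))


# Only "is" is shared; six words differ by one count each, giving sqrt(6).
print(euclidean_distance_sketch("This is an apple.", "That is a pear."))
```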

cosine_similarity_between_texts(txt1, txt2)

Calculate the cosine similarity between two sentences.
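
Cosine similarity can be sketched over the same kind of bag-of-words vectors: the dot product of the two count vectors divided by the product of their lengths. The vectorisation, the handling of empty input (returning 0.0), and the function name are assumptions for illustration.

```python
import math
import re
from collections import Counter


def cosine_similarity_sketch(txt1, txt2):
    """Hypothetical stand-in: cosine similarity of word-count vectors."""
    counts1 = Counter(re.findall(r"[a-z0-9']+", txt1.lower()))
    counts2 = Counter(re.findall(r"[a-z0-9']+", txt2.lower()))
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumed convention when a text has no tokens
    return dot / (norm1 * norm2)


# One shared word out of four in each text: 1 / (2 * 2).
print(cosine_similarity_sketch("This is an apple.", "That is a pear."))  # 0.25
```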

find_matched_str(x, lookup_list)

Find all strings in a sequence that match a given string.
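
What counts as a "match" is not spelled out in this summary. The sketch below assumes a whole-string, case-insensitive comparison; the function name and the `ignore_case` parameter are hypothetical and not part of the signature shown above.

```python
def find_matched_str_sketch(x, lookup_list, ignore_case=True):
    """Hypothetical stand-in: return every item equal to x."""

    def normalise(s):
        # casefold() is a more aggressive lower() intended for comparisons.
        return s.casefold() if ignore_case else s

    target = normalise(x)
    return [item for item in lookup_list if normalise(item) == target]


print(find_matched_str_sketch("apple", ["Apple", "apple pie", "APPLE", "pear"]))
# ['Apple', 'APPLE']
```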

find_similar_str(x, lookup_list[, n, ...])

Find n strings that are similar to x from a sequence of candidates.
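
Fuzzy lookup of this kind can be approximated with the standard library's `difflib.get_close_matches`. The wrapper below is a sketch only: its name and the `cutoff` parameter are assumptions, and the library may use a different similarity measure.

```python
import difflib


def find_similar_str_sketch(x, lookup_list, n=1, cutoff=0.6):
    """Hypothetical stand-in: the n candidates most similar to x."""
    # get_close_matches ranks candidates by SequenceMatcher ratio and
    # drops any candidate scoring below the cutoff.
    return difflib.get_close_matches(x, lookup_list, n=n, cutoff=cutoff)


candidates = ["ape", "apple", "peach", "puppy"]
print(find_similar_str_sketch("appel", candidates, n=2))  # ['apple', 'ape']
```

Results come back best match first, and fewer than n items are returned when too few candidates reach the cutoff.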