text

Manipulation of textual data.

Textual data (pre)processing

remove_punctuation(x[, rm_whitespace])

Removes punctuation from textual data.
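As an illustration of the idea only, here is a minimal standalone sketch. It is not the library's implementation: the name `remove_punctuation_sketch`, the default value of `rm_whitespace`, and the choice to replace punctuation with spaces are all assumptions.

```python
import string


def remove_punctuation_sketch(x, rm_whitespace=True):
    """Hypothetical stand-in: strip ASCII punctuation from a string."""
    # Map every ASCII punctuation character to a space
    table = str.maketrans({ch: " " for ch in string.punctuation})
    y = x.translate(table)
    if rm_whitespace:
        # Assumed meaning of rm_whitespace: collapse runs of whitespace and trim the ends
        y = " ".join(y.split())
    return y


print(remove_punctuation_sketch("Hello, world!  How are you?"))
# Hello world How are you
```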

get_acronym(text[, only_capitals, ...])

Generates an acronym (in capital letters) from textual data.
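A rough sketch of how such an acronym could be built is shown below. The function name `get_acronym_sketch` is hypothetical, and the behaviour assumed for `only_capitals` (keep only letters that are already uppercase) is a guess based on the parameter name.

```python
import re


def get_acronym_sketch(text, only_capitals=False):
    """Hypothetical stand-in: build an uppercase acronym from a phrase."""
    if only_capitals:
        # Assumed behaviour: keep only the letters that are already capitals
        return "".join(ch for ch in text if ch.isupper())
    # Otherwise take the first letter of every alphabetic word and capitalise it
    words = re.findall(r"[A-Za-z]+", text)
    return "".join(word[0].upper() for word in words)


print(get_acronym_sketch("This is an apple."))                      # TIAA
print(get_acronym_sketch("This is an Apple.", only_capitals=True))  # TA
```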

extract_words1upper(x[, join_with])

Extracts words from a string by splitting it at occurrences of uppercase letters.
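One way to split at uppercase letters is a single regular expression, sketched below. The name `extract_words1upper_sketch` is hypothetical, and treating a run of capitals such as "NLP" as one word is an assumption rather than documented behaviour.

```python
import re


def extract_words1upper_sketch(x, join_with=None):
    """Hypothetical stand-in: split a string such as 'NetworkWaymarks' into words."""
    # First alternative: a run of capitals not followed by a lowercase letter ("NLP").
    # Second alternative: an optional capital followed by lowercase letters ("Tools", "split").
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", x)
    # Assumed meaning of join_with: return a joined string instead of a list
    return join_with.join(words) if join_with is not None else words


print(extract_words1upper_sketch("NetworkWaymarks"))                 # ['Network', 'Waymarks']
print(extract_words1upper_sketch("NetworkWaymarks", join_with=" "))  # Network Waymarks
```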

numeral_english_to_arabic(x)

Converts a number written in English words into the equivalent Arabic numerals.
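The conversion can be sketched with a small accumulator that handles units, tens, "hundred" and larger scale words. This is a simplified illustration under stated assumptions: the name `numeral_english_to_arabic_sketch` is hypothetical, only non-negative integers up to the billions are covered, and the result is returned as an `int`.

```python
def numeral_english_to_arabic_sketch(x):
    """Hypothetical stand-in: 'one hundred and one' -> 101."""
    units = {w: i for i, w in enumerate(
        "zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
    tens = {w: 10 * i for i, w in enumerate(
        "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
    scales = {"thousand": 10 ** 3, "million": 10 ** 6, "billion": 10 ** 9}

    total = current = 0
    for word in x.lower().replace("-", " ").split():
        if word == "and":
            continue  # "and" carries no numeric value
        if word in units:
            current += units[word]
        elif word in tens:
            current += tens[word]
        elif word == "hundred":
            current *= 100
        elif word in scales:
            # Close the current group (e.g. "two thousand") and start a new one
            total += current * scales[word]
            current = 0
        else:
            raise ValueError(f"Unrecognised number word: {word!r}")
    return total + current


print(numeral_english_to_arabic_sketch("two thousand three hundred forty-five"))  # 2345
```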

count_words(text[, lowercase, ...])

Counts the occurrences of each word in the given text.
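Word counting of this kind can be sketched with `collections.Counter` from the standard library. The name `count_words_sketch`, the `\w+` tokenisation, and the default for `lowercase` are assumptions made for the illustration.

```python
import re
from collections import Counter


def count_words_sketch(text, lowercase=False):
    """Hypothetical stand-in: map each word in the text to its number of occurrences."""
    if lowercase:
        text = text.lower()
    # Treat every run of word characters as one token (an assumed tokenisation)
    return dict(Counter(re.findall(r"\w+", text)))


print(count_words_sketch("This is an apple. That is a pear.", lowercase=True))
# {'this': 1, 'is': 2, 'an': 1, 'apple': 1, 'that': 1, 'a': 1, 'pear': 1}
```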

calculate_idf(documents[, lowercase, ...])

Calculates Inverse Document Frequency (IDF) for a sequence of textual documents.
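For orientation, the sketch below computes the plain, unsmoothed form of IDF, log(N / df), where N is the number of documents and df is the number of documents containing the term. Whether the library uses this exact variant or a smoothed one is not stated here, so treat the formula, the tokenisation, and the name `calculate_idf_sketch` as assumptions.

```python
import math
import re


def calculate_idf_sketch(documents, lowercase=True):
    """Hypothetical stand-in: unsmoothed IDF, log(N / df), per term."""
    # Reduce each document to the set of distinct tokens it contains
    token_sets = [
        set(re.findall(r"\w+", doc.lower() if lowercase else doc))
        for doc in documents
    ]
    n_docs = len(token_sets)
    vocabulary = set().union(*token_sets)
    return {
        term: math.log(n_docs / sum(term in tokens for tokens in token_sets))
        for term in sorted(vocabulary)
    }


idf = calculate_idf_sketch(["the cat sat", "the dog sat", "the bird flew"])
print(idf["the"])  # 0.0, because "the" appears in every document
```

A term that occurs in every document receives an IDF of zero under this variant, which is why smoothed forms are often preferred in practice.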

calculate_tfidf(documents, **kwargs)

Calculates TF-IDF (Term Frequency-Inverse Document Frequency) for the given textual documents.
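A compact sketch of TF-IDF follows, using raw term counts for TF and the unsmoothed log(N / df) for IDF. Both choices are assumptions, as is the name `calculate_tfidf_sketch`; the library's weighting scheme and its `**kwargs` are not reproduced here.

```python
import math
import re
from collections import Counter


def calculate_tfidf_sketch(documents):
    """Hypothetical stand-in: one {term: tf * idf} dict per document."""
    # Term frequencies: raw counts of each token within a document
    counts = [Counter(re.findall(r"\w+", doc.lower())) for doc in documents]
    n_docs = len(counts)
    # Document frequencies: each Counter yields every distinct term once
    doc_freq = Counter(term for count in counts for term in count)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in count.items()}
        for count in counts
    ]


scores = calculate_tfidf_sketch(["the cat sat on the mat", "the dog"])
print(scores[0]["the"])  # 0.0, because "the" appears in both documents
```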

Textual data analysis

euclidean_distance_between_texts(txt1, txt2)

Computes the Euclidean distance between two sentences.
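One common way to define such a distance is to turn each text into a bag-of-words count vector and take the Euclidean norm of the difference. The sketch below assumes that representation, with lowercasing and `\w+` tokenisation; the name `euclidean_distance_between_texts_sketch` is hypothetical.

```python
import math
import re
from collections import Counter


def euclidean_distance_between_texts_sketch(txt1, txt2):
    """Hypothetical stand-in: Euclidean distance between word-count vectors."""
    c1 = Counter(re.findall(r"\w+", txt1.lower()))
    c2 = Counter(re.findall(r"\w+", txt2.lower()))
    # A Counter returns 0 for missing words, so the union covers both vectors
    return math.sqrt(sum((c1[w] - c2[w]) ** 2 for w in set(c1) | set(c2)))


print(euclidean_distance_between_texts_sketch("This is an apple.", "That is a pear."))
```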

cosine_similarity_between_texts(txt1, txt2)

Calculates the cosine similarity between two sentences.
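Cosine similarity over bag-of-words count vectors can be sketched as the dot product divided by the product of the vector norms. The vector representation, the zero returned for empty input, and the name `cosine_similarity_between_texts_sketch` are assumptions for this illustration.

```python
import math
import re
from collections import Counter


def cosine_similarity_between_texts_sketch(txt1, txt2):
    """Hypothetical stand-in: cosine similarity between word-count vectors."""
    c1 = Counter(re.findall(r"\w+", txt1.lower()))
    c2 = Counter(re.findall(r"\w+", txt2.lower()))
    # Only words present in both texts contribute to the dot product
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumed convention when either text has no words
    return dot / (norm1 * norm2)


print(cosine_similarity_between_texts_sketch("This is an apple.", "That is a pear."))
```

Unlike the Euclidean distance, this measure ignores vector length, so repeating a text several times does not change its similarity to another text.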

find_matched_str(x, lookup_list)

Finds all strings (in a sequence) that match a given string.
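What counts as a "match" is not spelled out here. The sketch below assumes case-insensitive equality, which is only one plausible reading, and uses the hypothetical name `find_matched_str_sketch`.

```python
def find_matched_str_sketch(x, lookup_list):
    """Hypothetical stand-in: all candidates equal to x, ignoring case."""
    target = x.casefold()
    # Keep the candidates in their original form and order
    return [s for s in lookup_list if s.casefold() == target]


print(find_matched_str_sketch("apple", ["abc", "Apple", "APPLE", "pear"]))
# ['Apple', 'APPLE']
```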

find_similar_str(x, lookup_list[, n, ...])

Finds n strings that are similar to x from a sequence of candidates.
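A similar lookup can be sketched with `difflib.get_close_matches` from the Python standard library. Whether the library relies on `difflib` or on another matching engine is not stated here; the name `find_similar_str_sketch`, the `cutoff` parameter, and the case-insensitive comparison are assumptions.

```python
import difflib


def find_similar_str_sketch(x, lookup_list, n=1, cutoff=0.6):
    """Hypothetical stand-in: up to n candidates most similar to x."""
    # Compare in lowercase but return the candidates in their original form
    originals = {s.lower(): s for s in lookup_list}
    hits = difflib.get_close_matches(x.lower(), list(originals), n=n, cutoff=cutoff)
    return [originals[h] for h in hits]


print(find_similar_str_sketch("angle", ["Anglia", "Wales", "Scotland"], n=1))
# ['Anglia']
```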