Manipulation of textual data.

Basic processing of textual data

get_acronym(text[, only_capitals, ...])

Get an acronym (in capital letters) of an input text.
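
As a rough illustration of the idea (not the library's implementation), an acronym can be built from the first letter of each word. The name `acronym_sketch` is hypothetical, and the meaning given to `only_capitals` here (keep only letters that are already uppercase in the input) is an assumption:

```python
import re


def acronym_sketch(text, only_capitals=False):
    """Build an upper-case acronym from `text` (illustrative only)."""
    words = re.findall(r"[A-Za-z]+", text)
    if only_capitals:
        # Assumed behaviour: keep every letter that is already a capital.
        letters = [c for w in words for c in w if c.isupper()]
    else:
        # Default in this sketch: first letter of every word.
        letters = [w[0] for w in words]
    return "".join(letters).upper()


print(acronym_sketch("Inverse document frequency"))
print(acronym_sketch("reStructuredText file", only_capitals=True))
```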

remove_punctuation(x[, rm_whitespace])

Remove punctuation from string-type data.
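
One possible way to do this with the standard library is sketched below. The helper name `strip_punctuation` is hypothetical. Replacing each punctuation mark with a space, and collapsing whitespace when `rm_whitespace` is true, are both assumptions about the intended behaviour:

```python
import string


def strip_punctuation(x, rm_whitespace=False):
    """Replace ASCII punctuation in `x` with spaces (illustrative only)."""
    # Map every character in string.punctuation to a single space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = x.translate(table)
    if rm_whitespace:
        # Assumed meaning: collapse whitespace runs and trim both ends.
        text = " ".join(text.split())
    return text


print(strip_punctuation("Hello, world!", rm_whitespace=True))
```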

extract_words1upper(x[, join_with])

Extract words from a string by splitting it at each occurrence of an uppercase letter.
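
A minimal sketch of this kind of split, using a regular expression. The name `split_at_uppercase` is hypothetical, and returning a list when `join_with` is `None` is an assumption:

```python
import re


def split_at_uppercase(x, join_with=None):
    """Split `x` before every uppercase letter (illustrative only)."""
    # Each piece starts at a letter and runs until the next uppercase letter.
    words = re.findall(r"[a-zA-Z][^A-Z]*", x)
    if join_with is None:
        return words
    return join_with.join(words)


print(split_at_uppercase("NetworkWaymarks"))
print(split_at_uppercase("NetworkWaymarks", join_with=" "))
```

Note that a run of capitals such as "URL" is split into single letters by this pattern.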

Comparison of textual data

find_matched_str(x, lookup_list)

Find all strings in a sequence of strings that match a given string.
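
The matching rule is not spelled out in this summary. The sketch below assumes a case-insensitive whole-string comparison, and the name `matched_strings` is hypothetical:

```python
def matched_strings(x, lookup_list):
    """Return the items of `lookup_list` equal to `x`, ignoring case."""
    target = x.casefold()
    return [s for s in lookup_list if s.casefold() == target]


print(matched_strings("apple", ["Apple", "apple pie", "APPLE", "banana"]))
```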

find_similar_str(x, lookup_list[, n, ...])

From a sequence of strings, find n strings that are similar to x.
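
A similar lookup can be sketched with the standard library's `difflib.get_close_matches`. Whether the function documented here uses `difflib` is not stated, so treat the following as one possible approach. The name `similar_strings` and the `cutoff` parameter are assumptions:

```python
import difflib


def similar_strings(x, lookup_list, n=1, cutoff=0.6):
    """Return up to `n` items of `lookup_list` that resemble `x`."""
    # get_close_matches ranks candidates by SequenceMatcher ratio and
    # drops anything scoring below `cutoff`. The comparison is case-sensitive.
    return difflib.get_close_matches(x, lookup_list, n=n, cutoff=cutoff)


print(similar_strings("appel", ["apple", "ape", "banana"], n=1))
```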

Basic computation of textual data


Count the occurrences of each distinct word.
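
Word counting of this kind can be sketched with `collections.Counter`. The name `word_counts`, the lower-casing step, and the tokenisation rule are assumptions made for illustration:

```python
import re
from collections import Counter


def word_counts(raw_txt):
    """Count how often each lower-cased word occurs in `raw_txt`."""
    tokens = re.findall(r"[a-z0-9']+", raw_txt.lower())
    return dict(Counter(tokens))


print(word_counts("The cat saw the other cat."))
```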

calculate_idf(raw_documents[, rm_punc])

Calculate inverse document frequency.
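
Several IDF variants are in common use, and this summary does not say which one is applied. The sketch below assumes the plain form idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The name `idf_sketch` and the tokenisation are hypothetical:

```python
import math
import re


def idf_sketch(raw_documents):
    """Return {term: log(N / df(term))} for a list of raw text documents."""
    # One set of distinct lower-cased tokens per document.
    docs = [set(re.findall(r"\w+", d.lower())) for d in raw_documents]
    n_docs = len(docs)
    doc_freq = {}
    for tokens in docs:
        for t in tokens:
            doc_freq[t] = doc_freq.get(t, 0) + 1
    return {t: math.log(n_docs / df) for t, df in doc_freq.items()}


idf = idf_sketch(["the cat", "the dog"])
print(idf["the"], idf["cat"])
```

A term that occurs in every document gets an IDF of zero under this variant.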

calculate_tf_idf(raw_documents[, rm_punc])

Calculate term frequency–inverse document frequency.
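
A minimal TF-IDF sketch follows, assuming raw counts for term frequency and log(N / df) for inverse document frequency. Other weightings exist and the one used by the function above is not stated. The name `tf_idf_sketch` is hypothetical:

```python
import math
import re
from collections import Counter


def tf_idf_sketch(raw_documents):
    """Return one {term: tf * idf} dictionary per document."""
    tokenised = [re.findall(r"\w+", d.lower()) for d in raw_documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
    return [
        {t: count * math.log(n_docs / doc_freq[t])
         for t, count in Counter(tokens).items()}
        for tokens in tokenised
    ]


scores = tf_idf_sketch(["the cat sat", "the dog sat down", "the cat"])
print(scores[1]["dog"])
```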

euclidean_distance_between_texts(txt1, txt2)

Compute the Euclidean distance between two sentences.
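
How the sentences are turned into vectors is not described here. The sketch below assumes simple bag-of-words count vectors over the combined vocabulary, and the name `euclidean_text_distance` is hypothetical:

```python
import math
import re
from collections import Counter


def euclidean_text_distance(txt1, txt2):
    """Euclidean distance between word-count vectors of two texts."""
    c1 = Counter(re.findall(r"\w+", txt1.lower()))
    c2 = Counter(re.findall(r"\w+", txt2.lower()))
    vocab = set(c1) | set(c2)
    # A Counter returns 0 for words it has not seen.
    return math.sqrt(sum((c1[w] - c2[w]) ** 2 for w in vocab))


print(euclidean_text_distance("the cat", "the dog"))
```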

cosine_similarity_between_texts(txt1, txt2)

Calculate the cosine similarity between two sentences.
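
Under the same bag-of-words assumption (word-count vectors, which may differ from what the function above actually uses), cosine similarity can be sketched as the dot product divided by the product of the vector norms. The name `cosine_text_similarity` is hypothetical:

```python
import math
import re
from collections import Counter


def cosine_text_similarity(txt1, txt2):
    """Cosine similarity between word-count vectors of two texts."""
    c1 = Counter(re.findall(r"\w+", txt1.lower()))
    c2 = Counter(re.findall(r"\w+", txt2.lower()))
    # Only shared words contribute to the dot product.
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        # Choice made for this sketch: empty text is similar to nothing.
        return 0.0
    return dot / (norm1 * norm2)


print(cosine_text_similarity("the cat", "the dog"))
```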

Transformation of textual data

convert_md_to_rst(path_to_md, path_to_rst[, ...])

Convert a Markdown file (.md) to a reStructuredText (.rst) file.
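
A comparable conversion can be done by calling the Pandoc command-line tool, as sketched below. This assumes Pandoc is installed and on the PATH. Whether the function above relies on Pandoc is not stated here, and the helper names `build_pandoc_command` and `convert_with_pandoc` are hypothetical:

```python
import subprocess


def build_pandoc_command(path_to_md, path_to_rst):
    """Return the Pandoc argument list for a Markdown-to-RST conversion."""
    return ["pandoc", "-f", "markdown", "-t", "rst",
            "-o", str(path_to_rst), str(path_to_md)]


def convert_with_pandoc(path_to_md, path_to_rst):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(path_to_md, path_to_rst), check=True)


print(build_pandoc_command("README.md", "README.rst"))
```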