This Python package extracts stylometric features from a text document, following the paper "Writeprints: A Stylometric Approach to Identity-Level Identification and Similarity Detection in Cyberspace".
The code was adapted from the Extended-Writeprints repository.
To install from PyPI, run the following command:
pip install writeprints
To install manually from the GitHub repository, clone the repository, change into its directory, and run:
pip install ./
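After either install route, it can be handy to confirm that the package is visible to the current Python environment. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of writeprints, and it assumes the distribution is registered under the name `writeprints`, as in the pip command above.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent.

    Hypothetical helper for illustration; not part of the writeprints package.
    """
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the name used with pip.
version = installed_version("writeprints")
print("writeprints version:", version if version else "not installed")
```

If this prints "not installed", the pip command most likely ran against a different interpreter or virtual environment.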
To extract features from a single text document held in a Python string:
from writeprints.text_processor import Processor
processor = Processor(flatten=False)  # Setting flatten=True splits vectorized features into individual features
features = processor.extract(string)
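To make the `flatten` option more concrete, here is a rough, library-independent sketch of what splitting a vectorized feature into individual features could look like. It does not call writeprints, and the feature names, the dictionary layout, and the `name_index` key scheme are all assumptions made for illustration; the actual output structure of `Processor.extract` may differ.

```python
def flatten_features(features):
    """Expand list-valued features into one scalar entry per element.

    Illustrative only: the real writeprints output format is not assumed here.
    """
    flat = {}
    for name, value in features.items():
        if isinstance(value, (list, tuple)):
            # One key per vector element, e.g. "letter_frequency_0".
            for index, item in enumerate(value):
                flat[f"{name}_{index}"] = item
        else:
            flat[name] = value
    return flat


# Hypothetical feature names and values, chosen only for the example.
nested = {"total_words": 12, "letter_frequency": [0.25, 0.5, 0.25]}
flat = flatten_features(nested)
print(flat)
```

A flat layout of this kind is usually easier to feed into tabular tools, while the nested form keeps related values grouped together.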
To extract features from a pandas DataFrame in which a column named "text" contains the text documents:
from writeprints.text_processor import Processor
processor = Processor(flatten=False)  # Setting flatten=True splits vectorized features into individual features
features = processor.extract_df(df)
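If the documents start out as a plain list of strings, one possible way to get them into the expected shape is sketched below. The helpers `build_text_rows` and `to_dataframe` are hypothetical names, not part of writeprints, and dropping empty or whitespace-only documents is an assumption about what is useful, not a requirement stated by the package.

```python
def build_text_rows(documents):
    """Turn an iterable of strings into row dicts with a single "text" key.

    Assumption: empty or whitespace-only documents are skipped.
    """
    return [{"text": doc.strip()} for doc in documents if doc and doc.strip()]


def to_dataframe(rows):
    """Build a DataFrame with one "text" column from the row dicts."""
    # pandas is imported here so the row helper above works without it.
    import pandas as pd

    return pd.DataFrame(rows, columns=["text"])


documents = ["First sample document.", "   ", "Second sample document."]
rows = build_text_rows(documents)
print(rows)

# With pandas and writeprints installed, the rows could then be passed on:
# df = to_dataframe(rows)
# features = processor.extract_df(df)
```

For an existing DataFrame that stores the documents under a different column name, renaming that column to "text" with `df.rename(columns={...})` should serve the same purpose.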
License: MIT