Stylometry Authentication Datasets
ISOT Twitter Dataset
The dataset consists of a file named "data.zip", which contains 100 files, one for each of the 100 authors.
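As a quick sanity check after downloading, the archive can be inspected to confirm the number of per-author files. The following is a minimal sketch using only the Java standard library; the class name `ListAuthorFiles` and the default path "data.zip" in the working directory are assumptions made for illustration, and nothing is assumed about how the files inside the archive are named.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of the dataset distribution.
class ListAuthorFiles {

    // Returns the names of all non-directory entries in the given zip archive.
    static List<String> listEntries(Path zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location: pass a path as the first argument to override it.
        Path zipPath = Paths.get(args.length > 0 ? args[0] : "data.zip");
        List<String> names = listEntries(zipPath);
        System.out.println("Files found: " + names.size());
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

If the archive matches the description above, the reported count should be 100.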
The file "TweetCrawling.zip" contains Java source code that retrieves the JSON structure for a given Tweet ID.
Data Preprocessing
The file "Canonicizers_.java" contains some of the filters (e.g., stopword removal) used in our research.
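To illustrate what a stopword filter of this kind does, here is a minimal sketch in Java. It is not the code from "Canonicizers_.java": the class name `StopwordFilterSketch`, the method `removeStopwords`, the whitespace tokenization, and the tiny stopword list are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of a stopword filter; not the dataset's own code.
class StopwordFilterSketch {

    // Illustrative stopword list only; a real list would be much longer.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "to", "and"));

    // Splits the text on whitespace and drops tokens that are stopwords,
    // comparing case-insensitively while keeping the original casing.
    static List<String> removeStopwords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens when the input is empty or all whitespace
            }
            if (!STOPWORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopwords("The cat is on the mat"));
    }
}
```

Note that stopword removal discards function words, which are often informative features in stylometry, so whether to apply such a filter depends on the feature set being extracted.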