python cli.py <command> [options]
The crawl command crawls tweets from the most popular Twitter users (the accounts with the most followers) and stores them on disk. The list of Twitter users is stored in popular_twitter_users.csv.
| Flag | Name | Description | Default |
|---|---|---|---|
| -o | --output-path | The output path of the crawled dataset. | |
| | --user-limit | The maximum number of accounts to crawl. | 100 |
| | --limit | The maximum number of tweets per account to crawl. | 0 (no limit) |
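The crawler's implementation isn't shown in this README. As a rough illustration only, the sketch below outlines the idea using Tweepy; the screen_name column, the environment variables, and the use of Tweepy itself are assumptions, not details taken from the project.

```python
import os
import pandas as pd
import tweepy

# Hypothetical sketch: the real crawler may use a different library or schema.
auth = tweepy.OAuthHandler(os.environ["API_KEY"], os.environ["API_SECRET"])
auth.set_access_token(os.environ["ACCESS_TOKEN"], os.environ["ACCESS_SECRET"])
api = tweepy.API(auth, wait_on_rate_limit=True)

# popular_twitter_users.csv is assumed to contain a screen_name column.
users = pd.read_csv("popular_twitter_users.csv")["screen_name"].head(100)

rows = []
for name in users:
    # Fetch the most recent tweets for each account (200 is the per-request cap).
    for tweet in api.user_timeline(screen_name=name, count=200, tweet_mode="extended"):
        rows.append({"user": name, "created_at": tweet.created_at, "text": tweet.full_text})

pd.DataFrame(rows).to_csv("twitter_data.csv", index=False)
```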
The tune command determines the best suited hyper-parameter combinations for a given classifier on a given dataset. Only the Perceptron and Decision Tree classifiers are supported.
| Flag | Name | Description | Default |
|---|---|---|---|
| -s | --data-source | The data source that should be used for classifier analysis. Possible values are fth, mp and twitter. | |
| -p | --dataset-path | The path of the dataset that should be used for classifier analysis. | |
| -c | --classifier | The classifier to be analyzed. Possible values are decision_tree and perceptron. | |

The report of the analysis is written to disk (./classifier_optimization_report.log).
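As a conceptual illustration of what such a hyper-parameter analysis involves, the sketch below runs a scikit-learn grid search over a Decision Tree on toy data. The parameter grid, scoring metric, and data are hypothetical; the project defines its own search space and report format.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the real feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hypothetical grid; the project defines its own search space.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

# Exhaustively score every combination with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```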
The evaluate command evaluates the anomaly detection approach using cross-validation.
| Flag | Name | Description | Default |
|---|---|---|---|
| -s | --data-source | The data source that should be used for cross-validation. Possible values are fth, mp and twitter. | |
| -p | --dataset-path | The path of the dataset that should be used for cross-validation. | |
| -c | --classifier | The classifier to be trained. Possible values are decision_tree, one_class_svm, isolation_forest and perceptron. | |
| | --evaluation-rounds | Number of rounds the evaluation is executed. Reduces the variation caused by sampling. | 10 |
| | --no-scaling | Disables feature scaling. | |
| -o | --output-path | The path of the file the results should be written to. | evaluation.xlsx |
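Conceptually, the evaluation resembles repeated cross-validation with optional feature scaling, roughly as sketched below on toy data. The classifier, metric, and number of folds here are assumptions; the project's actual metrics and the evaluation.xlsx report format are not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

scores = []
for round_ in range(10):  # mirrors --evaluation-rounds
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=round_)
    # Scaling is part of the pipeline so it is fit on training folds only;
    # dropping the scaler would mimic --no-scaling.
    model = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
    scores.extend(cross_val_score(model, X, y, cv=cv, scoring="f1"))

print(f"mean F1 over rounds: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```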
# Crawl the 50 most popular users' tweets
python cli.py crawl -o twitter_data.csv --user-limit 50
# Analyze the hyper-parameter combinations for the Decision Tree classifier on the crawled twitter dataset
python cli.py tune -s twitter -c decision_tree -p twitter_data.csv
# Evaluate the performance of the Decision Tree classifier on the crawled twitter dataset
python cli.py evaluate -s twitter -c decision_tree -p twitter_data.csv