This repository contains the source code of AURORA and the datasets used to replicate the experimental results of our paper accepted at MODELS'19:
Automated Classification of Metamodel Repositories: A Machine Learning Approach
Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Alfonso Pierantonio(1), Ludovico Iovino(2)
(1) Università degli Studi dell'Aquila, Italy
(2) Gran Sasso Science Institute, Italy
Manual classification of metamodel repositories requires highly trained personnel, and the results are usually affected by the subjectivity of human perception. Automated metamodel classification is therefore highly desirable. In this work, we apply machine learning techniques to automatically classify metamodels. In particular, we implement a tool on top of a feed-forward neural network. An experimental evaluation on a dataset of 555 metamodels demonstrates that the technique learns from manually classified data and effectively categorizes incoming unlabeled data with a considerably high prediction rate: the best configuration achieves a success rate of 95.40%, a precision of 0.945, a recall of 0.938, and an F1 score of 0.942.
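To make the classification step concrete, the sketch below shows a feed-forward network trained on labelled feature vectors and evaluated with the same metrics reported above. It uses scikit-learn's MLPClassifier as a stand-in, not the exact AURORA implementation; the hidden-layer size, the 80/20 split, the weighted averaging, and the `evaluate` helper are illustrative assumptions.

```python
# Minimal sketch of the classification step: a feed-forward network over
# per-metamodel feature vectors, evaluated with accuracy/precision/recall/F1.
# This is NOT the exact AURORA implementation; sizes and split are illustrative.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(X, y):
    """X: (n_metamodels, n_terms) feature matrix, y: category labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)  # illustrative size
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted")
    return accuracy_score(y_test, y_pred), precision, recall, f1
```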
This repository is organized as follows:
- The TOOLS directory contains the implementation of the different tools we developed:
- TERM-EXTRACTOR: The Java implementation of the term extractor for metamodels (a Python approximation is sketched after this list);
- TDM_ENCODER: A set of Python scripts for computing TDMs, i.e., term-document matrices (see the encoding sketch after this list);
- NEURAL-NETWORKS: This tool classifies metamodels based on the TDM values and the training set.
- The DATASET directory contains the datasets described in the paper that we use to evaluate AURORA:
- NORMALIZED_MM_REPRESENTATION: plain-text documents that represent the metamodels;
- TDMS: the TDMs extracted from NORMALIZED_MM_REPRESENTATION.
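The TERM-EXTRACTOR in TOOLS/ is implemented in Java; the snippet below is only a rough Python approximation of that step, assuming Ecore metamodels serialized as XMI and that the extracted terms are the `name` attributes of model elements. The file names and the `extract_terms` helper are hypothetical.

```python
# Rough Python approximation of the term-extraction step (the real
# TERM-EXTRACTOR is written in Java). It assumes an Ecore (.ecore, XMI)
# input and collects the `name` attribute of every element as a term.
import xml.etree.ElementTree as ET

def extract_terms(ecore_path):
    root = ET.parse(ecore_path).getroot()
    terms = []
    for element in root.iter():
        name = element.get("name")
        if name:
            terms.append(name.lower())
    return terms

# Hypothetical usage: write one plain-text document per metamodel,
# mirroring DATASET/NORMALIZED_MM_REPRESENTATION.
# with open("library.txt", "w") as out:
#     out.write(" ".join(extract_terms("library.ecore")))
```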
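The next sketch illustrates the TDM encoding step, assuming one plain-text document per metamodel (as in DATASET/NORMALIZED_MM_REPRESENTATION) and raw term counts computed with scikit-learn's CountVectorizer; the actual TDM_ENCODER scripts may preprocess or weight terms differently.

```python
# Minimal sketch of the TDM_ENCODER step: turn the plain-text documents into
# a term-document matrix. The directory layout and the use of raw term counts
# are assumptions, not a description of the exact scripts in the repository.
import glob, os
from sklearn.feature_extraction.text import CountVectorizer

def build_tdm(doc_dir):
    paths = sorted(glob.glob(os.path.join(doc_dir, "*.txt")))
    docs = [open(p, encoding="utf-8").read() for p in paths]
    vectorizer = CountVectorizer()
    tdm = vectorizer.fit_transform(docs)  # documents x terms, sparse matrix
    return tdm, vectorizer.get_feature_names_out(), paths
```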
The name AURORA has a nice connotation: an aurora (the northern lights) consists of distinctive bands of moving, colorful lights, which somehow evoke separate metamodel categories. Furthermore, in Italian aurora means "the light of a new day."
The following dataset was used in our evaluation. However, we do not redistribute its data; we only mine it to produce the metadata that serves as input for AURORA.