
Architecture

Linda edited this page Aug 3, 2023 · 40 revisions

Main components

"Let(')s audit Learning Analytics" (LaLA) consists mainly of the following classes:

[Figure: laaudit classes. LaLA class diagram exported from PHPStorm.]

These components, their relationships, and how they interact with the Moodle Learning Analytics (LA) system are described below.

Model configurations and versions

The original Moodle model (see the Moodle LA API diagram) is re-interpreted by LaLA in two parts: The model configuration (class model_configuration) and the model versions that can be produced with this configuration (class model_version).

Model configuration

Upon first access to the plugin page, for each existing Moodle model a LaLA model configuration is automatically created and stored in the database. The logic of creating model configurations is currently handled by the model_configurations class.

⚠️ LaLA ignores static models that do not use machine learning.

The model configuration saves a loose reference to the Moodle model and copies its properties and settings: target, predictions processor, analysis interval type and indicators. If some properties are not set for the Moodle model, meaningful defaults are chosen. The currently set context IDs that limit the scope of the Moodle model are stored as the default context IDs in the model configuration.
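The copy-with-defaults behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the plugin's PHP code: the field names, the dictionary layout of the Moodle model, and the placeholder default values are all assumptions, since this page does not state which defaults LaLA actually chooses.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Placeholder fallbacks (assumption): the real default values are not given on this page.
DEFAULTS = {
    "predictionsprocessor": "<default processor>",
    "analysisinterval": "<default interval>",
    "indicators": [],
}


@dataclass(frozen=True)
class ModelConfiguration:
    modelid: int                 # loose reference to the Moodle model
    target: str
    predictionsprocessor: str
    analysisinterval: str
    indicators: tuple
    defaultcontextids: tuple     # context IDs currently set on the Moodle model


def configuration_from_moodle_model(model: Dict[str, Any]) -> ModelConfiguration:
    """Copy the model's settings, falling back to a default where one is unset."""
    def pick(key: str):
        value = model.get(key)
        return DEFAULTS[key] if value in (None, "", []) else value

    return ModelConfiguration(
        modelid=model["id"],
        target=model["target"],
        predictionsprocessor=pick("predictionsprocessor"),
        analysisinterval=pick("analysisinterval"),
        indicators=tuple(pick("indicators")),
        defaultcontextids=tuple(model.get("contextids") or ()),
    )
```

For example, a model without a configured predictions processor ends up with the placeholder default, while its indicators and context IDs are copied unchanged.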

🛠️ In a future version of LaLA, one will be able to set the scope by context ID for a model version when creating it. This is to help select which data from the Moodle instance to use as training and testing data.

🛠️ LaLA currently always trains a Logistic Regression model, no matter which predictions processor is configured. In a future version, LaLA will be able to use different predictions processors.

Model version

A model version is one possible model created from a model configuration. One can create multiple model versions of the same configuration; each trained model will differ slightly, because the training data is selected at random from the overall data and because of the nature of machine learning. Among other things, the model version stores the relative test set size (0.2 by default, as for the Moodle models), the included context IDs (or null if all contexts are in scope), and whether an error occurred when creating the model version.

The model version creation is triggered through the secured endpoint /admin/tool/laaudit/modelversion.php?configid=<configid>. After accessing the endpoint, one is redirected to the new version on the index page.

The model version creation is split into multiple steps, each of which adds a piece of evidence to the model version concerned. The process is object-oriented: the evidence types inherit directly or indirectly from an abstract class evidence, and each implements the methods collect(array $options) and store(). For each step in the model version creation, an options array is constructed, the evidence's collect(array $options) method is called, the result is post-processed for anonymization where needed, and finally the evidence's store() method is called. Both methods are described below under "Evidence". In the model version object, the evidence is stored in a multi-dimensional array indexed by evidence type and evidence ID (evidence[$evidencetype][$evidenceID]).
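The collect/post-process/store cycle can be sketched as below. This is a Python illustration of the pattern only, not a port of LaLA's PHP classes: the class bodies, the "rows" option key, the add_evidence helper, and the in-memory "store" are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Evidence(ABC):
    """Base type: subclasses decide how to collect, storing is shared."""

    def __init__(self, evidence_id: int):
        self.id = evidence_id
        self.data = None
        self.stored = False

    @abstractmethod
    def collect(self, options: dict) -> None:
        """Fill self.data from the given options."""

    def store(self) -> None:
        if self.data is None:
            raise RuntimeError("nothing collected yet")
        self.stored = True  # stand-in for serializing and writing a file


class Dataset(Evidence):
    def collect(self, options: dict) -> None:
        self.data = list(options["rows"])  # "rows" is a hypothetical option key


def add_evidence(registry, evidence, options, postprocess=None):
    """One creation step: collect, optionally post-process, store, register."""
    evidence.collect(options)
    if postprocess is not None:
        evidence.data = postprocess(evidence.data)
    evidence.store()
    # Mirrors evidence[$evidencetype][$evidenceID] on the model version.
    registry[type(evidence).__name__][evidence.id] = evidence
    return evidence


registry = defaultdict(dict)
add_evidence(registry, Dataset(1), {"rows": [1, 2, 3]})
```

Keeping store() in the base class means a new evidence type only needs to define how its data is collected.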

Model version creation proceeds in the following steps; the evidence is stored after each one.

  1. First, gather_dataset(bool $anonymous = true) triggers the collection of the data from the Moodle platform that will be used for the model version. If necessary, an ID-map is created, and the collected data is anonymized with it.
  2. Then, split_training_test_data() triggers the splitting of the previously collected data into training and testing data sets. Note that the data is shuffled first, in order to create a random split.
  3. In the third step, train() triggers the training of a Logistic Regression model using the training data gathered before.
  4. Fourth, predict() triggers the generation of predictions of the trained Logistic Regression model for the test dataset.
  5. The final step, gather_related_data(bool $anonymous = true), collects data related to the dataset gathered in step 1. First, LaLA determines recursively which tables relate to the main table (see a more detailed explanation). Then the relevant data is collected from each of these tables. If necessary, an ID-map is created for each table and the table is anonymized.
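Three ideas from the steps above (anonymizing with an ID-map, shuffling before the split, and following table relations transitively) can be sketched in Python as follows. The function names, the row layout, the pseudonym scheme (consecutive integers), the rounding of the test set size, and the shape of the references mapping are assumptions for illustration; LaLA's PHP implementation may differ in all of these.

```python
import random


def anonymize(rows, id_key="id"):
    """Replace original IDs with consecutive pseudonyms; return rows and the ID-map."""
    idmap = {}
    anonymized = []
    for row in rows:
        pseudonym = idmap.setdefault(row[id_key], len(idmap) + 1)
        anonymized.append({**row, id_key: pseudonym})
    return anonymized, idmap


def split_training_test(rows, testsize=0.2, rng=None):
    """Shuffle a copy of the rows, then cut off the test share. Returns (training, test)."""
    rng = rng or random.Random()
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * testsize)
    return shuffled[n_test:], shuffled[:n_test]


def related_tables(table, references):
    """All tables reachable from `table` via `references` (walked iteratively)."""
    seen = {table}
    stack = [table]
    while stack:
        for other in references.get(stack.pop(), ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen - {table}


rows = [{"id": 100 + i, "grade": i} for i in range(10)]
anonymous_rows, idmap = anonymize(rows)
training, test = split_training_test(anonymous_rows, testsize=0.2)
```

Because the rows are shuffled before the cut, two runs over the same data generally produce different training sets, which is why model versions of the same configuration differ.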

Upon finishing the model version, an event model_version_created (in the event folder) logs who created a new version of which model configuration.

Immutability

The model configuration is immutable, and so is the model version once it is trained. If a Moodle model is updated, a new model configuration is added with properties copied from the updated Moodle model. If a Moodle model is deleted, the model configuration continues to exist. Model versions are not affected by Moodle model updates or deletions. This is to ensure reproducible and trustworthy audits.
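A minimal Python sketch of this append-only behaviour, assuming a simplified configuration with only three fields and a hypothetical on_moodle_model_updated helper (neither is taken from LaLA's code):

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ModelConfiguration:
    id: int
    modelid: int   # loose reference to the Moodle model
    target: str


def on_moodle_model_updated(configs, modelid, target):
    """Never edit an existing configuration; append a new one instead."""
    new = ModelConfiguration(id=len(configs) + 1, modelid=modelid, target=target)
    return configs + [new]


configs = on_moodle_model_updated([], modelid=7, target="original target")
configs = on_moodle_model_updated(configs, modelid=7, target="updated target")

try:
    configs[0].target = "tampered"   # rejected: the dataclass is frozen
except FrozenInstanceError:
    pass
```

After the update, both configurations exist side by side, so model versions created from the first one still refer to the settings they were trained with.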

Evidence

As described previously, model versions follow a process in which one or more pieces of evidence are stored after each step. LaLA currently implements two types of evidence (dataset and model), with four sub-classes of dataset (training_dataset, test_dataset, predictions_dataset, and related_data) and two anonymized variations of evidence (dataset_anonymized and related_data_anonymized).

Storing (store()) is implemented in the abstract class evidence and stores the collected evidence on the server. The store() method first serializes the collected data into a string and then creates a file from the string at the location /evidence/modelversion<VERSIONID>-evidence<EVIDENCENAME><EVIDENCEID>.<FILETYPE>. The file name differs for related data: -TABLENAME is inserted after the original file name and before the ., so that it is clear what kind of related data the file contains (e.g. user, course). How the evidence's raw data is serialized is implemented by both direct children of evidence, as well as by related_data.
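The naming scheme can be made concrete with a small Python helper. The function name and its parameters are hypothetical, and the csv file type in the usage lines is only an example value; the pattern itself follows the location given above.

```python
def evidence_filename(versionid, name, evidenceid, filetype, tablename=None):
    """Build the evidence file location; related data gets a -TABLENAME suffix."""
    base = f"/evidence/modelversion{versionid}-evidence{name}{evidenceid}"
    if tablename is not None:
        base += f"-{tablename}"
    return f"{base}.{filetype}"


# Example values only: version 4, evidence IDs 12 and 13, file type csv.
print(evidence_filename(4, "dataset", 12, "csv"))
print(evidence_filename(4, "related_data", 13, "csv", tablename="user"))
```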

The evidence collection (collect(array $options)) and the validation of the $options array are implemented differently for almost every type of evidence. See the following sections.

Dataset

The dataset requires $options to contain an array of context IDs (contexts, may be empty), an analyser (of type core_analytics\local\analyser\base) and the ID of the original Moodle model (modelid).
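A validation of these options might look like the following Python sketch. The Analyser class is a stand-in for Moodle's analyser base class, the option key analyser is an assumption (this page names only contexts and modelid), and the error messages are invented for the example.

```python
class Analyser:
    """Stand-in for Moodle's analyser base class."""


def validate_dataset_options(options: dict) -> None:
    """Raise ValueError unless the options needed to collect a dataset are present."""
    missing = [k for k in ("contexts", "analyser", "modelid") if k not in options]
    if missing:
        raise ValueError("missing options: " + ", ".join(missing))
    contexts = options["contexts"]
    if not isinstance(contexts, list) or not all(isinstance(c, int) for c in contexts):
        raise ValueError("contexts must be an array of context IDs (may be empty)")
    if not isinstance(options["analyser"], Analyser):
        raise ValueError("analyser must be an analyser instance")
    if not isinstance(options["modelid"], int):
        raise ValueError("modelid must be the ID of the Moodle model")


# An empty contexts array is valid.
validate_dataset_options({"contexts": [], "analyser": Analyser(), "modelid": 1})
```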

Training dataset

Test dataset

Predictions dataset

Related data

Privacy components

Security components

Three new capabilities (tool/laaudit:viewpagecontent, tool/laaudit:downloadevidence and tool/laaudit:createmodelversion) and a new role auditor (see Security) are added in the db folder. The capability to serve files is defined in lib.php.

Other components

Additionally, the following features are implemented and can be found in LaLA's source code:

  • Mustache templates (in the templates folder) and output renderers (in the output folder).
  • A plugin page (index.php) that sets some page properties and loads the root renderer. All model configurations, versions and evidence can be accessed from this page.
  • The addition of the link to the plugin page to the admin menu under analytics (in settings.php) and for auditors on the front page (in lib.php).
  • Translatable strings (in /lang/en/).
  • Development branches of the plugin on GitHub additionally contain a test directory for PHPUnit tests. This folder is removed in release branches.