From 7013e3838617f40c31a862bc11dc5983b6e1228b Mon Sep 17 00:00:00 2001
From: Julia Werner
Date: Tue, 28 Nov 2023 10:42:06 +0000
Subject: [PATCH] Documentation eeg chbmit

---
 doc/applications/seizure_detection.md | 44 +++++++++++++++++++++++++++
 pydoc-markdown.yml                    |  5 +++
 2 files changed, 49 insertions(+)
 create mode 100644 doc/applications/seizure_detection.md

diff --git a/doc/applications/seizure_detection.md b/doc/applications/seizure_detection.md
new file mode 100644
index 00000000..a3d0f432
--- /dev/null
+++ b/doc/applications/seizure_detection.md
@@ -0,0 +1,44 @@
+# Dataset creation
+
+The dataset currently in use was generated from the 16 channels with the highest variance within the ictal data of the **CHB-MIT Scalp EEG Database** [1]. To create a new preprocessed dataset, use `scripts/eeg/eeg_dataset_creator.py`. It takes the EDF files of the CHB-MIT dataset as input and performs basic preprocessing, including band-pass filtering between 0.1 Hz and 50 Hz to remove DC components as well as the noise introduced by the EEG measurement device. Finally, it creates a binary label for each data fragment.
+
+Among others, the following parameters can be specified:
+
+class_ratio
+: default = 5. Ratio of zeros to ones (non-ictal to ictal fragments) in the final `dev` and `retrain` datasets, which compensates for the scarcity of ictal data.
+
+data_length
+: default = 1. Length of each data point in seconds. At a sampling rate of 256 Hz, the final data has the shape `(D, C, 256 * data_length)`, where D is the number of data points and C the number of channels.
+
+samp_rate
+: default = 256. Sampling rate of the dataset in Hz. The recordings are already sampled at 256 Hz, so set this argument only if a different rate is needed.
+
+To create the dataset, run:
+
+    python eeg_dataset_creator.py --output_dir "./data/" --class_ratio 4 --data_length 0.5
+
+Currently, the preprocessed `16c_retrain_id` dataset is configured as HANNAH's default for work with the CHB-MIT dataset. It was generated with a `class_ratio` of 4, a `data_length` of 0.5, and a `samp_rate` of 256.
+
+[1] Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. *Circulation*, 101(23), e215–e220.
+
+
+# Training a base model with the CHB-MIT dataset
+
+To train a model on the preprocessed CHB-MIT dataset, run:
+
+    hannah-train dataset=chbmit features=identity ~normalizer
+
+The data is already normalized when it is loaded, so no additional normalization is needed during training; `~normalizer` removes the normalizer from the configuration. Adding `+dataset.weighted_loss=True` notably improves the results, both for this training and for the retraining. The trained model can serve as a base model that is then fine-tuned for each patient by retraining.
+
+
+# Retraining
+
+The main idea of retraining is to account for individual differences between patients. For each patient, a new model is trained starting from the checkpoint of the best base model trained on all patients: the base model is loaded and retrained on that patient's data. This requires the `chbmitrt` dataset, which loads only patient-specific data. For a single patient, retraining is invoked with:
+
+    hannah-train dataset=chbmitrt trainer.max_epochs=10 model=tc-res8 module.batch_size=8 input_file=/PATH/TO/BASE_MODEL/best.ckpt ~normalizer
+
+To retrain for all patients instead, use `scripts/eeg/run_patient_retraining_complete.py`, which is invoked as:
+
+    python run_patient_retraining_complete.py 'model-name' 'dataset_name'
+
+One model that has been used successfully for this application is TC-ResNet8 (`tc-res8`). The recommended dataset name is `16c_retrain_id`, the preprocessed CHB-MIT dataset with a rebalanced class ratio. During retraining, a separate results folder is generated for each patient.
\ No newline at end of file
diff --git a/pydoc-markdown.yml b/pydoc-markdown.yml
index 44513abe..46e7ce87 100644
--- a/pydoc-markdown.yml
+++ b/pydoc-markdown.yml
@@ -112,6 +112,11 @@ renderer:
       - title: "Evaluation"
         name: eval
         source: doc/eval.md
+      - title: "Specific applications"
+        children:
+          - title: Seizure Detection
+            name: applications/seizure_detection
+            source: doc/applications/seizure_detection.md
       - title: Development
         children:
           - title: Overview
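The preprocessing that the new `seizure_detection.md` page describes for `scripts/eeg/eeg_dataset_creator.py` (band-pass filtering between 0.1 Hz and 50 Hz, cutting each recording into fragments of `samp_rate * data_length` samples, labelling each fragment, and subsampling to `class_ratio` zeros per one) can be sketched in NumPy as below. This is a minimal illustration under explicit assumptions, not the script's actual implementation: the FFT-based brick-wall filter stands in for whatever filter the script really uses, seizure intervals are assumed to be given as `(start, end)` pairs in seconds, a fragment is assumed to be ictal if it overlaps any seizure, and the helper names `bandpass_fft`, `make_fragments` and `subsample_to_ratio` are hypothetical.

```python
import numpy as np


def bandpass_fft(x, fs, low=0.1, high=50.0):
    """Hypothetical brick-wall band-pass filter along the time axis.

    Zeroes every FFT bin below `low` Hz (which removes the DC component) or
    above `high` Hz. x has shape (C, T). This is an illustrative stand-in,
    not necessarily the filter used by eeg_dataset_creator.py.
    """
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spectrum[..., (freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[-1], axis=-1)


def make_fragments(x, fs, data_length, seizures):
    """Cut a (C, T) recording into (D, C, fs * data_length) fragments.

    seizures: list of (start_s, end_s) intervals in seconds (assumed format).
    A fragment is labelled 1 if it overlaps any seizure interval (assumed rule).
    """
    n = int(round(fs * data_length))      # samples per fragment
    d = x.shape[-1] // n                  # number of complete fragments
    # (C, d*n) -> (C, d, n) -> (d, C, n)
    frags = x[:, : d * n].reshape(x.shape[0], d, n).transpose(1, 0, 2)
    starts = np.arange(d) * n / fs        # fragment start times in seconds
    ends = starts + n / fs
    labels = np.zeros(d, dtype=np.int64)
    for s, e in seizures:
        labels[(starts < e) & (ends > s)] = 1
    return frags, labels


def subsample_to_ratio(frags, labels, class_ratio, seed=0):
    """Keep all ones and at most class_ratio zeros per one (random undersampling)."""
    rng = np.random.default_rng(seed)
    ones = np.flatnonzero(labels == 1)
    zeros = np.flatnonzero(labels == 0)
    n_zeros = min(len(zeros), int(class_ratio * len(ones)))
    chosen = rng.choice(zeros, n_zeros, replace=False)
    keep = np.sort(np.concatenate([ones, chosen]))
    return frags[keep], labels[keep]


if __name__ == "__main__":
    fs = 256                                  # CHB-MIT sampling rate
    t = np.arange(60 * fs) / fs               # 60 s of synthetic data
    # 16 identical channels: 10 Hz signal + 100 Hz "noise" + DC offset
    one_channel = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 100 * t) + 1.0
    raw = np.tile(one_channel, (16, 1))
    filtered = bandpass_fft(raw, fs)
    frags, labels = make_fragments(filtered, fs, data_length=0.5, seizures=[(20.0, 25.0)])
    frags, labels = subsample_to_ratio(frags, labels, class_ratio=4)
    print(frags.shape, int(labels.sum()))     # (50, 16, 128) 10
```

With `data_length=0.5` and 256 Hz, each fragment holds 128 samples, matching the `(D, C, 256 * data_length)` shape given in the parameter description. Abruptly zeroing FFT bins can cause ringing around sharp transients, which is one reason practical EEG pipelines usually prefer FIR or IIR filter designs.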
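The new documentation reports that `+dataset.weighted_loss=True` notably improves results, but it does not state how HANNAH computes the weights. A common choice, assumed here purely for illustration, is to weight each class inversely to its frequency so that the rare ictal class contributes as much to the loss as the non-ictal class. The following NumPy sketch of such an inverse-frequency weighted binary cross-entropy uses hypothetical function names and is not HANNAH's implementation; the mean is normalized by the summed sample weights, as PyTorch's `CrossEntropyLoss(weight=...)` does with mean reduction.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes=2):
    """w_c = N / (n_classes * N_c), the 'balanced' scheme (assumed, not HANNAH's).

    Assumes every class occurs at least once; otherwise a weight becomes inf.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return len(labels) / (n_classes * counts)


def weighted_bce(probs, labels, weights, eps=1e-7):
    """Binary cross-entropy where each sample is scaled by its class weight.

    probs: predicted probability of class 1 per sample.
    The result is normalized by the sum of the applied weights.
    """
    p = np.clip(probs, eps, 1.0 - eps)        # avoid log(0)
    per_sample = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    w = weights[labels]                       # weight of each sample's true class
    return float(np.sum(w * per_sample) / np.sum(w))
```

For fragments with a `class_ratio` of 4 (four zeros per one), this scheme yields weights of 0.625 for zeros and 2.5 for ones, so both classes contribute equally to the total loss.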