LipoCLEAN is a command line tool. Usage instructions can be obtained using the `--help` flag. All options, including the location of the MS-DIAL export files to analyze, are given to the tool in a TOML-formatted text file. Default options files for MS-DIAL 4 and 5 can be obtained using the `--print MSD4` or `--print MSD5` command line arguments, respectively. These will create an `options.txt` file that you can edit. There are three ways to install and run LipoCLEAN: as an executable, as a Python package, and as a Docker container. See the detailed instructions below for each version.
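Because the options file is plain TOML, it can be checked for syntax errors before a run. The following is a minimal sketch, not part of LipoCLEAN: the helper names `load_options` and `summarize_options` are hypothetical, and no particular option keys are assumed. It relies on `tomllib`, which ships with Python 3.11 and later, and falls back to the third-party `tomli` backport on older interpreters.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters
    import tomli as tomllib  # third-party backport with the same API


def load_options(path):
    """Parse a TOML options file and return its contents as a dict."""
    with Path(path).open("rb") as handle:  # tomllib requires binary mode
        return tomllib.load(handle)


def summarize_options(options, prefix=""):
    """Return 'key = value' lines, flattening nested tables with dots."""
    lines = []
    for key, value in options.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.extend(summarize_options(value, prefix=f"{name}."))
        else:
            lines.append(f"{name} = {value!r}")
    return lines
```

Calling `summarize_options(load_options("options.txt"))` lists every setting the file defines; a malformed file raises `tomllib.TOMLDecodeError` instead.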
The executable requires no installation, but it is somewhat slower than the other options.
- Download the executable for your operating system and `example_analysis.zip` from the releases page.
- Extract `example_analysis.zip` and navigate to the contents.
- Add `LipoCLEAN.exe` to this folder.
- Open the folder with these files in a terminal such as `cmd.exe` or `bash`.
- Run `LipoCLEAN.exe --options example_analysis_options.txt`. On some systems the warning `No module named 'brainpy._c.composition'` will be displayed. This is not an error and does not impact the running of the tool.
- The results will be in a folder named `example_output/`. The `example_output/QC/` folder contains several plots to assess the quality of the results. A log file named `LipoCLEAN.log`, containing information about the run, will also be generated.
- If you want a default version of the options file, run `LipoCLEAN.exe --print MSD4`.
- To use the tool on other data, edit the `options.txt` file.
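After the example run, a quick check that the expected outputs exist can save time. The sketch below is only illustrative: `find_run_artifacts` is a hypothetical helper, and because the location of `LipoCLEAN.log` is not stated here, it searches the whole analysis folder for it.

```python
from pathlib import Path


def find_run_artifacts(analysis_dir):
    """Locate the outputs of the example run below analysis_dir."""
    root = Path(analysis_dir)
    output_dir = root / "example_output"
    qc_dir = output_dir / "QC"
    # Assumption: the log may be anywhere below the analysis folder.
    logs = sorted(root.rglob("LipoCLEAN.log"))
    return {
        "output_dir": output_dir if output_dir.is_dir() else None,
        "qc_files": sorted(p.name for p in qc_dir.iterdir()) if qc_dir.is_dir() else [],
        "log_file": logs[0] if logs else None,
    }
```

A result with `output_dir` set to `None` or an empty `qc_files` list suggests the run did not finish, in which case the log file is the first place to look.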
To set up the Conda environment for the tool:
- Download the `Source code (zip)` from the releases page and extract the contents.
- Open a terminal such as `cmd.exe` or `bash` and navigate to the `environments` directory in the extracted folder.
- Run `conda env create -p lipo_env --file lipoCLEAN.yml`. On some systems this may be slow.
- Run `conda activate ./lipo_env`
- Navigate to the repository root (the `LipoCLEAN-{version}` folder).
- Run `pip install .`
- Download `example_analysis.zip` from the releases page.
- Extract `example_analysis.zip` and navigate to the contents.
- Run `python -m lipoCLEAN --options example_analysis_options.txt` in the analysis folder. On some systems the warning `No module named 'brainpy._c.composition'` will be displayed. This is not an error and does not impact the running of the tool.
- The results will be in a folder named `example_output/`. The `example_output/QC/` folder contains several plots to assess the quality of the results. A log file named `LipoCLEAN.log`, containing information about the run, will also be generated.
- If you want a default version of the options file, run `python -m lipoCLEAN --print MSD4`
- To use the tool on other data, edit the `options.txt` file.
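If `python -m lipoCLEAN` fails with a "No module named" error, the usual cause is that the Conda environment is not active or `pip install .` was skipped. A small, hypothetical check (the helper name `is_module_available` is not part of LipoCLEAN) can confirm whether the active interpreter sees the package:

```python
import importlib.util


def is_module_available(name):
    """Return True if the active Python interpreter can locate `name`."""
    return importlib.util.find_spec(name) is not None
```

Running `is_module_available("lipoCLEAN")` inside the activated `lipo_env` environment should return `True` once the installation steps have completed.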
The Docker container has trained models provided under `/models/`. To use these, get the default `options.txt` with the `--print MSD4` step below:
- Run `docker pull stavisvols/lipoclean`
- Download `example_analysis.zip` from the releases page.
- Extract `example_analysis.zip` and navigate to the contents.
- Run `docker run --rm -v /path/to/your/data/:/data/ stavisvols/lipoclean python -m lipoCLEAN --options /data/docker_example_analysis_options.txt`
- The results will be in a folder named `example_output/`. The `example_output/QC/` folder contains several plots to assess the quality of the results. A log file named `LipoCLEAN.log`, containing information about the run, will also be generated.
- If you want the default Docker options file, run `docker run --rm -v /path/to/your/data/:/data/ stavisvols/lipoclean python -m lipoCLEAN --print MSD4`
- To use the tool on other data, edit the `options.txt` file.
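The `-v` argument maps a folder on the host to `/data/` inside the container, so the options file must be referenced by its container path, not its host path. The sketch below builds the same `docker run` command from a host folder and an options file name; `docker_command` is a hypothetical helper, and it only assembles the argument list without running Docker.

```python
from pathlib import Path, PurePosixPath

IMAGE = "stavisvols/lipoclean"
CONTAINER_DATA = PurePosixPath("/data")  # mount point used in the steps above


def docker_command(host_data_dir, options_file):
    """Build the docker run argument list for an options file in host_data_dir."""
    host = Path(host_data_dir).resolve()  # Docker expects an absolute host path
    container_options = CONTAINER_DATA / options_file
    return [
        "docker", "run", "--rm",
        "-v", f"{host}:{CONTAINER_DATA}/",
        IMAGE,
        "python", "-m", "lipoCLEAN",
        "--options", str(container_options),
    ]
```

The list can be passed to `subprocess.run(..., check=True)`. Passing a list rather than a single string avoids quoting problems when the host path contains spaces.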
- Click "Export" along the top bar
- Select "Alignment result" in the dropdown menu
- Navigate to the directory (folder) to which you want to save the export using the "Browse" button
- The last alignment result selected should be listed as the export file. If this isn't the correct alignment, select the right one in the dropdown
- Select "m/z matrix" to be exported (deselect any other exports you do not want to generate)
- Make sure blank filtering is NOT selected
- "Export format" should be "msp"
- Click Export
A .txt file will now be generated in the chosen directory with the information required for LipoCLEAN. The file name will start with "Mz".
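When an export folder holds several files, the one LipoCLEAN needs can be picked out by its "Mz" prefix. A minimal sketch, assuming only the naming rule stated above (the helper name `find_mz_exports` is hypothetical):

```python
from pathlib import Path


def find_mz_exports(export_dir):
    """Return the .txt files in export_dir whose names start with 'Mz'."""
    return sorted(p for p in Path(export_dir).glob("Mz*.txt") if p.is_file())
```

If more than one file is returned, the folder probably contains exports from several alignments, and the intended one should be chosen by hand before its path is written into the options file.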
- Start with MS-DIAL exports using the same settings as described above for inference.
- Add a column named `label` which contains 0 for incorrect IDs, 1 for correct IDs, and is otherwise left blank. It is critical that this column be before (to the left of) the `MS/MS spectrum` column, as all subsequent columns (those to the right) are assumed to be m/z data.
- Save in a tab-delimited format.
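The labeling steps above can also be scripted. The sketch below is one possible approach, not the authors' workflow: it inserts `label` immediately to the left of `MS/MS spectrum` and writes the file back as tab-delimited text. It assumes the header row is the first row containing a cell that reads exactly `MS/MS spectrum`, and that the first column of each data row holds an identifier usable as a lookup key; both function names and the `labels` mapping are hypothetical.

```python
import csv

SPECTRUM_HEADER = "MS/MS spectrum"


def add_label_column(rows, labels, key_column=0):
    """Insert a 'label' column directly left of the MS/MS spectrum column.

    rows   -- list of rows (lists of strings) from the tab-delimited export
    labels -- dict mapping a row's key_column value to 0 or 1; rows without
              an entry get a blank label
    """
    header_index = next(
        (i for i, row in enumerate(rows) if SPECTRUM_HEADER in row), None
    )
    if header_index is None:
        raise ValueError(f"no header row containing {SPECTRUM_HEADER!r}")
    position = rows[header_index].index(SPECTRUM_HEADER)

    labeled = []
    for i, row in enumerate(rows):
        if i < header_index:
            cell = ""  # rows above the header only need to stay aligned
        elif i == header_index:
            cell = "label"
        else:
            key = row[key_column] if len(row) > key_column else ""
            value = labels.get(key)
            cell = "" if value is None else str(value)
        padded = row + [""] * (position - len(row))  # pad short rows
        labeled.append(padded[:position] + [cell] + padded[position:])
    return labeled


def label_export(in_path, out_path, labels):
    """Read a tab-delimited export, add labels, and save it tab-delimited."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst, delimiter="\t").writerows(add_label_column(rows, labels))
```

Writing to a new `out_path` keeps the original export untouched, which makes it easy to redo the labeling if the key column turns out to be the wrong one.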
The tool is capable of being trained on multiple input files. The retention time correction is run on a per-input-file basis. Multiple experiments can be used to generate training data, but it is suggested that they are input as separate files for chromatography alignment purposes.
Instrument | Source | N | Model | Organism |
---|---|---|---|---|
Q-Exactive | MTBLS5583 | 742 | QE_Pro_model | Canis familiaris |
LTQ Velos Pro | in-house | 1076 | QE_Pro_model | Aspergillus fumigatus |
LTQ Velos Pro | in-house | 545 | QE_Pro_model | Laccaria bicolor |
TripleTOF 6600 | MTBLS4108 | 1125 | TOF_model | Rattus norvegicus |
Our tests have shown that a model will likely generalize to a family of instruments but that this has limits. We expect that the QE_Pro_model will work for all Orbitrap systems. We do not have the data necessary to know how well the TOF_model will generalize to all TOF instruments, so if you are working with, e.g., timsTOF data, it would be a good idea to do an initial validation of the output. The publicly available datasets used were reprocessed from raw files and annotated in-house.
Our tool supports both MS-DIAL 4 and 5. However, some columns were renamed and scaled differently between the two versions, so a model trained on one version's data will not work with the other. Both the options file and the model are specific to a version of MS-DIAL. We provide separate default options files for each version that can be obtained with the `--print MSD4` and `--print MSD5` command line arguments, respectively.
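Since mixing versions produces a mismatch between data, options, and model, it can help to select the `--print` argument from the MS-DIAL major version in one place. This is a minimal sketch for the Python package installation; `print_options_command` is a hypothetical helper that only builds the argument list.

```python
import sys

# MS-DIAL major version -> value accepted by --print (from the text above)
PRINT_ARGUMENTS = {4: "MSD4", 5: "MSD5"}


def print_options_command(msdial_major_version):
    """Build the command that writes the default options file for a version."""
    try:
        preset = PRINT_ARGUMENTS[msdial_major_version]
    except KeyError:
        raise ValueError(
            f"unsupported MS-DIAL version: {msdial_major_version!r}"
        ) from None
    # sys.executable keeps the call inside the currently active environment
    return [sys.executable, "-m", "lipoCLEAN", "--print", preset]
```

For example, `subprocess.run(print_options_command(5), check=True)` would request the MS-DIAL 5 defaults, while an unsupported version fails early with a `ValueError` instead of producing an options file for the wrong version.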
We have tested the tool on Windows 10 and Ubuntu 22.04. Although we have not tested this, we expect that the Docker version will work on Macs with Intel chips and that the Conda version should work on any machine that has Conda installed.
The `tests` and `build` directories in this repository are intended for internal development use only, and the scripts they contain are not expected to work on other systems.
If you wish to compile the executable version yourself:
- Create and activate a Conda environment using `environments/build.yml`.
- Run `pip install .` at the repository root.
- Navigate to `build/`.
- Run `pyinstaller --onefile ../src/lipoCLEAN/__main__.py -n LipoCLEAN --paths ../src/lipoCLEAN/ --add-data ../src/lipoCLEAN/*.txt:.`
- The executable will be found in a newly created `dist/` directory.
Disclaimer: We are not in any way associated with the developers of MS-DIAL; we are merely enthusiastic users of their software.