This section provides extra information, such as:
To run the baselines on the same dataset, see the following README.
To modify the query files or the library and run the modified version, use the command below, replacing the paths to the CodeQL database and to the output SARIF file as needed. The main differences from the standard run are the LintQ-dev.qls query suite and the mounting of the entire content of the main repository.
docker run \
-v "$(pwd):/home/codeql/project" \
-it --rm lintq \
codeql database analyze \
--format=sarifv2.1.0 \
--threads=10 \
--output=/home/codeql/project/data/datasets/demo/my_results.sarif \
--rerun \
-- /home/codeql/project/data/datasets/demo/codeql_db \
/home/codeql/project/LintQ-dev.qls
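If you run the development suite often, it can help to assemble the command in a script so that only the database and SARIF paths change. The following is a minimal Python sketch under that assumption: the helper name lintq_dev_command is hypothetical, and the image tag lintq is an assumption (Docker image names must be lowercase), so adjust it to the tag you built the image with.

```python
import os
import shlex
from typing import List

# Mount point of the repository inside the container, as in the command above.
PROJECT = "/home/codeql/project"


def lintq_dev_command(db_path: str, sarif_path: str, threads: int = 10,
                      image: str = "lintq") -> List[str]:
    """Build the docker argv that runs LintQ-dev.qls on a CodeQL database.

    db_path and sarif_path are relative to the repository root (the current
    directory), which is mounted at PROJECT. The image tag is an assumption.
    """
    return [
        "docker", "run",
        "-v", f"{os.getcwd()}:{PROJECT}",
        "-it", "--rm", image,
        "codeql", "database", "analyze",
        "--format=sarifv2.1.0",
        f"--threads={threads}",
        f"--output={PROJECT}/{sarif_path}",
        "--rerun",
        "--", f"{PROJECT}/{db_path}",
        f"{PROJECT}/LintQ-dev.qls",
    ]


if __name__ == "__main__":
    # Print a shell-ready command for the demo dataset; paste it into a terminal.
    print(shlex.join(lintq_dev_command("data/datasets/demo/codeql_db",
                                       "data/datasets/demo/my_results.sarif")))
```

The command is printed rather than executed because the -it flags expect an interactive terminal.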
The configuration file defining all the mining details is stored at config/github_download_files.yaml. The file contains the following fields:
- github_token_path: path to the file containing the GitHub token.
- file_mining: the settings defining how to mine the files with the GitHub search API, namely:
  - output_folder: where to save the metadata pointing to the final files.
  - language: the programming language to search for.
  - min_file_size and max_file_size: the range of admissible file sizes, in bytes. Since it is not possible to retrieve all the results with a single query, we use the size range to form smaller queries whose results can all be read, and proceed until we obtain all the files in the range.
  - chunk_size: the size of the sub-ranges to query; it is automatically adjusted until all the possible files are collected.
- keywords: the list of keywords to search for with the GitHub search API. They are combined with a logical OR, so the search returns all the files containing at least one of the keywords.
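The size-range splitting described for min_file_size, max_file_size and chunk_size can be sketched as follows. This is a hypothetical illustration, not the code in rdlib/github.py: count_results stands in for a call returning the total number of matches for a size range, the query string (OR between keywords, language: and size: qualifiers) is an assumption about how the search is formed, and halving/doubling is just one possible policy for adjusting the chunk size. It relies on the GitHub search API returning at most 1,000 results per query.

```python
from typing import Callable, Iterator, List, Tuple

# The GitHub search API returns at most 1,000 results for a single query.
MAX_RESULTS_PER_QUERY = 1000


def build_query(keywords: List[str], language: str, lo: int, hi: int) -> str:
    """Hypothetical query string: keywords in logical OR, limited to a size range in bytes."""
    return f"{' OR '.join(keywords)} language:{language} size:{lo}..{hi}"


def size_chunks(min_size: int, max_size: int, chunk_size: int,
                count_results: Callable[[int, int], int]) -> Iterator[Tuple[int, int]]:
    """Yield contiguous (lo, hi) size ranges whose results fit in one query."""
    lo = min_size
    while lo <= max_size:
        hi = min(lo + chunk_size - 1, max_size)
        if count_results(lo, hi) > MAX_RESULTS_PER_QUERY and hi > lo:
            # Too many matches to read them all: retry with a smaller range.
            chunk_size = max(1, chunk_size // 2)
            continue
        yield lo, hi
        lo = hi + 1
        # The range fitted: try a larger one next to save queries.
        chunk_size *= 2
```

A real implementation would then page through the results of each range; the doubling step keeps sparse size regions cheap to cover, while a single-byte range is yielded even if it still exceeds the limit, since it cannot be split further.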
To run the query, use the CLI program rdlib/github.py with the following command:
screen -L -Logfile log_long.txt -S first_run python -m rdlib.github downloadfiles --config config/github_download_files.yaml --output secret/files.json --incremental
Note that the prefix screen -L -Logfile log_long.txt -S first_run is needed if you want the program to keep running after you close the terminal. Using it is usually recommended.
- Query GitHub. Prepare a configuration file in the config folder (typically called github_download_files_vXX.yaml). See github_download_files_v03.yaml for an example.
- Run the following command to query the metadata of the matching files:
screen -L -Logfile data/github_query_results/exp_vXX/log.txt -S qiskit_download python -m rdlib.github queryfilesmetadata --config config/github_download_files_vXX.yaml
Note that the prefix screen -L -Logfile data/github_query_results/exp_vXX/log.txt -S qiskit_download is needed if you want the program to keep running after you close the terminal. It is usually recommended; you can change the folder where the log file is saved and the name of the screen session.
- Download Files. Prepare the configuration file in the config folder (typically called dataset_creation_exp_vXX.yaml). See dataset_creation_exp_v03.yaml for an example.
- To download the actual files from the metadata, run the following command:
screen -L -Logfile data/datasets/exp_vXX/log_download.txt -S qiskit_dataset_creation python -m qlint.datautils.dataset_creation downloadfiles --config config/dataset_creation_exp_vXX.yaml
- To filter the dataset based on the processing_steps in the config file, run the following command:
screen -L -Logfile data/datasets/exp_vXX/log_filter.txt -S qiskit_dataset_creation python -m qlint.datautils.dataset_creation filterdataset --config config/dataset_creation_exp_vXX.yaml
- Move the selected programs to a dedicated folder (typically called files_selected) by running:
screen -L -Logfile data/datasets/exp_vXX/log_filter.txt -S qiskit_dataset_creation python -m qlint.datautils.dataset_creation createselection --config config/dataset_creation_exp_vXX.yaml
- Create the CodeQL database for the filtered dataset:
screen -L -Logfile data/datasets/exp_vXX/log.txt -S codeql_database_creation codeql database create --language=python --threads=10 --source-root=data/datasets/exp_vXX/files_selected/ -- data/datasets/exp_vXX/codeql_db
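When preparing a new dataset version, the commands of the steps above differ only in the version suffix, so they can be generated. The sketch below is a hypothetical helper (dataset_commands is not part of the repository) that prints the commands for a given version without the screen prefixes. It assumes the naming conventions used above (config/github_download_files_vXX.yaml, config/dataset_creation_exp_vXX.yaml, data/datasets/exp_vXX) and writes the database to codeql_db, the folder the analysis step reads.

```python
import shlex
from typing import List

# Subcommands of qlint.datautils.dataset_creation, in the order used above.
DATASET_STEPS = ["downloadfiles", "filterdataset", "createselection"]


def dataset_commands(version: str) -> List[str]:
    """Return the shell commands for one dataset version, e.g. "v04".

    The screen prefixes are left out: run the printed script inside a single
    screen session if it has to survive closing the terminal.
    """
    base = f"data/datasets/exp_{version}"
    commands = [shlex.join([
        "python", "-m", "rdlib.github", "queryfilesmetadata",
        "--config", f"config/github_download_files_{version}.yaml",
    ])]
    for step in DATASET_STEPS:
        commands.append(shlex.join([
            "python", "-m", "qlint.datautils.dataset_creation", step,
            "--config", f"config/dataset_creation_exp_{version}.yaml",
        ]))
    commands.append(shlex.join([
        "codeql", "database", "create", "--language=python", "--threads=10",
        f"--source-root={base}/files_selected/", "--", f"{base}/codeql_db",
    ]))
    return commands


if __name__ == "__main__":
    print("\n".join(dataset_commands("v04")))
```

Saving the printed lines to a file and running it with bash -e stops the pipeline at the first failing step.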
Follow these steps:
- Clone this repository
- Install the CodeQL CLI from here
- Move to the source directory qlint/codeql/src containing the qlpack.yml and install the external packs (e.g. the python-all dependencies) with the following commands:
cd qlint/codeql/src
codeql pack install
Take note of the path where the dependencies are stored (e.g. /home/<username>/.codeql/packages).
- Move to the repo root and run the following command, including this path:
codeql test run qlint/codeql/test/query-tests/Measurement --additional-packs=~/.codeql/packages --threads=10
This will run the tests of the specific folder query-tests/Measurement and will use the dependencies installed in the previous step. Note: change the path to test a library concept, e.g.:
codeql test run qlint/codeql/test/library-test/qiskit/circuit --additional-packs=~/.codeql/packages --threads=10
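To run several test folders from a script, the command can be built with the packages path already expanded. The following is a minimal sketch with hypothetical helpers (codeql_test_command, run_tests); expanding ~ in Python means the command does not depend on whether the shell or the CodeQL CLI expands ~ inside --additional-packs=..., and ~/.codeql/packages is assumed to be where codeql pack install placed the dependencies.

```python
import os
import subprocess
from typing import List


def codeql_test_command(test_dir: str, packs_dir: str = "~/.codeql/packages",
                        threads: int = 10) -> List[str]:
    """Build the `codeql test run` argv for one test folder, with ~ expanded."""
    packs = os.path.expanduser(packs_dir)
    return ["codeql", "test", "run", test_dir,
            f"--additional-packs={packs}", f"--threads={threads}"]


def run_tests(test_dirs: List[str]) -> None:
    """Run each folder in turn from the repo root; check=True raises on a non-zero exit."""
    for test_dir in test_dirs:
        subprocess.run(codeql_test_command(test_dir), check=True)
```

For example, run_tests(["qlint/codeql/test/query-tests/Measurement", "qlint/codeql/test/library-test/qiskit/circuit"]) runs both folders mentioned above, stopping at the first one whose run fails.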
Follow these steps:
- Run the queries in the qlint/codeql/src folder on the dataset (in the folder data/datasets/exp_vXX/codeql_db) with the following command:
export CURRENT_DATE_TIME=`date "+%Y-%m-%d_%H-%M-%S"`; \
export OUTPUT_DIR=data/analysis_results/exp_vXX/codeql_${CURRENT_DATE_TIME}; \
mkdir -p $OUTPUT_DIR; \
codeql database analyze --format=sarifv2.1.0 --threads=10 --output=$OUTPUT_DIR/data.sarif -- data/datasets/exp_vXX/codeql_db/ qlint/codeql/src
The output will be stored in the folder data/analysis_results/exp_vXX/codeql_{current_date_time}. Note: add --rerun to the command if you want to re-run the analysis on the same dataset without using the cache.
Demo version:
export CURRENT_DATE_TIME=`date "+%Y-%m-%d_%H-%M-%S"`; \
export OUTPUT_DIR=data/analysis_results/demo/codeql_${CURRENT_DATE_TIME}; \
mkdir -p $OUTPUT_DIR; \
codeql database analyze --format=sarifv2.1.0 --rerun --output=$OUTPUT_DIR/data.sarif -- data/demo_dataset_output/ qlint/codeql/src
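The data.sarif file produced by the commands above is plain JSON, so a per-rule count of the warnings can be obtained with a few lines of Python. The sketch below relies only on standard SARIF 2.1.0 fields (runs, results, ruleId, locations, physicalLocation); the function names are hypothetical and this is not how rdlib.inspector works internally.

```python
import json
from collections import Counter
from typing import Dict, Iterator, Tuple


def load_sarif(path: str) -> Dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_warnings(sarif: Dict) -> Iterator[Tuple[str, str, int]]:
    """Yield (rule id, file URI, start line) for every result in a SARIF 2.1.0 log."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId") or result.get("rule", {}).get("id", "<unknown>")
            # Use the first location; results without one get empty defaults.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "")
            line = physical.get("region", {}).get("startLine", 0)
            yield rule_id, uri, line


def count_by_rule(sarif: Dict) -> Counter:
    """Number of warnings reported by each query (rule id)."""
    return Counter(rule_id for rule_id, _, _ in iter_warnings(sarif))
```

For instance, count_by_rule(load_sarif(path)).most_common() lists the queries by number of warnings, where path points to the data.sarif inside the timestamped output folder.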
To inspect the generated warnings, use the following command:
python -m rdlib.inspector --config config/annotations/inspection_exp_vXX.yaml
Remember to create the configuration file in the config/annotations folder. You can see an example of the configuration file in config/annotations/inspection_exp_v04.yaml.
- missing packages: if you run the quick evaluation in the VSCode environment, make sure you have added the folder /home/<username>/.codeql/packages to your VSCode workspace. Otherwise, the CodeQL extension will not be able to find the dependencies. This operation will create a file named qlint.code-workspace in the repo folder; it will not be uploaded to git, but it is important that you keep it.
- performance: for historical reasons, CodeQL looks for CodeQL libraries (qlpack.yml) in the sibling directories of the one you are running it from. Thus, if you clone this repo into a directory with many other folders (e.g. ~/Documents/), the CodeQL run will be significantly slower (see this issue). To solve the problem, clone the repo into a brand new empty folder, following these steps:
mkdir lintq_home
cd lintq_home
git clone <this repo url>
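For the missing packages issue above, the workspace file can also be written directly instead of through the VSCode interface. The sketch below is an assumption-based helper (not part of the repository) that produces a multi-root workspace containing the repository root and the package cache; it assumes the packages live in ~/.codeql/packages and that the file is placed in the repository root, since relative folder paths in a .code-workspace file are resolved from the file's own location.

```python
import json
import os
from typing import Dict


def codeql_workspace(packages_dir: str = "~/.codeql/packages") -> Dict:
    """Multi-root workspace: the repo itself plus the CodeQL package cache."""
    return {
        "folders": [
            {"path": "."},  # relative to the .code-workspace file, i.e. the repo root
            {"path": os.path.expanduser(packages_dir)},
        ]
    }


def write_workspace(repo_root: str = ".", name: str = "qlint.code-workspace") -> str:
    """Create the workspace file unless one already exists; return its path."""
    path = os.path.join(repo_root, name)
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(codeql_workspace(), f, indent=2)
    return path
```

An existing qlint.code-workspace is left untouched, so any settings VSCode has already stored in it are kept; the file can then be opened with code qlint.code-workspace.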