The goal of this project is to create a predictive model that uses aggregated data to spot patterns that cause issues in software stacks and to predict which software stacks are likely to fail, without actually running the application. An example of such an issue is a specific version of TensorFlow installed together with a specific version of numpy, a combination that causes API incompatibilities which only surface at run time.
This project consists of four submodules:
- notebooks
- preprocessing
- evaluation
- loader
Each submodule contains a README.md with a brief description and a manual.
Git LFS is required for downloading the dataset using git. Otherwise,
the dataset needs to be downloaded manually.
Move the archive file to the thoth_issue_predictor/dataset folder.
- A software stack is a group of packages working together to achieve a common goal. An example of a software stack is the TensorFlow package together with its dependencies: TensorFlow declares a list of dependencies that need to be installed and functional for TensorFlow to work.
- An inspection is a resolved software stack.
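To make the two terms above more concrete, here is a minimal sketch of how a resolved software stack and an issue-causing pattern could be represented. It is not the project's model: the names `SoftwareStack`, `Pattern` and `matching_patterns` are hypothetical, the version numbers are made up for illustration, and the pattern list is hand-written, whereas the project aims to learn such patterns from inspection data.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a resolved software stack is a mapping of
# package name -> pinned version.
SoftwareStack = Dict[str, str]

# Hypothetical representation: a pattern is a pair of (package, version)
# pins that is assumed to cause an issue when both are present.
Pattern = Tuple[Tuple[str, str], Tuple[str, str]]


def matching_patterns(stack: SoftwareStack, patterns: List[Pattern]) -> List[Pattern]:
    """Return every pattern whose two pins both occur in the stack."""
    return [
        pattern
        for pattern in patterns
        if all(stack.get(name) == version for name, version in pattern)
    ]


# Made-up versions for illustration only; this is not a known-bad combination.
stack = {"tensorflow": "2.1.0", "numpy": "1.19.0", "six": "1.15.0"}
patterns = [(("tensorflow", "2.1.0"), ("numpy", "1.19.0"))]

print(matching_patterns(stack, patterns))
```

A stack that matches no pattern yields an empty list. A learned model would replace the fixed `patterns` list with predictions derived from the aggregated data.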
Docker with Docker BuildKit.
The commands below work only if Docker runs as the root user.
export APP_NAME="thoth_issue_predictor"
export BUILD_DATE=$(date +'%Y-%m-%d %H:%M:%S')
DOCKER_BUILDKIT=1 docker build . \
--tag "$APP_NAME" \
--target deployment \
--build-arg BUILD_DATE="$BUILD_DATE" \
--ssh default
export APP_NAME="thoth_issue_predictor"
docker run --net=host \
--rm \
--name "$APP_NAME"_cmd \
--log-opt tag="$APP_NAME" \
--log-driver=journald \
"$APP_NAME" \
jupyter lab
The exploratory analysis is located in thoth_issue_predictor/notebooks/InspectionsExploration.ipynb
Data preprocessing, training and evaluation are located in thoth_issue_predictor/notebooks/ThothIssuePredictor.ipynb
Python, Graphviz and Pipenv
pipenv install
pipenv run jupyter lab
Command for running all linters:
pipenv run make lint