Reperio is a visualization utility for Apache Nutch CrawlDB, LinkDB and HostDB data structures.
Reperio is written in Python. It leverages networkx and Bokeh to generate network graph visualizations.
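To make the idea of a network graph concrete, here is a minimal sketch, using only the Python standard library, of how link records might be turned into directed edges. It is not Reperio's API: the record layout and the names `SAMPLE_INLINKS`, `inlinks_to_edges` and `in_degree` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical inlink records as (target_url, source_url) pairs.
# These are invented sample values, not actual Nutch or Reperio output.
SAMPLE_INLINKS = [
    ("https://example.org/", "https://example.org/about"),
    ("https://example.org/", "https://example.org/news"),
    ("https://example.org/about", "https://example.org/"),
]


def inlinks_to_edges(inlinks):
    """Turn (target, source) inlink pairs into directed (source, target) edges."""
    return [(source, target) for target, source in inlinks]


def in_degree(edges):
    """Count how many edges point at each target node."""
    counts = defaultdict(int)
    for _source, target in edges:
        counts[target] += 1
    return dict(counts)


edges = inlinks_to_edges(SAMPLE_INLINKS)
degrees = in_degree(edges)
```

A list of `(source, target)` tuples like `edges` is the shape that networkx's `DiGraph.add_edges_from` accepts, so a structure of this kind could be handed to a graph library for layout and rendering.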
The Conda package manager is recommended. Create a conda environment:
conda create -n reperio python==3.10
Activate the conda environment and install Poetry:
conda activate reperio
pip install poetry
Then you can run the client using the following command:
reperio --help
or with Poetry:
poetry run reperio --help
The Makefile contains many targets for faster development.
Install all dependencies and pre-commit hooks
Install requirements:
make install
Pre-commit hooks could be installed after git init via:
make pre-commit-install
Codestyle and type checks
Automatic formatting uses ruff.
make polish-codestyle
# or use synonym
make formatting
Codestyle checks only, without rewriting files:
make check-codestyle
Note: check-codestyle uses ruff and the darglint library.
Code security
If this command is not selected during installation, it cannot be used.
make check-safety
This command launches Poetry integrity checks and identifies security issues with Safety and Bandit.
Tests with coverage badges
Run pytest
make test
All linters
Of course there is a command to run all linters in one:
make lint
This is the same as:
make check-codestyle && make test && make check-safety
Docker
make docker-build
which is equivalent to:
make docker-build VERSION=latest
Remove the Docker image with:
make docker-remove
More information about docker.
Cleanup
Delete pycache files
make pycache-remove
Remove package build
make build-remove
Delete .DS_Store files
make dsstore-remove
Remove .mypy_cache
make mypycache-remove
Or, to remove all of the above, run:
make cleanup
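For readers without make available, the effect of the cache and .DS_Store cleanup steps can be approximated in Python. This is a rough sketch under stated assumptions, not a port of the Makefile: the function name `cleanup` is hypothetical, the exact behaviour of the Makefile targets is assumed rather than reproduced, and removal of the package build is not covered.

```python
import shutil
from pathlib import Path

# Directory names treated as disposable caches (assumption for this sketch).
CACHE_DIR_NAMES = {"__pycache__", ".mypy_cache"}


def cleanup(root):
    """Remove cache directories and .DS_Store files below root.

    Returns the list of paths that were removed.
    """
    root = Path(root)
    removed = []

    # Collect matches first so the tree is not modified while it is walked.
    cache_dirs = [
        p for p in root.rglob("*")
        if p.is_dir() and p.name in CACHE_DIR_NAMES
    ]
    for directory in cache_dirs:
        # A nested cache may already be gone if its parent was removed.
        if directory.exists():
            shutil.rmtree(directory)
            removed.append(directory)

    for ds_store in list(root.rglob(".DS_Store")):
        if ds_store.is_file():
            ds_store.unlink()
            removed.append(ds_store)

    return removed
```

Calling `cleanup(".")` from the repository root would delete the matching files and directories in place, so it is worth trying on a scratch copy first.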
This project is licensed under the terms of the Apache Software License 2.0. See LICENSE for more details.
@misc{reperio,
author = {lewismc},
title = {Reperio is a visualization utility for Apache Nutch data structures.},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/lewismc/reperio}}
}
This project was generated with 3PG