This project is built with maturin.
It can be built in development mode with:
maturin develop
This builds the Rust native module in place. You will need to re-run this whenever you change the Rust code, but changes to the Python code don't require a rebuild.
To run the tests, use:
make test
To check the documentation examples, use
make doctest
To run formatters, run:
make format
(To run for just Python or just Rust, use make format-python or cargo fmt, respectively.)
To run format checker and linters, run:
make lint
(To run for just Python or just Rust, use make lint-python or make lint-rust, respectively.)
If you would like to run the formatters and linters when you commit your code, you can use the pre-commit tool. The project already includes a pre-commit config file. First, install the pre-commit tool:
pip install pre-commit
Then install the hooks:
pre-commit install
From now on, any attempt to commit will first run the linters against the modified files:
$ git commit -m"Changed some python files"
black....................................................................Passed
isort (python)...........................................................Passed
ruff.....................................................................Passed
[main daf91ed] Changed some python files
1 file changed, 1 insertion(+), 1 deletion(-)
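You can also run the hooks manually against all files at any time (this is standard pre-commit usage, not specific to this project):
pre-commit run --all-files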
The benchmarks in python/benchmarks can be used to identify and diagnose performance issues. They are run with pytest-benchmark.
These benchmarks aren't meant to showcase performance on full-scale real-world datasets; rather, they are meant to be useful for developers to iterate on performance improvements and to catch performance regressions. Therefore, any benchmarks added there should run in less than 5 seconds.
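For reference, a new benchmark might look like the following sketch (the file name, test name, and dataset are illustrative, not existing code in the repo):

# python/benchmarks/test_example.py (illustrative)
import pyarrow as pa

import lance


def test_scan_small_table(tmp_path, benchmark):
    # Keep the dataset small so the whole benchmark finishes well under 5 seconds.
    table = pa.table({"x": range(10_000)})
    dataset = lance.write_dataset(table, str(tmp_path / "example.lance"))

    # pytest-benchmark calls this repeatedly and reports timing statistics.
    benchmark(dataset.to_table)

Benchmarks that cannot stay within that budget can be marked with @pytest.mark.slow so that they are excluded by -m "not slow".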
Before running benchmarks, you should build pylance in release mode:
maturin develop --profile release-with-debug --extras benchmarks --features datagen
(You can also use --release or --profile release, but --profile release-with-debug will provide debug symbols for profiling.)
Then you can run the benchmarks with
pytest python/benchmarks -m "not slow"
Note: the first time you run the benchmarks, they may take a while, since they will write out test datasets and build vector indices. Once these are built, they are re-used between benchmark runs.
Some benchmarks are especially slow, so they are skipped by -m "not slow". To run the slow benchmarks as well, use:
pytest python/benchmarks
To filter benchmarks by name, use the usual pytest -k flag (this can be a substring match, so you don't need to type the full name):
pytest python/benchmarks -k test_ivf_pq_index_search
If you have cargo-flamegraph installed, you can create a flamegraph of a benchmark by running:
flamegraph -F 100 --no-inline -- $(which python) \
-m pytest python/benchmarks \
--benchmark-min-time=2 \
-k test_ivf_pq_index_search
Note the --benchmark-min-time parameter: this controls how many seconds to run the benchmark in each round (default 5 rounds). The default is very low, but you can increase it so that the profile gets more samples.
You can drop the --no-inline flag to have the tool try to identify which functions were inlined and get more detail, though this will make the processing take considerably longer.
This will only work on Linux.
Note that you'll want to run the benchmarks once prior to profiling, so that the setup is complete and not captured as part of profiling.
You can easily compare the performance of the current version against main. Check out the main branch, run the benchmarks, and save the output using --benchmark-save. Then install the current version and run the benchmarks again with --benchmark-compare:
CURRENT_BRANCH=$(git branch --show-current)
git checkout main
maturin develop --profile release-with-debug --features datagen
pytest --benchmark-save=baseline python/benchmarks -m "not slow"
COMPARE_ID=$(ls .benchmarks/*/ | tail -1 | cut -c1-4)
git checkout $CURRENT_BRANCH
maturin develop --profile release-with-debug --features datagen
pytest --benchmark-compare=$COMPARE_ID python/benchmarks -m "not slow"
Rust has great integration with tools like criterion and pprof which make it easy to profile and debug CPU intensive tasks. However, these tools are not as effective at profiling I/O intensive work or providing a high level trace of an operation.
To fill this gap, the lance code utilizes the Rust tracing crate to provide tracing information for lance operations. User applications can receive these events and forward them on for logging purposes. Developers can also use this information to get a sense of the I/O that happens during an operation.
When instrumenting code you can use the #[instrument] macro from the Rust tracing crate. See the crate docs for more information on the various parameters that can be set. As a general guideline, we should aim to instrument the following methods:
- Top-level methods that will often be called by external libraries and could be slow
- Compute intensive methods that will perform a significant amount of CPU compute
- Any point where we are waiting on external resources (e.g. disk)
To begin with, instrument methods as close to the user as possible and refine downwards as you need. For example, start by instrumenting the entire dataset write operation and then instrument any individual parts of the operation that you would like to see details for.
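As a rough sketch (the function names below are hypothetical, not actual lance APIs), instrumentation might look like this:

use tracing::instrument;

// Top-level entry point that external callers hit and that could be slow.
#[instrument(skip(data))]
pub async fn write_fragment(path: &str, data: Vec<u8>) -> std::io::Result<()> {
    upload(path, data).await
}

// A point where we wait on an external resource (disk / object store).
#[instrument(level = "debug", skip(data))]
async fn upload(path: &str, data: Vec<u8>) -> std::io::Result<()> {
    tokio::fs::write(path, data).await
}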
If you would like tracing information for a Rust unit test, you will need to decorate your test with the lance_test_macros::test attribute. This will wrap any existing test attributes that you are using:
#[lance_test_macros::test(tokio::test)]
async fn test() {
    ...
}
Then, when running your test, you will need to set the environment variable LANCE_TRACING to your desired verbosity level (trace, debug, info, warn, error):
LANCE_TESTING=debug cargo test dataset::tests::test_create_dataset
This will create a .json file (named with a timestamp) in your working directory. This .json file can be loaded by Chrome or by https://ui.perfetto.dev
If you would like to trace a Python script (application, benchmark, test), you can easily do so using the lance.tracing module. Simply call:
from lance.tracing import trace_to_chrome
trace_to_chrome(level="debug")
# rest of script
A single .json trace file will be generated after python has exited.
You can use the trace_to_chrome function within the benchmarks, but for sensible results you'll want to force the benchmark to run only once. To do this, rewrite the benchmark using the pedantic API:
def run():
    "Put code to benchmark here"
    ...

benchmark.pedantic(run, iterations=1, rounds=1)
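Putting the two together, a traced benchmark could look like this sketch (the test name and the test_dataset fixture are made up for illustration):

from lance.tracing import trace_to_chrome

trace_to_chrome(level="debug")


def test_traced_scan(benchmark, test_dataset):
    # `test_dataset` is a hypothetical fixture that yields a lance dataset.
    # A single iteration and round keeps the trace small and readable.
    benchmark.pedantic(test_dataset.to_table, iterations=1, rounds=1)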
The current tracing implementation is slightly flawed when it comes to async operations that run in parallel. The Rust tracing-chrome library emits trace events into the Chrome trace events JSON format. This format is not sophisticated enough to represent asynchronous parallel work.
As a result, a single instrumented async method may appear as many different spans in the UI.
The integration tests run against a local MinIO and a local DynamoDB. To start the services, run:
docker compose up
Then you can run the tests with
pytest --run-integration python/tests/test_s3_ddb.py
On Mac or Linux, you can build manylinux wheels locally for Linux. The easiest way to do this is to use zig with maturin build. Before you do this, you'll need to make sure you (1) install zig and (2) install the toolchains:
rustup target add x86_64-unknown-linux-gnu
rustup target add aarch64-unknown-linux-gnu
For x86 Linux:
maturin build --release --zig \
--target x86_64-unknown-linux-gnu \
--compatibility manylinux2014 \
--out wheels
For ARM / aarch64 Linux:
maturin build --release --zig \
--target aarch64-unknown-linux-gnu \
--compatibility manylinux2014 \
--out wheels
On a Mac, you can build wheels locally for macOS:
maturin build --release \
--target aarch64-apple-darwin \
--out wheels
maturin build --release \
--target x86_64-apple-darwin \
--out wheels
When an operation should run in parallel you typically need to specify how many threads to use, for example as input to StreamExt::buffered. There are two numbers you can use: ObjectStore::io_parallelism or get_num_compute_intensive_cpus.
Often, operations will do a little of both compute and I/O, and you will need to make a judgment call. If you are unsure, and you are doing any I/O, then picking io_parallelism is a good fallback behavior. The worst case is just that we over-parallelize and there is more CPU contention than there needs to be. If this becomes a problem we can always split the operation into two parts and use the two different thread pools.
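For example, an I/O-bound step might be wired up like this sketch (the helper and the way the parallelism value is obtained are illustrative; the real module paths and call sites may differ):

use futures::stream::{self, StreamExt, TryStreamExt};

// Read many files concurrently. Because this step is I/O-bound, size the
// concurrency with the I/O parallelism (e.g. ObjectStore::io_parallelism())
// rather than the compute thread pool.
async fn read_all(paths: Vec<String>, io_parallelism: usize) -> std::io::Result<Vec<Vec<u8>>> {
    stream::iter(paths)
        .map(|path| async move { tokio::fs::read(path).await })
        .buffered(io_parallelism)
        .try_collect()
        .await
}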