Skip to content

tveasey/ann-prototypes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ANN Experiments

This is a repo for quickly prototyping features for the ANN index.

Retrieving Data Sets

First create a virtual Python 3 environment which we use for managing data and running tests:

>>> mkdir python && cd python && python -m venv `pwd` && source bin/activate

Next install dvc and pull test data sets:

>>> gcloud auth application-default login
>>> pip install dvc
>>> pip install dvc-gs
>>> dvc pull

Note that this requires gsutils and access to our gcp bucket. You can also download just a single data set using:

>>> dvc pull data queries-quora-E5-small.fvec.dvc
>>> dvc pull data corpus-quora-E5-small.fvec.dvc

Mac Setup

Xcode

Building requires Apple's development environment, Xcode, which can be downloaded from https://developer.apple.com/download/. You will need to register as a developer with Apple. Alternatively, you can get the latest version of Xcode from the App Store.

For C++17 at least Xcode 10 is required, and this requires macOS High Sierra or above. using Monterey or Ventura, you must install Xcode 14.2.x or above. Xcode is distributed as a .xip file; simply double click the .xip file to expand it, then drag Xcode.app to your /Applications directory.

There are no command line tools out-of-the-box, so you'll need to install them following installation of Xcode. You can do this by running:

xcode-select --install

at the command prompt.

CMake

Download the graphical installer for version 3.23.2 from https://github.com/Kitware/CMake/releases/download/v3.23.2/cmake-3.23.2-macos-universal.dmg (or get a more recent version).

Open the .dmg and install the application it by dragging it to the Applications folder.

Then make the cmake program accessible to programs that look in /usr/local/bin:

sudo mkdir -p /usr/local/bin
sudo ln -s /Applications/CMake.app/Contents/bin/cmake /usr/local/bin/cmake

Boost 1.83.0

Download version 1.83.0 of Boost from https://boostorg.jfrog.io/artifactory/main/release/1.83.0/source/boost_1_83_0.tar.bz2. You must get this exact version, as the Machine Learning build system requires it.

Assuming you chose the .bz2 version, extract it to a temporary directory:

bzip2 -cd boost_1_83_0.tar.bz2 | tar xvf -

In the resulting boost_1_83_0 directory, run:

./bootstrap.sh --with-toolset=clang --without-libraries=context --without-libraries=coroutine --without-libraries=graph_parallel --without-libraries=mpi --without-libraries=python --without-icu

This should build the b2 program, which in turn is used to build Boost.

To complete the build:

./b2 -j8 --layout=versioned --disable-icu cxxflags="-std=c++17 -stdlib=libc++" linkflags="-std=c++17 -stdlib=libc++ -Wl,-headerpad_max_install_names" optimization=speed inlining=full define=BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS define=BOOST_LOG_WITHOUT_DEBUG_OUTPUT define=BOOST_LOG_WITHOUT_EVENT_LOG define=BOOST_LOG_WITHOUT_SYSLOG define=BOOST_LOG_WITHOUT_IPC
sudo ./b2 install --layout=versioned --disable-icu cxxflags="-std=c++17 -stdlib=libc++" linkflags="-std=c++17 -stdlib=libc++ -Wl,-headerpad_max_install_names" optimization=speed inlining=full define=BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS define=BOOST_LOG_WITHOUT_DEBUG_OUTPUT define=BOOST_LOG_WITHOUT_EVENT_LOG define=BOOST_LOG_WITHOUT_SYSLOG define=BOOST_LOG_WITHOUT_IPC

to install the Boost headers and libraries.

Building

To build native code navigate to the root directory and run:

>>> cmake -S . -B build
>>> cmake --build build

There are two targets build/run_tests and build/run_benchmark.

Running Tests

Testing uses the boost test framework. After building you can run all the tests using

>>> ./build/run_tests

Individual tests can be run using for example

>>> ./build/run_tests --run_test=pq

Run ./build/run_tests --help for more information.

Running Benchmarks

You can run help on this and you to see the options:

>>> ./build/run_benchmark -h
Usage: run_benchmark
Options:
  -h [ --help ]                 Show this help
  -s [ --scalar ] arg           Use 1, 4, 4P or 8 bit scalar quantisation. If
                                not supplied then run PQ
  -r [ --run ] arg              Run a test dataset
  -m [ --metric ] arg (=cosine) The metric, must be cosine, dot or euclidean
                                with which to compare vectors
  -d [ --distance ] arg (=0)    The ScaNN threshold used for computing the
                                parallel distance cost multiplier

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages