This is a The Palace Project maintained fork of the NYPL Library Simplified Circulation Manager.
Docker images created from this code are available at:
Docker images are the preferred way to deploy this code in a production environment.
Branch | Python Version |
---|---|
main | Python 3 |
python2 | Python 2 |
The default branch is main
and that's the working branch that should be used when branching off for bug fixes or new
features.
Python 2 stopped being supported after January 1st, 2020 but there is still a python2
branch which can be used. As of
August 2021, development will be done in the main
branch and the python2
branch will not be updated unless
absolutely necessary.
In order to help quickly set up a development environment, we include a docker-compose.yml file. This docker-compose file, will build the webapp and scripts containers from your local repository, and start those containers as well as all the necessary service containers.
You can give this a try by running the following command:
docker-compose up --build
If you do not have Python 3 installed, you can use Homebrew to install it by running the command
brew install python3
.
If you do not yet have Homebrew, you can install it by running the following:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Most distributions will offer Python packages. On Arch Linux, the following command is sufficient:
pacman -S python
pyenv pyenv lets you easily switch between multiple versions of Python. It can be
installed using the command curl https://pyenv.run | bash
. You can then
install the version of Python you want to work with.
It is recommended that pyenv-virtualenv be used to allow pyenv
to manage virtual environments in a manner that can be used by the poetry tool. The pyenv-virtualenv
plugin can be installed by cloning the relevant repository into the plugins
subdirectory of your $PYENV_ROOT
:
mkdir -p $PYENV_ROOT/plugins
cd $PYENV_ROOT/plugins
git clone https://github.com/pyenv/pyenv-virtualenv
After cloning the repository, pyenv
now has a new virtualenv
command:
$ pyenv virtualenv
pyenv-virtualenv: no virtualenv name given.
You will need to set up a local virtual environment to install packages and run the project. This project uses poetry for dependency management.
Poetry can be installed using the command curl -sSL https://install.python-poetry.org | python3 -
.
More information about installation options can be found in the poetry documentation.
Palace now supports Opensearch: please use it instead of Elasticsearch. Elasticsearch is no longer supported.
We recommend that you run Opensearch with docker using the following docker commands:
docker run --name opensearch -d --rm -p 9200:9200 -e "discovery.type=single-node" -e "plugins.security.disabled=true" "opensearchproject/opensearch:1"
docker exec opensearch opensearch-plugin -s install analysis-icu
docker restart opensearch
docker run -d --name pg -e POSTGRES_USER=palace -e POSTGRES_PASSWORD=test -e POSTGRES_DB=circ -p 5432:5432 postgres:12
You can run psql
in the container using the command
docker exec -it pg psql -U palace circ
- Download and install Postgres if you don't have it already.
- Use the command
psql
to access the Postgresql client. - Within the session, run the following commands:
CREATE DATABASE circ;
CREATE USER palace with password 'test';
grant all privileges on database circ to palace;
Redis is used as the broker for Celery and the caching layer. You can run Redis with docker using the following command:
docker run -d --name redis -p 6379:6379 redis/redis-stack-server
To let the application know which database to use, set the SIMPLIFIED_PRODUCTION_DATABASE
environment variable.
export SIMPLIFIED_PRODUCTION_DATABASE="postgresql://palace:test@localhost:5432/circ"
To let the application know which Opensearch instance to use, you can set the following environment variables:
PALACE_SEARCH_URL
: The url of the Opensearch instance (required).PALACE_SEARCH_INDEX_PREFIX
: The prefix to use for the Opensearch indices. The default iscirculation-works
. This is useful if you want to use the same Opensearch instance for multiple CM (optional).PALACE_SEARCH_TIMEOUT
: The timeout in seconds to use when connecting to the Opensearch instance. The default is20
(optional).PALACE_SEARCH_MAXSIZE
: The maximum size of the connection pool to use when connecting to the Opensearch instance. (optional).
export PALACE_SEARCH_URL="http://localhost:9200"
We use Celery to run background tasks. To configure Celery, you need to pass a broker URL to the application.
PALACE_CELERY_BROKER_URL
: The URL of the broker to use for Celery. (required).- for example:
export PALACE_CELERY_BROKER_URL="redis://localhost:6379/0"`
- for example:
We support overriding a number of other Celery settings via environment variables, but in most cases
the defaults should be sufficient. The full list of settings can be found in
service/celery/configuration.py
.
We use Redis as the caching layer for the application. Although you can use the same redis database for both Celery and caching, we recommend that you use a separate database for each purpose to avoid conflicts.
PALACE_REDIS_URL
: The URL of the Redis instance to use for caching. (required).- for example:
export PALACE_REDIS_URL="redis://localhost:6379/1"
- for example:
PALACE_REDIS_KEY_PREFIX
: The prefix to use for keys stored in the Redis instance. The default ispalace
. This is useful if you want to use the same Redis database for multiple CM (optional).
PALACE_BASE_URL
: The base URL of the application. Used to create absolute links. (optional)PALACE_PATRON_WEB_HOSTNAMES
: Only web applications from these hosts can access this circulation manager. This can be a single hostname (http://catalog.library.org
) or a pipe-separated list of hostnames (http://catalog.library.org|https://beta.library.org
). You can also set this to*
to allow access from any host, but you must not do this in a production environment -- only during development. (optional)PALACE_AUTHENTICATION_DOCUMENT_CACHE_TIME
: Cache time for authentication documents (in seconds). The default is 3600 (optional).
The application optionally uses a s3 compatible storage service to store files. To configure the application to use a storage service, you can set the following environment variables:
PALACE_STORAGE_PUBLIC_ACCESS_BUCKET
: Required if you want to use the storage service to serve files directly to users. This is the name of the bucket that will be used to serve files. This bucket should be configured to allow public access to the files.PALACE_STORAGE_ANALYTICS_BUCKET
: Required if you want to use the storage service to store analytics data.PALACE_STORAGE_ACCESS_KEY
: The access key (optional).- If this key is set it will be passed to boto3 when connecting to the storage service.
- If it is not set boto3 will attempt to find credentials as outlined in their documentation.
PALACE_STORAGE_SECRET_KEY
: The secret key (optional).PALACE_STORAGE_REGION
: The AWS region of the storage service (optional).PALACE_STORAGE_ENDPOINT_URL
: The endpoint of the storage service (optional). This is used if you are using a s3 compatible storage service like minio.PALACE_STORAGE_URL_TEMPLATE
: The url template to use when generating urls for files stored in the storage service (optional).- The default value is
https://{bucket}.s3.{region}.amazonaws.com/{key}
. - The following variables can be used in the template:
{bucket}
: The name of the bucket.{key}
: The key of the file.{region}
: The region of the storage service.
- The default value is
PALACE_REPORTING_NAME
: (Optional) A name used to identify the CM instance associated with generated reports.SIMPLIFIED_REPORTING_EMAIL
: (Required) Email address of recipient of reports.
The application uses the Python logging module for logging. Optionally logs can be configured to be sent to AWS CloudWatch logs. The following environment variables can be used to configure the logging:
PALACE_LOG_LEVEL
: The log level to use for the application. The default isINFO
.PALACE_LOG_VERBOSE_LEVEL
: The log level to use for particularly verbose loggers. Keeping these loggers at a higher log level by default makes it easier to troubleshoot issues. The default isWARNING
.PALACE_LOG_CLOUDWATCH_ENABLED
: Enable / disable sending logs to CloudWatch. The default isfalse
.PALACE_LOG_CLOUDWATCH_REGION
: The AWS region of the CloudWatch logs. This must be set if using CloudWatch logs.PALACE_LOG_CLOUDWATCH_GROUP
: The name of the CloudWatch log group to send logs to. Default ispalace
.PALACE_LOG_CLOUDWATCH_STREAM
: The name of the CloudWatch log stream to send logs to. Default is{machine_name}/{program_name}/{logger_name}/{process_id}
. See watchtower docs for details.PALACE_LOG_CLOUDWATCH_INTERVAL
: The interval in seconds to send logs to CloudWatch. Default is60
.PALACE_LOG_CLOUDWATCH_CREATE_GROUP
: Whether to create the log group if it does not exist. Default istrue
.PALACE_LOG_CLOUDWATCH_ACCESS_KEY
: The access key to use when sending logs to CloudWatch. This is optional.- If this key is set it will be passed to boto3 when connecting to CloudWatch.
- If it is not set boto3 will attempt to find credentials as outlined in their documentation.
PALACE_LOG_CLOUDWATCH_SECRET_KEY
: The secret key to use when sending logs to CloudWatch. This is optional.
For Firebase Cloud Messaging (FCM) support (e.g., for notifications), one
(and only one) of the following should be set:
PALACE_FCM_CREDENTIALS_JSON
- the JSON-format Google Cloud Platform (GCP) service account key orPALACE_FCM_CREDENTIALS_FILE
- the name of the file containing that key.
export PALACE_FCM_CREDENTIALS_JSON='{"type":"service_account","project_id":"<id>", "private_key_id":"f8...d1", ...}'
...or...
export PALACE_FCM_CREDENTIALS_FILE="/opt/credentials/fcm_credentials.json"
The FCM credentials can be downloaded once a Google Service account has been created. More details in the FCM documentation
For generating quicksight dashboard links the following environment variable is required
PALACE_QUICKSIGHT_AUTHORIZED_ARNS
- A dictionary of the format "<dashboard name>": ["arn:aws:quicksight:...",...]
where each quicksight dashboard gets treated with an arbitrary "name", and a list of "authorized arns".
The first the "authorized arns" is always considered as the InitialDashboardID
when creating an embed URL
for the respective "dashboard name".
Local analytics are enabled by default. S3 analytics can be enabled via the following environment variable:
- PALACE_S3_ANALYTICS_ENABLED: A boolean value to disable or enable s3 analytics. The default is false.
To use the features that require sending emails, for example to reset the password for logged-out users, you will need to have a working SMTP server and set some environment variables:
PALACE_MAIL_SERVER
- The SMTP server to use. Required if you want to send emails.PALACE_MAIL_PORT
- The port of the SMTP server. Default: 25. (optional)PALACE_MAIL_USERNAME
- The username to use when connecting to the SMTP server. (optional)PALACE_MAIL_PASSWORD
- The password to use when connecting to the SMTP server. (optional)PALACE_MAIL_SENDER
- The email address to use as the sender of the emails. (optional)
As mentioned in the pyenv section, the poetry
tool should be executed under a virtual environment
in order to guarantee that it will use the Python version you expect. To use a particular Python version,
you should create a local virtual environment in the cloned circulation
repository directory. Assuming that
you want to use, for example, Python 3.11.1:
pyenv virtualenv 3.11.1 circ
This will create a new local virtual environment called circ
that uses Python 3.11.1. Switch to that environment:
pyenv local circ
On most systems, using pyenv
will adjust your shell prompt to indicate which virtual environment you
are now in. For example, the version of Python installed in your operating system might be 3.10.1
, but
using a virtual environment can substitute, for example, 3.11.1
:
$ python --version
Python 3.10.1
$ pyenv local circ
(circ) $ python --version
Python 3.11.1
For brevity, these instructions assume that all shell commands will be executed within a virtual environment.
Install the dependencies (including dev and CI):
poetry install
Install only the production dependencies:
poetry install --only main,pg
Run the application with:
poetry run python app.py
Check that there is now a web server listening on port 6500
:
curl http://localhost:6500/
You can start a celery worker with:
poetry run celery -A "palace.manager.celery.app" worker --concurrency 1 --pool solo --beat
By default, the application is configured to provide a built-in version of the admin web interface.
The admin interface can be accessed by visiting the /admin
endpoint:
# On Linux
xdg-open http://localhost:6500/admin/
# On MacOS
open http://localhost:6500/admin/
If no existing users are configured (which will be the case if this is a fresh instance of the application), the admin interface will prompt you to specify an email address and password that will be used for subsequent logins. Extra users can be configured later.
Navigate to System Configuration → Libraries
and click Create new library. You will be prompted to enter various
details such as the name of the library, a URL, and more. For example, the configuration for a hypothetical
library, Hazelnut Peak, might look like this:
Note that the Patron support email address will appear in OPDS feeds served by the application, so make sure that it is an email address you are happy to make public.
At this point, the library exists but does not contain any collections and therefore won't be of much use to anyone.
Navigate to System Configuration → Collections
and click Create new collection. You will prompted to enter
details that will be used to source the data for the collection. A good starting point, for testing purposes,
is to use an open access OPDS feed as a data source. The
Open Bookshelf is a good example of such a feed. Enter the
following details:
Note that we associate the collection with the Hazelnut Peak library by selecting it in the Libraries
drop-down.
A collection can be associated with any number of libraries.
At this point, we have a library named Hazelnut Peak configured to use the Palace Bookshelf collection we created.
It's now necessary to tell the application to start importing books from the OPDS feed. When the application is
running inside a Docker image, the image is typically configured to execute various import operations on a regular
schedule using cron
. Because we're running the application from the command-line for development purposes, we
need to execute these operations ourselves manually. In this particular case, we need to execute the opds_import_monitor
:
(circ) $ ./bin/opds_import_monitor
{"host": "hazelnut",
"app": "simplified",
"name": "OPDS Import Monitor",
"level": "INFO",
"filename": "opds_import.py",
"message": "[Palace Bookshelf] Following next link: http://openbookshelf.dp.la/lists/Open%20Bookshelf/crawlable",
"timestamp": "2022-01-17T11:52:35.839978+00:00"}
...
The command will cause the application to crawl the configured OPDS feed and import every book in it. At the time of writing, this command will take around an hour to run the first time it is executed, but subsequent executions complete in seconds. Please wait for the import to complete before continuing!
When the import has completed, the books are imported but no OPDS feeds will have been generated, and no search service has been configured.
Navigate to System Configuration → Search
and add a new search configuration. The required URL is
the URL of the Opensearch instance configured earlier:
As with the collection configured earlier, the application depends upon various operations being executed on a regular schedule to generate search indices. Because we're running the application from the local command-line, we need to execute those operations manually:
./bin/search_index_clear
./bin/search_index_refresh
Neither of the commands will produce any output if the operations succeed.
When the collection has finished importing, we are required to generate OPDS feeds. Again, this operation is configured to execute on a regular schedule in the Docker image, but we'll need to execute it manually here:
./bin/opds_entry_coverage
The command will produce output indicating any errors.
Navigating to http://localhost:6500/
should show an OPDS feed containing various books:
The ./bin/repair/where_are_my_books
command can produce output that may indicate why books are not appearing
in OPDS feeds. A working, correctly configured installation, at the time of writing, produces output such as this:
(circ) $ ./bin/repair/where_are_my_books
Checking library Hazelnut Peak
Associated with collection Palace Bookshelf.
Associated with 171 lanes.
0 feeds in cachedfeeds table, not counting grouped feeds.
Examining collection "Palace Bookshelf"
7838 presentation-ready works.
0 works not presentation-ready.
7824 works in the search index, expected around 7838.
We can see from the above output that the vast majority of the books in the Open Bookshelf collection were indexed correctly.
Some settings have been provided in the admin UI that configure or toggle various functions of the Circulation Manager.
These can be found at /admin/web/config/SitewideSettings
in the admin interface.
This setting is a toggle that may be used to turn on or off the ability for the the system to send the Loan and Hold reminders to the mobile applications.
The Palace Manager has a number of background jobs that are scheduled to run at regular intervals. This includes all the import and reaper jobs, as well as other necessary background tasks, such as maintaining the search index and feed caches.
Jobs are scheduled via a combination of cron
and celery
. All new jobs should use celery
for scheduling,
and existing jobs are being migrated to celery
as they are updated.
The cron
jobs are defined in the docker/services/simplified_crontab
file. The celery
jobs are defined
in the core/celery/tasks/
module.
Code style on this project is linted using pre-commit. This python application is included
in our pyproject.toml
file, so if you have the applications requirements installed it should be available. pre-commit
is run automatically on each push and PR by our CI System.
You can run it manually on all files with the command: pre-commit run --all-files
.
You can also set it up, so that it runs automatically for you on each commit. Running the command pre-commit install
will install the pre-commit script in your local repositories git hooks folder, so that pre-commit is run automatically
on each commit.
The pre-commit configuration file is named .pre-commit-config.yaml
. This file configures
the different lints that pre-commit runs.
Pre-commit ships with a number of lints out of the box, we are configured to use:
trailing-whitespace
- trims trailing whitespace.end-of-file-fixer
- ensures that a file is either empty, or ends with one newline.check-yaml
- checks yaml files for parseable syntax.check-json
- checks json files for parseable syntax.check-ast
- simply checks whether the files parse as valid python.check-shebang-scripts-are-executable
- ensures that (non-binary) files with a shebang are executable.check-executables-have-shebangs
- ensures that (non-binary) executables have a shebang.check-merge-conflict
- checks for files that contain merge conflict strings.check-added-large-files
- prevents giant files from being committed.mixed-line-ending
- replaces or checks mixed line ending.
We lint using the black code formatter, so that all of our code is formatted consistently.
We lint to make sure our imports are sorted and correctly formatted using isort. Our isort configuration is stored in our tox.ini which isort automatically detects.
We lint using autoflake to flag and remove any unused import statement. If an
unused import is needed for some reason it can be ignored with a #noqa
comment in the code.
This project runs all the unit tests through Github Actions for new pull requests and when merging into the default
main
branch. The relevant file can be found in .github/workflows/test-build.yml
. When contributing updates or
fixes, it's required for the test Github Action to pass for all Python 3 environments. Run the tox
command locally
before pushing changes to make sure you find any failing tests before committing them.
For each push to a branch, CI also creates a docker image for the code in the branch. These images can be used for testing the branch, or deploying hotfixes.
To install the tools used by CI run:
poetry install --only ci
The Github Actions CI service runs the unit tests against Python 3.10, and 3.11 automatically using tox.
Tox has an environment for each python version, the module being tested, and an optional -docker
factor that will
automatically use docker to deploy service containers used for the tests. You can select the environment you would like
to test with the tox -e
flag.
When running tox without an environment specified, it tests using all supported Python versions with service dependencies running in docker containers.
Factor | Python Version |
---|---|
py310 | Python 3.10 |
py311 | Python 3.11 |
All of these environments are tested by default when running tox. To test one specific environment you can use the -e
flag.
Test Python 3.10
tox -e py310
You need to have the Python versions you are testing against installed on your local system. tox
searches the system
for installed Python versions, but does not install new Python versions. If tox
doesn't find the Python version its
looking for it will give an InterpreterNotFound
errror.
Pyenv is a useful tool to install multiple Python versions, if you need to install missing Python versions in your system for local testing.
If you install tox-docker
tox will take care of setting up all the service containers necessary to run the unit tests
and pass the correct environment variables to configure the tests to use these services. Using tox-docker
is not
required, but it is the recommended way to run the tests locally, since it runs the tests in the same way they are run
on the Github Actions CI server. tox-docker
is automatically included when installing the ci
dependency group.
The docker functionality is included in a docker
factor that can be added to the environment. To run an environment
with a particular factor you add it to the end of the environment.
Test with Python 3.10 using docker containers for the services.
tox -e "py310-docker"
If you already have elastic search or postgres running locally, you can run them instead by setting the following environment variables:
PALACE_TEST_DATABASE_URL
PALACE_TEST_SEARCH_URL
Make sure the ports and usernames are updated to reflect the local configuration.
# Set environment variables
export PALACE_TEST_DATABASE_URL="postgresql://simplified_test:test@localhost:9005/simplified_circulation_test"
export PALACE_TEST_SEARCH_URL="http://localhost:9200"
# Run tox
tox -e "py310"
The tests assume that they have permission to create and drop databases. They connect to the
provided database URL and create a new database for each test run. If the user does not have permission
to create and drop databases, the tests will fail. You can disable this behavior by setting the
PALACE_TEST_DATABASE_CREATE_DATABASE
environment variable to false
.
### Override `pytest` arguments
If you wish to pass additional arguments to `pytest` you can do so through `tox`. Every argument passed after a `--` to
the `tox` command line will the passed to `pytest`, overriding the default.
Only run the `service` tests with Python 3.11 using docker.
```sh
tox -e "py311-docker" -- tests/manager/service
When testing Celery tasks, it can be useful to set the PALACE_TEST_CELERY_WORKER_SHUTDOWN_TIMEOUT
environment variable
to a higher value than the default of 30 seconds, so you are able to set breakpoints in the worker code and debug it.
This value is interpreted as the number of seconds to wait for the worker to shut down before killing it. If you set
this value to none
or (empty string), timeouts will be disabled.
export PALACE_TEST_CELERY_WORKER_SHUTDOWN_TIMEOUT=""
Code coverage is automatically tracked with pytest-cov
when tests are run.
When the tests are run with github actions, the coverage report is automatically uploaded to
codecov and the results are added to the relevant pull request.
When running locally, the results from each individual run can be collected and combined into an HTML report using
the report
tox environment. This can be run on its own after running the tests, or as part of the tox environment
selection.
# Run core and api tests under Python 3.8, using docker
# containers for dependencies, and generate code coverage report
tox -e "py38-{core,api}-docker,report"
Check out the Docker README in the /docker
directory for in-depth information on optionally
running and developing the Circulation Manager locally with Docker, or for deploying the Circulation Manager with
Docker.
There are three different profilers included to help measure the performance of the application. They can each be enabled by setting environment variables while starting the application.
PALACE_XRAY
: Set to enable X-Ray tracing on the application.PALACE_XRAY_NAME
: The name of the service shown in x-ray for these traces.PALACE_XRAY_ANNOTATE_
: Any environment variable starting with this prefix will be added to to the trace as an annotation.- For example setting
PALACE_XRAY_ANNOTATE_KEY=value
will set the annotationkey=value
on all xray traces sent from the application.
- For example setting
PALACE_XRAY_INCLUDE_BARCODE
: If this environment variable is set totrue
then the tracing code will try to include the patrons barcode in the user parameter of the trace, if a barcode is available.
Additional environment variables are provided by the X-Ray Python SDK.
This profiler uses the
werkzeug ProfilerMiddleware
to profile the code. This uses the
cProfile
module under the hood to do the profiling.
PALACE_CPROFILE
: Profiling will the enabled if this variable is set. The saved profile data will be available at path specified in the environment variable.- The profile data will have the extension
.prof
. - The data can be accessed using the
pstats.Stats
class. - Example code to print details of the gathered statistics:
import os from pathlib import Path from pstats import SortKey, Stats path = Path(os.environ.get("PALACE_CPROFILE")) for file in path.glob("*.prof"): stats = Stats(str(file)) stats.sort_stats(SortKey.CUMULATIVE, SortKey.CALLS) stats.print_stats()
This profiler uses PyInstrument to profile the code.
PyInstrument can also be used to profile the test suite. This can be useful to identify slow tests, or to identify performance regressions.
To profile the test suite, run the following command:
pyinstrument -m pytest --no-cov -n 0 tests
PALACE_PYINSTRUMENT
: Profiling will the enabled if this variable is set. The saved profile data will be available at path specified in the environment variable.- The profile data will have the extension
.pyisession
. - The data can be accessed with the
pyinstrument.session.Session
class. - Example code to print details of the gathered statistics:
import os from pathlib import Path from pyinstrument.renderers import HTMLRenderer from pyinstrument.session import Session path = Path(os.environ.get("PALACE_PYINSTRUMENT")) for file in path.glob("*.pyisession"): session = Session.load(file) renderer = HTMLRenderer() renderer.open_in_browser(session)
- The profile data will have the extension
SIMPLIFIED_SIRSI_DYNIX_APP_ID
: The Application ID for the SirsiDynix Authentication API (optional)