API and synchronisation worker for general site search on GOV.UK
This application powers the new site search for GOV.UK using Google Cloud Platform (GCP)'s Vertex AI Search ("Discovery Engine") product as its underlying search engine. It provides two core pieces of functionality:
- An API that is "minimally compatible" with the existing search-api REST interface to the extent necessary to power the "site search" (/search/all) finder.
- A synchronisation worker that receives content updates from the Publishing API message queue and updates the Discovery Engine dataset accordingly.
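As a rough illustration of the synchronisation worker's role, the following Python sketch applies content updates to an in-memory stand-in for the dataset. Everything in it is an assumption made for illustration: the message fields (`content_id`, `title`, `unpublished`), the `apply_update` function, the rule that unpublished content is removed, and the dictionary used in place of Discovery Engine are all hypothetical. None of it reflects the real message format or this application's actual implementation.

```python
from typing import Any

# Hypothetical in-memory stand-in for the Discovery Engine dataset,
# keyed by a content identifier.
Dataset = dict[str, dict[str, Any]]


def apply_update(dataset: Dataset, message: dict[str, Any]) -> None:
    """Apply one content update message to the dataset (illustrative only)."""
    content_id = message["content_id"]
    if message.get("unpublished", False):
        # Assumption: unpublished content is removed so it cannot be found.
        dataset.pop(content_id, None)
    else:
        # Create the document, or overwrite the copy already stored.
        dataset[content_id] = {"title": message["title"]}


dataset: Dataset = {}
apply_update(dataset, {"content_id": "abc", "title": "Apply for a passport"})
apply_update(dataset, {"content_id": "abc", "title": "Renew a passport"})
apply_update(dataset, {"content_id": "abc", "unpublished": True})
```

The point of the sketch is that updates for the same content replace each other rather than accumulate, so the dataset only ever holds the latest state of each piece of content.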
The official way of running this application locally is through GOV.UK Docker, where
a project is defined for it. Because this application is deeply integrated with a SaaS product, you
need access to a GCP Discovery Engine engine to do anything more meaningful
than running the test suite. govuk-docker handles this for you by configuring the environment to
point to integration. If you want to run the application without GOV.UK Docker, you can reference
the required environment variables from there.
You can run the application from within the govuk-docker repository directory as follows:
make search-api-v2
gcloud auth application-default login
govuk-docker up -d search-api-v2-app # or search-api-v2-lite if you just want to run tests
Our primary product goal was to improve the quality of search results for the majority of GOV.UK users.
The existing search powers a significant number of use cases within GOV.UK, including numerous
user-facing "finder" pages handled by Finder Frontend (among them the /search/all finder, which
handles the main search page and which we usually refer to as "site search"), but it also acts as
a very general "everything but the kitchen sink" API for retrieving content by a set of criteria.
We established that attempting to migrate all of these use cases with over a decade of accumulated logic and edge cases would distract us from our primary goal and be a poor fit for a next-generation search product anyway (the overwhelming majority of non-"site search" queries being trivial content retrieval filtered by certain attributes that could be handled by a relational database).
We therefore made a tactical decision to focus on "site search" only and find the minimal subset of the existing API contract that is necessary to render search results in this context, and update Finder Frontend to call our new application if and only if the user is using the general "site search" finder.
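The "if and only if" routing rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Finder Frontend's actual code: the function name `backend_for`, the backend labels it returns, and the path `/search/some-other-finder` are hypothetical, and any real decision may depend on further conditions beyond the finder path.

```python
# The "site search" finder path named in the text.
SITE_SEARCH_PATH = "/search/all"


def backend_for(finder_path: str) -> str:
    """Return which search backend a finder should query (illustrative only)."""
    if finder_path == SITE_SEARCH_PATH:
        # Only the general site search finder uses the new application.
        return "search-api-v2"
    # Every other finder keeps using the original Search API.
    return "search-api"


print(backend_for("/search/all"))                # search-api-v2
print(backend_for("/search/some-other-finder"))  # search-api
```

Keeping the rule to a single exact-match check mirrors the tactical decision: the new application only has to honour the part of the API contract that site search needs.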
Nothing in this application precludes more use cases being migrated to it in the future, but for the time being, it is intentionally not a complete replacement for Search API (despite the "v2" name).
See Search API compatibility for more information about our compatibility design choices.
The marketing name of the search product we use (Google Vertex AI Search and Conversation) changed several times while this application was first being developed, and some concepts are named differently in the Google Cloud Platform UI than in the underlying APIs.
We have chosen to exclusively use the more stable API naming (Discovery Engine, engine instead of app, etc.) throughout the codebase and documentation to avoid having to rename things as the product reached general availability, but you may see the terms "Vertex" or "Vertex Search" as well as some other marketing terms used in some project artefacts.
- finder-frontend: Displays results from this application's API depending on the "finder" in use and some other conditions
- search-api: The original Search API, a subset of which this application's API replicates
- search-v2-infrastructure: Provisions infrastructure for Discovery Engine including cloud resources and event ingestion for continuous training of the search engine
- search-v2-evaluator: Internal tool to test and rate search results