An Elasticsearch-powered search engine for finding charities. It provides:
- Importing data from England and Wales, Scotland, and Northern Ireland, ensuring that duplicates are matched to one record.
- An elasticsearch index that can be queried.
- A reconciliation API for searching for a charity, based on an optimised search query.
- A facility for uploading a CSV of charity names and adding a best guess at each charity number.
- HTML pages for searching for a charity.
- Clone repository
- Create virtual environment (python -m venv env)
- Activate virtual environment (source env/bin/activate, or env\Scripts\activate on Windows)
- Install requirements (pip install -r requirements.txt)
- Install elasticsearch
- Start elasticsearch
- Create elasticsearch index (python data_import/create_elasticsearch.py)
This step fetches data on charities in England and Wales, Scotland, and Northern Ireland. Run it with the following command:
python data_import/fetch_data.py --oscr <path/to/oscr/zip/file.zip>
OSCR data needs to be manually downloaded from the OSCR website
in order to accept the terms and conditions. Once downloaded, the path needs to
be passed to data_import/fetch_data.py using the --oscr flag.
Data on charities in England and Wales will be fetched from http://data.charitycommission.gov.uk/.
If a different URL is needed then use the --ccew flag.
The latest .ZIP file will be downloaded and unzipped, and the data it contains
will be converted from .bcp files to .csv.
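The .bcp to .csv conversion can be sketched roughly as follows. This is a minimal illustration rather than the code in data_import/fetch_data.py: it assumes the extract separates fields with @**@ and records with *@@*, and bcp_to_csv is a hypothetical helper name.

```python
import csv
import io

# Assumed delimiters for the Charity Commission .bcp extract;
# check them against the downloaded files before relying on this.
FIELD_SEP = "@**@"
ROW_SEP = "*@@*"


def bcp_to_csv(bcp_text):
    """Convert the text of a .bcp file into CSV text (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in bcp_text.split(ROW_SEP):
        # Skip the empty chunk left after the final record separator.
        if not record.strip():
            continue
        # Split into fields and trim stray whitespace around each value.
        writer.writerow([field.strip() for field in record.split(FIELD_SEP)])
    return out.getvalue()
```

The csv module quotes any field that contains a comma, so charity names with commas in them survive the conversion.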
Data on charities in Northern Ireland will be fetched from http://www.charitycommissionni.org.uk/charity-search/ (Open Government Licence).
If a different URL is needed then pass it to the --ccni flag when running data_import/fetch_data.py.
The latest .CSV file (updated daily) will be downloaded to /data.
"Other names" for Northern Ireland charities are not contained in the downloadable CSV, but do appear in the information presented on the CCNI website. The other names are maintained in a separate list, which will be downloaded. To use another file, pass its URL to --ccni_extra.
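As a rough illustration of how the flags fit together, the sketch below assembles a fetch_data.py command line from Python. build_fetch_command and its parameter names are hypothetical; only the flag names (--oscr, --ccew, --ccni, --ccni_extra, and the --dual flag for the dual registered list) come from this README.

```python
def build_fetch_command(oscr_zip, ccew_url=None, ccni_url=None,
                        ccni_extra_url=None, dual_url=None):
    """Build the argument list for data_import/fetch_data.py (hypothetical helper)."""
    # The OSCR zip has to be downloaded manually, so its path is always passed.
    cmd = ["python", "data_import/fetch_data.py", "--oscr", oscr_zip]
    optional = [
        ("--ccew", ccew_url),
        ("--ccni", ccni_url),
        ("--ccni_extra", ccni_extra_url),
        ("--dual", dual_url),
    ]
    for flag, value in optional:
        # Only add a flag when an override was supplied; otherwise the
        # script's own default URL is used.
        if value is not None:
            cmd.extend([flag, value])
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root.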
A list of dual registered charities will be downloaded from GitHub. To use another file, pass a URL to --dual.
The list is a CSV file with one line per pair of England and Wales/Scottish charities in the format:
"Scottish Charity Number","E&W Charity Number","Charity Name (E&W)"
"SC002327","263710","Shelter, National Campaign for Homeless People Limited"
To add more charities, fork the GitHub gist and add a comment to the original gist.
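For illustration, a file in this format could be read into a lookup from Scottish charity number to England and Wales charity number as sketched below. The column names are taken from the header row shown above; parse_dual_registered is a hypothetical helper, not a function from this repository.

```python
import csv
import io


def parse_dual_registered(csv_text):
    """Map each Scottish charity number to its E&W charity number (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Scottish Charity Number"]: row["E&W Charity Number"]
        for row in reader
    }


sample = (
    '"Scottish Charity Number","E&W Charity Number","Charity Name (E&W)"\n'
    '"SC002327","263710","Shelter, National Campaign for Homeless People Limited"\n'
)
lookup = parse_dual_registered(sample)
```

A lookup like this is one way the duplicate records from the two registers could be matched to a single charity.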
You can also add postcode data from https://github.com/drkane/es-postcodes to
allow for geographic-based searching. If you host the postcode elasticsearch
index on the same host it can be used at the import_data.py stage.
Once the data has been fetched, the needed files are stored in the data/ directory.
You can then run the python data_import/import_data.py script to import it.
By default the script will look for an elasticsearch instance at localhost:9200.
Use python data_import/import_data.py --help to see the available options. To use the
postcode elasticsearch index you need to pass --es-pc-host localhost.
The data is imported into elasticsearch in the following format:
{
  "charity_number": "12355",
  "ccew_number": "12355",
  "oscr_number": "SC1235",
  "ccni_number": "NI100012",
  "active": true,
  "names": [
    {"name": "Charity Name", "type": "registered name", "source": "ccew"}
  ],
  "known_as": "Charity Name",
  "geo": {
    "areas": ["gss_codes"],
    "postcode": "PO54 0DE",
    "latlng": [0.0, 50.0]
  },
  "url": "http://www.url.org.uk/",
  "domain": "url.org.uk",
  "latest_income": 12345,
  "company_number": [
    {"number": "00121212", "source": "ccew"}
  ],
  "parent": "124566",
  "ccew_link": "http://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/SearchResultHandler.aspx?RegisteredCharityNumber=12355&SubsidiaryNumber=0",
  "oscr_link": "http://www.oscr.org.uk/charities/search-scottish-charity-register/charity-details?number=SC1235",
  "ccni_link": "http://www.charitycommissionni.org.uk/charity-details/?regid=100012&subid=0"
}
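The three link fields follow directly from the registration numbers. The sketch below shows one way they could be derived. The URL patterns are copied from the example document, the rule of dropping the "NI" prefix for the regid parameter is inferred from that example, and regulator_links is a hypothetical helper rather than the importer's actual code.

```python
CCEW_LINK = (
    "http://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/"
    "SearchResultHandler.aspx?RegisteredCharityNumber={}&SubsidiaryNumber=0"
)
OSCR_LINK = (
    "http://www.oscr.org.uk/charities/search-scottish-charity-register/"
    "charity-details?number={}"
)
CCNI_LINK = "http://www.charitycommissionni.org.uk/charity-details/?regid={}&subid=0"


def regulator_links(ccew_number=None, oscr_number=None, ccni_number=None):
    """Build the *_link fields for whichever regulator numbers are present."""
    links = {}
    if ccew_number:
        links["ccew_link"] = CCEW_LINK.format(ccew_number)
    if oscr_number:
        links["oscr_link"] = OSCR_LINK.format(oscr_number)
    if ccni_number:
        # Assumption: the CCNI site expects the number without its "NI" prefix.
        regid = ccni_number[2:] if ccni_number.upper().startswith("NI") else ccni_number
        links["ccni_link"] = CCNI_LINK.format(regid)
    return links


links = regulator_links(ccew_number="12355", oscr_number="SC1235", ccni_number="NI100012")
```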
The server uses bottle. Run it with the following command:
python server/server.py --host localhost --port 8080
The server offers the following API endpoints:
- /reconcile: a reconciliation service API conforming to the OpenRefine reconciliation API specification.
- /charity/12345: look up information about a particular charity.
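As a hedged sketch of how a client might address these endpoints, the helpers below build request URLs. They assume the server is running on localhost:8080 as in the command above, and that /reconcile accepts the batch "queries" parameter described in the OpenRefine reconciliation API specification. reconcile_url and charity_url are hypothetical names.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # assumed: the --host and --port shown above


def reconcile_url(names, base_url=BASE_URL, limit=3):
    """Build a batch reconciliation URL with one query per charity name."""
    # The specification keys each query by an arbitrary id such as q0, q1, ...
    queries = {
        "q{}".format(i): {"query": name, "limit": limit}
        for i, name in enumerate(names)
    }
    return "{}/reconcile?{}".format(base_url, urlencode({"queries": json.dumps(queries)}))


def charity_url(charity_number, base_url=BASE_URL):
    """Build the lookup URL for a single charity."""
    return "{}/charity/{}".format(base_url, charity_number)


url = reconcile_url(["Shelter"])
```

Either URL can then be fetched with urllib.request.urlopen or any HTTP client while the server is running.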
The current status is proof-of-concept; it needs a bit of work to get up and running.
Priorities:
- tests for ensuring data is correctly imported
- server tests
- use results of server/recon_test.py to produce the best reconciliation search query for use in the server (recon_test_7 seems the best at the moment)
- threshold for when to use the result vs discard
Future development:
- upload a CSV file and reconcile each row with a charity
- allow updating a charity with additional possible names