These scripts are based on those developed at altay-oz/load_patstat.
The bulk downloads provided by EPO are served as batches of zip files. This
script assumes you have already downloaded and extracted these zip files, and
have a directory containing all the CSV files (the script will ask for this
directory - CSV_FILES_DIR
- when run).
The script will also prompt for the password for the patstat
postgresql user.
By default, a patstat
user and database is assumed, and data is loaded into
the public
schema. The script assumes that the postgresql database is running
on localhost.
This repository contains scripts tested for the following PATSTAT versions (see the releases in this repository for corresponding versions):
- Spring 2022*
- Fall 2022*
- Fall 2023
Releases marked with an asterisk have one or more point releases available for bugfixes. Always use the latest release for your version of PATSTAT.
If you have access to the versions of the data for which we do not yet have coverage, please respond to the open issue to indicate whether the release works, or whether changes are needed. If you are able, please submit a pull request with the changes needed for compatibility.
Many thanks to the contributors who have donated their time and expertise to this project (in sequential order):
- @complexly (spelling error fix)
- @vlejd (fall 2023 version)
- Run the
insert_data_to_patstat.sh
shell script. Enter the values as prompted. The script will create the schema and load the data from the CSV files, and then create rudimentary indexes. The script may run for a few hours. - Run the
count_rows_in_patstat.sh
script. This script will count the rows in each patstat table. Compare those with the values provided in the EPO dump, under theTestScripts
directory to verify that the database is complete.
The ERD for this database structure matches that given in the EPO data
catalogue, included here for reference (note that the fields APPLN_NR_EPODOC
in TLS201_APPLN
and EVENT_IMPACT
in TLS803_LEGAL_EVENT_CODE
have been removed from Fall 2022 onwards, but the Spring 2022 ERD is kept here, since the ERD provided in the data catalog for that version is higher resolution):