- grid_json
- BFI_andOAI
- alias_org_name_hospital
- alias_org_type
- alias_topic_type
- alias_uni_org_type
- KarenExtraRecords
Note: a collection called grid_json must exist; it holds the JSON file downloaded from https://www.grid.ac/downloads
- dimension_grid: contains all the documents downloaded from Dimensions for the GRID IDs obtained from the grid.ac search engine, filtering by Education and Healthcare and by country (Denmark).
- ddf: contains all the DOIs downloaded from the DDF API.
- dimension_ddf: contains the documents found in Dimensions for the DOIs obtained from DDF.
- dimension_all: contains the combination of the dimension_grid and dimension_ddf collections. It does not include the DOIs obtained from the DDF API that were not found in Dimensions.
- organizations: contains the information from the grid_json collection in a standardized format that the opera parser can consume directly.
- parsed: contains all the information from dimension_all, parsed and ready to be inserted into Neo4j.
- dimension_all_flags: contains all the DOIs of the dimension_all collection and their related flags, dim_ddf and dim_grid.
- extra_documents: contains all the extra documents that must be entered into the system; it is the counterpart of ddf.
time python3.6 getraw.py -c dimension_grid -bd "2014-01-01" -ed "2019-12-31" --dbname publications
real 320m51.697s
time python3.6 getraw.py -c dimension_grid -bd "2014-01-01" -ed "2019-12-31" --dbname patents
real 2m39.879s
time python3.6 getraw.py -c dimension_grid -bd "2014-01-01" -ed "2019-12-31" --dbname grants
real 3m50.482s
time python3.6 getraw.py -c dimension_grid -bd "2014-01-01" -ed "2019-12-31" --dbname clinical_trials
real 3m29.265s
time python3.6 getraw.py -c dimension_grid -bd "2014-01-01" -ed "2019-12-31" --dbname datasets
real 5m2.404s
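The five downloads above differ only in the --dbname value, so they can be driven from one loop. The following is a minimal sketch, assuming getraw.py accepts exactly the flags shown above; the helper names build_getraw_commands and run_all and the dry_run switch are hypothetical and not part of the project.

```python
import subprocess

# Values of --dbname used in the commands above.
DBNAMES = ["publications", "patents", "grants", "clinical_trials", "datasets"]


def build_getraw_commands(begin="2014-01-01", end="2019-12-31",
                          collection="dimension_grid"):
    """Return one argv list per Dimensions source (hypothetical helper)."""
    return [
        ["python3.6", "getraw.py", "-c", collection,
         "-bd", begin, "-ed", end, "--dbname", dbname]
        for dbname in DBNAMES
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_getraw_commands():
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the loop at the first failing download.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing the arguments as a list avoids shell quoting altogether, so a stray quote character cannot end up inside a date value.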
time python3.6 getddf.py -cfg /home/ubuntu/dataverz/projects/opera/config.yml -y "2013,2014,2015,2016,2017,2018,2019"
real 3m39.410s
We download from Dimensions all the documents matching the DOIs obtained from DDF (collection: dimension_ddf)
time python3.6 getddf-dimension.py
real 192m32.858s
We combine the two collections of documents downloaded from Dimensions, dimension_ddf and dimension_grid (collection: dimension_all)
time python3.6 combine-ddf-grid-dimension.py
real 5m50.531s
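As a rough illustration of the combination step, the sketch below merges two lists of documents and keeps one copy per DOI. It is not the code of combine-ddf-grid-dimension.py; the doi field name, the lower-casing of DOIs, and the rule that the first copy wins are all assumptions made for the example.

```python
def combine_by_doi(grid_docs, ddf_docs):
    """Merge two document lists, keeping the first document seen per DOI.

    Assumes each document is a dict with a "doi" key (hypothetical schema).
    Documents without a DOI are kept as-is, since they cannot be matched.
    """
    combined = {}
    without_doi = []
    for doc in list(grid_docs) + list(ddf_docs):
        doi = doc.get("doi")
        if not doi:
            without_doi.append(doc)
            continue
        # DOIs are case-insensitive, so compare them in lower case.
        combined.setdefault(doi.lower(), doc)
    return list(combined.values()) + without_doi


# Made-up sample documents, for illustration only.
grid = [{"doi": "10.1000/A", "title": "from grid"}, {"title": "no doi"}]
ddf = [{"doi": "10.1000/a", "title": "from ddf"}, {"doi": "10.1000/b"}]
merged = combine_by_doi(grid, ddf)
```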
time python3.6 create-collection-organisations.py
real 0m33.713s
We generate the parsed collection from the dimension_all and organizations collections (collection: parsed)
time python3.6 parseraw.py
real 6m57.671s
We generate the dimension_all_flags collection with the DOIs from all the collections (dimension_grid, dimension_ddf, ddf)
time python3.6 create-collection-all-doi.py
real 1m9.880s
time python3.6 dimension-all-flags.py
real 2m31.858s
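The flags can be pictured as set membership: a DOI gets dim_grid when it appears in dimension_grid and dim_ddf when it appears in dimension_ddf. The sketch below is an assumption about what the flags mean, based on their names, and not the logic of dimension-all-flags.py; build_flags and the sample DOIs are hypothetical.

```python
def build_flags(grid_dois, ddf_dois):
    """Return one flag record per DOI found in either input.

    dim_grid / dim_ddf are taken to mean "present in dimension_grid" /
    "present in dimension_ddf" (an assumption based on the flag names).
    """
    grid_set = set(grid_dois)
    ddf_set = set(ddf_dois)
    return [
        {"doi": doi, "dim_grid": doi in grid_set, "dim_ddf": doi in ddf_set}
        for doi in sorted(grid_set | ddf_set)
    ]


# Made-up DOIs, for illustration only.
flags = build_flags(["10.1/a", "10.1/b"], ["10.1/b", "10.1/c"])
for record in flags:
    print(record)
```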
time python3.6 create-nodes-neo4j.py
real 6m52.106s
time python3.6 create-relationships-neo4j.py
real 72m50.981s
* BFI_andOAI
* alias_org_name_hospital
* alias_org_type
* alias_topic_type
* alias_uni_org_type
time python3.6 create-aliases.py
real 0m6.013s
time python3.6 insert-only-doi-ddf.py
real 0m3.019s
time python3.6 relation-doi-grid.py
real 2m47.016s
time python3.6 insert-flag-neo4j.py
real 0m13.599s
time python3.6 append-data-neo4j.py
real 1m0.569s
We normalize the records in KarenExtraRecords into a standard collection, dropping the documents whose organization could not be found from the records provided (example: https://app.dimensions.ai/details/publication/pub.1065208274). (collection: extra_documents)
time python3.6 karen-extra-records-parse.py
real 2m8.802s
time python3.6 karen-extra-records-download-dimension.py
real 2m12.416s
time python3.6 parseraw.py
real 3m57.495s
time python3.6 create-nodes-neo4j.py
real 0m4.853s
time python3.6 create-relationships-neo4j.py
real 0m5.382s
time python3.6 create-aliases.py
real 0m5.404s
time python3.6 append-data-neo4j.py
real 1m7.722s
time python3.6 relation-doi-grid-karen.py
real 0m1.396s
time python3.6 insert-flag-neo4j-karen.py
real 0m1.384s