Frictionless (Data Package) <=> CKAN #660
-
Flagging existing repo here: https://github.com/frictionlessdata/ckan-datapackage-tools (a bit lacking in docs, if someone wanted to help out 😄).

Quick recipe (maybe evolving into a lib) for mapping CKAN metadata <=> Frictionless specs.

Job story: When getting data in and out of CKAN, I'm frequently using Frictionless formats and tools (it's my default format for extraction from other systems), and I want to be able to convert to and from the CKAN metadata structure so that I can do my work quickly and without having to dig through CKAN documentation.

Bigger context: we are frequently pulling data from other systems into CKAN, and from CKAN into other systems. We want to use Frictionless as the intermediate format so we can turn an MxN problem (one converter per pair of systems) into an M+N problem (one converter per system).
Acceptance
Tasks
Example Script

https://gist.github.com/rufuspollock/bd8ae3575950d180cce33da59c021299

    # python 3+
    import json
    from collections.abc import Mapping


    def convert_data_package_to_ckan_package(data_package):
        '''Convert a Data Package descriptor (dict) into a CKAN package dict.

        Documentation of CKAN metadata structure ...
        https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.create.package_create
        https://docs.ckan.org/en/2.8/api/index.html#ckan.logic.action.create.resource_create
        '''
        out = dict(data_package)
        out['extras'] = []
        # special case atm
        # future: loop through all fields not in the CKAN special list
        if 'tableschema' in out:
            out['extras'].append({
                'key': 'tableschema',
                'value': json.dumps(out['tableschema'])
            })
        out['resources'] = [convert_data_resource_to_ckan_resource(res)
                            for res in out['resources']]
        return out


    def convert_data_resource_to_ckan_resource(resource):
        '''Convert a Data Resource descriptor (dict) into a CKAN resource dict.'''
        out = dict(resource)
        # Data Package 'path' becomes CKAN 'url'
        out['url'] = out.pop('path')
        if 'bytes' in out:
            out['size'] = out['bytes']
        # flatten all nested data as JSON strings
        for key, value in list(out.items()):
            if isinstance(value, (list, dict)):
                out[key] = json.dumps(value)
        return out


    def dict_merge(dct, merge_dct):
        '''Recursive dict merge. Modifies dct in place and returns it.'''
        for k, v in merge_dct.items():
            # collections.Mapping was removed in Python 3.10;
            # collections.abc.Mapping is the supported location
            if k in dct and isinstance(dct[k], dict) and isinstance(v, Mapping):
                dict_merge(dct[k], v)
            else:
                dct[k] = v
        return dct
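The script in the description only covers the Data Package to CKAN direction, while the job story asks for conversion both ways. A minimal sketch of the reverse mapping could look like the following. The function names are hypothetical (they mirror the script's naming and are not part of any existing library), and the sketch assumes the CKAN dict was produced by the same conventions: `url` for `path`, `size` for `bytes`, nested values stored as JSON strings, and the table schema stored as an extra with key `tableschema`.

```python
import json


def convert_ckan_resource_to_data_resource(ckan_resource):
    """Hypothetical inverse of convert_data_resource_to_ckan_resource."""
    out = dict(ckan_resource)
    # CKAN 'url' corresponds to Data Package 'path'
    if 'url' in out:
        out['path'] = out.pop('url')
    # CKAN 'size' corresponds to Data Package 'bytes'
    if 'size' in out:
        out['bytes'] = out.pop('size')
    # Undo the JSON flattening for strings that look like JSON objects/arrays
    for key, value in list(out.items()):
        if isinstance(value, str) and value[:1] in ('{', '['):
            try:
                out[key] = json.loads(value)
            except ValueError:
                pass  # not valid JSON after all; keep the original string
    return out


def convert_ckan_package_to_data_package(ckan_package):
    """Hypothetical inverse of convert_data_package_to_ckan_package."""
    out = dict(ckan_package)
    # Lift the 'tableschema' extra back to a top-level field.
    # Assumption: any other extras are simply dropped in this sketch.
    for extra in out.pop('extras', []):
        if extra.get('key') == 'tableschema':
            out['tableschema'] = json.loads(extra['value'])
    out['resources'] = [convert_ckan_resource_to_data_resource(res)
                        for res in out.get('resources', [])]
    return out


# Made-up CKAN metadata for illustration
ckan_package = {
    'name': 'example',
    'extras': [{'key': 'tableschema', 'value': '{"fields": []}'}],
    'resources': [{'name': 'data', 'url': 'data.csv', 'size': 10,
                   'schema': '{"fields": [{"name": "id"}]}'}],
}
data_package = convert_ckan_package_to_data_package(ckan_package)
```

The JSON detection is a heuristic: a plain string field that happens to contain valid JSON starting with `{` or `[` would also be decoded, so a real library would probably want an explicit list of fields to decode.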
Replies: 5 comments
-
@rufuspollock I'm not sure if it can just read from public CKAN without an API key (cc @amercader). But in general, it should be possible to use it like any other:

    package = Package(
        storage='ckan_datastore',
        base_url=base_url,
        dataset_id=dataset_id,
        api_key=api_key)
    # work with ckan dataset as with data package

I can check the state of the driver if needed. It was written long ago, I guess by Brook.
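Whether the `ckan_datastore` storage driver works without a key is left open here. Independently of that driver, CKAN's Action API (`/api/3/action/package_show`) generally serves the metadata of public datasets to anonymous requests. A minimal standard-library sketch follows; the helper names and the `ckan.example.org` host are hypothetical, and this does not go through the `Package` storage interface at all.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_show_url(base_url, dataset_id):
    """Build the Action API URL for fetching one dataset's metadata."""
    query = urlencode({'id': dataset_id})
    return base_url.rstrip('/') + '/api/3/action/package_show?' + query


def unwrap_action_response(payload):
    """Return the 'result' of an Action API response, or raise on failure."""
    if not payload.get('success'):
        raise RuntimeError('CKAN action failed: %r' % (payload.get('error'),))
    return payload['result']


def fetch_ckan_package(base_url, dataset_id):
    """Fetch dataset metadata anonymously (no API key header is sent)."""
    with urlopen(package_show_url(base_url, dataset_id)) as response:
        return unwrap_action_response(json.load(response))


# Example call (hypothetical host, not executed here):
# ckan_package = fetch_ckan_package('https://ckan.example.org', 'my-dataset')
```

Private datasets, or sites that restrict anonymous access, would still need a key, so this only settles the public case.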
-
@roll great 😄 Can you show/share a minimum viable code snippet of using this code "in action"? My use case was converting a data package to CKAN metadata, and then doing some further manipulation before storing. So I just wanted the raw conversion of the schema from one dict to another. Do we have a recipe or function that does just the simple conversion?
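For the "further manipulation before storing" step, one possible shape is to treat the converted metadata as a plain dict and merge site-specific overrides into it. The sketch below is an illustration under assumptions: `deep_merge` is a hypothetical, non-mutating variant of the `dict_merge` helper from the description, the input dict is hand-written to stand in for the converter's output, and the override values are made up (`owner_org` and `private` are standard `package_create` parameters).

```python
import copy
from collections.abc import Mapping


def deep_merge(base, overrides):
    """Return a new dict: 'base' with 'overrides' merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(merged.get(key), dict) and isinstance(value, Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the output of the Data Package -> CKAN conversion
ckan_package = {
    'name': 'example-dataset',
    'title': 'Example dataset',
    'extras': [],
    'resources': [{'name': 'data', 'url': 'https://example.org/data.csv'}],
}

# Site-specific adjustments applied before storing (values are made up)
overrides = {
    'owner_org': 'my-org',
    'private': True,
    'title': 'Example dataset (draft)',
}

ready_to_store = deep_merge(ckan_package, overrides)
```

Returning a copy keeps the raw conversion result intact, which is convenient when the same converted dict needs to be stored with different overrides.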
-
@lauragift21 Here is a script based on @lwinfree's template for tutorials - https://colab.research.google.com/drive/18xFFAooVcca5wfW-7hW_juLkaTdN9FI0#scrollTo=Qv9bmfRA756R (I used Pandas as the target platform, but it could be SQL, etc.). Also, with a CKAN API key, one should be able to import a data package as a CKAN dataset (not yet tested, though). PS.
-
I've added a simple example Python script to the description.
-
To summarize progress on this: