Support import/export of Data Packages #16
Comments
I think we'd certainly be interested in adding support for these formats to Palladio, and we have looked at them before, but there are a few questions. It's worth noting that Breve uses Palladio's data processing engine internally and doesn't expose all of Palladio's capabilities. If you can provide some insight into these issues, that would be helpful:
I don't mean to imply that the Palladio format is some sort of superior format. It is really just based on Palladio's internal representation. If we could switch to a format based on Data Packages, that would probably be ideal, but as a research project we also need to stay flexible and expressive in areas where Data Packages has chosen to limit expression, a choice that makes perfect sense for a standard format. Thanks!
Hi Ethan, thanks for your quick, thoughtful, and thorough response!
cc'ing @rgrp @pwalsh as they might have further thoughts on the above
Hi Dan, This is really encouraging. Given all this, I think it may be possible to simply move Palladio's data format to Data Packages, which would solve this problem for Palladio as well as Breve. It's going to be a process, but I'll probably start prototyping in a branch of the Palladio repo soon. Thanks,
Hi @esjewett, just flagging that I am available for any questions, both here and in our Frictionless Data chat: https://gitter.im/frictionlessdata/chat 👍
Breve allows for the assignment of data type per column and immediate validation against those types. This is excellent! However, once the dataset has been cleaned, the only output seems to be the cleaned CSV. I believe this tool would be even more useful if the type information created through Breve were recorded using JSON Table Schema and the data exported as a Tabular Data Package. Likewise, on import, the type information could be automatically set using validation rules expressed via the Data Package format.
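As a rough illustration of the import direction, the sketch below reads field types from a JSON Table Schema-style descriptor and checks each CSV value against them. The field names, the sample rows, and the `CASTERS` and `validate_rows` helpers are all hypothetical, and only a handful of types are handled; this is not Breve's or Palladio's actual code.

```python
import csv
import io
from datetime import date

# Hypothetical schema in the style of JSON Table Schema (field names are made up).
SCHEMA = {
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "born", "type": "date"},
        {"name": "letters", "type": "integer"},
    ]
}

# Minimal casters for a few types; a real implementation would cover more.
CASTERS = {
    "string": str,
    "integer": int,
    "number": float,
    "date": date.fromisoformat,
}


def validate_rows(csv_text, schema):
    """Return (row_number, field_name, value) for every value that fails to cast."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for field in schema["fields"]:
            value = row[field["name"]]
            try:
                CASTERS[field["type"]](value)
            except ValueError:
                errors.append((row_number, field["name"], value))
    return errors


# Hypothetical sample data: the second row has two values that do not match the schema.
SAMPLE = "name,born,letters\nAda,1815-12-10,12\nBob,unknown,three\n"
print(validate_rows(SAMPLE, SCHEMA))
```

The point of the sketch is that the per-column types a user would otherwise assign by hand can come straight from the descriptor, so validation can run as soon as the file is loaded.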
A Data Package provides a minimal "container" for transporting any kind of data. It is designed for extension to allow publishers to add additional constraints on the format and type of data and metadata.
Concretely, you can create a Data Package by placing a specially formatted file, datapackage.json, in the directory containing the files that comprise your dataset. Given a dataset called dataset.csv that looks like this:
A very simple example of a datapackage.json that would accompany the unaltered CSV would look like this:
The data types you support would all be expressible via the JSON Table Schema language using a combination of type, format, and constraints per field: http://specs.frictionlessdata.io/json-table-schema/#field-descriptors
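To make the export direction concrete, here is a minimal sketch of assembling a descriptor from per-column type information. The column names, the package name, and the `build_descriptor` helper are hypothetical; the layout (a `resources` list whose entries carry a `path` and a `schema` with `fields`) follows the Tabular Data Package structure as I understand it, and `format` is left out for brevity.

```python
import json

# Hypothetical column typing, as a tool like Breve might record it.
columns = [
    {"name": "name", "type": "string", "constraints": {"required": True}},
    {"name": "born", "type": "date"},
    {"name": "letters", "type": "integer", "constraints": {"minimum": 0}},
]


def build_descriptor(package_name, csv_path, fields):
    """Assemble a minimal Tabular Data Package descriptor as a plain dict."""
    return {
        "name": package_name,
        "resources": [
            {"path": csv_path, "schema": {"fields": fields}},
        ],
    }


descriptor = build_descriptor("example-dataset", "dataset.csv", columns)

# This JSON text is what would be saved as datapackage.json next to the CSV.
print(json.dumps(descriptor, indent=2))
```

Because the descriptor is plain JSON, the cleaned CSV itself stays unaltered; the type information simply travels alongside it.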
We're building an ecosystem of tools and integrations that allow the reading of Data Packages in tools already in use today: http://frictionlessdata.io/about/ . We can definitely assist in supporting this integration.