Replies: 11 comments
-
frictionless-r ignores the So I'm neutral regarding promoting this to the specs: frictionless-r will likely continue to ignore it. |
Beta Was this translation helpful? Give feedback.
-
Regarding
|
Beta Was this translation helpful? Give feedback.
-
Agreed -- inferring compression from the path seems just fine. Do we get some other value from having a "compression" property that I'm missing?
I think this works for local paths, but not as well for remote ones because it's harder to detect where the zip file is (archive.zip may not be an actual zip file, but just part of the url, and it's harder to check compared to local) That said, I'm not keen on
|
Beta Was this translation helpful? Give feedback.
-
I agree. |
Beta Was this translation helpful? Give feedback.
-
I will play a devil's advocate role here, but you know, inferring a Table Schema usually works just fine as well 😃 So in my opinion it is just a question of increasing interoperability documentation quality. I think, currently, the spec doesn't mention compression at all so the behavior does look just undefined generally speaking. I think we at least need to clarify it. On the other hand as there is already
So regarding inner path I think it's only applicable if a data publisher has to use some artifact i.e. ZIP file that they cannot control so they map resources from this archive similarly how excel sheets mapped onto resources with Table Dialect |
Beta Was this translation helpful? Give feedback.
-
Haha, to play counter devil's advocate: It's standard for file names to include compression type in their extension (file1.csv.zip, file2.csv.gz), but there's not a similar standard for field names to include frictionless field type information (column1.integer, column2.boolean, column3.number). To me, the question is, do we want to allow compressed paths without extensions, or compressed paths with extensions that don't match the compression type? Otherwise,
If they don't control the ZIP, then there's all kinds of malformed scenarios we can imagine... The question is where we draw the line. For example, the ZIP could have other nested ZIPs in it, which in turn hold the table data... That would require nested
Selecting an excel sheet in a workbook or a table in an SQLite db is a lot more well-defined, I think, because unlike a generic multi-file archive they have a lot of guarantees / constraints (e.g. they only hold a specific kind of table data and cannot be nested) Side note -- Does the |
Beta Was this translation helpful? Give feedback.
-
I agree on this though! I'd suggest something like 1) compression type |
Beta Was this translation helpful? Give feedback.
-
I think it's more like a convention rather than a standard
No, we had
I think, currently, we don't define anything regarding the form of |
Beta Was this translation helpful? Give feedback.
-
@roll you mean something like this would not be supported? I do use this internally and crafted this gist for a user query in Frictionless Slack. |
Beta Was this translation helpful? Give feedback.
-
@fjuniorr |
Beta Was this translation helpful? Give feedback.
-
I agree, I was being sloppy with my language there :) To be clear, I'm neutral on the
Agreed! I think that's a perfect place to put this info. I do think enforcing one file per archive is a good idea (no I also think it'd be nice to mention somewhere the practice of compressing entire data packages. (frictionless-py already supports this as well).
I would also very much support this... I was surprised to see spreadsheet support & mention of sql databases, but no clear way to select an sql table :) |
Beta Was this translation helpful? Give feedback.
-
Overview
There is quite a simple recipe - https://datapackage.org/recipes/compression-of-resources/ - adding a new
resource.compression
property to the Data Resource spec. It's supported byfrictionless-py
.Note
It might make sense to consider frictionless-py's
resource.innerPath
as well for providing a path inside an archive.Beta Was this translation helpful? Give feedback.
All reactions