This discussion was converted from issue #881 on January 03, 2024 12:22.
Hi! I am working on an ETL and data platform with NoSQL as the primary data format. Most data is JSON Lines, BSON and Parquet, converted from XML and JSON or extracted from REST APIs.
I would like to add the Data Package specification as one of our ETL destinations, but I can't find how to add NoSQL schemas to a data package. So our team re-created the data packaging logic for NoSQL. Here are examples of packages: resprojects-resdata-2021-7-28-22-52.zip and datamos-7704782036-214FZ-2021-10-23-7-37.zip
The differences are:
- we use collections, not tables, to describe data (it's MongoDB logic)
- we have a simplified schema to define data fields
- most data files inside our packages are compressed
- we added documentation as part of the data description
It was a temporary solution, and I am not sure that we need to continue developing our own way of data packaging in parallel. So maybe it's possible to reimplement similar logic using the Data Package spec?
Is it possible to add support for JSON Lines and BSON files? Is it possible to extend the Data Package spec to support similar usage?
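
To make the question more concrete, here is a minimal Python sketch of how one collection might be written as a JSON Lines file and described as a resource in a `datapackage.json`-style descriptor. It is only an illustration under stated assumptions: the `"jsonl"` format value, the reuse of a `fields` list as a simplified schema for a collection, and the names `write_jsonl`, `build_descriptor`, `make_package`, `projects` and `example-nosql-package` are all hypothetical and are not confirmed by the spec or taken from the example packages above. The sample records are made up.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write an iterable of dicts as JSON Lines: one JSON document per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def build_descriptor(package_name, collections):
    """Build a descriptor dict with one resource per collection.

    `collections` maps a collection name to (relative path, field list),
    where the field list is a sequence of (field name, type) pairs.
    """
    resources = []
    for name, (path, fields) in collections.items():
        resources.append({
            "name": name,
            "path": path,
            "format": "jsonl",  # assumption: not a value confirmed by the spec
            "encoding": "utf-8",
            # assumption: a simplified field list stands in for a NoSQL schema
            "schema": {"fields": [{"name": f, "type": t} for f, t in fields]},
        })
    return {"name": package_name, "resources": resources}


def make_package(directory):
    """Write one illustrative collection plus its descriptor into `directory`."""
    root = Path(directory)
    (root / "data").mkdir(parents=True, exist_ok=True)
    write_jsonl(root / "data" / "projects.jsonl", [
        {"id": 1, "title": "First project"},   # made-up sample record
        {"id": 2, "title": "Second project"},  # made-up sample record
    ])
    descriptor = build_descriptor("example-nosql-package", {
        "projects": ("data/projects.jsonl",
                     [("id", "integer"), ("title", "string")]),
    })
    (root / "datapackage.json").write_text(
        json.dumps(descriptor, indent=2), encoding="utf-8"
    )
    return descriptor


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(json.dumps(make_package(tmp), indent=2))
```

Nested documents, compression of the data files and BSON are left out of this sketch, since how they would map onto a resource descriptor is exactly the open question.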