The code base is littered with code that generates file locations from metadata. This code is certainly necessary during the upload of those files or creation of new files, but once the files exist in our system we should no longer need to "create" the filename and path, only retrieve it.
For example,
In aggregation areas we have a method that generates the S3 Path.
In opportunity datasets we have methods that generate the storage location.
For regional analyses, results and locations are generated on demand.
No single instance of generating the storage location is problematic in and of itself, but they add up across the code base. Improving this could require a massive migration, of both the database and the stored files, but I believe it would be well worth it.
Storing file locations
I see two different options for storing the locations:
1. Store the paths directly on the models, in a common format that aligns with our "File Storage" implementation.
2. Create a file collection in the database with an entry for each file.
We've discussed the second and have partially done it with data sources. But data sources attempt to do too much. I think extracting out a shared "file" collection would be very beneficial. We could model it like:
```ts
type FileItem = {
  _id: UUID
  name: string
  // Parameters to generate a `FileStorageKey` from:
  bucket: string
  path: string
  // Auth
  accessGroup: string
  createdBy: string
  // Metadata
  bytes: number // File size, in bytes
  isGzipped: boolean
  type: string // MIME Type
}
```
All other types that have a file would reference it by its `_id`. Opportunity datasets and aggregation areas would gain a `fileItemId` field.
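A minimal TypeScript sketch of what that reference could look like. The `OpportunityDataset` shape and the `storageKey` helper are hypothetical, and joining `bucket` and `path` with a slash is an assumed `FileStorageKey` format:

```typescript
// Hypothetical sketch: a model referencing a shared FileItem by id
// instead of generating its own storage path.
type FileItem = {
  _id: string
  name: string
  bucket: string
  path: string
  accessGroup: string
  createdBy: string
  bytes: number
  isGzipped: boolean
  type: string
}

// Assumed shape: the dataset keeps only a fileItemId, no path logic.
type OpportunityDataset = {
  _id: string
  name: string
  fileItemId: string
}

// Retrieval becomes a lookup on stored fields, not a reconstruction
// from metadata. The "bucket/path" format is an assumption.
function storageKey(file: FileItem): string {
  return `${file.bucket}/${file.path}`
}
```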
We would also be able to look up all the files for a specific access group and calculate the total storage size of that access group's uploaded data.
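That storage calculation could be a simple aggregation over the collection. A sketch, using an in-memory array to stand in for the database query:

```typescript
// Sketch: summing uploaded bytes per access group from a FileItem
// collection. Only the fields needed for the calculation are typed here.
type FileRecord = { accessGroup: string; bytes: number }

function storageByAccessGroup(files: FileRecord[]): Map<string, number> {
  const totals = new Map<string, number>()
  for (const f of files) {
    totals.set(f.accessGroup, (totals.get(f.accessGroup) ?? 0) + f.bytes)
  }
  return totals
}
```

In practice this would likely be a database aggregation (e.g. grouping on `accessGroup` and summing `bytes`) rather than an in-memory pass.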
There are certain files this would not apply for, like Taui sites, which pre-generate thousands of files.