Add data preprocessing step for combining individual input files into one #77

Open
andrew-weisman opened this issue Feb 12, 2024 · 3 comments
Labels: data loader, enhancement

Comments

@andrew-weisman
Contributor

We can probably reuse the preprocessing steps I used for the latest TAIS data (the code is in Andrew's OneDrive and is probably also shared somewhere on Teams); they should generalize to all HALO input files.
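A minimal sketch of what the combining step could look like, assuming the individual inputs are CSV files that pandas can read and that they share broadly the same columns. The function name `combine_input_files` and the `input_filename` column are hypothetical and are not taken from the TAIS preprocessing code.

```python
from pathlib import Path

import pandas as pd


def combine_input_files(paths, source_column="input_filename"):
    """Read each CSV in `paths` and stack them into a single DataFrame.

    A column named `source_column` (a hypothetical name) records which file
    each row came from, so rows stay traceable after combining.
    """
    frames = []
    for path in paths:
        df = pd.read_csv(path)
        df[source_column] = Path(path).name
        frames.append(df)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Keeping the source filename as a column is a design choice, not a requirement; it makes it possible to split the combined dataset back up by input file later.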

@andrew-weisman andrew-weisman added enhancement New feature or request data loader labels Feb 12, 2024
@andrew-weisman andrew-weisman self-assigned this Feb 12, 2024
@andrew-weisman
Contributor Author

Figure out what in the preprocessing turns integer columns (e.g., the thresholded marker columns) into floats; the large size of the total dataset is the likely cause. To address it, we can probably just cast back to the original dtypes once the concatenation is complete.
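A sketch of the cast-back idea, assuming the concatenation is done with pandas. One known way pandas produces this symptom: if a column is absent from some input files, or contains missing values, the concatenated column holds NaN and is upcast to float64. Whether that is what happens here still needs to be confirmed. The function name `concat_preserving_dtypes` is hypothetical.

```python
import pandas as pd


def concat_preserving_dtypes(frames):
    """Concatenate DataFrames, then cast columns back to their original dtypes."""
    # Record the dtype each column had the first time it was seen.
    original_dtypes = {}
    for df in frames:
        for column, dtype in df.dtypes.items():
            original_dtypes.setdefault(column, dtype)

    combined = pd.concat(frames, ignore_index=True)

    for column, dtype in original_dtypes.items():
        if combined[column].dtype == dtype:
            continue
        if pd.api.types.is_integer_dtype(dtype) and combined[column].isna().any():
            # NaN cannot be stored in a NumPy integer column, so fall back to
            # pandas' nullable integer dtype instead of failing.
            combined[column] = combined[column].astype("Int64")
        else:
            # Note: casting float to int truncates any non-integral values.
            combined[column] = combined[column].astype(dtype)
    return combined
```

If the nullable `Int64` fallback is ever triggered, that is itself a useful signal that some input file is missing values in a column that was expected to be complete.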

@andrew-weisman
Contributor Author

This could be an entirely new app in the suite that lets the user check that the required columns are present, select which columns to import, and/or explore the dataset to be imported, with a visual display of the dataset or at least of a sample.
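The required-columns check could be as small as the sketch below, assuming CSV input. It uses only the standard library and reads just the header row. The function name `find_missing_columns` is hypothetical, and the actual list of required columns would have to come from the project.

```python
import csv


def find_missing_columns(csv_path, required_columns):
    """Return the required columns that are absent from the CSV header.

    Only the header row is read, so the check stays cheap for large files.
    An empty file is treated as having no columns.
    """
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])
    present = {name.strip() for name in header}
    return [column for column in required_columns if column not in present]
```

An empty return value means every required column was found; otherwise the app can show the missing names to the user before any import is attempted.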

@andrew-weisman
Contributor Author

Relatedly, this would let dataset_formats.py always work, because unexpected input would be caught beforehand instead of causing it to die. Otherwise, we need the app to fail gracefully when dataset_formats.py fails.
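For the fallback case, a generic sketch of failing gracefully is to wrap the loading call and hand back an error message rather than letting the exception propagate. This does not use the actual interface of dataset_formats.py, which is not shown here; `run_safely` and the `loader` argument are hypothetical stand-ins.

```python
def run_safely(loader, *args, **kwargs):
    """Call `loader` and return a (result, error_message) pair.

    On success error_message is None; on failure result is None and
    error_message describes the exception.
    """
    try:
        return loader(*args, **kwargs), None
    except Exception as exc:
        # Catch broadly on purpose: the goal is to report the problem to the
        # user instead of crashing the app.
        return None, f"{type(exc).__name__}: {exc}"
```

The caller can then display `error_message` in the app and stop the import, leaving the rest of the session usable.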
