Subject ID splits can get messed up if subject IDs are not simple int types. #114
Is the mapping between the raw subject IDs and the ESGPT subject IDs preserved somewhere? In previous versions it was stored in subjects_df.parquet, but that no longer seems to be the case on the dev branch.
It isn't, no. The raw and ESGPT subject IDs are supposed to be identical, but clearly they sometimes aren't when type conversion causes issues. What data type are your subject IDs? As a hacky workaround, you may be able to store a mapping by adding your subject ID as a static measurement with the single-label classification modality, though that may cause problems because the column would be used twice; I'm not sure offhand.
(In reply to Juan Quiroz Aguilera, Mon, Aug 5, 2024, 12:41 AM.)
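If you need to recover raw IDs regardless of how ESGPT stores them, one option (an assumption on our part, not an ESGPT feature) is to assign your own dense integer subject IDs before building the dataset and keep the mapping yourself. The sketch below uses only the standard library; `build_subject_id_map` and `save_subject_id_map` are hypothetical helper names.

```python
import csv
from typing import Dict, Hashable, Iterable


def build_subject_id_map(raw_ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign dense integer IDs 0..n-1 to raw subject IDs in first-seen order."""
    mapping: Dict[Hashable, int] = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = len(mapping)
    return mapping


def save_subject_id_map(mapping: Dict[Hashable, int], path: str) -> None:
    """Persist the mapping as a two-column CSV (raw IDs are written as strings)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["raw_subject_id", "subject_id"])
        for raw, subject_id in mapping.items():
            writer.writerow([raw, subject_id])


if __name__ == "__main__":
    raw = ["MRN-007", "MRN-123", "MRN-007", 18446744073709551615]
    id_map = build_subject_id_map(raw)
    print(id_map)  # {'MRN-007': 0, 'MRN-123': 1, 18446744073709551615: 2}
    # Inverse lookup to go from the dense ID back to the raw subject ID.
    inverse = {v: k for k, v in id_map.items()}
```

Because the dense IDs are small non-negative ints, they stay well inside the signed 64-bit range, which sidesteps the conversion issue discussed in this thread.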
The IDs are integers. I thought they were not the same, but after fixing a bug elsewhere I can confirm they are now the same.
Fantastic, glad to hear it. This issue should be fairly rare: the subject ID spaces can become misaligned when raw IDs use non-standard integral types, but for normal ints it should be fine.
For example, if your raw subject IDs are uint64s, downstream processing can run into problems, because subject IDs are implicitly converted to signed ints and back during the subject ID split conversion.
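The uint64 hazard can be illustrated with a stdlib-only Python sketch. This is not ESGPT's code: the helper names are hypothetical, and `struct` is used here to reinterpret 64-bit patterns the way a typical uint64 to int64 cast does. A bit-exact round trip restores the original values, but any join or comparison done while the IDs are in signed form sees different numbers for raw IDs of 2**63 or more.

```python
import struct

INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def uint64_to_int64(x: int) -> int:
    """Reinterpret an unsigned 64-bit value's bits as signed (two's complement)."""
    if not 0 <= x <= UINT64_MAX:
        raise ValueError(f"{x} is not a valid uint64")
    return struct.unpack("<q", struct.pack("<Q", x))[0]


def int64_to_uint64(x: int) -> int:
    """Reinterpret a signed 64-bit value's bits as unsigned."""
    return struct.unpack("<Q", struct.pack("<q", x))[0]


def unsafe_ids(raw_ids):
    """Return the raw IDs whose value changes when stored as int64."""
    return [i for i in raw_ids if i > INT64_MAX]


raw_ids = [42, INT64_MAX, INT64_MAX + 1, UINT64_MAX]
signed = [uint64_to_int64(i) for i in raw_ids]
print(signed)  # [42, 9223372036854775807, -9223372036854775808, -1]

# Converting back bit-for-bit restores the originals...
restored = [int64_to_uint64(i) for i in signed]
print(restored == raw_ids)  # True

# ...but comparing signed values against raw IDs mismatches for large IDs.
print([r == s for r, s in zip(raw_ids, signed)])  # [True, True, False, False]
print(unsafe_ids(raw_ids))  # [9223372036854775808, 18446744073709551615]
```

A check like `unsafe_ids` run on your raw subject IDs before ingestion is one cheap way to tell whether this issue can affect your data at all.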