Issue with some categorical variables from UKB main dataset #7

olivierlabayle · 2023-05-08T17:23:24Z

There is an issue with some categorical variables that are correctly written as strings but later interpreted as integers by the CSV.jl parser. For now, the workaround is to declare each individual category as an output phenotype while this issue is being processed on the CSV.jl side (JuliaData/CSV.jl#1086).

Example: ethnicity has values 1001, 2001, etc. These are converted to integers, which does not make sense for categorical codes.

If this is not dealt with on the CSV.jl side, the only option is to forward the column types as a header. Alternatively, Arrow.jl may deal with it better and we could rely entirely on it in the pipeline.
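
To illustrate what forwarding column types could look like on the reading side, here is a minimal sketch using the `types` keyword of `CSV.read`. The two-row table and the `SAMPLE_ID` column name are made up for illustration, and this is not necessarily how the pipeline would pass the types along.

```julia
using CSV, DataFrames

# Hypothetical extract; the column names and values are illustrative only.
csv_text = """
SAMPLE_ID,ethnicity
1,1001
2,2001
"""

# Default type inference: the ethnicity codes are parsed as integers.
inferred = CSV.read(IOBuffer(csv_text), DataFrame)

# Explicit column type: the ethnicity codes are kept as strings.
typed = CSV.read(IOBuffer(csv_text), DataFrame; types=Dict(:ethnicity => String))
```

With the explicit type, `typed.ethnicity` holds the codes as text, so they can be treated as categories downstream rather than as numbers.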

olivierlabayle added the bug label May 8, 2023
