Ubiome R notebook #10

HadrienG · 2018-05-10T10:36:27Z

Ready for testing

install and load necessary packages
build data frame containing info about public data
load public data files
exploratory analysis
load private taxonomy results

Additional remarks

Some of the ubiome json files are not valid json, or are in other format (csv and tab delimited). Ideally there should be a data validation when accepting the upload files (I can file an issue in the repo if you want). Anyway the r code in the notebook should handle the different formats 🎉

install necessary packages build data frame from the api and public data info try to import json

madprime · 2018-05-10T11:56:01Z

On this repo, I think! https://github.com/OpenHumans/oh-ubiome-source

the notebook can now import the public data in (valid) json, tsv and csv

gedankenstuecke · 2018-05-11T16:28:33Z

I had some trouble getting it to run on my end:

the data/ folder needs to exist already, otherwise it'll crash. But that's easy enough to fix.

Then I had a problem with getting the reference data loaded in.

There's a lot of

Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37852.json' file 2    11 <NA>  6 columns 7 columns 'data/37852.json' row 3    14 <NA>  6 columns 7 columns 'data/37852.json' col 4    15 <NA>  6 columns 7 columns 'data/37852.json' expected 5    16 <NA>  6 columns 7 columns 'data/37852.json'

And in the end the data_container isn't created. See the end of the error message here:

Just for fun I also tried loading my own data in the next steps but that also yielded an error:

HadrienG · 2018-05-13T09:47:15Z

Could you give me the whole traceback for the object taxon not found? Not sure if it happens during the json or the csv parsing.

gedankenstuecke · 2018-05-14T05:17:16Z

Sure, running

invisible(map2(jsons$download_url, paste0("data/", jsons$id, ".json"), download.file))
file.remove("data/37844.json")  # invalid json
data_path <- dir("data", pattern = '.json', full.names = TRUE)
data_container <- map(data_path, read_json_or_tbl_or_csv)

I get the following output and data_container isn't actually created.
I was wondering: did you run your code directly on notebooks.openhumans.org?

Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37852.json' file 2    11 <NA>  6 columns 7 columns 'data/37852.json' row 3    14 <NA>  6 columns 7 columns 'data/37852.json' col 4    15 <NA>  6 columns 7 columns 'data/37852.json' expected 5    16 <NA>  6 columns 7 columns 'data/37852.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“7 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     8 <NA>  6 columns 7 columns 'data/37870.json' file 2     9 <NA>  6 columns 7 columns 'data/37870.json' row 3    12 <NA>  6 columns 7 columns 'data/37870.json' col 4    13 <NA>  6 columns 7 columns 'data/37870.json' expected 5    15 <NA>  6 columns 8 columns 'data/37870.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“9 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     3 <NA>  6 columns 7 columns 'data/37872.json' file 2     6 <NA>  6 columns 7 columns 'data/37872.json' row 3     9 <NA>  6 columns 7 columns 'data/37872.json' col 4    10 <NA>  6 columns 8 columns 'data/37872.json' expected 5    13 <NA>  6 columns 7 columns 'data/37872.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“16 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     3 <NA>  6 columns 8 columns 'data/37874.json' file 2     4 <NA>  6 columns 7 columns 'data/37874.json' row 3     7 <NA>  6 columns 8 columns 'data/37874.json' col 4    19 <NA>  6 columns 8 columns 'data/37874.json' expected 5    20 <NA>  6 columns 8 columns 'data/37874.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“103 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     8 <NA>  6 columns 7 columns 'data/37876.json' file 2     9 <NA>  6 columns 7 columns 'data/37876.json' row 3    10 <NA>  6 columns 7 columns 'data/37876.json' col 4    12 <NA>  6 columns 7 columns 'data/37876.json' expected 5    16 <NA>  6 columns 7 columns 'data/37876.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“78 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37878.json' file 2     6 <NA>  6 columns 7 columns 'data/37878.json' row 3    18 <NA>  6 columns 7 columns 'data/37878.json' col 4    20 <NA>  6 columns 7 columns 'data/37878.json' expected 5    23 <NA>  6 columns 7 columns 'data/37878.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“73 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37880.json' file 2     6 <NA>  6 columns 7 columns 'data/37880.json' row 3    19 <NA>  6 columns 7 columns 'data/37880.json' col 4    22 <NA>  6 columns 7 columns 'data/37880.json' expected 5    24 <NA>  6 columns 7 columns 'data/37880.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“87 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     7 <NA>  6 columns 7 columns 'data/37882.json' file 2     8 <NA>  6 columns 7 columns 'data/37882.json' row 3     9 <NA>  6 columns 7 columns 'data/37882.json' col 4    11 <NA>  6 columns 7 columns 'data/37882.json' expected 5    20 <NA>  6 columns 7 columns 'data/37882.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“24 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37884.json' file 2     6 <NA>  6 columns 7 columns 'data/37884.json' row 3     9 <NA>  6 columns 7 columns 'data/37884.json' col 4    19 <NA>  6 columns 7 columns 'data/37884.json' expected 5    20 <NA>  6 columns 7 columns 'data/37884.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“70 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    11 <NA>  1 columns 2 columns 'data/37902.json' file 2    14 <NA>  1 columns 2 columns 'data/37902.json' row 3    15 <NA>  1 columns 2 columns 'data/37902.json' col 4    16 <NA>  1 columns 2 columns 'data/37902.json' expected 5    17 <NA>  1 columns 2 columns 'data/37902.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“124 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    13 <NA>  1 columns 2 columns 'data/37904.json' file 2    16 <NA>  1 columns 2 columns 'data/37904.json' row 3    18 <NA>  1 columns 2 columns 'data/37904.json' col 4    21 <NA>  1 columns 2 columns 'data/37904.json' expected 5    22 <NA>  1 columns 2 columns 'data/37904.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“79 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    10 <NA>  1 columns 3 columns 'data/37906.json' file 2    11 <NA>  1 columns 2 columns 'data/37906.json' row 3    16 <NA>  1 columns 2 columns 'data/37906.json' col 4    27 <NA>  1 columns 2 columns 'data/37906.json' expected 5    30 <NA>  1 columns 5 columns 'data/37906.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“117 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     6 <NA>  1 columns 2 columns 'data/37908.json' file 2    10 <NA>  1 columns 2 columns 'data/37908.json' row 3    13 <NA>  1 columns 2 columns 'data/37908.json' col 4    14 <NA>  1 columns 2 columns 'data/37908.json' expected 5    15 <NA>  1 columns 2 columns 'data/37908.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“92 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    11 <NA>  6 columns 7 columns 'data/37914.json' file 2    12 <NA>  6 columns 7 columns 'data/37914.json' row 3    13 <NA>  6 columns 7 columns 'data/37914.json' col 4    17 <NA>  6 columns 7 columns 'data/37914.json' expected 5    24 <NA>  6 columns 7 columns 'data/37914.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“96 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    10 <NA>  1 columns 2 columns 'data/37922.json' file 2    11 <NA>  1 columns 2 columns 'data/37922.json' row 3    12 <NA>  1 columns 2 columns 'data/37922.json' col 4    13 <NA>  1 columns 2 columns 'data/37922.json' expected 5    17 <NA>  1 columns 2 columns 'data/37922.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“76 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     9 <NA>  6 columns 7 columns 'data/37944.json' file 2    12 <NA>  6 columns 7 columns 'data/37944.json' row 3    13 <NA>  6 columns 7 columns 'data/37944.json' col 4    14 <NA>  6 columns 7 columns 'data/37944.json' expected 5    19 <NA>  6 columns 7 columns 'data/37944.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Warning message:
“Duplicated column names deduplicated: '{\\n' => '{\\n_1' [28], '\\"taxon\\":' => '\\"taxon\\":_1' [29], '\\"parent\\":' => '\\"parent\\":_1' [31], '\\"count\\":' => '\\"count\\":_1' [33], '4706,\\n' => '4706,\\n_1' [34], '\\"count_norm\\":' => '\\"count_norm\\":_1' [35], '1000000,\\n' => '1000000,\\n_1' [36], '\\"tax_name\\":' => '\\"tax_name\\":_1' [37], '\\"tax_rank\\":' => '\\"tax_rank\\":_1' [39], '},\\n' => '},\\n_1' [41], '{\\n' => '{\\n_2' [42], '\\"taxon\\":' => '\\"taxon\\":_2' [43], '\\"parent\\":' => '\\"parent\\":_2' [45], '\\"count\\":' => '\\"count\\":_2' [47], '\\"count_norm\\":' => '\\"count_norm\\":_2' [49], '\\"tax_name\\":' => '\\"tax_name\\":_2' [51], '\\"tax_rank\\":' => '\\"tax_rank\\":_2' [53], '},\\n' => '},\\n_2' [55], '{\\n' => '{\\n_3' [56], '\\"taxon\\":' => '\\"taxon\\":_3' [57], '\\"parent\\":' => '\\"parent\\":_3' [59], '\\"count\\":' => '\\"count\\":_3' [61], '\\"count_norm\\":' => '\\"count_norm\\":_3' [63], '\\"tax_name\\":' => '\\"tax_name\\":_3' [65], '\\"tax_rank\\":' => '\\"tax_rank\\":_3' [67], '},\\n' => '},\\n_3' [69], '{\\n' => '{\\n_4' [70], '\\"taxon\\":' => '\\"taxon\\":_4' [71], '\\"parent\\":' => '\\"parent\\":_4' [73], '481,\\n' => '481,\\n_1' [74], '\\"count\\":' => '\\"count\\":_4' [75], '\\"count_norm\\":' => '\\"count_norm\\":_4' [77], '\\"tax_name\\":' => '\\"tax_name\\":_4' [79], '\\"tax_rank\\":' => '\\"tax_rank\\":_4' [81], '\\"genus\\"\\n' => '\\"genus\\"\\n_1' [82], '},\\n' => '},\\n_4' [83], '{\\n' => '{\\n_5' [84], '\\"taxon\\":' => '\\"taxon\\":_5' [85], '\\"parent\\":' => '\\"parent\\":_5' [87], '482,\\n' => '482,\\n_1' [88], '\\"count\\":' => '\\"count\\":_5' [89], '\\"count_norm\\":' => '\\"count_norm\\":_5' [91], '\\"tax_name\\":' => '\\"tax_name\\":_5' [93], '\\"tax_rank\\":' => '\\"tax_rank\\":_5' [96], '},\\n' => '},\\n_5' [98], '{\\n' => '{\\n_6' [99], '\\"taxon\\":' => '\\"taxon\\":_6' [100], '\\"parent\\":' => '\\"parent\\":_6' [102], '482,\\n' => '482,\\n_2' [103], '\\"count\\":' => '\\"count\\":_6' [104], '\\"count_norm\\":' => '\\"count_norm\\":_6' [106], '\\"tax_name\\":' => '\\"tax_name\\":_6' [108], '\\"Neisseria' => '\\"Neisseria_1' [109], '\\"tax_rank\\":' => '\\"tax_rank\\":_6' [111], '\\"species\\"\\n' => '\\"species\\"\\n_1' [112], '},\\n' => '},\\n_6' [113], '{\\n' => '{\\n_7' [114], '\\"taxon\\":' => '\\"taxon\\":_7' [115], '\\"parent\\":' => '\\"parent\\":_7' [117], '\\"count\\":' => '\\"count\\":_7' [119], '\\"count_norm\\":' => '\\"count_norm\\":_7' [121], '\\"tax_name\\":' => '\\"tax_name\\":_7' [123], '\\"tax_rank\\":' => '\\"tax_rank\\":_7' [126], '\\"species\\"\\n' => '\\"species\\"\\n_2' [127], '},\\n' => '},\\n_7' [128], '{\\n' => '{\\n_8' [129], '\\"taxon\\":' => '\\"taxon\\":_8' [130], '\\"parent\\":' => '\\"parent\\":_8' [132], '\\"count\\":' => '\\"count\\":_8' [134], '\\"count_norm\\":' => '\\"count_norm\\":_8' [136], '\\"tax_name\\":' => '\\"tax_name\\":_8' [138], '\\"tax_rank\\":' => '\\"tax_rank\\":_8' [140], '\\"family\\"\\n' => '\\"family\\"\\n_1' [141], '},\\n' => '},\\n_8' [142], '{\\n' => '{\\n_9' [143], '\\"taxon\\":' => '\\"taxon\\":_9' [144], '\\"parent\\":' => '\\"parent\\":_9' [146], '543,\\n' => '543,\\n_1' [147], '\\"count\\":' => '\\"count\\":_9' [148], '5,\\n' => '5,\\n_1' [149], '\\"count_norm\\":' => '\\"count_norm\\":_9' [150], '1062,\\n' => '1062,\\n_1' [151], '\\"tax_name\\":' => '\\"tax_name\\":_9' [152], '\\"tax_rank\\":' => '\\"tax_rank\\":_9' [154], '\\"genus\\"\\n' => '\\"genus\\"\\n_2' [155], '},\\n' => '},\\n_9' [156], '{\\n' => '{\\n_10' [157], '\\"taxon\\":' => '\\"taxon\\":_10' [158], '\\"parent\\":' => '\\"parent\\":_10' [160], '\\"count\\":' => '\\"count\\":_10' [162], '\\"count_norm\\":' => '\\"count_norm\\":_10' [164], '\\"tax_name\\":' => '\\"tax_name\\":_10' [166], '\\"tax_rank\\":' => '\\"tax_rank\\":_10' [168], '\\"family\\"\\n' => '\\"family\\"\\n_2' [169], '},\\n' => '},\\n_10' [170], '{\\n' => '{\\n_11' [171], '\\"taxon\\":' => '\\"taxon\\":_11' [172], '\\"parent\\":' => '\\"parent\\":_11' [174], '712,\\n' => '712,\\n_1' [175], '\\"count\\":' => '\\"count\\":_11' [176], '\\"count_norm\\":' => '\\"count_norm\\":_11' [178], '\\"tax_name\\":' => '\\"tax_name\\":_11' [180], '\\"tax_rank\\":' => '\\"tax_rank\\":_11' [182], '\\"genus\\"\\n' => '\\"genus\\"\\n_3' [183], '},\\n' => '},\\n_11' [184], '{\\n' => '{\\n_12' [185], '\\"taxon\\":' => '\\"taxon\\":_12' [186], '\\"parent\\":' => '\\"parent\\":_12' [188], '724,\\n' => '724,\\n_1' [189], '\\"count\\":' => '\\"count\\":_12' [190], '\\"count_norm\\":' => '\\"count_norm\\":_12' [192], '\\"tax_name\\":' => '\\"tax_name\\":_12' [194], '\\"tax_rank\\":' => '\\"tax_rank\\":_12' [197], '\\"species\\"\\n' => '\\"species\\"\\n_3' [198], '},\\n' => '},\\n_12' [199], '{\\n' => '{\\n_13' [200], '\\"taxon\\":' => '\\"taxon\\":_13' [201], '\\"parent\\":' => '\\"parent\\":_13' [203], '724,\\n' => '724,\\n_2' [204], '\\"count\\":' => '\\"count\\":_13' [205], '\\"count_norm\\":' => '\\"count_norm\\":_13' [207], '\\"tax_name\\":' => '\\"tax_name\\":_13' [209], '\\"Haemophilus' => '\\"Haemophilus_1' [210], '\\"tax_rank\\":' => '\\"tax_rank\\":_13' [212], '\\"species\\"\\n' => '\\"species\\"\\n_4' [213], '},\\n' => '},\\n_13' [214], '{\\n' => '{\\n_14' [215], '\\"taxon\\":' => '\\"taxon\\":_14' [216], '\\"parent\\":' => '\\"parent\\":_14' [218], '\\"count\\":' => '\\"count\\":_14' [220], '4,\\n' => '4,\\n_1' [221], '\\"count_norm\\":' => '\\"count_norm\\":_14' [222], '849,\\n' => '849,\\n_1' [223], '\\"tax_name\\":' => '\\"tax_name\\":_14' [224], '\\"tax_rank\\":' => '\\"tax_rank\\":_14' [227], '\\"species\\"\\n' => '\\"species\\"\\n_5' [228], '},\\n' => '},\\n_14' [229], '{\\n' => '{\\n_15' [230], '\\"taxon\\":' => '\\"taxon\\":_15' [231], '\\"parent\\":' => '\\"parent\\":_15' [233], '416916,\\n' => '416916,\\n_1' [234], '\\"count\\":' => '\\"count\\":_15' [235], '\\"count_norm\\":' => '\\"count_norm\\":_15' [237], '\\"tax_name\\":' => '\\"tax_name\\":_15' [239], '\\"Aggregatibacter' => '\\"Aggregatibacter_1' [240], '\\"tax_rank\\":' => '\\"tax_rank\\":_15' [242], '\\"species\\"\\n' => '\\"species\\"\\n_6' [243], '},\\n' => '},\\n_15' [244], '{\\n' => '{\\n_16' [245], '\\"taxon\\":' => '\\"taxon\\":_16' [246], '\\"parent\\":' => '\\"parent\\":_16' [248], '\\"count\\":' => '\\"count\\":_16' [250], '\\"count_norm\\":' => '\\"count_norm\\":_16' [252], '\\"tax_name\\":' => '\\"tax_name\\":_16' [254], '\\"tax_rank\\":' => '\\"tax_rank\\":_16' [256], '\\"family\\"\\n' => '\\"family\\"\\n_3' [257], '},\\n' => '},\\n_16' [258], '{\\n' => '{\\n_17' [259], '\\"taxon\\":' => '\\"taxon\\":_17' [260], '\\"parent\\":' => '\\"parent\\":_17' [262], '815,\\n' => '815,\\n_1' [263], '\\"count\\":' => '\\"count\\":_17' [264], '33,\\n' => '33,\\n_1' [265], '\\"count_norm\\":' => '\\"count_norm\\":_17' [266], '7012,\\n' => '7012,\\n_1' [267], '\\"tax_name\\":' => '\\"tax_name\\":_17' [268], '\\"tax_rank\\":' => '\\"tax_rank\\":_17' [270], '\\"genus\\"\\n' => '\\"genus\\"\\n_4' [271], '},\\n' => '},\\n_17' [272], '{\\n' => '{\\n_18' [273], '\\"taxon\\":' => '\\"taxon\\":_18' [274], '\\"parent\\":' => '\\"parent\\":_18' [276], '816,\\n' => '816,\\n_1' [277], '\\"count\\":' => '\\"count\\":_18' [278], '\\"count_norm\\":' => '\\"count_norm\\":_18' [280], '\\"tax_name\\":' => '\\"tax_name\\":_18' [282], '\\"tax_rank\\":' => '\\"tax_rank\\":_18' [285], '\\"species\\"\\n' => '\\"species\\"\\n_7' [286], '},\\n' => '},\\n_18' [287], '{\\n' => '{\\n_19' [288], '\\"taxon\\":' => '\\"taxon\\":_19' [289], '\\"parent\\":' => '\\"parent\\":_19' [291], '\\"count\\":' => '\\"count\\":_19' [293], '21,\\n' => '21,\\n_1' [294], '\\"count_norm\\":' => '\\"count_norm\\":_19' [295], '4462,\\n' => '4462,\\n_1' [296], '\\"tax_name\\":' => '\\"tax_name\\":_19' [297], '\\"tax_rank\\":' => '\\"tax_rank\\":_19' [299], '\\"genus\\"\\n' => '\\"genus\\"\\n_5' [300], '},\\n' => '},\\n_19' [301], '{\\n' => '{\\n_20' [302], '\\"taxon\\":' => '\\"taxon\\":_20' [303], '\\"parent\\":' => '\\"parent\\":_20' [305], '\\"count\\":' => '\\"count\\":_20' [307], '\\"count_norm\\":' => '\\"count_norm\\":_20' [309], '\\"tax_name\\":' => '\\"tax_name\\":_20' [311], '\\"tax_rank\\":' => '\\"tax_rank\\":_20' [313], '\\"genus\\"\\n' => '\\”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Warning message:
“Duplicated column names deduplicated: '\\n      \\"count\\": 4706' => '\\n      \\"count\\": 4706_1' [14], '\\n      \\"count_norm\\": 1000000' => '\\n      \\"count_norm\\": 1000000_1' [15], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_1' [35], '\\n      \\"parent\\": 482' => '\\n      \\"parent\\": 482_1' [43], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_1' [47], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_2' [53], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_1' [59], '\\n      \\"count\\": 5' => '\\n      \\"count\\": 5_1' [62], '\\n      \\"count_norm\\": 1062' => '\\n      \\"count_norm\\": 1062_1' [63], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_2' [65], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_2' [71], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_3' [77], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_3' [83], '\\n      \\"parent\\": 724' => '\\n      \\"parent\\": 724_1' [85], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_4' [89], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_1' [92], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_1' [93], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_5' [95], '\\n      \\"parent\\": 416916' => '\\n      \\"parent\\": 416916_1' [97], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_6' [101], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_3' [107], '\\n      \\"count\\": 33' => '\\n      \\"count\\": 33_1' [110], '\\n      \\"count_norm\\": 7012' => '\\n      \\"count_norm\\": 7012_1' [111], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_4' [113], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_7' [119], '\\n      \\"count\\": 21' => '\\n      \\"count\\": 21_1' [122], '\\n      \\"count_norm\\": 4462' => '\\n      \\"count_norm\\": 4462_1' [123], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_5' [125], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_6' [131], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_7' [137], '\\n      \\"count\\": 11' => '\\n      \\"count\\": 11_1' [140], '\\n      \\"count_norm\\": 2337' => '\\n      \\"count_norm\\": 2337_1' [141], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_8' [143], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_8' [149], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_1' [152], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_1' [153], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_9' [155], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_9' [167], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_2' [170], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_2' [171], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_10' [173], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_1' [179], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_2' [191], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_4' [197], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_5' [203], '\\n      \\"count\\": 1518' => '\\n      \\"count\\": 1518_1' [206], '\\n      \\"count_norm\\": 322566' => '\\n      \\"count_norm\\": 322566_1' [207], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_10' [209], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_11' [215], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_1' [217], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_12' [221], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_2' [223], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_1' [224], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_1' [225], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_13' [227], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_11' [233], '\\n      \\"count\\": 271' => '\\n      \\"count\\": 271_1' [236], '\\n      \\"count_norm\\": 57586' => '\\n      \\"count_norm\\": 57586_1' [237], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_2' [242], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_2' [243], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_6' [245], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_12' [251], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_3' [254], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_3' [255], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_13' [257], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_1' [263], '\\n      \\"count\\": 210' => '\\n      \\"count\\": 210_1' [266], '\\n      \\"count_norm\\": 44623' => '\\n      \\"count_norm\\": 44623_1' [267], '\\n      \\"tax_rank\\": \\"order\\"\\n    }' => '\\n      \\"tax_rank\\": \\"order\\"\\n    }_1' [269], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_14' [275], '\\n      \\"count\\": 38' => '\\n      \\"count\\": 38_1' [278], '\\n      \\"count_norm\\": 8074' => '\\n      \\"count_norm\\": 8074_1' [279], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_7' [281], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_3' [284], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_3' [285], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_15' [287], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_2' [290], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_2' [291], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_16' [293], '\\n      \\"parent\\": 838' => '\\n      \\"parent\\": 838_1' [295], '\\n      \\"count\\": 14' => '\\n      \\"count\\": 14_1' [296], '\\n      \\"count_norm\\": 2974' => '\\n      \\"count_norm\\": 2974_1' [297], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_17' [299], '\\n      \\"parent\\": 1224' => '\\n      \\"parent\\": 1224_1' [301], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_2' [305], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_14' [311], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_3' [314], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_3' [315], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_3' [317], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_8' [323], '\\n      \\"parent\\": 2' => ”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Error in FUN(X[[i]], ...): object 'taxon' not found
Traceback:

1. map(data_path, read_json_or_tbl_or_csv)
2. .f(.x[[i]], ...)
3. tryCatch({
 .     jsonlite::fromJSON(data_file)$ubiome_bacteriacounts %>% select(taxon, 
 .         parent, tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 . }, error = function(cond) {
 .     paste(cond)
 .     tryCatch({
 .         tab <- read_table2(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         assertthat::assert_that(ncol(tab) > 1)
 .         return(tab)
 .     }, error = function(bla) {
 .         paste(bla)
 .         csv <- read_csv(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         return(csv)
 .     })
 . })   # at line 3-25 of file <text>
4. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
6. value[[3L]](cond)
7. tryCatch({
 .     tab <- read_table2(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     assertthat::assert_that(ncol(tab) > 1)
 .     return(tab)
 . }, error = function(bla) {
 .     paste(bla)
 .     csv <- read_csv(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     return(csv)
 . })   # at line 10-23 of file <text>
8. tryCatchList(expr, classes, parentenv, handlers)
9. tryCatchOne(expr, names, parentenv, handlers[[1L]])
10. value[[3L]](cond)
11. read_csv(data_file) %>% select(taxon, parent, tax_name, tax_rank, 
  .     count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
  .     ".", fixed = TRUE)[[1]][1])   # at line 19-21 of file <text>
12. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
13. eval(quote(`_fseq`(`_lhs`)), env, env)
14. eval(quote(`_fseq`(`_lhs`)), env, env)
15. `_fseq`(`_lhs`)
16. freduce(value, `_function_list`)
17. function_list[[i]](value)
18. select(., taxon, parent, tax_name, tax_rank, count, count_norm)
19. select.data.frame(., taxon, parent, tax_name, tax_rank, count, 
  .     count_norm)
20. select_vars(names(.data), !(!(!quos(...))))
21. map_if(ind_list, !is_helper, eval_tidy, data = names_list)
22. map(.x[matches], .f, ...)
23. lapply(.x, .f, ...)
24. FUN(X[[i]], ...)

HadrienG · 2018-05-14T05:57:31Z

thanks for the traceback. Yes I did run it on openhumans.

I'll take a look, but I think it's because one of the json files has unquoted strings (which is not valid json)

The warnings can safely be ignored, maybe I should wrap the call in an invisible(). Or do you know a jupyter way to suppress the warnings for one cell? Like Rmarkdown {r cell_name warnings=FALSE}

gedankenstuecke · 2018-05-14T06:12:14Z

Great, thanks so much! I was just asking as otherwise different package versions might be to blame for some weird behavior.

I'm not sure how to best turn off the warnings (my own R skills are basically limited to making things look nice thanks to the ggplot2 universe). According to google options(warn=-1) might do the trick to turn the warnings off globally.

initial ubiome commit

0806dac

install necessary packages build data frame from the api and public data info try to import json

HadrienG added 3 commits May 10, 2018 16:49

fixed public data upload

d8a0769

the notebook can now import the public data in (valid) json, tsv and csv

compare my private data to public datasets

4a46f00

added a few explanations

c253479

gedankenstuecke added the mozsprint label May 11, 2018

gedankenstuecke mentioned this pull request May 11, 2018

Writing notebooks that explore genetic data #7

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ubiome R notebook #10

Ubiome R notebook #10

HadrienG commented May 10, 2018 •

edited

Loading

madprime commented May 10, 2018

gedankenstuecke commented May 11, 2018

HadrienG commented May 13, 2018

gedankenstuecke commented May 14, 2018

HadrienG commented May 14, 2018 •

edited

Loading

gedankenstuecke commented May 14, 2018

Ubiome R notebook #10

Are you sure you want to change the base?

Ubiome R notebook #10

Conversation

HadrienG commented May 10, 2018 • edited Loading

Additional remarks

madprime commented May 10, 2018

gedankenstuecke commented May 11, 2018

HadrienG commented May 13, 2018

gedankenstuecke commented May 14, 2018

HadrienG commented May 14, 2018 • edited Loading

gedankenstuecke commented May 14, 2018

HadrienG commented May 10, 2018 •

edited

Loading

HadrienG commented May 14, 2018 •

edited

Loading