Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ubiome R notebook #10

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Ubiome R notebook #10

wants to merge 4 commits into from

Conversation

HadrienG
Copy link

@HadrienG HadrienG commented May 10, 2018

Ready for testing

  • install and load necessary packages
  • build data frame containing info about public data
  • load public data files
  • exploratory analysis
  • load private taxonomy results

Additional remarks

Some of the ubiome json files are not valid json, or are in other format (csv and tab delimited). Ideally there should be a data validation when accepting the upload files (I can file an issue in the repo if you want). Anyway the r code in the notebook should handle the different formats 🎉

install necessary packages
build data frame from the api and public data info
try to import json
@madprime
Copy link
Member

On this repo, I think! https://github.com/OpenHumans/oh-ubiome-source

the notebook can now import the public data in (valid) json, tsv and csv
@gedankenstuecke
Copy link
Member

I had some trouble getting it to run on my end:

  1. the data/ folder needs to exist already, otherwise it'll crash. But that's easy enough to fix.

Then I had a problem with getting the reference data loaded in.

There's a lot of

Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37852.json' file 2    11 <NA>  6 columns 7 columns 'data/37852.json' row 3    14 <NA>  6 columns 7 columns 'data/37852.json' col 4    15 <NA>  6 columns 7 columns 'data/37852.json' expected 5    16 <NA>  6 columns 7 columns 'data/37852.json'

And in the end the data_container isn't created. See the end of the error message here:

screen shot 2018-05-11 at 09 27 19

Just for fun I also tried loading my own data in the next steps but that also yielded an error:

screen shot 2018-05-11 at 09 25 15

@HadrienG
Copy link
Author

Could you give me the whole traceback for the object taxon not found? Not sure if it happens during the json or the csv parsing.

@gedankenstuecke
Copy link
Member

Sure, running

invisible(map2(jsons$download_url, paste0("data/", jsons$id, ".json"), download.file))
file.remove("data/37844.json")  # invalid json
data_path <- dir("data", pattern = '.json', full.names = TRUE)
data_container <- map(data_path, read_json_or_tbl_or_csv)

I get the following output and data_container isn't actually created.
I was wondering: did you run your code directly on notebooks.openhumans.org?

Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37852.json' file 2    11 <NA>  6 columns 7 columns 'data/37852.json' row 3    14 <NA>  6 columns 7 columns 'data/37852.json' col 4    15 <NA>  6 columns 7 columns 'data/37852.json' expected 5    16 <NA>  6 columns 7 columns 'data/37852.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“7 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     8 <NA>  6 columns 7 columns 'data/37870.json' file 2     9 <NA>  6 columns 7 columns 'data/37870.json' row 3    12 <NA>  6 columns 7 columns 'data/37870.json' col 4    13 <NA>  6 columns 7 columns 'data/37870.json' expected 5    15 <NA>  6 columns 8 columns 'data/37870.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“9 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     3 <NA>  6 columns 7 columns 'data/37872.json' file 2     6 <NA>  6 columns 7 columns 'data/37872.json' row 3     9 <NA>  6 columns 7 columns 'data/37872.json' col 4    10 <NA>  6 columns 8 columns 'data/37872.json' expected 5    13 <NA>  6 columns 7 columns 'data/37872.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“16 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     3 <NA>  6 columns 8 columns 'data/37874.json' file 2     4 <NA>  6 columns 7 columns 'data/37874.json' row 3     7 <NA>  6 columns 8 columns 'data/37874.json' col 4    19 <NA>  6 columns 8 columns 'data/37874.json' expected 5    20 <NA>  6 columns 8 columns 'data/37874.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“103 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     8 <NA>  6 columns 7 columns 'data/37876.json' file 2     9 <NA>  6 columns 7 columns 'data/37876.json' row 3    10 <NA>  6 columns 7 columns 'data/37876.json' col 4    12 <NA>  6 columns 7 columns 'data/37876.json' expected 5    16 <NA>  6 columns 7 columns 'data/37876.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“78 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37878.json' file 2     6 <NA>  6 columns 7 columns 'data/37878.json' row 3    18 <NA>  6 columns 7 columns 'data/37878.json' col 4    20 <NA>  6 columns 7 columns 'data/37878.json' expected 5    23 <NA>  6 columns 7 columns 'data/37878.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“73 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37880.json' file 2     6 <NA>  6 columns 7 columns 'data/37880.json' row 3    19 <NA>  6 columns 7 columns 'data/37880.json' col 4    22 <NA>  6 columns 7 columns 'data/37880.json' expected 5    24 <NA>  6 columns 7 columns 'data/37880.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“87 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     7 <NA>  6 columns 7 columns 'data/37882.json' file 2     8 <NA>  6 columns 7 columns 'data/37882.json' row 3     9 <NA>  6 columns 7 columns 'data/37882.json' col 4    11 <NA>  6 columns 7 columns 'data/37882.json' expected 5    20 <NA>  6 columns 7 columns 'data/37882.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“24 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37884.json' file 2     6 <NA>  6 columns 7 columns 'data/37884.json' row 3     9 <NA>  6 columns 7 columns 'data/37884.json' col 4    19 <NA>  6 columns 7 columns 'data/37884.json' expected 5    20 <NA>  6 columns 7 columns 'data/37884.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“70 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    11 <NA>  1 columns 2 columns 'data/37902.json' file 2    14 <NA>  1 columns 2 columns 'data/37902.json' row 3    15 <NA>  1 columns 2 columns 'data/37902.json' col 4    16 <NA>  1 columns 2 columns 'data/37902.json' expected 5    17 <NA>  1 columns 2 columns 'data/37902.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“124 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    13 <NA>  1 columns 2 columns 'data/37904.json' file 2    16 <NA>  1 columns 2 columns 'data/37904.json' row 3    18 <NA>  1 columns 2 columns 'data/37904.json' col 4    21 <NA>  1 columns 2 columns 'data/37904.json' expected 5    22 <NA>  1 columns 2 columns 'data/37904.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“79 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    10 <NA>  1 columns 3 columns 'data/37906.json' file 2    11 <NA>  1 columns 2 columns 'data/37906.json' row 3    16 <NA>  1 columns 2 columns 'data/37906.json' col 4    27 <NA>  1 columns 2 columns 'data/37906.json' expected 5    30 <NA>  1 columns 5 columns 'data/37906.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“117 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     6 <NA>  1 columns 2 columns 'data/37908.json' file 2    10 <NA>  1 columns 2 columns 'data/37908.json' row 3    13 <NA>  1 columns 2 columns 'data/37908.json' col 4    14 <NA>  1 columns 2 columns 'data/37908.json' expected 5    15 <NA>  1 columns 2 columns 'data/37908.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“92 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    11 <NA>  6 columns 7 columns 'data/37914.json' file 2    12 <NA>  6 columns 7 columns 'data/37914.json' row 3    13 <NA>  6 columns 7 columns 'data/37914.json' col 4    17 <NA>  6 columns 7 columns 'data/37914.json' expected 5    24 <NA>  6 columns 7 columns 'data/37914.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“96 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    10 <NA>  1 columns 2 columns 'data/37922.json' file 2    11 <NA>  1 columns 2 columns 'data/37922.json' row 3    12 <NA>  1 columns 2 columns 'data/37922.json' col 4    13 <NA>  1 columns 2 columns 'data/37922.json' expected 5    17 <NA>  1 columns 2 columns 'data/37922.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“76 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     9 <NA>  6 columns 7 columns 'data/37944.json' file 2    12 <NA>  6 columns 7 columns 'data/37944.json' row 3    13 <NA>  6 columns 7 columns 'data/37944.json' col 4    14 <NA>  6 columns 7 columns 'data/37944.json' expected 5    19 <NA>  6 columns 7 columns 'data/37944.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Warning message:
“Duplicated column names deduplicated: '{\\n' => '{\\n_1' [28], '\\"taxon\\":' => '\\"taxon\\":_1' [29], '\\"parent\\":' => '\\"parent\\":_1' [31], '\\"count\\":' => '\\"count\\":_1' [33], '4706,\\n' => '4706,\\n_1' [34], '\\"count_norm\\":' => '\\"count_norm\\":_1' [35], '1000000,\\n' => '1000000,\\n_1' [36], '\\"tax_name\\":' => '\\"tax_name\\":_1' [37], '\\"tax_rank\\":' => '\\"tax_rank\\":_1' [39], '},\\n' => '},\\n_1' [41], '{\\n' => '{\\n_2' [42], '\\"taxon\\":' => '\\"taxon\\":_2' [43], '\\"parent\\":' => '\\"parent\\":_2' [45], '\\"count\\":' => '\\"count\\":_2' [47], '\\"count_norm\\":' => '\\"count_norm\\":_2' [49], '\\"tax_name\\":' => '\\"tax_name\\":_2' [51], '\\"tax_rank\\":' => '\\"tax_rank\\":_2' [53], '},\\n' => '},\\n_2' [55], '{\\n' => '{\\n_3' [56], '\\"taxon\\":' => '\\"taxon\\":_3' [57], '\\"parent\\":' => '\\"parent\\":_3' [59], '\\"count\\":' => '\\"count\\":_3' [61], '\\"count_norm\\":' => '\\"count_norm\\":_3' [63], '\\"tax_name\\":' => '\\"tax_name\\":_3' [65], '\\"tax_rank\\":' => '\\"tax_rank\\":_3' [67], '},\\n' => '},\\n_3' [69], '{\\n' => '{\\n_4' [70], '\\"taxon\\":' => '\\"taxon\\":_4' [71], '\\"parent\\":' => '\\"parent\\":_4' [73], '481,\\n' => '481,\\n_1' [74], '\\"count\\":' => '\\"count\\":_4' [75], '\\"count_norm\\":' => '\\"count_norm\\":_4' [77], '\\"tax_name\\":' => '\\"tax_name\\":_4' [79], '\\"tax_rank\\":' => '\\"tax_rank\\":_4' [81], '\\"genus\\"\\n' => '\\"genus\\"\\n_1' [82], '},\\n' => '},\\n_4' [83], '{\\n' => '{\\n_5' [84], '\\"taxon\\":' => '\\"taxon\\":_5' [85], '\\"parent\\":' => '\\"parent\\":_5' [87], '482,\\n' => '482,\\n_1' [88], '\\"count\\":' => '\\"count\\":_5' [89], '\\"count_norm\\":' => '\\"count_norm\\":_5' [91], '\\"tax_name\\":' => '\\"tax_name\\":_5' [93], '\\"tax_rank\\":' => '\\"tax_rank\\":_5' [96], '},\\n' => '},\\n_5' [98], '{\\n' => '{\\n_6' [99], '\\"taxon\\":' => '\\"taxon\\":_6' [100], '\\"parent\\":' => '\\"parent\\":_6' [102], '482,\\n' => '482,\\n_2' [103], '\\"count\\":' => '\\"count\\":_6' [104], '\\"count_norm\\":' => '\\"count_norm\\":_6' [106], '\\"tax_name\\":' => '\\"tax_name\\":_6' [108], '\\"Neisseria' => '\\"Neisseria_1' [109], '\\"tax_rank\\":' => '\\"tax_rank\\":_6' [111], '\\"species\\"\\n' => '\\"species\\"\\n_1' [112], '},\\n' => '},\\n_6' [113], '{\\n' => '{\\n_7' [114], '\\"taxon\\":' => '\\"taxon\\":_7' [115], '\\"parent\\":' => '\\"parent\\":_7' [117], '\\"count\\":' => '\\"count\\":_7' [119], '\\"count_norm\\":' => '\\"count_norm\\":_7' [121], '\\"tax_name\\":' => '\\"tax_name\\":_7' [123], '\\"tax_rank\\":' => '\\"tax_rank\\":_7' [126], '\\"species\\"\\n' => '\\"species\\"\\n_2' [127], '},\\n' => '},\\n_7' [128], '{\\n' => '{\\n_8' [129], '\\"taxon\\":' => '\\"taxon\\":_8' [130], '\\"parent\\":' => '\\"parent\\":_8' [132], '\\"count\\":' => '\\"count\\":_8' [134], '\\"count_norm\\":' => '\\"count_norm\\":_8' [136], '\\"tax_name\\":' => '\\"tax_name\\":_8' [138], '\\"tax_rank\\":' => '\\"tax_rank\\":_8' [140], '\\"family\\"\\n' => '\\"family\\"\\n_1' [141], '},\\n' => '},\\n_8' [142], '{\\n' => '{\\n_9' [143], '\\"taxon\\":' => '\\"taxon\\":_9' [144], '\\"parent\\":' => '\\"parent\\":_9' [146], '543,\\n' => '543,\\n_1' [147], '\\"count\\":' => '\\"count\\":_9' [148], '5,\\n' => '5,\\n_1' [149], '\\"count_norm\\":' => '\\"count_norm\\":_9' [150], '1062,\\n' => '1062,\\n_1' [151], '\\"tax_name\\":' => '\\"tax_name\\":_9' [152], '\\"tax_rank\\":' => '\\"tax_rank\\":_9' [154], '\\"genus\\"\\n' => '\\"genus\\"\\n_2' [155], '},\\n' => '},\\n_9' [156], '{\\n' => '{\\n_10' [157], '\\"taxon\\":' => '\\"taxon\\":_10' [158], '\\"parent\\":' => '\\"parent\\":_10' [160], '\\"count\\":' => '\\"count\\":_10' [162], '\\"count_norm\\":' => '\\"count_norm\\":_10' [164], '\\"tax_name\\":' => '\\"tax_name\\":_10' [166], '\\"tax_rank\\":' => '\\"tax_rank\\":_10' [168], '\\"family\\"\\n' => '\\"family\\"\\n_2' [169], '},\\n' => '},\\n_10' [170], '{\\n' => '{\\n_11' [171], '\\"taxon\\":' => '\\"taxon\\":_11' [172], '\\"parent\\":' => '\\"parent\\":_11' [174], '712,\\n' => '712,\\n_1' [175], '\\"count\\":' => '\\"count\\":_11' [176], '\\"count_norm\\":' => '\\"count_norm\\":_11' [178], '\\"tax_name\\":' => '\\"tax_name\\":_11' [180], '\\"tax_rank\\":' => '\\"tax_rank\\":_11' [182], '\\"genus\\"\\n' => '\\"genus\\"\\n_3' [183], '},\\n' => '},\\n_11' [184], '{\\n' => '{\\n_12' [185], '\\"taxon\\":' => '\\"taxon\\":_12' [186], '\\"parent\\":' => '\\"parent\\":_12' [188], '724,\\n' => '724,\\n_1' [189], '\\"count\\":' => '\\"count\\":_12' [190], '\\"count_norm\\":' => '\\"count_norm\\":_12' [192], '\\"tax_name\\":' => '\\"tax_name\\":_12' [194], '\\"tax_rank\\":' => '\\"tax_rank\\":_12' [197], '\\"species\\"\\n' => '\\"species\\"\\n_3' [198], '},\\n' => '},\\n_12' [199], '{\\n' => '{\\n_13' [200], '\\"taxon\\":' => '\\"taxon\\":_13' [201], '\\"parent\\":' => '\\"parent\\":_13' [203], '724,\\n' => '724,\\n_2' [204], '\\"count\\":' => '\\"count\\":_13' [205], '\\"count_norm\\":' => '\\"count_norm\\":_13' [207], '\\"tax_name\\":' => '\\"tax_name\\":_13' [209], '\\"Haemophilus' => '\\"Haemophilus_1' [210], '\\"tax_rank\\":' => '\\"tax_rank\\":_13' [212], '\\"species\\"\\n' => '\\"species\\"\\n_4' [213], '},\\n' => '},\\n_13' [214], '{\\n' => '{\\n_14' [215], '\\"taxon\\":' => '\\"taxon\\":_14' [216], '\\"parent\\":' => '\\"parent\\":_14' [218], '\\"count\\":' => '\\"count\\":_14' [220], '4,\\n' => '4,\\n_1' [221], '\\"count_norm\\":' => '\\"count_norm\\":_14' [222], '849,\\n' => '849,\\n_1' [223], '\\"tax_name\\":' => '\\"tax_name\\":_14' [224], '\\"tax_rank\\":' => '\\"tax_rank\\":_14' [227], '\\"species\\"\\n' => '\\"species\\"\\n_5' [228], '},\\n' => '},\\n_14' [229], '{\\n' => '{\\n_15' [230], '\\"taxon\\":' => '\\"taxon\\":_15' [231], '\\"parent\\":' => '\\"parent\\":_15' [233], '416916,\\n' => '416916,\\n_1' [234], '\\"count\\":' => '\\"count\\":_15' [235], '\\"count_norm\\":' => '\\"count_norm\\":_15' [237], '\\"tax_name\\":' => '\\"tax_name\\":_15' [239], '\\"Aggregatibacter' => '\\"Aggregatibacter_1' [240], '\\"tax_rank\\":' => '\\"tax_rank\\":_15' [242], '\\"species\\"\\n' => '\\"species\\"\\n_6' [243], '},\\n' => '},\\n_15' [244], '{\\n' => '{\\n_16' [245], '\\"taxon\\":' => '\\"taxon\\":_16' [246], '\\"parent\\":' => '\\"parent\\":_16' [248], '\\"count\\":' => '\\"count\\":_16' [250], '\\"count_norm\\":' => '\\"count_norm\\":_16' [252], '\\"tax_name\\":' => '\\"tax_name\\":_16' [254], '\\"tax_rank\\":' => '\\"tax_rank\\":_16' [256], '\\"family\\"\\n' => '\\"family\\"\\n_3' [257], '},\\n' => '},\\n_16' [258], '{\\n' => '{\\n_17' [259], '\\"taxon\\":' => '\\"taxon\\":_17' [260], '\\"parent\\":' => '\\"parent\\":_17' [262], '815,\\n' => '815,\\n_1' [263], '\\"count\\":' => '\\"count\\":_17' [264], '33,\\n' => '33,\\n_1' [265], '\\"count_norm\\":' => '\\"count_norm\\":_17' [266], '7012,\\n' => '7012,\\n_1' [267], '\\"tax_name\\":' => '\\"tax_name\\":_17' [268], '\\"tax_rank\\":' => '\\"tax_rank\\":_17' [270], '\\"genus\\"\\n' => '\\"genus\\"\\n_4' [271], '},\\n' => '},\\n_17' [272], '{\\n' => '{\\n_18' [273], '\\"taxon\\":' => '\\"taxon\\":_18' [274], '\\"parent\\":' => '\\"parent\\":_18' [276], '816,\\n' => '816,\\n_1' [277], '\\"count\\":' => '\\"count\\":_18' [278], '\\"count_norm\\":' => '\\"count_norm\\":_18' [280], '\\"tax_name\\":' => '\\"tax_name\\":_18' [282], '\\"tax_rank\\":' => '\\"tax_rank\\":_18' [285], '\\"species\\"\\n' => '\\"species\\"\\n_7' [286], '},\\n' => '},\\n_18' [287], '{\\n' => '{\\n_19' [288], '\\"taxon\\":' => '\\"taxon\\":_19' [289], '\\"parent\\":' => '\\"parent\\":_19' [291], '\\"count\\":' => '\\"count\\":_19' [293], '21,\\n' => '21,\\n_1' [294], '\\"count_norm\\":' => '\\"count_norm\\":_19' [295], '4462,\\n' => '4462,\\n_1' [296], '\\"tax_name\\":' => '\\"tax_name\\":_19' [297], '\\"tax_rank\\":' => '\\"tax_rank\\":_19' [299], '\\"genus\\"\\n' => '\\"genus\\"\\n_5' [300], '},\\n' => '},\\n_19' [301], '{\\n' => '{\\n_20' [302], '\\"taxon\\":' => '\\"taxon\\":_20' [303], '\\"parent\\":' => '\\"parent\\":_20' [305], '\\"count\\":' => '\\"count\\":_20' [307], '\\"count_norm\\":' => '\\"count_norm\\":_20' [309], '\\"tax_name\\":' => '\\"tax_name\\":_20' [311], '\\"tax_rank\\":' => '\\"tax_rank\\":_20' [313], '\\"genus\\"\\n' => '\\”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Warning message:
“Duplicated column names deduplicated: '\\n      \\"count\\": 4706' => '\\n      \\"count\\": 4706_1' [14], '\\n      \\"count_norm\\": 1000000' => '\\n      \\"count_norm\\": 1000000_1' [15], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_1' [35], '\\n      \\"parent\\": 482' => '\\n      \\"parent\\": 482_1' [43], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_1' [47], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_2' [53], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_1' [59], '\\n      \\"count\\": 5' => '\\n      \\"count\\": 5_1' [62], '\\n      \\"count_norm\\": 1062' => '\\n      \\"count_norm\\": 1062_1' [63], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_2' [65], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_2' [71], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_3' [77], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_3' [83], '\\n      \\"parent\\": 724' => '\\n      \\"parent\\": 724_1' [85], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_4' [89], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_1' [92], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_1' [93], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_5' [95], '\\n      \\"parent\\": 416916' => '\\n      \\"parent\\": 416916_1' [97], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_6' [101], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_3' [107], '\\n      \\"count\\": 33' => '\\n      \\"count\\": 33_1' [110], '\\n      \\"count_norm\\": 7012' => '\\n      \\"count_norm\\": 7012_1' [111], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_4' [113], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_7' [119], '\\n      \\"count\\": 21' => '\\n      \\"count\\": 21_1' [122], '\\n      \\"count_norm\\": 4462' => '\\n      \\"count_norm\\": 4462_1' [123], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_5' [125], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_6' [131], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_7' [137], '\\n      \\"count\\": 11' => '\\n      \\"count\\": 11_1' [140], '\\n      \\"count_norm\\": 2337' => '\\n      \\"count_norm\\": 2337_1' [141], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_8' [143], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_8' [149], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_1' [152], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_1' [153], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_9' [155], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_9' [167], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_2' [170], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_2' [171], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_10' [173], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_1' [179], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_2' [191], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_4' [197], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_5' [203], '\\n      \\"count\\": 1518' => '\\n      \\"count\\": 1518_1' [206], '\\n      \\"count_norm\\": 322566' => '\\n      \\"count_norm\\": 322566_1' [207], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_10' [209], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_11' [215], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_1' [217], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_12' [221], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_2' [223], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_1' [224], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_1' [225], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_13' [227], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_11' [233], '\\n      \\"count\\": 271' => '\\n      \\"count\\": 271_1' [236], '\\n      \\"count_norm\\": 57586' => '\\n      \\"count_norm\\": 57586_1' [237], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_2' [242], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_2' [243], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_6' [245], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_12' [251], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_3' [254], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_3' [255], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_13' [257], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_1' [263], '\\n      \\"count\\": 210' => '\\n      \\"count\\": 210_1' [266], '\\n      \\"count_norm\\": 44623' => '\\n      \\"count_norm\\": 44623_1' [267], '\\n      \\"tax_rank\\": \\"order\\"\\n    }' => '\\n      \\"tax_rank\\": \\"order\\"\\n    }_1' [269], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_14' [275], '\\n      \\"count\\": 38' => '\\n      \\"count\\": 38_1' [278], '\\n      \\"count_norm\\": 8074' => '\\n      \\"count_norm\\": 8074_1' [279], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_7' [281], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_3' [284], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_3' [285], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_15' [287], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_2' [290], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_2' [291], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_16' [293], '\\n      \\"parent\\": 838' => '\\n      \\"parent\\": 838_1' [295], '\\n      \\"count\\": 14' => '\\n      \\"count\\": 14_1' [296], '\\n      \\"count_norm\\": 2974' => '\\n      \\"count_norm\\": 2974_1' [297], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_17' [299], '\\n      \\"parent\\": 1224' => '\\n      \\"parent\\": 1224_1' [301], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_2' [305], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_14' [311], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_3' [314], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_3' [315], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_3' [317], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_8' [323], '\\n      \\"parent\\": 2' => ”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Error in FUN(X[[i]], ...): object 'taxon' not found
Traceback:

1. map(data_path, read_json_or_tbl_or_csv)
2. .f(.x[[i]], ...)
3. tryCatch({
 .     jsonlite::fromJSON(data_file)$ubiome_bacteriacounts %>% select(taxon, 
 .         parent, tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 . }, error = function(cond) {
 .     paste(cond)
 .     tryCatch({
 .         tab <- read_table2(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         assertthat::assert_that(ncol(tab) > 1)
 .         return(tab)
 .     }, error = function(bla) {
 .         paste(bla)
 .         csv <- read_csv(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         return(csv)
 .     })
 . })   # at line 3-25 of file <text>
4. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
6. value[[3L]](cond)
7. tryCatch({
 .     tab <- read_table2(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     assertthat::assert_that(ncol(tab) > 1)
 .     return(tab)
 . }, error = function(bla) {
 .     paste(bla)
 .     csv <- read_csv(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     return(csv)
 . })   # at line 10-23 of file <text>
8. tryCatchList(expr, classes, parentenv, handlers)
9. tryCatchOne(expr, names, parentenv, handlers[[1L]])
10. value[[3L]](cond)
11. read_csv(data_file) %>% select(taxon, parent, tax_name, tax_rank, 
  .     count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
  .     ".", fixed = TRUE)[[1]][1])   # at line 19-21 of file <text>
12. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
13. eval(quote(`_fseq`(`_lhs`)), env, env)
14. eval(quote(`_fseq`(`_lhs`)), env, env)
15. `_fseq`(`_lhs`)
16. freduce(value, `_function_list`)
17. function_list[[i]](value)
18. select(., taxon, parent, tax_name, tax_rank, count, count_norm)
19. select.data.frame(., taxon, parent, tax_name, tax_rank, count, 
  .     count_norm)
20. select_vars(names(.data), !(!(!quos(...))))
21. map_if(ind_list, !is_helper, eval_tidy, data = names_list)
22. map(.x[matches], .f, ...)
23. lapply(.x, .f, ...)
24. FUN(X[[i]], ...)

@HadrienG
Copy link
Author

HadrienG commented May 14, 2018

thanks for the traceback. Yes I did run it on openhumans.

I'll take a look, but I think it's because one of the json files has unquoted strings (which is not valid json)

The warnings can safely be ignored, maybe I should wrap the call in an invisible(). Or do you know a jupyter way to suppress the warnings for one cell? Like Rmarkdown {r cell_name warnings=FALSE}

@gedankenstuecke
Copy link
Member

Great, thanks so much! I was just asking as otherwise different package versions might be to blame for some weird behavior.

I'm not sure how to best turn off the warnings (my own R skills are basically limited to making things look nice thanks to the ggplot2 universe). According to google options(warn=-1) might do the trick to turn the warnings off globally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants