Duplicates and errors subsetting columns #116

Open
cswingle opened this issue Jan 31, 2023 · 6 comments

@cswingle

cswingle commented Jan 31, 2023

The latest survey created by our team seems to have issues when trying to parse the survey object. The code looks like this:

survey_object <- fetch_survey_obj(svy_id)
survey_df <- parse_survey(
  survey_object,
  fix_duplicates = "error"
)

With fix_duplicates = "error" I get "Error: There are duplicated rows in the responses." This is unexpected, I'm afraid. The only submissions at that point were two responses I created to test the survey from different computers and with different answers.

With fix_duplicates = "drop" I get this:

Error in `out[, col_names]`:
! Can't subset columns that don't exist.
Columns `image`, `survey_id`, `collector_id`, `response_id`, `date_created`, etc. don't exist.
Run `rlang::last_error()` to see where the error occurred.
Warning messages:
1: In duplicate_drop(x) :
   There are 22 duplicate responses, duplicates are dropped in
       the results. Set fix_duplicates = 'keep' to retain them.
2: Outer names are only allowed for unnamed scalar atomic inputs

I get the same error with fix_duplicates = "keep" except the warning message comes from duplicate_keep(x).

The only thing I can think of that might be different is that this survey has one question that shows a series of images, and the respondent chooses one of them.

If there are R objects I can send you or some sort of debugging I can go through, I'm happy to give it a try. I did try loading a bunch of the internal functions into my environment and tried working through parse_survey to see if I could see what was going wrong, but I couldn't make sense of exactly what each step was trying to do.

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dbplyr_2.2.1            RPostgres_1.4.4         surveymonkey_0.1.0.9000
 [4] glue_1.6.2              lubridate_1.8.0         forcats_0.5.2          
 [7] stringr_1.4.1           dplyr_1.0.10            purrr_0.3.5            
[10] readr_2.1.3             tidyr_1.2.1             tibble_3.1.8           
[13] ggplot2_3.3.6           tidyverse_1.3.2        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9          pillar_1.8.1        compiler_4.2.2     
 [4] cellranger_1.1.0    tools_4.2.2         bit_4.0.4          
 [7] googledrive_2.0.0   jsonlite_1.8.4      lifecycle_1.0.3    
[10] gargle_1.2.1        gtable_0.3.1        pkgconfig_2.0.3    
[13] rlang_1.0.6         reprex_2.0.2        DBI_1.1.3          
[16] cli_3.4.1           haven_2.5.1         xml2_1.3.3         
[19] withr_2.5.0         httr_1.4.4          generics_0.1.3     
[22] vctrs_0.5.1         fs_1.5.2            hms_1.1.2          
[25] bit64_4.0.5         googlesheets4_1.0.1 grid_4.2.2         
[28] tidyselect_1.2.0    R6_2.5.1            fansi_1.0.3        
[31] readxl_1.4.1        blob_1.2.3          tzdb_0.3.0         
[34] modelr_0.1.9        magrittr_2.0.3      backports_1.4.1    
[37] scales_1.2.1        ellipsis_0.3.2      rvest_1.0.3        
[40] assertthat_0.2.1    colorspace_2.0-3    utf8_1.2.2         
[43] stringi_1.7.8       munsell_0.5.0       broom_1.0.1        
[46] crayon_1.5.2

@mattroumaya
Owner

Hey @cswingle, sorry for the late reply!

It is unfortunately difficult to troubleshoot these sorts of issues since survey design tends to vary quite a bit. Later today, I'll take a shot at creating a branch with an additional parameter for fix_duplicates, which will hopefully just skip over any duplicate response handling and return a parsed survey.

Another issue that might be harder to resolve is that I don't believe images are fully supported in the package right now -- I don't have a premium account anymore so it's even harder to test and add new features, but hopefully the approach above will be enough to resolve this.

@mattroumaya
Owner

Possibly related to #104

@cswingle
Author

cswingle commented Feb 2, 2023

@mattroumaya, I could email you the JSON from survey/:id/details and surveys/:id/details API queries (and any other endpoint I have access to) if that would help diagnose the issue. The first two survey responses are dummy responses so I wouldn't be sharing anything real other than the structure of the survey and a couple responses.

@mattroumaya
Owner

@cswingle I'm definitely happy to take a look! my email is [email protected]. I'm a bit busy this week but hoping to take a closer look tomorrow.

@mattroumaya
Owner

@cswingle I have a pull request ready for you to test out - whenever you have the chance, you can do:

devtools::install_github('mattroumaya/surveymonkey@47c1505773521d941a414ded769ef141037ac94c')

survey_df <- 123456789 %>%
  fetch_survey_obj %>%
  parse_survey(fix_duplicates = 'none')

You might see a warning that's thrown in pivot_longer() within the parse_survey() function, but this will hopefully allow you to pull your data and then resolve it after the survey is parsed.
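For resolving duplicates after parsing, one possible approach is base R's duplicated(). This is only a minimal sketch: it assumes parse_survey() returns a data frame, and the column names and values below are invented for illustration rather than taken from real surveymonkey output.

```r
# Made-up stand-in for a parsed survey (hypothetical columns and values).
survey_df <- data.frame(
  response_id = c("r1", "r1", "r2"),
  answer      = c("cat", "cat", "dog"),
  stringsAsFactors = FALSE
)

# Flag rows that exactly repeat an earlier row.
is_dup <- duplicated(survey_df)

# Keep only the first occurrence of each distinct row.
deduped <- survey_df[!is_dup, , drop = FALSE]

print(sum(is_dup))
print(deduped)
```

Inspecting the flagged rows before dropping them may help tell genuine duplicate submissions apart from rows that were repeated by the parsing step.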

@cswingle
Copy link
Author

cswingle commented Feb 4, 2023

Thanks! I tried the pull but got similar errors to what I was seeing before:

:> survey_df <- 123456789 %>% fetch_survey_obj %>% parse_survey(fix_duplicates = 'none')
You have 496 requests left today before you hit the limit
You have 495 requests left today before you hit the limit
New names:
`s3_key` -> `s3_key...1`
`s3_key` -> `s3_key...2`
`s3_key` -> `s3_key...3`
`s3_key` -> `s3_key...4`
`url` -> `url...5`
`url` -> `url...6`
`url` -> `url...7`
`url` -> `url...8`
`alt_text` -> `alt_text...9`
`alt_text` -> `alt_text...10`
`alt_text` -> `alt_text...11`
`alt_text` -> `alt_text...12`
`s3_key` -> `s3_key...13`
`s3_key` -> `s3_key...14`
`s3_key` -> `s3_key...15`
`s3_key` -> `s3_key...16`
`url` -> `url...17`
`url` -> `url...18`
`url` -> `url...19`
`url` -> `url...20`
`alt_text` -> `alt_text...21`
`alt_text` -> `alt_text...22`
`alt_text` -> `alt_text...23`
`alt_text` -> `alt_text...24`
`s3_key` -> `s3_key...25`
`s3_key` -> `s3_key...26`
`url` -> `url...27`
`url` -> `url...28`
`alt_text` -> `alt_text...29`
`alt_text` -> `alt_text...30`
`s3_key` -> `s3_key...31`
`s3_key` -> `s3_key...32`
`url` -> `url...33`
`url` -> `url...34`
`alt_text` -> `alt_text...35`
`alt_text` -> `alt_text...36`
`s3_key` -> `s3_key...37`
`s3_key` -> `s3_key...38`
`s3_key` -> `s3_key...39`
`url` -> `url...40`
`url` -> `url...41`
`url` -> `url...42`
`alt_text` -> `alt_text...43`
`alt_text` -> `alt_text...44`
`alt_text` -> `alt_text...45`
`s3_key` -> `s3_key...46`
`s3_key` -> `s3_key...47`
`url` -> `url...48`
`url` -> `url...49`
`alt_text` -> `alt_text...50`
`alt_text` -> `alt_text...51`
Error in `out[, col_names]`:
! Can't subset columns that don't exist.
Columns `image`, `survey_id`, `collector_id`, `response_id`, `date_created`, etc. don't exist.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
Outer names are only allowed for unnamed scalar atomic inputs 
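The "New names" output above looks like tidyverse name repair at work: when several elements share a name (here `s3_key`, `url`, and `alt_text`), each duplicate is given a `...N` suffix based on its position. A minimal sketch of that behaviour using vctrs::vec_as_names(); the input vector is made up, and whether parse_survey() reaches name repair through this exact call is an assumption.

```r
library(vctrs)

# Hypothetical field names, as if two image choices were flattened side by side.
raw_names <- c("s3_key", "s3_key", "url", "url", "alt_text")

# "unique" repair suffixes duplicated names with their position;
# names that are already unique are left alone.
repaired <- vec_as_names(raw_names, repair = "unique", quiet = TRUE)

print(repaired)
```

If this is what happens to the image question's fields, the repaired names would no longer match whatever column names the parser expects later on.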

Here's the last_trace():

+> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `out[, col_names]`:
! Can't subset columns that don't exist.
Columns `image`, `survey_id`, `collector_id`, `response_id`, `date_created`, etc. don't exist.
---
Backtrace:

  1. ├─510188122 %>% fetch_survey_obj %>% ...
  2. ├─surveymonkey::parse_survey(., fix_duplicates = "none")
  3. │ ├─out[, col_names]
  4. │ └─tibble:::`[.tbl_df`(out, , col_names)
  5. │   └─tibble:::vectbl_as_col_location(...)
  6. │     ├─tibble:::subclass_col_index_errors(...)
  7. │     │ └─base::withCallingHandlers(...)
  8. │     └─vctrs::vec_as_location(j, n, names, call = call)
  9. └─vctrs (local) `<fn>`()
 10.   └─vctrs:::stop_subscript_oob(...)
 11.     └─vctrs:::stop_subscript(...)
 12.       └─rlang::abort(...)
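
The backtrace points at `out[, col_names]` inside parse_survey(). One way to narrow this down would be to compare the requested names against the names actually present. A minimal sketch in base R, where `out` and `col_names` are hypothetical stand-ins named after the objects in the trace and the data is invented:

```r
# Hypothetical stand-in for `out`: only repaired image-field columns exist.
out <- data.frame(a = 1:2, b = 3:4)
names(out) <- c("s3_key...1", "url...5")

# Hypothetical stand-in for `col_names`: what the parser wants to select.
col_names <- c("image", "survey_id", "s3_key...1")

# Names requested but absent from `out`; these would trigger the error.
missing_cols <- setdiff(col_names, names(out))
print(missing_cols)

# Defensive subset: keep only the requested columns that do exist.
present_cols <- intersect(col_names, names(out))
subset_out <- out[, present_cols, drop = FALSE]
print(names(subset_out))
```

Running the same setdiff() on the real objects (for example while stepping through parse_survey() in a debugger) should show whether every expected column is missing or only the ones tied to the image question.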
