New feature: System diagnostics functions #98

Open
wants to merge 26 commits into
base: main
from

Conversation

rmbielby
Contributor

@rmbielby rmbielby commented Nov 11, 2024

Brief overview of changes

I've added a master diagnostic_test() function to run any diagnostic scripts we add. So far it encompasses the following three functions, which I've also added as part of this PR:

  • check_proxy_settings() - this looks for (and removes if clean=TRUE) any rogue proxy settings in the Git configuration
  • check_github_pat() - this looks for (and removes if clean=TRUE) the system variable GITHUB_PAT, which prevents analysts from installing packages from GitHub
  • check_renv_download_method() - checks if the renv_download_method has been set in the .Renviron file and flags if either a) it's not set, or b) it's set to something other than "curl". If run with clean=TRUE, it removes any non-curl setting and adds a line to .Renviron setting the method to curl.

Why are these changes being made?

Analysts sometimes hit issues with settings on their laptop (usually that they've set themselves after following outdated guidance) and find it difficult to diagnose and fix these issues themselves. We need a way for them to do this in an automated way.

Detailed description of changes

All the extra code is self-contained in the diagnostic_test.R script:

  • diagnostic_test()
  • check_proxy_settings()
  • check_github_pat()
  • check_renv_download_method()

The functions have some consistent params:

  • clean = TRUE / FALSE - this dictates whether the function should automatically clean up any issues it finds. Default behaviour is just to output messages to the console about the issues.
  • verbose = TRUE / FALSE - adds a little bit of extra info

I've also added an override parameter for each function. It mainly helps with testing, but could conceivably be used by us to deal with unusual cases, e.g. you can set the config keys to search for and remove in check_proxy_settings() if there's a key other than http.proxy and https.proxy that's causing problems.

I've added tests for the individual check_* functions, except for check_github_pat(), as I couldn't find a way to identify the contents of Sys.getenv("GITHUB_PAT") when running on GitHub Actions. It seems to return something that looks like the string "***", but doesn't act like one (i.e. I can't do logic on it, and str_replace_all("*", "") doesn't reduce it to a blank string). My main thought is that it's acting as some sort of secret / encrypted variable, but I couldn't find anything remotely equivalent from searching online. The test that I had worked fine locally, but just won't work equivalently on GitHub Actions.

Additional information for reviewers

Issue ticket number/s and link

#63

@rmbielby rmbielby added the enhancement New feature or request label Nov 11, 2024
@rmbielby rmbielby requested a review from cjrace November 11, 2024 16:34
@rmbielby rmbielby self-assigned this Nov 11, 2024
@rmbielby rmbielby linked an issue Nov 11, 2024 that may be closed by this pull request
@rmbielby rmbielby linked an issue Nov 12, 2024 that may be closed by this pull request
Contributor

@cjrace cjrace left a comment

Initial pass with a few comments - not sure it's all quite working as it should?

proxy_config <- git_config |>
magrittr::extract2("global") |>
magrittr::extract(proxy_setting_names)
proxy_config <- purrr::keep(proxy_config, !is.na(names(proxy_config)))
Contributor

I think you might be able to remove the need for purrr entirely if you swap this line out. This might work as an alternative (though you should check it does actually do what you intended):

proxy_config <- proxy_config[!is.na(names(proxy_config))]
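A minimal sketch of why the base R subsetting suggested above drops unmatched entries. The `git_config` list here is a made-up stand-in for the real Git configuration, and the absence of `https.proxy` is an assumption for illustration:

```r
# Hypothetical stand-in for the list of global Git settings
git_config <- list(global = list(
  user.name = "Example Analyst",
  http.proxy = "http://example-proxy:8080"
))
proxy_setting_names <- c("http.proxy", "https.proxy")

# Indexing a list by a name it does not contain yields a NULL element
# whose name is NA
proxy_config <- git_config[["global"]][proxy_setting_names]

# Keep only the entries that were actually present in the config
proxy_config <- proxy_config[!is.na(names(proxy_config))]
names(proxy_config)
```

Only the `http.proxy` entry should survive the filter, with no dependency on purrr.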

Contributor Author

Done

message("FAIL: Git proxy settings have been left in place.")
}
} else {
message("PASS: No proxy settings found in your Git configuration.")
Contributor

Currently, when passing, it prints everything whether verbose or not. Should it just print the pass message if verbose = FALSE (default)?

image

With verbose = FALSE, still shows all information
image

Contributor Author

It's just printing the return variable in those cases, so if you run result <- check_proxy_settings() you'll get the behaviour you're expecting.
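A rough illustration of the console behaviour described above, using hypothetical functions rather than the PR's actual code. Wrapping the return value in `invisible()` is one possible option for keeping a return value while suppressing the auto-print; it is not something this PR is confirmed to do:

```r
# Hypothetical check function: emits a message and returns a result list
check_example <- function() {
  message("PASS: nothing to fix.")
  list(status = "pass")
}

# Same function, but with the return value marked invisible
check_example_quiet <- function() {
  message("PASS: nothing to fix.")
  invisible(list(status = "pass"))
}

check_example()                      # message, then the list is auto-printed
result <- check_example()            # message only; the list is stored in `result`
quiet_result <- check_example_quiet()  # message only; the value is still returned
```

Calling `check_example_quiet()` without assignment would also print only the message, while a master function could still collect its return value.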

Comment on lines 30 to 35
#' @description
#' This script checks for "bad" proxy settings. Prior to the pandemic, analysts
#' in the DfE would need to add some proxy settings to their GitHub config.
#' These settings now prevent Git from connecting to remote archives on GitHub
#' and Azure DevOps if present, so this script identifies and (if clean=TRUE is
#' set) removes them.
Contributor

The old code (3233fe1#diff-9152ef3c45461a2c8473510069ec582ac76fa432c711c8d9279d32dcceaa2b98L21) set http_proxy and https_proxy as environment variables - I think this function should be checking for those too using Sys.getenv()?

Contributor Author

@rmbielby rmbielby Jan 3, 2025

Ah, I see. The solution I've done fixes a variant on this that I see in the workshops (which I think people were picking up from a David Sands video we link from the analysts' guide).

I'll add the extra check on for this version as well.
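As a sketch of what the extra environment variable check could look like (the helper name `find_proxy_env_vars()` and the messages are hypothetical, and the variable names are taken from the review comment above rather than from the final code):

```r
# Names checked here are an assumption based on the review comment
proxy_env_names <- c("http_proxy", "https_proxy")

# Hypothetical helper: returns the proxy environment variables that are set
find_proxy_env_vars <- function(var_names = proxy_env_names) {
  # Named character vector; unset variables come back as ""
  values <- Sys.getenv(var_names, names = TRUE)
  values[nzchar(values)]
}

found <- find_proxy_env_vars()
if (length(found) > 0) {
  message(
    "FAIL: proxy environment variables set: ",
    paste(names(found), collapse = ", ")
  )
} else {
  message("PASS: no proxy environment variables found.")
}
```

Note that `Sys.unsetenv()` would only clear such variables for the current R session; anything set in a startup file such as .Renviron would need removing there as well.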

results <- c(
check_proxy_settings(clean = clean, verbose = verbose),
check_github_pat(clean = clean, verbose = verbose),
check_renv_download_method(clean = clean, verbose = verbose)
Contributor

Could we add an additional check for http.sslVerify in their Git config too?

system("git config --global http.sslVerify false") - was a line in the original proxy script and we should be discouraging analysts from having it in their configs anymore (as it's no longer necessary)

Contributor Author

I've added a check to switch it to TRUE instead if it finds it's set to false.
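A hedged sketch of how such a check might detect the setting. The helper `ssl_verify_disabled()` is hypothetical, and it assumes a config list shaped like the one used elsewhere in this PR (a `global` element holding the global Git settings); it is not the code that was actually added:

```r
# Hypothetical helper: `git_config` is assumed to be a list with a `global`
# element containing the global Git settings as named strings
ssl_verify_disabled <- function(git_config) {
  global <- git_config[["global"]]
  if (is.null(global)) {
    return(FALSE)
  }
  # Git treats key names case-insensitively, so compare in lower case
  value <- global[tolower(names(global)) == "http.sslverify"]
  # Git accepts several spellings of a false boolean
  length(value) > 0 && tolower(value[[1]]) %in% c("false", "no", "off", "0")
}

ssl_verify_disabled(list(global = list(http.sslVerify = "false")))
ssl_verify_disabled(list(global = list(user.name = "Example Analyst")))
```

The first call should flag the setting and the second should not. Resetting the value would then follow the same pattern as the test code in this PR, e.g. passing `http.sslVerify = "true"` with `global = TRUE` to `git2r::config()`.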

#'
#' @examples
#' check_proxy_settings()
check_proxy_settings <- function(
Contributor

Doesn't seem to be working, I've set http.proxy in my Git config
image

But running the function even after restarting R Studio doesn't seem to pick it up? Am I doing something wrong here?
image

Contributor Author

Did you set that as a local repo level param or global? If it's global the code should pick it up, but if it's just local to the repo it won't.

github_pat <- Sys.getenv("GITHUB_PAT")
# Replace above to remove non alphanumeric characters when run on GitHub
# Actions
cat("==================================")
Contributor

Needs to be message() or have a \n for a new line at the end

Contributor Author

Just removed it entirely.

# Replace above to remove non alphanumeric characters when run on GitHub
# Actions
cat("==================================")
if (!is.na(github_pat)) {
Contributor

I think this is giving false positives: my GitHub PAT is unset but it's flagging as a fail. Should it treat "" as the equivalent of NA?

image

To illustrate it, even a made-up variable I can guarantee has never been set on my machine returns as blank quotes rather than NA

image

Contributor Author

Ah yeah, I got the same here when I tried re-running it. Not sure why I had it as NA rather than "".
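The behaviour discussed in this thread can be reproduced with a short sketch. The variable name below is made up and assumed never to be set; `nzchar()` and the `unset` argument are two possible ways of handling the empty string, not necessarily what the final code uses:

```r
# A variable name assumed never to be set on the machine
var_name <- "DEFINITELY_NOT_SET_EXAMPLE_VAR"

default_value <- Sys.getenv(var_name)            # "" by default, not NA
na_value <- Sys.getenv(var_name, unset = NA)     # NA when requested explicitly

is.na(default_value)   # FALSE, so an !is.na() test always reports the variable as set
nzchar(default_value)  # FALSE: treats "" as unset
is.na(na_value)        # TRUE
```

Either `nzchar(Sys.getenv("GITHUB_PAT"))` or passing `unset = NA` would avoid the false positive reported above.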

check_renv_download_method <- function(
renviron_file = "~/.Renviron",
clean = FALSE,
verbose = FALSE) {
Contributor

Feel like the verbose = FALSE option should only print the success message, feels a bit too verbose printing the variable as well as the message?

image

Contributor Author

Again, this is just that it's returning something and it's printing that out. I could switch the behaviour to not return anything, but given it's a diagnostic tool, I figure the individual functions should return some sort of output to feed in to the master diagnostic function for info.

@@ -0,0 +1,44 @@
test_that("Check proxy settings identifies and removes proxy setting", {
# Set a dummy config parameter for the purposes of testing
git2r::config(http.proxy.test = "this-is-a-test-entry", global = TRUE)
Contributor

I'm wondering if these tests are passing because the config isn't being set?

Could be worth adding an extra step in just to make sure the config is being set prior to being removed to give us more confidence in them?

I've just used this and it doesn't appear to have set anything when I run git config -l in a terminal?

image

Contributor Author

I've hit the issue today that my R console is referencing a different global config file to my terminal, which is pretty annoying of R-Studio:
R-Studio: ~/OneDrive - Department for Education/.gitconfig
BASH terminal: ~/.gitconfig
Weirdly I don't remember this happening previously. Feel like I may have run gitcreds sometime in December, so maybe that's got something to do with it, but maybe you've got a similar set up.
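One way to investigate the mismatch described above is to ask R where it thinks the home directory is. This is only a diagnostic sketch: whether git2r resolves its global config from this same location is an assumption, not something confirmed in this thread.

```r
# Where R expands "~" to; a terminal session may use a different home directory
r_home <- path.expand("~")
candidate <- file.path(r_home, ".gitconfig")

message("R home directory: ", r_home)
message(".gitconfig present there: ", file.exists(candidate))
```

Comparing this with the output of `git config --global --list --show-origin` in the terminal should show whether the two sessions are reading different files.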

})

test_that("Check RENV_DOWNLOAD_METHOD", {
# Check that check_proxy_settings identifies the rogue entry
Contributor

What's the rogue entry it's looking for?

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a dfe-specific diagnostics function Add a post proxy.R clean up function
2 participants