-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for parsing subdirectories out of git URLs (redux) #10
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I really like this version! It feels cleaner and more elegant.
I can think of one or two minor Rust API tweaks that we should consider before merging this. These are actually pretty simple, but apparently I need a lot of words for me to justify and explain what I'm thinking, so please don't be concerned by the length of this feedback!
Once again, thank you for tackling this. It's a great feature, and I'm delighted to have it.
src/v2/git_url.rs
Outdated
} | ||
|
||
/// Extract the repository part of the URL | ||
pub fn repository(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, I'd love to change String
to &str
. Logically we can assume that this object owns the necessary string somehow, so we can just return a slice.
src/v2/git_url.rs
Outdated
} | ||
|
||
/// Extract the optional branch part of the git URL | ||
pub fn branch(&self) -> Option<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and Option<&str>
, and so on.
src/v2/git_url.rs
Outdated
self.parse_parts().2 | ||
} | ||
|
||
fn parse_parts(&self) -> (String, Option<String>, Option<String>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(&str, Option<&str>, Option<&str>)
src/v2/git_url.rs
Outdated
( | ||
captures.at(1).unwrap().to_string(), | ||
captures.at(3).map(|branch| branch.to_string()), | ||
captures.at(5).map(|branch| branch.to_string()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're using &str
everywhere, we can delete the .to_string()
calls here and save a bunch of heap allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow yeah that's way cleaner. Not sure why I was using Strings everywhere.
src/v2/git_url.rs
Outdated
|
||
fn parse_parts(&self) -> (String, Option<String>, Option<String>) { | ||
lazy_static! { | ||
static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(#([^:]+)?(:(.+))?)?$"#).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were being purely pendantic, it might be nice to change .unwrap()
to .expect("could not parse URL_PARSE")
, but that's being a bit pendantic, and much of compose_yml
is old enough that I hadn't started doing that yet myself. :-)
Either expect
or unwrap
is fine here, because this is a true assertion violation.
Also, we might add a couple of :?
"no-capture" markers as follows:
^([^#]+)(?:#([^:]+)?(?::(.+))?)?$
I think those are in the right place. :-) This would also require changing the arguments to at
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I didn't even know about no-capture, that's awesome.
src/v2/git_url.rs
Outdated
lazy_static! { | ||
static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(#([^:]+)?(:(.+))?)?$"#).unwrap(); | ||
} | ||
let captures = URL_PARSE.captures(&self.url).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unwrap
, on the other hand, looks like a possible runtime panic, because you're matching a complex regex against a user-provided string. And unwrap
/expect
are really meant to be assertions, not error-checks—except maybe in very short "scripts".
So we need to do one of two things here:
- Provide an iron-clad argument that
captures
can never fail on any possible input, or - Redesign this class / API so that we can only construct a
GitUrl
with a valid URL.
The later might look like:
pub struct GitUrl {
url: String, // Must be private to maintain invariants.
}
impl GitUrl {
fn new<S: Into<String>>(url: S) -> Result<GitUrl> {
let url = GitUrl { url: url.into() };
// Make sure our URL can always be parsed.
url.parse_parts()?;
Ok(url)
}
}
Then later you can write:
fn parse_parts(&self) -> Result<(&str, Option<&str>, Option<&str>)> { ... }
pub fn subdirectory(&self) -> Option<&str> {
let (_, _, subdir) = self.parse_parts()
.expect("parse_parts failed on data that we already parsed once successfully");
subdir
}
The nice thing about this is that it allows us to add other validity checks to new
in the future as needed. Alternatively, we could keep the current API, but provide a comment explaining why parse_parts
can never fail on any possible input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was conscious of the unwrap
here. I'm pretty certain that it can't blow up though. As far as I can tell, the only way captures
can fail is if the url is empty, or if it starts with a hash. And either of those cases would fail the the existing should_treat_as_url
check.
Now that said, I can totally see a situation where someone might change the should_treat_as_url
regex, without realising that there's an unwrap
70 lines down that depends on it.
Conclusion: I'll go with your suggestion and make new
call parse_parts
, so it definitely can't panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looking at your proposed snippet above, you've removed the should_treat_as_url
check from new
. I feel like we should keep that check there, seeing as it ensures Docker will accept the URL. The parse_parts
regex will accept strings like "a#b:c"
, or even just "a"
, which is obviously not a URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, definitely include whatever is already there in new
. I wrote that new
example from scratch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that said, I can totally see a situation where someone might change the
should_treat_as_url
regex, without realising that there's an unwrap 70 lines down that depends on it.
Yeah, a good rule of thumb is that you shouldn't call unwrap
in a doubtful situation like this. At a minimum, there should be an expect
and a explanation of why this really is an assertion, and not a run-time error condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm really close with this, just struggling with the final piece. Right now I have this:
fn parse_parts(&self) -> Result<(&str, Option<&str>, Option<&str>)> {
lazy_static! {
static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(?:#([^:]+)?(?::(.+))?)?$"#)
.expect("Could not parse regex URL_PARSE");
}
let captures = URL_PARSE.captures(&self.url).unwrap();
Ok((
captures.at(1).unwrap(),
captures.at(2),
captures.at(3),
))
}
I've wrapped the return type in Result
, and I'm handling the Result
everywhere that calls this, but as you can see I still have the unwrap
. I tried to replace unwrap()
withok_or(())?
to convert the Option
into a Result
but couldn't get the types to work. Is it because the Result
here is an error-chain thing, not a regular Result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you need to construct our local Error type here, which you can actually do with a string and .into(). Use ok_or_else(|| -> Error { format!(...).into() })
and it will construct an error for you.
src/v2/git_url.rs
Outdated
assert_eq!(with_ref.branch(), Some("mybranch".to_string())); | ||
assert_eq!(with_ref.subdirectory(), None); | ||
|
||
let with_subdir = GitUrl::new(format!("{}{}", url, "#:myfolder")).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, but let's also add a test case for format!("{}{}", url, ":myfolder"))
, specifying how that should be parsed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the Docker docs, I don't think that's a valid URL? Also, the parsing as it's currently written relies on the hash as a nice, easy-to-spot delimeter. A lone colon is harder to split on, because it could be for a port number. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry! I should have been more clear. I want to make sure format!("{}{}", url, ":myfolder"))
isn't parsed as a subdir. It's a suspicious input, where the implementation might plausibly fail, and I want to make sure that we interpret it correctly.
Does this make sense?
src/v2/context.in.rs
Outdated
|
||
/// Returns a new Context which is the same as the | ||
/// this one, but without any subdirectory part | ||
pub fn without_subdirectory(&self) -> Context { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an amazingly good choice of API. Thank you for figuring this out. Maybe it might be clearer if we named it without_repository_subdirectory
?
- Also clean up some String allocations and tweak some naming
Ok I think I've addressed now except for your last point on testing, which I replied to above. Let me know what you think 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is looking really polished.
Two very minor nitpicks, and then I'll merge it. :-)
src/v2/git_url.rs
Outdated
@@ -42,7 +42,9 @@ impl GitUrl { | |||
pub fn new<S: Into<String>>(url: S) -> Result<GitUrl> { | |||
let url = url.into(); | |||
if GitUrl::should_treat_as_url(&url) { | |||
Ok(GitUrl { url: url }) | |||
let git_url = GitUrl { url }; | |||
git_url.parse_parts()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment explaining the invariant we're trying to enforce (otherwise this might be a bit mysterious):
// Invariant: `parse_parts` should succeed for all `GitUrl` values after construction.
src/v2/git_url.rs
Outdated
static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(?:#([^:]+)?(?::(.+))?)?$"#) | ||
.expect("Could not parse regex URL_PARSE"); | ||
} | ||
let captures = URL_PARSE.captures(&self.url).ok_or_else(|| -> Error { "".into() })?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make that error: format!("could not parse URL {:?}", self.url).into()
.
I'm on the edge of my seat over here! 😄 |
It's great, and it's merged! This should be released as 0.0.50 in a couple of minutes. |
This is a cleaned up version of #9. See that PR for more context.