Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a manual parser for rust-timer queue #1994

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions site/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -57,3 +57,6 @@ jemalloc-ctl = "0.5"
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
toml = "0.7"

[dev-dependencies]
insta = "1.40.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want/need insta? Managing the snapshots files can be cumbersome, and it's yet another tool to use and learn. It's fine if we want to make good use of it in the future, and not a fancy assert_eq!(format!("{obj:?}"), "Struct { field: 'bla'} ").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm refactoring this, I want to do it properly. Note that I'm not using snapshot files, I'm using the inline snapshots, which makes this much easier to grok, I think.

I'm already anticipating future changes (like your PR to add the backends) parameter. It's no fun updating tens of tests anytime you change the structure of the thing that you parse. Snapshot testing makes this much easier.

I will document in readme what to do to update the snapshots though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insta adds snapshot files locally when a test fails, the pending snap file. I saw that when adding the test I shared. It didn’t remove it when the test passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, these things. Right, I guess we can add them to .gitignore, but they should be removed after you use cargo insta review I think (the snapshots shouldn't really be modified manually).

172 changes: 154 additions & 18 deletions site/src/request_handlers/github.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,15 +5,13 @@ use crate::github::{
};
use crate::load::SiteCtxt;

use std::sync::Arc;

use hashbrown::HashMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why hashbrown and not std?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real reason, it was autoimported. But I don't see any disadvantage in using hashbrown, since we already depend on it, it doesn't really matter. We don't need DOS protection here xD

use regex::Regex;
use std::sync::Arc;

lazy_static::lazy_static! {
static ref BODY_TIMER_BUILD: Regex =
Regex::new(r"(?:\W|^)@rust-timer\s+build\s+(\w+)(?:\W|$)(?:include=(\S+))?\s*(?:exclude=(\S+))?\s*(?:runs=(\d+))?").unwrap();
static ref BODY_TIMER_QUEUE: Regex =
Regex::new(r"(?:\W|^)@rust-timer\s+queue(?:\W|$)(?:include=(\S+))?\s*(?:exclude=(\S+))?\s*(?:runs=(\d+))?").unwrap();
}

pub async fn handle_github(
Expand Down Expand Up @@ -118,26 +116,25 @@ async fn handle_rust_timer(
return Ok(github::Response);
}

if let Some(captures) = BODY_TIMER_QUEUE.captures(&comment.body) {
let include = captures.get(1).map(|v| v.as_str());
let exclude = captures.get(2).map(|v| v.as_str());
let runs = captures.get(3).and_then(|v| v.as_str().parse::<i32>().ok());
{
let conn = ctxt.conn().await;
conn.queue_pr(issue.number, include, exclude, runs).await;
}
main_client
.post_comment(
issue.number,
if let Some(queue) = parse_queue_command(&comment.body) {
let msg = match queue {
Ok(cmd) => {
let conn = ctxt.conn().await;
conn.queue_pr(issue.number, cmd.include, cmd.exclude, cmd.runs)
.await;
format!(
"Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
{COMMENT_MARK_TEMPORARY}"
),
)
.await;
)
}
Err(error) => {
format!("Error occurred while parsing comment: {error}")
}
};
main_client.post_comment(issue.number, msg).await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to take special care (like sanitization) in posting the error message to GH as its text will be partially controlled by users' input commands?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting idea. I don't think so, it should have the same rules as when you create the comment manually. I guess that the user could make the bot print "Unknown command argument SEND MONEY TO THE FOLLOWING BITCOIN ADDRESS TO SPONSOR THE RUST FOUNDATION", but I don't think that they can XSS or something like that (that would be really bad hole in GH's security).

return Ok(github::Response);
}

Expand All @@ -163,6 +160,68 @@ async fn handle_rust_timer(
Ok(github::Response)
}

/// Parses the first occurrence of a `@rust-timer queue <...>` command
/// in the input string.
fn parse_queue_command(body: &str) -> Option<Result<QueueCommand, String>> {
let prefix = "@rust-timer";
let bot_line = body.lines().find_map(|line| {
let line = line.trim();
Kobzol marked this conversation as resolved.
Show resolved Hide resolved
line.find(prefix)
.map(|index| line[index + prefix.len()..].trim())
})?;

let args = bot_line.strip_prefix("queue").map(|l| l.trim())?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GH never reformats comments, right? That is, it will not introduce unexpected \ns that would separate the command from some of its arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhh, that would be quite surprising I think. GH actually has a bunch of formats of the comments, like text, HTML, MD etc., but I think (hope) that here we work with the raw text. We were already requiring = to be right after include etc. before, so I think that this should be fine.

let mut args = match parse_command_arguments(args) {
Ok(args) => args,
Err(error) => return Some(Err(error)),
};
let mut cmd = QueueCommand {
include: args.remove("include"),
exclude: args.remove("exclude"),
runs: None,
};
if let Some(runs) = args.remove("runs") {
let Ok(runs) = runs.parse::<u32>() else {
return Some(Err(format!("Cannot parse runs {runs} as a number")));
};
cmd.runs = Some(runs as i32);
}

if let Some((key, _)) = args.into_iter().next() {
return Some(Err(format!("Unknown command argument {key}")));
}
Kobzol marked this conversation as resolved.
Show resolved Hide resolved

Some(Ok(cmd))
}

/// Parses command arguments from a single line of text.
/// Expects that arguments are separated by whitespace, and each argument
/// has the format `<key>=<value>`.
fn parse_command_arguments(args: &str) -> Result<HashMap<&str, &str>, String> {
let mut argmap = HashMap::new();
for arg in args.split_whitespace() {
let Some((key, value)) = arg.split_once('=') else {
return Err(format!(
"Invalid command argument `{arg}` (there may be no spaces around the `=` character)"
));
};
let key = key.trim();
let value = value.trim();
if argmap.insert(key, value).is_some() {
return Err(format!("Duplicate command argument `{key}`"));
}
}

Ok(argmap)
}

#[derive(Debug)]
struct QueueCommand<'a> {
include: Option<&'a str>,
exclude: Option<&'a str>,
runs: Option<i32>,
}

/// Run the `@rust-timer build` regex over the comment message extracting the commit and the other captures
fn build_captures(comment_body: &str) -> impl Iterator<Item = (&str, regex::Captures)> {
BODY_TIMER_BUILD
Expand Down Expand Up @@ -196,6 +255,7 @@ pub async fn get_authorized_users() -> Result<Vec<u64>, String> {
#[cfg(test)]
mod tests {
use super::*;

#[test]
fn captures_all_shas() {
let comment_body = r#"
Expand All @@ -215,4 +275,80 @@ Going to do perf runs for a few of these:
]
);
}

#[test]
fn command_missing() {
assert!(parse_queue_command("").is_none());
}

#[test]
fn unknown_command() {
assert!(parse_queue_command("@rust-timer foo").is_none());
}

#[test]
fn queue_command() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue"),
@"Some(Ok(QueueCommand { include: None, exclude: None, runs: None }))");
}

#[test]
fn queue_command_unknown_arg() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue foo=bar"),
@r###"Some(Err("Unknown command argument foo"))"###);
}

#[test]
fn queue_command_duplicate_arg() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue include=a exclude=c include=b"),
@r###"Some(Err("Duplicate command argument `include`"))"###);
}

#[test]
fn queue_command_include() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue include=abcd,feih"),
@r###"Some(Ok(QueueCommand { include: Some("abcd,feih"), exclude: None, runs: None }))"###);
}

#[test]
fn queue_command_exclude() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue exclude=foo134,barzbaz41baf"),
@r###"Some(Ok(QueueCommand { include: None, exclude: Some("foo134,barzbaz41baf"), runs: None }))"###);
}

#[test]
fn queue_command_runs() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue runs=5"),
@"Some(Ok(QueueCommand { include: None, exclude: None, runs: Some(5) }))");
}

#[test]
fn queue_command_runs_nan() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue runs=xxx"),
@r###"Some(Err("Cannot parse runs xxx as a number"))"###);
}

#[test]
fn queue_command_combination() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue include=acda,13asd exclude=c13,DA runs=5"),
@r###"Some(Ok(QueueCommand { include: Some("acda,13asd"), exclude: Some("c13,DA"), runs: Some(5) }))"###);
}

#[test]
fn queue_command_argument_spaces() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue include = abcd,das"),
@r###"Some(Err("Invalid command argument `include` (there may be no spaces around the `=` character)"))"###);
}

#[test]
fn queue_command_spaces() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@rust-timer queue include=abcd,das "),
@r###"Some(Ok(QueueCommand { include: Some("abcd,das"), exclude: None, runs: None }))"###);
}

#[test]
fn queue_command_with_bors() {
insta::assert_compact_debug_snapshot!(parse_queue_command("@bors try @rust-timer queue include=foo,bar"),
@r###"Some(Ok(QueueCommand { include: Some("foo,bar"), exclude: None, runs: None }))"###);
}
}
Loading