Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(codecs): new syslog codec #17668

Closed
wants to merge 1 commit into from

Conversation

syedriko
Copy link
Contributor

feat(syslog codec) POC of a syslog codec
This is a naive syslog codec implementation, geared towards the needs of the OpenShift logging stack

@netlify
Copy link

netlify bot commented Jun 12, 2023

Deploy Preview for vrl-playground ready!

Name Link
🔨 Latest commit 38dfb93
🔍 Latest deploy log https://app.netlify.com/sites/vrl-playground/deploys/64876b4ed83dc30009ce6c26
😎 Deploy Preview https://deploy-preview-17668--vrl-playground.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@netlify
Copy link

netlify bot commented Jun 12, 2023

Deploy Preview for vector-project ready!

Name Link
🔨 Latest commit 38dfb93
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/64876b4e4231e70008b30526
😎 Deploy Preview https://deploy-preview-17668--vector-project.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@github-actions github-actions bot added the domain: codecs Anything related to Vector's codecs (encoding/decoding) label Jun 12, 2023
@syedriko syedriko mentioned this pull request Jun 12, 2023
@syedriko syedriko changed the title WIP, do not merge: POC of a syslog codec syslog codec Nov 12, 2023
@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Nov 12, 2023
@syedriko syedriko marked this pull request as ready for review November 12, 2023 19:21
@syedriko syedriko requested a review from a team November 12, 2023 19:21
Copy link

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by review if helpful.

It's very dense, so main takeaways are:

  • Refactor (facility + severity logic): Consider making these "string to number" match blocks DRY?
  • Refactor (simplify methods with less nesting): I've additionally provided suggestions to simply encode(), get_field_or_config() + get_field() and get_timestamp().
    • encode() could be improved, but the approach is better for constructing the string vs push() / push_str() (especially when no capacity was set in advance). PRI value construction might be better hidden away as a method on your SyslogRFC enum?
  • Questions: A few queries regarding SyslogSerializerConfig struct:
    • Should it really inline the syslog data fields (specific to composing a syslog message) with the config settings of the serializer? I might have misunderstood the intent, as I am not familiar with Vector codebase.
    • Perhaps replace the #[serde(default)] attribute annotations with preference to a single annotation on the whole struct + additional #[derive(Default)]?
    • The three struct fields you customized defaults for could be better served as Option<String>?
      • The default methods could still apply when None, which (with the exception of assigning an actual fallback with app_name) would apply the NIL_VALUE fallback during encode() via your get_field_or_config() method?
      • Additionally these RFC 5424 fields roughly replace RFC 3164 tag field (which you could construct from these fields). Does it make sense to keep tag in this struct? How will it be used? (the doc comments don't highlight that it is only valid with RFC 3164)
      • NIL_VALUE variable might benefit with some added context for maintainers?

Comment on lines +142 to +192
fn encode(&mut self, event: Event, buffer: &mut BytesMut) -> Result<(), Self::Error> {
match event {
Event::Log(log) => {
let mut buf = String::from("<");
let pri = get_num_facility(&self.config.facility, &log) * 8 + get_num_severity(&self.config.severity, &log);
buf.push_str(&pri.to_string());
buf.push_str(">");
match self.config.rfc {
SyslogRFC::Rfc3164 => {
let timestamp = get_timestamp(&log);
let formatted_timestamp = format!(" {} ", timestamp.format("%b %e %H:%M:%S"));
buf.push_str(&formatted_timestamp);
buf.push_str(&get_field("hostname", &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&self.config.tag, &log));
buf.push_str(": ");
if self.config.add_log_source {
add_log_source(&log, &mut buf);
}
},
SyslogRFC::Rfc5424 => {
buf.push_str("1 ");
let timestamp = get_timestamp(&log);
buf.push_str(&timestamp.to_rfc3339_opts(SecondsFormat::Millis, true));
buf.push(' ');
buf.push_str(&get_field("hostname", &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&&self.config.app_name, &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&&self.config.proc_id, &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&&self.config.msg_id, &log));
buf.push_str(" - "); // no structured data
if self.config.add_log_source {
add_log_source(&log, &mut buf);
}
}
}
let mut payload = if self.config.payload_key.is_empty() {
serde_json::to_vec(&log).unwrap_or_default()
} else {
get_field(&&self.config.payload_key, &log).as_bytes().to_vec()
};
let mut vec = buf.as_bytes().to_vec();
vec.append(&mut payload);
buffer.put_slice(&vec);
},
_ => {}
}
Ok(())
}
Copy link

@polarathene polarathene Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn encode(&mut self, event: Event, buffer: &mut BytesMut) -> Result<(), Self::Error> {
match event {
Event::Log(log) => {
let mut buf = String::from("<");
let pri = get_num_facility(&self.config.facility, &log) * 8 + get_num_severity(&self.config.severity, &log);
buf.push_str(&pri.to_string());
buf.push_str(">");
match self.config.rfc {
SyslogRFC::Rfc3164 => {
let timestamp = get_timestamp(&log);
let formatted_timestamp = format!(" {} ", timestamp.format("%b %e %H:%M:%S"));
buf.push_str(&formatted_timestamp);
buf.push_str(&get_field("hostname", &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&self.config.tag, &log));
buf.push_str(": ");
if self.config.add_log_source {
add_log_source(&log, &mut buf);
}
},
SyslogRFC::Rfc5424 => {
buf.push_str("1 ");
let timestamp = get_timestamp(&log);
buf.push_str(&timestamp.to_rfc3339_opts(SecondsFormat::Millis, true));
buf.push(' ');
buf.push_str(&get_field("hostname", &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&&self.config.app_name, &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&&self.config.proc_id, &log));
buf.push(' ');
buf.push_str(&get_field_or_config(&&self.config.msg_id, &log));
buf.push_str(" - "); // no structured data
if self.config.add_log_source {
add_log_source(&log, &mut buf);
}
}
}
let mut payload = if self.config.payload_key.is_empty() {
serde_json::to_vec(&log).unwrap_or_default()
} else {
get_field(&&self.config.payload_key, &log).as_bytes().to_vec()
};
let mut vec = buf.as_bytes().to_vec();
vec.append(&mut payload);
buffer.put_slice(&vec);
},
_ => {}
}
Ok(())
}
fn encode(&mut self, event: Event, buffer: &mut BytesMut) -> Result<(), Self::Error> {
if let Event::Log(log) = event {
let pri_num = get_num_facility(&self.config.facility, &log) * 8 + get_num_severity(&self.config.severity, &log);
let pri = ["<", &pri_num.to_string(), ">"].concat();
let timestamp = get_timestamp(&log);
let hostname = get_field("hostname", &log);
let mut syslog_packet = match self.config.rfc {
SyslogRFC::Rfc3164 => {
let header = [
&timestamp.format("%b %e %H:%M:%S"),
&hostname,
].join(" ");
let msg = [
&get_field_or_config(&self.config.tag, &log), // `TAG` field
":", // Start of `CONTENT` field
// `payload` later appends here the remaining data for the `CONTENT` field
].concat();
[pri, header, " ", msg].concat()
},
SyslogRFC::Rfc5424 => {
// Equivalent to `TAG` field from RFC 3164:
let tag = [
&get_field_or_config(&self.config.app_name, &log),
&get_field_or_config(&self.config.proc_id, &log),
&get_field_or_config(&self.config.msg_id, &log),
].join(" ");
let header = [
"1", // `VERSION` field
&timestamp.to_rfc3339_opts(SecondsFormat::Millis, true),
&hostname,
// RFC 5424 moves equivalent data of the RFC 3164 `TAG` field
// from `MSG` part to `HEADER` part:
&tag,
// "structured data" support not implemented:
NIL_VALUE,
].join(" ");
[pri, header].concat()
}
};
// Add final delimiter before `payload` is appended:
syslog_packet.push(' ');
// Insert third-party message content prior to payload?:
if self.config.add_log_source {
add_log_source(&log, &mut syslog_packet);
}
let mut payload = if self.config.payload_key.is_empty() {
serde_json::to_vec(&log).unwrap_or_default()
} else {
get_field(&self.config.payload_key, &log).as_bytes().to_vec()
};
let mut vec = syslog_packet.as_bytes().to_vec();
vec.append(&mut payload);
buffer.put_slice(&vec);
}
Ok(())
}

Change overview

  • Matching on event is not necessary, use if let on expected enum variant instead. Removes redundant logic and indentation.
  • RFC 3164 begins the TIMESTAMP header field directly after the PRI part/field, should not have a delimiter between values (reference).
  • Common fields adjusted to be referenced as variables prior to RFC match blocks.
  • Use [ ... ].join(" ") / [ ... ].concat() methods to construct the string(s). Easier to grok by minimizing the delimiter + .push_str() / push() noise, additionally more performant AFAIK as allocated capacity for String value should be more accurate (could be improved still due to updates after match).
  • For clarity, at least during review process for those that are less familiar with the structure (and perhaps benefit of future maintenance) I've better communicated the separate packet parts pri, header, msg and added some comments for review context (one is a question for add_log_source, as that seems dependent on deployment context, rather than generic syslog support?).

I am not familiar with Vector internals, but the logic interacting with the log input seems like there might be a better approach? (especially with the payload handling converting to a byte vec to append the payload?) I could be mistaken 😅


Additional context

While I think the suggestion above is a step in the right direction, some of the logic here may be better extracted to one of the existing data structures (config / RFC) as methods to generate the pri + header + msg parts of the syslog packet value?

  • At least that logic seems less useful to encode() for grokking a breakdown of the encode process.
  • RFC 5424 fields could represent a conversion to RFC 3164 format, rather than interleaving the fields into the same config struct?

RFC 5424 provides a comparison to BSD syslog (RFC 3164) (syslog-ng docs provide a more comprehensive overview of the two RFC formats that is nicer to digest):

Collapsed for brevity (click to expand)
  • There is advice for interop with TIMESTAMP field (user local timezone + drop timezone and year information), and the inverse if ingesting RFC 3164 syslog to relay/collect.
  • RFC 3164 documents relay behaviours that RFC 5424 doesn't focus on (potentially relevant for syslog source + sink?)
  • HOSTNAME is compatible.
  • PRI value syntax and semantics are retained for compatibility.
    • NOTE: While the PRI part is defined formally as a field of the HEADER part (unlike in RFC 3164), the header now starts with the VERSION field which like the first field of the HEADER part from RFC 3164 (TIMESTAMP) begins immediately after PRI without a space delimiter. TIMESTAMP follows after VERSION + space delimiter.
  • The MSG part was previously split into TAG and CONTENT fields.
    • Now the MSG part only represents the CONTENT field while the TAG field has been relocated into the HEADER part and split into 3 separate fields (APP-NAME, PROCID, MSGID).
    • APP-NAME would be equivalent to TAG (usually a process name), while PROCID would have been the optional [value] (usually PID) that would technically have been part of the CONTENT field (along with : that'd follow before the actual message payload). PROCID is no longer wrapped with [ ], nor is : used.
    • MSGID provides additional context, but all three of these fields are optional and can be opt-out via - (aka NILVALUE).
  • The new Structured Data field (_opt-out via - / NILVALUE_) is said to be treated as part of CONTENT` if formatting to RFC 3164. Some resources note that this field isn't common to encounter in actual usage?
  • RFC 3164 implementations lack consistency.
    • Wikipedia notes RFC 3164 (2001) was an effort to document the status quo at the time, while RFC 5424 (2009) formally standardized the format.
    • Feb 2021 blogpost conveys that since RFC 5424 was published, it is now the more commonly encountered format (however some popular software like HAProxy is known to default to RFC3164 by default), Rsyslog follows RFC 5424.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference RFC 5424 syslog message is constructed elsewhere with the more convenient (but less performant) format! macro:

vector/src/sources/syslog.rs

Lines 1372 to 1388 in c7ae2a6

impl fmt::Display for SyslogMessageRfc5424 {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(
f,
"<{}>{} {} {} {} {} {} {} {}",
encode_priority(self.severity, self.facility),
self.version,
self.timestamp,
self.host,
self.appname,
self.procid,
self.msgid,
format_structured_data_rfc5424(&self.structured_data),
self.message
)
}
}

Which shows one of the benefits of extracting that logic out of this encode() method, performance/efficiency wise you may want to avoid unnecessary allocations (creating the String value via .join()/.concat() should have similar benefits to String::with_capacity()). You may want to wait for the Vector devs to weigh in on this.

References:

Comment on lines +81 to +83
/// Tag
#[serde(default)]
tag: String,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 3164 only. Can technically construct this during encode method from the RFC 5424 equivalent fields.

Is it useful to keep in this struct?

Comment on lines +88 to +90
/// Payload key
#[serde(default)]
payload_key: String,
Copy link

@polarathene polarathene Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the doc comment mention the syslog equivalent representation that this lookup key will provide?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this option. This should always be the field with the meaning message.

Comment on lines +63 to +66
/// Config used to build a `SyslogSerializer`.
#[configurable_component]
#[derive(Debug, Clone, Default)]
pub struct SyslogSerializerConfig {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You apply the tag to each individual field, would it be better to just apply it once here and provide an impl Default or derive(Default) instead?

Suggested change
/// Config used to build a `SyslogSerializer`.
#[configurable_component]
#[derive(Debug, Clone, Default)]
pub struct SyslogSerializerConfig {
/// Config used to build a `SyslogSerializer`.
#[configurable_component]
#[derive(Debug, Default, Clone, Default)]
#[serde(default)]
pub struct SyslogSerializerConfig {

Comment on lines +19 to +20
/// RFC 5424
Rfc5424

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to highlight that RFC 5424 obsoletes RFC 3164?:

This document obsoletes RFC 3164, which is an Informational document
describing some implementations found in the field.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum could also technically align with the equivalent Protocol enum (used in decoding/format/syslog.rs) from the syslog_loose crate which is already added to Vector as a dependency?

There is no default implemented by that crates equivalent enum though. As Vector is only using it for parsing syslog messages, perhaps there isn't much value in re-use here 🤷‍♂️

Comment on lines +320 to +375
fn get_num_facility(config_facility: &Facility, log: &LogEvent) -> u8 {
match config_facility {
Facility::Fixed(num) => return *num,
Facility::Field(field_name) => {
if let Some(field_value) = log.get(field_name.as_str()) {
let field_value_string = String::from_utf8(field_value.coerce_to_bytes().to_vec()).unwrap_or_default();
let num_value = field_value_string.parse::<u8>();
match num_value {
Ok(num) => {
if num > 23 {
return 1 // USER
} else {
return num
}
}
Err(_) => {
let num = match field_value_string.to_uppercase().as_str() {
"KERN" => 0,
"USER" => 1,
"MAIL" => 2,
"DAEMON" => 3,
"AUTH" => 4,
"SYSLOG" => 5,
"LPR" => 6,
"NEWS" => 7,
"UUCP" => 8,
"CRON" => 9,
"AUTHPRIV" => 10,
"FTP" => 11,
"NTP" => 12,
"SECURITY" => 13,
"CONSOLE" => 14,
"SOLARIS-CRON" => 15,
"LOCAL0" => 16,
"LOCAL1" => 17,
"LOCAL2" => 18,
"LOCAL3" => 19,
"LOCAL4" => 20,
"LOCAL5" => 21,
"LOCAL6" => 22,
"LOCAL7" => 23,
_ => 24,
};
if num > 23 {
return 1 // USER
} else {
return num
}
}
}
} else {
return 1 // USER
}
}
}
}
Copy link

@polarathene polarathene Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the strum / enum approach for Facility and Severity, these get_num_facility() / get_num_severity() methods may be redundant and ca be dropped?

These two methods are only used within encode() for composing PRIVAL, which is as simple as either:

  • (facility as u8 << 3) | severity as u8
  • (facility as u8 * 8) + severity as u8

Since Facility would be set already by deserialize_facility(), I'm not sure how important all this logic here is?

My main question is with handling of log field with log.get():

if let Some(field_value) = log.get(field_name.as_str()) {
  let field_value_string = String::from_utf8(field_value.coerce_to_bytes().to_vec()).unwrap_or_default();
  // ...

You have conditions that return a fallback (Facility::User / 1)

  • One condition implies the value could exceed the bound of 23 (either as number parsed from string, or field name not matching a valid facility name).
  • If log.get() returns NONE to default to Facility::User.

Since it appears ok for the number variant to return a value directly (and thus has been set via deserialize_facility()?), I'm assuming that these two lines extract the same string value that deserialize_facility() would have accomplished (deserializing into value)?

  • I may have misunderstood your usage, and this Field enum variant may not map to the equivalent facility?
    • You store it to lookup the value via log.get() above here? (that would be either a number or syslog facility name?)
    • Does it provide some other functionality? (dynamic?)
    • Are my suggestions thus not compatible?

This method otherwise doesn't appear needed, since it'd effectively become:

Suggested change
fn get_num_facility(config_facility: &Facility, log: &LogEvent) -> u8 {
match config_facility {
Facility::Fixed(num) => return *num,
Facility::Field(field_name) => {
if let Some(field_value) = log.get(field_name.as_str()) {
let field_value_string = String::from_utf8(field_value.coerce_to_bytes().to_vec()).unwrap_or_default();
let num_value = field_value_string.parse::<u8>();
match num_value {
Ok(num) => {
if num > 23 {
return 1 // USER
} else {
return num
}
}
Err(_) => {
let num = match field_value_string.to_uppercase().as_str() {
"KERN" => 0,
"USER" => 1,
"MAIL" => 2,
"DAEMON" => 3,
"AUTH" => 4,
"SYSLOG" => 5,
"LPR" => 6,
"NEWS" => 7,
"UUCP" => 8,
"CRON" => 9,
"AUTHPRIV" => 10,
"FTP" => 11,
"NTP" => 12,
"SECURITY" => 13,
"CONSOLE" => 14,
"SOLARIS-CRON" => 15,
"LOCAL0" => 16,
"LOCAL1" => 17,
"LOCAL2" => 18,
"LOCAL3" => 19,
"LOCAL4" => 20,
"LOCAL5" => 21,
"LOCAL6" => 22,
"LOCAL7" => 23,
_ => 24,
};
if num > 23 {
return 1 // USER
} else {
return num
}
}
}
} else {
return 1 // USER
}
}
}
}
fn get_num_facility(config_facility: &Facility, log: &LogEvent) -> u8 {
config_facility as u8
}

Comment on lines +377 to +416
fn get_num_severity(config_severity: &Severity, log: &LogEvent) -> u8 {
match config_severity {
Severity::Fixed(num) => return *num,
Severity::Field(field_name) => {
if let Some(field_value) = log.get(field_name.as_str()) {
let field_value_string = String::from_utf8(field_value.coerce_to_bytes().to_vec()).unwrap_or_default();
let num_value = field_value_string.parse::<u8>();
match num_value {
Ok(num) => {
if num > 7 {
return 6 // INFORMATIONAL
} else {
return num
}
}
Err(_) => {
let num = match field_value_string.to_uppercase().as_str() {
"EMERGENCY" => 0,
"ALERT" => 1,
"CRITICAL" => 2,
"ERROR" => 3,
"WARNING" => 4,
"NOTICE" => 5,
"INFORMATIONAL" => 6,
"DEBUG" => 7,
_ => 8,
};
if num > 7 {
return 6 // INFORMATIONAL
} else {
return num
}
}
}
} else {
return 6 // INFORMATIONAL
}
}
}
}
Copy link

@polarathene polarathene Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feedback as raised for get_num_facility().

If the proposed enum is sufficient, this can be discarded to get the desired u8 value for the enum variant.

Comment on lines +435 to +444
match log.get("@timestamp") {
Some(value) => {
if let Value::Timestamp(timestamp) = value {
DateTime::<Local>::from(*timestamp)
} else {
Local::now()
}
},
_ => Local::now()
}
Copy link

@polarathene polarathene Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this is a bit simpler with map_or_else():

Suggested change
match log.get("@timestamp") {
Some(value) => {
if let Value::Timestamp(timestamp) = value {
DateTime::<Local>::from(*timestamp)
} else {
Local::now()
}
},
_ => Local::now()
}
log.get("@timestamp").map_or_else(
|| Local::now(),
|timestamp| DateTime::<Local>::from(*timestamp)
)

The two function args to map_or_else():

  1. Handles the fallback (else), so you only need to cover that once.
  2. Operates on the Some() value (map), so we can handle your conversion here without the if let and match nesting 👍

For reference to reviewers:

Comment on lines +418 to +432
fn get_field_or_config(config_name: &String, log: &LogEvent) -> String {
if let Some(field_name) = config_name.strip_prefix("$.message.") {
return get_field(field_name, log)
} else {
return config_name.clone()
}
}

fn get_field(field_name: &str, log: &LogEvent) -> String {
if let Some(field_value) = log.get(field_name) {
return String::from_utf8(field_value.coerce_to_bytes().to_vec()).unwrap_or_default();
} else {
return NIL_VALUE.to_string()
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these two methods you shouldn't need an explicit return keyword AFAIK? (omit the ; for unwrap_or_default())

I am not familiar with Vector codebase, so if using return keyword explicitly is their preferred convention, ignore me :)


You could alternatively adopt the unwrap_or_else(), since the logic here is quite simple (providing a fallback value if None).

Comment on lines +92 to +94
/// Add log source
#[serde(default)]
add_log_source: bool,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few fields in this struct unrelated to syslog packet/message format, would it be better to extract out the syslog format fields into their own sub struct? Or is that an issue with how Vector (or it's config UX) works?

You can reference existing syslog support in Vector, a similar struct is already defined (may want to align the struct field names to match?):

vector/src/sources/syslog.rs

Lines 1326 to 1340 in c7ae2a6

#[derive(Deserialize, PartialEq, Clone, Debug)]
struct SyslogMessageRfc5424 {
msgid: String,
severity: Severity,
facility: Facility,
version: u8,
timestamp: String,
host: String,
source_type: String,
appname: String,
procid: usize,
message: String,
#[serde(flatten)]
structured_data: StructuredData,
}

@bruceg bruceg requested a review from a team November 14, 2023 00:31
@syedriko
Copy link
Contributor Author

@polarathene Thanks for a detailed review, I've learned a few things. There's some history of this PR here: #6863. This code was supposed to mimic the fluentd syslog plugin, with some extensions. The vector configuration is not directly exposed to the user but auto-generated by a k8s operator. (thus RFC-specific fields in configuration, like Tag). The idea behind

enum Severity {
    /// Syslog severity ordinal number
    Fixed(u8),

    /// Syslog severity name
    Field(String)
}

(if we take severity as an example) is to be able to set the severity value in configuration as a number, within the correct limits, or a string, which in turn can be the name of a severity level, like "ALERT" or a log event field reference, if it is of the $.message.take_sev_from_this_field form. The $.message syntax is from the fluentd implementation.
There is a custom extension around the "add_log_source" logic that is k8s-specific.
I'm hoping to start with this code and arrive at a generic implementation that is good for the Vector upstream but can be customized/specialized to do what this code does now.

@bruceg bruceg changed the title syslog codec feat(codecs): new syslog codec Nov 15, 2023
@gaby
Copy link

gaby commented Dec 5, 2023

@syedriko Any progress on this?

@polarathene
Copy link

polarathene commented Dec 5, 2023

Sorry, I didn't get around to putting this response through earlier 😓

This was a good chunk of it below, with a little bit more and some pseudo-code that I've excluded.

I have refactored this PR with a local build based on my original feedback. It'd need a little bit more work but if you'd like I could either open a new PR with that or do a 2nd review with the change suggestions in bulk?

I have kept the third-party additions, so it retains that functionality, although I don't think it should have either the add_log_source() or $.message. prefix logic since that seems more relevant to something like VRL with remap handling? (the k8s log source could also be adapted to RFC5242 structured data)


@polarathene Thanks for a detailed review, I've learned a few things.

You're welcome! Apologies for the verbosity 😅

  • The top-level review comment should convey anything relevant to address (with links to inline review comments providing more context).
  • I had to read up on the syslog RFCs to get a better understanding of the PR, I've referenced those documents a fair bit with links (might benefit some other reviewers 🤷‍♂️ ).
  • Unfortunately I lack experience with Vector and the codebase, so I can't provide much input on those specifics.

I'm hoping to start with this code and arrive at a generic implementation that is good for the Vector upstream but can be customized/specialized to do what this code does now.

Sounds great! ❤️

I'm lacking time to build it locally and configure for a project that I'm interested in bringing Vector into. Hopefully I can allocate some time this month or next to give it a try and ensure the syslog codec support integrates well 👍


Additional feedback

(if we take severity as an example) is to be able to set the severity value in configuration as:

  • a number, within the correct limits,
  • or a string, which in turn can be the name of a severity level, like "ALERT"
  • or a log event field reference, if it is of the $.message.take_sev_from_this_field form. The $.message syntax is from the fluentd implementation.
  • There is a custom extension around the add_log_source logic that is k8s-specific.

That functionality is retained (and simpler?) with the enum approach suggested:

// Must be within the ordinal bounds, otherwise `None` => `Err(...)`:
let s = Severity::from_repr( num ).ok_or_else(|| /* Err */ )

// String must match a valid enum variant, otherwise `Err(...)`:
let s = Severity::from_str( str ).map_err( /* Err */ )

// Regardless of how you created `s`, easily get the desired output:
println!("{}", s as u8);
// strum implements Display (so you could also use `.to_string()`):
println!("{s}");

Making the custom deserialize methods quite a bit simpler to grok, and the get_num_*() method for each redundant? Composing the PRIVAL is still simple:

// NOTE: This would be a bit nicer out of `encode()`, instead via a struct method?
let prival = (facility as u8 * 8) + severity as u8;
let pri = ["<", &prival.to_string(), ">"].concat();

I'm unable to provide much feedback for the latter two points.

  • Are they not compatible with the suggested enum approach?
  • The variant from custom key lookup would be a separate responsibility?

@StephenWakely
Copy link
Contributor

Hi. Thanks for doing this, this could be very useful to incorporate.

Is there a compelling reason to support RFC3164? Given how loosely specified 3164 is, I feel this would just add complexity that we don't really need. If we can I would prefer to drop this.

It would be really useful to be able to create structured data. This could be done by adding a map of template fields to the configuration. Something similar to the way the loki sink handles labels.

The ID field could just be a string field in the config. So the config for this would look something like:

structured_data:
  id: data@12312
  fields:
    thing: "{{ thing }}"
    thang: "{{ thang }}"

The spec allows for multiple structured data sections - but I think we should be able to just allow a single section to be specified, at least for the initial version.

Would it be possible to add this?

@polarathene
Copy link

Is there a compelling reason to support RFC3164? Given how loosely specified 3164 is, I feel this would just add complexity that we don't really need. If we can I would prefer to drop this.

Some software still seems to emit logs in that format, so it's preferred if Vector is to ingest the logs but the user still wants to maintain compatibility with the current RFC 3164 logs.

Which happens to be my case at a glance. I want to bring Vector into a project with a wide user base, but I don't want that to be too disruptive if possible. Supporting RFC 3164 doesn't seem like it requires that much more logic 🤷‍♂️

It would be really useful to be able to create structured data.

My refactor has this functionality implemented.

Would it be possible to add this?

My approach just relied on remap to provide a structured_data object. I'm not familiar with Vector config implementations, so I'm not sure when it's appropriate to have config fields for static / default data inputs vs config fields that expect a key to reference data from the log event.

@StephenWakely
Copy link
Contributor

Is there a compelling reason to support RFC3164? Given how loosely specified 3164 is, I feel this would just add complexity that we don't really need. If we can I would prefer to drop this.

Some software still seems to emit logs in that format, so it's preferred if Vector is to ingest the logs but the user still wants to maintain compatibility with the current RFC 3164 logs.

Which happens to be my case at a glance. I want to bring Vector into a project with a wide user base, but I don't want that to be too disruptive if possible. Supporting RFC 3164 doesn't seem like it requires that much more logic 🤷‍♂️

It's more UX complexity I am concerned with for 3164.

It would be really useful to be able to create structured data.

My refactor has this functionality implemented.

Would it be possible to add this?

My approach just relied on remap to provide a structured_data object. I'm not familiar with Vector config implementations, so I'm not sure when it's appropriate to have config fields for static / default data inputs vs config fields that expect a key to reference data from the log event.

I'm not sure I fully understand. What is your refactor?

@polarathene
Copy link

It's more UX complexity I am concerned with for 3164.

Could you elaborate?

On the user end with config it defaults to RFC 5424, but you can explicitly add the rfc key to request RFC 3164 instead. What UX concerns do you have?

I'm not sure I fully understand. What is your refactor?

I've forked this PR and rewritten the logic based on my review feedback.

It is simpler and easier to review IMO, I've implemented Structured Data support for both RFC 5424 and 3164 (as RFC 5424 advises by prepending it to a message when not NIL).

I'll probably raise a separate PR in the weekend since there is not much activity here since my review.

@StephenWakely
Copy link
Contributor

It's more UX complexity I am concerned with for 3164.

Could you elaborate?

According to the RFC, 5424 has made 3164 obsolete.

3164 is very loosely specified, so really if we say we are conforming to 3164 it just means 'our interpretation of a 3164 structure'. If we have an incoming 3164 message and we send out a 3164 message - there is no guarantee that they would be exactly the same text. Sending a 3164 message to a system that claims to parse 3164 doesn't provide any real guarantee that it will actually be able to parse our 3164 message.

Plus I don't like that the year gets dropped in the timestamp..

With a 5424 message it is much more precise and we can have more guarantees about the structure of the message.

I'm not saying absolutely we shouldn't support 3164, if, say, there was a well known system that was unable to parse 5124 messages then that would be reason enough for us to support it. But to my knowledge there isn't such a system, but I am all ears if you know of any?

@polarathene
Copy link

polarathene commented Dec 8, 2023

According to the RFC, 5424 has made 3164 obsolete.

Yes, and where possible it should be adopted. A project I work on that bundles popular software in Docker is still outputting logs with RFC 3164, and I plan to change that with Vector, but I initially want that RFC 3164 support, it is very minimal to support.

Here is the RFC 3164 specific portion:

SyslogRFC::Rfc3164 => {
    // TIMESTAMP field format:
    // https://datatracker.ietf.org/doc/html/rfc3164#section-4.1.2
    let timestamp = self.timestamp.format("%b %e %H:%M:%S").to_string();

    // MSG part begins with TAG field + optional context:
    // https://datatracker.ietf.org/doc/html/rfc3164#section-4.1.3
    let mut msg_start = self.tag.encode_rfc_3164();

    // When RFC 5424 "Structured Data" is available, it can be included in the `CONTENT` field:
    // https://datatracker.ietf.org/doc/html/rfc5424#appendix-A.1
    if let Some(sd) = structured_data.as_deref() {
        msg_start = msg_start + " " + sd
    }

    [
        timestamp.as_str(),
        hostname,
        &msg_start,
    ].join(" ")
}

// ...

struct Tag {
    app_name: String,
    proc_id: Option<String>,
    msg_id: Option<String>
}

// `.as_deref()` usage below avoids requiring `self.clone()`
impl Tag {
    // Roughly equivalent RFC 5424 fields can compose the start of
    // RFC 3164 MSG part (TAG + CONTENT fields):
    // https://datatracker.ietf.org/doc/html/rfc5424#appendix-A.1
    fn encode_rfc_3164(&self) -> String {
        let Self { app_name, proc_id, msg_id } = self;

        match proc_id.as_deref().or(msg_id.as_deref()) {
            Some(context) => [&app_name, "[", &context, "]:"].concat(),
            None => [&app_name, ":"].concat()
        }
    }
// ...
}

Roughly 4 lines without comments and formatting (EDIT: Forgot the Tag portion, added that not much more). Not a maintenance burden.


If we say we are conforming to 3164 it just means 'our interpretation of a 3164 structure'.
If we have an incoming 3164 message and we send out a 3164 message - there is no guarantee that they would be exactly the same text.

Vector already has support for 3164 elsewhere, and outputs a specific format from the demo logs source for example. This aligns with the same 3164 logs I encounter with the popular software bundled via the project I maintain, so that's common enough for me 🤷

Sure it might not meet the expectation of everyone else for 3164, but I don't see an issue with this as:

  • Conforms to 3164 output emitted from existing Vector support for 3164
  • Conforms to 3164 output with popular software that emits logs in that format
  • Conforms to the advice of outputting RFC 3164 via the context of RFC 5424
  • A PR author contributing the functionality has some influence earned from time spent making the contribution as a whole.

Plus I don't like that the year gets dropped in the timestamp..

Then don't use the format? That's specific to RFC 3164, it's expected.

that would be reason enough for us to support it. But to my knowledge there isn't such a system, but I am all ears if you know of any?

It's a low maintenance feature to support as shown above. If you get requests for other variations with enough demand (or contributions) you can discuss if supporting that is worthwhile.

It's a motivating factor for me to contribute an improved version of this PR with that functionality intact. I don't mind it being deprecated if it's necessary to drop it in future for some reason, but I'd like RFC 3164 so that I can use that when switching over to Vector, and transitioning away from it.

If there is a strong argument from maintainers of Vector against that, I won't have much say, but I'd appreciate it considering how low maintenance it is and the existing RFC 3164 support present in Vector.

@jcantrill
Copy link

@polarathene I am very much interested in you driving this forward given @syedriko has been pulled in a different direction for us. We currently support RFC3164 in the design that @syedriko has presented but we have upcoming usages where I can push dropping its support though would take it if the burden is low. Regarding some of the specific deployment extentions, those I could live without given they are not part of the spec. Looking forward to your new PR

@polarathene
Copy link

polarathene commented Jan 3, 2024

@polarathene I am very much interested in you driving this forward given @syedriko has been pulled in a different direction for us.

Sure, I'll try find some time to raise the PR of what I have done. Should be done before mid jan 👍 (hopefully earlier depending on workload elsewhere)

We currently support RFC3164 in the design that @syedriko has presented but we have upcoming usages where I can push dropping its support though would take it if the burden is low.

As stated, for my usage logs are in RFC3164 too. I can likewise probably address that, but I don't see the extra code to support RFC 3164 to be much of a maintenance burden.

Regarding some of the specific deployment extentions, those I could live without given they are not part of the spec.

Great! In my fork I implemented structured data (for RFC 3164 it is prepended to the log message, as RFC 5424 advises). You can optionally use VRL to construct the same content in your config instead of the hard-coded approach that was done here.


UPDATE: Had a busy January and had to delay tackling this. Hoping to have time soon to make progress on pushing this forward.

UPDATE (March 14th): Actively working on my fork to get it in shape for review 👍

@jszwedko jszwedko added the meta: awaiting author Pull requests that are awaiting their author. label Feb 14, 2024
@gaby
Copy link

gaby commented Jun 15, 2024

@polarathene Any update on this?

@polarathene
Copy link

@polarathene Any update on this?

Sorry I got burnt out earlier.

I am able to publish the progress I had, but it was not in a state I was comfortable with for review.

It still needs more work to be ready, but perhaps it could get some review prior and maybe someone else can assist as I realize I've not been able to reliably push forward on this as I've wanted to with all my other commitments 😓

@jcantrill
Copy link

I am able to publish the progress I had ... perhaps it could get some review

@polarathene Following-up if this ever happen? Can we either move this PR forward or see your proposed changes so that we can make progress

@polarathene
Copy link

Following-up if this ever happen?

No, nobody expressed interest since I wrote that comment.

see your proposed changes so that we can make progress

I just wrapped up a large task elsewhere, so I have time to publish the branch (and potentially rebase if there is no major conflicts to resolve). I'll tackle that tomorrow, getting late here 👍


My branch diverged quite a bit from the one here, which while an improvement.. I'm not sure if it'll help much towards making progress if nobody else with the skills wants to assist 😅

I do want to complete the work, it's just been difficult to budget my volunteer time for. So if nobody helps, I will eventually be able to block time out again for coming back to this. I sunk a considerable chunk of time into the branch already, would be an awful waste to leave it to rot.

@polarathene
Copy link

polarathene commented Jul 20, 2024

It seems like I actually did push my branch up months ago and forgot about that: https://github.com/polarathene/vector/tree/feat/syslog-codec

Diff (as if it was opened as a PR): master...polarathene:feat/syslog-codec


The last commit was April, and locally it looks like I was refactoring some structs and fields to address some concerns with configuration + VRL. I had to shift priorities and I think ran into a tricky problem, so that progress was not completed.

I haven't tried building the public branch I linked, but you could give that a go? (it's not been rebased to upstream, so there may be conflicts)

The commits should be fairly tidy and scoped with associated context in each commit message. I don't have time to write up much guidance, especially since I haven't touched the branch for several months.

If the branch itself fails to build successfully, that's probably what I was addressing at the time and why I didn't announce anything earlier. Pretty sure I had it working locally with the uncommitted changes.


Hopefully that's helpful enough for now?

I'll see if I can grok what I have locally, but I'm not sure if I'll have the time to get familiar with the uncommitted changes and push those improvements/refactors today.

I could open a PR with what I have on the public branch, but I wouldn't want to waste reviewer time (my local refactoring + no time to write up a proper PR description). Feel free to give it a glance/try and provide feedback though 👍

@Freakachoo
Copy link

Any updates on this? Really useful and needed!

@polarathene
Copy link

Really useful and needed!

I haven't worked on it for about a month or so. I rarely have the time spare lately. It would probably help to know how you'd like to use/configure it.

  • How important is it to have config for setting default/fallback values?
  • How important is it to have customization for keys vs managing it all via VRL?

I'm not sure if anyone else is trying to work with what I shared earlier, I assume they'd chime in here if they were. Not sure if anyone has tried to even run it 😅

Feedback on the last shared iteration would also be helpful to know if that's going in the right direction or not. I might have more time spare by October or November to finish up my branch.

@jcantrill
Copy link

I could open a PR with what I have on the public branch, but I wouldn't want to waste reviewer time

I do not know exactly what stalled this PR but I would recommend if you believe this one should be abandoned in favor of the code you have written then create a new PR. @syedriko is part of our organization and I would like to see this long standing ask pushed forward one way (merging this PR) or another (coming to consensus on your proposal). As is stands now we have nothing and are making no progress, and we can't make progress without a pull request.

@polarathene
Copy link

I do not know exactly what stalled this PR but I would recommend if you believe this one should be abandoned in favor of the code you have written then create a new PR

I was the only one that participated in review, that feedback wasn't applied. The one here has issues IIRC that were resolved in my branch.

As is stands now we have nothing and are making no progress, and we can't make progress without a pull request.

I provided a diff link to my branch earlier: master...polarathene:feat/syslog-codec

That is the equivalent of making that branch as a new PR. Once I can allocate some time to finish up the local refactoring I was working on, I can update that branch and raise an actual PR with some examples for building/testing if helpful.

My work on it is purely voluntary, no employer backing, so dedicating my spare time on it is tricky 😓

When you say that we have nothing and making no progress without a PR, are you offering to participate in review? So far no one has provided any feedback on the linked branch/diff, so if a new PR left in a WIP status is opened will anything change?

@StephenWakely
Copy link
Contributor

Apologies, this was on my radar, but it slipped by the side. I'll have a look over it as soon as I can.

If I recall, I think the main changes we would like to see would be to use semantic meanings to fill out the fields rather than have them as configuration options. This way we can use vrl to set these fields as needed.

@jcantrill
Copy link

jcantrill commented Sep 13, 2024

I provided a diff link to my branch earlier: master...polarathene:feat/syslog-codec

That is the equivalent of making that branch as a new PR

Efforts are appreciated but AFAIK it is not equivalent to a PR; I'm pretty certain you are only able to comment and provide input with a pull request. Reviewers do not get notifications about responses and changes to branch diffs

are you offering to participate in review?

I defer to those who are better qualified then myself.

If I recall, I think the main changes we would like to see would be to use semantic meanings to fill out the fields rather than have them as configuration options. This way we can use vrl to set these fields as needed.

@syedriko is ⬆️ something you can address or maybe review the referenced branch and come to consensus

@syedriko
Copy link
Contributor Author

syedriko commented Sep 14, 2024

It's been a while since I submitted this PR and I think it would be good to go back and revisit why it is the way it is. This is not supposed to be a general-purpose Vector syslog encoder. It is a port of a legacy fluentd syslog plugin. Here https://gist.github.com/syedriko/2aa01a199e8a5d4026e1c42f618e684e I summed up the configuration this PR is driven from and how it affects what goes on the wire.
There are things in this PR that are unlikely to belong in a general-purpose Vector syslog sink, such as the "$.message.<field_name>" syntax for referring to the log event fields.
What I would like to achieve is a generic syslog sink that can support the functionality of this PR. In other words, if given this PR's configuration, another configuration can be written, for the actual generic syslog sink, that achieves the same end result.

Copy link
Contributor

@StephenWakely StephenWakely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay in going over this. This will be a very useful feature once we get it through!

There are a number of changes that need to be made, mostly to do with relying on field meanings rather than configuring options. That way it does allow for more flexibility, the user can set these fields as needed message by message.

Let me know if you have any questions.

/// Facility
#[serde(default)]
#[serde(deserialize_with = "deserialize_facility")]
facility: Facility,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this field optional. If None we should attempt to get the value from the message with the meaning facility.

/// Severity
#[serde(default)]
#[serde(deserialize_with = "deserialize_severity")]
severity: Severity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this field optional. If None we should attempt to get the value from the message with the meaning severity.

Comment on lines +88 to +90
/// Payload key
#[serde(default)]
payload_key: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this option. This should always be the field with the meaning message.


/// Add log source
#[serde(default)]
add_log_source: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what this option is for, but I think it should be removed as add_log_source seems very specific. We should work out a better way to make this suitable for a wider range of usecases.


/// App Name, RFC 5424 only
#[serde(default = "default_app_name")]
app_name: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this field optional. If None we should attempt to get the value from the message with the meaning service.


/// Proc ID, RFC 5424 only
#[serde(default = "default_nil_value")]
proc_id: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is not needed.

If the message contains a field with a meaning of proc_id, we should use that field. Otherwise, this field should be the PID of Vector. https://doc.rust-lang.org/std/process/fn.id.html

We will need to update the syslog decoder so that when it adds the proc id field to the message it adds it with a meaning of proc_id.

Comment on lines +96 to +106
/// App Name, RFC 5424 only
#[serde(default = "default_app_name")]
app_name: String,

/// Proc ID, RFC 5424 only
#[serde(default = "default_nil_value")]
proc_id: String,

/// Msg ID, RFC 5424 only
#[serde(default = "default_nil_value")]
msg_id: String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this option and take the value from the event with a meaning of msgid. We should also update the decoder to insert msgid with this meaning.

String::from(NIL_VALUE)
}

fn add_log_source(log: &LogEvent, buf: &mut String) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure quite what this is trying to do, but I'm pretty sure it should be removed. If users want these details as part of the message they can use vrl to construct it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StephenWakely What do you mean by "field with a meaning of X"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, fields can be given a meaning. This allows us to have fields coming from different sources with different names, but we are still able to get the field by their meaning without worrying about what their name is.

For example, the syslog source creates a field called hostname, the journald source creates a field called host. Both mean the same thing, but have different names.

The Papertrail sink needs to use the host field in its message. So rather than get the field by it's name it get's it by the meaning. See here.

This can also be managed in VRL with the set_semantic_meaning function.

You'll notice in get_host that each log message can be Vector or Legacy. The concept of meaning is in transition and only applies to the Vector namespace. There is an RFC that goes into it here. A long term goal for us is to eventually remove the legacy namespace - but that is probably a long way away.

For the purposes of this PR, I would call the functions in log_event where there exists one - eg get_message. For other fields, eg procid, we can just hardcode the field names for the Legacy namespace.

eg.

        match self.namespace() {
            LogNamespace::Vector => self.get_by_meaning("procid"),
            LogNamespace::Legacy => self.get("procid"),
        }

Base it on the names set by the syslog decoder.

Hope that makes sense. Any problems, give me a shout..

@polarathene
Copy link

@StephenWakely I am a bit confused why you reviewed the PR given my prior feedback. I assume you were aware of the branch I linked? Do you want me to make a draft PR for master...polarathene:feat/syslog-codec

I handled the facility and severity better IMO, along with other improvements. I have some refactoring to return to locally that still needs to be wrapped up before I push to the branch again, it had already addressed feedback from your recent review here where I had replaced payload_key to use message.

I strongly believe the work I was doing is in a far better state to continue from. If there is promise of actual feedback / support to progress my branch further I will try allocate time to get the local changes sorted before the end of the month. Once that's done it's mostly getting some user feedback that the design works well for others and aligns with what Vector maintainers expect, we can sort out some tests and get it merged.

If you instead want to continue with development of this PR branch and ignore mine, please let me know. I may have identified some bugs in the past and resolved them, each commit in my branch has detailed history for additional context (not that I expect anyone else to read that).

@mangkoran
Copy link

mangkoran commented Sep 16, 2024

Sorry to chime in, but I have several thoughts for @polarathene and their works

I am a bit confused why you reviewed the PR given my prior feedback. I assume you were aware of the branch I linked? Do you want me to make a draft PR for master...polarathene:feat/syslog-codec

I think @jcantrill has mentioned in #17668 (comment)

I do not know exactly what stalled this PR but I would recommend if you believe this one should be abandoned in favor of the code you have written then create a new PR.

If you think your PR is in a far better state to continue from, you could proceed to open a new separate PR as @jcantrill suggested. Reviewers couldn't give proper feedback without an issue or PR ticket of the related branch. As for this PR, it was opened by @syedriko to track feedback for their branch.

By no means am I saying that one branch is better than the other, but in order for reviewers to leave proper feedback for a branch it needs to have an issue or PR ticket. This would allow reviewers to make comparison between branches and might proceed with the better one.

I have been following this feature request silently as I have no capacity to provide feedback for the code. I appreciate all individuals that are involved here and couldn't wait for the feature to be available ❤️

@polarathene
Copy link

If you think your PR is in a far better state to continue from, you could proceed to open a new separate PR as @jcantrill suggested

I had stated why I had not done so multiple times prior to that suggestion:

I could open a PR with what I have on the public branch, but I wouldn't want to waste reviewer time (my local refactoring + no time to write up a proper PR description).
Feel free to give it a glance/try and provide feedback though 👍

So far no one has provided any feedback on the linked branch/diff, so if a new PR left in a WIP status is opened will anything change?

If someone is willing to commit to providing a review, I'll open a PR for them. If a maintainer requests it, I'll open a PR for them.

Now that someone has taken time to contribute a review, they ignored my own prior review and recent comments about the work I had done with a link to the diff.

I could absolutely open a PR if they want to review it properly as well, but like I said I was hoping to just get general feedback if it was going in the right direction or if there was anything that stood out at a glance. That sort of thing doesn't necessarily require line by line review comments.


This would allow reviewers to make comparison between branches and might proceed with the better one.

I'm not sure how you're going to improve making a comparison from the link I provided vs a PR? That link is the exact same diff view you'd get from a PR. The only difference at this point is there is no separate discussion with review comment functionality.

in order for reviewers to leave proper feedback for a branch it needs to have an issue or PR ticket.

I understand this. I do reviews frequently.

It does not change the fact that there is little additional value if that PR is going to be without a helpful description, and reflect a state for review that will be refactored. I honestly did not intend for such a long delay on my end, so in hindsight while slightly redundant and wasteful to reviewer time, the feedback I may have received by now would have been useful.

It's a couple clicks for me to turn that diff link into a draft PR. I can do it as soon as someone is willing to offer the intention to provide feedback. Otherwise I think it would be more respectful to the project maintainers to review it once I can write up a proper PR description and ensure the branch itself is in a better state along with instructions for users like yourself to build and try it for additional feedback.

@jcantrill
Copy link

I can do it as soon as someone is willing to offer the intention to provide feedback

@polarathene I'm betting @StephenWakely would be kind enough to review your PR since he already provided feedback on this one. IMO, maintainers can decide if you are 'wasting their time' by choosing to review (or not) your submission. The fact is, without a PR, nothing is moving forward.

This is not supposed to be a general-purpose Vector syslog encoder.

@syedriko My original expectation was this would be a general purpose syslog encoder. This would be my preference and aligns with comments made by @StephenWakely . If @polarathene 's changes and code (and PR) accomplish this, I'm certain we can move our specific logic into VRL.

@syedriko
Copy link
Contributor Author

This is not supposed to be a general-purpose Vector syslog encoder.

@syedriko My original expectation was this would be a general purpose syslog encoder. This would be my preference and aligns with comments made by @StephenWakely .

The end result, yes, absolutely, the code in this PR as I submitted it, decidedly not, and it's the delta between the two we're converging on in this thread.

@polarathene, your branch is more elegant Rust than mine. I think this PR has served its purpose and it's time for master...polarathene:feat/syslog-codec to become the new PR and implement @StephenWakely's comments in it.

@StephenWakely
Copy link
Contributor

@syedriko thank you for your efforts and for kicking this off. If you are ok with moving ahead with @polarathene's branch then lets proceed with that.

@polarathene apologies I hadn't realised there was a competing branch. Can you create a PR from your branch and I will take a look. I will make sure to be more timely with the review this time! Can you apply any of the feedback I have made to this PR regarding using the meanings to retrieve data from the event?

@polarathene
Copy link

Can you create a PR from your branch and I will take a look. I will make sure to be more timely with the review this time!

I can open a PR yes, but it will be with an empty description with only the code itself and commit messages for context.

I'm not sure if the current public branch is in a state that was usable, my local refactoring ran into an issue I wasn't sure how to approach / resolve.

Can you apply any of the feedback I have made to this PR regarding using the meanings to retrieve data from the event?

Not at this point, I can make some time to address feedback within October, or once a PR is up apply code suggestions. However as mentioned I'd need to diff the local changes as I know with the refactor payload_key does not exist anymore.

This is why I have been hesitant open a PR to seek a proper review. I will open a draft PR for you, and in the weekend look into what local changes I can commit to push. General feedback will be appreciated.


I'd love to finish the PR sooner but have no employer at present, thus it's tricky to afford dedicating time to this over other priorities 😓

@polarathene
Copy link

polarathene commented Sep 18, 2024

I've opened a PR: #21307

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: codecs Anything related to Vector's codecs (encoding/decoding) domain: sinks Anything related to the Vector's sinks meta: awaiting author Pull requests that are awaiting their author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants