Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syslog drain app error messages in app log stream #633

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

corporatemax
Copy link

@corporatemax corporatemax commented Nov 25, 2024

Description

This change should resolve #579 .

I pulled the SyslogConnector.emitLoggregatorErrorLog function into the app_log_emitter.go file and moved its dependent variables accordingly and capsuled it with the AppLogEmitter struct.
Furthermore, I used the AppLogEmitter in the RetryWriter and the WriterFactory to provide error messages to the Application Developer who tries to configure a syslog drain but does not know why there are no logs showing up in the syslog drain target.
I did this also in the FilteredBindingFetcher as there missconfigurations leads to errors there.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

If you have any questions, or want to get attention for a PR or issue please reach out on the #logging-and-metrics channel in the cloudfoundry slack

Copy link

linux-foundation-easycla bot commented Nov 25, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corporatemax Thanks for the PR and the time invested in it. Generally it looks good. I have only few minor comments and I would wait for the other approvers to tell what they think.

src/pkg/egress/syslog/loggregator_emitter.go Outdated Show resolved Hide resolved
src/pkg/egress/syslog/loggregator_emitter.go Outdated Show resolved Hide resolved

// WriteLog writes a message in the application log stream using a LogClient.
func (appLogEmitter *LoggregatorEmitter) WriteLog(appID string, message string) {
if appLogEmitter.logClient == nil || appID == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should write a warning in the Syslog Agent's logs that the emitter is not set and the app dev won't know that something went wrong... On the other hand, I understand that this is only to safeguard the EmitLog call as everything is initialized properly in the syslog_agent.go cmd.

I feel like if we don't write a log message for the syslog agent, we are covering the misconfiguration... Of course, this might not be the write place as the log will be written on every WriteLog call.

@ctlong Any thoughts on this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the location of this file is sufficient as I am also calling it in the filtered binding fetcher. Should I move this in its separate package? If yes where exactly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it as is for now until we think of something better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chombium it sounds like you're worried that we do not have any mechanisms to prevent future misconfiguration of the AppLogEmitter when it's initialized in syslog_agent.go, is that right?

I think we could adequately cover this case with a new test in syslog_agent_test.go. That would seem like a valuable addition to me.

src/pkg/egress/syslog/retry_writer_test.go Outdated Show resolved Hide resolved
"golang.org/x/net/context"
"log"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this reformatting? If so please run golangci-lint run ./... in the /src folder

Copy link
Author

@corporatemax corporatemax Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this was goland - I ran the formatter on my side and commited the changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very strange to me. Usually, the standard libraries are grouped at the top... can you please double-check the formatting here, thanks

src/pkg/ingress/bindings/filtered_binding_fetcher_test.go Outdated Show resolved Hide resolved
src/pkg/ingress/bindings/filtered_binding_fetcher_test.go Outdated Show resolved Hide resolved
src/pkg/ingress/bindings/filtered_binding_fetcher_test.go Outdated Show resolved Hide resolved
@@ -138,7 +122,7 @@ func (w *SyslogConnector) Connect(ctx context.Context, b Binding) (egress.Writer
w.droppedMetric.Add(float64(missed))
drainDroppedMetric.Add(float64(missed))

w.emitLoggregatorErrorLog(b.AppId, fmt.Sprintf("%d messages lost for application %s in user provided syslog drain with url %s", missed, b.AppId, anonymousUrl.String()))
w.loggregatorEmitter.WriteLog(b.AppId, fmt.Sprintf("%d messages lost for application %s in user provided syslog drain with url %s", missed, b.AppId, anonymousUrl.String()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a system/operator message and we should adjust the text as we have a concrete App for which some logs are lost. IMO x messages lost for syslog drain with url y is enough. On the other hand if the same syslog drain url is used for multiple apps the message would be clearer this way. Though, the app id will be sent anyhow as label/tag of the log. I'm not totally sure if we should change the message.

@ctlong any thought on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the final log message output to the user in this case? I think the log client, or some of the other logic in AppLogEmitter may prefix the message with some information. In any case, you're right, we should evaluate the full log envelope and see if anything should be tweaked to ensure that it is legible to an app developer.

@corporatemax corporatemax changed the title Syslog drain app error messages Syslog drain app error messages in app log stream Nov 26, 2024
@corporatemax corporatemax marked this pull request as ready for review November 27, 2024 10:53
@corporatemax corporatemax requested a review from a team as a code owner November 27, 2024 10:53
@corporatemax corporatemax force-pushed the syslog-drain-app-error-messages branch from bf6dd19 to 9f789db Compare December 2, 2024 08:39
chombium
chombium previously approved these changes Dec 2, 2024
Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corporatemax Thanks for looking into the review and making adjustments. I'm happy with the how the things look like.

I wonder what @ctlong has to say about this PR.

@chombium
Copy link
Contributor

chombium commented Dec 2, 2024

@corporatemax Can you please check the unit tests? Do they fail because they run as GH Action or it's something else?

@corporatemax
Copy link
Author

corporatemax commented Dec 3, 2024

@chombium the test states

unexpected response from internal bindings endpoint. status code: 404

I assume this test is flaky regarding the opened ports.
Can you please rerun the unittests?

@chombium
Copy link
Contributor

chombium commented Dec 6, 2024

@corporatemax I was able to reproduce the error locally on Ubuntu Linux 20.04 LTS when running both all the tests and only the failing test suite.

All tests run:

go run github.com/onsi/ginkgo/v2/ginkgo -r --randomize-all --randomize-suites --fail-on-pending --keep-going --race --trace

and running the single suite:

go run github.com/onsi/ginkgo/v2/ginkgo -r --randomize-all --randomize-suites --fail-on-pending --keep-going --race --trace --focus "Syslog Agent App Suite"

At first I thought that we have some problem when all of the tests run together, but the problem is isolated, but it seems that the the problem is within the Syslog Agent App Suite.

I've also tried running single test like:

go run github.com/onsi/ginkgo/v2/ginkgo -r  --fail-on-pending --keep-going --race --trace  --focus "generates metrics" cmd/syslog-agent/app

and that run properly.

It's seems to me there are problems with how the tests are setup and run when running the whole Syslog Agent App Suite

@chombium chombium dismissed their stale review December 6, 2024 14:58

I've rerun the tests as they run in the GitHub action and I could reproduce the problem. Therefore I will revoke my approval until all test are running properly

}
}

func (f WriterFactory) NewWriter(ub *URLBinding) (egress.WriteCloser, error) {
func (f WriterFactory) NewWriter(ub *URLBinding, emitter AppLogEmitter) (egress.WriteCloser, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the utility of providing WriterFactory with an AppLogEmitter and passing another AppLogEmitter to WriterFactory.NewWriter()? It seems to me like they are the same AppLogEmitter and so could be de-duped, but maybe I'm missing something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, you are right. When I introduced the app log emitter I ran down multiple execution strings and so it came that the WriterFactory got two instances.

@corporatemax corporatemax force-pushed the syslog-drain-app-error-messages branch from fb4fc55 to 7336dcf Compare January 8, 2025 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Pending Merge | Prioritized
Development

Successfully merging this pull request may close these issues.

Improve application error logs about Syslog drain problems
3 participants