Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look for metadata logs and display path #1007

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mikaylathompson
Copy link
Collaborator

Description

We saw in customer testing yesterday that not having the metadata logs path easily available complicates user debugging.
This PR checks for a new metadata migration log from the python code and prints it out in the output.

Results:
   Issue(s) detected
Issues:
   Unexpected failure: Couldn't open the repo directory for some reason

ERROR:console_link.models.metadata:Metadata migration failed: Command '['/root/metadataMigration/bin/MetadataMigration', '--otel-collector-endpoint', 'http://otel-collector:4317', 'migrate', '--snapshot-name', 'snapshot_2023_01_01', '--target-host', 'https://opensearchtarget:9200', '--min-replicas', '0', '--file-system-repo-path', '/snapshot/test-console', '--target-username', 'admin', '--target-password', '********', '--target-insecure']' returned non-zero exit status 120.
Metadata migration failed: Command '['/root/metadataMigration/bin/MetadataMigration', '--otel-collector-endpoint', 'http://otel-collector:4317', 'migrate', '--snapshot-name', 'snapshot_2023_01_01', '--target-host', 'https://opensearchtarget:9200', '--min-replicas', '0', '--file-system-repo-path', '/snapshot/test-console', '--target-username', 'admin', '--target-password', '********', '--target-insecure']' returned non-zero exit status 120.

Logs for this run are available at /shared-logs-output/1b494f5645df/metadata/metadata_2024-09-24_15-42-34.log
(.venv) sh-5.2# 

Issues Resolved

n/a

Testing

Manual (aghgh, I should add a unit test)

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Sep 24, 2024

Codecov Report

Attention: Patch coverage is 70.83333% with 7 lines in your changes missing coverage. Please review.

Project coverage is 80.21%. Comparing base (7159aea) to head (7ad3f4a).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
...e/lib/console_link/console_link/models/metadata.py 70.83% 7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1007      +/-   ##
============================================
- Coverage     89.63%   80.21%   -9.42%     
- Complexity        0     2721    +2721     
============================================
  Files            43      365     +322     
  Lines          2364    13614   +11250     
  Branches          0      939     +939     
============================================
+ Hits           2119    10921    +8802     
- Misses          245     2122    +1877     
- Partials          0      571     +571     
Flag Coverage Δ
gradle-test 78.23% <ø> (?)
python-test 89.81% <70.83%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Mikayla Thompson <[email protected]>
Copy link
Collaborator

@gregschohn gregschohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is a great idea.

I'd recommend some changes though.

  1. always print the filename of the log file at the beginning of the process (not all errors may be exceptions that kill the program).
  2. Use the pid in the filename - I think that ${java:pid} should work (see stackoverflow). That would make which logfile is relevant unambiguous (all you would need to know is the pid).
  3. All of the console invocations of other programs should do this (e.g. CreateSnapshot)
  4. We should be certain of the filepath that we're showing.
  • If the python wrapper code needs to know where the logfiles are, it should make sure that it's looking in the right spot by specifying it and passing it to the program as it is launched. the log4j2 properties file would take an optional environment variable of the console program's logbase directory - not the pid part though. You'd still want to cons the pid variable in the properties file.
  • In lieu of passing the directory, you could also have the console program print out its log location, though that doesn't make sense for every invocation (outside the console) and when it changes, this migration-console cli experience will break. Below is the code to use log4j2 directly (not slf) to get the filenames for the appenders.
       LoggerContext context = (LoggerContext) LogManager.getContext(false);
       Configuration config = context.getConfiguration();

       // Loop through all appenders
       config.getAppenders().forEach((name, appender) -> {
           if (appender instanceof FileAppender) {
               // Print the log file location
               FileAppender fileAppender = (FileAppender) appender;
               System.out.println("Log file location: " + fileAppender.getFileName());
           }
       });

Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change, this is a nice touch to help with debugging intuitively in the migration console.

This seems like an case where the program writing the logging output is better suited to know where the log file is located rather than in the console link application.

return self._run_command_and_build_result(command_runner)


def get_latest_metadata_logs_path() -> Optional[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method creates it creates coupling between this code and the logger code without an interface, if the behavior changed this code can broken with out an obvious reason. As Greg mentions there are ways we can determine from the logging system where the log file path is at that would prevent this cross-language coupling that are better served by being in the java side ecosystem.

What do you think about shifting this solution in that space?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with the comment about the program knowing where logs are being written, though I do agree with the sentiment that the current situation should be unambiguous. Right now, none of the application code for any of the libraries or applications needs a hard dependency on Log4J2. Nearly every class, main ones included have dependencies on Slf4j. Log4J2 is bound into Slf4j at deployment/runtime. Having application code reach back into the logging system to find out where its logs live requires a much bigger interface. Furthermore, what if a deployment of an application is using a network logger (like what we previously had done w/ OTEL). That wouldn't be a filename at all. What would the expectations be for that code? How would one control whether that code would or wouldn't be activated to show the user something.

I see the log locations and the logging system that we're using entirely as a dependency that's injected. That's why I was recommending configuring the file outside the process in the process's configuration. If we want logs to be harmonized for apps that run on the console, we'll probably need that to be managed by the thing invoking the application. In that case, if the caller knew anyway, why require the application to also know?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants