# Test component name
SONiC LogAnalyzer
## Related documents
N/A
The scope of this component is to analyze log content produced by SONiC during the run of PTF SONiC tests. The analysis will be performed after running each PTF test.
## Overview
PTF tests run on a host machine connected to the DUT switch and perform functional testing and validation of SONiC. As part of test outcome validation, we also need to analyze all the logs produced by the SONiC system and check for any reported errors.
We will introduce a new test component, LogAnalyzer, written in Python, which will perform the log analysis.
An Ansible script will orchestrate the whole test setup and run. The script will set up the DUT and the PTF host. With respect to log analysis, it will perform the following actions:
- Deploy the PTF test on the test host
- Deploy LogAnalyzer and all files it needs on the DUT switch
- Run LogAnalyzer to add a start-marker
- Start the PTF test
- After the PTF test finishes, collect the test report
- Run LogAnalyzer to add an end-marker
- Run LogAnalyzer to perform the analysis
- Collect the report output and provide a test summary
The Ansible script will deploy LogAnalyzer as well as all configuration files it needs to run.
#### Log files to be analyzed
SONiC is a collection of several executables logging to various locations. Currently logging is done in:
- /var/log/syslog
- /var/log/bgpd.log
- /var/log/zebra.log
LogAnalyzer will be parameterized with the list of log files it needs to analyze, so that any set of files can be specified.
#### Log rotation
To avoid log rotation happening while tests are running, the Ansible script will perform log rotation at initialization.
### LogAnalyzer functionality
LogAnalyzer will analyze the logs produced by the running SONiC system and generate an output file with an analysis summary, which will be consumed by the Ansible script. LogAnalyzer will analyze only the portion of the logs produced while the PTF test was running.
#### Analysis Markers
To analyze only the logs produced during the test run, the Ansible script will invoke LogAnalyzer before starting test activity, so that LogAnalyzer knows the starting point of the analysis. On invocation, LogAnalyzer will write a unique start-marker into the log files; all log messages following the start-marker will be considered for analysis.
When the PTF test finishes, the Ansible script will invoke LogAnalyzer to put a unique end-marker into the log files. This defines the ending point of the log analysis; any messages following the end-marker will not be considered.
##### Marker format
Start Marker format: "start-LogAnalyzer-runID"
End marker format: "end-LogAnalyzer-runID"
'runID' is a string passed in by the caller which identifies the current run; the Ansible script will provide it.
The runID will be a concatenation of the test name and a date-time stamp (with resolution down to seconds) to uniquely identify the test and the time it ran.
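A minimal sketch of how the runID and marker strings could be built (the helper names here are illustrative, not part of LogAnalyzer's actual interface):

```python
from datetime import datetime

def make_run_id(test_name, now=None):
    """Build a runID from the test name and a date-time stamp with
    second resolution, e.g. 'fib.2016-12-01.22-16-24'."""
    now = now or datetime.now()
    return "%s.%s" % (test_name, now.strftime("%Y-%m-%d.%H-%M-%S"))

def start_marker(run_id):
    """Start marker format: start-LogAnalyzer-runID"""
    return "start-LogAnalyzer-%s" % run_id

def end_marker(run_id):
    """End marker format: end-LogAnalyzer-runID"""
    return "end-LogAnalyzer-%s" % run_id
```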
When instructed to perform analysis, LogAnalyzer will iterate over the specified log files and:
- find the start-marker
- perform the analysis
- stop the analysis once the end-marker is found
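The steps above amount to considering only the lines between the two markers; a sketch, with an illustrative function name:

```python
def lines_between_markers(lines, start_marker, end_marker):
    """Yield only the log lines that fall after the start-marker and
    before the end-marker; everything outside that window is skipped."""
    in_window = False
    for line in lines:
        if start_marker in line:
            in_window = True
            continue
        if end_marker in line:
            break  # stop analysis once the end-marker is found
        if in_window:
            yield line
```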
#### Analysis of log messages
##### Set of messages to match for
As part of its input, LogAnalyzer will be provided with a file representing the set of message types to detect as errors or warnings. The file, log.analysis.match.set.txt, will contain a set of regular expressions. It will be placed on the DUT as part of setup, and its path will be provided to LogAnalyzer at startup.
Each PTF test will have its own version of the file, with content specific to that test. The file will be authored by the test developer.
The messages will contain the following keywords: ERROR, ERR, WARN, WARNING, panic.
When LogAnalyzer scans the log files it will match the log content against the matching set. All matching messages will be collected in a separate file, log.analysis.result.log.
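Loading the match set could look roughly like this (a sketch; one regular expression per non-empty line is an assumption about the file layout, which is ultimately defined by the test author):

```python
import re

def load_patterns(path):
    """Compile one regular expression per non-empty line of a
    match-set (or ignore-set) file."""
    patterns = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                patterns.append(re.compile(line))
    return patterns
```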
##### Ignore list for non-relevant messages
SONiC consists of various components which perform logging. Some components produce BGP-related logs, while others log about learned IP addresses. During runtime, any SONiC component can asynchronously output an error or warning message that is not related to the test currently being run. For example, while running trap tests, there can be error messages about learned IP addresses. Such messages should be ignored; otherwise we will face many false positives.
To address this issue we will provide LogAnalyzer with a data file, log.analysis.ignore.set.txt, listing all non-relevant messages as regular expressions.
Another example of such messages is platform-specific messages.
Each PTF test will have its own private version of the ignore file, created along with the PTF test development.
For each log file, LogAnalyzer will analyze all messages between the start-marker and end-marker in the following way:
- Compare the message against the expressions in log.analysis.ignore.set.txt.
- If the message matches, skip it.
- Otherwise, compare it against the set of expressions in log.analysis.match.set.txt.
- If the message matches, collect it into log.analysis.result.log.
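The two-stage check above can be sketched as follows (both pattern lists are assumed to be precompiled regular expressions; the function name is illustrative):

```python
import re

def classify(message, ignore_patterns, match_patterns):
    """Return True if the message should be collected into
    log.analysis.result.log: it matches no ignore pattern but
    matches at least one error/warning pattern."""
    if any(p.search(message) for p in ignore_patterns):
        return False  # non-relevant message, skip it
    return any(p.search(message) for p in match_patterns)
```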
#### Return value
When LogAnalyzer finishes, it will return the number of detected messages matching the regular expressions. A value of 0 means no concerning messages were detected.
LogAnalyzer will also produce log.analysis.summary.log with a summary of the found issues. The summary will be collected by the Ansible script and presented to the user after the test runs.
The format will be:
```
LOG ANALYSIS SUMMARY:
MESSAGES DETECTED: NUMBER
```
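Rendering such a summary from per-file match counts is straightforward; the sketch below follows the per-file layout shown in the summary-file example later in this document (the function name is illustrative):

```python
def format_summary(per_file_matches):
    """Render the summary text from a {log_file: match_count} mapping."""
    lines = ["LOG ANALYSIS SUMMARY"]
    total = 0
    for path, count in sorted(per_file_matches.items()):
        lines.append("FILE: %s MATCHES %d" % (path, count))
        total += count
    lines.append("TOTAL MATCHES: %d" % total)
    return "\n".join(lines)
```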
If LogAnalyzer detects errors in the logs, it will also need to collect a DUMP file representing the full content of SAI and the underlying SDK.
In the event of identifying errors, LogAnalyzer will return a non-zero value to the invoking Ansible script, which will in turn create a backup of the log files for post-mortem analysis.
##### LogAnalyzer Command line parameters
LogAnalyzer will run in two modes: initialize analysis, and perform analysis.
###### Initialize analysis
The Ansible script will invoke LogAnalyzer and instruct it to insert the start-marker into all log files.
The parameters specified for this phase will be:
- action="init"
- log_file_list=path,{path}
- start_marker="string"
###### Analysis phase
After the PTF test runs and finishes, the Ansible script will invoke LogAnalyzer to perform analysis of the logs.
- action='analyze'
- 'analysis_src=path' - path to:
  - log.analysis.match.set.txt
  - log.analysis.ignore.set.txt
- 'analysis_destination=path' - destination path where the following files will be placed:
  - log.analysis.result.log
  - log.analysis.summary.log
  - backed-up log file(s)
- log_file_list=path,{path} - list of log files to analyze
- start_marker="string"
- end_marker="string"
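Using Python's argparse, the parameters for the two modes could be wired up roughly as follows (a sketch; the actual option spellings may differ, as the example invocation later in this document suggests):

```python
import argparse

def build_parser():
    """Command-line options covering both LogAnalyzer modes."""
    p = argparse.ArgumentParser(description="SONiC LogAnalyzer")
    p.add_argument("--action", choices=["init", "analyze"], required=True,
                   help="insert start-marker, or perform analysis")
    p.add_argument("--log_file_list",
                   help="comma-separated list of log files")
    p.add_argument("--start_marker")
    p.add_argument("--end_marker")
    p.add_argument("--analysis_src",
                   help="path to the match-set and ignore-set files")
    p.add_argument("--analysis_destination",
                   help="path where result, summary, and backed-up logs go")
    return p
```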
##### Interaction of PTF test with LogAnalyzer
PTF tests will not directly interact with LogAnalyzer; this function will be performed by the Ansible script.
The Ansible script will provide LogAnalyzer with all input parameters. It will create an ssh session to the DUT, through which it will:
- invoke LogAnalyzer before starting the PTF test, to put the start-marker into the log files
- invoke LogAnalyzer at test end to put the end-marker
- invoke LogAnalyzer to perform the analysis and create the output files; the LogAnalyzer return code will indicate whether any errors were found
- copy all output files produced by LogAnalyzer from the DUT
If LogAnalyzer returns a non-zero value:
- the script will back up all test files and logs from the DUT to a directory with the same name as the runID (see the definition of the start-marker);
- the content of log.analysis.summary.log will be printed to the console, following the output of the PTF test.
Example invocation:
```
python /tmp/loganalyzer.py --action analyze --run_id fib.2016-12-01.22-16-24 --out_dir /tmp/test_results -m test_match_messages.txt,common_match.txt
```
Match file:
```
s, "removeNeighbor: Neighbor is still referenced"
r, ".ERR", ".WARN", "crash"
r, "kernel:.*Oops", "kernel:.*hung", "kernel.*oom\s"
r, "kernel:.*scheduling", "kernel:.*atomic", "kernel:.*panic"
r, "kernel:.*\serr", "kernel:.*allocation", "kernel:.*kill",
r, "kernel:.kmemleak.","kernel:.* Err:"
s, "ERR"
```
Result file:
```
-----------Matches found in file:/var/log/syslog-----------
Dec 2 00:10:15 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Dec 2 00:10:16 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Dec 2 00:10:17 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Dec 2 00:10:18 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Matches:4 Total matches:4
```
Summary file:
```
LOG ANALYSIS SUMMARY
FILE: /var/log/syslog MATCHES 4
TOTAL MATCHES: 4
```
LogAnalyzer is not intended to measure scale/performance, but as a result of log analysis it may report performance-related error conditions that appeared during a test run.
LogAnalyzer should not cause noticeable degradation of performance of the DUT during analysis.
The DUT will be provided with configuration files for LogAnalyzer to operate on. The files are specified in the section 'Analysis of log messages'. In summary, those are:
- log.analysis.match.set.txt
- log.analysis.ignore.set.txt
LogAnalyzer will analyze messages produced by SONiC; it does not target SAI. SAI and SDK messages will not be analyzed.
### Output from LogAnalyzer
Please refer to the 'Analysis of log messages' section above.
The PTF host needs to be connected to a port through which it will send packets to the switch, and needs connections via the ports through which the switch will forward received packets back to the host for validation.
LogAnalyzer is a test tool which will be invoked by the Ansible script around a running PTF test. The test cases will be defined in the test plan documents for the specific PTF tests, not in this document.