Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Python DB Logger #136

Merged
merged 29 commits into from
Nov 1, 2023
Merged

Add Python DB Logger #136

merged 29 commits into from
Nov 1, 2023

Conversation

rtdtwo
Copy link
Contributor

@rtdtwo rtdtwo commented Mar 7, 2023

What does this change do?

This change adds Python DB Logger module to Nacculator, to log directly to our logging database.

Why was this change made?

This change fixes #134

Verification

  1. First, copy the example env file:
    cp db_logger.env.example .env
  2. Then, start a MySQL instance. This will create the database and the relevant tables and set up Adminer for easy database visualization.
    docker compose -f .containers/mysql/docker-compose.yml up -d
  3. Run Nacculator with the flags and configurations of your choice.
  4. Once Nacculator has run successfully, you should find a report.xls file in Nacculator's root directory.
  5. Open the XLS file and verify that the expected logs (Success, Skip, Failure) are logged.
  6. Next, go to Adminer (http://localhost:8080). DB logger database is already configured. Login with the database credentials found in the env file.
  7. Verify the table detail_log has the logs.

Both the database and XLS will have the same logs you would expect from the stdout and stderr file outputs.

No other configurations are needed, DB Logger has already been set up to start logging automatically when Nacculator script starts running.

Affirmations

  • I matched the style of the existing code.
  • I added and updated any relevant documentation (inline, README, CHANGELOG, and such).
  • I used Python's type hinting.
  • I ran the automated tests and ensured they ALL passed.
  • I ran the linter and ensured my changes have not introduced ANY warnings or errors.
  • I have made an effort to minimize code changes and have not included any cruft (preference files, *.pyc files, old comments, print-debugging, unused variables).
  • I have made an effort maintain a clear commit history (haven't merged other branches or rebased improperly).
  • I have written the code myself or have given credit where credit is due.

@rtdtwo rtdtwo requested a review from s-emerson March 7, 2023 22:13
@rtdtwo rtdtwo self-assigned this Mar 7, 2023
README.md Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@s-emerson
Copy link
Contributor

I am unable to install the db_logger package using the directions in https://github.com/ctsit/python_db_logger . I'll need help getting this set up and working when you get a chance, so that I can test the code.

@rtdtwo
Copy link
Contributor Author

rtdtwo commented Mar 23, 2023

@s-emerson Everything works as expected in a local run.

db_logger.env.example Outdated Show resolved Hide resolved
Copy link
Contributor

@michael-bentz michael-bentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be easier to test after updating docker-compose.yml.

.containers/mysql/docker-compose.yml Outdated Show resolved Hide resolved
@rtdtwo
Copy link
Contributor Author

rtdtwo commented Apr 5, 2023

@s-emerson I have added adminer which should be accessible at http://localhost:8080 once the containers are up. Docker should also automatically create all tables. A local run confirms all of this works. Please try running Nacculator again and let me know if you face any issues.

@rtdtwo
Copy link
Contributor Author

rtdtwo commented Apr 7, 2023

@s-emerson Updated the details steps in "verification" section of the pull request.

@s-emerson
Copy link
Contributor

i was able to run nacculator with dblogger installed, though i still had issues with the local-only installation that i think will make this fail for anyone who is not part of uf (especially with python_db_logger as a private repo).

the logging is not keeping track of warnings that nacculator catches. I have attached screencaps of the regular error output vs the logged csv file.
Screen Shot 2023-04-12 at 11 31 25 AM
Screen Shot 2023-04-12 at 11 31 15 AM

i am also still getting an issue in the output ivp.txt meant to submit to nacc-
"INSERT INTO detail_log (script_name, script_run_time, message, level, process_id)VALUES('db_logger.py', '2023-04-12 15:27:13.074792','[START] ptid: 110001','INFO', '44191')
CRITICAL DB ERROR! Logging to database not possible!: 1146 (42S02): Table 'log_db.detail_log' doesn't exist"

@rtdtwo
Copy link
Contributor Author

rtdtwo commented Apr 13, 2023

@s-emerson I added the warning logs, let me check why the critical db error shows up. It never showed up for me but I am working on a database with tables already created, that could be a reason why.

@rtdtwo
Copy link
Contributor Author

rtdtwo commented Apr 17, 2023

@s-emerson I did a run on a clean setup following the steps mentioned in this PR, however, I am unable to reproduce the issue. Can you try deleting the docker container and all its images/volumes and try again with the latest code?

@s-emerson
Copy link
Contributor

@s-emerson What are the entry point scripts for nacculator? For example in the case of pubingest, it's https://github.com/ctsit/ufvivo-pubingest/blob/master/ufvivo/pubingest/__main__.py.

As far as I'm aware, the entry point script for nacculator is https://github.com/ctsit/nacculator/blob/master/nacc/redcap2nacc.py (which has a function "main" at the bottom). There's also the filters, which run using https://github.com/ctsit/nacculator/blob/master/nacc/run_filters.py (this is the script that does the api call to redcap and automatically downloads the project data).

@sinarest1608
Copy link
Contributor

@mbentz-uf
I have added the changes.

@s-emerson
Can you try running it again? You need not run the docker container since Report Handler does not require a mysql connection.

@s-emerson
Copy link
Contributor

@sinarest1608 i tried running it again with the doctor container down and i got the same error.

@sinarest1608
Copy link
Contributor

@s-emerson
Can you list the steps to run nacculator so that I can try replicating the issue?

@s-emerson
Copy link
Contributor

@sinarest1608

with an API token from the REDCap project pasted into the nacculator_cfg.ini file and the redcap_server host set to our REDCap instance:

$ git checkout feature/ADRC-2460-add-db-logger
$ git pull
$ source .venv/bin/activate
$ pip3 install .
$ nacculator_filters nacculator_cfg.ini
$ redcap2nacc -fvp <run_08-24-2023/final_Update.csv >run_08-24-2023/fvp_nacc_complete.txt

though this also fails when you try to run nacculator_filters nacculator_cfg.ini with the issue:
AttributeError: module 'logging' has no attribute 'Handlers'

@sinarest1608
Copy link
Contributor

@s-emerson

  1. From what project should I generate the API token to test this? Is there a test project for that?
  2. How do I set this?

the redcap_server host set to our REDCap instance

  1. Would running this impact any data I should take care of?

@sinarest1608
Copy link
Contributor

sinarest1608 commented Sep 7, 2023

@s-emerson
I have fixed report_handler and It should work as expected now.

@s-emerson
Copy link
Contributor

i gave nacculator a try by using nacculator_filters to download from the redcap api and then running redcap2nacc with the -ivp, -fvp, and -m flags.

everything works fine, i didn't get any issues with the ivp_complete.txt or ivp_errors.txt files (and so on), which is great. so nacculator retains its previous functionality.

however, when logging errors to the spreadsheet report-2023-09-08 (for example), i notice that several errors come back listed as "Unknown" instead of the error message that nacculator outputs to stderr. it looks like this is because they are caused by invalid data (like missing visit date) that don't normally get added to the list of errors that nacculator outputs and are instead printed by python itself. is there a way to also add these to the log?

i ran into a similar issue with the -m flag, except all of the errors were "Unknown" even when nacculator had a log message for the actual issue (milestone date was out of range, which i expected to see).

@sinarest1608
Copy link
Contributor

@s-emerson
I ran naacculator and it was terminated with this error
image

@s-emerson
Copy link
Contributor

@s-emerson I ran naacculator and it was terminated with this error image

This error is caused by an issue with the data and should be resolved. You might need to merge in the changes from develop in order for this to skip properly if it pops up again.

@sinarest1608
Copy link
Contributor

i gave nacculator a try by using nacculator_filters to download from the redcap api and then running redcap2nacc with the -ivp, -fvp, and -m flags.

everything works fine, i didn't get any issues with the ivp_complete.txt or ivp_errors.txt files (and so on), which is great. so nacculator retains its previous functionality.

however, when logging errors to the spreadsheet report-2023-09-08 (for example), i notice that several errors come back listed as "Unknown" instead of the error message that nacculator outputs to stderr. it looks like this is because they are caused by invalid data (like missing visit date) that don't normally get added to the list of errors that nacculator outputs and are instead printed by python itself. is there a way to also add these to the log?

i ran into a similar issue with the -m flag, except all of the errors were "Unknown" even when nacculator had a log message for the actual issue (milestone date was out of range, which i expected to see).

@s-emerson
I looked into report_handler implementation and everything seems to be working as expected. I tried using different flags you mentioned and could see "Unknown" in place of the error messages. I looked through the implementation for python_db_logger and it also had the same placeholder for the error messages. I also checked the console logs and there is no specific error messages for these PTIDs. For example:
Screenshot 2023-10-30 at 7 15 15 PM

@sinarest1608
Copy link
Contributor

@s-emerson
Upon investigating further, I found that the errors with reason "Unknown" were actually exceptions (example the invalid data that you mentioned) that were begin caught but not printed. I have added the changes so that the report now has the exception messages. It should work as expected.

@s-emerson
Copy link
Contributor

the code works well and the logs print out nicely in .xlsx format for all flags.

one thing i notice is that the error printouts in the .xlsx log file don't have as much detail as the printouts in stderr. for example, for one ptid, the stderr printout said "Length of field DRUGID in form A4D with value "example error" is not valid. 11 != 6 which gives detail about the form the error is in, which field has the issue, and the value that is causing the issue. but in the log output, all that it says is "Assertion failed".

can you have it print out the actual error message that nacculator gives? or is the .xlsx log supposed to be more general?

@s-emerson
Copy link
Contributor

The code looks good and works well. Thank you, Kshitij!

@s-emerson s-emerson merged commit 446c3ba into develop Nov 1, 2023
@s-emerson s-emerson deleted the feature/ADRC-2460-add-db-logger branch November 1, 2023 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants