
Debugging of dev branch looks complete #2810

Merged Jun 7, 2024 · 46 commits
Changes from 41 commits
bed9a5a
update to materialized views testing scripts
sgoggins May 10, 2024
5384e0f
Update build_docker.yml
sgoggins May 14, 2024
16f6d6b
commented out the rebuilding of the dm_ tables. This should be rebuil…
sgoggins May 21, 2024
fcbb819
Merge pull request #2804 from chaoss/main
sgoggins May 21, 2024
2c8e856
checking pr file errors with more logging
sgoggins May 24, 2024
04b9987
more debugging of pr files
sgoggins May 24, 2024
8ad033f
update
sgoggins May 24, 2024
d66bf45
debugging files
sgoggins May 24, 2024
72b28a7
resorting to traceback
sgoggins May 24, 2024
e13776c
fixing error
sgoggins May 24, 2024
023477f
logging removeal
sgoggins May 29, 2024
3de5ba9
use correct get active repo count
ABrain7710 Jun 3, 2024
0c5d734
Pass correct thign
ABrain7710 Jun 3, 2024
2979091
Fix
ABrain7710 Jun 4, 2024
f8217ed
Pass DatabaseSession
ABrain7710 Jun 4, 2024
fc686e5
fixing scorecard
sgoggins Jun 4, 2024
7b914d7
fixing augur startup issue
sgoggins Jun 4, 2024
e2217e5
updated for facade repo out of sync error
sgoggins Jun 4, 2024
c2ee07c
fixing dependency logic issue
sgoggins Jun 4, 2024
c3f4eaa
node dependency checker
sgoggins Jun 4, 2024
90b149c
version update
sgoggins Jun 4, 2024
30b325d
update
sgoggins Jun 4, 2024
0f00cde
fixing
sgoggins Jun 4, 2024
e7d305d
fixing open ssf score card
sgoggins Jun 4, 2024
2d945ca
possibly fixed PR worker task
sgoggins Jun 4, 2024
84b7fea
fixing missing task manifest info
sgoggins Jun 4, 2024
f3002c1
fixing random key auth
sgoggins Jun 4, 2024
6b2667a
can you overload __init__ in Python?
sgoggins Jun 4, 2024
082d5d2
updating things
sgoggins Jun 4, 2024
6d25bd9
trying a hack
sgoggins Jun 4, 2024
85e7b2d
hacking
sgoggins Jun 4, 2024
da9952b
update
sgoggins Jun 4, 2024
e69f65a
added Repo object
sgoggins Jun 4, 2024
bf04ebc
messages fix
sgoggins Jun 4, 2024
d8ccbba
Merge remote-tracking branch 'origin/pr-file-patch' into dev-fixes
sgoggins Jun 4, 2024
18ef774
update session
sgoggins Jun 5, 2024
ad1bc2d
update for colelctionstatus
sgoggins Jun 5, 2024
22312a7
fixing pr files
sgoggins Jun 5, 2024
2c4341d
files model
sgoggins Jun 5, 2024
8a97792
pr fix
sgoggins Jun 5, 2024
b6adf94
update to message collection
sgoggins Jun 5, 2024
7989c12
we never get clones data so I commented it out. You need an API key …
sgoggins Jun 5, 2024
33890da
Updating version
sgoggins Jun 5, 2024
ed6f5e1
updated Dockerfile version info
sgoggins Jun 5, 2024
7cb9d97
dependencies fix
sgoggins Jun 5, 2024
744bcf5
increaing db sleep due to errors like /home/sean/github/rh-k12/augur/…
sgoggins Jun 5, 2024
2 changes: 2 additions & 0 deletions .github/workflows/build_docker.yml
Original file line number Diff line number Diff line change
@@ -3,9 +3,11 @@ on:
push:
branches:
- main
- dev
pull_request:
branches:
- main
- dev
release:
types:
- published
4 changes: 2 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
# Augur NEW Release v0.70.0
# Augur NEW Release v0.71.0

Augur is primarily a data engineering tool that makes it possible for data scientists to gather open source software community data. Less data carpentry for everyone else!
The primary way of looking at Augur data is through [8Knot](https://github.com/oss-aspen/8knot) ... A public instance of 8Knot is available at https://metrix.chaoss.io ... That is tied to a public instance of Augur at https://ai.chaoss.io
@@ -10,7 +10,7 @@ The primary way of looking at Augur data is through [8Knot](https://github.com/o
## NEW RELEASE ALERT!
### [If you want to jump right in, updated docker build/compose and bare metal installation instructions are available here](docs/new-install.md)

Augur is now releasing a dramatically improved new version to the main branch. It is also available here: https://github.com/chaoss/augur/releases/tag/v0.70.0
Augur is now releasing a dramatically improved new version to the main branch. It is also available here: https://github.com/chaoss/augur/releases/tag/v0.71.0

- The `main` branch is a stable version of our new architecture, which features:
- Dramatic improvement in the speed of large scale data collection (100,000+ repos). All data is obtained for 100k+ repos within 2 weeks.
27 changes: 22 additions & 5 deletions augur/application/logs.py
@@ -1,4 +1,4 @@
#SPDX-License-Identifier: MIT

[pylint, augur/application/logs.py:1] C0114: Missing module docstring (missing-module-docstring)
from __future__ import annotations
import logging
import logging.config
@@ -24,9 +24,9 @@
ERROR_FORMAT_STRING = "%(asctime)s [PID: %(process)d] %(name)s [%(funcName)s() in %(filename)s:L%(lineno)d] [%(levelname)s]: %(message)s"

# get formatter for the specified log level
def getFormatter(logLevel):

[pylint, augur/application/logs.py:27] R1710: Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements)

if logLevel == logging.INFO:

[pylint, augur/application/logs.py:29] R1705: Unnecessary "elif" after "return", remove the leading "el" from "elif" (no-else-return)
return logging.Formatter(fmt=SIMPLE_FORMAT_STRING)

elif logLevel == logging.DEBUG:
@@ -36,12 +36,29 @@
return logging.Formatter(fmt=ERROR_FORMAT_STRING)

# create a file handler and set the format and log level
def create_file_handler(file, formatter, level):
handler = FileHandler(filename=file, mode='a')
handler.setFormatter(fmt=formatter)
handler.setLevel(level)
# def create_file_handler(file, formatter, level):
# handler = FileHandler(filename=file, mode='a')
# handler.setFormatter(fmt=formatter)
# handler.setLevel(level)

# return handler

return handler
def create_file_handler(file, formatter, level):
try:
# Ensure the directory exists
directory = os.path.dirname(file)
if not os.path.exists(directory):
os.makedirs(directory)

# Create the file handler
handler = logging.FileHandler(filename=file, mode='a')
handler.setFormatter(formatter)
handler.setLevel(level)

return handler
except Exception as e:
print(f"Failed to create file handler: {e}")
return None

# function to create two file handlers and add them to a logger
def initialize_file_handlers(logger, file, log_level):
@@ -70,7 +87,7 @@

def get_log_config():

from augur.application.db.engine import DatabaseEngine

[pylint, augur/application/logs.py:90] C0415: Import outside toplevel (augur.application.db.engine.DatabaseEngine) (import-outside-toplevel)

# we are using this session instead of the
# DatabaseSession class because the DatabaseSession
@@ -98,8 +115,8 @@
return section_dict


#TODO dynamically define loggers for every task names.

[pylint, augur/application/logs.py:118] W0511: TODO dynamically define loggers for every task names. (fixme)
class TaskLogConfig():

[pylint, augur/application/logs.py:119] C0115: Missing class docstring (missing-class-docstring)
def __init__(self, all_tasks, disable_log_files=False,reset_logfiles=False,base_log_dir=ROOT_AUGUR_DIRECTORY + "/logs/"):

log_config = get_log_config()
Expand All @@ -111,7 +128,7 @@
try:
print("(tasks) Reseting log files")
shutil.rmtree(base_log_dir)
except FileNotFoundError as e:

[pylint, augur/application/logs.py:131] W0612: Unused variable 'e' (unused-variable)
pass

if log_config["log_level"].lower() == "debug":
@@ -170,7 +187,7 @@
return self.logger_names


class AugurLogger():

[pylint, augur/application/logs.py:190] C0115: Missing class docstring (missing-class-docstring)
def __init__(self, logger_name, disable_log_files=False,reset_logfiles=False,base_log_dir=ROOT_AUGUR_DIRECTORY + "/logs/"):

log_config = get_log_config()
@@ -182,7 +199,7 @@
try:
print("(augur) Reseting log files")
shutil.rmtree(base_log_dir)
except FileNotFoundError as e:

[pylint, augur/application/logs.py:202] W0612: Unused variable 'e' (unused-variable)
pass

if log_config["log_level"].lower() == "debug":
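The logs.py hunk above replaces the bare create_file_handler with a version that creates missing log directories and returns None instead of crashing. A minimal standalone sketch of that defensive pattern (the temp path and "demo" logger name are hypothetical, not from the PR):

```python
import logging
import os
import tempfile

def create_file_handler(file, formatter, level):
    """Create a FileHandler, making parent directories as needed;
    mirrors the PR's defensive version and returns None on failure."""
    try:
        directory = os.path.dirname(file)
        if directory and not os.path.exists(directory):
            os.makedirs(directory)
        handler = logging.FileHandler(filename=file, mode='a')
        handler.setFormatter(formatter)
        handler.setLevel(level)
        return handler
    except Exception as e:
        print(f"Failed to create file handler: {e}")
        return None

# Exercise it with a nested directory that does not exist yet.
log_path = os.path.join(tempfile.mkdtemp(), "nested", "augur.log")
handler = create_file_handler(
    log_path, logging.Formatter("%(levelname)s: %(message)s"), logging.INFO)
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("hello")
handler.flush()
```

With the old version, the missing "nested" directory would have raised FileNotFoundError before any logging happened.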
@@ -1,8 +1,11 @@
import requests
[pylint] C0114: Missing module docstring (missing-module-docstring)

import logging

logger = logging.getLogger(__name__)

def get_NPM_data(package):
url = "https://registry.npmjs.org/%s" % package
[pylint] C0209: Formatting a regular string which could be an f-string (consider-using-f-string)

r = requests.get(url)

[pylint, augur/tasks/git/dependency_libyear_tasks/libyear_util/npm_libyear_utils.py:8] W3101: Missing timeout argument for method 'requests.get' can cause your program to hang indefinitely (missing-timeout)
if r.status_code < 400:
return r.json()
return {}
@@ -42,10 +45,16 @@


def get_lastest_minor(version, data):
versions = data['versions']
try:
versions = data['versions']
except Exception as e:
logger.info(f'error is {e} on the NPM. Hey, its NODEJS, of course it does not work :D ')
raise e

try:
index = list(versions.keys()).index(version)
except ValueError as e:
logger.info(f'error is {e} on the NPM. Some kind of value error. Probably a VALUES error for Node, #AmIRight?')
raise e

major,minor,patch = split_version(version)
[pylint] W0612: Unused variable 'patch' (unused-variable)

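The get_lastest_minor hunk wraps the two lookups that had been failing: the 'versions' key access and the index search. A standalone sketch of the same defensive pattern (function name and payload are illustrative; the real npm registry payload has this {"versions": {...}} shape):

```python
def find_version_index(data, version):
    """Locate a version inside a registry-style payload, raising
    descriptive errors instead of bare KeyError/ValueError."""
    try:
        versions = data['versions']  # payloads missing this key now fail loudly
    except KeyError as e:
        raise KeyError(f"registry payload missing 'versions': {e}") from e
    try:
        return list(versions.keys()).index(version)
    except ValueError as e:
        raise ValueError(f"{version} not found among published versions") from e

# dicts preserve insertion order, so the index reflects publish order
data = {"versions": {"1.0.0": {}, "1.1.0": {}, "2.0.0": {}}}
idx = find_version_index(data, "1.1.0")
```

The PR's version additionally logs through the module logger before re-raising, so the Celery task log shows which payload was malformed.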
15 changes: 14 additions & 1 deletion augur/tasks/git/dependency_tasks/core.py
@@ -6,6 +6,7 @@
from augur.tasks.git.dependency_tasks.dependency_util import dependency_calculator as dep_calc
from augur.tasks.util.worker_util import parse_json_from_subprocess_call
from augur.tasks.git.util.facade_worker.facade_worker.utilitymethods import get_absolute_repo_path
from augur.tasks.github.util.github_random_key_auth import GithubRandomKeyAuth


def generate_deps_data(logger, repo_git):
@@ -82,10 +83,22 @@ def generate_scorecard(logger, repo_git):
#setting the environmental variable which is required by scorecard

with get_session() as session:

#key_handler = GithubRandomKeyAuth(logger)
key_handler = GithubApiKeyHandler(logger)
os.environ['GITHUB_AUTH_TOKEN'] = key_handler.get_random_key()

# This seems outdated
#setting the environmental variable which is required by scorecard
#key_handler = GithubApiKeyHandler(session, session.logger)
#os.environ['GITHUB_AUTH_TOKEN'] = key_handler.get_random_key()

try:
required_output = parse_json_from_subprocess_call(logger,['./scorecard', command, '--format=json'],cwd=path_to_scorecard)
except Exception as e:
session.logger.error(f"Could not parse required output! Error: {e}")
raise e

# end

logger.info('adding to database...')
logger.debug(f"output: {required_output}")
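The generate_scorecard hunk wraps the subprocess-and-parse step in try/except so a bad scorecard run is logged before the task fails. Augur's own helper is parse_json_from_subprocess_call; a generic stand-in showing the same run-capture-parse-log pattern (the demo command just echoes JSON and is not the scorecard binary):

```python
import json
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)

def parse_json_from_subprocess(args, cwd=None):
    """Run a command and parse its stdout as JSON, logging and
    re-raising on any failure (nonzero exit or malformed output)."""
    try:
        completed = subprocess.run(
            args, cwd=cwd, capture_output=True, text=True, check=True)
        return json.loads(completed.stdout)
    except Exception as e:
        logger.error(f"Could not parse required output! Error: {e}")
        raise

# Stand-in for ['./scorecard', command, '--format=json']
result = parse_json_from_subprocess(
    [sys.executable, "-c", "import json; print(json.dumps({'score': 7.5}))"])
```

check=True makes a nonzero scorecard exit surface as CalledProcessError rather than as a confusing JSON decode failure on empty stdout.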
26 changes: 14 additions & 12 deletions augur/tasks/git/util/facade_worker/facade_worker/rebuildcache.py
@@ -397,7 +397,8 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
# ("DELETE c.* FROM dm_repo_group_weekly c "
# "JOIN repo_groups p ON c.repo_group_id = p.repo_group_id WHERE "
# "p.rg_recache=TRUE")
execute_sql(clear_dm_repo_group_weekly)

# session.execute_sql(clear_dm_repo_group_weekly)

clear_dm_repo_group_monthly = s.sql.text("""
DELETE
@@ -411,7 +412,8 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
# ("DELETE c.* FROM dm_repo_group_monthly c "
# "JOIN repo_groups p ON c.repo_group_id = p.repo_group_id WHERE "
# "p.rg_recache=TRUE")
execute_sql(clear_dm_repo_group_monthly)

# session.execute_sql(clear_dm_repo_group_monthly)

clear_dm_repo_group_annual = s.sql.text("""
DELETE
@@ -425,7 +427,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
# ("DELETE c.* FROM dm_repo_group_annual c "
# "JOIN repo_groups p ON c.repo_group_id = p.repo_group_id WHERE "
# "p.rg_recache=TRUE")
execute_sql(clear_dm_repo_group_annual)
# session.execute_sql(clear_dm_repo_group_annual)

clear_dm_repo_weekly = s.sql.text("""
DELETE
@@ -442,7 +444,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
# "JOIN repo r ON c.repo_id = r.repo_id "
# "JOIN repo_groups p ON r.repo_group_id = p.repo_group_id WHERE "
# "p.rg_recache=TRUE")
execute_sql(clear_dm_repo_weekly)
# session.execute_sql(clear_dm_repo_weekly)

clear_dm_repo_monthly = s.sql.text("""
DELETE
@@ -459,7 +461,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
# "JOIN repo r ON c.repo_id = r.repo_id "
# "JOIN repo_groups p ON r.repo_group_id = p.repo_group_id WHERE "
# "p.rg_recache=TRUE")
execute_sql(clear_dm_repo_monthly)
# session.execute_sql(clear_dm_repo_monthly)

clear_dm_repo_annual = s.sql.text("""
DELETE
@@ -476,7 +478,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
# "JOIN repo r ON c.repo_id = r.repo_id "
# "JOIN repo_groups p ON r.repo_group_id = p.repo_group_id WHERE "
# "p.rg_recache=TRUE")
execute_sql(clear_dm_repo_annual)
# session.execute_sql(clear_dm_repo_annual)

clear_unknown_cache = s.sql.text("""
DELETE
@@ -574,7 +576,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
"r.repo_group_id, info.a, info.b, info.c")
).bindparams(tool_source=facade_helper.tool_source,tool_version=facade_helper.tool_version,data_source=facade_helper.data_source)

execute_sql(cache_projects_by_week)
# session.execute_sql(cache_projects_by_week)

cache_projects_by_month = s.sql.text(
("INSERT INTO dm_repo_group_monthly (repo_group_id, email, affiliation, month, year, added, removed, whitespace, files, patches, tool_source, tool_version, data_source) "
@@ -610,7 +612,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
"r.repo_group_id, info.a, info.b, info.c"
)).bindparams(tool_source=facade_helper.tool_source,tool_version=facade_helper.tool_version,data_source=facade_helper.data_source)

execute_sql(cache_projects_by_month)
# session.execute_sql(cache_projects_by_month)

cache_projects_by_year = s.sql.text((
"INSERT INTO dm_repo_group_annual (repo_group_id, email, affiliation, year, added, removed, whitespace, files, patches, tool_source, tool_version, data_source) "
Expand Down Expand Up @@ -650,7 +652,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):



execute_sql(cache_projects_by_year)
# session.execute_sql(cache_projects_by_year)
# Start caching by repo

facade_helper.log_activity('Verbose','Caching repos')
@@ -690,7 +692,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
"a.repo_id, info.a, info.b, info.c"
)).bindparams(tool_source=facade_helper.tool_source,tool_version=facade_helper.tool_version,data_source=facade_helper.data_source)

execute_sql(cache_repos_by_week)
# session.execute_sql(cache_repos_by_week)

cache_repos_by_month = s.sql.text((
"INSERT INTO dm_repo_monthly (repo_id, email, affiliation, month, year, added, removed, whitespace, files, patches, tool_source, tool_version, data_source)"
@@ -726,7 +728,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
"a.repo_id, info.a, info.b, info.c"
)).bindparams(tool_source=facade_helper.tool_source,tool_version=facade_helper.tool_version,data_source=facade_helper.data_source)

execute_sql(cache_repos_by_month)
# session.execute_sql(cache_repos_by_month)

cache_repos_by_year = s.sql.text((
"INSERT INTO dm_repo_annual (repo_id, email, affiliation, year, added, removed, whitespace, files, patches, tool_source, tool_version, data_source)"
@@ -760,7 +762,7 @@ def rebuild_unknown_affiliation_and_web_caches(facade_helper):
"a.repo_id, info.a, info.b, info.c"
)).bindparams(tool_source=facade_helper.tool_source,tool_version=facade_helper.tool_version,data_source=facade_helper.data_source)

execute_sql(cache_repos_by_year)
# session.execute_sql(cache_repos_by_year)

# Reset cache flags

26 changes: 13 additions & 13 deletions augur/tasks/github/messages/tasks.py
Expand Up @@ -5,12 +5,12 @@
from augur.tasks.init.celery_app import AugurCoreRepoCollectionTask
from augur.application.db.data_parse import *
[pylint] W0401: Wildcard import augur.application.db.data_parse (wildcard-import)

from augur.tasks.github.util.github_paginator import GithubPaginator
from augur.tasks.github.util.github_random_key_auth import GithubRandomKeyAuth
from augur.tasks.github.util.github_task_session import GithubTaskManifest
from augur.tasks.util.worker_util import remove_duplicate_dicts
from augur.tasks.github.util.util import get_owner_repo
from augur.application.db.models import Message, PullRequestMessageRef, IssueMessageRef, Contributor
from augur.application.db.lib import get_repo_by_repo_git, bulk_insert_dicts, get_issues_by_repo_id, get_pull_requests_by_repo_id

from augur.application.db.models import PullRequest, Message, Issue, PullRequestMessageRef, IssueMessageRef, Contributor, Repo, CollectionStatus
from augur.application.db import get_engine, get_session
from sqlalchemy.sql import text

platform_id = 1

@@ -27,8 +27,8 @@ def collect_github_messages(repo_git: str) -> None:
Repo.repo_git == repo_git).one().repo_id

owner, repo = get_owner_repo(repo_git)

task_name = f"{owner}/{repo}: Message Task"


if is_repo_small(repo_id):
message_data = fast_retrieve_all_pr_and_issue_messages(repo_git, logger, manifest.key_auth, task_name)
@@ -133,7 +133,7 @@ def process_large_issue_and_pr_message_collection(repo_id, repo_git: str, logge
process_messages(all_data, task_name, repo_id, logger, augur_db)


def process_messages(messages, task_name, repo_id, logger):
def process_messages(messages, task_name, repo_id, logger, augur_db):
[pylint] R0914: Too many local variables (39/30) (too-many-locals)
[pylint] R0912: Too many branches (18/12) (too-many-branches)
[pylint] R0915: Too many statements (78/50) (too-many-statements)


tool_source = "Pr comment task"
tool_version = "2.0"
@@ -152,13 +152,13 @@ def process_messages(messages, task_name, repo_id, logger):

# create mapping from issue url to issue id of current issues
issue_url_to_id_map = {}
issues = get_issues_by_repo_id(repo_id)
issues = augur_db.session.query(Issue).filter(Issue.repo_id == repo_id).all()
for issue in issues:
issue_url_to_id_map[issue.issue_url] = issue.issue_id

# create mapping from pr url to pr id of current pull requests
pr_issue_url_to_id_map = {}
prs = get_pull_requests_by_repo_id(repo_id)
prs = augur_db.session.query(PullRequest).filter(PullRequest.repo_id == repo_id).all()
for pr in prs:
pr_issue_url_to_id_map[pr.pr_issue_url] = pr.pull_request_id

@@ -229,13 +229,13 @@ def process_messages(messages, task_name, repo_id, logger):
contributors = remove_duplicate_dicts(contributors)

logger.info(f"{task_name}: Inserting {len(contributors)} contributors")
bulk_insert_dicts(logger, contributors, Contributor, ["cntrb_id"])
augur_db.insert_data(contributors, Contributor, ["cntrb_id"])

logger.info(f"{task_name}: Inserting {len(message_dicts)} messages")
message_natural_keys = ["platform_msg_id", "pltfrm_id"]
message_return_columns = ["msg_id", "platform_msg_id"]
message_string_fields = ["msg_text"]
message_return_data = bulk_insert_dicts(logger, message_dicts, Message, message_natural_keys,
message_return_data = augur_db.insert_data(message_dicts, Message, message_natural_keys,
return_columns=message_return_columns, string_fields=message_string_fields)
if message_return_data is None:
return
@@ -258,11 +258,11 @@

logger.info(f"{task_name}: Inserting {len(pr_message_ref_dicts)} pr messages ref rows")
pr_message_ref_natural_keys = ["pull_request_id", "pr_message_ref_src_comment_id"]
bulk_insert_dicts(logger, pr_message_ref_dicts, PullRequestMessageRef, pr_message_ref_natural_keys)
augur_db.insert_data(pr_message_ref_dicts, PullRequestMessageRef, pr_message_ref_natural_keys)

logger.info(f"{task_name}: Inserting {len(issue_message_ref_dicts)} issue messages ref rows")
issue_message_ref_natural_keys = ["issue_id", "issue_msg_ref_src_comment_id"]
bulk_insert_dicts(logger, issue_message_ref_dicts, IssueMessageRef, issue_message_ref_natural_keys)
augur_db.insert_data(issue_message_ref_dicts, IssueMessageRef, issue_message_ref_natural_keys)

logger.info(f"{task_name}: Inserted {len(message_dicts)} messages. {len(issue_message_ref_dicts)} from issues and {len(pr_message_ref_dicts)} from prs")

@@ -287,4 +287,4 @@ def process_github_comment_contributors(message, tool_source, tool_version, dat
# This is done by searching all the dicts for the given key that has the specified value
def find_dict_in_list_of_dicts(data, key, value):

return next((item for item in data if item[key] == value), None)
return next((item for item in data if item[key] == value), None)
43 changes: 29 additions & 14 deletions augur/tasks/github/pull_requests/files_model/core.py
@@ -2,22 +2,26 @@
from augur.tasks.github.util.gh_graphql_entities import GraphQlPageCollection
from augur.application.db.models import *
[pylint] W0401: Wildcard import augur.application.db.models (wildcard-import)

from augur.tasks.github.util.util import get_owner_repo
from augur.application.db.lib import bulk_insert_dicts, execute_sql
from augur.application.db.util import execute_session_query
import traceback

def pull_request_files_model(repo,logger, key_auth):
def pull_request_files_model(repo_id,logger, augur_db, key_auth):

# query existing PRs and the respective url we will append the commits url to
pr_number_sql = s.sql.text("""
SELECT DISTINCT pr_src_number as pr_src_number, pull_requests.pull_request_id
FROM pull_requests--, pull_request_meta
WHERE repo_id = :repo_id
""").bindparams(repo_id=repo.repo_id)
""").bindparams(repo_id=repo_id)
pr_numbers = []
#pd.read_sql(pr_number_sql, self.db, params={})

result = execute_sql(pr_number_sql)#.fetchall()
result = augur_db.execute_sql(pr_number_sql)#.fetchall()
pr_numbers = [dict(row) for row in result.mappings()]

query = augur_db.session.query(Repo).filter(Repo.repo_id == repo_id)
repo = execute_session_query(query, 'one')

owner, name = get_owner_repo(repo.repo_git)

pr_file_rows = []
@@ -59,20 +63,31 @@ def pull_request_files_model(repo,logger, key_auth):
'values' : values
}


logger.debug(f"query: {query}; key_auth: {key_auth}; params: {params}")
file_collection = GraphQlPageCollection(query, key_auth, logger,bind=params)

pr_file_rows += [{
'pull_request_id': pr_info['pull_request_id'],
'pr_file_additions': pr_file['additions'] if 'additions' in pr_file else None,
'pr_file_deletions': pr_file['deletions'] if 'deletions' in pr_file else None,
'pr_file_path': pr_file['path'],
'data_source': 'GitHub API',
'repo_id': repo.repo_id,
} for pr_file in file_collection if pr_file and 'path' in pr_file]
logger.debug(f"Results of file_collection: {file_collection}")

for pr_file in file_collection:
logger.debug(f"CHECK: {repr(file_collection)}")
if pr_file and 'path' in pr_file:
logger.debug(f"Checks out for {repr(pr_file)} and {repr(file_collection)}")

try:
pr_file_rows += [{
'pull_request_id': pr_info['pull_request_id'],
'pr_file_additions': pr_file['additions'] if 'additions' in pr_file else None,
'pr_file_deletions': pr_file['deletions'] if 'deletions' in pr_file else None,
'pr_file_path': pr_file['path'],
'data_source': 'GitHub API',
'repo_id': repo_id,
} for pr_file in file_collection if pr_file and 'path' in pr_file]
except Exception as e:
logger.error(f"PR Files Traceback: {''.join(traceback.format_exception(None, e, e.__traceback__))}")



if len(pr_file_rows) > 0:
#Execute a bulk upsert with sqlalchemy
pr_file_natural_keys = ["pull_request_id", "repo_id", "pr_file_path"]
bulk_insert_dicts(logger, pr_file_rows, PullRequestFile, pr_file_natural_keys)
augur_db.insert_data(pr_file_rows, PullRequestFile, pr_file_natural_keys)
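The comprehension in this hunk guards optional GraphQL fields with checks like "'additions' in pr_file" before building row dicts. dict.get expresses the same guard more compactly; a sketch with made-up payloads (the helper name and sample data are illustrative, not Augur API):

```python
def build_pr_file_rows(repo_id, pull_request_id, files):
    """Build upsert-ready row dicts, skipping null entries and entries
    without a 'path', and defaulting missing counts to None."""
    return [{
        'pull_request_id': pull_request_id,
        'pr_file_additions': f.get('additions'),   # None when GitHub omits it
        'pr_file_deletions': f.get('deletions'),
        'pr_file_path': f['path'],
        'data_source': 'GitHub API',
        'repo_id': repo_id,
    } for f in files if f and 'path' in f]

# One good record, one null page entry, one record with no 'path'
rows = build_pr_file_rows(1, 42, [
    {'path': 'a.py', 'additions': 3},
    None,
    {'status': 'removed'},
])
```

Only the first entry survives the filter, which matches the natural keys ("pull_request_id", "repo_id", "pr_file_path") the bulk upsert relies on.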