Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message Collection Timeout: Large Repo/First Run #2889

Open
sgoggins opened this issue Aug 9, 2024 · 2 comments
Open

Message Collection Timeout: Large Repo/First Run #2889

sgoggins opened this issue Aug 9, 2024 · 2 comments
Assignees
Labels
API Related to Augur's metrics API bug Documents unexpected/wrong/buggy behavior critical-fix Should be addressed before any other issue/PRs database Related to Augur's unifed data model deployed version Live problems with deployed versions

Comments

@sgoggins
Copy link
Member

sgoggins commented Aug 9, 2024

There is an issue with long running message collection first jobs, where they sometimes do not finish for large repos due to the following error:

Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
    self.dialect.do_execute(
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL connection has been closed unexpectedly


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 43, in collect_github_messages
    process_large_issue_and_pr_message_collection(repo_id, repo_git, logger, manifest.key_auth, task_name, augur_db)
  File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 109, in process_large_issue_and_pr_message_collection
    process_messages(all_data, task_name, repo_id, logger, augur_db)
  File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 212, in process_messages
    augur_db.insert_data(contributors, Contributor, ["cntrb_id"])
  File "/home/ubuntu/github/augur/augur/application/db/session.py", line 188, in insert_data
    raise e
  File "/home/ubuntu/github/augur/augur/application/db/session.py", line 175, in insert_data
    connection.execute(stmnt)
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1416, in execute
    return meth(
           ^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1848, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1988, in _exec_single_context
    self._handle_dbapi_exception(
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2343, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
    self.dialect.do_execute(
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly

[SQL: INSERT INTO augur_data.contributors (cntrb_id, cntrb_login, cntrb_email, cntrb_full_name, cntrb_company, cntrb_created_at, cntrb_location, cntrb_canonical, cntrb_last_used, gh_user_id, gh_login, gh_url, gh_html_url, gh_node_id, gh_avatar_url, gh_gravatar_id, gh_followers_url, gh_following_url, gh_gists_url, gh_starred_url, gh_subscriptions_url, gh_organizations_url, gh_repos_url, gh_events_url, gh_received_events_url, gh_type, gh_site_admin, tool_source, tool_version, data_source) VALUES (%(cntrb_id_m0)s::UUID, %(cntrb_login_m0)s, %(cntrb_email_m0)s, %(cntrb_full_name_m0)s, %(cntrb_company_m0)s, %(cntrb_created_at_m0)s, %(cntrb_location_m0)s, %(cntrb_canonical_m0)s, %(cntrb_last_used_m0)s, %(gh_user_id_m0)s, %(gh_login_m0)s, %(gh_url_m0)s, %(gh_html_url_m0)s, %(gh_node_id_m0)s, %(gh_avatar_url_m0)s, %(gh_gravatar_id_m0)s, %(gh_followers_url_m0)s, %(gh_following_url_m0)s, %(gh_gists_url_m0)s, %(gh_starred_url_m0)s, %(gh_subscriptions_url_m0)s, %(gh_organizations_url_m0)s, %(gh_repos_url_m0)s, %(gh_events_url_m0)s, %(gh_received_events_url_m0)s, %(gh_type_m0)s, %(gh_site_admin_m0)s, %(tool_source_m0)s, %(tool_version_m0)s, %(data_source_m0)s), (%(cntrb_id_m1)s::UUID, %(cntrb_login_m1)s, %(cntrb_email_m1)s, %(cntrb_full_name_m1)s, %(cntrb_company_m1)s, %(cntrb_created_at_m1)s, %(cntrb_location_m1)s, %(cntrb_canonical_m1)s, %(cntrb_last_used_m1)s, %(gh_user_id_m1)s, %(gh_login_m1)s, %(gh_url_m1)s, %(gh_html_url_m1)s, %(gh_node_id_m1)s, %(gh_avatar_url_m1)s, %(gh_gravatar_id_m1)s, %(gh_followers_url_m1)s, %(gh_following_url_m1)s, %(gh_gists_url_m1)s, %(gh_starred_url_m1)s, %(gh_subscriptions_url_m1)s, %(gh_organizations_url_m1)s, %(gh_repos_url_m1)s, %(gh_events_url_m1)s, %(gh_received_events_url_m1)s, %(gh_type_m1)s, %(gh_site_admin_m1)s, %(tool_source_m1)s, %(tool_version_m1)s, %(data_source_m1)s), (%(cntrb_id_m2)s::UUID, %(cntrb_login_m2)s, %(cntrb_email_m2)s, %(cntrb_full_name_m2)s, %(cntrb_company_m2)s, %(cntrb_created_at_m2)s, %(cntrb_location_m2)s, %(cntrb_canonical_m2)s, %(cntrb_last_used_m2)s, %(gh_user_id_m2)s, %(gh_login_m2)s, %(gh_url_m2)s, %(gh_html_url_m2)s, %(gh_node_id_m2)s, %(gh_avatar_url_m2)s, %(gh_gravatar_id_m2)s, %(gh_followers_url_m2)s, %(gh_following_url_m2)s, %(gh_gists_url_m2)s, %(gh_starred_url_m2)s, %(gh_subscriptions_url_m2)s, %(gh_organizations_url_m2)s, %(gh_repos_url_m2)s, %(gh_events_url_m2)s, %(gh_received_events_url_m2)s, %(gh_type_m2)s, %(gh_site_admin_m2)s, %(tool_source_m2)s, %(tool_version_m2)s, %(data_source_m2)s), (%(cntrb_id_m3)s::UUID, %(cntrb_login_m3)s, %(cntrb_email_m3)s, %(cntrb_full_name_m3)s, %(cntrb_company_m3)s, %(cntrb_created_at_m3)s, %(cntrb_location_m3)s, %(cntrb_canonical_m3)s, %(cntrb_last_used_m3)s, %(gh_user_id_m3)s, %(gh_login_m3)s, %(gh_url_m3)s, %(gh_html_url_m3)s, %(gh_node_id_m3)s, %(gh_avatar_url_m3)s, %(gh_gravatar_id_m3)s, %(gh_followers_url_m3)s, %(gh_following_url_m3)s, %(gh_gists_url_m3)s, %(gh_starred_url_m3)s, %(gh_subscriptions_url_m3)s, %(gh_organizations_url_m3)s, %(gh_repos_url_m3)s, %(gh_events_url_m3)s, %(gh_received_events_url_m3)s, %(gh_type_m3)s, %(gh_site_admin_m3)s, %(tool_source_m3)s, %(tool_version_m3)s, %(data_source_m3)s) ON CONFLICT (cntrb_id) DO UPDATE SET cntrb_id = excluded.cntrb_id, cntrb_login = excluded.cntrb_login, cntrb_email = excluded.cntrb_email, cntrb_full_name = excluded.cntrb_full_name, cntrb_company = excluded.cntrb_company, cntrb_created_at = excluded.cntrb_created_at, cntrb_location = excluded.cntrb_location, cntrb_canonical = excluded.cntrb_canonical, cntrb_last_used = excluded.cntrb_last_used, gh_user_id = excluded.gh_user_id, gh_login = excluded.gh_login, gh_url = excluded.gh_url, gh_html_url = excluded.gh_html_url, gh_node_id = excluded.gh_node_id, gh_avatar_url = excluded.gh_avatar_url, gh_gravatar_id = excluded.gh_gravatar_id, gh_followers_url = excluded.gh_followers_url, gh_following_url = excluded.gh_following_url, gh_gists_url = excluded.gh_gists_url, gh_starred_url = excluded.gh_starred_url, gh_subscriptions_url = excluded.gh_subscriptions_url, gh_organizations_url = excluded.gh_organizations_url, gh_repos_url = excluded.gh_repos_url, gh_events_url = excluded.gh_events_url, gh_received_events_url = excluded.gh_received_events_url, gh_type = excluded.gh_type, gh_site_admin = excluded.gh_site_admin, tool_source = excluded.tool_source, tool_version = excluded.tool_version, data_source = excluded.data_source]
[parameters: {'cntrb_id_m0': UUID('01000027-9900-0000-0000-000000000000'), 'cntrb_login_m0': 'ghost', 'cntrb_email_m0': None, 'cntrb_full_name_m0': None, 'cntrb_company_m0': None, 'cntrb_created_at_m0': None, 'cntrb_location_m0': None, 'cntrb_canonical_m0': None, 'cntrb_last_used_m0': None, 'gh_user_id_m0': 10137, 'gh_login_m0': 'ghost', 'gh_url_m0': 'https://api.github.com/users/ghost', 'gh_html_url_m0': 'https://github.com/ghost', 'gh_node_id_m0': 'MDQ6VXNlcjEwMTM3', 'gh_avatar_url_m0': 'https://avatars.githubusercontent.com/u/10137?v=4', 'gh_gravatar_id_m0': '', 'gh_followers_url_m0': 'https://api.github.com/users/ghost/followers', 'gh_following_url_m0': 'https://api.github.com/users/ghost/following{/other_user}', 'gh_gists_url_m0': 'https://api.github.com/users/ghost/gists{/gist_id}', 'gh_starred_url_m0': 'https://api.github.com/users/ghost/starred{/owner}{/repo}', 'gh_subscriptions_url_m0': 'https://api.github.com/users/ghost/subscriptions', 'gh_organizations_url_m0': 'https://api.github.com/users/ghost/orgs', 'gh_repos_url_m0': 'https://api.github.com/users/ghost/repos', 'gh_events_url_m0': 'https://api.github.com/users/ghost/events{/privacy}', 'gh_received_events_url_m0': 'https://api.github.com/users/ghost/received_events', 'gh_type_m0': 'User', 'gh_site_admin_m0': False, 'tool_source_m0': 'Pr comment task', 'tool_version_m0': '2.0', 'data_source_m0': 'Github API', 'cntrb_id_m1': UUID('0106da09-7500-0000-0000-000000000000'), 'cntrb_login_m1': 'stephengillie', 'cntrb_email_m1': None, 'cntrb_full_name_m1': None, 'cntrb_company_m1': None, 'cntrb_created_at_m1': None, 'cntrb_location_m1': None, 'cntrb_canonical_m1': None, 'cntrb_last_used_m1': None, 'gh_user_id_m1': 114952565, 'gh_login_m1': 'stephengillie', 'gh_url_m1': 'https://api.github.com/users/stephengillie', 'gh_html_url_m1': 'https://github.com/stephengillie', 'gh_node_id_m1': 'U_kgDOBtoJdQ', 'gh_avatar_url_m1': 'https://avatars.githubusercontent.com/u/114952565?v=4', 'gh_gravatar_id_m1': '', 'gh_followers_url_m1': 'https://api.github.com/users/stephengillie/followers', 'gh_following_url_m1': 'https://api.github.com/users/stephengillie/following{/other_user}', 'gh_gists_url_m1': 'https://api.github.com/users/stephengillie/gists{/gist_id}', 'gh_starred_url_m1': 'https://api.github.com/users/stephengillie/starred{/owner}{/repo}' ... 20 parameters truncated ... 'gh_login_m2': 'wingetbot', 'gh_url_m2': 'https://api.github.com/users/wingetbot', 'gh_html_url_m2': 'https://github.com/wingetbot', 'gh_node_id_m2': 'MDQ6VXNlcjYzODE2OTk5', 'gh_avatar_url_m2': 'https://avatars.githubusercontent.com/u/63816999?v=4', 'gh_gravatar_id_m2': '', 'gh_followers_url_m2': 'https://api.github.com/users/wingetbot/followers', 'gh_following_url_m2': 'https://api.github.com/users/wingetbot/following{/other_user}', 'gh_gists_url_m2': 'https://api.github.com/users/wingetbot/gists{/gist_id}', 'gh_starred_url_m2': 'https://api.github.com/users/wingetbot/starred{/owner}{/repo}', 'gh_subscriptions_url_m2': 'https://api.github.com/users/wingetbot/subscriptions', 'gh_organizations_url_m2': 'https://api.github.com/users/wingetbot/orgs', 'gh_repos_url_m2': 'https://api.github.com/users/wingetbot/repos', 'gh_events_url_m2': 'https://api.github.com/users/wingetbot/events{/privacy}', 'gh_received_events_url_m2': 'https://api.github.com/users/wingetbot/received_events', 'gh_type_m2': 'User', 'gh_site_admin_m2': False, 'tool_source_m2': 'Pr comment task', 'tool_version_m2': '2.0', 'data_source_m2': 'Github API', 'cntrb_id_m3': UUID('01023116-4900-0000-0000-000000000000'), 'cntrb_login_m3': 'azure-pipelines[bot]', 'cntrb_email_m3': None, 'cntrb_full_name_m3': None, 'cntrb_company_m3': None, 'cntrb_created_at_m3': None, 'cntrb_location_m3': None, 'cntrb_canonical_m3': None, 'cntrb_last_used_m3': None, 'gh_user_id_m3': 36771401, 'gh_login_m3': 'azure-pipelines[bot]', 'gh_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D', 'gh_html_url_m3': 'https://github.com/apps/azure-pipelines', 'gh_node_id_m3': 'MDM6Qm90MzY3NzE0MDE=', 'gh_avatar_url_m3': 'https://avatars.githubusercontent.com/in/9426?v=4', 'gh_gravatar_id_m3': '', 'gh_followers_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/followers', 'gh_following_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/following{/other_user}', 'gh_gists_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/gists{/gist_id}', 'gh_starred_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/starred{/owner}{/repo}', 'gh_subscriptions_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/subscriptions', 'gh_organizations_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/orgs', 'gh_repos_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/repos', 'gh_events_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/events{/privacy}', 'gh_received_events_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/received_events', 'gh_type_m3': 'Bot', 'gh_site_admin_m3': False, 'tool_source_m3': 'Pr comment task', 'tool_version_m3': '2.0', 'data_source_m3': 'Github API'}]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1119, in _rollback_impl
    self.engine.dialect.do_rollback(self.connection)
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 692, in do_rollback
    dbapi_connection.rollback()
psycopg2.InterfaceError: connection already closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 22, in collect_github_messages
    with GithubTaskManifest(logger) as manifest:
  File "/home/ubuntu/github/augur/augur/tasks/github/util/github_task_session.py", line 26, in __exit__
    self.augur_db.close()
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2466, in close
    self._close_impl(invalidate=False)
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2535, in _close_impl
    transaction.close(invalidate)
  File "<string>", line 2, in close
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1362, in close
    transaction.close()
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2576, in close
    self._do_close()
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2714, in _do_close
    self._close_impl()
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2700, in _close_impl
    self._connection_rollback_impl()
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2692, in _connection_rollback_impl
    self.connection._rollback_impl()
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1121, in _rollback_impl
    self._handle_dbapi_exception(e, None, None, None, None)
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2343, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1119, in _rollback_impl
    self.engine.dialect.do_rollback(self.connection)
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 692, in do_rollback
    dbapi_connection.rollback()
sqlalchemy.exc.InterfaceError: (psycopg2.InterfaceError) connection already closed
(Background on this error at: https://sqlalche.me/e/20/rvf5)

Suggested Fix:

  • Test for loss of connection
  • If connection is lost, open a new one instead of losing all the data in a transaction and setting status to fail
  • Big issue is that "core collection" includes a lot of jobs, and when this one fails, all of them get rerun on the same repository again and again.

Specifics:

@sgoggins sgoggins converted this from a draft issue Aug 9, 2024
@sgoggins
Copy link
Member Author

sgoggins commented Aug 9, 2024

This query :

select msg_id, msg_timestamp, data_collection_date 
from message 
where repo_id=50183 order by msg_timestamp desc; 

shows that the message data is actually quite current:

msg_id	        msg_timestamp	        data_collection_date
1187399851	2024-08-07 23:24:23	2024-08-08 07:31:31
1187971112	2024-08-07 16:07:19	2024-08-08 16:24:42
1187971111	2024-08-07 15:50:44	2024-08-08 16:24:42

A reverse order of the same query show our earliest messages are from April, 2020:

msg_id	        msg_timestamp	        data_collection_date
32392606	2020-04-30 21:38:44	2023-05-11 06:24:54
32392607	2020-04-30 21:38:53	2023-05-11 06:24:54
32392608	2020-04-30 21:41:03	2023-05-11 06:24:54

So, our message collection seems "complete" as April 2020 is when this project started.

@sgoggins sgoggins added bug Documents unexpected/wrong/buggy behavior deployed version Live problems with deployed versions API Related to Augur's metrics API database Related to Augur's unifed data model critical-fix Should be addressed before any other issue/PRs labels Aug 9, 2024
@ABrain7710
Copy link
Contributor

Waiting to see if this occurs again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Related to Augur's metrics API bug Documents unexpected/wrong/buggy behavior critical-fix Should be addressed before any other issue/PRs database Related to Augur's unifed data model deployed version Live problems with deployed versions
Projects
Status: In Progress
Development

No branches or pull requests

2 participants