retrosheet using deprecated GitHub authentication #455

Open
B-D-T opened this issue Sep 24, 2024 · 3 comments

@B-D-T

B-D-T commented Sep 24, 2024

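The PyGithub package has moved to a newer form of personal access token (PAT) authentication for the GitHub REST API: passing the token as the first positional argument (login_or_token) is deprecated in favor of auth=github.Auth.Token(...). The code in retrosheet.py needs to be updated to reflect the new approach.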

Steps to replicate

Call any function that uses the g = Github(GH_TOKEN) line. For example,

pybaseball.rosters(2023)
# or, more explicitly:
pybaseball.retrosheet.rosters(2023)

Issue details

The retrosheet.py file reads a GH_TOKEN environment variable that holds the user's GitHub personal access token.

GH_TOKEN=os.getenv('GH_TOKEN', '')

It then passes the token when instantiating a Github object:

g = Github(GH_TOKEN)

That triggers the following within the github library:

File c:\...\Lib\site-packages\github\MainClass.py:230, in Github.__init__(self, login_or_token, password, jwt, app_auth, base_url, timeout, user_agent, per_page, verify, retry, pool_size, seconds_between_requests, seconds_between_writes, auth)
    225 elif login_or_token is not None:
    226     warnings.warn(
    227         "Argument login_or_token is deprecated, please use " "auth=github.Auth.Token(...) instead",
    228         category=DeprecationWarning,
    229     )
--> 230     auth = github.Auth.Token(login_or_token)
    231 elif jwt is not None:
    232     warnings.warn(
    233         "Argument jwt is deprecated, please use "
    234         "auth=github.Auth.AppAuth(...) or "
    235         "auth=github.Auth.AppAuthToken(...) instead",
    236         category=DeprecationWarning,
    237     )
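
The branch shown above only emits a DeprecationWarning before building the token auth object. To check whether a particular call goes through a deprecated path, the standard warnings module can record what gets emitted. This is a minimal sketch: legacy_client is a hypothetical stand-in that mimics the shape of that branch (it is not PyGithub code), and deprecation_messages is a helper name made up for illustration.

```python
import warnings


def legacy_client(login_or_token=None, auth=None):
    """Hypothetical stand-in mimicking the deprecated branch (not PyGithub code)."""
    if login_or_token is not None:
        warnings.warn(
            "Argument login_or_token is deprecated, please use auth=... instead",
            category=DeprecationWarning,
        )
        auth = ("token", login_or_token)
    return auth


def deprecation_messages(func, *args, **kwargs):
    """Call func and return the text of every DeprecationWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" makes sure the warning is recorded even if it was shown before.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


# Positional token: goes through the deprecated branch.
print(deprecation_messages(legacy_client, "my-token"))
# Keyword auth: no deprecation warning is recorded.
print(deprecation_messages(legacy_client, auth=("token", "my-token")))
```

The same helper could in principle wrap a real call such as Github(GH_TOKEN), but that has not been verified here.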

Perhaps this is related to #414, although the notes make it look like that one is all set.

How to fix

I got it working on my end using code similar to the following:

from github import Github, Auth
...
auth = Auth.Token(GH_TOKEN)
g = Github(auth=auth)
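
For completeness, here is a slightly fuller sketch of the same idea that still reads the token from the environment, as retrosheet.py does today. The helper name github_client_from_env is hypothetical, and falling back to an unauthenticated client when the variable is empty is my own assumption about the desired behavior, not something the library does.

```python
import os


def github_client_from_env(var_name="GH_TOKEN"):
    """Build a PyGithub client from an environment variable (hypothetical helper)."""
    # Imported inside the function so this sketch loads even without PyGithub.
    from github import Auth, Github

    token = os.getenv(var_name, "").strip()
    if not token:
        # Assumption: with no token, use an unauthenticated client
        # (subject to GitHub's lower anonymous rate limit).
        return Github()
    # New-style authentication: wrap the token and pass it by keyword.
    return Github(auth=Auth.Token(token))
```

With this in place, the g = Github(GH_TOKEN) lines would become g = github_client_from_env().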
@jackseeburger

Did you add this code to the library itself or run it separately? I tried just dropping it into my notebook, but I'm still having the same issues with retrosheet.

@DattiloJohn

@B-D-T what exactly did you change to get this to work?

@B-D-T
Author

B-D-T commented Dec 23, 2024

Hi @jackseeburger & @DattiloJohn:

Below is a sample of my code. In short, I duplicated certain sections of code from the pybaseball library, but with the necessary changes to authenticate without using an environment variable.

Here's an example of what I changed, using the rosters() method in the pybaseball.retrosheet module as the sample.

pybaseball code

def rosters(season):

    ###############
    # NOTE: This is the problematic part as it's looking for the environment variable
    GH_TOKEN=os.getenv('GH_TOKEN', '')
    ###############

    try:
        g = Github(GH_TOKEN)
        repo = g.get_repo('chadwickbureau/retrosheet')
        tree = repo.get_git_tree('master')
        for t in tree.tree:
            if t.path == 'rosters':
                subtree = t
        rosters = [t.path for t in repo.get_git_tree(subtree.sha).tree if str(season) in t.path]
        if len(rosters) == 0:
            raise ValueError(f'Rosters not available for {season}')
    except RateLimitExceededException:
        warnings.warn('Github rate limit exceeded. Cannot check if the file you want exists.', UserWarning)

    df_list = [_roster(team = r[:3], season = season, checked=False) for r in rosters]

    return pd.concat(df_list)

My code

import warnings

import pandas as pd
from github import Github, Auth as GithubAuth, RateLimitExceededException
import pybaseball

class GetDataBySeason:
    def __init__(self) -> None:
        ###############
        # NOTE: Authentication without using an environment variable

        # Set your personal access token here
        gh_personal_access_token_pybaseball: str = 'github_pat_11...'

        # Pass the token to the GithubAuth.Token class
        auth = GithubAuth.Token(gh_personal_access_token_pybaseball)

        # Create a Github object with the token
        self.gh_api_obj = Github(auth=auth)
        ###############

    def get_retrosheet_roster_by_season(self, season):
        try:
            # NOTE This uses the authenticated `Github` object
            repo = self.gh_api_obj.get_repo('chadwickbureau/retrosheet')
            # Keep only the file name from each path under seasons/<season>
            season_folder = [f.path[f.path.rfind('/')+1:] for f in repo.get_contents(f'seasons/{season}')]
            rosters = [t for t in season_folder if t.endswith('.ROS')]
            if len(rosters) == 0:
                raise ValueError(f'Rosters not available for {season}')
        except RateLimitExceededException:
            # Catch only the rate-limit error so the ValueError above is not swallowed
            warnings.warn('Github rate limit exceeded. Cannot check if the file you want exists.', UserWarning)
            # `rosters` was never assigned in this case, so re-raise instead of continuing
            raise

        df_list = [self.get_retrosheet_roster_by_season_team(team = r[:3], season = season, checked=False) for r in rosters]

        return pd.concat(df_list)
    
    def get_retrosheet_roster_by_season_team(self, season, team, checked = False):
        # Similar code to replicate pybaseball's `_roster` method
        ...

I didn't want to replicate the methods, since that's not sustainable at scale, but it served my one-off purpose at the time. It's probably better to just change your local pybaseball source code; the only reasons I didn't go that route are that I move across devices a lot and that I didn't want to lose my changes on future updates to the pybaseball package.
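
A possible middle ground between copying methods and editing the installed package is to swap the name the module calls at runtime. This is only a sketch under assumptions I have not verified: that retrosheet.py looks up Github as a module-level name each time it runs g = Github(GH_TOKEN), and that GH_TOKEN is still set in the environment, because the library reads it itself. token_compat is a hypothetical helper name.

```python
def token_compat(github_cls, token_auth_cls):
    """Return a callable that accepts the old positional-token call style
    and forwards the token through the newer auth= keyword instead."""

    def factory(login_or_token=None, **kwargs):
        if login_or_token:
            # Wrap the raw token, e.g. with github.Auth.Token.
            kwargs["auth"] = token_auth_cls(login_or_token)
        # An empty or missing token yields an unauthenticated client.
        return github_cls(**kwargs)

    return factory


# Intended use (untested against pybaseball; assumes the names exist as shown):
#
#   from github import Auth, Github
#   import pybaseball.retrosheet as retrosheet
#
#   retrosheet.Github = token_compat(Github, Auth.Token)
#   retrosheet.rosters(2023)
```

The patch lives in your own script or notebook, so it survives package updates, but it can stop working if a future pybaseball release changes how it imports or calls Github.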

Hope this helps!
