✨ export data to postgres through hasura graphql api #15

Draft
wants to merge 15 commits into
base: main

@hgwood hgwood commented Nov 1, 2019

Goal

The goal is to export all the data fetched from GitHub into a database, with history, so that we can access the data easily and quickly and explore how it evolves over time.

Implementation

I'm using Hasura on top of Postgres to define the SQL schema as well as expose a GraphQL API to insert and read the data easily.

Instead of modifying the existing code to insert into Postgres rather than write JSON files, I've written a separate file, hasura.js, that reads the JSON files generated by index.js and loads them into Postgres. All of the data is inserted using one big GraphQL query. On each run, the entire data graph is inserted as new entities, so that older entities are kept intact.
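As a rough illustration of the "one big GraphQL query" approach, here is a minimal sketch of how a nested snapshot could be turned into a single Hasura nested insert. It is not the actual content of hasura.js. The shape of `snapshot`, the `name` column, and the helper names `toInsertInput` and `buildRequestBody` are assumptions made for this example; only the `organization`, `member`, `owner` and `repositories` names come from the queries in this description.

```javascript
// Hypothetical shape of the JSON written by index.js; the real files may differ.
const snapshot = {
  login: "my-org",
  members: [
    { login: "alice", repositories: [{ name: "repo-a" }, { name: "repo-b" }] },
    { login: "bob", repositories: [] },
  ],
};

// Hasura nested inserts wrap the rows of a relationship as { data: ... }.
// No ids are sent here, on the assumption that the database generates them,
// so every run would create a fresh set of rows next to the older ones.
function toInsertInput(organization) {
  return {
    login: organization.login,
    members: {
      data: organization.members.map((member) => ({
        owner: { data: { login: member.login } },
        repositories: {
          data: member.repositories.map((repository) => ({
            name: repository.name,
          })),
        },
      })),
    },
  };
}

// One mutation carries the whole graph. The input type name follows
// Hasura's <table>_insert_input convention for a table named "organization".
const INSERT_ORGANIZATION = `
  mutation insertOrganization($organization: organization_insert_input!) {
    insert_organization(objects: [$organization]) {
      affected_rows
    }
  }
`;

// Build the JSON body that would be POSTed to HASURA_GRAPHQL_URL.
function buildRequestBody(organization) {
  return {
    query: INSERT_ORGANIZATION,
    variables: { organization: toInsertInput(organization) },
  };
}
```

Calling `buildRequestBody(snapshot)` returns a single `{ query, variables }` payload, so the whole graph is sent in one request rather than one request per entity.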

How to load up the data into Postgres

  • create a .env file and fill in GITHUB_ID, GITHUB_OAUTH and GITHUB_ORGA
  • run npm start generate
  • run docker-compose up -d
  • add HASURA_ADMIN_SECRET=hasura and HASURA_GRAPHQL_URL=http://localhost:8080/v1/graphql to .env
  • run node -r dotenv/config src/hasura.js

The Hasura container may start faster than the Postgres one, and fail to connect. In this case run docker-compose restart hasura.
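To avoid running the loader before Hasura is reachable, a small readiness check can poll Hasura's `/healthz` endpoint first. The sketch below is an assumption about how this could be done, not part of this PR: `waitForHasura` and its options are hypothetical names, and the check only detects readiness, it does not restart the container.

```javascript
// Poll Hasura's health endpoint until it answers with a 2xx status.
// graphqlUrl is the value of HASURA_GRAPHQL_URL; the health URL is derived
// from its origin, e.g. http://localhost:8080/healthz.
async function waitForHasura(
  graphqlUrl,
  { attempts = 10, delayMs = 1000, fetchImpl = globalThis.fetch } = {}
) {
  const healthUrl = new URL("/healthz", graphqlUrl).toString();
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetchImpl(healthUrl);
      if (response.ok) {
        return attempt; // number of attempts that were needed
      }
    } catch (error) {
      // Connection refused or similar: Hasura is not listening yet.
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(
    `Hasura did not become healthy after ${attempts} attempts; ` +
      "try running docker-compose restart hasura"
  );
}
```

It could be awaited at the top of the loader, for example `await waitForHasura(process.env.HASURA_GRAPHQL_URL)`, assuming a Node version that provides a global `fetch` (18 or later).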

Examples of queries

Once the data is loaded, browse to http://localhost:8080 (the password is hasura). In the GraphiQL tab, we can test some GraphQL queries:

History of the number of repositories for a member

query numberOfRepositoriesForMember($login: String!) {
  member(
    where: {
      owner: { login: { _eq: $login } }
    }
    order_by: {
      organization: { fetched_at: desc }
    }
  ) {
    organization {
      fetched_at
    }
    repositories_aggregate {
      aggregate {
        count
      }
    }
  }
}
{
  "data": {
    "member": [
      {
        "organization": {
          "fetched_at": "2019-10-30T15:08:32.387283+00:00"
        },
        "repositories_aggregate": {
          "aggregate": {
            "count": 2
          }
        }
      },
      {
        "organization": {
          "fetched_at": "2019-10-30T15:03:46.044145+00:00"
        },
        "repositories_aggregate": {
          "aggregate": {
            "count": 2
          }
        }
      },
      {
        "organization": {
          "fetched_at": "2019-10-30T14:55:38.577492+00:00"
        },
        "repositories_aggregate": {
          "aggregate": {
            "count": 1
          }
        }
      }
    ]
  }
}
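
The query above can also be run outside GraphiQL. The following is a minimal sketch, assuming Node 18 or later for the global `fetch` and the two environment variables from the setup steps. The helper names `buildRequest`, `toHistory` and `fetchRepositoryHistory` are hypothetical; the `x-hasura-admin-secret` header is how Hasura expects the admin secret to be sent.

```javascript
const NUMBER_OF_REPOSITORIES_FOR_MEMBER = `
  query numberOfRepositoriesForMember($login: String!) {
    member(
      where: { owner: { login: { _eq: $login } } }
      order_by: { organization: { fetched_at: desc } }
    ) {
      organization { fetched_at }
      repositories_aggregate { aggregate { count } }
    }
  }
`;

// Build the HTTP request for the GraphQL endpoint without sending it.
function buildRequest(login, env = process.env) {
  return {
    url: env.HASURA_GRAPHQL_URL,
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-hasura-admin-secret": env.HASURA_ADMIN_SECRET,
      },
      body: JSON.stringify({
        query: NUMBER_OF_REPOSITORIES_FOR_MEMBER,
        variables: { login },
      }),
    },
  };
}

// Flatten the GraphQL response into one row per snapshot.
function toHistory(result) {
  return result.data.member.map((member) => ({
    fetchedAt: member.organization.fetched_at,
    repositories: member.repositories_aggregate.aggregate.count,
  }));
}

// Send the query and return the flattened history, newest snapshot first.
async function fetchRepositoryHistory(login) {
  const { url, options } = buildRequest(login);
  const response = await fetch(url, options);
  const result = await response.json();
  if (result.errors) {
    throw new Error(JSON.stringify(result.errors));
  }
  return toHistory(result);
}
```

With the stack running, `fetchRepositoryHistory("some-login")` would resolve to an array of `{ fetchedAt, repositories }` rows in the same order as the response shown above.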

History of contributions of a member

query contributionHistoryOfMember($login: String!) {
  member(
    where: {
      owner: { login: { _eq: $login } }
    }
    order_by: {
      organization: { fetched_at: desc }
    }
  ) {
    organization {
      fetched_at
    }
    contribution_stats {
      total_commit_contributions
      total_issue_contributions
      total_pull_request_contributions
      total_pull_request_review_contributions
      total_repository_contributions
    }
  }
}
{
  "data": {
    "member": [
      {
        "organization": {
          "fetched_at": "2019-11-01T22:19:37.413963+00:00"
        },
        "contribution_stats": {
          "total_commit_contributions": 2,
          "total_issue_contributions": 3,
          "total_pull_request_contributions": 4,
          "total_pull_request_review_contributions": 0,
          "total_repository_contributions": 4
        }
      },
      {
        "organization": {
          "fetched_at": "2019-11-01T22:00:26.591278+00:00"
        },
        "contribution_stats": {
          "total_commit_contributions": 2,
          "total_issue_contributions": 3,
          "total_pull_request_contributions": 4,
          "total_pull_request_review_contributions": 0,
          "total_repository_contributions": 4
        }
      }
    ]
  }
}
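
Since every snapshot is kept, the evolution between two fetches can be derived from a response like the one above. This is a small sketch under the assumption that the `member` array is ordered newest first, as the `order_by` clause requests; `contributionDeltas` is a hypothetical helper, not part of this PR.

```javascript
const STAT_FIELDS = [
  "total_commit_contributions",
  "total_issue_contributions",
  "total_pull_request_contributions",
  "total_pull_request_review_contributions",
  "total_repository_contributions",
];

// Given the `member` array of the response (newest snapshot first), return
// the change in each statistic between consecutive snapshots, oldest first.
function contributionDeltas(members) {
  const oldestFirst = [...members].reverse();
  const deltas = [];
  for (let i = 1; i < oldestFirst.length; i++) {
    const previous = oldestFirst[i - 1].contribution_stats;
    const current = oldestFirst[i].contribution_stats;
    const delta = { fetched_at: oldestFirst[i].organization.fetched_at };
    for (const field of STAT_FIELDS) {
      delta[field] = current[field] - previous[field];
    }
    deltas.push(delta);
  }
  return deltas;
}
```

For the two snapshots shown above, whose statistics are identical, this yields a single entry in which every field is 0.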

Improvements to make

  • There are currently a lot of migration files. They could be squashed into one file.
  • I think the fetched_at field, which is currently on the organization entity and denotes the time the data was fetched, needs to be moved to its own entity.
  • There is currently no way, from a contributor entity, to get back to a member entity (if the contributor is also a member).

@hgwood hgwood self-assigned this Nov 1, 2019
@hgwood hgwood requested a review from bpetetot November 1, 2019 23:32

hgwood commented Nov 26, 2019

This branch is now based on multi-organizations.
