Telescope Hacktoberfest Guide and Tools #3738

Open
humphd opened this issue Oct 26, 2022 · 2 comments
Labels
type: discussion Requires conversation type: enhancement New feature or request type: research Requires researching and commenting what you have found

Comments

@humphd
Contributor

humphd commented Oct 26, 2022

I want to see us add a new "feature" to Telescope. This issue will serve as a starting point, but we'll need to file specific issues for the different parts. I'll get the ball rolling.

Telescope users have the following in common:

  • are Seneca students
  • take (or took) the open source courses
  • work (or worked) on the Telescope project in some capacity
  • participate (or participated) in Hacktoberfest
  • blog (or blogged) about their work
  • left a record of the PRs they created and projects they worked on in one of our wikis

Students tell me over and over that they struggle to find issues to work on during Hacktoberfest. The real problem isn't finding issues (there are millions), but reconciling their current skill set with the expectations of the course and the time available. They also struggle with imposter syndrome, and assume they can't work on many projects that they could actually handle just fine.

We have a number of tools to help with these problems, based on what I wrote in the list above. First, every student blogs about their PRs, and has done so for years. Second, we have wiki pages with lots of info about what people worked on (i.e., links to issues, PRs). These wiki pages, therefore, contain all kinds of info about projects that previous students found useful for their purposes, and which might still be valuable. Furthermore, their blog posts provide guidance and insights on how to work on them.

We need to mine this data and make it available to the next set of students.

Also, I'd like to see us collect info about how to search for issues, how to evaluate projects, etc. based on the current work people have done in October 2022. We should create a guide document of some kind that lays out a template for how to get started and how to be effective.

Here are the previous five years of Hacktoberfest wiki pages with every student's GitHub info we can mine for data:

In the past, I've manually done some of this work:

I have some scripts I use to get stats:

https://github.com/humphd/github-contrib-stats

Things that would be good to extract from those wiki pages, and the GitHub URLs they include:

  • the name of every repo that people have contributed to.
  • do any repos come up again and again?
  • are these projects still active?
  • what languages do they use?
  • which labels were used on the issues that people worked on? Can we re-use those to make new queries on the same projects?
  • how long did it take (on average) for PRs to get reviewed? merged?
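As a rough starting point for the first two questions in the list above, here is a minimal Python sketch that pulls issue and PR URLs out of wiki text and counts how often each repo appears. The regular expression, the function names `extract_contributions` and `count_repos`, and the sample text are all assumptions for illustration; the real wiki pages may format their links differently.

```python
import re
from collections import Counter

# Matches URLs like https://github.com/OWNER/REPO/pull/123 or .../issues/45.
# Assumption: the wiki pages link to issues and PRs with full github.com URLs.
GITHUB_URL = re.compile(
    r"https?://github\.com/([\w.-]+)/([\w.-]+)/(pull|issues)/(\d+)"
)


def extract_contributions(wiki_text):
    """Return a list of (repo, kind, number) tuples found in the wiki text."""
    found = []
    for owner, repo, kind, number in GITHUB_URL.findall(wiki_text):
        found.append((f"{owner}/{repo}", kind, int(number)))
    return found


def count_repos(contributions):
    """Count how many times each repo appears across the contributions."""
    return Counter(repo for repo, _kind, _number in contributions)


# Hypothetical sample text, not taken from the real wiki pages.
sample = """
* https://github.com/example-org/example-app/pull/12
* https://github.com/example-org/example-app/issues/10
* https://github.com/another-user/some-lib/pull/3
"""

contributions = extract_contributions(sample)
repo_counts = count_repos(contributions)
for repo, count in repo_counts.most_common():
    print(repo, count)
```

Repos that show up many times in the counts are candidates for "comes up again and again". Whether they are still active, and what languages they use, would need a follow-up lookup per repo.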

Since this old data is static, I wonder if we could do any AI/machine learning on it to extract any lessons? Any data/ml folks want to try? For example, we can use the GitHub API to pull all kinds of data (JSON) about each PR and could use that to extract features we might train a model on. Could we build something we can use in the future to evaluate whether a given PR is a good fit?
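To make the feature-extraction idea a bit more concrete, here is a small Python sketch that flattens one PR's JSON into a row of numbers and computes the average time to merge. It assumes each PR has already been fetched and loaded as a dict, and that it carries fields named `created_at`, `merged_at`, `additions`, `deletions`, `changed_files`, `comments`, and `review_comments`; those names should be checked against the GitHub API docs before relying on them. The function names and sample data are made up for illustration.

```python
from datetime import datetime


def parse_time(value):
    """Parse an ISO 8601 UTC timestamp such as 2022-10-26T14:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def pr_features(pr):
    """Turn one pull request (as a dict) into a flat feature dict."""
    created = parse_time(pr["created_at"])
    merged_at = pr.get("merged_at")
    hours_to_merge = None
    if merged_at:
        hours_to_merge = (parse_time(merged_at) - created).total_seconds() / 3600
    return {
        "merged": bool(merged_at),
        "hours_to_merge": hours_to_merge,
        "lines_changed": pr.get("additions", 0) + pr.get("deletions", 0),
        "changed_files": pr.get("changed_files", 0),
        "comments": pr.get("comments", 0) + pr.get("review_comments", 0),
    }


def average_hours_to_merge(feature_rows):
    """Average time to merge over merged PRs only; None if none were merged."""
    hours = [row["hours_to_merge"] for row in feature_rows if row["merged"]]
    return sum(hours) / len(hours) if hours else None


# Hypothetical, hand-written PR data (not real API output).
sample_prs = [
    {
        "created_at": "2022-10-01T12:00:00Z",
        "merged_at": "2022-10-02T12:00:00Z",
        "additions": 10,
        "deletions": 2,
        "changed_files": 1,
        "comments": 1,
        "review_comments": 2,
    },
    {
        "created_at": "2022-10-05T08:00:00Z",
        "merged_at": None,
        "additions": 40,
        "deletions": 0,
        "changed_files": 3,
        "comments": 0,
        "review_comments": 0,
    },
]

rows = [pr_features(pr) for pr in sample_prs]
print(average_hours_to_merge(rows))
```

Rows like these could be the input to a model, with `merged` as the label. Which features actually predict a good fit is an open question this sketch does not answer.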

@humphd humphd added the type: enhancement New feature or request label Oct 26, 2022
@sirinoks sirinoks added type: discussion Requires conversation type: research Requires researching and commenting what you have found labels Oct 29, 2022
@sirinoks
Contributor

To clarify what we are talking about (correct me if I'm wrong), and to add some of my own ideas too:
A microservice within Telescope that uses GitHub user information. Specifically:

  • Contribution activity
  • Languages used
  • Forked, starred and contributed to projects
  • Tags and labels of user's created issues/PRs
  • Blog posts

Then, connect this data to other data of the same kind to find:

  • Most commonly contributed to projects
  • Activity of the project, presence and quality of their READMEs and CONTRIBUTING docs (?)
  • Most used tech/languages/labels
  • Which combinations of those things most often lead to an accepted PR versus a failed PR
  • Analysis of this information to estimate the difficulty of a newly selected project later on

For such a microservice, we would need:

  • Connecting to GitHub data, reading it and processing it
  • UI for displaying findings, such as graphs, tables, lists, or other ways we will come up with
  • User's ability to select their preferred languages, and rate repos they tried themselves
  • A tech stack selected to build this

Developer tools

Now, about the tools. Firstly, do we still want this to be a microservice within Telescope, or is it a completely new and different thing? From what I roughly know of AI/machine learning, Python is the better choice. We can, however, stick with JS, which seems to be okay too.

We need to see who's down to try this, and ask them what they want to use.

Steps to take

What this project will involve:

  • Machine learning back end - select which?
  • UI front end constructed from the data - graphs, modules, suggestions, contribution log
  • GitHub auth
  • Data transfer format (looks like JSON?)

For now, we can break this down into smaller steps. Before we even dive into machine learning, we can start by simply connecting to GitHub and displaying info about users. Specifically, we can create a contribution log, which would just be a list of the latest contributions of those who signed into our system, showing all the information we want to gather. We can start with just username and repo name. Later, we can add language, tags, and time. Finally, we can add more detailed stuff like "how long it took for the PR to get merged", "who were the reviewers", etc.
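Here is a minimal Python sketch of what a contribution log entry could look like, starting with only username and repo and leaving room for the later fields. The `Contribution` class, its field names, and `format_log` are hypothetical; nothing like this exists in Telescope yet, and the sample entries are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contribution:
    # First iteration: only these two fields are required.
    username: str
    repo: str
    # Later iterations: optional details, filled in when available.
    language: Optional[str] = None
    labels: List[str] = field(default_factory=list)
    merged_at: Optional[str] = None  # ISO 8601 timestamp, if merged


def format_log(contributions, limit=10):
    """Return the latest `limit` entries as display lines, newest first.

    Assumes the input list is ordered oldest to newest.
    """
    recent = contributions[-limit:] if limit > 0 else []
    lines = []
    for c in reversed(recent):
        line = f"{c.username} -> {c.repo}"
        if c.language:
            line += f" [{c.language}]"
        if c.labels:
            line += " (" + ", ".join(c.labels) + ")"
        lines.append(line)
    return lines


# Hypothetical entries for illustration.
log = [
    Contribution("student-a", "example-org/example-app"),
    Contribution(
        "student-b",
        "another-user/some-lib",
        language="Python",
        labels=["good first issue"],
    ),
]

for line in format_log(log):
    print(line)
```

Keeping the extra fields optional means the first version can ship with just username and repo, and entries recorded early do not need to be migrated when language, labels, or merge times are added.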

Then, a UI for the user to rate their experience. How difficult was the task? How easy was the project to figure out (setup, instructions, documentation)? How familiar were you with the tools used?

Even before we get into the unknown (and cool) machine learning part, there's still a lot of stuff we can do.

@humphd
Contributor Author

humphd commented Oct 29, 2022

I'm not sure if we need a microservice for this or not. We could actually start with a static HTML page, since all the data is historic.

Extracting and generating the data to create this HTML will require some code, though.
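As a sketch of how small that code could be, here is a Python function that turns a mapping of repo names to contribution counts into a static HTML page. The input shape, the function name `render_repo_table`, the page title, and the sample data are all assumptions for illustration; the real page would likely need more columns and some styling.

```python
from html import escape


def render_repo_table(repo_counts):
    """Build a static HTML page listing repos by contribution count.

    `repo_counts` maps "owner/repo" to a number of contributions.
    """
    rows = []
    # Sort by count (highest first), then by name for a stable order.
    ordered = sorted(repo_counts.items(), key=lambda item: (-item[1], item[0]))
    for repo, count in ordered:
        url = "https://github.com/" + escape(repo, quote=True)
        rows.append(
            f'<tr><td><a href="{url}">{escape(repo)}</a></td><td>{count}</td></tr>'
        )
    return (
        "<!DOCTYPE html>\n<html>\n"
        "<head><title>Hacktoberfest repos</title></head>\n"
        "<body>\n<table>\n"
        "<tr><th>Repo</th><th>Contributions</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>\n</body>\n</html>\n"
    )


# Hypothetical counts for illustration.
page = render_repo_table(
    {"another-user/some-lib": 1, "example-org/example-app": 2}
)
print(page)
```

Because the data is historic, a script like this could run once (or once per term) and the output could be committed as a plain file, with no server needed.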

I think doing this within Telescope is a good idea, since that's where this data comes from, and that's who will likely use it. However, I'm not sure where to "put" it yet. We can solve that later when we have something to put somewhere!


2 participants