This repository has been archived by the owner on Aug 29, 2024. It is now read-only.
Qualitative Analysis v2 #9
This was referenced on Jan 30, 2024.

whilefoo pushed a commit to ubiquity-whilefoo/comment-incentives-2 that referenced this issue on Feb 18, 2024: feat: read org config
@FernandVEYRIER if you want to work on this repository, perhaps you can instead work on this task but package it as a standalone module. Then we can import it into the new version once we get that far with it?

@pavlovcik It feels like this repo should eventually be deprecated in favor of the newer version, shouldn't it?

Yes, but the research will carry over. The deliverables can also be packaged neatly into portable modules that we can use in the next version!

Sure thing. This should be broken down into smaller tasks to be carried out, then.
Overview
Our current implementation for qualitative analysis is really janky. I implemented it myself: I ask ChatGPT to determine how "on topic" comments are and to reply only with an array of floats. It tends to send back an unexpected number of elements in the array, when it is supposed to send exactly one element per corresponding GitHub issue comment. I have a "brute force" approach where I tell it to keep retrying if the count doesn't match, but it seems to break on large comments anyways.
This obviously is not the correct way to understand text similarity but it was a quick and dirty prototype that kind of works, so I shipped it.
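The brute-force retry loop described above can be sketched roughly as follows (the function names and the model-call interface are hypothetical placeholders, not the repo's actual code):

```typescript
// Sketch of the retry-until-count-matches approach described above.
// `askModel` stands in for the ChatGPT call; it is expected to return a JSON array of floats.
async function scoreComments(
  comments: string[],
  askModel: (comments: string[]) => Promise<string>,
  maxRetries = 3
): Promise<number[]> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const raw = await askModel(comments);
    try {
      const scores = JSON.parse(raw);
      // Accept only an array with exactly one numeric score per comment.
      if (
        Array.isArray(scores) &&
        scores.length === comments.length &&
        scores.every((s) => typeof s === "number")
      ) {
        return scores;
      }
    } catch {
      // Malformed JSON from the model: fall through and retry.
    }
  }
  throw new Error(`Model never returned ${comments.length} scores after ${maxRetries} attempts`);
}
```

This illustrates why the approach is fragile: nothing forces the model to honor the count, so the loop can exhaust its retries on long inputs.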
This research is essential for the bot because, in the future, it will let the bot truly understand all conversations and tasks happening within our network.
R&D Tasks
- Generate embeddings with `cf/baai/bge-base-en-v1.5`
- Create a `public.vector-embeddings` table with `id` and `embedding` columns
Appendix
Useful Capabilities Derived From Embeddings
Why is generating vector embeddings important for our strategic priorities?
Task Matchmaking
One killer feature is that we can do talent matchmaking with new tasks posted to our network. For example:
`TASK A` was to do `X` and was completed by `CONTRIBUTOR A`. If `TASK B` is posted in the future to also do `X`, then the bot can understand this and suggest that `CONTRIBUTOR A` is assigned (and can even link back to the previously completed similar issue as proof.)

Reporting
Another useful capability is to be able to query in natural language to the bot (perhaps on Telegram) for any up-to-date knowledge related to any organization's live operations/products using our system. This could be very useful for investor updates or developer onboarding.
Assistive Time Estimates
We can track exactly how long it took contributors to turn around the deliverable for similar tasks. We can measure from the `assign` time to their last commit time (instead of when it's merged, because that delay is mostly due to the review team's lag.) We can also add more confidence/weight based on how similar the task description is, and how much XP the contributor has on the repository (if they have any XP, it is assumed that they are already onboarded to the repository, as time estimates are designed to not include onboarding time.)

Assistive Priority Estimates
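The measurement described above (assign time to last commit, not merge time) can be sketched as follows; the record shape is an assumption for illustration, not the bot's actual data model:

```typescript
// Hypothetical shape; the real bot's event data may differ.
interface TaskRecord {
  assignedAt: Date;     // when the contributor was assigned
  commitTimes: Date[];  // timestamps of the contributor's commits
}

// Turnaround in hours, measured from assignment to the LAST commit.
// Merge time is deliberately ignored, since that lag is mostly on the review team.
function turnaroundHours(task: TaskRecord): number {
  const lastCommit = Math.max(...task.commitTimes.map((t) => t.getTime()));
  return (lastCommit - task.assignedAt.getTime()) / 3_600_000;
}
```

Similarity of the task description and the contributor's repository XP would then be applied as confidence weights on top of this raw number.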
This may have lower usability compared to assistive time estimates, because organizations may have different strategic priorities, but we can crowdsource how high of priority similar tasks are.
Combining All Of The Above
If we can pull this off well, then we can create a magical experience: a team posts a new issue, and the bot fully handles the pricing based on crowdsourced time and priority estimates.
Then matchmaking the optimal talent to execute the task.
It can be a truly "hands off" management experience for getting things done at DAOs.
Cloudflare Embeddings
`cf/baai/bge-base-en-v1.5`[^1]

Set `WORKSPACE_ID` to your Cloudflare workspace id. I've never worked with vector embeddings but I understand the general concept. We should use Cloudflare's free service to generate and store embeddings of every GitHub comment when the comment incentives are being calculated.
I think that we can make a `public.vector-embeddings` table, and the ID can be the GitHub comment ID. Then we can store an embedding in the next column over. Finally, I understand that we should be able to compute how "on topic" a comment is, based on the GitHub issue specification that the comment was posted on.
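Comparing a comment's embedding against the issue specification's embedding is typically done with cosine similarity; a minimal sketch, assuming both vectors come from the same model and therefore share a dimensionality:

```typescript
// Cosine similarity between two embedding vectors:
// 1 = pointing the same direction (very "on topic"), 0 = orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embeddings must have the same dimension");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

An "on topic" score for a comment would then be `cosineSimilarity(commentEmbedding, specEmbedding)`, which replaces the fragile "ask ChatGPT for an array of floats" approach with a deterministic computation.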
Footnotes
[^1]: https://huggingface.co/BAAI/bge-base-en-v1.5