[Request] regenerate the GitHub repo network data #34
Comments
It is somewhat difficult, since for the new OpenRank we do not have a network with repo data only. The network we use contains repos and users at the same time, so I need to find the export script and modify it to fit the current data model. I also need to understand it, because the data is just part of the problem: I also need to generate the position of each node with a 3D force-layout algorithm. |
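For readers unfamiliar with the layout step mentioned above, here is a minimal sketch of a 3D force layout in the Fruchterman-Reingold style. It is not the export script or the layout code actually used for OpenGalaxy; the function name `force_layout_3d`, the ideal edge length `k`, and the cooling schedule are all assumptions made for illustration.

```python
import math
import random


def force_layout_3d(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y, z)} from a naive spring-embedder sketch."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0) for _ in range(3)] for n in nodes}
    k = 1.0  # assumed ideal edge length (hypothetical parameter)
    for step in range(iterations):
        # Cool down so nodes settle instead of oscillating.
        temperature = 0.1 * (1.0 - step / iterations)
        disp = {n: [0.0, 0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: O(n^2) per iteration.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
                dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
                force = k * k / dist
                for axis in range(3):
                    push = delta[axis] / dist * force
                    disp[a][axis] += push
                    disp[b][axis] -= push
        # Attraction along each edge.
        for a, b in edges:
            delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
            dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
            force = dist * dist / k
            for axis in range(3):
                pull = delta[axis] / dist * force
                disp[a][axis] -= pull
                disp[b][axis] += pull
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            length = math.sqrt(sum(d * d for d in disp[n])) or 1e-9
            scale = min(length, temperature) / length
            for axis in range(3):
                pos[n][axis] += disp[n][axis] * scale
    return {n: tuple(p) for n, p in pos.items()}
```

The pairwise repulsion loop is quadratic in the node count, so a sketch like this would not scale to a full GitHub network without a spatial approximation; it only shows what "generating positions" involves.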
Understood! Once the data export script is available, will it become an OpenDigger cron task? |
Hopefully. But the category information from the Louvain algorithm is not in any database currently in use, and it is actually not accurate, so the label information is another difficult part. |
I found the script, and I think things have changed a lot in the past 2 years, so we need to reconsider how to do this rather than simply renew the data. The changes include:
So right now, if we want a new dataset with a full year of 2022 data, the export will be quite time-consuming: we don't have yearly data directly, so we need to calculate yearly data first for 40 million nodes and 75 million relationships. And if we want to add data on how the network evolves, we should export the network month by month. WDYT? |
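To make the "calculate yearly data first" step concrete, here is a small sketch that rolls monthly edge lists up into one yearly edge list. The input shape (`{"YYYYMM": [(source, target, weight), ...]}`), the function name `aggregate_year`, and the choice to sum monthly weights are assumptions; the thread does not say how the yearly values are actually derived.

```python
from collections import defaultdict


def aggregate_year(monthly_edges, year):
    """Sum edge weights over all months of `year` (summing is an assumption)."""
    totals = defaultdict(float)
    for month, edges in monthly_edges.items():
        # Month keys are assumed to look like "202201".
        if not month.startswith(str(year)):
            continue
        for source, target, weight in edges:
            totals[(source, target)] += weight
    return dict(totals)
```

Exporting month by month, as suggested above, would skip this roll-up and keep each monthly edge list as its own snapshot.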
And if we want to demonstrate how the network evolves, we need to handle a continuous 3D force layout, which is also quite a difficult problem. |
Thanks for your explanation, @frank-zsy. I'm wondering: if we set a high threshold so that only a small share of repos and users is exported, will the cost of computing the repo-repo relation network (by month) be affordable?
This feature may be a second priority. |
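As a rough illustration of the threshold idea, the sketch below first drops low-score nodes from a user-repo edge list and then derives a repo-repo network from what is left. Both helper names are hypothetical, and linking two repos by the number of users active in both is only one assumed definition of the repo-repo relation, not necessarily the one used by OpenRank.

```python
from collections import defaultdict
from itertools import combinations


def filter_edges(edges, scores, threshold):
    """Keep (user, repo, weight) edges whose endpoints both meet the threshold."""
    return [
        (user, repo, weight)
        for user, repo, weight in edges
        if scores.get(user, 0.0) >= threshold and scores.get(repo, 0.0) >= threshold
    ]


def project_repo_repo(edges):
    """Count, for each repo pair, the users active in both repos."""
    repos_by_user = defaultdict(set)
    for user, repo, _weight in edges:
        repos_by_user[user].add(repo)
    shared = defaultdict(int)
    for repos in repos_by_user.values():
        for a, b in combinations(sorted(repos), 2):
            shared[(a, b)] += 1
    return dict(shared)
```

Filtering before projecting matters for cost: a user active in many repos contributes a quadratic number of repo pairs, so a higher threshold shrinks the projection quickly.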
So you still want a repo-repo relationship network? How about a repo-user activity network? We don't really have a lot of data to show for users, that is true. |
Do you think OpenGalaxy is a possible way to present Repo OpenRank details? |
Of course it can, but it is another model and needs some refactoring to fit the data. |
Of course, the tech domain is a field in the data, type can be also a field. |
Can you use repo OpenRank details data to generate an evolving network in OpenGalaxy? I think it is also hard here. |
I can, with an ECharts force-layout graph based on the demo you wrote. However, as you pointed out, because a force layout algorithm underpins the graph layout, animating the evolution has little meaning (visit this codepen; video attachment: 2023-02-21.15.52.49.mov). Similarly, for OpenDigger, with a force layout algorithm underpinning it, the animation (if we could implement it) would be meaningless too. The idea of using OpenGalaxy to present OpenRank details for repos came to mind because I think node enhancement can be applied to the graph, so that when we hover over or click on a node we can review more details. In fact, I didn't think about the evolution animation. |
The set-data implementation above does not provide a smooth transition. Still, I think data export and visualization are independent, so I can export the data first and try to produce a continuous 3D layout. How to use the data to generate a smooth animation of the network evolving could be a future task. |
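One simple way to get a smoother transition between two exported snapshots is to interpolate node positions linearly between them. This is only a sketch of that swapped-in technique, not a continuous force layout and not something implemented in OpenGalaxy; the function name `interpolate_layout` and the `{node: (x, y, z)}` layout shape are assumptions.

```python
def interpolate_layout(start, end, t):
    """Blend two {node: (x, y, z)} layouts; t=0 gives start, t=1 gives end."""
    frame = {}
    # Only nodes present in both snapshots can be interpolated.
    for node in start.keys() & end.keys():
        frame[node] = tuple(
            a + (b - a) * t for a, b in zip(start[node], end[node])
        )
    return frame
```

Nodes that appear or disappear between snapshots are dropped here and would need their own handling (for example fading in or out). The result is also only meaningful if consecutive layouts are reasonably aligned, which is the hard part raised above.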
I will try to export a static repo-user activity network for 202301 and upload it to oss.x-lab.info/open_galaxy/v2/ , there will be no You can check out the data to see if it is good enough to present. The node count will be 94,789 and the edge count will be 133,960. |
I have uploaded the data, but I cannot tell whether it is correct without rendering it. The size or PageRank field has changed from |
Thank you! I will try it later. |
I think the |
It seems that the size of a node depends on its degree, according to the codebase: open-galaxy/src/galaxy/native/renderer.js, lines 202 to 215 in 2e0443a.
The biggest problem in the new graph in my view is the spaces between nodes. |
Not really. From the code you can see that the graph load process is meta -> position -> link -> label (node): open-galaxy/src/galaxy/service/graphLoader.js, lines 47 to 51 in 2e0443a.
In the default implementation, the size of a node is determined by its degree, which means the code you referred to is used to calculate the size. But since we need to set the size from the OpenRank value, I added the code below to recalculate the node size and color after the label data is loaded: open-galaxy/src/galaxy/native/renderer.js, lines 152 to 174 in 2e0443a.
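The referenced renderer code is not reproduced in this thread, so the following is only a hedged sketch of the idea: once label data is available, derive each node's size from its OpenRank value instead of its degree, and pick a color by node type. The function name `node_style`, the square-root scaling, and the `min_size` and `scale` parameters are assumptions; the blue/yellow split follows the color scheme reported later in the thread (blue for repos, yellow for users and bots).

```python
import math


def node_style(openrank, node_type, min_size=5.0, scale=20.0):
    """Return (size, color) for a node; the scaling is a hypothetical choice."""
    # Square-root scaling keeps very high OpenRank nodes from dwarfing the rest.
    size = min_size + scale * math.sqrt(max(openrank, 0.0))
    # Repos are drawn blue; users and bots are drawn yellow.
    color = "blue" if node_type == "repo" else "yellow"
    return size, color
```

Because the load order is meta -> position -> link -> label, a recalculation like this can only run after the label step, which is why the degree-based size appears first by default.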
Since |
Thank you for pointing that out, @frank-zsy! I just forgot that the original code was also adapted for our needs. Now the graph looks much better (blue is repo, yellow is user/bot; video attachment: OG-newdata.mov). Overview: I'm going to create a PR and |
Great, you can also open an issue in OpenDigger and I will add a cron task to OpenDigger to generate data for OpenGalaxy. |
Anyhow, we can now rely on the 2023-01 data :) Closed via X-lab2017/open-digger#1208. |
As mentioned in #32, we want to bring this repo back to life, so we should also regenerate the whole GitHub repo network for that purpose. The current one is too old, I think.
Hi @frank-zsy, can you help on this?