Question about calculating support by considering pattern occurrences inside each graph #4
Comments
The feature you requested is not currently available in this repo, but you may try the following code to achieve your goal:

    def _get_support(self, projected):
        return len(projected)
Thank you very much for the reply. I tested the code you suggested on the dataset. As for the case I mentioned, where the result was 2 times larger: after further inspection, I found that this only happens with one-edge subgraphs, and only when the subgraph's two vertices share the same label. For example, one occurrence of the result pattern below will be counted twice in undirected graph mode. However, in directed graph mode, this pattern will only be counted once. So I think the code still works fine.
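The doubling described above is consistent with counting embeddings (vertex mappings) rather than distinct subgraphs. The sketch below is a standalone illustration, not code from this repo; the function name `count_one_edge_embeddings` and the tuple layout are hypothetical. It shows that an undirected edge whose endpoints share a label matches a same-label one-edge pattern in both orientations, while a directed edge matches only once.

```python
def count_one_edge_embeddings(edges, labels, pattern, directed=False):
    """Count mappings of a one-edge pattern onto the edges of one graph.

    edges   -- list of (u, v, edge_label) tuples (hypothetical layout)
    labels  -- dict mapping vertex id to vertex label
    pattern -- (src_label, edge_label, dst_label)
    """
    src_label, edge_label, dst_label = pattern
    count = 0
    for u, v, elb in edges:
        if elb != edge_label:
            continue
        # Orientation u -> v.
        if labels[u] == src_label and labels[v] == dst_label:
            count += 1
        # In an undirected graph the same edge can also be read as v -> u.
        if not directed and labels[v] == src_label and labels[u] == dst_label:
            count += 1
    return count


labels = {0: "A", 1: "A", 2: "B"}
edges = [(0, 1, "x"), (1, 2, "x")]

same_label_undirected = count_one_edge_embeddings(edges, labels, ("A", "x", "A"))
diff_label_undirected = count_one_edge_embeddings(edges, labels, ("A", "x", "B"))
same_label_directed = count_one_edge_embeddings(
    edges, labels, ("A", "x", "A"), directed=True
)
print(same_label_undirected, diff_label_undirected, same_label_directed)
```

If distinct subgraphs are wanted instead of embeddings, one option is to deduplicate occurrences by the set of matched edges before counting.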
In addition, I have a question about how your test datasets were generated, because I'd like to conduct more experiments. Did you follow the rules given in the Synthetic Datasets section of "gSpan: Graph-Based Substructure Pattern Mining" by X. Yan and J. Han, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), or did you use other data generation tools? Finally, I notice that you didn't add a LICENSE to this project, so I wonder if I could use and adjust your code (with proper reference) as the mining part of one of my MIT-licensed projects? Thank you very much.
Indeed, we cannot get a correct answer only by modifying
You can adjust my code with reference.
Thank you very much for the suggestion. I think I will do some further research on that problem. And thanks for the permission. If it is not a bother, could you provide me with some more information about the datasets you used? It would be helpful to be able to run more tests on a wider variety of datasets. Thank you very much.
Please refer to Section 3.1 of http://glaros.dtc.umn.edu/gkhome/fetch/papers/fsgICDM01.pdf
I don't have the code or tool to synthesize graph data now, but it is not difficult to write code to do that.
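As a starting point for writing such a tool, here is a minimal sketch that produces purely random labeled graphs. It does not attempt to reproduce the procedure from Section 3.1 of the paper. The `t # N` header appears in this thread; the `v <id> <label>` and `e <from> <to> <label>` lines and the closing `t # -1` are assumptions about the input format, and all function names and parameters are hypothetical.

```python
import random


def random_labeled_graph(rng, num_vertices, num_edges, num_vlabels, num_elabels):
    """Return (vertex_labels, edges) for one random simple undirected graph."""
    vertex_labels = [rng.randrange(num_vlabels) for _ in range(num_vertices)]
    # All unordered vertex pairs, so no self-loops and no duplicate edges.
    possible = [(u, v) for u in range(num_vertices) for v in range(u + 1, num_vertices)]
    chosen = rng.sample(possible, min(num_edges, len(possible)))
    edges = [(u, v, rng.randrange(num_elabels)) for u, v in chosen]
    return vertex_labels, edges


def to_text(graphs):
    """Serialize graphs in the assumed 't / v / e' line format."""
    lines = []
    for gid, (vertex_labels, edges) in enumerate(graphs):
        lines.append(f"t # {gid}")
        lines.extend(f"v {vid} {vlb}" for vid, vlb in enumerate(vertex_labels))
        lines.extend(f"e {u} {v} {elb}" for u, v, elb in edges)
    lines.append("t # -1")  # assumed end-of-file marker
    return "\n".join(lines) + "\n"


rng = random.Random(0)  # fixed seed so runs are repeatable
graphs = [random_labeled_graph(rng, 5, 6, 3, 2) for _ in range(4)]
text = to_text(graphs)
print(text)
```

Uniformly random graphs rarely share large substructures, so a generator intended for benchmarking frequent-pattern mining would typically also plant common subgraphs across transactions.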
Thank you very much for your time and replies. They have been very helpful, and I will study that paper now.
@w-zx have you fixed the problem now? Can you share your code?
Hi, could you please tell me if you addressed the issue or if you still need support?
Hi, this work is great and very helpful, but I notice that the support of a pattern is calculated by counting the pattern at most once per graph.
For example, if a dataset contains 2 graphs, t # 0 and t # 1, and a certain pattern occurs 3 times inside t # 0 and 4 times inside t # 1, the mining result will report this pattern with a support of 2, not 3+4=7, which is the value I have been trying to obtain.
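The two notions of support in this example can be sketched as follows. This is a standalone illustration under stated assumptions: `Occurrence` is a hypothetical stand-in for whatever the miner records per match (here just a graph id and a matched-edge id), not a structure taken from this repo.

```python
from collections import namedtuple

# Hypothetical record of one pattern occurrence: the graph it was found in
# and an id for the matched edge inside that graph.
Occurrence = namedtuple("Occurrence", ["gid", "edge_id"])


def graph_support(occurrences):
    """Number of distinct graphs that contain the pattern at least once."""
    return len({occ.gid for occ in occurrences})


def occurrence_support(occurrences):
    """Total number of recorded occurrences across all graphs."""
    return len(occurrences)


# 3 occurrences in graph 0 and 4 occurrences in graph 1, as in the example.
occurrences = [Occurrence(0, e) for e in range(3)] + [Occurrence(1, e) for e in range(4)]

per_graph = graph_support(occurrences)
total = occurrence_support(occurrences)
print(per_graph, total)
```

Note that a total count of this kind is only as reliable as the occurrence list itself: if the same subgraph is recorded more than once, it is counted more than once.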
I looked through the code and found the relevant part in `gspan.py`, line 314. I think this function is used to calculate the support of each pattern. As `set` is used, only distinct graphs (gid) are counted, and occurrences inside each graph are not considered.
In order to achieve the goal I mentioned above, I changed `pdfs.gid` into `pdfs.edge`. I supposed that by counting distinct edges, I would get the real support of each pattern.
Now, this part of the code looks like this:
However, after several tests on the datasets graph.data.simple.5 and graph.data.5, I compared the algorithm's result with my own count by hand and found that the algorithm's result is always 2 times larger than the real result (e.g. 5 occurrences by hand, but 10 by the algorithm). This is the command I used:
So I think it is not about directed or undirected graphs, and I wonder if you could tell me whether I adjusted the wrong code, or whether this goal can be achieved at all.
Thank you very much.