-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
try to improve performance of isomorphism checks and pattern matching #28
Comments
I have a new kind of pattern matching, that kind of resembles my CSP-based isomorphism check algorithm. It is running pretty fast - at least in comparison to my old algorithm and the current tests I'm using. Also it does now work correctly as it seems, even for the stranger ones of the negative-node-bearing rules I just came across. I'll try to implement all those RoadWorkExample patterns using it, and I hope it won't cause any major bugs again, like the ones I just fixed.
now the pattern matching works like the csp-based isomorphism check algorithm and seems to be a lot faster. |
since now, both the probably best isomorphism check and my only pattern matching algorithm are based around something that resembles solving a Contraint Satisfaction Problem, I could use heuristics for that, to improve performance. there's:
how well are those applicable?
A feasable combination, as teached in the A.I. lecture, would be to first use (1), then choose between those with the same number of candidates using (2). Possibly even selecting with (3). But the way my algorithm works, I cant just change the order of my checks, or go ahead and skip several ones. Without changing anything else pretty much, I could select the first node with (1) and (2) and order the matches for all nodes by (3) - but only regarding the initial situation, before any chosen candidates are considered. At least my code would look more academic, if I tried to do all this - but I have to just try it out, to see if it really helps at all.
(intuitively I think, for checks that mostly fail, this will backfire - because the additional calculations will only delay the fail after complete checking. but for successful matches, this will probably improve performance notably - so the ratio of average success or fail per check is important for the potential speedup) |
I implemented all sorts of heuristics, but it's still pretty improvised code with bad performance and all kind of unnecessary calculations. One of these versions runs a bit faster, than my other handlers. For example the previously fastest handler for the roadwork example took 7 minutes on my pc, while one of these new handlers takes 5 minutes - despite the kind of bad code it uses. I'm sure, that it wouldn't take much effort to do a proper implementation of this concept of a 'true' CSP-oriented isomorphism check, that uses all performant heuristics it can utilize, whenever possible. |
it wasn't really working, I disabled it for now by making the class abstract. later I'll probably implement it the way it was intended to, potentially being my best IsomorphismHandler when it's done.
It did already complete within under 4 minutes. But there's still room for improvement... |
this covers both the sort-based isomorphism checks themselves and the sort-based reachability graph calculation, too - making this variant the most effective one for the full-size RoadworkExample (and every bigger reachability graph)
Quick update: Additionally I adjusted the building of reachability graphs with my 'normal form'/'sort'-based algorithm to make better use of the normal form, and I also increased performance of the sorting itself. For the roadwork example this drastically improved the performance from the slowest or one of the slowest algorithms to the best one. Now it takes 106471.3 ms for the 'normal form'-based reachability graph, while SDMLib takes 75162.6 ms - so It's just taking 40% longer for YAGE now, instead of 210% longer (which was with another - csp-based - algorithm, that was performing best back then) like claimed in my bachelor thesis. SDMLib is now just 30% quicker than YAGE for this example - with bigger examples, YAGE would probably perform even better - potentially beating SDMlib because of the O(n) complexity of each comparisons within the normalized reachability graph. Of course building the normalized graphs takes some effort, too - but it is way better, than doing full-blown isomorphic checks for each pair of graphs. |
Another test now showed, that SDMLib actually does scale pretty well. For a bigger map of the RoadworkExample - driving SDMLib almost to an OutOfMemory error - YAGE took not 40% but 70% longer than SDMLib. The reason probably is, that SDMLib uses hashes to eliminate most unnecessary isomorphism checks - which is similar to the concept of the sort-based algorithm, but more performant. While YAGE is able to avoid classic isomorphism checks alltogether, in this example it didn't bring the better scaling in performance that could be expected. For examples though, where SDMLib's hashing wouldn't help that much, you could probably see the effect. One way to keep increasing performance, would probably be, to introduce something like SDMLib's hashing in YAGE, too. For the Strings used in sort-based reachability graphs, one could simply hash the Strings for a better performance - a general solution though, would be to hash as much information of the graph, as is easily accessible, that would have to be different for each non-isomorphic graph. For example the number of nodes, the number of edges or the sum of the hashvalue of data from each edge - and the same for attributes. I suspect SDMLib to use a similar approach. If all this information should be extracted, this shouldn't happen too often. For example this should happen for each result-graph within building the reachability graph. the hash could then be used to eliminate most unnecessary isomorphism checks and be stored for further comparison, if the graph turns out to be new. This then would potentially help all implementations of isomorphism checks, not just the one for the sort-based one. Though the sort-based reachability graph building works different and should probably use its own kind of hashing - just with the Strings.
|
Okay, after introducing hashing of graphs within the reachability graph, YAGE now has the same runtime as SDMLib for the original RoadworkExample (using a csp-based algorithm again). SDMLib: 74907 ms; YAGE: 76004 ms. Ok, YAGE took 1.1 seconds longer, but thats just 1.5% longer runtime. edit: another csp-based algorithm - using more heuristics - even is faster than SDMLib for the original RoadworkExample (YAGE: 64512 ms; SDMLib: 76004 ms). So SDMLib takes 18% longer runtime for this test (while in comparision to using no heuristics, I managed to eliminate a third of the computational effort with the pure usage of basic csp heuristics - without them, I wouldn't be able to beat SDMLib here). More tests coming up. |
The average of five tests each shows 64141 ms for the fastest YAGE algorithm (csp with high heuristics usage) for the RoadworkExample and 74784 ms for SDMLib, which is a slight 16%-17% slower for this test. YAGE's csp-based algorithm with low heuristics usage scores 73229 ms. Without heuristics YAGE's depth-first backtracking takes 96041 ms. YAGE's sort-based/normal form algorithm takes 106591 ms. The combinatorial algorithm and a badly implemented new parallel algorithm though take many times longer than all previously mentioned algorithms. It seems that the algorithms that perform well - or even are faster than SDMLib - are scaling worse than SDMLib though. Tests for the current implementation with a bigger map for the RoadworkExample are running for the next few hours. edit: (test with a bigger map for the RoadworkExample almost causing a OutOfMemory error in SDMLib)
the original example:
the minimal example:
It already looks like YAGE is not bad, but has an issue with scaling. I have to look into this... |
I now started to not check isomorphisms by calculating homomorphisms first and checking if they work backwards, but directly checking isomorphisms themselves in all depth-first backtracking type algorithms. this drasticly increases the efficiency of the backtracking algorithms and most notably their heuristics, that kind of got sabotaged by the old technique. So here the new performance test runs - again with the RoadworkExample: Bigger map with 13 tracks:
Original map with 11 tracks:
Minimal map with 5 tracks:
Finally one algorithm always is the best - and better than SDMlib even with the big examples. But still, SDMLib seems to scale better, so for much bigger examples on machines with the proper RAM and so on - SDMLib would probably beat YAGE. On the other hand: maybe the scalability of YAGE could be improved. I have to look deeper into that... So the newest improvements mostly helped the csp high heuristics usage algorithm (compared to last version):
|
try to improve performance of isomorphism checks and pattern matching!
The text was updated successfully, but these errors were encountered: