More rule tweaks #1567

TiagoCavalcante · 2021-08-25T23:32:59Z

No description provided.

bench.py

rocky · 2021-08-30T13:04:27Z

@TiagoCavalcante Looks good - thanks. But before we merge would you do a benchmark on mathics-benchmarks on this?

I suspect the time difference will be small, but I'd be interested all the same.

TiagoCavalcante · 2021-08-30T13:31:08Z

The results were very weird.

Before:

10000 iterations of ElementData...
  2.436314 secs for: ElementData["Gold", "AtomicNumber"]     

10000 iterations of NextPrime...
  3.416719 secs for: NextPrime[1, 1]

After:

10000 iterations of ElementData...
  2.569624 secs for: ElementData["Gold", "AtomicNumber"]

10000 iterations of NextPrime...
  3.259799 secs for: NextPrime[1, 1]

Shouldn't _String be faster than _?StringQ?

rocky · 2021-08-30T14:50:10Z

Shouldn't _String be faster than _?StringQ?

Why not write some benchmarks to check this or try to isolate this?

I am guessing that one can write Mathics definitions with the two kinds of type checking.

And also we can change the Python code - maybe add another builtin temporarily testing the alternative way on simple primitives (and then more complex ones...)

I am not surprised though that we are surprised. If we knew this stuff perfectly we probably would be having fewer problems.

mmatera · 2021-08-30T22:35:06Z

For a short string:

In[2]:= Timing[Do[MatchQ["ww",_String],{10000}]]
Out[2]= {1.67904, Null}

In[3]:= Timing[Do[MatchQ["ww",_?StringQ],{10000}]]
Out[3]= {3.25842, Null}

For the same string stored in a variable:

In[19]:= Timing[Do[MatchQ[s,_String],{10000}]]
Out[19]= {1.70189, Null}

In[20]:= Timing[Do[MatchQ[s,_?StringQ],{10000}]]
Out[20]= {3.39103, Null}

For a plane symbol:

In[7]:= Timing[Do[MatchQ[L,_String],{10000}]]
Out[7]= {1.56889, Null}

In[8]:= Timing[Do[MatchQ[L,_?StringQ],{10000}]]
Out[8]= {3.33777, Null}

For real numbers:

In[12]:= Timing[Do[MatchQ[27.321212,_String],{10000}]]
Out[12]= {1.59748, Null}

In[13]:= Timing[Do[MatchQ[27.321212,_?StringQ],{10000}]]
Out[13]= {3.25175, Null}

For a "complicate" expression

In[25]:= Timing[Do[MatchQ[Sqrt[1+F[G[t]]],_String],{10000}]]
Out[25]= {18.0476, Null}

In[26]:= Timing[Do[MatchQ[Sqrt[1+F[G[t]]],_?StringQ],{10000}]]
Out[26]= {21.1116, Null}

In[27]:= Timing[Do[Sqrt[1+F[G[t]]],{10000}]]
Out[27]= {16.0676, Null}

mmatera · 2021-08-30T22:45:05Z

mathics/builtin/numbers/numbertheory.py

@@ -487,7 +487,7 @@ class NextPrime(Builtin):
    }

    def apply(self, n, k, evaluation):


Try this instead:

def apply(self, n, k, evaluation): "NextPrime[n_, k_Integer]" py_k = k.to_python(n_evaluation=evaluation) py_n = n.to_python(n_evaluation=evaluation) if type(py_k) is not int or type(py_n): return

This should do the same trick, without the overhead in the pattern matching.

mmatera · 2021-08-30T22:47:18Z

mathics/builtin/physchemdata.py

@@ -123,7 +123,7 @@ def apply_all_properties(self, evaluation):
        return from_python(sorted(_ELEMENT_DATA[0]))

    def apply_name(self, name, prop, evaluation):
-        "ElementData[name_?StringQ, prop_]"
+        "ElementData[name_String, prop_]"


Same thing:

def apply_name(self, name, prop, evaluation): "ElementData[name_, prop_]" if not isinstance(name, String): return

I like pattern matching, but if performance doesn't, we should obey it.

@mmatera I'm thinking, why does pattern matching have a so big overhead? Doesn't it just test against a type and get a specific function for that type? If yes, why the overhead is so big?

I repaired that there isn't System`String in quick_pattern_test.

@mmatera I'm thinking, why does pattern matching have a so big overhead? Doesn't it just test against a type and get a specific function for that type? If yes, why the overhead is so big?

It is not specifically pattern matching although that may have its own specific slowness. It is any function call of which pattern matching is one kind.

This is the kind of thing we need to study and improve. Specific details and data can be found in #1568 (comment) We need to study that and try things to fix it.

mmatera · 2021-08-30T23:21:32Z

This could be also useful:

In[29]:= F[s_String,n_]:=Do[If[MatchQ[s,_String],1,0],{n}]
Out[29]:= None

In[30]:= G[s_,n_]:=Do[If[MatchQ[s,_String],1,0],{n}]
Out[30]:= None

In[31]:= H[s_?StringQ,n_]:=Do[If[MatchQ[s,_String],1,0],{n}]
Out[31]:= None

In[32]:= {Timing[Do[F["sss",0],{10000}]],Timing[Do[G["sss",0],{10000}]],Timing[Do[H["sss",0],{10000}]]}
Out[32]:= {{4.82359, Null}, {4.52345, Null}, {6.85825, Null}}

In[33]:= {Timing[Do[F["sss",1],{10000}]],Timing[Do[G["sss",1],{10000}]],Timing[Do[H["sss",1],{10000}]]}
Out[33]:= {{7.84266, Null}, {7.82672, Null}, {10.1756, Null}}

In[34]:= {Timing[Do[F["sss",10],{10000}]],Timing[Do[G["sss",10],{10000}]],Timing[Do[H["sss",10],{10000}]]}
Out[34]:= {{36.467, Null}, {36.4047, Null}, {43.4621, Null}}

In[35]:= {Timing[Do[F[2,0],{10000}]],Timing[Do[G[2,0],{10000}]],Timing[Do[H[2,0],{10000}]]}
Out[35]:= {{0.694378, Null}, {4.91281, Null}, {1.22358, Null}}

TiagoCavalcante · 2021-08-31T01:06:57Z

Thanks for the benchmarks @mmatera! I hadn't time to do them, and probably they won't be as complete as yours.

…/Mathics into quickpatterntest

rocky · 2021-08-31T02:48:29Z

For a short string:

In[2]:= Timing[Do[MatchQ["ww",_String],{10000}]]
Out[2]= {1.67904, Null}

In[3]:= Timing[Do[MatchQ["ww",_?StringQ],{10000}]]
Out[3]= {3.25842, Null}

For the same string stored in a variable:

In[19]:= Timing[Do[MatchQ[s,_String],{10000}]]
Out[19]= {1.70189, Null}

In[20]:= Timing[Do[MatchQ[s,_?StringQ],{10000}]]
Out[20]= {3.39103, Null}

For a plane symbol:

In[7]:= Timing[Do[MatchQ[L,_String],{10000}]]
Out[7]= {1.56889, Null}

In[8]:= Timing[Do[MatchQ[L,_?StringQ],{10000}]]
Out[8]= {3.33777, Null}

For real numbers:

In[12]:= Timing[Do[MatchQ[27.321212,_String],{10000}]]
Out[12]= {1.59748, Null}

In[13]:= Timing[Do[MatchQ[27.321212,_?StringQ],{10000}]]
Out[13]= {3.25175, Null}

For a "complicate" expression

In[25]:= Timing[Do[MatchQ[Sqrt[1+F[G[t]]],_String],{10000}]]
Out[25]= {18.0476, Null}

In[26]:= Timing[Do[MatchQ[Sqrt[1+F[G[t]]],_?StringQ],{10000}]]
Out[26]= {21.1116, Null}

In[27]:= Timing[Do[Sqrt[1+F[G[t]]],{10000}]]
Out[27]= {16.0676, Null}

My take is on this data is that basically the overhead difference is about the same for everything. When you get to expressions though the overhead in function calls swamps differences there are in pattern matching - this is taking 10 times longer and other tests show that this is not in computation time.

So again we need to focus on function calls.

mmatera · 2021-08-31T03:00:37Z

So again we need to focus on function calls.

The problem here is that, from the WL side, function calls are just pattern matching, and pattern matching is expensive because it involves many Python function calls, which at the time are very expensive. So, anything we can do to reduce the number of Python function calls will improve the performance.

rocky · 2021-08-31T06:23:56Z

So again we need to focus on function calls.

The problem here is that, from the WL side, function calls are just pattern matching, and pattern matching is expensive because it involves many Python function calls, which at the time are very expensive. So, anything we can do to reduce the number of Python function calls will improve the performance.

It is a problem, but not the only problem and I am concerned about confusing the pattern matching and Mathics function calls.

A Mathics function call involves not just potentially several pattern-match signatures until one that succeeds, (or all signatures exhausted), but there are iteration and recursion checks, and checks to see if there was a rewrite rule instead of a function call, and if so the process repeats. And once a rule matches, there is argument substitution before the call.

I used to think (and still do) that Python method calls were expensive because methods need to constantly be resolved as a result of its OO behavior and ability to change that at run time. However Mathics can be an order of magnitude worse.

One plan of attack at least for built-in functions and then later possibly for compiled functions is to hold on to the found pattern-to-apply-method map for a particular call site and use that avoiding the entire lookup process.

As the trend has now been to do the checking in Python, we will also detect mismatches when there are mistakes.

Take for example this code that is in a recent PR:

Expression(SymbolGreater, x, Integer0).evaluate(evaluation)

which is found in the implementation of built-in NonPositive[]. Furthermore, we know that to compute this expression no rule rewriting goes on. So the apply() method on a class that gets called here will always be the same. (Probably it is _ComparisonOperator#apply). Therefore the evaluation() method can be replaced with another method (in a refactor branch) - I have been calling it evaluate_fast(), that returns the found pattern-to-method mapping in addition to performing the evaluation .

With this, something the code for NonPositive might look like:

class NonPositive(Builtin):
    greater_than_zero_mapping = None

    ...
    def apply... 
        self.greater_than_zero_mapping = Expression(SymbolGreater, x Integer0).evaluate_fast(self.greater_than_zero_mapping, evaluation)

where evaluate_fast() skips what is done in evaluation() and instead calls evaluate_next() which I intend to rename evaluate_one().

rocky · 2021-08-31T16:31:50Z

The new benchmark:

Mathics git repo /workspaces/mathics-benchmark/Mathics at cfe55d
10000 iterations of ElementData...
  2.515098 secs for: ElementData["Gold", "AtomicNumber"]     

10000 iterations of NextPrime...s of 
  3.650676 secs for: NextPrime[1, 1]                         

Mathics git repo /workspaces/mathics-benchmark/Mathics at cfe55d
10000 iterations of ElementData...
  2.433841 secs for: ElementData["Gold", "AtomicNumber"]     

10000 iterations of NextPrime...
  3.345542 secs for: NextPrime[1, 1]

Why are the two sets of SHA's the same?

TiagoCavalcante · 2021-08-31T17:19:01Z

My mistake, I forgot to checkout to master before running the tests. I'll add this to mathics-benchmark and re-run the tests latter. And run black too.

TiagoCavalcante · 2021-08-31T23:38:40Z

New benchmarks:

Mathics git repo /workspaces/mathics-benchmark/Mathics at 58dc69
10000 iterations of StringQ...
  2.595370 secs for: ElementData["Gold", "AtomicNumber"]     

10000 iterations of NextPrime...
  3.167263 secs for: NextPrime[1, 1]                         

Mathics git repo /workspaces/mathics-benchmark/Mathics at cfe55d
10000 iterations of ElementData...
  2.429380 secs for: ElementData["Gold", "AtomicNumber"]     

10000 iterations of NextPrime...
  3.201453 secs for: NextPrime[1, 1]

rocky

Merging in so we can test...

rocky reviewed Aug 25, 2021

View reviewed changes

bench.py Outdated Show resolved Hide resolved

More rule tweaks

50f71bb

mmatera reviewed Aug 30, 2021

View reviewed changes

Merge branch 'quickpatterntest' of https://github.com/TiagoCavalcante…

3296516

…/Mathics into quickpatterntest

Improve quick_pattern_test readability

9ec55dc

Improve ElementData performance

7cfa655

rocky approved these changes Sep 6, 2021

View reviewed changes

rocky merged commit cb2a5b2 into mathics:quickpatterntest Sep 6, 2021

rocky mentioned this pull request Sep 6, 2021

Quickpatterntest #1577

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More rule tweaks #1567

More rule tweaks #1567

TiagoCavalcante commented Aug 25, 2021

rocky commented Aug 30, 2021

TiagoCavalcante commented Aug 30, 2021

rocky commented Aug 30, 2021 •

edited

Loading

mmatera commented Aug 30, 2021 •

edited

Loading

mmatera Aug 30, 2021 •

edited

Loading

mmatera Aug 30, 2021

TiagoCavalcante Aug 31, 2021

TiagoCavalcante Aug 31, 2021

TiagoCavalcante Aug 31, 2021

rocky Aug 31, 2021

mmatera commented Aug 30, 2021 •

edited

Loading

TiagoCavalcante commented Aug 31, 2021

rocky commented Aug 31, 2021 •

edited

Loading

mmatera commented Aug 31, 2021

rocky commented Aug 31, 2021 •

edited

Loading

rocky commented Aug 31, 2021

TiagoCavalcante commented Aug 31, 2021

TiagoCavalcante commented Aug 31, 2021

rocky left a comment

		@@ -487,7 +487,7 @@ class NextPrime(Builtin):
		}

		def apply(self, n, k, evaluation):

More rule tweaks #1567

More rule tweaks #1567

Conversation

TiagoCavalcante commented Aug 25, 2021

rocky commented Aug 30, 2021

TiagoCavalcante commented Aug 30, 2021

rocky commented Aug 30, 2021 • edited Loading

mmatera commented Aug 30, 2021 • edited Loading

mmatera Aug 30, 2021 • edited Loading

Choose a reason for hiding this comment

mmatera Aug 30, 2021

Choose a reason for hiding this comment

TiagoCavalcante Aug 31, 2021

Choose a reason for hiding this comment

TiagoCavalcante Aug 31, 2021

Choose a reason for hiding this comment

TiagoCavalcante Aug 31, 2021

Choose a reason for hiding this comment

rocky Aug 31, 2021

Choose a reason for hiding this comment

mmatera commented Aug 30, 2021 • edited Loading

TiagoCavalcante commented Aug 31, 2021

rocky commented Aug 31, 2021 • edited Loading

mmatera commented Aug 31, 2021

rocky commented Aug 31, 2021 • edited Loading

rocky commented Aug 31, 2021

TiagoCavalcante commented Aug 31, 2021

TiagoCavalcante commented Aug 31, 2021

rocky left a comment

Choose a reason for hiding this comment

rocky commented Aug 30, 2021 •

edited

Loading

mmatera commented Aug 30, 2021 •

edited

Loading

mmatera Aug 30, 2021 •

edited

Loading

mmatera commented Aug 30, 2021 •

edited

Loading

rocky commented Aug 31, 2021 •

edited

Loading

rocky commented Aug 31, 2021 •

edited

Loading