Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle integer overflow in for loops #60

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Handle integer overflow in for loops #60

wants to merge 6 commits into from

Conversation

hishamhm
Copy link
Member

Test case for issue #59.

@hishamhm
Copy link
Member Author

(Note: the above test-case is 64-bit specific, I think it's the first one. We never discussed what's the plan for Titan in 32-bit machines, but we're using lua_Number and lua_Integer for now. I think we should just keep using that to keep assumptions at a minimum but not dive too much in the Lua number support matrix at this point, since we already have a lot on our plate — we can rework tests etc to 32-bit when we see the actual need (i.e. users needing those platforms).)

@codecov-io
Copy link

codecov-io commented Nov 14, 2017

Codecov Report

Merging #60 into master will decrease coverage by 0.54%.
The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #60      +/-   ##
==========================================
- Coverage   91.31%   90.76%   -0.55%     
==========================================
  Files           9        9              
  Lines        1554     1559       +5     
==========================================
- Hits         1419     1415       -4     
- Misses        135      144       +9
Impacted Files Coverage Δ
titan-compiler/coder.lua 90.46% <93.33%> (-0.05%) ⬇️
titan-compiler/checker.lua 86.52% <0%> (-2.08%) ⬇️
titan-compiler/ast.lua 92.95% <0%> (-0.2%) ⬇️
titan-compiler/parser.lua 93.45% <0%> (-0.13%) ⬇️
titan-compiler/util.lua 94.87% <0%> (+0.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1a6ca63...530a9ae. Read the comment docs.

@mascarenhas
Copy link
Member

Please squash the for commits into a single one when merging! Three of them were just to get everything passing on Travis.

@hishamhm
Copy link
Member Author

This solution is nice, but its consequences are far-reaching:

  • the generated code is no longer 100% portable
  • even for GCC it requires a specific version

Also, the change in semantics is not to be taken lightly, but opening up this issue I was aware of that. If we adopt this as an optimization, is it still possble to do a portable fallback?

In more concrete terms, would changing the semantics of

for i = x, y, step do
   k(i)
end

from

do -- simplified for positive step only
   local i = x
   while i < y do
      k(i)
      i = i + step
   end
end

to

do -- simplified for positive step only
   local i = x
   if i <= y then
      while true do
         k(i)
         if y - step < i then
            break
         end
         i = i + step
      end
   end
end

be sufficient, or are there further corner cases?

@mascarenhas
Copy link
Member

mascarenhas commented Nov 14, 2017

The semantics is actually to do i = i + step with an overflow check, then test for the result of this check along with the regular i <= y test. A simple for i = <start>, <limit>, <step> with a positive step goes from:

do
  local i = <start>
  local _limit = <limit>
  local _step = <step>
  while i <= _step do
     <body>
     i = i + _step
  end
end

to:

do
  local i = <start>
  local _limit = <limit>
  local _step = <step>
  local _overflow = false
  while (not _overflow) and (i <= _step) do
     <body>
     i, _overflow = checked_add(i, _step)
  end
end

where checked_add does the addition and also returns whether overflow occurred or not. This cover all the corner cases around overflow/underflow, while the alternative semantics you gave will break when y - step underflows due to a very small y or a very big step.

@hugomg
Copy link
Contributor

hugomg commented Nov 14, 2017

Another big consequence is that Titan for loops would have a different semantics than Lua for loops.

With respect to portability, it would also be harder to create a version of Titan that compiles down to Lua, if we ever wish do do something like that.

@mascarenhas
Copy link
Member

mascarenhas commented Nov 14, 2017

We only need a portable fallback for __builtin_add_overflow (introduced in GCC 5), which works like the checked_add that I used above. There are several portable options at https://blog.regehr.org/archives/1139, and MSVC also has similar operations like https://msdn.microsoft.com/en-us/library/windows/desktop/hh707046(v=vs.85).aspx.

@mascarenhas
Copy link
Member

Someone in the lua-l list seems to have found an alternative solution that we may check and compare with ours, although I suspect the overflow check will be cheaper, even if done in a fully portable way (with compiler intrinsics overflow checking is certainly faster). But the ongoing discussion also brought to attention the fact that the numeric for loop for floating point intervals has some nasty edge cases, too.

My gut tells me to simply restrict the numeric for loop to integers; getting a for loop to have nice closed interval semantics for any values assigned to start/finish/step is hopeless, I think their semantics can only be really understood operationally, as loop that does successive floating point additions and comparisons, so we might as well require the user to just write a while loop in that case... there are some bandaid fixes such as checking if current+step == current and raising a lack of precision error, but they do not fix the underlying problem.

@hishamhm
Copy link
Member Author

hishamhm commented Nov 17, 2017

My gut tells me to simply restrict the numeric for loop to integers;

Having had used for loops with decimal steps many times over the years (in Lua and other languages) I think this is not something we can afford to discard.

getting a for loop to have nice closed interval semantics for any values assigned to start/finish/step is hopeless, I think their semantics can only be really understood operationally

The alternatives become:

  1. the simple thing to do is to set this issue aside and just live with Lua semantics;
  2. do what Dirk Laurie suggests in lua-l and have two separate implementations of FP and integer for, each accounting for its specificities (but if FP for does the MATLAB thing and uses an epsilon, I'd expect FP == to use it as well — this is something I would find very appealing for a scripting language aimed at domain specialists... but that's not the language we are designing ;) ).
  3. do what this patch does, and do the "right thing" for integers and just accept that FP numbers are weird

Pragmatically speaking, option 1 sounds the way to go for the time being.

@mascarenhas
Copy link
Member

Pragmatically speaking, option 1 sounds the way to go for the time being.

Yes, that would be my choice too... it would be surprising to have nice closed interval semantics for integers, but while loop semantics for floating point, I think most people do not expect an innocent-looking for <id>=<start>,<end>,<step> do loop to be overloaded in this way.

do what Dirk Laurie suggests in lua-l and have two separate implementations of FP and integer for, each accounting for its specificities (but if FP for does the MATLAB thing and uses an epsilon, I'd expect FP == to use it as well — this is something I would find very appealing for a scripting language aimed at domain specialists... but that's not the language we are designing ;) ).

In Pyret comparing two floating point numbers ("roughnums") for equality is an error, you have two use one of a set of specialized operators where you have to explicitly supply the tolerances.

@hishamhm
Copy link
Member Author

In Pyret comparing two floating point numbers ("roughnums") for equality is an error, you have two use one of a set of specialized operators where you have to explicitly supply the tolerances.

A fine choice for a language aimed at teaching, as well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants