Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: sconcat and stimes. #580

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

kindaro
Copy link
Contributor

@kindaro kindaro commented Apr 12, 2024

Resolve #288.

There are two commits here.

  • Commit № 1 adds two new benchmarks in the Pure section: one for sconcat and one for stimes.
  • Commit № 2 adds specialized implementation of sconcat and stimes to the instance of Semigroup for strict Text.

The benchmarks can be run like so:

first_commit_hash='180645e'
pattern='$2 == "Pure" && ($4 == "sconcat" || $4 == "stimes") && $5 != "LazyText"'
git checkout "$first_commit_hash" &&
    cabal run text-benchmarks -- --pattern "$pattern" --timeout 10s --csv benchmarks.csv
git switch feature-sconcat-stimes &&
    cabal run text-benchmarks -- --pattern "$pattern" --timeout 10s --baseline benchmarks.csv --fail-if-slower 110

This will take a few minutes. You should see better times almost everywhere — only the tiny.sconcat will show worse times.

This is the report as seen on my machine:

All
  Pure
    tiny
      sconcat
        Text: FAIL
          38.6 ns ± 2.3 ns, 148% more than baseline
          Use -p '(($2=="Pure"&&($4=="sconcat"||$4=="stimes"))&&$5!="LazyText")&&/tiny.sconcat.Text/' to rerun this test only.
      stimes
        Text: OK
          70.7 ns ± 2.9 ns, 83% less than baseline
    ascii-small
      sconcat
        Text: OK
          18.6 μs ± 148 ns, 99% less than baseline
      stimes
        Text: OK
          125  μs ±  10 μs, 59% less than baseline
    ascii
      sconcat
        Text: OK
          28.7 ms ± 2.4 ms
      stimes
        Text: OK
          63.3 ms ± 5.6 ms, 78% less than baseline
    english
      sconcat
        Text: OK
          1.07 ms ± 7.4 μs
      stimes
        Text: OK
          4.40 ms ± 373 μs, 69% less than baseline
    russian
      sconcat
        Text: OK
          2.45 μs ± 224 ns, 95% less than baseline
      stimes
        Text: OK
          15.9 μs ± 736 ns, 62% less than baseline
    japanese
      sconcat
        Text: OK
          4.17 μs ± 175 ns, 97% less than baseline
      stimes
        Text: OK
          15.7 μs ± 1.4 μs, 63% less than baseline

@kindaro
Copy link
Contributor Author

kindaro commented Apr 12, 2024

Tada, all checks have passed!

Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing benchmarks!

src/Data/Text.hs Outdated
@@ -361,6 +361,8 @@ instance Read Text where
-- | @since 1.2.2.0
instance Semigroup Text where
(<>) = append
stimes = replicate . P.fromIntegral
Copy link
Contributor

@Bodigrim Bodigrim Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat cautious about potential overflow in fromIntegral. Let's throw an error if Text is non-empty and n does not fit into Int, same as base does for ByteArray: https://hackage.haskell.org/package/base-4.19.1.0/docs/src/Data.Array.Byte.html#stimesPolymorphic

(We do not need to check that len * n fits into Int, because this is validated by Data.Text.replicate itself)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not follow why we should evaluate to the empty text when given a negative number but evaluate to an error when given a number that is bigger than maxBound ∷ Int, but your wish is my command.

Copy link
Contributor Author

@kindaro kindaro Apr 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we are not going to check the input in the same way as base because we should stay abstract. The way integers are represented is an implementation detail that had changed in the past and may change in the future. Supporting all versions of GHC we test with will need a lot of CPP without significant improvement in performance. So, I shall do an equality check instead of matching on the constructors of Integer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not follow why we should evaluate to the empty text when given a negative number but evaluate to an error when given a number that is bigger than maxBound ∷ Int, but your wish is my command.

Ah, I did not remember this peculiarity of replicate.

I'd be in favor of stimes throwing an error on negative arguments, even if replicate does not mind to swallow it silently. @Lysxia how do you feel about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the behaviour on lists:

ghci> replicate (-1) [1..10]
[]
ghci> import Data.Semigroup
ghci> stimes (-1) [1.. 10]
*** Exception: stimes: [], negative multiplier

So, stimes is not defined even while replicate is defined on lists and negative numbers. We can do the same. Let me patch it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bodigrim   So why would you be in favour of stimes throwing an error on negative arguments?

@Lysxia   So why does this sound good to you?

I should like to have the underlying reasoning recorded for posterity.

Copy link
Contributor Author

@kindaro kindaro Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is a shocking revelation — the property checks of text overall compare the behaviour of functions on Text to their correspondents on String, but the check for stimes is missing both for strict and lazy Text. It would be consistent to add such a check for stimes for both strict and lazy Text and adjust the definitions as needed for it to pass. @Bodigrim Should I add a check for lazy Text and make it pass, or should I only add a check for strict Text?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Asking to "replicate a string n times" with negative n is nonsense. There must be an error in the definition of n, so throwing an exception lets users be aware of that error and fix it.
  2. The default definition of stimes already throws an exception for n <= 0. People haven't complained about it. Extending the definition for n = 0 is reasonable for a monoid.
  3. There could still be a case made in favor of making stimes less partial and more similar to replicate (I think "replicate a string n times" is nonsense as a sentence in natural language, but I don't have a strong argument that code must follow natural language). Until someone makes a good case for extending stimes, throwing an exception for negative arguments is forward-compatible: we can extend the function later (it would only break code that catches the exception, a fishy thing to do). If we made it total now and changed our minds later, that would be a more breaking change.

You can add the stimes test for strict Text now and add lazy stimes in another PR (small PR = good!)

@kindaro kindaro force-pushed the feature-sconcat-stimes branch from 2a4b0fc to 9deed15 Compare April 13, 2024 09:14
@kindaro kindaro force-pushed the feature-sconcat-stimes branch from b653fe4 to b425505 Compare April 13, 2024 09:41
src/Data/Text.hs Outdated Show resolved Hide resolved
kindaro added 2 commits April 16, 2024 15:13
The constructors of `Integer` and the module they are exported from all
changed between GHC 8 and 9.
@kindaro kindaro force-pushed the feature-sconcat-stimes branch from b425505 to c314ce2 Compare April 16, 2024 08:14
Comment on lines +365 to +366
-- | Beware: this function will evaluate to error if the given number does
-- not fit into an @Int@.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly it turns out Haddock does not see comments to instance methods. I looked at the documentation generated by cabal haddock — this comment is not rendered. This also seems to be confirmed on the Internet.

I am going to move this comment to the instance.

@kindaro kindaro marked this pull request as draft April 17, 2024 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stimes implementations
3 participants