Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix isInfixOf and breakSubString in Word16 #199

Merged
merged 1 commit into from
Jul 13, 2023
Merged

Fix isInfixOf and breakSubString in Word16 #199

merged 1 commit into from
Jul 13, 2023

Conversation

hasufell
Copy link
Member

@hasufell hasufell commented Jul 2, 2023

@hasufell hasufell requested a review from Bodigrim July 2, 2023 12:29
@hasufell hasufell force-pushed the issues/195 branch 2 times, most recently from 2b9f111 to 98ba6a4 Compare July 2, 2023 16:03
Copy link
Member

@sol sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice.

System/OsPath/Data/ByteString/Short/Word16.hs Outdated Show resolved Hide resolved
System/OsPath/Data/ByteString/Short/Word16.hs Outdated Show resolved Hide resolved
Copy link
Member

@sol sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We repeatedly use Data.ByteString.Short.breakSubstring and Data.ByteString.Short.splitAt which both copy. See: #199 (comment)

I think we will get more predictable performance characteristics if we convert to a ByteString first, calculate the offset, and then do a final splitAt on the original input.

@Bodigrim any more thoughts on this?

System/OsPath/Data/ByteString/Short/Word16.hs Outdated Show resolved Hide resolved
System/OsPath/Data/ByteString/Short/Word16.hs Outdated Show resolved Hide resolved
System/OsPath/Data/ByteString/Short/Word16.hs Outdated Show resolved Hide resolved
@hasufell
Copy link
Member Author

hasufell commented Jul 4, 2023

We repeatedly use Data.ByteString.Short.breakSubstring and Data.ByteString.Short.splitAt which both copy. See: #199 (comment)

I think we will get more predictable performance characteristics if we convert to a ByteString first, calculate the offset, and then do a final splitAt on the original input.

@Bodigrim any more thoughts on this?

I don't think I agree.

The point of ShortByteString is that it doesn't use pinned memory and doesn't contribute to memory fragmentation. Even if the ByteString here will be temporary, I find it morally wrong to use it inside the ShortByteString module.

@Bodigrim
Copy link
Contributor

Bodigrim commented Jul 9, 2023

As discussed in haskell/bytestring#307, I'm not convinced that the algorithm behind Data.ByteString.{,Short.}.breakSubstring is any better than memcmp.

Looking at the difficulties with appending/slicing of ShortByteString to avoid quadratic performance, I think running compareByteArrays# in a loop with even indices is the simplest and most robust solution here.

@hasufell
Copy link
Member Author

@Bodigrim done

@@ -441,3 +449,29 @@ errorEmptySBS fun = moduleError fun "empty ShortByteString"
moduleError :: HasCallStack => String -> String -> a
moduleError fun msg = error (moduleErrorMsg fun msg)
{-# NOINLINE moduleError #-}

compareByteArraysOff :: BA -- ^ array 1
Copy link
Member Author

@hasufell hasufell Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, bytestring doesn't expose this internal and we need it to to be compatible with GHC-8.2.2 and lower.

@hasufell hasufell merged commit b91dcf2 into master Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

System.OsPath.Data.ByteString.Short.Word16.isInfixOf looks dodgy
3 participants