Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New API to convert directly between lazy and short bytestrings #619

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Changelog.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
[0.12.1.0] - TBD

* API additions:
* Data.ByteString.Short now provides `lazyToShort` and `lazyFromShort`.

[0.12.0.2] — August 2023

* Bug fixes:
Expand Down
2 changes: 2 additions & 0 deletions Data/ByteString/Short.hs
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,8 @@ module Data.ByteString.Short (
unpack,
fromShort,
toShort,
lazyFromShort,
lazyToShort,

-- * Basic interface
snoc,
Expand Down
71 changes: 69 additions & 2 deletions Data/ByteString/Short/Internal.hs
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ module Data.ByteString.Short.Internal (
unpack,
fromShort,
toShort,
lazyFromShort,
lazyToShort,

-- * Basic interface
snoc,
Expand Down Expand Up @@ -214,6 +216,8 @@ import GHC.Exts
, byteArrayContents#
, unsafeCoerce#
, copyMutableByteArray#
, resizeMutableByteArray#
, shrinkMutableByteArray#
#if MIN_VERSION_base(4,10,0)
, isByteArrayPinned#
, isTrue#
Expand Down Expand Up @@ -249,11 +253,11 @@ import GHC.Stack.Types
import GHC.Word
import Prelude
( Eq(..), Ord(..), Ordering(..), Read(..), Show(..)
, ($), ($!), error, (++), (.), (||)
, ($), ($!), error, fail, (++), (.), (||)
, String, userError
, Bool(..), (&&), otherwise
, (+), (-), fromIntegral
, (*)
, (*), quot
, (^)
, (<$>)
, return
Expand All @@ -263,6 +267,7 @@ import Prelude
)

import qualified Data.ByteString.Internal.Type as BS
import qualified Data.ByteString.Lazy.Internal as LBS

import qualified Data.List as List
import qualified GHC.Exts
Expand Down Expand Up @@ -482,6 +487,56 @@ toShortIO (BS fptr len) = do
touchForeignPtr fptr
ShortByteString <$> stToIO (unsafeFreezeByteArray mba)

-- | A simple wrapper around 'fromShort' that wraps the strict 'ByteString' as
-- a one-chunk lazy 'LBS.ByteString'.
lazyFromShort :: ShortByteString -> LBS.ByteString
lazyFromShort = LBS.fromStrict . fromShort

-- | /O(n)/. Convert a lazy 'LBS.ByteString' into a 'ShortByteString'.
--
-- This makes a copy, so does not retain the input string. Naturally, best
-- used only with sufficiently short lazy ByteStrings.
--
lazyToShort :: LBS.ByteString -> ShortByteString
lazyToShort LBS.Empty = empty
lazyToShort (LBS.Chunk c@(BS _ len) cs) =
clyring marked this conversation as resolved.
Show resolved Hide resolved
unsafeDupablePerformIO $ do
mba0 <- stToIO (newByteArray len)
mba <- lazyToShortIO mba0 0 len c cs
ShortByteString <$> stToIO (unsafeFreezeByteArray mba)

-- | Append first and any subsequent chunks of lazy bytestring to the provided
-- buffer, resizing it if required. The return value is the final buffer.
-- Arithmetic is done with care, to avoid undetected integer addition
-- wrap-around. While nobody should be copying multi-gigabyte lazy bytestrings
-- to "short" bytestrings, we should error out if possible, rather than corrupt
-- memory..
lazyToShortIO :: MutableByteArray RealWorld -- ^ result buffer
-> Int -- ^ space already used
-> Int -- ^ total space allocated
-> ByteString -- ^ initial strict chunk
-> LBS.ByteString -- ^ remaining chunks
-> IO (MutableByteArray RealWorld) -- ^ possibly resized result
lazyToShortIO mba used size c@(BS fptr len) cs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better not to pass size, but rather extract it every time with getSizeofMutableByteArray#? Admittedly this would require IO before any pattern-matching begins...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a purely internal helper function, we could even move it into a where clause, to really make sure there are no unexpected calls. So I don't think such caution is warranted, I was much more concerned with handling integer arithmetic overflow, which I believe is handled correctly.

-- Safe against overflow since used <= size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- Safe against overflow since used <= size.
-- Safe against overflow since 0 <= used <= size.

...or we could just say let newUsed = checkedAdd "Short.lazyToShort" used len. (This does have the marginal downside of checking for overflow in the no-new-buffer case.)

Another option is to allow used+len to signed-overflow, but check intToWord newUsed <= intToWord size since it cannot unsigned-overflow.

| len <= size - used
= do let ptr = unsafeForeignPtrToPtr fptr
stToIO (copyAddrToByteArray ptr mba used len)
touchForeignPtr fptr
case cs of
LBS.Empty
-> stToIO (shrinkByteArray mba $ used + len) >> pure mba
LBS.Chunk c' cs'
-> lazyToShortIO mba (used + len) size c' cs'
-- Care to detect possible 'Int' arithmetic overflow.
| used < used + len
, let newsize = max (used + len) (size + size `quot` 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If (size + size `quot` 2) overflows and becomes negative, this will behave degenerately and cause O(n²) allocation for inputs that narrowly fit into the Int# size limit.
  • Signed (`div` 2) will compile to an arithmetic right shift; signed (`quot` 2) needs an extra shift and add/subtract to fix up negative odd inputs. (Or (`shiftR` 1) makes the performance question much easier.)

, newsize >= size
= do mba' <- stToIO (resizeByteArray mba newsize)
lazyToShortIO mba' used newsize c cs
| otherwise
= fail "Input lazy bytestring is too long"

-- | /O(n)/. Convert a 'ShortByteString' into a 'ByteString'.
--
fromShort :: ShortByteString -> ByteString
Expand Down Expand Up @@ -1614,6 +1669,18 @@ unsafeFreezeByteArray (MutableByteArray mba#) =
ST $ \s -> case unsafeFreezeByteArray# mba# s of
(# s', ba# #) -> (# s', ByteArray ba# #)

resizeByteArray :: MutableByteArray s -> Int -> ST s (MutableByteArray s)
resizeByteArray (MutableByteArray mba#) new@(I# new#) =
assert (new >= 0) $
ST $ \s -> case resizeMutableByteArray# mba# new# s of
(# s', mba'# #) -> (# s', MutableByteArray mba'# #)

shrinkByteArray :: MutableByteArray s -> Int -> ST s ()
shrinkByteArray (MutableByteArray mba#) new@(I# new#) = do
assert (new >= 0) $
ST $ \s -> case shrinkMutableByteArray# mba# new# s of
s' -> (# s', () #)

writeWord8Array :: MutableByteArray s -> Int -> Word8 -> ST s ()
writeWord8Array (MutableByteArray mba#) (I# i#) (W8# w#) =
ST $ \s -> case writeWord8Array# mba# i# w# s of
Expand Down
2 changes: 1 addition & 1 deletion bytestring.cabal
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Name: bytestring
Version: 0.12.0.2
Version: 0.12.1.0
Synopsis: Fast, compact, strict and lazy byte strings with a list interface
Description:
An efficient compact, immutable byte string type (both strict and lazy)
Expand Down
3 changes: 3 additions & 0 deletions tests/Properties.hs
Original file line number Diff line number Diff line change
Expand Up @@ -536,6 +536,8 @@ prop_short_pack_unpack xs =
(Short.unpack . Short.pack) xs == xs
prop_short_toShort_fromShort bs =
(Short.fromShort . Short.toShort) bs == bs
prop_short_lazyToShort_fromShort lbs =
(Short.lazyFromShort . Short.lazyToShort) lbs == lbs

prop_short_toShort_unpack bs =
(Short.unpack . Short.toShort) bs == P.unpack bs
Expand Down Expand Up @@ -602,6 +604,7 @@ prop_short_pinned (NonNegative (I# len#)) = runST $ ST $ \s ->
short_tests =
[ testProperty "pack/unpack" prop_short_pack_unpack
, testProperty "toShort/fromShort" prop_short_toShort_fromShort
, testProperty "lazyToShort/fromShort" prop_short_lazyToShort_fromShort
, testProperty "toShort/unpack" prop_short_toShort_unpack
, testProperty "pack/fromShort" prop_short_pack_fromShort
, testProperty "empty" prop_short_empty
Expand Down