Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swapping IO for abstract IO type-classes #396

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

recursion-ninja
Copy link
Collaborator

Refactor to use io-classes type-classes instead of IO

A reasonably large-scale refactoring (of API type signatures) which addresses the numerous TODO annotations stating "replace by io-classes constraints for IO simulation" in order to provide a more general, polymorphic API for users of the lsm-tree library. The indirect object to be replaced that the TODO annotation references are the m ~ IO constraints found on the same line(s) prefixing the TODO annotation, allowing the post-refactoring functions to abstract over IO operations rather than exposing an API which is specialized to IO. While most TODO annotations were in the Database.LSMTree.Internal module, this PR touches many additional modules. This wide footprint of changes is necessary because the Database.LSMTree.Internal module imports many other modules from the lsm-tree project, hence each of these "dependency modules" required a "IO-abstraction" API update in order to facilitate the "IO-abstraction" of the Database.LSMTree.Internal module.

The io-classes which are utilized in place of specialized IO functions are:

Important Note

Wherever the PR updated a module API from exposing a specialized IO version of a function to a generalized io-classes function, a SPECIALISE directive was also added which specializes the polymorphic m type parameter to IO. The rational here is to have the module's interface file still expose a version of the function which is specialized to IO in order to prevent any performance regressions. However, this may be excessive and result in unnecessarily longer compiler times and code sizes.

I would like some commentary from reviewers more experience balancing SPECIALISE directive usage as to whether the abundant and boilerplate additions of theSPECIALISE directives are a wise decision.

Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like some commentary from reviewers more experience balancing SPECIALISE directive usage as to whether the abundant and boilerplate additions of the SPECIALISE directives are a wise decision.

Generalising the code to work for any IO-like monad is done for testing purposes only, and since the library is intended to be highly performant, the generalisation should ideally have no overhead. So, the SPECIALISE pragmas are warranted -- the compile time is also not prohibitively long at the moment, even with the changes in this PR

On that note, we should verify that the generalisation does not impact our performance. Let's run the benchmarks to see if we can see a clear difference

Also, I have some proposed changes on the branch jdral/io-classes-for-abstract-types. You can cherry-pick what you want to include.

It's puzzling that tests are currently timing out -- there are no logic changes AFAICS. We should find out what is happening here

Note that with the changes on my branch, it seems most of our tests succeed that are currently failing, except for propLockstepIO_RealImpl_RealFS. However, I'm hesitant to say that my changes "fix" the problem if we don't know the origin of the bug. However, I've played around a bit with the statemachine tests, and it seems that disabling the Open action makes the propLockstepIO_RealImpl_RealFS pass... Not sure why

Comment on lines +303 to +304
{-# INLINE withOpenSession #-}
{-# SPECIALISE withOpenSession :: Session IO h -> (SessionEnv IO h -> IO a) -> IO a #-}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With an INLINE, the SPECIALISE is probably not necessary, but it doesn't hurt to include it anyway

Comment on lines +253 to +256
openFromDisk ::
forall m h.
(MonadFix m, MonadSTM m, MonadThrow m, PrimMonad m)
=> HasFS m h
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
openFromDisk ::
forall m h.
(MonadFix m, MonadSTM m, MonadThrow m, PrimMonad m)
=> HasFS m h
openFromDisk ::
forall m h.
(MonadFix m, MonadSTM m, MonadThrow m, PrimMonad m)
=> HasFS m h

import Control.Monad.Primitive
import qualified Control.Monad.ST as ST
--import qualified Control.Monad.ST as ST
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--import qualified Control.Monad.ST as ST

@@ -140,18 +150,20 @@ addLargeSerialisedKeyOp fs builder@RunBuilder{runBuilderAcc} key page overflowPa
writeRawOverflowPages fs builder overflowPages'
for_ chunks $ writeIndexChunk fs builder

{-# SPECIALISE unsafeFinalise :: HasFS IO h -> HasBlockIO IO h -> Bool -> RunBuilder (PrimState IO) (FS.Handle h) -> IO (RunFsPaths, Bloom SerialisedKey, IndexCompact, NumEntries) #-}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RealWorld instead of PrimState IO.

Suggested change
{-# SPECIALISE unsafeFinalise :: HasFS IO h -> HasBlockIO IO h -> Bool -> RunBuilder (PrimState IO) (FS.Handle h) -> IO (RunFsPaths, Bloom SerialisedKey, IndexCompact, NumEntries) #-}
{-# SPECIALISE unsafeFinalise :: HasFS IO h -> HasBlockIO IO h -> Bool -> RunBuilder RealWorld (FS.Handle h) -> IO (RunFsPaths, Bloom SerialisedKey, IndexCompact, NumEntries) #-}

Make sure to change every other occurrence in this PR as well

-- | Close a run that is being constructed (has not been finalised yet),
-- removing all files associated with it from disk.
-- After calling this operation, the run must not be used anymore.
--
-- TODO: Ensure proper cleanup even in presence of exceptions.
close :: HasFS IO h -> RunBuilder RealWorld (FS.Handle h) -> IO ()
close :: (MonadSTM m) => HasFS m h -> RunBuilder (PrimState m) (FS.Handle h) -> m ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
close :: (MonadSTM m) => HasFS m h -> RunBuilder (PrimState m) (FS.Handle h) -> m ()
close :: MonadSTM m => HasFS m h -> RunBuilder (PrimState m) (FS.Handle h) -> m ()

Comment on lines 203 to +207
Helpers
-------------------------------------------------------------------------------}

writeRawPage :: HasFS IO h -> RunBuilder RealWorld (FS.Handle h) -> RawPage -> IO ()
{-# SPECIALISE writeRawPage :: HasFS IO h -> RunBuilder (PrimState IO) (FS.Handle h) -> RawPage -> IO () #-}
writeRawPage :: (MonadSTM m, PrimMonad m) => HasFS m h -> RunBuilder (PrimState m) (FS.Handle h) -> RawPage -> m ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wrap the function signatures for helpers in this section over multiple lines if they far exceed 80 characters in width

@@ -52,7 +56,7 @@ import System.FS.BlockIO.API (HasBlockIO)
data RunReader m fhandle = RunReader {
-- | The disk page currently being read. If it is 'Nothing', the reader
-- is considered closed.
readerCurrentPage :: !(IORef (Maybe RawPage))
readerCurrentPage :: !(STRef (PrimState m) (Maybe RawPage))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try a MutVar instead

-- | Returns 'Nothing' on EOF.
readDiskPage :: HasFS IO h -> FS.Handle h -> IO (Maybe RawPage)
readDiskPage :: (MonadCatch m, MonadSTM m, PrimMonad m) => HasFS m h -> FS.Handle h -> m (Maybe RawPage)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
readDiskPage :: (MonadCatch m, MonadSTM m, PrimMonad m) => HasFS m h -> FS.Handle h -> m (Maybe RawPage)
readDiskPage :: (MonadCatch m, PrimMonad m) => HasFS m h -> FS.Handle h -> m (Maybe RawPage)

@recursion-ninja recursion-ninja force-pushed the recursion-ninja/io-classes-for-abstract-types branch from 00ddaee to 52875ff Compare September 25, 2024 01:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants