bytestring parser #65
Comments
How would this work, exactly? The arguments would still need to be decoded and converted into a text data type ( |
I want to be able to handle arguments that aren't valid Unicode for things that don't require valid Unicode. The String conversion interferes with this. |
It doesn't have to be Unicode, but it must be some encoding. After all, |
I'm closing this for now, as I'm not sure what you're asking exactly. |
Actually I sort of have a use case for this. The issue is that a While it is true that |
To state this a little more clearly (hopefully): |
Ok, looking at the actual code, a saner way to do this might be to allow the input to be anything and let the user supply both a decoding function |
This is a very common misconception. No, they don't.
This works fine. You can split on spaces and special ascii characters just fine with bytestring. What exactly is the problem? Anything you don't need to parse, you don't parse. For example, filepaths on unix are ByteString (that's what you get back from the syscalls). If you convert them to String, you lose the underlying encoding (and the bytestring representation is potentially different). Now everything further you do with said filepath in your codebase is going to be potentially wrong, including simple things like This means in fact I can't trust optparse-applicative to deal with filepaths passed in by a user. |
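As a rough illustration of the point above about splitting on ASCII characters without decoding, here is a minimal sketch. The helper name `splitLongOption` and the `--name=value` shape are assumptions made for this example, not part of optparse-applicative. Only the ASCII prefix is inspected; the value bytes are returned untouched, whatever their encoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8

-- | Hypothetical helper (not an optparse-applicative API): split a raw
-- argument of the form @--name=value@ into the option name and the
-- value bytes. Nothing is decoded; the value is passed through as-is.
splitLongOption :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
splitLongOption arg = do
  -- Require the literal ASCII prefix "--".
  rest <- B.stripPrefix (B8.pack "--") arg
  -- Split at the first '=' byte; 'value' still starts with that '='.
  let (name, value) = B8.break (== '=') rest
  if B.null name || B.null value
    then Nothing                      -- no name, or no '=' present
    else Just (name, B.drop 1 value)  -- drop the '=' separator
```

The value may contain bytes that are not valid UTF-8 (for example a unix filepath), and they reach the caller unchanged.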
So, on unix, filepaths are By default, the file system encoding uses "UTF-8b", which actually embeds invalid utf-8 sequences using "surrogate escaping". As long as you use the same encoding when converting back to bytes, it is actually perfectly possible to round trip arbitrary data through a Haskell On the other hand, if you want to pass it to a function expecting a Now, there is no official way to actually convert such a
Which will reverse the process correctly. More about encodings not really related to option parsing: Reading As a side note: if you see the
Which is suggesting that there is something strange going on with the encoding. The fact that |
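For readers following along, the round trip described in the comment above can be sketched with functions from `base` and `bytestring`. This is not necessarily the snippet the comment originally contained; the names `decodeFS` and `encodeFS` are made up for this example. Whether bytes that are invalid in the locale encoding survive the round trip depends on the file system encoding in effect, so treat this as a sketch rather than a total conversion.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)

-- | Hypothetical helper: decode raw bytes into a String using the
-- file system encoding GHC currently has configured.
decodeFS :: B.ByteString -> IO String
decodeFS bs = do
  enc <- getFileSystemEncoding
  B.useAsCStringLen bs (GHC.peekCStringLen enc)

-- | Hypothetical helper: encode a String back to bytes with the same
-- encoding, reversing 'decodeFS' when the encoding supports it.
encodeFS :: String -> IO B.ByteString
encodeFS str = do
  enc <- getFileSystemEncoding
  GHC.withCStringLen enc str B.packCStringLen
```

Both directions must use the same encoding; if the file system encoding changes between the two calls, the bytes that come back may differ from the original.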
You mean as long as the file system encoding (which can be set to arbitrary things) permits that? Hence I'm not sure what you mean by "default" here. Also note the two caveats mentioned here: https://hackage.haskell.org/package/base-4.14.0.0/docs/GHC-IO-Encoding.html#v:mkTextEncoding
So this approach is definitely not total. |
We're in a hard place here. If we were to try and use Cribbing from the Rust Docs:
But obviously in unix systems, there are other locale specific encodings, which, as you said, may or may not be supersets of ascii. If Haskell had a |
We're in luck. I worked on this for the past year:
|
Interesting. I'm open to opening a branch on top of these proposals. I'm pretty conservative with changes here because optparse is a 10 yo project people just use and expect to work. But if base is moving, yeah, we can and should too. |
Well, base is not moving any time soon, but some boot libraries will support this additional API. So we won't break backwards compatibility that quickly. |
@pcapriotti can you please re-open this ticket? |
Demonstration of unsoundness of current roundtripping techniques: https://gist.github.com/hasufell/c600d318bdbe010a7841cc351c835f92 HF tech proposal outlining the current affairs: haskellfoundation/tech-proposals#35 |
Hello, I am writing Unix utilities in Haskell, and I need support for OsPath and OsString from the new filepath library. I understand that optparse-applicative has been built on type String = [Char] and it would require major changes to be able to support plain octet streams instead of encoded text as input. Therefore I'm kindly asking, are there any plans for optparse-applicative to transition to using OsStrings (or ByteStrings) in foreseeable future (while I could use some ad hoc CLI parser in the meanwhile), or should I consider planning a new CLI parsing library? |
@Merivuokko first, we need to implement the following function properly and multiplatform:

    getArgs :: IO [OsString]

This is currently blocked because I can't see that

For unix, it's fairly trivial:

    module Main where

    import Data.ByteString.Short (ShortByteString) -- bytestring
    import System.OsString.Internal.Types -- filepath
    import qualified System.Posix.Env.PosixString as Posix -- unix

    main :: IO ()
    main = do
      args <- getArgs
      print args

    -- Wrap each PosixString argument in the OsString newtype (unix only).
    getArgs :: IO [OsString]
    getArgs = fmap OsString <$> Posix.getArgs

in optparse, optparse-applicative/src/Options/Applicative/Extra.hs Lines 117 to 121 in c6cc612
|
haskell/win32#221
Thank you very much! I was about to create an issue on Win32, as this is the
best that I could have done. I don't know much about that platform and have
only tried it once (some 25 years ago).
|
My current efforts are here: https://github.com/hasufell/optparse-applicative/pull/1/files |
There are two more steps needed:
|
I'd like to parse bytestrings from System.Posix.Env.ByteString.getArgs instead of converting from String.