getLine doesn't honor handle's newline setting #13
Comments
Yeah. I have a complete rewrite of the bytestring IO code that's mostly done. The new code respects the newline mode. But since it is slightly different behaviour I'll probably include the IO actions in a new module (and eventually deprecate the existing ones).
What's the status on this?
Any updates?
Apparently there's quite a bit of demand for a fix. @dcoutts What happened to the IO code rewrite you mentioned? It would be interesting to see how you solved this issue!
Yeah. I suspect that simply changing the behaviour of So how about adding variants like For reference, here's the relevant code: Lines 1618 to 1668 in 39ad965
I think that we should bring
I'd like to give this a shot, although it'll be my first time writing with low-level Haskell IO. I intend to make
holds.
Referencing issue #249, I'm thinking that it makes sense to put the fixed version of Then, legacy code won't break (i.e. in case they were compensating for the extra
{-# DEPRECATED getLine
"Use Data.ByteString.Char8.getLine instead. (Data.ByteString.getLine does not correctly handle \r\n on Windows; Functions that rely on ASCII encodings belong in Data.ByteString.Char8)"
#-}
Agreed, feel free to go ahead.
The thing is that existing We can bikeshed whether this constitutes a bug fix or a breaking change later on.
This closes issue haskell#13.

The changes can be summarized as updating `findEOL` to look for "\r\n" in CRLF mode and updating the logic of `haveBuf` to resize the buffer according to the size of the newline. Additionally, tests were added to verify that both `hGetLine`s produce the same behavior.

Some of the edge cases to worry about here include
* '\n' still counts as a line end. Thus the length of a line ending varies between 1 and 2 in CRLF mode.
* "\r\r\n" can give a false start. This means you can't always skip 2 characters when `c' /= '\n'`.
* '\r' when not followed by '\n' isn't part of a newline.
* Not reading out of the buffer when '\r' is the last character.
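The edge cases listed in this commit message can be illustrated with a small pure sketch. This is not the library's `findEOL`, which works on a mutable handle buffer; `breakLineCRLF` is a hypothetical name, and the function only models the search rule on an in-memory `ByteString`, assuming CRLF input mode.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8

-- | Hypothetical helper (not part of bytestring): split off the first line
-- of the input under CRLF-mode rules. Returns 'Nothing' when no complete
-- line terminator is present, i.e. more input would be needed to decide.
breakLineCRLF :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
breakLineCRLF bs = go 0
  where
    len = B.length bs
    go i
      | i >= len  = Nothing
        -- A bare '\n' still ends a line (terminator length 1).
      | c == '\n' = Just (B.take i bs, B.drop (i + 1) bs)
        -- "\r\n" ends a line (terminator length 2). The bounds check keeps
        -- us from reading past the end when '\r' is the last byte.
      | c == '\r' && i + 1 < len && B8.index bs (i + 1) == '\n'
                  = Just (B.take i bs, B.drop (i + 2) bs)
        -- Any other byte, including a lone '\r', is ordinary content.
        -- Advancing by one (not two) handles the "\r\r\n" false start.
      | otherwise = go (i + 1)
      where
        c = B8.index bs i
```

Under these rules, `"a\r\r\nb"` splits into the line `"a\r"` and the remainder `"b"`, while an input ending in `'\r'` yields `Nothing` because the next byte decides whether that `'\r'` belongs to a terminator.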
Ah, that's unfortunate. I overlooked that re-export.
I've implemented this along with some tests to verify it works but I still have 2 concerns with this change.
Also, an interesting quirk to notice is that I'm not sure if that is "correct Windows behavior" or not, specifically I recall old versions of Notepad mushing all lines onto one line when separated by
Interesting. I'm in favor of mirroring this behaviour.
👍
I have written the property test along with a newtype This meant that one had to run a few million tests to reliably catch bugs, which is too slow, but the newtype generates a newline 50% of the time, so
@dbramucci nice!
@dbramucci how is it going? Do not hesitate to create a draft PR, it could facilitate further discussions.
I just did a quick experiment and it turns out that
And while I don't know of any place that relies on this, I wouldn't be surprised to see people caught off-guard by the fact that
And on the note of inconsistencies:
cannot hold if the handle is in
I encountered this on Codeforces.
It would be nice if BS.getLine would use the handle's newline setting so that in CRLF mode strings are returned with both the CR (if present) and LF removed.
An example of code that demonstrates the behavior:
http://stackoverflow.com/questions/22417171/bs-getline-and-crlf-endings
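Until the library honors the newline mode, callers can approximate the requested behavior themselves. The following is a minimal sketch of such a workaround, not code from the linked question: `stripTrailingCR` and `hGetLineCRLF` are hypothetical names, and the approach assumes that a trailing `'\r'` on a line returned by `hGetLine` is always half of a `"\r\n"` terminator (which is not strictly true for a final line that ends in `'\r'` with no `'\n'` after it).

```haskell
import qualified Data.ByteString.Char8 as B8
import System.IO (Handle)

-- | Hypothetical helper: drop a single trailing '\r', if there is one.
-- Only one CR is removed, so "a\r\r" becomes "a\r".
stripTrailingCR :: B8.ByteString -> B8.ByteString
stripTrailingCR line = case B8.unsnoc line of
  Just (rest, '\r') -> rest
  _                 -> line

-- | Hypothetical wrapper: read a line with the existing 'B8.hGetLine'
-- (which splits on '\n' only) and then remove the leftover CR.
hGetLineCRLF :: Handle -> IO B8.ByteString
hGetLineCRLF h = stripTrailingCR <$> B8.hGetLine h
```

For example, `hGetLineCRLF stdin` would return `"abc"` for the input line `"abc\r\n"`, where plain `B8.hGetLine stdin` returns `"abc\r"`.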