Windows files are written in CRLF newlines #87

ndmitchell · 2020-09-12T15:22:19Z

Given a file on Windows with LF newlines, when I run it through apply-refact, it comes out with CRLF newlines. It would be better if apply-refact (or maybe ghc-exactprint?) could preserve whatever type of newline I have, in addition to everything else.

zliu41 · 2020-11-22T07:53:21Z

The fix to this should be done in ghc-exactprint, since parseModule and exactPrint don't round-trip wrt LF vs. CRLF.
I wonder if GHC ParseResult contains the newline encoding information, or is it lost? I don't see it in ParseResult.
If that information is lost, perhaps we can post-process the output, depending on whether the first newline in the input file is LF or CRLF?
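A minimal sketch of that post-processing idea, using only `base`. The helper names `detectNewline`, `setNewline` and `matchNewlines` are hypothetical and not part of apply-refact or ghc-exactprint. It assumes the input was read without newline translation, so that any `'\r'` characters are still present in the string.

```haskell
import System.IO (Newline (..))

-- | Guess the newline style from the first line break in the input.
-- Hypothetical helper, not part of apply-refact or ghc-exactprint.
detectNewline :: String -> Newline
detectNewline ('\r' : '\n' : _) = CRLF
detectNewline ('\n' : _)        = LF
detectNewline (_ : rest)        = detectNewline rest
detectNewline []                = LF  -- no line break found: default to LF

-- | Rewrite every line break in the text to the given style.
-- Existing CRLF pairs are matched first so they are never doubled.
setNewline :: Newline -> String -> String
setNewline nl = go
  where
    eol = case nl of
      LF   -> "\n"
      CRLF -> "\r\n"
    go ('\r' : '\n' : rest) = eol ++ go rest
    go ('\n' : rest)        = eol ++ go rest
    go (c : rest)           = c : go rest
    go []                   = []

-- | Make the printed output use the newline style of the original input.
matchNewlines :: String -> String -> String
matchNewlines input = setNewline (detectNewline input)
```

With this, `matchNewlines original printed` would be applied to the exact-printed result just before it is written out, and the write itself would also have to avoid newline translation.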

jneira · 2020-11-23T06:33:33Z

> If that information is lost, perhaps we can post-process the output, depending on whether the first newline in the input file is LF or CRLF?

Mmm, ideally ghc-exactprint should preserve the EOL of each line and reuse the previous line's EOL for each additional line. Nobody sane would mix both line endings, but who knows.
Another alternative could be to add, as a new parameter, all the info that may be lost in the actual data, i.e. line endings (and encoding?), and apply it to the output uniformly, in a new function to keep backwards compatibility. Then apply-refact could decide which EOL is used (that of the first line, for example) or ask clients for the same info.
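A rough sketch of the per-line variant, again with `base` only. All three function names are hypothetical. Endings are matched to output lines purely by position, which is an assumption that stops holding as soon as a refactoring adds or removes lines in the middle of the file, so treat this as an illustration of the idea rather than a proposed fix.

```haskell
import System.IO (Newline (..))

-- | Split text into lines, recording the terminator of each line.
-- 'Nothing' marks a final line that has no terminator.
linesWithEOL :: String -> [(String, Maybe Newline)]
linesWithEOL = go ""
  where
    go acc ('\r' : '\n' : rest) = (reverse acc, Just CRLF) : go "" rest
    go acc ('\n' : rest)        = (reverse acc, Just LF) : go "" rest
    go acc (c : rest)           = go (c : acc) rest
    go ""  []                   = []
    go acc []                   = [(reverse acc, Nothing)]

-- | Terminate each output line with the ending recorded at the same
-- position in the input. When the recorded endings run out, the last
-- one seen is reused (LF if there were none at all).
reapplyEOLs :: [Newline] -> [String] -> String
reapplyEOLs = go LF
  where
    go _    (nl : nls) (l : ls) = l ++ render nl ++ go nl nls ls
    go prev []         (l : ls) = l ++ render prev ++ go prev [] ls
    go _    _          []       = ""
    render LF   = "\n"
    render CRLF = "\r\n"

-- | Carry the input's line endings over to the output, line by line.
-- Note that every output line ends up terminated, including the last.
preserveEOLs :: String -> String -> String
preserveEOLs input output =
  reapplyEOLs
    [nl | (_, Just nl) <- linesWithEOL input]
    (map fst (linesWithEOL output))
```

The uniform alternative (one explicit EOL parameter applied to the whole output) avoids the positional matching problem entirely, at the cost of normalising files that really do mix endings.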

zliu41 · 2020-11-23T06:57:58Z

I agree with

> Nobody sane would mix both line endings

I don't mind adding a new function with an additional parameter, but I guess users rarely want to specify whether they want LF or CRLF. They just want whatever the input file has. So I'd add it only if someone says they need it.

As to encoding, I don't think GHC can even parse any non-UTF-8 encoded source file, so we can safely assume that all source files are UTF-8.

zliu41 · 2020-11-23T07:04:13Z

Oh, you probably meant that the new function takes the already parsed module as input. In that case adding a new parameter is definitely useful (although ideally, that information should be part of Anns).

How about adding it to applyRefactorings'? Can you think of other things like this besides LF vs. CRLF?

jneira · 2020-11-23T07:37:01Z

Well, taking a look at System.IO, there are three axes in the output settings (for completeness):

binary vs text encoding: subsumed within the other ones; it will always be text encoding for us.
encoding: we could assume it is always UTF-8, as GHC only supports that.
eol: agreed, adding a parameter as we have discussed here seems to be the right solution (at least in applyRefactorings')
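For reference, a small sketch of how the encoding and EOL settings are controlled per handle in System.IO. The wrappers `writeFileWith` and `readFileRaw` are hypothetical names. The System.IO functions they call (`hSetEncoding`, `hSetNewlineMode`, `noNewlineTranslation`) are real. As far as I understand, the default newline mode of a handle is the native one, which on Windows translates `"\r\n"` to `'\n'` on input and back on output, and that would explain the behaviour reported in this issue.

```haskell
import System.IO

-- | Write text as UTF-8 with an explicit line ending: every '\n' in
-- the string is written out as the chosen 'Newline'.
-- Hypothetical helper, not part of apply-refact.
writeFileWith :: Newline -> FilePath -> String -> IO ()
writeFileWith nl path contents =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hSetNewlineMode h NewlineMode { inputNL = nl, outputNL = nl }
    hPutStr h contents

-- | Read a file as UTF-8 with no newline translation, so that any
-- '\r' characters survive and the original EOL style can be inspected.
readFileRaw :: FilePath -> IO String
readFileRaw path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  hSetNewlineMode h noNewlineTranslation
  s <- hGetContents h
  -- hGetContents is lazy: force the whole string before closing.
  length s `seq` hClose h
  pure s
```

An EOL parameter on applyRefactorings' could then be passed straight through to something like `writeFileWith`, with the default taken from whatever `readFileRaw` finds in the input.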

zliu41 self-assigned this Sep 13, 2020

zliu41 added the bug label Sep 13, 2020

jneira mentioned this issue Nov 24, 2020

Apply hint: redundant lambda breaks block identation #95

Open
