Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows files are written in CRLF newlines #87

Open
ndmitchell opened this issue Sep 12, 2020 · 5 comments
Open

Windows files are written in CRLF newlines #87

ndmitchell opened this issue Sep 12, 2020 · 5 comments
Assignees
Labels

Comments

@ndmitchell
Copy link
Contributor

Given a file on Windows with LF newlines, when I run it through apply-refact, it comes out with CRLF newlines. It would be better if apply-refact (or maybe ghc-exactprint?) could preserve whatever type of newline I have, in addition to everything else.

@zliu41 zliu41 self-assigned this Sep 13, 2020
@zliu41 zliu41 added the bug label Sep 13, 2020
@zliu41
Copy link
Collaborator

zliu41 commented Nov 22, 2020

The fix to this should be done in ghc-exactprint, since parseModule and exactPrint don't round-trip wrt LF vs. CRLF.
I wonder if GHC ParseResult contains the newline encoding information, or is it lost? I don't see it in ParseResult.
If that information is lost, perhaps we can post-process the output, depending on whether the first newline in the input file is LF or CRLF?

@jneira
Copy link
Contributor

jneira commented Nov 23, 2020

If that information is lost, perhaps we can post-process the output, depending on whether the first newline in the input file is LF or CRLF?

Mmm, ideally ghc-exactprint should preserve the EOL for each line and honour the previous EOL line if each additional line. Nobody sane would mix both line endings but who knows.
Other alternative could be add as a a new parameter all info that may be lost in the actual data, line endings (and encoding?) and apply it to output uniformly, in a new function to keep backwards compatibility. Then apply-refact could decide what EOL is used (the first line f.e.) or ask clients for the same info.

@zliu41
Copy link
Collaborator

zliu41 commented Nov 23, 2020

I agree with

Nobody sane would mix both line endings

I don't mind adding a new function with an additional parameter, but I guess users rarely want to specify whether they want LF or CRLF. They just want whatever the input file has. So I'd add it only if someone says they need it.

As to encoding, I don't think GHC can even parse any non-UTF-8 encoded source file, so we can safely assume that all source files are UTF-8.

@zliu41
Copy link
Collaborator

zliu41 commented Nov 23, 2020

Oh, you probably meant that the new function takes the already parsed module as input. In that case adding a new parameter is definitely useful (although ideally, that information should be part of Anns).

How about adding it to applyRefactorings'? Can you think of other things like this besides LF vs. CRLF?

@jneira
Copy link
Contributor

jneira commented Nov 23, 2020

Well, taking a look to System.IO there are three axis in the output settings (for completeness):

  • binary vs text encoding: subsumed within the other ones, it will be always text encoding for us.
  • encoding: we could assume it always is UTF-8 as ghc only supports it.
  • eol: agree adding a paremeter as we have commented here seems to be the right solution (at least in applyRefactoring')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants