Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code to preserve original linefeeds (issue #121) #131

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
32 changes: 31 additions & 1 deletion libmodernize/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@

from __future__ import absolute_import, print_function

import os
import sys
import logging
import optparse
Expand All @@ -17,6 +18,35 @@
from libmodernize import __version__
from libmodernize.fixes import lib2to3_fix_names, six_fix_names, opt_in_fix_names


class LFPreservingRefactoringTool(StdoutRefactoringTool):
""" https://github.com/python-modernize/python-modernize/issues/121 """
def write_file(self, new_text, filename, old_text, encoding):
# detect linefeeds
lineends = {'\n':0, '\r\n':0, '\r':0}
lines = []
for line in open(filename, 'rb'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this work the same way on Python 2 and 3, use:

io.open(filename, newline='')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But file needs to be opened in binary more or else the information about lineends will be lost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the newline='' means it will do no conversion of line endings. Binary mode has a much bigger effect on Python 3, because it means you're dealing with bytes rather than strs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But binary mode guarantees that Python 3 won't bail out with UnicodeDecodeError. How to address that? I can't know the file encoding before opening it as Python 3 requires.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. There are ways around that, like using errors='ignore' to skip characters that can't be decoded, but it may be easier to use binary mode here. In that case, though, you'll need to update the checks below to use bytes, e.g. if line.endswith(b'\r\n').

if line.endswith('\r\n'):
lineends['\r\n'] += 1
elif line.endswith('\n'):
lineends['\n'] += 1
elif line.endswith('\r'):
lineends['\r'] += 1
lines.append(line.rstrip('\r\n'))
super(LFPreservingRefactoringTool, self).write_file(
new_text, filename, old_text, encoding)
# detect if line ends are consistent in source file
if sum([bool(lineends[x]) for x in lineends]) == 1:
# detect if line ends are different from system-specific
newline = [x for x in lineends if lineends[x] != 0][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you used a collections.Counter object to count the line ending styles, I think this could be simplified to newline = counter.most_common(1)[0][0]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New in version 2.7, and modernize README says that it attempts to spit out a codebase compatible with Python 2.6+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, I forget when these things got added. I rarely do anything where 2.6 compatibility still matters now.

if os.linesep != newline:
with open(filename, 'wb') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we can use io.open, which is the Python 3 open function, but also available on Python 2.

for line in lines:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using the lines that you read before the file was rewritten, so this will undo the changes modernize made. You need to base it on new_text, or re-read the file after it is changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Fixed in 7787615

f.write(line)
self.log_debug('fixed %s linefeeds back to %s',
filename, newline)


usage = __doc__ + """\
%s

Expand Down Expand Up @@ -125,7 +155,7 @@ def main(args=None):
else:
requested = default_fixes
fixer_names = requested.difference(unwanted_fixes)
rt = StdoutRefactoringTool(sorted(fixer_names), flags, sorted(explicit),
rt = LFPreservingRefactoringTool(sorted(fixer_names), flags, sorted(explicit),
options.nobackups, not options.no_diffs)

# Refactor all files and directories passed as arguments
Expand Down