
Converting Jira lists with CRLF line-breaks adds erroneous whitespace to subsequent text #25

Open
arctus-io opened this issue Aug 19, 2024 · 6 comments · May be fixed by #28

Comments

@arctus-io
Contributor

arctus-io commented Aug 19, 2024

When jira2markdown's convert() function is applied to Jira lists that use CRLF line breaks (carriage return + line feed, \r\n), the resulting Markdown contains erroneous whitespace in the text that follows the last list item.

See below for a visual example of the conversion issue.

from jira2markdown import convert
jira_text = 'Line Before List: Sample text words words words:\r\n * Bulleted Item 1: Sample text words words words\r\n * Bulleted Item 2: Sample text words words words\r\n\r\nLine After List: Sample text words words words\r\nLine After List: Sample text words words words'
print(jira_text)

Input (jira_text printed):

Line Before List: Sample text words words words:
 * Bulleted Item 1: Sample text words words words
 * Bulleted Item 2: Sample text words words words

Line After List: Sample text words words words
Line After List: Sample text words words words

Input (jira_text with line-breaks visualized):

Line Before List: Sample text words words words:\r\n
 * Bulleted Item 1: Sample text words words words\r\n
 * Bulleted Item 2: Sample text words words words\r\n
\r\n
Line After List: Sample text words words words\r\n
Line After List: Sample text words words words
md_text = convert(jira_text)

Expected Output (md_text printed):

Line Before List: Sample text words words words:
- Bulleted Item 1: Sample text words words words
- Bulleted Item 2: Sample text words words words

Line After List: Sample text words words words
Line After List: Sample text words words words

Expected Output (md_text with line-breaks visualized):

Line Before List: Sample text words words words:\r\n
- Bulleted Item 1: Sample text words words words\r\n
- Bulleted Item 2: Sample text words words words\r\n
\r\n
Line After List: Sample text words words words\r\n
Line After List: Sample text words words words
print(md_text)

Actual Output (md_text printed):

Line Before List: Sample text words words words:
- Bulleted Item 1: Sample text words words words
- Bulleted Item 2: Sample text words words words
  
  Line After List: Sample text words words words
  Line After List: Sample text words words words

Actual Output (md_text with line-breaks visualized):

Line Before List: Sample text words words words:\r\n
- Bulleted Item 1: Sample text words words words\n
- Bulleted Item 2: Sample text words words words\n
  \n
  Line After List: Sample text words words words\n
  Line After List: Sample text words words words

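As shown, the conversion replaces:

  • \r\n in the list with \n
  • \r\n\r\n at the end of the list with "\n  \n  " (each line break is followed by two spaces)
  • \r\n after the list with "\n  " (a line break followed by two spaces), so every line after the list is indented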

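To check these line-ending and indentation differences without eyeballing repr() output, a small helper can summarize each line. This is a hypothetical diagnostic (describe_lines is not part of jira2markdown); it only distinguishes CRLF and LF breaks, and ACTUAL_OUTPUT is transcribed by hand from the visualized actual output above.

```python
def _indent(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def describe_lines(text: str) -> list[tuple[int, str]]:
    """Return (leading_spaces, line_ending) for each line of text.

    line_ending is "CRLF", "LF", or "none" for a final line without a break.
    Only CRLF and LF line breaks are recognized.
    """
    described = []
    pieces = text.split("\n")
    # Every piece except the last was followed by "\n"; a trailing "\r" means CRLF.
    for piece in pieces[:-1]:
        ending = "CRLF" if piece.endswith("\r") else "LF"
        described.append((_indent(piece.rstrip("\r")), ending))
    # The last piece has no line break; it is empty if the text ended with "\n".
    if pieces[-1]:
        described.append((_indent(pieces[-1]), "none"))
    return described


JIRA_TEXT = (
    "Line Before List: Sample text words words words:\r\n"
    " * Bulleted Item 1: Sample text words words words\r\n"
    " * Bulleted Item 2: Sample text words words words\r\n"
    "\r\n"
    "Line After List: Sample text words words words\r\n"
    "Line After List: Sample text words words words"
)

# Transcribed from the "Actual Output (md_text with line-breaks visualized)" above.
ACTUAL_OUTPUT = (
    "Line Before List: Sample text words words words:\r\n"
    "- Bulleted Item 1: Sample text words words words\n"
    "- Bulleted Item 2: Sample text words words words\n"
    "  \n"
    "  Line After List: Sample text words words words\n"
    "  Line After List: Sample text words words words"
)

for label, text in (("input", JIRA_TEXT), ("actual output", ACTUAL_OUTPUT)):
    print(label, describe_lines(text))
```

In the input every line break is CRLF; in the actual output only the first one is, and the three lines after the list carry two leading spaces.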
Copy-and-paste snippet to reproduce the issue:

from jira2markdown import convert

# Input with CRLF line-breaks 
jira_text = 'Line Before List: Sample text words words words:\r\n * Bulleted Item 1: Sample text words words words\r\n * Bulleted Item 2: Sample text words words words\r\n\r\nLine After List: Sample text words words words\r\nLine After List: Sample text words words words'

# Print input with line-breaks rendered
print("\njira_text:\n" + jira_text)

# Print input with line-breaks represented, not rendered
print("\nrepr(jira_text):\n" + repr(jira_text))

md_text = convert(jira_text)

# Print output with line-breaks rendered
print("\nmd_text:\n" + md_text)

# Print output with line-breaks represented, not rendered
print("\nrepr(md_text):\n" + repr(md_text))
@arctus-io
Contributor Author

Happy to try and find/fix the issue if it is not a trivial fix on your end.

@catcombo
Owner

catcombo commented Sep 7, 2024

Hi @arctus-io!

Thank you very much for the detailed issue with the reproducer. Sorry for the late response. I prepared a fix in #28. Could you please test it?

@arctus-io
Contributor Author

arctus-io commented Sep 18, 2024

@catcombo: Thanks for the fix! I can confirm #28 fixes this issue.

However, while testing this fix I noticed that the same \r\n line-ending problem also breaks table conversions.

I can open a separate issue with more details if needed, but I believe an across-the-board change that treats \r\n as equivalent to \n would fix both.

My current workaround is to convert all \r\n line endings to \n before passing the text to jira2markdown.
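That workaround can be written as a thin pre-processing wrapper. A minimal sketch, assuming the converter is passed in as a function (so the sketch runs without jira2markdown installed) and that CRLF output is optionally wanted back when the Jira input used CRLF; convert_with_lf and keep_crlf are hypothetical names, not part of jira2markdown.

```python
from typing import Callable


def convert_with_lf(
    jira_text: str,
    convert: Callable[[str], str],
    keep_crlf: bool = False,
) -> str:
    """Normalize CRLF to LF, run the converter, and optionally restore CRLF."""
    markdown = convert(jira_text.replace("\r\n", "\n"))
    if keep_crlf and "\r\n" in jira_text:
        # Normalize first so any CRLF already in the output is not doubled.
        markdown = markdown.replace("\r\n", "\n").replace("\n", "\r\n")
    return markdown
```

With jira2markdown installed, this would be called as convert_with_lf(jira_text, convert) after from jira2markdown import convert. Passing the converter as a parameter keeps the wrapper independent of the library version.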

@catcombo
Owner

@arctus-io Thanks for the feedback! The easiest way to solve this problem is to replace \r\n with \n in the convert function before applying the markup conversion. But I would like to find a way to fix it at the pyparsing level, which may take some time. Could you give me an example of the table conversion so I have more test cases?
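What fixing it at the grammar level means can be illustrated with a line-oriented pattern: if a rule treats \n alone as the line end, the \r stays inside the matched text, whereas a rule that treats \r?\n as the line break drops it. The sketch below uses Python's re module as a stand-in for a pyparsing grammar; the patterns are illustrative only and are not jira2markdown's actual list rules.

```python
import re

JIRA_LIST = (
    "Line Before List: Sample text words words words:\r\n"
    " * Bulleted Item 1: Sample text words words words\r\n"
    " * Bulleted Item 2: Sample text words words words\r\n"
    "\r\n"
    "Line After List: Sample text words words words"
)

# With re.MULTILINE, "$" matches only before "\n" (or at the end of the
# string), and "." matches "\r", so the captured item text keeps the "\r".
NAIVE_ITEM = re.compile(r"^ \* (.*)$", re.MULTILINE)

# CRLF-aware variant: an optional "\r" is consumed as part of the line
# break instead of being captured as item text.
CRLF_AWARE_ITEM = re.compile(r"^ \* (.*?)\r?$", re.MULTILINE)

naive_items = NAIVE_ITEM.findall(JIRA_LIST)
aware_items = CRLF_AWARE_ITEM.findall(JIRA_LIST)
print([repr(item[-1]) for item in naive_items])
print(aware_items)
```

The same idea applies to any rule that ends at a line break: accepting an optional \r before \n keeps the carriage return out of the converted text.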

@arctus-io
Contributor Author

arctus-io commented Sep 19, 2024

@catcombo:

from jira2markdown import convert

# Example for CRLF `\r\n` broken table conversion

table_CRLF_test_input = 'Table Test:\r\n\r\n||heading 1||heading 2||heading 3||\r\n|col A1|col A2|col A3|\r\n|col B1|col B2|col B3|\r\n\r\nLine after table'

table_CRLF_test_output = convert(table_CRLF_test_input)

print(f"\n\nJira Input with CRLF (printed):\n{'-' * 40}\n{table_CRLF_test_input}\n{'-' * 40}\n")

print(f"\n\nJira Input with CRLF (string):\n{'-' * 40}\n{repr(table_CRLF_test_input)}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from CRLF (printed):\n{'-' * 40}\n{table_CRLF_test_output}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from CRLF (string):\n{'-' * 40}\n{repr(table_CRLF_test_output)}\n{'-' * 40}\n")

# Example for LF `\n` working table conversion

table_LF_test_input = 'Table Test:\n\n||heading 1||heading 2||heading 3||\n|col A1|col A2|col A3|\n|col B1|col B2|col B3|\n\nLine after table'

table_LF_test_output = convert(table_LF_test_input)

print(f"\n\nJira Input with LF (printed):\n{'-' * 40}\n{table_LF_test_input}\n{'-' * 40}\n")

print(f"\n\nJira Input with LF (string):\n{'-' * 40}\n{repr(table_LF_test_input)}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from LF (printed):\n{'-' * 40}\n{table_LF_test_output}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from LF (string):\n{'-' * 40}\n{repr(table_LF_test_output)}\n{'-' * 40}\n")

Output from above:

Jira Input with CRLF (printed):
----------------------------------------
Table Test:

||heading 1||heading 2||heading 3||
|col A1|col A2|col A3|
|col B1|col B2|col B3|

Line after table
----------------------------------------



Jira Input with CRLF (string):
----------------------------------------
'Table Test:\r\n\r\n||heading 1||heading 2||heading 3||\r\n|col A1|col A2|col A3|\r\n|col B1|col B2|col B3|\r\n\r\nLine after table'
----------------------------------------



Markdown Output from CRLF (printed):
----------------------------------------
Table Test:


|heading 1|heading 2|heading 3|
|-|-|-|-|
|col A1|col A2|col A3|
<br>Line after table||

----------------------------------------



Markdown Output from CRLF (string):
----------------------------------------
'Table Test:\r\n\r\n\n|heading 1|heading 2|heading 3|\r|\n|-|-|-|-|\n|col A1|col A2|col A3|\r|\n|col B1|col B2|col B3|\r<br>\r<br>Line after table|\n'
----------------------------------------



Jira Input with LF (printed):
----------------------------------------
Table Test:

||heading 1||heading 2||heading 3||
|col A1|col A2|col A3|
|col B1|col B2|col B3|

Line after table
----------------------------------------



Jira Input with LF (string):
----------------------------------------
'Table Test:\n\n||heading 1||heading 2||heading 3||\n|col A1|col A2|col A3|\n|col B1|col B2|col B3|\n\nLine after table'
----------------------------------------



Markdown Output from LF (printed):
----------------------------------------
Table Test:

|heading 1|heading 2|heading 3|
|-|-|-|
|col A1|col A2|col A3|
|col B1|col B2|col B3|

Line after table
----------------------------------------



Markdown Output from LF (string):
----------------------------------------
'Table Test:\n\n|heading 1|heading 2|heading 3|\n|-|-|-|\n|col A1|col A2|col A3|\n|col B1|col B2|col B3|\n\nLine after table'
----------------------------------------
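In the CRLF output above the separator row is |-|-|-|-| (four columns for a three-column table), and the repr() shows an extra \r cell at the end of rows (|heading 3|\r|). One plausible mechanism, sketched with a toy row splitter: if rows are split on \n only, each row keeps a trailing \r, which then survives as an extra cell. row_cells is hypothetical and is not how jira2markdown actually parses tables.

```python
TABLE_CRLF = (
    "||heading 1||heading 2||heading 3||\r\n"
    "|col A1|col A2|col A3|\r\n"
    "|col B1|col B2|col B3|"
)


def row_cells(row: str) -> list[str]:
    """Split one Jira table row into cells (toy parser for illustration)."""
    separator = "||" if row.startswith("||") else "|"
    # Drop the outer pipes, then split on the cell separator.
    # A trailing "\r" blocks the right-hand strip, so it becomes its own cell.
    return row.strip("|").split(separator)


naive_rows = TABLE_CRLF.split("\n")    # rows keep a trailing "\r"
clean_rows = TABLE_CRLF.splitlines()   # "\r\n" counts as a single line break

for naive, clean in zip(naive_rows, clean_rows):
    print(len(row_cells(naive)), len(row_cells(clean)))
```

The last row has no line break in this excerpt, so only the first two rows gain the extra cell.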

@catcombo
Owner

Thanks for the reproducer for tables! I think the easiest and most reliable way to fix this issue is to force conversion of \r\n to \n. I updated the PR. Could you please test it? I think it should now work for any combination of markup elements.
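The claim that the fix covers any combination of markup elements can be checked with a line-ending invariance test: each sample is written with LF breaks, a CRLF copy is derived from it, and both must convert to the same Markdown. A minimal sketch, assuming the fixed convert() returns the same LF output for CRLF input; crlf_mismatches is a hypothetical helper, the first two samples are the LF inputs from this issue, and the combined list-plus-table sample is an added assumption.

```python
from typing import Callable, Iterable

LIST_LF = (
    "Line Before List: Sample text words words words:\n"
    " * Bulleted Item 1: Sample text words words words\n"
    " * Bulleted Item 2: Sample text words words words\n"
    "\n"
    "Line After List: Sample text words words words\n"
    "Line After List: Sample text words words words"
)
TABLE_LF = (
    "Table Test:\n\n||heading 1||heading 2||heading 3||\n"
    "|col A1|col A2|col A3|\n|col B1|col B2|col B3|\n\nLine after table"
)
SAMPLES = [LIST_LF, TABLE_LF, LIST_LF + "\n\n" + TABLE_LF]


def crlf_mismatches(convert: Callable[[str], str], samples: Iterable[str]) -> list[str]:
    """Return the samples whose CRLF variant converts differently from the LF original."""
    failures = []
    for lf_text in samples:
        crlf_text = lf_text.replace("\n", "\r\n")
        if convert(crlf_text) != convert(lf_text):
            failures.append(lf_text)
    return failures
```

Run against jira2markdown's convert, crlf_mismatches(convert, SAMPLES) should return an empty list once the fix is in place.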
