[SVCS-530] Xlsx duplicate header error fix #288

AddisonSchiller · 2017-10-12T17:41:56Z

Ticket

https://openscience.atlassian.net/browse/SVCS-530

Purpose

The xlsx tool used by the tabular renderer was overwriting column values when the header row contained duplicate names.
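
To illustrate the failure mode, here is a minimal sketch that assumes each row is built as a dict keyed by header name (the test in this PR compares rows against dicts of that shape). The variable names are hypothetical and this is not the renderer's actual code.

```python
# Hypothetical header row and data row; 'Dup' appears twice in the header.
headers = ['Name', 'Dup', 'Dup', 'Not Dup']
values = [1.0, 2.0, 3.0, 4.0]

# Building a dict keyed by header keeps only one entry per name,
# so the second 'Dup' column overwrites the first one.
row = dict(zip(headers, values))

print(row)  # the value 2.0 from the first 'Dup' column is lost
```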

Changes

The tabular renderer will no longer overwrite the values for headers that have the same name. Instead, it renames every duplicated header using the format `name (1)`, `name (2)`, and so on.
If no unused name is found after 5000 iterations, it falls back to a UUID instead of a count. This case is very unlikely.

Added some tests. One is commented out because the UUID fallback is hard to test.

Side effects

None that I know of

QA Notes

There is a zip file on the JIRA ticket with files to test with.

cslzchen

In addition to our discussion, check the style as well.

cslzchen · 2017-11-15T20:07:49Z

mfr/extensions/tabular/libs/xlrd_tools.py

+                    iteration = 0
+                    while increased_name in fields:
+                        iteration += 1
+                        if iteration > 5000:


Set the iteration cap as a default argument and use a lower number for testing.

cslzchen · 2017-11-15T20:08:41Z

tests/extensions/tabular/test_xlsx_tools.py

+        assert sheet[1][0] == {'Name': 1.0, 'Dup (1)': 2.0, 'Dup (2)': 3.0,
+                            'Dup (3)': 4.0, 'Dup (4)': 5.0, 'Not Dup': 6.0}
+
+    # After demo it was suggested the iteration cap be raised. The value ended up to be about 5,000


As suggested above, use a default argument for the iteration cap and set it lower in tests. You can then use it for this test instead of having to make a file that needs 5000 iterations.
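
The suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions: `dedupe_headers` is a hypothetical helper, not the code in `xlrd_tools.py`, and the exact UUID-based name format is assumed. Only the `max_iterations` name and its default of 5000 come from this PR.

```python
import uuid
from collections import Counter


def dedupe_headers(headers, max_iterations=5000):
    """Return a copy of ``headers`` in which every duplicated name is unique.

    Hypothetical helper: duplicated names become 'name (1)', 'name (2)', ...
    If no free name is found within ``max_iterations`` tries, a UUID is used
    in place of the count.
    """
    counts = Counter(headers)
    # Names that occur exactly once are kept unchanged and are already taken.
    taken = {name for name, n in counts.items() if n == 1}
    result = []
    for name in headers:
        if counts[name] == 1:
            result.append(name)
            continue
        number = 1
        candidate = '{} ({})'.format(name, number)
        iteration = 0
        while candidate in taken:
            iteration += 1
            if iteration > max_iterations:
                # Assumed fallback format: the UUID replaces the count.
                candidate = '{} ({})'.format(name, uuid.uuid4())
                break
            number += 1
            candidate = '{} ({})'.format(name, number)
        taken.add(candidate)
        result.append(candidate)
    return result


# Default cap: the fallback is never reached for a small header row.
normal = dedupe_headers(['Name', 'Dup', 'Dup', 'Dup', 'Dup', 'Not Dup'])

# Low cap: the fourth 'Dup' exceeds two iterations and gets a UUID suffix,
# so the fallback can be exercised without a 5000-column test file.
capped = dedupe_headers(['Name', 'Dup', 'Dup', 'Dup', 'Dup', 'Not Dup'],
                        max_iterations=2)
```

Because the cap is an ordinary keyword argument, production callers keep the default while a test passes a small value.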

coveralls · 2017-11-16T19:46:29Z

Coverage increased (+0.3%) to 68.28% when pulling 6ae4062 on AddisonSchiller:feature/xlsx-duplicate-column-names-fix into 8bb2dd4 on CenterForOpenScience:develop.

AddisonSchiller · 2017-11-16T20:05:24Z

@cslzchen, added a `max_iterations` argument for testing and re-enabled the UUID test. Also made a few minor style changes (renaming variables, etc.).

coveralls · 2017-11-16T20:09:20Z

Coverage increased (+0.3%) to 68.336% when pulling e4fec47 on AddisonSchiller:feature/xlsx-duplicate-column-names-fix into 8bb2dd4 on CenterForOpenScience:develop.

coveralls · 2017-11-16T20:14:07Z

Coverage increased (+0.3%) to 68.318% when pulling e4fec47 on AddisonSchiller:feature/xlsx-duplicate-column-names-fix into 8bb2dd4 on CenterForOpenScience:develop.

cslzchen

Looks good, moving to PCR 🎆 🎆

cslzchen · 2017-11-21T15:53:25Z

mfr/extensions/tabular/libs/xlrd_tools.py



-def xlsx_xlrd(fp):
-    """Read and convert a xlsx file to JSON format using the xlrd library
+def xlsx_xlrd(fp, max_iterations=5000):


cslzchen

h/t @AddisonSchiller, the PR looks good. I will take over and rebase it to bring it up to date.

Xlsx duplicate header error fix

6658c82

The tabular renderer will no longer overwrite the values for headers that have the same name. Instead it will rename all duplicated headers in the format `name (1)`

cslzchen requested changes Nov 15, 2017

cslzchen added the Code Review label Nov 15, 2017

Merge branch 'develop' of https://github.com/CenterForOpenScience/mod…

6ae4062

…ular-file-renderer into feature/xlsx-duplicate-column-names-fix

AddisonSchiller added 2 commits November 16, 2017 14:54

max_iterations default arg for testing

36771a9

Style changes

e4fec47

cslzchen approved these changes Nov 21, 2017

cslzchen added Final Review and removed Code Review labels Nov 21, 2017

cslzchen requested changes Jun 18, 2018

cslzchen added Add'l Dev and removed Final Review labels Jun 18, 2018

cslzchen self-assigned this Jun 18, 2018

felliott unassigned cslzchen Jul 30, 2018
