ENH: use '=' width in multirow with compatible column types #60281

tuetenk0pp · 2024-11-11T19:07:59Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

It would be nice to be able to use the = width option with \multirow. Ideally, this could be decided automatically based on the column type.

Feature Description

Add logic to _parse_latex_header_span() that checks the column type.

IF columntype NOT ONE OF l, c, r
    RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{=}}{{{display_val}}}"
ELSE
    RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{*}}{{{display_val}}}"
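A minimal Python sketch of this branch, assuming a hypothetical helper `multirow_cell` that receives the column's type token (e.g. `"l"` or `"p{2cm}"`) next to the values the pandas code already formats. It uses natural width (`*`) for the standard l, c and r columns and the column's own width (`=`) for everything else. It is an illustration only, not the pandas implementation.

```python
def multirow_cell(display_val: str, rowspan: int, multirow_align: str, column_type: str) -> str:
    """Build a \\multirow cell, choosing the width argument from the column type.

    Hypothetical helper: '*' (natural width) for l/c/r columns,
    '=' (use the column's defined width) for any other column type.
    """
    width = "*" if column_type in {"l", "c", "r"} else "="
    return f"\\multirow[{multirow_align}]{{{rowspan}}}{{{width}}}{{{display_val}}}"


print(multirow_cell("Drugstore", 4, "t", "l"))       # \multirow[t]{4}{*}{Drugstore}
print(multirow_cell("Drugstore", 4, "t", "p{2cm}"))  # \multirow[t]{4}{=}{Drugstore}
```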

Alternative Solutions

Alternatively, one can post-process the string returned by pandas.DataFrame.to_latex with a find-and-replace:

import pandas as pd

df = pd.read_excel("input.xlsx")
df = df.set_index(df.columns[:3].tolist())
df = df.sort_index()

# multirow is an option of to_latex(), not of sort_index()
latex = df.to_latex(multirow=True)
latex = latex.replace("{*}", "{=}")

with open("output.tex", "w") as f:
    f.write(latex)

Additional Context

pandas/pandas/io/formats/style_render.py, line 2451 in 156e67e:

return f"\\multirow[{multirow_align}]{{{rowspan}}}{{*}}{{{display_val}}}"

See this snippet from the multirow documentation:

The width can also be given as = when the \multirow entry is given in a column that has a defined width, for example in a p{} column, an X column in tabularx or a L, C, R or J column in a tabulary environment. The text will be set in a \parbox of that width. If you give “=” in other situations, you will get strange results (usually a too wide column).

rhshadrach · 2024-11-11T20:56:33Z

@tuetenk0pp - can you include a reproducible example? We cannot run your example without your input.xlsx file. Ideally, your example should not depend on external files at all, e.g.

df = pd.DataFrame(...)

tuetenk0pp · 2024-11-11T23:10:54Z

@rhshadrach, here is a more in-depth example of what is happening:

>>> import pandas as pd
>>> d = {'location': ['Supermarket', 'Supermarket', 'Supermarket', 'Supermarket', 'Drugstore', 'Drugstore', 'Drugstore', 'Drugstore', 'Farmers Market', 'Farmers Market', 'Farmers Market', 'Farmers Market'], 'category': ['Produce', 'Produce', 'Dry Goods', 'Dry Goods', 'Personal Care', 'Personal Care', 'Medicine', 'Medicine', 'Vegetables', 'Vegetables', 'Fruits', 'Fruits'], 'item': ['Apples', 'Bananas', 'Rice', 'Pasta', 'Shampoo', 'Toothpaste', 'Pain Reliever', 'Cough Sirup', 'Carrots', 'Spinach', 'Strawberrys', 'Oranges'], 'count': [6, 12, 2, 1, 1, 1, 1, 1, 1, 2, 1, 6]}
>>> df = pd.DataFrame(data=d)
>>> df = df.set_index(df.columns[:3].tolist())
>>> df = df.sort_index()

The DataFrame now looks like this:

>>> df
                                            count
location       category      item
Drugstore      Medicine      Cough Sirup        1
                             Pain Reliever      1
               Personal Care Shampoo            1
                             Toothpaste         1
Farmers Market Fruits        Oranges            6
                             Strawberrys        1
               Vegetables    Carrots            1
                             Spinach            2
Supermarket    Dry Goods     Pasta              1
                             Rice               2
               Produce       Apples             6
                             Bananas           12

Now I want to export the DataFrame to LaTeX:

>>> latex = r"""
... \documentclass[border=0.5cm]{standalone}
...
... \usepackage{array}
... \usepackage{multirow}
... \usepackage{booktabs}
... 
... \begin{document}
... """
>>> column_format: str = r'p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r'
>>> latex += '\n' + df.to_latex(column_format=column_format, multirow=True) + '\n' + r'\end{document}'

The LaTeX Code now looks like this:

>>> print(latex)

\documentclass[border=0.5cm]{standalone}

\usepackage{array}
\usepackage{multirow}
\usepackage{booktabs}

\begin{document}

\begin{tabular}{p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r}
\toprule
 &  &  & count \\
location & category & item &  \\
\midrule
\multirow[t]{4}{*}{Drugstore} & \multirow[t]{2}{*}{Medicine} & Cough Sirup & 1 \\
 &  & Pain Reliever & 1 \\
\cline{2-4}
 & \multirow[t]{2}{*}{Personal Care} & Shampoo & 1 \\
 &  & Toothpaste & 1 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{*}{Farmers Market} & \multirow[t]{2}{*}{Fruits} & Oranges & 6 \\
 &  & Strawberrys & 1 \\
\cline{2-4}
 & \multirow[t]{2}{*}{Vegetables} & Carrots & 1 \\
 &  & Spinach & 2 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{*}{Supermarket} & \multirow[t]{2}{*}{Dry Goods} & Pasta & 1 \\
 &  & Rice & 2 \\
\cline{2-4}
 & \multirow[t]{2}{*}{Produce} & Apples & 6 \\
 &  & Bananas & 12 \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}

\end{document}

Here I have a screenshot of the compiled document:

As you can see, the multirow cells do not respect the fixed column width: the Farmers Market and Personal Care strings should wrap, but they do not. This can be fixed by replacing the * with = in \multirow{<nrows>}{<width>}{<text>}:

latex_wrap = latex.replace('{*}', '{=}')
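A plain str.replace rewrites every `{*}` in the string, including any that are not \multirow width arguments. A slightly more targeted sketch uses a regular expression that only matches the width argument of `\multirow[<align>]{<nrows>}{*}`; it assumes the to_latex output format shown above, and the names `MULTIROW_STAR` and `use_column_width` are hypothetical.

```python
import re

# Matches "\multirow[<align>]{<nrows>}" followed by the "{*}" width argument,
# capturing the prefix so it is kept unchanged in the replacement.
MULTIROW_STAR = re.compile(r"(\\multirow\[[^\]]*\]\{\d+\})\{\*\}")


def use_column_width(latex: str) -> str:
    """Replace the natural width '*' with '=' in every \\multirow command."""
    return MULTIROW_STAR.sub(r"\1{=}", latex)


example = r"\multirow[t]{4}{*}{Drugstore} & \multirow[t]{2}{*}{Medicine} & {*} \\"
# The last, unrelated "{*}" is left alone.
print(use_column_width(example))
```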

The LaTeX Code now looks like this:

>>> print(latex_wrap)

\documentclass[border=0.5cm]{standalone}

\usepackage{array}
\usepackage{multirow}
\usepackage{booktabs}

\begin{document}

\begin{tabular}{p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r}
\toprule
 &  &  & count \\
location & category & item &  \\
\midrule
\multirow[t]{4}{=}{Drugstore} & \multirow[t]{2}{=}{Medicine} & Cough Sirup & 1 \\
 &  & Pain Reliever & 1 \\
\cline{2-4}
 & \multirow[t]{2}{=}{Personal Care} & Shampoo & 1 \\
 &  & Toothpaste & 1 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{=}{Farmers Market} & \multirow[t]{2}{=}{Fruits} & Oranges & 6 \\
 &  & Strawberrys & 1 \\
\cline{2-4}
 & \multirow[t]{2}{=}{Vegetables} & Carrots & 1 \\
 &  & Spinach & 2 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{=}{Supermarket} & \multirow[t]{2}{=}{Dry Goods} & Pasta & 1 \\
 &  & Rice & 2 \\
\cline{2-4}
 & \multirow[t]{2}{=}{Produce} & Apples & 6 \\
 &  & Bananas & 12 \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}

\end{document}

As you can see, the issue no longer exists in the compiled document. The Farmers Market and Personal Care strings now wrap into the next line.

Note

Finding * and replacing it with = becomes much harder as soon as not every column containing multirows has a fixed width, so the workaround as presented would no longer work. This is why I think this should actually become a feature. Other manipulations of the LaTeX output, e.g. replacing the tabular environment with a tabularx environment, remain simple.
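For mixed tables, a text-based workaround has to decide per cell rather than globally. A rough sketch, assuming a hypothetical helper `widen_multirows`, that the caller passes the 0-based indices of the fixed-width columns, and that column separators are unescaped `&` characters (no nested tabulars or `&` inside braces):

```python
import re

MULTIROW_STAR = re.compile(r"(\\multirow\[[^\]]*\]\{\d+\})\{\*\}")
# Column separator: '&' not preceded by a backslash (so "\&" is skipped).
CELL_SEP = re.compile(r"(?<!\\)&")


def widen_multirows(latex: str, fixed_width_cols: set[int]) -> str:
    """Use '=' as the \\multirow width only in the given (0-based) columns."""
    out_lines = []
    for line in latex.split("\n"):
        cells = CELL_SEP.split(line)
        if len(cells) > 1:  # only table rows contain column separators
            cells = [
                MULTIROW_STAR.sub(r"\1{=}", cell) if i in fixed_width_cols else cell
                for i, cell in enumerate(cells)
            ]
        out_lines.append("&".join(cells))
    return "\n".join(out_lines)


row = r"\multirow[t]{4}{*}{Drugstore} & \multirow[t]{2}{*}{Medicine} & Cough Sirup & 1 \\"
# Suppose only the second column (index 1) is a p{} column.
print(widen_multirows(row, {1}))
```

Splitting on `&` and joining again keeps the original spacing, so lines without multirows come back unchanged.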

Here is the summarized code for reproduction:

import pandas as pd
import subprocess

d = {'location': ['Supermarket', 'Supermarket', 'Supermarket', 'Supermarket', 'Drugstore', 'Drugstore', 'Drugstore', 'Drugstore', 'Farmers Market', 'Farmers Market', 'Farmers Market', 'Farmers Market'], 'category': ['Produce', 'Produce', 'Dry Goods', 'Dry Goods', 'Personal Care', 'Personal Care', 'Medicine', 'Medicine', 'Vegetables', 'Vegetables', 'Fruits', 'Fruits'], 'item': ['Apples', 'Bananas', 'Rice', 'Pasta', 'Shampoo', 'Toothpaste', 'Pain Reliever', 'Cough Sirup', 'Carrots', 'Spinach', 'Strawberrys', 'Oranges'], 'count': [6, 12, 2, 1, 1, 1, 1, 1, 1, 2, 1, 6]}

df = pd.DataFrame(data=d)

df = df.set_index(df.columns[:3].tolist())
df = df.sort_index()
print(df)

latex = r"""
\documentclass[border=0.5cm]{standalone}

\usepackage{array}
\usepackage{multirow}
\usepackage{booktabs}

\begin{document}
"""

column_format: str = r'p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r'

latex += '\n' + df.to_latex(column_format=column_format, multirow=True) + '\n' + r'\end{document}'

latex_flex = latex.replace('{*}', '{=}')

with open('multirow.tex', 'w') as f:
    f.write(latex)

with open('multirow_flex.tex', 'w') as f:
    f.write(latex_flex)

for file in ['multirow.tex', 'multirow_flex.tex']:

    cmd = ['latexmk', '-lualatex', '-interaction=nonstopmode', '-file-line-error', '-silent', '-auxdir=./tmp', file]

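    try:
        # check=True raises CalledProcessError if latexmk exits with an error
        subprocess.run(cmd, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        print(f'command failed: {cmd}')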

rhshadrach · 2024-11-12T22:22:16Z

Thanks for the additional information! From the OP, you have the logic:

IF columntype NOT ONE OF l, c, r

I do not understand branching on this condition, can you elaborate here?

tuetenk0pp · 2024-11-12T23:41:46Z

The multirow documentation says:

The width can also be given as = when the \multirow entry is given in a column that has a defined width, for example in a p{} column, an X column in tabularx or a L, C, R or J column in a tabulary environment. The text will be set in a \parbox of that width. If you give “=” in other situations, you will get strange results (usually a too wide column).

In other words, multirow can only fit the cell to the column if the column has a defined width. This is the case for the LaTeX column types mentioned in the quote, but not for the standard column types l, c and r. The S column type from the siunitx package does not define a column width either.

That's why I thought it would be a good idea to check the column type (from the column_format argument of to_latex()) and return the \multirow ... string accordingly.
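Checking the column type first requires splitting column_format into one token per column. A rough sketch, assuming only common array-package syntax: l, c, r, p{..}, m{..}, b{..}, single-letter types such as X or S, plus the decorations |, >{..}, <{..}, @{..} and !{..}. Repetition (*{n}{..}) and optional arguments such as S[..] are not handled, and the name `split_column_format` is hypothetical.

```python
def _read_group(spec: str, i: int) -> int:
    """Return the index just past the {...} group that starts at spec[i]."""
    if i >= len(spec) or spec[i] != "{":
        raise ValueError(f"expected '{{' at position {i} in {spec!r}")
    depth = 0
    for j in range(i, len(spec)):
        if spec[j] == "{":
            depth += 1
        elif spec[j] == "}":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError(f"unbalanced braces in column format {spec!r}")


def split_column_format(spec: str) -> list[str]:
    """Split a tabular column spec into one type token per column."""
    columns = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch.isspace() or ch == "|":
            i += 1  # whitespace and vertical rules do not create columns
        elif ch in "<>@!":
            i = _read_group(spec, i + 1)  # decorations: skip their {..} argument
        elif ch in "pmb":
            end = _read_group(spec, i + 1)  # width columns: p{..}, m{..}, b{..}
            columns.append(spec[i:end])
            i = end
        else:
            columns.append(ch)  # single-letter types: l, c, r, X, S, ...
            i += 1
    return columns


column_format = r"p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r"
print(split_column_format(column_format))  # ['p{2cm}', 'p{2cm}', 'p{2cm}', 'r']
```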

With the added S column type the logic would look like this:

IF columntype NOT ONE OF l, c, r, S
    RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{=}}{{{display_val}}}"
ELSE
    RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{*}}{{{display_val}}}"

Or a safer approach, but without support for any column types other than those mentioned in the quote above:

IF columntype ONE OF p{}, m{}, b{}, X, L, C, R, J
    RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{=}}{{{display_val}}}"
ELSE
    RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{*}}{{{display_val}}}"
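Both rules can be written as small predicates on a single column token (e.g. "r" or "p{2cm}"). A sketch with hypothetical names: the blacklist variant returns = unless the type is l, c, r or S, while the whitelist variant returns = only for p{}, m{}, b{}, X, L, C, R and J, and falls back to * for anything it does not recognise. That fallback is what makes it the safer default.

```python
NATURAL_WIDTH_TYPES = {"l", "c", "r", "S"}       # no defined column width
FIXED_WIDTH_LETTERS = {"X", "L", "C", "R", "J"}  # tabularx / tabulary types
FIXED_WIDTH_PREFIXES = ("p{", "m{", "b{")        # array-package width columns


def width_blacklist(column_type: str) -> str:
    """'=' unless the column is a known natural-width type."""
    return "*" if column_type in NATURAL_WIDTH_TYPES else "="


def width_whitelist(column_type: str) -> str:
    """'=' only for column types known to have a defined width."""
    if column_type in FIXED_WIDTH_LETTERS or column_type.startswith(FIXED_WIDTH_PREFIXES):
        return "="
    return "*"


# "Q" stands in for an arbitrary column type neither rule knows about.
for col in ["l", "S", "p{2cm}", "X", "Q"]:
    print(col, width_blacklist(col), width_whitelist(col))
```

The two rules only disagree on unknown types such as "Q", where the blacklist guesses = and the whitelist keeps *.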

rhshadrach · 2024-11-17T13:29:02Z

Thanks, PRs to fix are welcome!

cc @attack68

attack68 · 2024-11-17T15:03:17Z

What happens if "=" is given in all cases? Does it still compile even for 'standard' columns?

tuetenk0pp · 2024-11-17T16:25:44Z

What happens if "=" is given in all cases? Does it still compile even for 'standard' columns?

The documentation states:

If you give “=” in other situations, you will get strange results (usually a too wide column).

tuetenk0pp added the Enhancement and Needs Triage labels Nov 11, 2024

rhshadrach added the Needs Info and IO LaTeX labels and removed the Needs Triage label Nov 11, 2024

rhshadrach removed the Needs Info label Nov 17, 2024

attack68 added the Styler label Nov 17, 2024
