-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataFrameGroupBy.idxmin() returns DataFrame, documentation says Series #15275
Comments
The documentation is actually right. What happens is that
then the results are inferred to be of the appropriate shape. Since this is a groupby, you get a Series for each group which are then stacked. So the result is correct and the method of generating the document is 'correct', but this is a method that generates a Series (and not a scalar return), so So we would need to update the templates a bit to fix this. A PR is welcome for this. I suspect there are several other methods that have this issue as well, but you would have to scan the doc-strings I think to see. we actually just had some discussion w.r.t. this for |
OK, I think I understand what you are saying, but I'm a bit confused still about what the "expected result" should be here. If I'm reading you correctly, you're saying that the templates need to be changed such that I'm just confused because you say that "the documentation is actually right", but then go on to explain why the documentation says the wrong thing. I'm guessing that you meant that it's the right documentation for |
@pganssle yes, I mean the groupby documention is wrong, but the source itself (DataFrame.idxmin/Series.idxmin) is right. We simply use that documentation. We need a slightly more sophisticated strategy of using the source documentation, but changing the return type to what it actually is for some groupby methods. A way to do this would be to make the return type a replaceable parameter in the doc-string for Series/Dataframe (and then replace it locally there), and use that here as a template. A bit of work, but I think we need to go down this route to avoid copy-pasting doc-strings, IOW, they will be defined in a singular place (on the source method, e.g. Series.idxmin or DataFrame.idxmin / even these could combined also), then using that in groupby with some Substitution parameters. |
OK, yeah, that's what I thought. I agree on the point of documentation, by the way - I've always felt like documentation tools (at least in Python, which is where I'm most familiar with them) were lacking in support for the DRY (don't repeat yourself) principle. You end up having to do some funky stuff like creating metaclasses or dynamically generate docstrings on import (which makes things a bit harder when navigating the code instead of looking at the generated documentation). I'll take a look at how the templating looks and see if I can find an easy way to either make the return type a replaceable parameter, infer it from the code or both. |
@pganssle yep, tooling is not great, though using Appender and Substitution we get pretty far. thanks for looking into this. |
Hm... This is quite complicated, I'm not quite sure how I think the best way to handle this is to modify |
It turns out that |
This has been fixed. https://pandas.pydata.org/docs/dev/reference/api/pandas.core.groupby.SeriesGroupBy.idxmax.html The DataFrameGroupBy.idxmin / max still shows it returning a Series. PRs to fix are welcome! |
Hello I would like to help, it seems the issue is here:
this is in the code in Should I make a PR by updating the docstring to Dataframe in that file ? |
I don't think so - the issue is in |
Code Sample, a copy-pastable example if possible
Problem description
According to the documentation, this is supposed to output a
pandas.core.Series
. To me, that seems to be what makes sense, but I'm not sure how or why this ended up returning aDataFrame
. Is this just an issue with the documentation, or is it an issue with the code?Edit: Ah, I understand why this returns a
DataFrame
now - if you have multiple columns theidxmin()
might be different for each column. Seems that the documentation needs to be updated. I can make a PR if that's appropriate.Output of
pd.show_versions()
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.0.3
Cython: 0.25.2
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: