-
-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it easier to change Gaia row limit #2184
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2184 +/- ##
==========================================
+ Coverage 66.10% 66.15% +0.05%
==========================================
Files 418 418
Lines 28167 28201 +34
==========================================
+ Hits 18619 18657 +38
+ Misses 9548 9544 -4
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Sidenote for maintainers: that the fact you had to make the same changes (add the same warnings) in multiple places indicates that the code should be refactored to use common functions for data retrieval within __cone_search
and __query_object
.
@keflavich - I run into these same duplications while reviewing #2140, too. Do you recall any reasons why the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warning needs to be removed, and I would also like to reach a conscious decisions for the preferred API before going ahead and merging this, while we have different approaches in other modules.
The rest of the comments nitpicks.
astroquery/gaia/core.py
Outdated
dump_to_file=dump_to_file, | ||
background=background) | ||
result = self.launch_job_async(query=query, | ||
output_file=output_file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix indents (no need to have them in new lines, I would welcome a sensible reformat)
@@ -41,7 +43,7 @@ class GaiaClass(TapPlus): | |||
MAIN_GAIA_TABLE = None | |||
MAIN_GAIA_TABLE_RA = conf.MAIN_GAIA_TABLE_RA | |||
MAIN_GAIA_TABLE_DEC = conf.MAIN_GAIA_TABLE_DEC | |||
ROW_LIMIT = conf.ROW_LIMIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keflavich - so we decided to not go with the vizier approach where the default is set in the config (but can be overridden with a class init kwarg?) I don't use this module, so don't necessarily see which approach is more practical, but firmly think that we should converge toward one API over the different modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to comment on this, but if you keep reading - the API is the same, it's only the implementation that's different. This is used below as row_limit = row_limit or self.ROW_LIMIT or conf.ROW_LIMIT
, which effectively does the same thing - the only difference is that class-level changes (as opposed to instance-level) will be different. i.e.:
Same behavior:
from astroquery.gaia import Gaia
Gaia.ROW_LIMIT = 1
g2 = Gaia()
g2.ROW_LIMIT # this is the default (None), not inherited from Gaia()
Different behavior:
from astroquery.gaia.core import GaiaClass
GaiaClass.ROW_LIMIT = 1
gg = GaiaClass()
gg.ROW_LIMIT = 2
gg2 = GaiaClass()
gg2.ROW_LIMIT # this is 1, inherited from the default set in GaiaClass
Perhaps for consistency only, we should enforce the old behavior? From a user perspective, this is very unlikely to come up, but it will certainly be confusing if it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer it if these options were set in __init__()
, but the purpose of this pull request is to resolve #1760 with minimal changes to the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eerovaher, this is definitely an improvement either way. An API change is acceptable, though - if you want to modify this to put row_limit=conf.row_limit
in __init__
, like in the Vizier module, that would be fine. @bsipocz I'd like your input here, but I think the Vizier API is commonly used and has resulted in few problems, so if we change the API here to be more like Vizier, it will be best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying the configuration options through __init__()
deserves a separate pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying the configuration options through init() deserves a separate pull request.
Yes, indeed. But if it's handled though __init__
, then there is no need for the kwarg. So converging to a preferred API is certainly in the realm of this PR, and thus I would like to think about this first than rush through a merge and then change the API later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the row_limit
keyword arguments was discussed in #2153 (comment), #2153 (comment) and #2153 (comment).
astroquery/gaia/core.py
Outdated
return job.get_results() | ||
table = job.get_results() | ||
if len(table) == row_limit: | ||
warn('The number of rows in the result matches the current row ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this approach is right, definitely not when verbose
is set to False. But even then, I would assume the user set the limit to a certain number rather than unlimited then they expect that the return may be trunckrated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that this should be disabled with verbose=False
. We don't warn users when using other modules that their results are truncated - though I think it would not be a bad idea to do so!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eerovaher. @bsipocz , I do think the several issues reported with surprise at getting truncated results - and my own experience forgetting that ROW_LIMIT
is the reason there are fewer sources on a given patch of sky - suggests that we can default to having a more verbose response when the results are truncated by a user parameter. It constitutes a minor UX change, but I think a positive one. So, I'm 👍 for keeping this change - but we do want to make sure to clarify how to silence the warning, since excess warnings are one of the main annoyances users (including me) complain about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option worth considering is to emit a warning only if the row limit value used was taken from conf
, so if the user has specified either the row_limit
keyword argument or the class attribute then there will be no warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a few pull requests seem to have addressed the complaints about the gaia
module being too verbose, so I'm not sure that is a problem anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a warning, that is independent of verbose
increases that verbosity that is now kind of fixed, and what people were complaining about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the gaia
module used to be too verbose in general, these warnings in particular seem to be requested by the users. But if you think it is better to only emit them if verbose=True
then I'll update the pull request accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these warnings in particular seem to be requested by the users
I don't see any issues that ask for the warning. The issue mentioned above asks for better documentation. OTOH there are a lot of issues that ask for better message handling that makes logging better for automated pipelines and that remove the littering of stdout, and historically, too, we've been asked a lot of times to get rid of the diagnostic prints etc even though they are a preferred way of coding of a lot of people in their own scripts.
So, yes, I refer back again in my first comment above that this warning should not be raised when verbose=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the pull request so that a MaxResultsWarning
is not emitted if verbose=False
.
#638 does ask for such warnings, but at least in gaia
module enabling them is now simple and also documented.
There already is an issue about this: #2100. |
It is now possible to specify the row limit of the `query_object` and `cone_search` families of functions with a keyword argument. It is now also possible to change the row limit through the `conf.ROW_LIMIT` configuration item at runtime.
The `query_object` and `cone_search` families of functions in the Gaia module now emit a `MaxResultsWarning` if the number of rows in the query result matches the row limit.
98ecac4
to
bc0129e
Compare
I noticed that astroquery/astroquery/exceptions.py Lines 93 to 97 in c29328b
This is more appropriate for letting the users know their row limit might be too small, so I've updated the pull request. |
The `query_object` and `cone_search` families of functions in `astroquery.gaia` no longer emit a `MaxResultsWarning` unless they were called with `verbose=True`.
Dear Astroquery team, Could you please put in cc @esdc-esac-esa-int in the pull requests where ESA code is modified? Thanks! |
@eerovaher - could you please rebase, so then this can be reviewed and eventually merged in? |
The first commit of this pull request adds the
row_limit
argument to thequery_object
andcone_search
families of functions in the Gaia module. It also ensures that the row limit can be changed through the configuration system at runtime.The second commit makes the previously mentioned functions emit a
UserWarning
MaxResultsWarning
if the number of rows in the query result matches the row limit.Closes #1760