Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove usage of PHP alias in the Python client #1288

Merged
merged 18 commits into from
Oct 12, 2023

Conversation

rzats
Copy link
Contributor

@rzats rzats commented Sep 19, 2023

Closes #1285.

Summary:

Removes all references to the api.php compatibility endpoint in src/client/delphi_epidata.py, instead redirecting all requests to their respective endpoints.

This has some implications on the outputs of the client, as it is taken out of "compatibility mode", which would normally be set by the call to the api.php endpoint. This means:

Those changes are reflected in the integration tests.

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far!

Because the client relied on the use of the compatibility endpoint, this PR effectively removes a LOT of test coverage for that endpoint... Can you expand on the tiny bit of api.php-specific testing left in test_covidcast_endpoints.py to make up for some of the loss? Using the as_api_compatibility_row_* methods from src/common/covidcast_row.py might make some of that easier.

Have you checked yet to make sure the covidcast client and covidcast-indicators are compatible with these changes (mentioned in the first comment of the associated issue, #1285)? After you verify the covidcast client still works properly, you will need to change this line in signal_dash_data_generator.py.

Also, a quick search through the code shows some comments still mention "api.php" where they probably shouldnt.

src/client/delphi_epidata.py Outdated Show resolved Hide resolved
src/client/delphi_epidata.py Outdated Show resolved Hide resolved
src/client/delphi_epidata.py Outdated Show resolved Hide resolved
integrations/server/test_covidcast.py Outdated Show resolved Hide resolved
integrations/server/test_covidcast.py Outdated Show resolved Hide resolved
@melange396
Copy link
Collaborator

Good point! Can you create src/client/packaging/pypi/CHANGELOG.md (perhaps using this template) and note that as of this version, returned results will now always have the epidata key in the root dict, and the other field keys at the appropriate locations?

Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, but second @melange396's comments about checking downstream effects on the covidcast Python client (and any other places that from delphi_epidata import Epidata such as hhs).

src/client/delphi_epidata.py Outdated Show resolved Hide resolved
src/client/delphi_epidata.py Outdated Show resolved Hide resolved
@rzats
Copy link
Contributor Author

rzats commented Sep 22, 2023

@melange396 @dshemetov fixed the issues you've mentioned, and also created two PRs to fix the compatibility changes which will be caused by this being merged:

cmu-delphi/covidcast-indicators#1899
cmu-delphi/covidcast#663

@capnrefsmmat
Copy link
Contributor

This is a breaking change to the Python client API; we'll want to make sure we update the version number on release accordingly.

@melange396
Copy link
Collaborator

from cmu-delphi/covidcast#663 (comment)

@capnrefsmmat said:

since the next version has a breaking API change, shouldn't we be incrementing more than the minor version number?

and i said:

In even a fairly loose interpretation of the semantic versioning rules: yeah, we probably should be.

At the same time, bumping the major number would belie the fact that this is a relatively tiny change in the scope of the rest of the repository... Theres no new functionality and the only real effective change is to the signature of one public method, Epidata.async_epidata(). I would also argue that method is probably very rarely used, though thats just based on a hunch -- i find it hard to believe that many (if any?) people are actually using it because it is poorly documented and it doesnt appear in established call chains, but i have no demonstrable evidence for this. [It IS used once in covidcast-indicators, but we have full control over that code & execution.]

Still, your point stands and this change is worth more than a single increment of the patch-level version.

This would be a lot easier if the client and its versioning were decoupled from the rest of the API code, but we wont truly get that until we release and get adoption of epidatpy, which lives in its own repo.

@melange396
Copy link
Collaborator

We COULD decouple the python client version from the repository version by changing the PyPI metadata here and here, removing those files from the automatic version bumping here, and then changing the version in the client code here.

That would require us to remember to update the version number by hand in all those places if/when the python client code changes again (or come up with an automatic solution). On the plus side, that would stop us from a ridiculous practice we have been doing pretty frequently in the past: changing the client version number when there are changes elsewhere in the repository even though the client code is literally identical.

@dshemetov
Copy link
Contributor

dshemetov commented Sep 22, 2023

On the separate automatic solution for the client: we could probably have another bumpversion.cfg somewhere in the client folder. Unfortunately all the clients are in the same folder, so I think that means only one can have automated version bumping with this approach. If we are open to restructuring that folder somewhat (e.g. R, Py, JS folders), then we could automate them separately. There are probably a bunch of things that depend on this folder structure being the way it is, so we'll need to find those and adjust them.

We could also live with manual version updates for the clients. These clients are quite stable, I'm happy to punt on the automation and not make this PR dependent on that.

@dshemetov
Copy link
Contributor

TODO: look for code in the system that depends on the client package version number being coupled to the repo version. Probably need to check version-specifying config files in this repo (and in other repos).

@melange396
Copy link
Collaborator

It looks like our github action for publishing to PyPI will no-op if the version in setup.py isnt changed, thanks to the option skip_existing: true (i did a good bit of research to come to that conclusion but didnt actually test it). Perhaps we should switch the automatic client release GH actions into manually triggered steps anyway?

Going with a new hand-coded client version of >=5.* seems excessive to me. What if we went with 4.1001.10? Then the python client version would be the same as the next repository version modulo 1000 on the minor part, and the two versioning paths should be distinct until they realign with v5 (presuming we get to v5 before ~1k revisions of v4). The one thousand point difference should convey the sentiment that this is a potentially significant upgrade.

Or we could sweep it under the rug by leaving the repository version linked to the python client version and just increment the patch number, and hope that nobody experiences untoward effects. If some user's code breaks when they update to version 4.1.10, they will either roll back to the previous version or fix their code. ...But they will go through the exact same thing if we release the same changes as version 5.0.0 instead. Strict semantic versioning makes more sense for projects that might introduce a new major or minor version but still perform follow-up updates to the lesser version branch -- for us, regardless of whether we move to 4.1.10 or 4.1001.10 or 5.1.1, we arent going to fill in any intermediate versions between that and the current 4.1.9.

In the past, we've done some potentially "breaking" changes with just a patch-level version bump -- thats not exactly justification for doing it again, but it is a precedent.

@melange396
Copy link
Collaborator

TODO: look for code in the system that depends on the client package version number being coupled to the repo version.

Im not sure what you mean; I dont think anything DEPENDS on the client version being tied to the rep version, but rather they simply happen to be tied together as a consequence of the client and server code existing in the same place with a shared bumpversion config.

@dshemetov
Copy link
Contributor

dshemetov commented Sep 23, 2023

I was thinking of it abstractly: if you have dependencies X and Y that you know are coupled in version and are bundled together, you can bury the dependence on Y being a version, by specifying the X version and installing Y from the obtained bundle.

The key is the bundling though, so if the client is always installed from PyPI (and not by downloading this repo), then I think we're safe. Otherwise, if the client races ahead in version, installing from the repo will get you an old client version.

Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me, but it seems like we need to make a decision on versioning. My personal take is

  1. we should decouple this client from repo version and start doing SemVer proper
  2. we take this Python client release GitHub action and move it to a separate file that's triggered automatically (a) from the main branch, (b) when delphi_epidata.py changes
  3. my order of preference for next version is v5.0.0 to v4.2.0 to v4.1001.0 to v4.1.10, but @melange396 has final say here

integrations/server/test_api_keys.py Outdated Show resolved Hide resolved
src/client/packaging/pypi/CHANGELOG.md Outdated Show resolved Hide resolved
@melange396
Copy link
Collaborator

My call on the versioning discussion: keep it the way it is (for now) and revert async_epidata().

Rationale:

  • Semantic versioning:
    • FWIW, it appears we never actually publicly committed to use semantic versioning... We only claimed (amongst ourselves) a desire/intention to conform to it.
    • Unless users do a somewhat particular form of package version pinning on their side AND we maintain/support multiple simultaneous version paths (which currently sounds like more effort than its worth, especially with https://github.com/cmu-delphi/epidatpy on the horizon), any user will be affected when they upgrade whether we name the new version 4.1.10 or 5.1.1 or anything in between.
    • To me, this indicates that we can proceed (within reason) however we choose.
  • Offending changes to the python API:
    • The proposed potentially breaking change is a relatively small/simple modification to one public-facing method, async_epidata().
    • It seems unlikely that [m]any users will actually be affected because of the obscurity of the method. In fact, im pretty thoroughly convinced that nobody is actually using async_epidata(). There are only two live(?) places in the cmu-delphi org's github where its even remotely possible that it is being used:
    • async_epidata() seems like it was mostly useful (AFAICT) because it was a way to skirt row limits, which should be much less important now that the limit has been increased.
    • Depending on how its used, async_epidata() can be very inefficient. It can add a lot of http overhead and incur many roundtrips back and forth between the client and server... It can also potentially tie up a large number of our webapp server worker threads and/or DB handles because of the high default concurrent connection limit. Its better for everyone if this isnt used.
    • In light of all this, I claim it will make no difference if we change the method or not, but lets leave it alone for sake of compatibility, though we should deprecate it to discourage current or future use.
  • Coupled/Decoupled versions for repo and client(s):
    • Decoupling the python client version from the repo version will change our existing procedures and potentially introduce complexities/complications over an indeterminate future period.
    • Decoupling the version numbers might even be confusing for some users (though this is admittedly pretty unlikely).
    • However, there are defintely benefits to only creating new client versions when they actually change. I will create a ticket (gh issue) for this.

TODO:

  • revert async_epidata() so it uses the api.php url.
  • mark async_epidata() as deprecated.
  • create a ticket (ie, gh issue) to reexamine the coupled versioning of the clients to the rest of delphi-epidata.

Additionally, as a note to our future selves: we may want to announce the new client version (after release) to our user email list to suggest they upgrade.

@dshemetov
Copy link
Contributor

Should we mark functions downstream of async_epidata deprecated too? I'm thinking mostly of _async_fetch_epidata in the Python client, probably don't need to be concerned about nowcast.

integrations/server/test_api_keys.py Show resolved Hide resolved
src/client/packaging/pypi/CHANGELOG.md Outdated Show resolved Hide resolved
src/client/packaging/pypi/CHANGELOG.md Outdated Show resolved Hide resolved
@sonarcloud
Copy link

sonarcloud bot commented Oct 5, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
1.2% 1.2% Duplication

Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but lets wait to merge it until the last few comments in cmu-delphi/covidcast#663 are addressed

@melange396 melange396 merged commit 85e9e74 into dev Oct 12, 2023
8 checks passed
@melange396 melange396 deleted the rzatserkovnyi/api-php-refactor branch October 12, 2023 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove usage of php alias in the python client
4 participants