Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FM2-642: Improve performance of loading locations by tag #547

Merged
merged 4 commits into from
Sep 5, 2024
Merged

Conversation

ibacher
Copy link
Member

@ibacher ibacher commented Aug 8, 2024

Description of what I changed

As we know, the FHIR Location end point is quite a bit slower than it ought to be. Based on the profiling of individual requests, I thought most of this overhead was just the overhead of translating to FHIR. However, our new performance test suite convinced me otherwise since it shows a median response time of 5 seconds for loading 50 locations (the minimum response time of 132ms is closer to the overhead I expected).

As a result I did some extensive profiling of the locations search request which pointed to two relatively easy-to-fix hotspots:

  1. For Locations (and a few other resources) we support mapping attribute types to FHIR Telecom elements (phone numbers, URLs, email addresses, etc.). The implementation of this would hit the database on every request to load a global property, which was consuming an excessive amount of time, especially because this GP is unlikely to actually be used in many real-world scenarios.
  2. For any pieces of OpenMRS Metadata, we provide the ability to localize the name of the metadata using the same technique supported by the UIFramework and REST module. While not as drastic as the above, this also consumes an excessive amount of time, especially as a Location resource can involve multiple pieces of metadata, especially if there is any location hierarchy.

For the first of these I made the following optimizations:

  • The FhirGlobalPropertyService implementation that reads from the database from every request has been swapped to an implementation backed by a custom GlobalPropertyListener that heavily caches any GP look-ups. This means that anywhere we were hitting the database to check GP values are now completely handled in-memory, drastically reducing the time to lookup.
  • In the base where no contact point attributes are defined, there should now be at most one DB look-up to confirm this instead of the previous at least two.

For the second of these, directly invoking getMessage(String, Object[], Locale) is about 50% faster than invoking getMessage(), which seems to be due to Spring overhead. While individually this doesn't mean much, across millions of invocations of the getMetadataTranslation() method, it shaves a substantial portion of time.

Between these two optimizations, I reduced the time to make 1 million search requests for locations by tag from 25 minutes to about 4 minutes, so this hopefully represents a serious improvement in perceptible performance.

Additionally, while doing this, I noticed that it was actually common (in the various tests) to wind up with multiple copies of FhirContext objects. FhirContext objects are extremely expensive to construct, so I've switched things up so we should only end up with at-most two FhirContexts (one for R4, one for DSTU3), which should speed up the time it takes for tests to run and may have a small impact on runtime performance in the real world too.

Outstanding work:

  • I still think there's a role for some aggressive per-request caching; I'm working on an API for this, especially to be applied to the code that handles concepts, since this also appears to be very slow (hence, I think, the relatively poor performance of getting Observations, Conditions, and Allergies). This work will need to be a little more careful, but I'm hoping we produce similar, if not better gains, across those domains, making the FHIR API much snappier for bulk results.
  • These changes need to be run in a real environment and put through their paces a bit more before they are ready-to-go.

Issue I worked on

see https://openmrs.atlassian.net/browse/FM2-642

Checklist: I completed these to help reviewers :)

  • My IDE is configured to follow the code style of this project.

    No? Unsure? -> configure your IDE, format the code and add the changes with git add . && git commit --amend

  • I have added tests to cover my changes. (If you refactored
    existing code that was well tested you do not have to add tests)

    No? -> write tests and add them to this commit git add . && git commit --amend

  • I ran mvn clean package right before creating this pull request and
    added all formatting changes to my commit.

    No? -> execute above command

  • All new and existing tests passed.

    No? -> figure out why and add the fix to your commit. It is your responsibility to make sure your code works.

  • My pull request is based on the latest changes of the master branch.

    No? Unsure? -> execute command git pull --rebase upstream master

@ibacher ibacher marked this pull request as draft August 8, 2024 21:30
@ibacher
Copy link
Member Author

ibacher commented Aug 8, 2024

cc: @jayasanka-sack @dkayiwa

@ojwanganto
Copy link

Thanks, @ibacher. Any work on performance is highly appreciated. Thanks for the clear description of the implementation approach

@ibacher ibacher marked this pull request as ready for review August 9, 2024 15:18
@ibacher
Copy link
Member Author

ibacher commented Aug 13, 2024

I have no idea why the tests are failing here... they pass locally in every way I can figure out how to run them.

@ibacher ibacher merged commit ca8bf98 into master Sep 5, 2024
1 of 2 checks passed
@ibacher ibacher deleted the FM2-642 branch September 5, 2024 10:56
@moshonk
Copy link

moshonk commented Sep 8, 2024

@ibacher Thanks for this PR and the detailed description. Here's some feedback from my local tests.

The above changes have introduced a significantly high initial loading time for the module in my instance whenever it's run alongside the event module (tried 2.8.0 to 2.10.0). My 8GB M1 Mac laptop did not succeed in loading the module at all. See the error below:

INFO - ServiceContext.startRefreshingContext(821) |2024-09-08T19:00:29,387| Refreshing Context
INFO - EhCacheManagerFactoryBean.afterPropertiesSet(134) |2024-09-08T19:00:39,727| Initializing EhCache CacheManager
WARN - IdgenTask.run(32) |2024-09-08T19:11:09,803| Not running scheduled task. DaemonToken = null; enabled = false
WARN - BrokerService.checkSystemUsageLimits(1886) |2024-09-08T19:11:19,059| Store limit is 102400 mb, whilst the data directory: ../activemq-data/localhost/KahaDB only has 39771 mb of usable space
ERROR - BrokerService.checkSystemUsageLimits(1925) |2024-09-08T19:11:19,059| Temporary Store limit is 51200 mb, whilst the temporary data directory: ../activemq-data/localhost/tmp_storage only has 39771 mb of usable space

Reverting to the previous commit restores the initial(normal) loading time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants