
We need logging of the DataRequests generated by the R-client #22

Open
pmorrill opened this issue Apr 17, 2019 · 8 comments

Comments

@pmorrill
Collaborator

I am separating this issue out as it is buried in another thread.

My intention is to start writing the DataRequest records to the Dendroica tables when they are first generated by the R-client. That way, at least, you can run queries to monitor usage. I will write the standard record format as it currently exists in Dendroica.data_requests.

However, I have added three fields to the DataRequest object that are not reflected in the filter attributes: filterStartDayOfYr, filterEndDayOfYr, filterLocalityType.

If you add these to the Dendroica.data_requests table, I can doctor the DataRequest.java file to save and retrieve them. Is that how we want to handle this?

@denislepage
Member

denislepage commented Apr 17, 2019 via email

@pmorrill
Collaborator Author

One thing I noticed is that the Dendroica field 'request_origin' can be null, but your enum does not resolve in that case and throws an exception (line 2088 of DataRequest). Should there be a default of 'web' on that field?
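For illustration, a minimal sketch of a null-tolerant lookup done in code rather than as a column default. It assumes a hypothetical enum named `RequestOrigin` whose constants match the 'web', 'api' and 'mixed' values used in this thread; the actual enum and the code at line 2088 are not shown here. Handling it in code also covers rows already stored with a null origin.

```java
import java.util.Locale;

// Hypothetical enum mirroring the values stored in request_origin.
enum RequestOrigin {
    WEB, API, MIXED;

    /**
     * Resolves a database value to a constant, falling back to WEB when the
     * column is null or blank instead of throwing.
     */
    static RequestOrigin fromDb(String value) {
        if (value == null || value.trim().isEmpty()) {
            return WEB; // assumed default for rows with no origin recorded
        }
        // valueOf still throws IllegalArgumentException for unknown strings
        return RequestOrigin.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Value written back to the request_origin column. */
    String toDb() {
        return name().toLowerCase(Locale.ROOT);
    }
}
```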

@denislepage
Member

denislepage commented Apr 23, 2019 via email

@pmorrill
Collaborator Author

I am concerned about using the finalize() method to trigger a database insert of new DataRequests. In my own system, it is not reliably triggered: garbage collection timing is unpredictable, and collection can be delayed considerably.

While it is possible to call finalize() explicitly, it is not recommended.

So I would suggest we move that functionality out of finalize() and into its own method, so it is easier to call on demand.
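A minimal sketch of that idea, assuming hypothetical names (`RequestRecord`, `RequestStore`, `save()`) rather than the real DataRequest code: persistence becomes an explicit, idempotent method, and implementing `AutoCloseable` lets try-with-resources guarantee the save when a block exits, which is deterministic in a way garbage collection is not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical persistence hook; real code would insert into Dendroica.data_requests.
interface RequestStore {
    void insert(RequestRecord record);
}

// Persistence is an explicit method instead of a side effect of finalize().
class RequestRecord implements AutoCloseable {
    private final String formId;
    private final RequestStore store;
    private boolean saved = false;

    RequestRecord(String formId, RequestStore store) {
        this.formId = formId;
        this.store = store;
    }

    String formId() { return formId; }

    /** Writes the record once; repeated calls are no-ops. */
    synchronized void save() {
        if (!saved) {
            store.insert(this);
            saved = true;
        }
    }

    /** Called by try-with-resources, so the save happens when the block exits. */
    @Override
    public void close() {
        save();
    }
}

// Toy store that records inserted ids, for demonstration only.
class InMemoryStore implements RequestStore {
    final List<String> inserted = new ArrayList<>();

    @Override
    public void insert(RequestRecord record) {
        inserted.add(record.formId());
    }
}
```

Calling `save()` explicitly and then closing still produces a single insert, because the method guards against repeats.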

@denislepage
Member

denislepage commented Apr 26, 2019 via email

@pmorrill
Collaborator Author

I am about ready to add the code that supports full DataRequests db record creation for api-generated queries. Here are some notes and a few questions.

(Steffi: much of what follows is server-side talk, but there will be a few small changes to the api spec that arise from it. I will describe those in a separate email or posting once these changes go into the sandbox.)


  1. Record creation

When a new query is initiated by the api (assuming the attribute validation succeeds), a new DataRequests object is created, using a new 'formId' generated by Util#getNewId. A DataRequestCollections object is also added, corresponding to the collection code requested. I set the record count on that DataRequestCollections and its status to 9 (approved), then add it to the DataRequests object.

If the query succeeds and data is delivered, I write the RequestLabel as 'API Request', set the requestOrigin to 'api', and set the RequestDate. I then call the 'upsert' function to push the records to the database.
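The two steps above can be sketched as follows. This is illustrative only: `ApiRequest`, `CollectionEntry` and `ApiRequestFlow` are hypothetical stand-ins for DataRequests and DataRequestCollections, `newId()` is a placeholder for Util#getNewId (whose id format is not described), and a `Map` keyed on formId plays the part of the database upsert.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a DataRequestCollections row.
class CollectionEntry {
    static final int STATUS_APPROVED = 9; // status code used in the thread
    final String collectionCode;
    int recordCount;
    int status;

    CollectionEntry(String collectionCode) { this.collectionCode = collectionCode; }
}

// Hypothetical stand-in for a DataRequests record.
class ApiRequest {
    final String formId;
    final List<CollectionEntry> collections = new ArrayList<>();
    String requestLabel;
    String requestOrigin;
    LocalDateTime requestDate;

    ApiRequest(String formId) { this.formId = formId; }
}

final class ApiRequestFlow {
    private ApiRequestFlow() {}

    // Placeholder for Util#getNewId.
    static String newId() { return UUID.randomUUID().toString(); }

    /** Step 1: build the request once attribute validation has passed. */
    static ApiRequest create(String collectionCode, int recordCount) {
        ApiRequest request = new ApiRequest(newId());
        CollectionEntry entry = new CollectionEntry(collectionCode);
        entry.recordCount = recordCount;
        entry.status = CollectionEntry.STATUS_APPROVED; // access already checked
        request.collections.add(entry);
        return request;
    }

    /** Step 2: stamp and persist, only after data has been delivered. */
    static void markDelivered(ApiRequest request, Map<String, ApiRequest> table) {
        request.requestLabel = "API Request";
        request.requestOrigin = "api";
        request.requestDate = LocalDateTime.now();
        table.put(request.formId, request); // toy upsert keyed on formId
    }
}
```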


  2. Request re-use

Every successful request will now be written to the database, which means the user can later re-run any request, whether it was generated by the R-client or on the web forms. In either case, the user can provide a bmdeVersion parameter and, if bmdeVersion == 'custom', a 'fields' parameter. Any request will also be paginated, using the lastRecord and numRecords parameters.
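A hedged sketch of how those re-run parameters might be validated server-side. The class name `RerunParams` and the specific rules (fields required only for 'custom', non-negative lastRecord, positive numRecords) are assumptions for illustration; the thread does not spell out the validation or the exact meaning of lastRecord.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the parameters accepted when re-running a saved request.
final class RerunParams {
    final String bmdeVersion;  // optional; "custom" means the caller lists fields
    final List<String> fields; // used only when bmdeVersion is "custom"
    final long lastRecord;     // pagination cursor (semantics assumed)
    final int numRecords;      // page size

    RerunParams(String bmdeVersion, List<String> fields, long lastRecord, int numRecords) {
        boolean custom = "custom".equals(bmdeVersion);
        if (custom && (fields == null || fields.isEmpty())) {
            throw new IllegalArgumentException("'fields' is required when bmdeVersion is 'custom'");
        }
        if (lastRecord < 0 || numRecords <= 0) {
            throw new IllegalArgumentException("lastRecord must be >= 0 and numRecords > 0");
        }
        this.bmdeVersion = bmdeVersion;
        // Fields are ignored (kept empty) unless a custom version was requested.
        this.fields = custom
                ? Collections.unmodifiableList(new ArrayList<>(fields))
                : Collections.<String>emptyList();
        this.lastRecord = lastRecord;
        this.numRecords = numRecords;
    }
}
```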

Any time a request is run, I call the DataRequestsCollections#setDownloadedDt function and run an upsert. This means that if the R-client splits a request over several pages, the upsert will run once per page. (Note that I have not yet considered Denis' latest post above, with the alternative way to manage the upsert event; I will do that.)
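To make the "one upsert per page" consequence concrete, here is a toy model. `PagedRun`, `servePage` and `fetchAll` are hypothetical; `setDownloadedDt` and `upsert` only mimic the calls named above, and lastRecord is treated as a zero-based offset purely for illustration.

```java
import java.time.LocalDateTime;
import java.util.List;

// Toy model: every page served re-stamps the download date and runs an upsert.
final class PagedRun {
    int upserts = 0;
    LocalDateTime downloadedDt;

    // Stand-in for DataRequestsCollections#setDownloadedDt.
    void setDownloadedDt(LocalDateTime when) { downloadedDt = when; }

    // Stand-in for the database upsert; this version only counts calls.
    void upsert() { upserts++; }

    /** Serves one page, treating lastRecord as an offset into rows. */
    List<Integer> servePage(List<Integer> rows, int lastRecord, int numRecords) {
        int from = Math.min(lastRecord, rows.size());
        int to = Math.min(from + numRecords, rows.size());
        setDownloadedDt(LocalDateTime.now());
        upsert();
        return rows.subList(from, to);
    }

    /** Client-side loop: request pages until a short page arrives (pageSize > 0). */
    static int fetchAll(PagedRun run, List<Integer> rows, int pageSize) {
        int last = 0;
        while (true) {
            List<Integer> page = run.servePage(rows, last, pageSize);
            last += page.size();
            if (page.size() < pageSize) {
                return last;
            }
        }
    }
}
```

Fetching 25 rows with a page size of 10 serves three pages and therefore runs three upserts for a single request.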

If a request origin was set as 'web', it will be switched to 'mixed' as soon as it is run through the API.
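The origin transition can be written as a small pure function. `OriginRules.afterApiRun` is a hypothetical name, and plain strings stand in for the request_origin values; a null origin is simply passed through here.

```java
// Sketch of the origin transition applied when a saved request is run via the API.
final class OriginRules {
    private OriginRules() {}

    /** Origin to store after a request is run through the API. */
    static String afterApiRun(String currentOrigin) {
        if ("web".equals(currentOrigin)) {
            return "mixed"; // a web-form request is now also used via the API
        }
        return currentOrigin; // "api" and "mixed" stay as they are
    }
}
```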

The api entrypoint 'list_requests' will now respond with all types (web, api, or even mixed). I am including the requestOrigin value in the list returned to the R-client, and we may consider adding some filtering to the 'list_requests' api entrypoint. For example, we could allow filtering to include only web generated requests, etc.
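If that filtering is added, one simple shape is an optional origin parameter where "absent" means "return everything". The sketch below is an assumption about how it could look; `RequestSummary` and `ListRequestsFilter` are hypothetical and are not part of the current api spec.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical summary row returned by list_requests.
final class RequestSummary {
    final String formId;
    final String requestOrigin; // "web", "api" or "mixed"

    RequestSummary(String formId, String requestOrigin) {
        this.formId = formId;
        this.requestOrigin = requestOrigin;
    }
}

final class ListRequestsFilter {
    private ListRequestsFilter() {}

    /** Returns all rows when origin is null, otherwise only rows with that origin. */
    static List<RequestSummary> filter(List<RequestSummary> rows, String origin) {
        if (origin == null) {
            return rows;
        }
        return rows.stream()
                .filter(r -> origin.equals(r.requestOrigin))
                .collect(Collectors.toList());
    }
}
```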


  3. Data Collections in a Request

Requests generated by the R-client only ever include a single collection code, and the DataRequestsCollections status is always set to 'approved' (we have already checked the user's access to the collection).

However, requests generated by the web forms may include multiple collections, and some of those collections may not yet be approved when the query is run. This is handled by checking approval on the set of collections and including only the approved ones in the SQL query.
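A sketch of the "approved collections only" step, assuming the statuses are available as a map from collection code to status and that 9 means approved (as earlier in the thread). The clause uses one `?` placeholder per code so the codes can be bound with JDBC `PreparedStatement.setString`; the column name passed in is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ApprovedCollections {
    static final int STATUS_APPROVED = 9;

    private ApprovedCollections() {}

    /** Keeps only collection codes whose status is approved, in map iteration order. */
    static List<String> approvedCodes(Map<String, Integer> statusByCode) {
        List<String> codes = new ArrayList<>();
        for (Map.Entry<String, Integer> e : statusByCode.entrySet()) {
            Integer status = e.getValue();
            if (status != null && status == STATUS_APPROVED) {
                codes.add(e.getKey());
            }
        }
        return codes;
    }

    /** Builds "column IN (?, ?, ...)" with one placeholder per approved code. */
    static String inClause(String column, List<String> codes) {
        if (codes.isEmpty()) {
            throw new IllegalStateException("no approved collections in this request");
        }
        return column + " IN (" + String.join(", ", Collections.nCopies(codes.size(), "?")) + ")";
    }
}
```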

Data Requests are also validated against the client userId, to prevent X from running a request approved for Y (or even generated by Y).
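The ownership check amounts to comparing the user stored on the saved request with the calling user before running it. In this minimal sketch, `RequestAccess`, `checkOwner` and the String-typed user ids are assumptions; the real userId type is not stated.

```java
// Sketch: refuse to run a saved request unless it belongs to the caller.
final class RequestAccess {
    private RequestAccess() {}

    /** Thrown when a caller tries to run a request owned by someone else. */
    static final class ForbiddenRequestException extends RuntimeException {
        ForbiddenRequestException(String message) { super(message); }
    }

    /**
     * ownerUserId comes from the stored DataRequests record;
     * callerUserId comes from the client's credentials.
     */
    static void checkOwner(String formId, String ownerUserId, String callerUserId) {
        if (ownerUserId == null || !ownerUserId.equals(callerUserId)) {
            throw new ForbiddenRequestException("request " + formId + " does not belong to this user");
        }
    }
}
```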


That's the basic system, which I will check in tomorrow if my testing is good this afternoon.

@denislepage
Member

denislepage commented Apr 29, 2019 via email

@pmorrill
Collaborator Author

Our api spec only allows ONE collection per data request. That's how it was set up. We can make changes of course.

I do not save a query request when only a record count is generated. The system only saves a request object to the database when actual data is downloaded, though it also updates it on every subsequent page, which seems unnecessary.

DataRequests are not being cached in memory during pagination right now. The formId (requestId) is used to pull them from the database on each page-call. I will be considering ways to streamline that tomorrow / Wednesday. Caching in a static object works, but we need some sort of time-out and garbage collection.
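One way to add the time-out is an idle-expiry cache keyed on formId, held in a static field. This is a sketch under stated assumptions: `TtlCache` is a hypothetical name, the loader stands in for the per-page database lookup, the clock is injectable so expiry can be tested, and `evictExpired()` would be called periodically (for example from a `ScheduledExecutorService`). Races between `get` and eviction can at worst cause an extra reload, which is acceptable for this use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Idle-expiry cache keyed on formId; intended to live in a static field.
final class TtlCache<V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccess;

        Entry(V value, long now) {
            this.value = value;
            this.lastAccess = now;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it on a miss or after the idle time-out. */
    V get(String formId, Function<String, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(formId);
        if (e == null || now - e.lastAccess > ttlMillis) {
            e = new Entry<>(loader.apply(formId), now);
            entries.put(formId, e);
        } else {
            e.lastAccess = now; // sliding expiry: each page call keeps it alive
        }
        return e.value;
    }

    /** Drops entries idle longer than the time-out; returns how many were removed. */
    int evictExpired() {
        long now = clock.getAsLong();
        int before = entries.size();
        entries.values().removeIf(e -> now - e.lastAccess > ttlMillis);
        return before - entries.size();
    }

    int size() { return entries.size(); }
}
```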

None of this specific code is in Git right now, as I have not finished my testing.
