End Point for E12 staff to download patient data #643

Open
nikyraja opened this issue Nov 21, 2023 · 12 comments

@nikyraja
Contributor

nikyraja commented Nov 21, 2023

As agreed in #534 the E12 team need a way to download the data from patient records to allow our annual analyses and reporting. This can either be via a download button to give a large csv with one patient per row, or via the API. The access to this will be restricted to E12 staff only.

The data can be pseudonymised, with the following personal identifiers removed or modified before download:

  • first and last name, NHS number removed
  • DOB transformed to age at first assessment; in months for those aged 0-2yrs, in years for those aged over 2 years
  • Home postcode transformed to LSOA deprivation quintile

The above is also specified in the DPIA #625

This will be required in March 2024 (post-launch).
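
A minimal sketch of the identifier handling listed above, assuming each patient record is a plain dict. The field names (`first_name`, `last_name`, `nhs_number`, `date_of_birth`, `postcode`) and the `quintile_by_postcode` lookup are hypothetical, not taken from the E12 codebase. The age at first assessment is assumed to have been derived and added to the record before this step.

```python
# Hypothetical field names; the real model fields may differ.
REMOVED_FIELDS = ("first_name", "last_name", "nhs_number", "date_of_birth", "postcode")


def pseudonymise(record: dict, quintile_by_postcode: dict) -> dict:
    """Return a copy of the record without direct identifiers.

    The home postcode is replaced by a deprivation quintile taken from a
    caller-supplied lookup (assumed to map normalised postcodes to 1-5).
    """
    out = {key: value for key, value in record.items() if key not in REMOVED_FIELDS}

    postcode = record.get("postcode")
    # Normalise so "ab1 2cd" and "AB12CD" hit the same lookup key.
    key = postcode.replace(" ", "").upper() if postcode else None
    out["deprivation_quintile"] = quintile_by_postcode.get(key)
    return out
```

Unknown or missing postcodes give `None` for the quintile in this sketch rather than raising, so that one bad postcode does not block a whole export.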

@hannahevansrcphacuk

I wondered if it would at all be possible also please, if the variable names downloaded are all 30 characters in length?

@dc2007git dc2007git self-assigned this Nov 24, 2023
@dc2007git
Contributor

Hi @nikyraja , what specifically do you mean by DOB transformed to first assessment date? Are you happy with me just exporting the first assessment date?

@dc2007git
Contributor

> I wondered if it would at all be possible also please, if the variable names downloaded are all 30 characters in length?

Hi @hannahevansrcphacuk. Do you need the column names to be exactly 30 characters in length, or 30 or fewer? Padding the first_name column name out to 30 characters would be very verbose!

@hannahevansrcphacuk

Thank you for querying Danny. If they could be restricted to a minimum of 30 characters please this would be really helpful. Thank you!

@dc2007git
Contributor

> Thank you for querying Danny. If they could be restricted to a minimum of 30 characters please this would be really helpful. Thank you!

Sounds good Hannah, will keep you posted with progress

@pacharanero
Member

@hannahevansrcphacuk can I just confirm what you want here?

> restricted to a minimum of 30 characters

So if the column name is first_name, you want this expanded out to a minimum of 30 characters?
Firstly, this seems an odd requirement, which I would like to understand a bit more.
Secondly, what do you want added to short column names to bulk them up to 30 characters?
Thirdly, did you actually mean you want the column names restricted to a maximum of 30 characters (which seems more logical to me)?

@hannahevansrcphacuk

Hi Marcus, yes the column/field/variable names restricted to a maximum of 30 characters. That's exactly right. Thank you Marcus
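
As a rough illustration of the 30-character maximum confirmed here, the sketch below shortens over-long column names and keeps them unique. The function name and the numeric-suffix scheme are assumptions made for the example, not an agreed naming convention.

```python
MAX_COLUMN_NAME_LENGTH = 30


def limit_column_names(names, max_len=MAX_COLUMN_NAME_LENGTH):
    """Truncate names to max_len characters, keeping each result unique.

    Short names are left untouched. If truncation causes a clash, a
    numeric suffix (_2, _3, ...) replaces the tail of the later name.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name[:max_len]
        counter = 2
        while candidate in seen:
            suffix = f"_{counter}"
            candidate = name[: max_len - len(suffix)] + suffix
            counter += 1
        seen.add(candidate)
        result.append(candidate)
    return result
```

In practice, hand-picked short names are likely to read better in the analysis software than automatic truncation; a helper like this is mainly useful as a check that nothing exceeds the limit.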

@pacharanero
Member

@nikyraja @hannahevansrcphacuk @AmaniKrayemRCPCH
To aid in the development and testing of this important feature, I think it would be advantageous to have a full suite of dummy Cases data which includes the audit measures for a number of fictional Cases. We can then use this to assure ourselves that the export feature represents fully what you are expecting to see.

@eatyourpeas @anchit-chandran we have seed functions which can create Cases and Registrations - do you think these would be appropriate? I am wondering perhaps if they wouldn't be because the data they create is too random and doesn't really reflect 'real life' E12 data. We may have to either: develop some alternative seed function with realistic data, or get someone to manually create realistic data in the UI.

Perhaps this feature needs a proper meeting to fully discuss.

@hannahevansrcphacuk

Yes, we want:

  • Age at first assessment = FLOOR((first assessment date - DOB)/365)
  • Age in months for children <2 (0 or 1 year)
  • Date of first assessment
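
One possible reading of the age rules above, as a sketch. Years follow the `FLOOR(days / 365)` formula as written; how months should be counted is not spelled out in the thread, so whole calendar months are an assumption here, as are the function name and the `(value, unit)` return shape.

```python
from datetime import date


def age_at_first_assessment(dob: date, first_assessment: date) -> tuple[int, str]:
    """Return (age, unit): months for children under 2 years, otherwise years."""
    if first_assessment < dob:
        raise ValueError("first assessment date is before date of birth")

    days = (first_assessment - dob).days
    years = days // 365  # FLOOR((first assessment date - DOB) / 365)

    if years < 2:
        # Whole calendar months completed between the two dates (assumption).
        months = (first_assessment.year - dob.year) * 12 + (
            first_assessment.month - dob.month
        )
        if first_assessment.day < dob.day:
            months -= 1
        return months, "months"
    return years, "years"
```

Dividing by 365 ignores leap days, so a child assessed a day or two before a birthday can be rounded up to the next year. If that matters for the analyses, a calendar-based year calculation would be the safer choice.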

@nikyraja
Contributor Author

> @nikyraja @hannahevansrcphacuk @AmaniKrayemRCPCH To aid in the development and testing of this important feature, I think it would be advantageous to have a full suite of dummy Cases data which includes the audit measures for a number of fictional Cases. We can then use this to assure ourselves that the export feature represents fully what you are expecting to see.
>
> @eatyourpeas @anchit-chandran we have seed functions which can create Cases and Registrations - do you think these would be appropriate? I am wondering perhaps if they wouldn't be because the data they create is too random and doesn't really reflect 'real life' E12 data. We may have to either: develop some alternative seed function with realistic data, or get someone to manually create realistic data in the UI.
>
> Perhaps this feature needs a proper meeting to fully discuss.

Sure, we are expecting CSV export(s), so we can prepare the heading titles etc., but essentially we need all the data minus the above patient identifiers: one row per patient, with a separate column for each variable. Where there is a one-to-many relationship (e.g. treatments or episodes), these can be separate tables, but could they also be additional columns?

  • The 'Fridah button' allowed us to complete analysis so hopefully you can use and adapt this?

One thing to note is that whilst we are happy for the identifiers to be removed/transformed for our annual analyses, we would need the 'full data' to be archived and dated on the system somewhere, so that if we need to, we can go back to this download and extract identifiers. An example of where we need this is for data access requests which require NHS numbers.

A meeting focusing specifically on this would be useful.
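
To make the "additional columns" option for one-to-many data concrete, here is a small sketch using only the standard library. The record layout (an `episodes` list on each patient, with a `type` key) and the `episode_N_type` column naming are hypothetical; it also assumes every patient dict carries the same scalar fields.

```python
import csv
import io


def flatten_patient(patient: dict, max_episodes: int) -> dict:
    """Spread a patient's episodes into numbered columns on a single row."""
    row = {key: value for key, value in patient.items() if key != "episodes"}
    episodes = patient.get("episodes", [])
    for i in range(max_episodes):
        # Patients with fewer episodes get empty cells for the unused columns.
        row[f"episode_{i + 1}_type"] = episodes[i]["type"] if i < len(episodes) else ""
    return row


def export_csv(patients: list) -> str:
    """Return CSV text with one row per patient."""
    max_episodes = max((len(p.get("episodes", [])) for p in patients), default=0)
    rows = [flatten_patient(p, max_episodes) for p in patients]
    buffer = io.StringIO()
    if rows:
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()
```

The number of columns then depends on the patient with the most episodes, which can make the file very wide. Separate tables keyed on a patient identifier avoid that, at the cost of a join during analysis.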

@dc2007git
Contributor

I agree with Marcus, it would definitely be useful to have a proper brief of everything that is expected. I've developed a feature that will create a csv using dummy data with the sex, deprivation quintile, date of first assessment and age at first assessment with the preferences for date formatting.

Like Marcus said, I think testing is going to be really important for this feature, to catch any edge cases before they are committed to a CSV. Here's a screenshot of what the output looks like so far. If there are any other columns you want included, let me know, but we can properly discuss everything in a meeting. Please note that there are lots of empty spaces because I've only tested the assessment date calculations on 3 cases; more in-depth testing is definitely going to be necessary before deployment:
[image: screenshot of the CSV output]

@hannahevansrcphacuk

To add to what Niky has specified above, I think a really important feature to have from the start is the ability to respond quickly to DSARs we may receive from individuals. An individual (a parent of a child in Epilepsy12) may ask to see what data was used in the quarterly/monthly/annual reporting, as well as what data we currently hold. We would need to be able to obtain this information quickly if a request was made by an individual, and in a way that is compliant with contracts and GI. We have more time for other Data Access Requests, made for research or further analysis, where data from multiple individuals are requested.

4 participants