End Point for E12 staff to download patient data #643

nikyraja · 2023-11-21T10:40:40Z

As agreed in #534 the E12 team need a way to download the data from patient records to allow our annual analyses and reporting. This can either be via a download button to give a large csv with one patient per row, or via the API. The access to this will be restricted to E12 staff only.

The data can be pseudonymised, with the following personal identifiers removed or modified before download:

first and last name, NHS number removed
DOB transformed to age at first assessment; in months for those aged 0-2yrs, in years for those aged over 2 years
Home postcode transformed to LSOA deprivation quintile

The above is also specified in the DPIA #625
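For discussion, the three transformations listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`first_name`, `nhs_number`, `date_of_birth`, `postcode`, `first_assessment_date`) and the `quintile_by_postcode` lookup are hypothetical, the postcode to LSOA to quintile chain is collapsed into a single dictionary, and "aged 0-2yrs" is read as "under 24 completed months".

```python
from datetime import date

# Hypothetical field names; the real model fields may differ.
IDENTIFIERS_TO_DROP = ("first_name", "last_name", "nhs_number")


def age_at(dob, on):
    """Return (value, unit): whole months if under 2 years old, else whole years."""
    months = (on.year - dob.year) * 12 + (on.month - dob.month)
    if on.day < dob.day:
        months -= 1  # the current month is not yet complete
    if months < 24:
        return months, "months"
    return months // 12, "years"


def pseudonymise(record, quintile_by_postcode):
    """Drop direct identifiers and replace DOB and postcode with derived values."""
    out = {k: v for k, v in record.items() if k not in IDENTIFIERS_TO_DROP}
    dob = out.pop("date_of_birth")
    postcode = out.pop("postcode")
    value, unit = age_at(dob, out["first_assessment_date"])
    out["age_at_first_assessment"] = value
    out["age_unit"] = unit
    # Assumed lookup: postcode -> deprivation quintile (None if unknown).
    out["deprivation_quintile"] = quintile_by_postcode.get(postcode)
    return out


example = pseudonymise(
    {
        "first_name": "A",
        "last_name": "B",
        "nhs_number": "0000000000",
        "date_of_birth": date(2022, 1, 15),
        "postcode": "AB1 2CD",
        "first_assessment_date": date(2023, 6, 10),
        "sex": "F",
    },
    {"AB1 2CD": 3},
)
```

Keeping the age unit in its own column is one option; separate month and year columns would work equally well.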

This will be required in March 2024 (post-launch).

hannahevansrcphacuk · 2023-11-21T10:51:07Z

I wondered if it would at all be possible also please, if the variable names downloaded are all 30 characters in length?

dc2007git · 2023-11-29T10:12:22Z

Hi @nikyraja , what specifically do you mean by DOB transformed to first assessment date? Are you happy with me just exporting the first assessment date?

dc2007git · 2023-11-29T10:35:36Z

> I wondered if it would at all be possible also please, if the variable names downloaded are all 30 characters in length?

Hi @hannahevansrcphacuk. Do you need the column names to be exactly 30 characters long, or 30 or fewer? Padding the first_name column name out to 30 characters would be very verbose!

hannahevansrcphacuk · 2023-11-29T11:04:50Z

Thank you for querying Danny. If they could be restricted to a minimum of 30 characters please this would be really helpful. Thank you!

dc2007git · 2023-11-29T11:06:51Z

> Thank you for querying Danny. If they could be restricted to a minimum of 30 characters please this would be really helpful. Thank you!

Sounds good Hannah, will keep you posted with progress

pacharanero · 2023-11-29T11:13:37Z

@hannahevansrcphacuk can I just confirm what you want here?

> restricted to a minimum of 30 characters

So if the column name is first_name, you want this expanded out to a minimum of 30 characters?
Firstly, this seems an odd requirement, which I would like to understand a bit more.
Secondly, what do you want added to short column names to bulk them up to 30 characters?
Thirdly, did you actually mean you want the column names restricted to a maximum of 30 characters (which seems more logical to me)?

hannahevansrcphacuk · 2023-11-29T11:20:54Z

Hi Marcus, yes the column/field/variable names restricted to a maximum of 30 characters. That's exactly right. Thank you Marcus
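A check like the one below could help enforce the 30-character maximum on export headers. It is a minimal sketch, not the project's implementation: the function names are made up, and the truncate-then-suffix rule for resolving clashes is just one possible choice.

```python
MAX_LEN = 30


def too_long(names, max_len=MAX_LEN):
    """Return the column names that exceed the maximum length."""
    return [n for n in names if len(n) > max_len]


def shorten(names, max_len=MAX_LEN):
    """Truncate over-long names, adding a numeric suffix if truncation causes a clash."""
    seen = set()
    result = []
    for name in names:
        candidate = name[:max_len]
        n = 1
        while candidate in seen:
            suffix = f"_{n}"
            candidate = name[: max_len - len(suffix)] + suffix
            n += 1
        seen.add(candidate)
        result.append(candidate)
    return result


columns = ["first_name", "a" * 35, "a" * 34 + "b"]
short = shorten(columns)
```

Short names such as `first_name` pass through unchanged; only names over the limit are altered.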

pacharanero · 2023-11-29T11:29:54Z

@nikyraja @hannahevansrcphacuk @AmaniKrayemRCPCH
To aid in the development and testing of this important feature, I think it would be advantageous to have a full suite of dummy Cases data which includes the audit measures for a number of fictional Cases. We can then use this to assure ourselves that the export feature represents fully what you are expecting to see.

@eatyourpeas @anchit-chandran we have seed functions which can create Cases and Registrations - do you think these would be appropriate? I am wondering perhaps if they wouldn't be because the data they create is too random and doesn't really reflect 'real life' E12 data. We may have to either: develop some alternative seed function with realistic data, or get someone to manually create realistic data in the UI.

Perhaps this feature needs a proper meeting to fully discuss.

hannahevansrcphacuk · 2023-11-29T14:54:11Z

Yes, we want:
- Age at first assessment = FLOOR((first assessment date - DOB)/365)
- Age in months for children <2 (0 or 1 year)
- Date of first assessment
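Read literally, the FLOOR formula divides the number of elapsed days by 365. The sketch below (function names are illustrative only) compares that with a calendar-based age. Because leap days are ignored, the two can differ by one year for assessments that fall just before a birthday, which may be an edge case worth agreeing on before the export is finalised.

```python
from datetime import date


def age_years_floor(dob, first_assessment):
    """Literal reading of FLOOR((first assessment date - DOB) / 365)."""
    return (first_assessment - dob).days // 365


def age_years_calendar(dob, first_assessment):
    """Completed years by calendar birthday, for comparison."""
    before_birthday = (first_assessment.month, first_assessment.day) < (dob.month, dob.day)
    return first_assessment.year - dob.year - int(before_birthday)


dob = date(2010, 1, 1)
assessed = date(2019, 12, 30)  # two days before the 10th birthday

floor_age = age_years_floor(dob, assessed)        # 3650 days // 365
calendar_age = age_years_calendar(dob, assessed)  # birthday not yet reached
```

Here 3650 days have elapsed (two of them leap days), so the literal formula reports 10 while the child is still 9 by calendar age.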

nikyraja · 2023-11-29T14:56:39Z

> @nikyraja @hannahevansrcphacuk @AmaniKrayemRCPCH To aid in the development and testing of this important feature, I think it would be advantageous to have a full suite of dummy Cases data which includes the audit measures for a number of fictional Cases. We can then use this to assure ourselves that the export feature represents fully what you are expecting to see.
>
> @eatyourpeas @anchit-chandran we have seed functions which can create Cases and Registrations - do you think these would be appropriate? I am wondering perhaps if they wouldn't be because the data they create is too random and doesn't really reflect 'real life' E12 data. We may have to either: develop some alternative seed function with realistic data, or get someone to manually create realistic data in the UI.
>
> Perhaps this feature needs a proper meeting to fully discuss.

Sure, we are expecting one or more CSV exports, so we can prepare the heading titles etc., but essentially we need all the data minus the above patient identifiers: one row per patient, with each variable in a separate column. Where there is a one-to-many relationship (e.g. treatments or episodes), these can be separate tables, but could they also be additional columns?
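To make the "additional columns" option concrete, here is a rough sketch of flattening a one-to-many relationship into a single wide row per patient. The record layout, the `episodes` key and the `<key>_<n>_<field>` column naming are all assumptions for illustration, not the project's schema.

```python
import csv
import io


def widen(patients, child_key, child_fields):
    """One row per patient; the i-th child record becomes columns <key>_<i>_<field>."""
    max_children = max((len(p[child_key]) for p in patients), default=0)
    rows = []
    for p in patients:
        row = {k: v for k, v in p.items() if k != child_key}
        for i in range(max_children):
            # Patients with fewer child records get empty cells.
            child = p[child_key][i] if i < len(p[child_key]) else {}
            for f in child_fields:
                row[f"{child_key}_{i + 1}_{f}"] = child.get(f, "")
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the widened rows to CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


patients = [
    {"case_id": 1, "sex": "F", "episodes": [
        {"date": "2023-01-05", "type": "focal"},
        {"date": "2023-02-01", "type": "generalised"},
    ]},
    {"case_id": 2, "sex": "M", "episodes": []},
]
output = to_csv(widen(patients, "episodes", ["date", "type"]))
```

The trade-off is that the number of columns grows with the patient who has the most child records, whereas separate tables keep a fixed set of columns but need joining on a case identifier during analysis.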

The 'Fridah button' allowed us to complete analysis so hopefully you can use and adapt this?

One thing to note is that whilst we are happy for the identifiers to be removed/transformed for our annual analyses, we would need the 'full data' to be archived and dated on the system somewhere, so that if we need to, we can go back to this download and extract identifiers. An example of where we need this is for data access requests which require NHS numbers.

A meeting focusing specifically on this would be useful.

dc2007git · 2023-11-29T15:48:12Z

I agree with Marcus, it would definitely be useful to have a proper brief of everything that is expected. I've developed a feature that will create a csv using dummy data with the sex, deprivation quintile, date of first assessment and age at first assessment with the preferences for date formatting.

Like Marcus said, I think testing is going to be really important for this feature to catch any edge cases before they are committed to a csv. Here's a screenshot of what the output looks like so far. If you want any other columns included, let me know, but we can properly discuss everything in a meeting. Please note that there are lots of empty spaces because I've only tested assessment date calculations on 3 cases, but more in-depth testing is definitely going to be necessary before deployment:

hannahevansrcphacuk · 2023-11-30T09:55:56Z

> > @nikyraja @hannahevansrcphacuk @AmaniKrayemRCPCH To aid in the development and testing of this important feature, I think it would be advantageous to have a full suite of dummy Cases data which includes the audit measures for a number of fictional Cases. We can then use this to assure ourselves that the export feature represents fully what you are expecting to see.
> > @eatyourpeas @anchit-chandran we have seed functions which can create Cases and Registrations - do you think these would be appropriate? I am wondering perhaps if they wouldn't be because the data they create is too random and doesn't really reflect 'real life' E12 data. We may have to either: develop some alternative seed function with realistic data, or get someone to manually create realistic data in the UI.
> > Perhaps this feature needs a proper meeting to fully discuss.
>
> Sure, we are expecting one or more CSV exports, so we can prepare the heading titles etc., but essentially we need all the data minus the above patient identifiers: one row per patient, with each variable in a separate column. Where there is a one-to-many relationship (e.g. treatments or episodes), these can be separate tables, but could they also be additional columns?
>
> The 'Fridah button' allowed us to complete analysis so hopefully you can use and adapt this?
>
> One thing to note is that whilst we are happy for the identifiers to be removed/transformed for our annual analyses, we would need the 'full data' to be archived and dated on the system somewhere, so that if we need to, we can go back to this download and extract identifiers. An example of where we need this is for data access requests which require NHS numbers.
>
> A meeting focusing specifically on this would be useful.

To add to what Niky has specified above, I think a really important feature to have from the get-go is the ability to respond quickly to any DSARs (data subject access requests) we receive from individuals. An individual (for example, a parent of a child in Epilepsy12) may ask to see what data was used in the quarterly/monthly/annual reporting, as well as what data we currently hold. We would need to be able to obtain this information quickly if an individual made such a request, and in a way that is compliant with contracts and GI. We have more time for other Data Access Requests, such as those made for research or further analysis, where data from multiple individuals is requested.

dc2007git self-assigned this Nov 24, 2023

dc2007git mentioned this issue Nov 29, 2023

643 end point for e12 staff to download patient data #667

Closed

dc2007git linked a pull request Nov 29, 2023 that will close this issue

643 end point for e12 staff to download patient data #667

Closed

mbarton mentioned this issue Aug 28, 2024

Encrypt CSV download from submissions view rcpch/national-paediatric-diabetes-audit#118

Open
