Update Google Docs Meta Data #1546

Merged
merged 2 commits into from
Oct 8, 2024
Conversation

github-actions[bot]
Contributor

@github-actions github-actions bot commented Oct 7, 2024

Updating Google Docs Meta Data

  • addition of "Signal Set" column
  • addition of two chng signals: 7dav_inpatient_covid and 7dav_outpatient_covid
  • a bunch of fixes to extended ascii apostrophes and quotation marks (replaced with regular ascii equivalents)

The signal name for "covid_naat_pct_positive_7dav" was lost in an apparent accidental paste, but I fixed it here with a commit to the PR branch, and manually in the spreadsheet.


@melange396
Collaborator

It turns out that there are still "extended ASCII" characters in here (they are actually Unicode characters). They can be found by running:

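from collections import defaultdict

# count occurrences of each high code point in the file
highchars = defaultdict(int)
with open('db_signals.csv', encoding='utf-8') as f:
    for line in f:
        for char in line:
            val = ord(char)
            # 127 is DEL; anything above it is outside 7-bit ASCII
            if val >= 127:
                highchars[val] += 1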

The current db_signals.csv file gives the following results:

>>> highchars
defaultdict(<class 'int'>, {8220: 9, 8217: 30, 8221: 9})
>>> chr(8220)
'“'
>>> chr(8221)
'”'
>>> chr(8217)
'’'
>>> 
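The counts above say which characters are present but not where. As a rough sketch (not part of the original comment), the same scan can also report the line, column, and Unicode name of each hit. The function name `find_non_ascii` and the sample rows are made up for illustration; only the three code points come from the output above.

```python
import unicodedata
from collections import Counter


def find_non_ascii(lines):
    """Yield (line_number, column, char, unicode_name) for each non-ASCII char."""
    for line_no, line in enumerate(lines, start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield line_no, col, char, unicodedata.name(char, "UNKNOWN")


# Hypothetical sample rows standing in for db_signals.csv
sample = [
    "name,description\n",
    "chng,\u201cestimated\u201d visits\n",
    "x,it\u2019s fine\n",
]
hits = list(find_non_ascii(sample))
counts = Counter(name for _, _, _, name in hits)
for line_no, col, char, name in hits:
    print(f"line {line_no}, col {col}: U+{ord(char):04X} {name}")
```

To run it against the real file, pass an open file handle instead of the sample list, e.g. `with open('db_signals.csv', encoding='utf-8') as f: hits = list(find_non_ascii(f))`.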

I am not going to simply replace them in the file itself because of escaping concerns, so after merging this PR, I will replace them in the Google spreadsheet and then run the CSV sync utility (GH action) again.
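For illustration only, here is a minimal sketch of what an in-file replacement could look like if it were done on parsed cell values rather than raw text, so that the `csv` module handles the quoting. This is not what was done for this PR (the fix went through the spreadsheet), and `asciify_csv` and the sample text are hypothetical. Note that re-writing through `csv.writer` may also change how other cells are quoted compared with the original file.

```python
import csv
import io

# The three code points found in db_signals.csv, mapped to ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2019": "'",   # right single quotation mark
})


def asciify_csv(text):
    """Parse CSV text, replace smart quotes in every cell, and re-serialize."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell.translate(SMART_QUOTES) for cell in row])
    return out.getvalue()


# Hypothetical sample: one quoted cell containing all three characters
sample = 'name,description\nchng,"\u201cestimated\u201d visits, it\u2019s smoothed"\n'
result = asciify_csv(sample)
print(result)
```

Because the replacement introduces plain `"` characters inside a cell, the writer doubles them on output, which is exactly the kind of escaping detail a raw search-and-replace on the file would miss.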

@melange396 melange396 merged commit a9a2535 into dev Oct 8, 2024
7 checks passed
@melange396 melange396 deleted the bot/update-docs branch October 8, 2024 20:45
@melange396
Collaborator

In case it helps someone in the future, here's some ugly code that I used to help compare the two versions of these files:

import csv

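# rows from the dev branch version of the file
with open('dev__db_signals.csv', newline='', encoding='utf-8') as f:
    dev = list(csv.reader(f))

# rows from the updated (PR) version of the file
with open('new__db_signals.csv', newline='', encoding='utf-8') as f:
    new = list(csv.reader(f))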

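def compare_rows(a, b):
    # print each pair of differing cells, with newlines stripped for readability
    if len(a) != len(b):
        print("length mismatch")
    # only index up to the shorter row so a length mismatch cannot raise IndexError
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            print("    ", i, a[i].replace("\n", ""))
            print("    ", i, b[i].replace("\n", ""))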

# iterate over the new file's rows (it has two more rows than dev),
# so that the last rows are compared too
for i in range(len(new)):
    offset = 0
    if i in (7, 8):
        # skip added rows
        continue
    if i > 8:
        # account for added rows
        offset = 2
    n = new[i][:10] + new[i][11:]  # skip added column @ index 10
    d = dev[i - offset]
    if n != d:
        print(i)
        compare_rows(n, d)
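The script above hardcodes where the added rows and the added column sit. A possible generalization, sketched here under the assumption that some set of columns uniquely identifies each row, is to match rows by key instead of by position. The function `diff_by_key`, the choice of key columns, and the sample rows are all hypothetical; the real db_signals.csv layout is not shown in this thread.

```python
def diff_by_key(old_rows, new_rows, key_cols, skip_new_cols=()):
    """Compare two lists of CSV rows, matching rows by key rather than position.

    key_cols: column indices (after dropping skip_new_cols) that identify a row.
    skip_new_cols: column indices present only in new_rows, ignored in the diff.
    """
    def strip(row):
        return [c for i, c in enumerate(row) if i not in skip_new_cols]

    def key(row):
        return tuple(row[i] for i in key_cols)

    old = {key(r): r for r in old_rows}
    new = {key(strip(r)): strip(r) for r in new_rows}

    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for k in old.keys() & new.keys():
        diffs = [(i, a, b) for i, (a, b) in enumerate(zip(old[k], new[k])) if a != b]
        if diffs or len(old[k]) != len(new[k]):
            changed[k] = diffs
    return added, removed, changed


# Hypothetical data: new rows gain a column at index 2 and one extra row
old_rows = [["src", "sig", "desc"], ["chng", "a", "it's"]]
new_rows = [
    ["src", "sig", "set", "desc"],
    ["chng", "a", "s1", "it\u2019s"],
    ["chng", "b", "s1", "new"],
]
added, removed, changed = diff_by_key(old_rows, new_rows, key_cols=(0, 1), skip_new_cols={2})
print(added, removed, changed)
```

Rows with duplicate keys would silently collapse into one entry in this sketch, so it only makes sense when the chosen key columns really are unique.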
