symbol column dropped from rowData of SegerstolpePancreasData (devel) #45

Open
lgeistlinger opened this issue Mar 7, 2024 · 5 comments

Comments

@lgeistlinger

lgeistlinger commented Mar 7, 2024

In Bioc release:

> library(scRNAseq)
> sce.seger <- SegerstolpePancreasData()
> rowData(sce.seger)
DataFrame with 26179 rows and 2 columns
                symbol                 refseq
           <character>            <character>
SGIP1            SGIP1              NM_032291
AZIN2            AZIN2 NM_052998+NM_001293562
CLIC4            CLIC4              NM_013943
AGBL4            AGBL4              NM_032785
NECAP2          NECAP2 NM_001145277+NM_0011..
...                ...                    ...
KIR2DL4        KIR2DL4 NM_001080772+NM_0022..
KIR2DS3        KIR2DS3              NM_012313
KIR2DS2        KIR2DS2 NM_001291696+NM_0123..
BIVM-ERCC5  BIVM-ERCC5           NM_001204425
eGFP              eGFP                   eGFP

In Bioc devel:

> library(scRNAseq)
> sce.seger <- SegerstolpePancreasData()
> rowData(sce.seger)
DataFrame with 26179 rows and 1 column
                           refseq
                      <character>
SGIP1                   NM_032291
AZIN2      NM_052998+NM_001293562
CLIC4                   NM_013943
AGBL4                   NM_032785
NECAP2     NM_001145277+NM_0011..
...                           ...
KIR2DL4    NM_001080772+NM_0022..
KIR2DS3                 NM_012313
KIR2DS2    NM_001291696+NM_0123..
BIVM-ERCC5           NM_001204425
eGFP                         eGFP

I think this breaks OSCA.advanced and OSCA.workflows in devel. @PeteHaitch @alanocallaghan

@LTLA
Owner

LTLA commented Mar 7, 2024

Hm. I think I must have deemed the row names to be redundant with the symbol column and removed the latter to reduce the file size. To avoid breaking stuff, I can dynamically add it back in for the SegerstolpePancreasData function; however, fetchDataset() will still return the sans-symbol version, so people loading the dataset directly from the files (i.e., not through the per-dataset getters) will get a slightly different version of the dataset.
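The following is a minimal sketch of the "dynamically add it back" idea. It is not the scRNAseq implementation. `restore_symbol_column()` is a hypothetical helper, and the example uses a plain `data.frame` as a stand-in for `rowData()` so that it runs without Bioconductor. It assumes the row names hold the gene symbols, as in the devel output above.

```r
# Hypothetical helper, not the scRNAseq implementation: if a rowData-like
# table lacks a 'symbol' column, rebuild it from the row names.
restore_symbol_column <- function(row_data) {
  if (!"symbol" %in% colnames(row_data)) {
    row_data$symbol <- rownames(row_data)
    # Match the release column order: 'symbol' first, then everything else.
    row_data <- row_data[, c("symbol", setdiff(colnames(row_data), "symbol")), drop = FALSE]
  }
  row_data
}

# Toy stand-in for the devel rowData (refseq column only).
rd_devel <- data.frame(
  refseq = c("NM_032291", "NM_013943"),
  row.names = c("SGIP1", "CLIC4")
)
restore_symbol_column(rd_devel)
```

With a SingleCellExperiment, the same helper could be applied as `rowData(sce.seger) <- restore_symbol_column(rowData(sce.seger))`. Because the helper leaves tables that already have `symbol` unchanged, it is safe to call on both the release and devel versions.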

FYI, fetchDataset() is going to be the way forward, as it (i) avoids the need for contributors to write a getter function and (ii) eliminates dataset-specific logic that can't easily be replicated in other frameworks like Python or JS.

Is Segerstolpe the only one? FWIW, you can pass legacy=TRUE to the getter and it'll pull from ExperimentHub for now.

@lgeistlinger
Author

If that's the way forward, we can also adapt the corresponding parts of the OSCA book to look up the symbols from the row names. I can't tell yet whether this also affects other datasets. The breakage comes from looking up the symbol column for ID mapping, so passing the row names instead should fix it.
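The following is a minimal sketch of making the ID-mapping step work with both the release and devel versions. `find_genes()` is a hypothetical helper, not OSCA or scRNAseq code. It uses the `symbol` column when present and falls back to the row names otherwise. The toy tables below reuse a few rows from the outputs above, with plain `data.frame`s standing in for `rowData()`.

```r
# Hypothetical helper: return the row index of each query gene, preferring the
# 'symbol' column and falling back to the row names when it has been dropped.
find_genes <- function(row_data, genes) {
  symbols <- if ("symbol" %in% colnames(row_data)) row_data$symbol else rownames(row_data)
  # match() gives NA for genes that are absent from the dataset.
  setNames(match(genes, symbols), genes)
}

# Toy stand-ins for the release (symbol + refseq) and devel (refseq only) rowData.
rd_release <- data.frame(
  symbol = c("SGIP1", "AZIN2", "eGFP"),
  refseq = c("NM_032291", "NM_052998+NM_001293562", "eGFP"),
  row.names = c("SGIP1", "AZIN2", "eGFP")
)
rd_devel <- rd_release[, "refseq", drop = FALSE]

find_genes(rd_release, c("AZIN2", "INS"))
find_genes(rd_devel, c("AZIN2", "INS"))
```

In OSCA-style code, this amounts to passing `rownames(sce.seger)` wherever `rowData(sce.seger)$symbol` was used.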

@LTLA
Owner

LTLA commented Mar 7, 2024

Added back the symbol column in 2.19.4. This only applies to SegerstolpePancreasData, so fetchDataset() will still be missing symbol.

@alanocallaghan
Contributor

alanocallaghan commented Mar 7, 2024

Yeah, it seems sensible to just use the row names for OSCA purposes moving forward.

@alanocallaghan
Contributor

Think this is resolved now?


3 participants