Discrete values after using the CLR normalization pt.pp.clr #144

Open
LucHendriks opened this issue Jun 27, 2024 · 0 comments

Description
Question about the output of the CLR normalization for protein data. When we apply the CLR transformation to our protein counts with muon.prot.pp.clr(), the plotted values look odd: at the low end of the range they fall into discrete bands instead of forming a continuous distribution. The screenshots below show the raw data and the data normalized with muon's CLR function.

Screenshot from 2024-06-27 15-40-41
Screenshot from 2024-06-27 15-40-28

To Reproduce
The analysis was run on a random subsample of 100,000 observations because the original dataset is large. The data is a 10X CITE-seq dataset with 137 proteins.

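import numpy as np
from muon import prot as pt

# `mdata` is assumed to be the already loaded MuData object with a 'prot' modality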

# Check the total number of observations
n_obs = mdata['prot'].n_obs

# Determine the size of the subsample
subsample_size = 100000 

# Randomly select the observations
np.random.seed(123) 
sample_indices = np.random.choice(n_obs, subsample_size, replace=False)

# Create the subsample
subsample = mdata['prot'][sample_indices, :].copy()

normalized_counts = pt.pp.clr(subsample, inplace=False)
subsample.layers['clr_dev'] = normalized_counts.X

Expected behaviour
After a log transformation we would expect continuous values, not the discrete bands seen here in the lower range. Could this be caused by zero counts not being handled correctly?
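
As a point of comparison for that expectation, the short sketch below shows how a plain `np.log1p` spaces out integer counts. It uses only NumPy and a made-up range of counts, and it makes no claim about what happens in the reported dataset; it only illustrates that a log transformation of integers still gives one output value per integer, with wide gaps between the smallest counts and very narrow gaps between large ones.

```python
import numpy as np

# Every integer count from 0 to 1001 (illustrative range, not the real data)
counts = np.arange(0, 1002)
logged = np.log1p(counts)

# Distance between neighbouring integer counts after the transformation
gaps = np.diff(logged)

print("gap between counts 0 and 1:      ", gaps[0])
print("gap between counts 9 and 10:     ", gaps[9])
print("gap between counts 1000 and 1001:", gaps[1000])
```

If the bands in the plot line up with raw counts of 0, 1, 2, ... for a given protein, that would point to this spacing effect rather than to a problem with zeros.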

System

  • OS: Ubuntu 18.04.6 LTS
  • Python version: 3.8.12
  • Versions of libraries involved: muon 0.1.3

Additional context

muon/muon/_prot/preproc.py

Lines 201 to 240 in 94917d2

def clr(adata: AnnData, inplace: bool = True, axis: int = 0) -> Union[None, AnnData]:
    """
    Apply the centered log ratio (CLR) transformation
    to normalize counts in adata.X.
    Args:
        data: AnnData object with protein expression counts.
        inplace: Whether to update adata.X inplace.
        axis: Axis across which CLR is performed.
    """
    if axis not in [0, 1]:
        raise ValueError("Invalid value for `axis` provided. Admissible options are `0` and `1`.")
    if not inplace:
        adata = adata.copy()
    if issparse(adata.X) and axis == 0 and not isinstance(adata.X, csc_matrix):
        warn("adata.X is sparse but not in CSC format. Converting to CSC.")
        x = csc_matrix(adata.X)
    elif issparse(adata.X) and axis == 1 and not isinstance(adata.X, csr_matrix):
        warn("adata.X is sparse but not in CSR format. Converting to CSR.")
        x = csr_matrix(adata.X)
    else:
        x = adata.X
    if issparse(x):
        x.data /= np.repeat(
            np.exp(np.log1p(x).sum(axis=axis).A / x.shape[axis]), x.getnnz(axis=axis)
        )
        np.log1p(x.data, out=x.data)
    else:
        np.log1p(
            x / np.exp(np.log1p(x).sum(axis=axis, keepdims=True) / x.shape[axis]),
            out=x,
        )
    adata.X = x
    return None if inplace else adata
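
To make the dense branch of the function above easier to reason about, here is a minimal NumPy-only sketch of the same formula applied to toy data. The helper name `clr_dense` and the Poisson-distributed counts are hypothetical and not part of muon; the sketch only mirrors the expression `log1p(x / exp(mean(log1p(x))))` from the quoted source and is not a diagnosis of the reported dataset.

```python
import numpy as np


def clr_dense(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Mirror of the dense-branch formula quoted above (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # exp(mean(log1p(x))) along `axis`, i.e. the geometric mean of (1 + x)
    scale = np.exp(np.log1p(x).sum(axis=axis, keepdims=True) / x.shape[axis])
    return np.log1p(x / scale)


rng = np.random.default_rng(0)
# Toy data (assumption): 1000 cells x 3 proteins with low, medium and high abundance
counts = rng.poisson(lam=[0.5, 5.0, 500.0], size=(1000, 3))
clr_values = clr_dense(counts, axis=0)

# With axis=0 every value in a column is divided by the same scale, so each
# distinct integer count maps to exactly one CLR value.
for j in range(counts.shape[1]):
    n_raw = np.unique(counts[:, j]).size
    n_clr = np.unique(clr_values[:, j]).size
    print(f"protein {j}: {n_raw} distinct raw counts -> {n_clr} distinct CLR values")

# Zero counts stay exactly zero, because log1p(0 / scale) == 0
print("zeros map to:", np.unique(clr_values[counts == 0]))
```

Under these assumptions, a low-abundance protein has only a handful of distinct CLR values, which would appear as bands in a scatter or density plot, while a high-abundance protein has many closely spaced values that look continuous.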
