recombat fails when C is only numerical #3

Open
dmalzl opened this issue Feb 22, 2023 · 2 comments

dmalzl commented Feb 22, 2023

Hi,

I am trying to use reComBat to correct for batch effects in my dataset. I set X and C accordingly, keeping all wanted variation in X and all unwanted variation in C. However, my unwanted variation consists of just two numerical covariates. Running reComBat then gives the following error:

[reComBat] 2023-02-22 12:19:17,361 Fit the linear model.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-a4ea77a41deb> in <module>
     68         unwanted_variation_design.loc[:, column] = values
     69 
---> 70     X = model.fit_transform(
     71         norm_counts,
     72         batches,

~/.conda/envs/hgpython/lib/python3.9/site-packages/reComBat/reComBat.py in fit_transform(self, data, batches, X, C)
    242         '''
    243 
--> 244         self.fit(data,batches,X=X if X is not None else None, C=C if C is not None else None)
    245         return self.transform(data,batches,X=X if X is not None else None, C=C if C is not None else None)
    246 

~/.conda/envs/hgpython/lib/python3.9/site-packages/reComBat/reComBat.py in fit(self, data, batches, X, C)
    143 
    144         logging.info("Fit the linear model.")
--> 145         Z = self.fit_model_(data,batches_one_hot,X=X if X is not None else None,C=C if C is not None else None)
    146 
    147         if self.optimize_params:

~/.conda/envs/hgpython/lib/python3.9/site-packages/reComBat/reComBat.py in fit_model_(self, data, batches_one_hot, X, C)
    269             C_categorical = C.loc[:,[c for c in C.columns if '_numerical' not in c]]
    270             C_numerical = C.loc[:,[c for c in C.columns if '_numerical' in c]].values
--> 271             C_categorical_one_hot = pd.get_dummies(C_categorical.astype(str),drop_first=True).values
    272             C_covariates = np.hstack([C_categorical_one_hot,C_numerical])
    273             C_covariates_dim = C_covariates.shape[1]

~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/core/reshape/encoding.py in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first, dtype)
    200             )
    201             with_dummies.append(dummy)
--> 202         result = concat(with_dummies, axis=1)
    203     else:
    204         result = _get_dummies_1d(

~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332 
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    366     1   3   4
    367     """
--> 368     op = _Concatenator(
    369         objs,
    370         axis=axis,

~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    423 
    424         if len(objs) == 0:
--> 425             raise ValueError("No objects to concatenate")
    426 
    427         if keys is None:

ValueError: No objects to concatenate

As the stack trace shows, the culprit is that reComBat assumes C contains both categorical and numerical covariates and does not check whether only one kind is present. Since the frame of categorical covariates is empty, the call to pd.get_dummies fails.

Is this the expected behaviour? Does C need to contain categorical covariates?

If yes, this should be documented. If not, please add a conditional to handle this case.

Thanks in advance,
Daniel
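
For reference, the conditional asked for above could look roughly like the sketch below. This is not reComBat's actual code: `build_covariate_matrix` is a hypothetical helper that mirrors the lines shown in the traceback (the `_numerical` column-name convention and `pd.get_dummies(..., drop_first=True)`) and only calls `pd.get_dummies` when C really has categorical columns. The column names in the usage lines are made up for illustration.

```python
import numpy as np
import pandas as pd


def build_covariate_matrix(C: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper (not part of reComBat): encode C while
    tolerating a missing categorical or a missing numerical part."""
    # Same split as in the traceback: columns are numerical if their
    # name contains '_numerical', categorical otherwise.
    categorical_cols = [c for c in C.columns if "_numerical" not in c]
    numerical_cols = [c for c in C.columns if "_numerical" in c]

    blocks = []
    if categorical_cols:
        # Only one-hot encode when there is something to encode.
        one_hot = pd.get_dummies(C[categorical_cols].astype(str), drop_first=True)
        blocks.append(one_hot.to_numpy(dtype=float))
    if numerical_cols:
        blocks.append(C[numerical_cols].to_numpy(dtype=float))

    if not blocks:
        # No covariates at all: return an empty (n_samples, 0) matrix.
        return np.empty((len(C), 0))
    return np.hstack(blocks)


# Numerical-only C, as in the report above (illustrative column names).
C_numerical_only = pd.DataFrame(
    {"age_numerical": [31.0, 45.0, 52.0], "rin_numerical": [7.1, 8.3, 6.9]}
)
covariates = build_covariate_matrix(C_numerical_only)
print(covariates.shape)
```

Collecting the encoded blocks in a list and stacking them at the end keeps the numerical-only, categorical-only, and mixed cases on the same code path.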

sreichl commented May 26, 2024

Hi @dmalzl,
the workaround I am experimenting with for now is to also pass the variable I use as the batch as a categorical unwanted covariate in C, which circumvents the error. Do you think this is valid, i.e., do you see a problem with that?
Thanks & Cheers, Stephan

sreichl commented Aug 22, 2024

Update: this workaround is not recommended, as it changes the results and might even introduce a batch effect. Instead, correct for the numerical confounder in the downstream analyses, e.g., within the DEA model.
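
To illustrate what correcting for a numerical confounder inside the downstream model can mean, here is a minimal sketch on simulated data. It is an assumption-laden toy, not a real DEA tool: one gene, an ordinary least-squares fit via `np.linalg.lstsq`, and made-up effect sizes. The confounder simply enters the design matrix as an extra column next to the condition of interest.

```python
import numpy as np

# Simulated data for a single gene (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_samples = 40
condition = np.repeat([0.0, 1.0], n_samples // 2)   # variable of interest
confounder = rng.normal(size=n_samples)             # numerical unwanted covariate
expression = (
    1.0 + 2.0 * condition + 0.5 * confounder
    + rng.normal(scale=0.1, size=n_samples)
)

# Design matrix: intercept, condition, and the numerical confounder.
design = np.column_stack([np.ones(n_samples), condition, confounder])

# Ordinary least squares; coef[1] is the condition effect adjusted
# for the confounder, coef[2] is the confounder's own slope.
coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
print(coef)
```

In a real analysis the same idea would be expressed through the design formula of whichever DEA package is in use, rather than a hand-rolled fit.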
