I am trying to use reComBat to correct for batch effects in my dataset. I set X and C accordingly, keeping all the wanted variation in X and all the unwanted variation in C. However, my unwanted variation consists of just two numerical covariates. Running reComBat then gives the following error:
[reComBat] 2023-02-22 12:19:17,361 Fit the linear model.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-a4ea77a41deb> in <module>
68 unwanted_variation_design.loc[:, column] = values
69
---> 70 X = model.fit_transform(
71 norm_counts,
72 batches,
~/.conda/envs/hgpython/lib/python3.9/site-packages/reComBat/reComBat.py in fit_transform(self, data, batches, X, C)
242 '''
243
--> 244 self.fit(data,batches,X=X if X is not None else None, C=C if C is not None else None)
245 return self.transform(data,batches,X=X if X is not None else None, C=C if C is not None else None)
246
~/.conda/envs/hgpython/lib/python3.9/site-packages/reComBat/reComBat.py in fit(self, data, batches, X, C)
143
144 logging.info("Fit the linear model.")
--> 145 Z = self.fit_model_(data,batches_one_hot,X=X if X is not None else None,C=C if C is not None else None)
146
147 if self.optimize_params:
~/.conda/envs/hgpython/lib/python3.9/site-packages/reComBat/reComBat.py in fit_model_(self, data, batches_one_hot, X, C)
269 C_categorical = C.loc[:,[c for c in C.columns if '_numerical' not in c]]
270 C_numerical = C.loc[:,[c for c in C.columns if '_numerical' in c]].values
--> 271 C_categorical_one_hot = pd.get_dummies(C_categorical.astype(str),drop_first=True).values
272 C_covariates = np.hstack([C_categorical_one_hot,C_numerical])
273 C_covariates_dim = C_covariates.shape[1]
~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/core/reshape/encoding.py in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first, dtype)
200 )
201 with_dummies.append(dummy)
--> 202 result = concat(with_dummies, axis=1)
203 else:
204 result = _get_dummies_1d(
~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
329 stacklevel=find_stack_level(),
330 )
--> 331 return func(*args, **kwargs)
332
333 # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no
~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
366 1 3 4
367 """
--> 368 op = _Concatenator(
369 objs,
370 axis=axis,
~/.conda/envs/hgpython/lib/python3.9/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
423
424 if len(objs) == 0:
--> 425 raise ValueError("No objects to concatenate")
426
427 if keys is None:
ValueError: No objects to concatenate
As the stack trace shows, the culprit is that reComBat assumes C contains both categorical and numerical covariates and does not check whether only one kind is present. Since the categorical covariates frame is empty, the call to pd.get_dummies fails.
Is this the expected behaviour? Do you need to have categorical covariates in C?
If yes, this should be documented. If no, please introduce a conditional to catch it.
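A conditional of this kind could look roughly like the sketch below. It is not reComBat's actual code: `build_covariate_matrix` is a hypothetical standalone helper, and the column names in the usage example are made up. It only mirrors the convention visible in the traceback, where columns whose name contains `_numerical` are treated as numerical and all others are one-hot encoded, and it skips `pd.get_dummies` when there are no categorical columns.

```python
import numpy as np
import pandas as pd


def build_covariate_matrix(C: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: build the covariate block from C without
    assuming that both categorical and numerical columns are present."""
    # Same naming convention as in the traceback: "_numerical" marks numerical columns.
    categorical_cols = [c for c in C.columns if "_numerical" not in c]
    numerical_cols = [c for c in C.columns if "_numerical" in c]

    blocks = []
    if categorical_cols:
        # Only one-hot encode when there is something to encode.
        one_hot = pd.get_dummies(C[categorical_cols].astype(str), drop_first=True)
        blocks.append(one_hot.to_numpy(dtype=float))
    if numerical_cols:
        blocks.append(C[numerical_cols].to_numpy(dtype=float))

    if not blocks:
        # No covariates at all: return an (n_samples, 0) matrix.
        return np.empty((len(C), 0))
    return np.hstack(blocks)


# Usage with only numerical covariates (illustrative column names).
C_only_numerical = pd.DataFrame(
    {"age_numerical": [31.0, 45.0, 52.0], "rin_numerical": [7.1, 8.3, 6.9]}
)
covariates = build_covariate_matrix(C_only_numerical)
print(covariates.shape)
```

Collecting the blocks in a list and stacking them at the end keeps the three cases (only categorical, only numerical, both) on the same code path.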
Thanks in advance,
Daniel
Hi @dmalzl,
The workaround I am experimenting with for now is to also pass the variable I use as the batch as a categorical unwanted variable in C, which circumvents the error. Do you think this is valid, i.e. do you see a problem with that?
Thanks & Cheers, Stephan
Update: this is not recommended, as it changes the result and might even introduce a batch effect. It is better to correct for the numerical confounder in the downstream analyses, e.g. within the DEA model.