Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_correlated_variable_rows in chapter 14.3 has dead code #10

Open
krassowski opened this issue Jul 14, 2022 · 1 comment
Open

get_correlated_variable_rows in chapter 14.3 has dead code #10

krassowski opened this issue Jul 14, 2022 · 1 comment

Comments

@krassowski
Copy link

In https://jokergoo.github.io/ComplexHeatmap-reference/book/more-examples.html#visualize-cell-heterogeneity-from-single-cell-rnaseq in the function get_correlated_variable_genes:

get_correlated_variable_genes = function(mat, n = nrow(mat), cor_cutoff = 0, n_cutoff = 0) {
    ind = order(apply(mat, 1, function(x) {
            q = quantile(x, c(0.1, 0.9))
            x = x[x < q[1] & x > q[2]]
            var(x)/mean(x)
        }), decreasing = TRUE)[1:n]
    mat2 = mat[ind, , drop = FALSE]
    dt = cor(t(mat2), method = "spearman")
    diag(dt) = 0
    dt[abs(dt) < cor_cutoff] = 0
    dt[dt < 0] = -1
    dt[dt > 0] = 1

    i = colSums(abs(dt)) > n_cutoff

    mat3 = mat2[i, ,drop = FALSE]
    return(mat3)
}

The fragment that filters by quantile does not seem to do anything. q[1] is the lower quantile, q[2] is the upper quantile of x, thus x < q[1] & x > q[2]] is always FALSE. This means that x gets length of 0, and the entire expression:

apply(mat, 1, function(x) {
  q = quantile(x, c(0.1, 0.9))
  x = x[x < q[1] & x > q[2]]
  var(x)/mean(x)
})

returns a named vector with "NA" only. In consequence ind is equivalent to 1:n.

What was meant probably was x = x[x > q[1] & x < q[2]], but if the example works without this step, it could just be removed.

If we correct the direction to align with the likely intent:

Before (note: 721 genes) After correcting directions (note: 693 genes)
image image

After removing the filtering code:

get_correlated_variable_genes = function(mat, n = nrow(mat), cor_cutoff = 0, n_cutoff = 0) {
    dt = cor(t(mat), method = "spearman")
    diag(dt) = 0
    dt[abs(dt) < cor_cutoff] = 0
    dt[dt < 0] = -1
    dt[dt > 0] = 1

    i = colSums(abs(dt)) > n_cutoff

    mat3 = mat[i, ,drop = FALSE]
    return(mat3)
}

image

@jokergoo
Copy link
Owner

jokergoo commented Jul 26, 2022

You are right. That was a mistake. It is interesting that order() on a vector of all NA values still returns a meaningful ordering.

> order(c(NA, NA, NA))
[1] 1 2 3

I will adjust the code.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants