Add samples to a distance matrix #45

wasade · 2017-09-20T06:07:55Z

Take as input two BIOM tables (the background and the samples to add) along with the distance matrix of the background. Represent the background distance matrix in condensed form (better yet, load it from a condensed form representation on disk). The new distances are computed as partial values from stripes, which is discussed below. Computed distances are added as follows:

a new sample, foo, is to be added
distances between foo and all background samples are computed in background distance matrix order
the foo sample id is pushed onto the 0th position of the sample id array of the condensed form matrix
foo sample distance values are pushed into the front of the condensed form vector of values
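
The steps above could be sketched roughly like this. It is a minimal illustration that assumes the condensed form is held as a plain list of ids plus a flat list of values; `prepend_sample` is a hypothetical helper name, not something that exists in the codebase.

```python
def prepend_sample(ids, condensed, new_id, new_dists):
    """Return (ids, condensed) with new_id inserted as the 0th sample.

    new_dists must hold the distance from new_id to every existing sample,
    in the same order as ids (i.e. background distance matrix order).
    """
    n = len(ids)
    if len(condensed) != n * (n - 1) // 2:
        raise ValueError("condensed vector length does not match the ids")
    if len(new_dists) != n:
        raise ValueError("need exactly one distance per background sample")

    # The new sample becomes row 0 of the matrix, so its n distances are
    # the first n entries of the condensed vector.
    return [new_id] + list(ids), list(new_dists) + list(condensed)


ids, cf = prepend_sample(["a", "b", "c"], [0.1, 0.2, 0.3],
                         "foo", [0.7, 0.8, 0.9])
```

Adding several samples would then amount to repeating the push, with each later sample also needing its distances to the samples pushed before it.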

This works because:

# the distance matrix
# 0 A A A
# A 0 B B
# A B 0 C
# A B C 0

# is, in condensed form (CF)
# A A A B B C

A new sample can be expressed as a new row (and matching column) in the distance matrix. If that new row is the first row of the distance matrix, then we are in effect pushing its values onto the front of the CF representation:

# 0 x x x x
# x 0 A A A
# x A 0 B B
# x A B 0 C
# x A B C 0

# is

# x x x x A A A B B C
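
A quick way to convince yourself of this is to build both matrices and compare their condensed forms. The sketch below uses made-up numeric values in place of A, B, C and x, and `to_condensed` is a hypothetical helper that flattens the strict upper triangle row by row.

```python
def to_condensed(square):
    """Flatten the strict upper triangle of a square matrix, row by row."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


A, B, C = 1.0, 2.0, 3.0               # stand-ins for the letters above
new_dists = [0.6, 0.7, 0.8, 0.9]      # the "x" values, in background order

background = [
    [0.0, A,   A,   A],
    [A,   0.0, B,   B],
    [A,   B,   0.0, C],
    [A,   B,   C,   0.0],
]

# Add the new sample as row 0 and column 0 of the full matrix.
extended = [[0.0] + new_dists] + [
    [d] + row for d, row in zip(new_dists, background)
]

cf_background = to_condensed(background)
cf_extended = to_condensed(extended)

# Pushing the new distances onto the front of the background CF gives
# exactly the CF of the extended matrix.
same = cf_extended == new_dists + cf_background
```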

Computing the distances for a set of new samples corresponds to a partial stripe computation. The indexing gets janky, but half of it is easy. The first half of the x distances above are the 0th value of each stripe. The remaining distances are in effect a negative diagonal along the stripes: starting at the rightmost position of the 0th stripe, then 2nd from the right in the next stripe, etc. That is annoying to determine but feasible. The hard part is computing only those values, doing so efficiently, and doing it reasonably within the present framework. Still thinking about that.
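
To make the indexing concrete, here is a small sketch of the lookup only. It assumes a stripe layout where position `i` of stripe `k` holds the distance between sample `i` and sample `(i + k + 1) % n`, with `n // 2` stripes in total; that layout and both function names are assumptions for illustration. It reads from fully computed stripes, so it does not address the hard part of computing only the needed values.

```python
def build_stripes(square):
    """Assumed layout: stripes[k][i] == d(i, (i + k + 1) % n)."""
    n = len(square)
    return [[square[i][(i + k + 1) % n] for i in range(n)]
            for k in range(n // 2)]


def row0_from_stripes(stripes):
    """Distances between sample 0 and samples 1..n-1, read from stripes."""
    n = len(stripes[0])
    n_stripes = len(stripes)
    dists = []
    for j in range(1, n):
        if j - 1 < n_stripes:
            # Easy half: d(0, j) is the 0th value of stripe j - 1.
            dists.append(stripes[j - 1][0])
        else:
            # Other half: d(j, 0) sits on a negative diagonal, at
            # position j of stripe n - j - 1 (rightmost position of
            # stripe 0, 2nd from the right of stripe 1, and so on).
            dists.append(stripes[n - j - 1][j])
    return dists


# Symmetric toy matrix with distinct off-diagonal values.
n = 5
square = [[0 if i == j else 10 * min(i, j) + max(i, j) for j in range(n)]
          for i in range(n)]

row0 = row0_from_stripes(build_stripes(square))
```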
