Take as input two BIOM tables (the background and the samples to add) along with the distance matrix of the background. Represent the background distance matrix in condensed form (better yet, load it from a condensed form representation on disk). Distances are computed as partial values from stripes, as discussed below. Computed distances are added as follows:
a new sample, foo, is to be added
distances between foo and all background samples are computed in background distance matrix order
the foo sample id is pushed on to the 0th position of the sample id array of the condensed form matrix
foo sample distance values are pushed into the front of the condensed form vector of values
This works because:
# the distance matrix
# 0 A A A
# A 0 B B
# A B 0 C
# A B C 0
# is in CF
# A A A B B C
A new sample can be expressed as a new row in the distance matrix. If that new row corresponds to the first row in the distance matrix then we are in effect pushing into the cf representation:
# 0 x x x x
# x 0 A A A
# x A 0 B B
# x A B 0 C
# x A B C 0
# is
# x x x x A A A B B C
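A minimal sketch of the prepend step, assuming the condensed form is the row-major upper triangle (the layout shown above). The names `prepend_sample` and `to_condensed` are hypothetical and exist only for illustration; they are not part of any existing API.

```python
def to_condensed(square):
    """Flatten the upper triangle of a square matrix, row by row."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


def prepend_sample(ids, condensed, new_id, new_distances):
    """Add a sample as row 0 of a condensed-form distance matrix.

    new_distances must be ordered like ids (background matrix order).
    """
    if len(new_distances) != len(ids):
        raise ValueError("need one distance per background sample")
    # Row 0 of the upper triangle comes first in condensed form, so the
    # new distances go in front of the existing values untouched.
    return [new_id] + list(ids), list(new_distances) + list(condensed)


# Background from the example above, with A=1, B=2, C=3 and x=9.
ids, cf = prepend_sample(["a", "b", "c", "d"], [1, 1, 1, 2, 2, 3],
                         "foo", [9, 9, 9, 9])
```

The existing values are never reindexed, which is what would make the update cheap if the condensed form is loaded from disk.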
Computing the distances for a set of new samples corresponds to a partial stripe computation. The indexing gets awkward, though half of it is easy. The first half of the x distances above are the 0th value of each stripe. The remaining distances in effect form a negative diagonal along the stripes, starting at the rightmost position of the 0th stripe, then the 2nd from the right in the next stripe, and so on. That is annoying to determine but feasible. The hard part is computing only those values, and doing so efficiently and reasonably within the present framework. Still thinking about that.
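The indexing can be sketched as follows. This assumes a stripe layout in which stripe `s`, position `k` holds the distance between sample `k` and sample `(k + s + 1) % n`, with `n // 2` stripes in total; that layout is an assumption made for illustration and should be checked against the actual implementation. The function name `stripe_coords_for_first_sample` is hypothetical.

```python
def stripe_coords_for_first_sample(n):
    """Map each partner j of sample 0 to a (stripe, position) pair.

    Assumes stripe s, position k holds d(k, (k + s + 1) % n), where n is
    the total number of samples including the new one at index 0.
    """
    n_stripes = n // 2
    coords = {}
    for j in range(1, n):
        s = j - 1
        if s < n_stripes:
            # Easy half: d(0, j) is the 0th value of stripe j - 1.
            coords[j] = (s, 0)
        else:
            # Other half: d(j, 0) sits where k + s + 1 wraps to 0,
            # i.e. position j of stripe n - 1 - j (the negative diagonal).
            coords[j] = (n - 1 - j, j)
    return coords


coords = stripe_coords_for_first_sample(5)
```

For `n = 5` this yields position 0 of stripes 0 and 1, then position 3 of stripe 1 and position 4 of stripe 0, matching the "rightmost, then 2nd from right" pattern described above. It only locates the values; it does not address computing just those entries efficiently.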