Fixed a problem with huge integers in cell ids
VPetukhov committed Jul 20, 2023
1 parent 2d91b9d commit 197bca0
Showing 2 changed files with 3 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -135,7 +135,7 @@ In some cases, you may want to use another segmentation as a prior for Baysor. T
baysor run [ARGS] MOLECULES_CSV [PRIOR_SEGMENTATION]
```

- Here, `PRIOR_SEGMENTATION` can be a path to a binary image with a segmentation mask, an image with integer cell segmentation labels or a column name in the `MOLECULES_CSV` with integer cell assignment per molecule. In the latter case, the column name must have `:` prefix, e.g. for column `cell` you should use `baysor run [ARGS] molecules.csv :cell`. In case the image is too big to be stored in the tiff format, Baysor supports MATLAB '.mat' format: it should contain a single field with an integer matrix for either a binary mask or segmentation labels. When loading the segmentation, Baysor filters segments that have less than `min-molecules-per-segment` molecules. It can be set in the toml config, and the default value is `min-molecules-per-segment = min-molecules-per-cell / 4`. **Note:** only CSV column prior is currently supported for 3D segmentation.
+ Here, `PRIOR_SEGMENTATION` can be a path to a binary image with a segmentation mask, an image with integer cell segmentation labels or a column name in the `MOLECULES_CSV` with integer cell assignment per molecule (`0` value means no assignment). In the latter case, the column name must have `:` prefix, e.g. for column `cell` you should use `baysor run [ARGS] molecules.csv :cell`. In case the image is too big to be stored in the tiff format, Baysor supports MATLAB '.mat' format: it should contain a single field with an integer matrix for either a binary mask or segmentation labels. When loading the segmentation, Baysor filters segments that have less than `min-molecules-per-segment` molecules. It can be set in the toml config, and the default value is `min-molecules-per-segment = min-molecules-per-cell / 4`. **Note:** only CSV column prior is currently supported for 3D segmentation.

To specify the expected quality of the prior segmentation you may use `prior-segmentation-confidence` parameter. The value `0.0` makes the algorithm ignore the prior, while the value `1.0` restricts the algorithm from contradicting the prior. Prior segmentation is mainly needed for the cases where gene expression signal is not enough, e.g. with very sparse protocols (such as ISS or DARTFISH). Another potential use case is high-quality data with a visible sub-cellular structure. In these situations, setting `prior-segmentation-confidence > 0.7` is recommended. Otherwise, the default value `0.2` should work well.
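
The segment filtering described in the README text above can be illustrated with a small sketch. This is not Baysor's `filter_segmentation_labels!`; the helper name `drop_small_segments` and its signature are hypothetical. The sketch assumes only what the README states: label `0` means "no assignment", and segments with fewer than `min-molecules-per-segment` molecules are dropped.

```julia
# Hypothetical illustration, not Baysor's implementation.
# Segments with fewer than `min_molecules_per_segment` molecules are reset to 0,
# which the README describes as "no assignment".
function drop_small_segments(labels::AbstractVector{<:Integer}, min_molecules_per_segment::Integer)
    # Count molecules per non-zero segment label.
    counts = Dict{eltype(labels), Int}()
    for l in labels
        l == 0 && continue
        counts[l] = get(counts, l, 0) + 1
    end
    # Keep a label only if its segment is large enough; otherwise unassign the molecule.
    return [(l != 0 && counts[l] >= min_molecules_per_segment) ? l : zero(l) for l in labels]
end

labels = [1, 1, 1, 2, 0, 3, 3]
filtered = drop_small_segments(labels, 2)
```

Here segment `2` has a single molecule, so with a threshold of 2 its molecule becomes unassigned while segments `1` and `3` are kept.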

2 changes: 2 additions & 0 deletions src/data_loading/cli_wrappers.jl
@@ -2,6 +2,7 @@ import HDF5
import JSON
import LinearAlgebra: Adjoint
using ProgressMeter
+ using StatsBase: denserank

Polygons = Union{Dict{Int, Matrix{T}}, Dict{String, Dict{Int, Matrix{T}}}} where T <: Real

@@ -15,6 +16,7 @@ function parse_prior_assignment(pos_data::Matrix{Float64}, prior_segmentation::V
error("The prior segmentation column '$col_name' must not contain negative numbers")
end

+ prior_segmentation = denserank(prior_segmentation) .- 1
filter_segmentation_labels!(prior_segmentation, min_molecules_per_segment=min_molecules_per_segment)
scale, scale_std = estimate_scale_from_assignment(pos_data, prior_segmentation; min_mols_per_cell=min_mols_per_cell)
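
To see what the added `denserank(prior_segmentation) .- 1` line does to huge cell ids, here is a minimal sketch using only Base Julia. The helper `dense_relabel` is hypothetical and written for illustration; it is intended to give the same result as `denserank(x) .- 1` from StatsBase for integer labels, namely consecutive labels starting at `0` in sorted order of the original ids.

```julia
# Hypothetical Base-only equivalent of `denserank(labels) .- 1` for integer labels.
# Each distinct label is mapped to its 0-based position among the sorted unique labels.
function dense_relabel(labels::AbstractVector{<:Integer})
    sorted_unique = sort(unique(labels))
    rank_of = Dict(v => i for (i, v) in enumerate(sorted_unique))
    return [rank_of[l] - 1 for l in labels]
end

# A column with a very large cell id and 0 for unassigned molecules.
ids = [0, 9_000_000_000_000_000_123, 42, 0, 42]
compact = dense_relabel(ids)

# A column without any 0: the smallest id is still mapped to 0.
no_zero = dense_relabel([5, 7, 5])
```

With this relabeling the largest label equals the number of distinct ids minus one, regardless of how large the original ids were. One property worth noting: when the column contains no `0`, the smallest id is the one mapped to `0`, which the README describes as "no assignment".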
