Updated the docs
Viktor Petukhov committed Sep 24, 2024
1 parent 0fbad4b commit 9e163c4
Showing 5 changed files with 32 additions and 15 deletions.
8 changes: 4 additions & 4 deletions docs/Manifest.toml
@@ -1,6 +1,6 @@
# This file is machine-generated - editing it directly is not advised

-julia_version = "1.10.4"
+julia_version = "1.10.5"
manifest_format = "2.0"
project_hash = "cfe419f0a5bd446f85d5c51ed1644192d6c33e05"

@@ -106,9 +106,9 @@ uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[deps.Baysor]]
deps = ["Base64", "CSV", "CairoMakie", "CategoricalArrays", "ColorSchemes", "Colors", "Comonicon", "Compat", "Configurations", "DataFrames", "Dates", "Deneb", "Distances", "Distributions", "FastRandPCA", "FileIO", "FiniteDiff", "GeometricalPredicates", "Graphs", "HDF5", "ImageCore", "ImageIO", "ImageMagick", "ImageMorphology", "JSON", "LazyModules", "LibGit2", "LinearAlgebra", "Logging", "MAT", "Makie", "MultivariateStats", "NearestNeighbors", "OrderedCollections", "Parquet", "Pipe", "Pkg", "ProgressMeter", "Random", "SparseArrays", "StaticArrays", "Statistics", "StatsBase", "Test", "UMAP", "UUIDs", "VoronoiDelaunay"]
-path = "/home/vpetukhov/.julia/dev/Baysor"
+path = "/Users/vpetukhov/.julia/dev/Baysor"
uuid = "cc9f9468-1fbe-11e9-0acf-e9460511877c"
-version = "0.6.2"
+version = "0.7.0"

[[deps.BufferedStreams]]
git-tree-sha1 = "6863c5b7fc997eadcabdbaf6c5f201dc30032643"
@@ -1869,7 +1869,7 @@ version = "0.15.2+0"
[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
-version = "5.8.0+1"
+version = "5.11.0+0"

[[deps.libfdk_aac_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
6 changes: 6 additions & 0 deletions docs/src/preview.md
@@ -2,6 +2,12 @@

As a full run takes some time, it can be useful to run a quick preview to get meaning from the data and to get some guesses about the parameters of the full run. The only output of this step is `preview.html`, which visualizes the dataset and provides some diagnostics. Be careful, as this can get too large to be rendered for large datasets, so it's better to run on a subset of the data.

+```bash
+baysor preview <args> [options]
+```
+
+CLI parameters:
+
```@docs
Baysor.CLI.preview
```
4 changes: 2 additions & 2 deletions docs/src/run.md
@@ -5,7 +5,7 @@ Baysor can be used in several ways:
A minimal command for cell segmentation:

```bash
-baysor run [-s SCALE -x X_COL -y Y_COL -z Z_COL --gene GENE_COL] -c config.toml MOLECULES_FILE [PRIOR_SEGMENTATION]
+baysor run [-s SCALE -x X_COL -y Y_COL -z Z_COL --gene GENE_COL -c config.toml -o OUTPUT_PATH] MOLECULES_FILE [PRIOR_SEGMENTATION]
```

## Dataset preview
@@ -22,6 +22,6 @@ baysor preview [-x X_COL -y Y_COL --gene GENE_COL -c config.toml -o OUTPUT_PATH]
Many analyses don't require segmentation, and can be run on local neighborhoods instead. In the paper, we call them Neighborhood Composition Vectors (NCVs). To obtain them from Baysor, you may run `baysor segfree`. For more information, see `baysor segfree --help`. Minimal command:

```bash
-baysor segfree [-k K_NEIGHBORS -n NCVS_TO_SAVE -x X_COL -y Y_COL --gene GENE_COL -c config.toml -o OUTPUT_PATH] MOLECULES_FILE
+baysor segfree [-k K_NEIGHBORS -x X_COL -y Y_COL --gene GENE_COL -c config.toml -o OUTPUT_PATH] MOLECULES_FILE
```

6 changes: 6 additions & 0 deletions docs/src/segfree.md
@@ -2,6 +2,12 @@

Many analyses don't require segmentation, and can be run on local neighborhoods instead. In the paper, we call them Neighborhood Composition Vectors (NCVs). To obtain them from Baysor, you may run `baysor segfree`. This would output a [loom file](https://linnarssonlab.org/loompy/format/index.html) with NCVs of size `k` in (`/matrix`), storing `ncv_color` and `confidence` as column attributes (`/col_attrs/`).

+```bash
+baysor segfree <args> [options]
+```
+
+CLI parameters:
+
```@docs
Baysor.CLI.segfree
```
23 changes: 14 additions & 9 deletions docs/src/segmentation.md
@@ -4,6 +4,12 @@

To run the algorithm on your data, use the following command:

+```bash
+baysor run <args> [options] [flags]
+```
+
+CLI parameters:
+
```@docs
Baysor.CLI.run
```
@@ -49,7 +55,8 @@ Please, notice that it's highly recommended to set `--n-clusters=1`, so molecule

## Outputs

-Segmentation results:
+### Segmentation results
+
- **segmentation\_counts.loom** or **segmentation\_counts.tsv** (depends on `--count-matrix-format`): count matrix with segmented stats. In the case of loom format, column attributes also contain the same info as **segmentation\_cell\_stats.csv**.
- **segmentation.csv**: segmentation info per molecule:
- `confidence`: probability of a molecule to be real (i.e. not noise)
@@ -68,15 +75,17 @@ Segmentation results:
- `max_cluster_frac` *(only if `n-clusters > 1`)*: fraction of the molecules coming from the most popular cluster. Cells with low `max_cluster_frac` are often doublets.
- `lifespan`: number of iterations the given component exists. The maximal `lifespan` is clipped proportionally to the total number of iterations. Components with a short lifespan likely correspond to noise.
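
The `max_cluster_frac` and `lifespan` columns described above lend themselves to a simple post-hoc filter. The sketch below uses only Python's standard library to flag cells that fall under a cutoff on either column. The cutoff values, the `cell` id column name, the function name `flag_suspect_cells`, and the inline sample table are all hypothetical placeholders for illustration, not values or names confirmed by Baysor.

```python
import csv
import io


def flag_suspect_cells(rows, min_cluster_frac=0.6, min_lifespan=50):
    """Return ids of cells that look like doublets or noise.

    `rows` is an iterable of dicts (e.g. from csv.DictReader). The
    `max_cluster_frac` column is treated as optional because it is only
    written when `n-clusters > 1`. Both cutoffs are arbitrary examples.
    """
    flagged = []
    for row in rows:
        frac = row.get("max_cluster_frac")
        low_frac = frac not in (None, "") and float(frac) < min_cluster_frac
        short_lived = float(row["lifespan"]) < min_lifespan
        if low_frac or short_lived:
            flagged.append(row["cell"])  # `cell` column name is an assumption
    return flagged


# Hypothetical sample; a real run would open the per-cell stats CSV instead.
sample = io.StringIO(
    "cell,max_cluster_frac,lifespan\n"
    "c1,0.95,400\n"
    "c2,0.40,400\n"
    "c3,0.90,10\n"
)
suspects = flag_suspect_cells(csv.DictReader(sample))
```

Sensible cutoffs depend on the protocol and on the number of iterations, so they are best chosen by inspecting the distributions first.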

-Visualization:
+### Visualization
+
- **segmentation\_polygons\_2d/3d.json**: polygons used for visualization in GeoJSON format. In the case of 3D segmentation, `2d.json` file contains polygons for all molecules pulled across the z-stack. And 3D shows polygons per z-slice. In case of continuous z, it's binned into 20 uniform bins. Depending on the format, it can contain `GeometryCollection` or `FeatureCollection`:
- For 3D, the file contains an array of dictionaries (one per z-slice), each of which representing a `Collection`. For 2D data it's just a single dictionary with a `Collection`.
- Each `GeometryCollection` has a field `geometries`, which is an array of polygons with `cell` field set to cell ids and `coordinates` set to its coordinates.
- `FeatureCollection` is the format, compatible with [10x SpaceRanger](https://www.10xgenomics.com/support/software/xenium-ranger/1.7/analysis/inputs/XR-input-overview#compat-files). It contains a list of `Feature`s with cell ids saved in the `id` field and coordinates in `geometry/coordinates`.
- **segmentation\_diagnostics.html**: visualization of the algorithm QC. *Shown only when `-p` is set.*
- **segmentation\_borders.html**: visualization of cell borders for the dataset colored by local gene expression composition (first part) and molecule clusters (second part). *Shown only when `-p` is set.*
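
As a rough illustration of the two polygon layouts described above, the following Python sketch collects polygon coordinates per cell id from either format. The field names `geometries`, `cell`, `coordinates`, `id`, and `geometry` come from the description above, and `features` is the standard GeoJSON key for the items of a `FeatureCollection`. The function names and the inline sample are hypothetical, and real output files may differ in detail.

```python
import json


def polygons_by_cell(collection):
    """Map cell id -> polygon coordinates for one collection dict.

    The layout is detected by key presence, so no assumption is made
    about other fields of the collection.
    """
    if "geometries" in collection:
        # GeometryCollection layout: cell id stored in the `cell` field.
        return {g["cell"]: g["coordinates"] for g in collection["geometries"]}
    if "features" in collection:
        # FeatureCollection layout: cell id in `id`, coordinates under `geometry`.
        return {f["id"]: f["geometry"]["coordinates"] for f in collection["features"]}
    raise ValueError("neither 'geometries' nor 'features' found")


def load_polygons(text):
    """Parse a polygons JSON string into a list of per-slice dicts.

    3D files hold an array of collections (one per z-slice); 2D files
    hold a single collection, returned here as a one-element list.
    """
    data = json.loads(text)
    slices = data if isinstance(data, list) else [data]
    return [polygons_by_cell(c) for c in slices]


# Hypothetical 2D sample in the GeometryCollection layout.
sample_2d = json.dumps({
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Polygon", "cell": 1,
         "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    ],
})
result = load_polygons(sample_2d)
```

Returning a list in both the 2D and 3D cases lets downstream code loop over slices without branching on dimensionality.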

-Other:
+### Other
+
- **segmentation\_params.dump.toml**: aggregated parameters from the config and CLI

## Choice of parameters
@@ -86,11 +95,7 @@ Most important parameters:
- `scale` is the most sensitive parameter, which specifies the expected radius of a cell. It doesn't have to be precise, but the wrong setup can lead to over- or under-segmentation. This parameter is inferred automatically if cell centers are provided.
- `min-molecules-per-cell` is the number of molecules, required for a cell to be considered as real. It really depends on the protocol. For instance, for ISS it's fine to set it to 3, while for MERFISH it can require hundreds of molecules.

-Some other sensitive parameters (normally, shouldn't be changed):
-
-- `new-component-weight` is proportional to the probability of generating a new cell for a molecule, instead of assigning it to one of the existing cells. More precisely, the probability to assign a molecule to a particular cell linearly depends on the number of molecules, already assigned to this cell. And this parameter is used as the number of molecules for a cell, which is just generated for this new molecule. The algorithm is robust to small changes in this parameter. And normally values in the range of 0.1-0.9 should work fine. Smaller values would lead to slower convergence of the algorithm, while larger values force the emergence of a large number of small cells on each iteration, which can produce noise in the result. In general, the default value should work well.
-
Run parameters:

-- `--config.segmentation.n-cells-init` expected number of cells in data. This parameter influence only the convergence speed of the algorithm. It's better to set larger values than smaller ones.
-- `--config.segmentation.iters` number of iterations for the algorithm. **At the moment, no convergence criteria are implemented, so it will work exactly `iters` iterations**. Thus, too small values would lead to non-convergence of the algorithm, while larger ones would just increase working time. Optimal values can be estimated by the convergence plots, produced among the results.
+- `--config.segmentation.n-cells-init` expected number of cells in data. This parameter influences the convergence speed of the algorithm, as well as peak memory usage. Setting this value too small would lead to under-segmentation.
+- `--config.segmentation.iters` number of iterations for the algorithm. **At the moment, no convergence criteria are implemented, so it will work exactly `iters` iterations**. Thus, too small values would lead to non-convergence of the algorithm, while larger ones would just increase working time. Optimal values can be estimated by the convergence plots in the diagnostics report.
