Bulk2Space requires five formatted input files:
- Bulk-seq Normalized Data: a `.csv` file with genes as rows and one sample as the column

  |       | Sample |
  | ----- | ------ |
  | Gene1 | 5.22   |
  | Gene2 | 3.67   |
  | ...   | ...    |
  | GeneN | 15.76  |

- Single Cell RNA-seq Normalized Data: a `.csv` file with genes as rows and cells as columns

  |       | Cell1 | Cell2 | Cell3 | ... | CellN |
  | ----- | ----- | ----- | ----- | --- | ----- |
  | Gene1 | 1.05  | 2.31  | 1.72  | ... | 0     |
  | Gene2 | 4.71  | 1.07  | 0     | ... | 4.22  |
  | ...   | ...   | ...   | ...   | ... | ...   |
  | GeneN | 0.55  | 0     | 1.48  | ... | 0     |

- Single Cell RNA-seq Annotation Data: a `.csv` file with cell ID and cell type annotation columns.
  - The column containing the cell IDs should be named `Cell`
  - The column containing the cell type labels should be named `Cell_type`

  |       | Cell  | Cell_type |
  | ----- | ----- | --------- |
  | Cell1 | Cell1 | T cell    |
  | Cell2 | Cell2 | B cell    |
  | ...   | ...   | ...       |
  | CellN | CellN | Monocyte  |

- Spatial Transcriptomics Normalized Data: a `.csv` file with genes as rows and cells (or spots) as columns

  |       | Cell1 / Spot1 | Cell2 / Spot2 | ... | CellN / SpotN |
  | ----- | ------------- | ------------- | --- | ------------- |
  | Gene1 | 3.22          | 4.71          | ... | 1.01          |
  | Gene2 | 0             | 2.17          | ... | 2.20          |
  | ...   | ...           | ...           | ... | ...           |
  | GeneN | 0             | 0.11          | ... | 1.61          |

- Spatial Transcriptomics Coordinates Data: a `.csv` file with cell/spot ID and coordinate columns.
  - The columns containing the coordinates should be named `xcoord` and `ycoord`
  - For spot-based data, the column containing the spot IDs should be named `Spot`
  - For image-based data, the column containing the cell IDs should be named `Cell`

  |                 | Spot (or Cell)  | xcoord | ycoord |
  | --------------- | --------------- | ------ | ------ |
  | Cell_1 / Spot_1 | Cell_1 / Spot_1 | 1.2    | 5.2    |
  | Cell_2 / Spot_2 | Cell_2 / Spot_2 | 5.4    | 4.3    |
  | ...             | ...             | ...    | ...    |
  | Cell_n / Spot_n | Cell_n / Spot_n | 11.3   | 6.3    |

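
Before training, it can be worth checking that the five tables follow the layout described above. The following is a minimal sketch using pandas, not part of Bulk2Space: `check_inputs` is a hypothetical helper, and the rule that every column of an expression table must appear in the matching ID column is an assumption drawn from the example tables. It takes DataFrames, which could be loaded with `pd.read_csv(path, index_col=0)` if the first column of each file holds the row labels.

```python
import pandas as pd


def check_inputs(bulk, sc_data, sc_meta, st_data, st_meta):
    """Return a list of layout problems in the five input tables (empty if none)."""
    problems = []

    # Bulk-seq: genes as rows, exactly one sample column.
    if bulk.shape[1] != 1:
        problems.append(f"bulk data should have 1 sample column, found {bulk.shape[1]}")

    # scRNA-seq annotation: needs 'Cell' and 'Cell_type' columns.
    for col in ("Cell", "Cell_type"):
        if col not in sc_meta.columns:
            problems.append(f"scRNA-seq annotation is missing column '{col}'")

    # Assumption: every cell in the expression table has an annotation row.
    if "Cell" in sc_meta.columns:
        missing = set(sc_data.columns) - set(sc_meta["Cell"])
        if missing:
            problems.append(f"{len(missing)} scRNA-seq cells have no annotation")

    # ST coordinates: needs 'xcoord', 'ycoord' and a 'Spot' or 'Cell' ID column.
    for col in ("xcoord", "ycoord"):
        if col not in st_meta.columns:
            problems.append(f"ST coordinates are missing column '{col}'")
    id_col = next((c for c in ("Spot", "Cell") if c in st_meta.columns), None)
    if id_col is None:
        problems.append("ST coordinates need a 'Spot' or 'Cell' ID column")
    else:
        # Assumption: every cell/spot in the ST table has coordinates.
        missing = set(st_data.columns) - set(st_meta[id_col])
        if missing:
            problems.append(f"{len(missing)} ST cells/spots have no coordinates")

    return problems


# Toy tables that mirror the examples above (values are illustrative only).
genes = ["Gene1", "Gene2"]
bulk = pd.DataFrame({"Sample": [5.22, 3.67]}, index=genes)
sc_data = pd.DataFrame({"Cell1": [1.05, 4.71], "Cell2": [2.31, 1.07]}, index=genes)
sc_meta = pd.DataFrame(
    {"Cell": ["Cell1", "Cell2"], "Cell_type": ["T cell", "B cell"]},
    index=["Cell1", "Cell2"],
)
st_data = pd.DataFrame({"Spot_1": [3.22, 0.0], "Spot_2": [4.71, 2.17]}, index=genes)
st_meta = pd.DataFrame(
    {"Spot": ["Spot_1", "Spot_2"], "xcoord": [1.2, 5.4], "ycoord": [5.2, 4.3]},
    index=["Spot_1", "Spot_2"],
)

print(check_inputs(bulk, sc_data, sc_meta, st_data, st_meta))  # prints []
```

Returning a list of messages rather than raising on the first problem makes it easier to fix all formatting issues in one pass.
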
- Decompose bulk transcriptomics data into single-cell transcriptomics data:

      from bulk2space import Bulk2Space
      model = Bulk2Space()

      # Decompose bulk transcriptomics data into single-cell transcriptomics data.
      # The five input_*_path variables hold the paths to the .csv input files.
      generate_sc_meta, generate_sc_data = model.train_vae_and_generate(
          input_bulk_path,
          input_sc_data_path,
          input_sc_meta_path,
          input_st_data_path,
          input_st_meta_path,
          ratio_num=1,
          top_marker_num=500,
          gpu=0,
          batch_size=512,
          learning_rate=1e-4,
          hidden_size=256,
          epoch_num=5000,
          vae_save_dir='save_model',
          vae_save_name='vae',
          generate_save_dir='output',
          generate_save_name='output')

| Parameter | Description | Default Value |
| --------- | ----------- | ------------- |
| input_bulk_path | Path to bulk-seq data files (.csv) | None |
| input_sc_data_path | Path to scRNA-seq data files (.csv) | None |
| input_sc_meta_path | Path to scRNA-seq annotation files (.csv) | None |
| input_st_data_path | Path to ST data files (.csv) | None |
| input_st_meta_path | Path to ST metadata files (.csv) | None |
| ratio_num | Multiple applied to the number of cells in the generated scRNA-seq data | (int) 1 |
| top_marker_num | The number of marker genes used for each cell type | (int) 500 |
| gpu | The GPU ID. The CPU is used if `gpu` < 0 | (int) 0 |
| batch_size | The batch size for β-VAE model training | (int) 512 |
| learning_rate | The learning rate for β-VAE model training | (float) 0.0001 |
| hidden_size | The hidden size of the β-VAE model | (int) 256 |
| epoch_num | The number of epochs for β-VAE model training | (int) 5000 |
| vae_save_dir | Path to save the trained β-VAE model | (str) save_model |
| vae_save_name | File name of the trained β-VAE model | (str) vae |
| generate_save_dir | Path to save the generated scRNA-seq data | (str) output |
| generate_save_name | File name of the generated scRNA-seq data | (str) output |

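
The defaults listed above describe a full training run. For a quick smoke test, or on a machine without a GPU, it can be convenient to collect the keyword arguments in a dictionary and override only a few of them. The sketch below assumes nothing beyond the parameter names and defaults listed above; `vae_kwargs` is a hypothetical helper, not part of Bulk2Space, and the reduced `epoch_num` of 10 is an arbitrary value that is only meant to confirm the pipeline runs, not to produce usable results.

```python
def vae_kwargs(use_gpu=True, quick=False):
    """Build keyword arguments for train_vae_and_generate (hypothetical helper)."""
    kwargs = {
        "ratio_num": 1,
        "top_marker_num": 500,
        "gpu": 0 if use_gpu else -1,  # a negative GPU ID selects the CPU
        "batch_size": 512,
        "learning_rate": 1e-4,
        "hidden_size": 256,
        "epoch_num": 5000,
        "vae_save_dir": "save_model",
        "vae_save_name": "vae",
        "generate_save_dir": "output",
        "generate_save_name": "output",
    }
    if quick:
        # Illustrative value only: far too few epochs for a real analysis.
        kwargs["epoch_num"] = 10
    return kwargs


# Usage, assuming `model` and the five input paths are defined as in the call above:
# generate_sc_meta, generate_sc_data = model.train_vae_and_generate(
#     input_bulk_path, input_sc_data_path, input_sc_meta_path,
#     input_st_data_path, input_st_meta_path,
#     **vae_kwargs(use_gpu=False, quick=True))
```
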
- Decompose spatial barcoding-based spatial transcriptomics data (10x Genomics, ST, Slide-seq, etc.) into spatially resolved single-cell transcriptomics data:

      from bulk2space import Bulk2Space
      model = Bulk2Space()

      # Decompose spatial barcoding-based spatial transcriptomics data
      # (10x Genomics, ST, Slide-seq, etc.) into spatially resolved
      # single-cell transcriptomics data.
      # generate_sc_meta and generate_sc_data are the outputs of
      # train_vae_and_generate.
      df_meta, df_data = model.train_df_and_spatial_deconvolution(
          generate_sc_meta,
          generate_sc_data,
          input_st_data_path,
          input_st_meta_path,
          spot_num=500,
          cell_num=10,
          df_save_dir='save_model',
          df_save_name='df',
          map_save_dir='output',
          map_save_name='deconvolution',
          top_marker_num=500,
          marker_used=True,
          k=10)

| Parameter | Description | Default Value |
| --------- | ----------- | ------------- |
| generate_sc_meta | Generated scRNA-seq metadata | None |
| generate_sc_data | Generated scRNA-seq data | None |
| input_st_data_path | Path to ST data files (.csv) | None |
| input_st_meta_path | Path to ST metadata files (.csv) | None |
| spot_num | The number of pseudo-spots in the data used to train the deep forest model | (int) 500 |
| cell_num | The number of cells per pseudo-spot in the data used to train the deep forest model | (int) 10 |
| df_save_dir | Path to save the trained deep forest model | (str) save_model |
| df_save_name | File name of the trained deep forest model | (str) df |
| map_save_dir | Path to save the deconvoluted ST data | (str) output |
| map_save_name | File name of the deconvoluted ST data | (str) deconvolution |
| top_marker_num | The number of marker genes used for each cell type | (int) 500 |
| marker_used | Whether to use only the marker genes of each cell type | (bool) True |
| k | The number of cells set per spot | (int) 10 |

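
To make `spot_num` and `cell_num` concrete, the sketch below builds pseudo-spot training data from a single-cell table: each pseudo-spot sums the expression of `cell_num` randomly drawn cells, and its label is the resulting cell type proportions. This is an illustration of the general idea only, not the Bulk2Space implementation; `make_pseudo_spots` is a hypothetical function, and drawing cells uniformly with replacement is an assumption.

```python
import numpy as np
import pandas as pd


def make_pseudo_spots(sc_data, sc_meta, spot_num=500, cell_num=10, seed=0):
    """Simulate pseudo-spots by summing the expression of randomly drawn cells.

    sc_data: genes x cells expression table.
    sc_meta: table with unique 'Cell' IDs and a 'Cell_type' column.
    Returns (genes x spots expression, spots x cell-type proportions).
    """
    rng = np.random.default_rng(seed)
    cell_types = sc_meta.set_index("Cell")["Cell_type"]
    type_names = sorted(cell_types.unique())
    all_cells = sc_data.columns.to_numpy()

    expr, props = {}, {}
    for i in range(spot_num):
        # Assumption: cells are drawn uniformly, with replacement.
        cells = rng.choice(all_cells, size=cell_num, replace=True)
        name = f"pseudo_{i}"
        expr[name] = sc_data[cells].sum(axis=1)
        counts = cell_types.loc[cells].value_counts()
        props[name] = counts.reindex(type_names, fill_value=0) / cell_num

    return pd.DataFrame(expr), pd.DataFrame(props).T


# Toy example: 2 genes, 3 cells, 2 cell types (values are illustrative only).
toy_data = pd.DataFrame(
    {"Cell1": [1.0, 0.0], "Cell2": [0.0, 2.0], "Cell3": [3.0, 1.0]},
    index=["Gene1", "Gene2"],
)
toy_meta = pd.DataFrame(
    {"Cell": ["Cell1", "Cell2", "Cell3"], "Cell_type": ["T cell", "B cell", "T cell"]}
)
spot_expr, spot_props = make_pseudo_spots(toy_data, toy_meta, spot_num=5, cell_num=4)
print(spot_expr.shape, spot_props.shape)
```

A model trained on such pairs learns to predict cell type proportions from spot-level expression, which is why more pseudo-spots (`spot_num`) give more training examples and `cell_num` should roughly match the number of cells expected per real spot.
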
- Map image-based spatial transcriptomics data (MERFISH, SeqFISH, STARmap, etc.) into spatially resolved single-cell transcriptomics data:

      from bulk2space import Bulk2Space
      model = Bulk2Space()

      # Map image-based spatial transcriptomics data (MERFISH, SeqFISH, STARmap, etc.)
      # into spatially resolved single-cell transcriptomics data.
      # generate_sc_meta and generate_sc_data are the outputs of
      # train_vae_and_generate.
      df_meta, df_data = model.spatial_mapping(
          generate_sc_meta,
          generate_sc_data,
          input_st_data_path,
          input_st_meta_path)

| Parameter | Description | Default Value |
| --------- | ----------- | ------------- |
| generate_sc_meta | Generated scRNA-seq metadata | None |
| generate_sc_data | Generated scRNA-seq data | None |
| input_st_data_path | Path to ST data files (.csv) | None |
| input_st_meta_path | Path to ST metadata files (.csv) | None |
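
Image-based platforms usually profile a targeted gene panel, so the ST table may share only a subset of genes with the single-cell data. Whether `spatial_mapping` handles this internally is not stated here, so a quick overlap check beforehand can be informative. The sketch below is a hypothetical pre-check, not part of Bulk2Space; it assumes both tables are pandas DataFrames with genes as row labels, as in the input format described earlier.

```python
import pandas as pd


def shared_genes(sc_data, st_data):
    """Return the genes present in both tables, in the order they appear in sc_data."""
    st_genes = set(st_data.index)
    return [gene for gene in sc_data.index if gene in st_genes]


# Toy tables (values are illustrative only).
sc_toy = pd.DataFrame(
    {"Cell1": [1.05, 4.71, 0.55], "Cell2": [2.31, 1.07, 0.0]},
    index=["Gene1", "Gene2", "Gene3"],
)
st_toy = pd.DataFrame(
    {"Cell_1": [0.11, 3.22], "Cell_2": [1.61, 4.71]},
    index=["Gene3", "Gene1"],
)

genes = shared_genes(sc_toy, st_toy)
print(f"{len(genes)} of {sc_toy.shape[0]} single-cell genes are also in the ST data")

# Restrict both tables to the shared genes, in the same row order.
sc_shared = sc_toy.loc[genes]
st_shared = st_toy.loc[genes]
```

A very small overlap is a sign that gene identifiers differ between the two files (for example, symbols versus Ensembl IDs) and is worth resolving before mapping.
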