This repository contains a structured pipeline for analyzing bone data through multiple stages: data import, visualization, alignment, kernel density estimate (KDE) calculation, and clustering analysis. The pipeline takes CSV and IMS image files as input and processes the spatial data for KDE and hierarchical clustering visualization.
This pipeline relies on several Python packages, which can be installed automatically via the provided Conda environment file.
- Ensure you have Conda installed on your system.
- Clone the repository and navigate to its directory.
- Create the Conda environment using the `environment.yml` file:

```
conda env create -f environment.yml
```

- Activate the environment:

```
conda activate BoneSPDM
```
These commands will set up the necessary environment with all required dependencies.
To ensure proper functionality, organize your data in the following folder structure within the project's root directory:
- Data Folders (without inhibitor): Organize the data files by time point in subdirectories named as follows:
  - `d0/`: day 0 data
  - `d5/`: day 5 data
  - `d10/`: day 10 data
  - `d15/`: day 15 data
  - `d30/`: day 30 data
  - `reference_bone/`: place the reference bone image in this folder
- Data Folders (with inhibitor): If inhibitor data is available, organize it in a separate directory from the data without inhibitors, subdivided by time point in the same way:
  - `d15/`: day 15 inhibitor data (for example, `inhibitor/d15/` would store day 15 inhibitor data)
Each folder should contain the relevant data files needed for analysis at that time point. This structured arrangement ensures smooth data processing within the pipeline.
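Before running the pipeline, it can help to verify this layout programmatically. Below is a minimal sketch; `check_data_layout` is a hypothetical helper, not part of the pipeline itself.

```python
from pathlib import Path

# Folder names follow the structure described above.
EXPECTED_DIRS = ["d0", "d5", "d10", "d15", "d30", "reference_bone"]

def check_data_layout(root="."):
    """Report any expected time-point folders missing from the project root."""
    missing = [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected data folders are present.")

check_data_layout()
```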
The first step is to import the necessary data files, which include:
- CSV Files: Contain quantitative position data associated with the bone samples.
- IMS Image Files: Contain the images of the bone samples.
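As a loading sketch (the file names below are hypothetical; substitute your own paths): CSV files can be read with pandas, and IMS files are HDF5 containers that can be opened with h5py. The dataset path shown follows the common Imaris layout, but verify it against your own files.

```python
import pandas as pd
import h5py

# Hypothetical file names; substitute your own paths.
positions = pd.read_csv("d0/sample1_positions.csv")  # quantitative position data

# Imaris .ims files are HDF5 containers; the dataset path below follows the
# common Imaris layout, but verify it against your own files.
with h5py.File("d0/sample1.ims", "r") as f:
    image = f["DataSet/ResolutionLevel 0/TimePoint 0/Channel 0/Data"][...]

print(positions.head())
print("Image shape:", image.shape)
```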
After loading the data, this step visually inspects the images and applies adjustments, such as flipping an image along the x or y axis, to ensure proper orientation.
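A minimal sketch of this inspection step using numpy and matplotlib; the placeholder array stands in for an image loaded in the previous step.

```python
import numpy as np
import matplotlib.pyplot as plt

image_2d = np.random.rand(512, 512)  # placeholder; use a 2D view of your image

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
views = [(image_2d, "original"),
         (np.fliplr(image_2d), "flipped on x"),
         (np.flipud(image_2d), "flipped on y")]
for ax, (img, title) in zip(axes, views):
    ax.imshow(img)
    ax.set_title(title)
plt.show()
```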
In this step, each bone sample is aligned to a reference bone. All samples are transformed into the reference space, ensuring consistency across the dataset.
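The repository's exact registration method is not described here, so the sketch below illustrates one common approach: a least-squares affine transform fitted to corresponding landmarks. All landmark coordinates are made up for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Fit a least-squares affine transform mapping src landmarks onto dst.

    src, dst: (N, 2) arrays of corresponding points, N >= 3.
    Returns a function mapping (M, 2) points into the reference space.
    """
    src_h = np.column_stack([src, np.ones(len(src))])  # homogeneous coordinates
    A, *_ = np.linalg.lstsq(src_h, dst, rcond=None)    # solve src_h @ A = dst
    return lambda pts: np.column_stack([pts, np.ones(len(pts))]) @ A

# Hypothetical landmarks picked on a sample bone and on the reference bone.
sample_landmarks = np.array([[10.0, 12.0], [200.0, 30.0], [120.0, 240.0]])
reference_landmarks = np.array([[8.0, 15.0], [205.0, 28.0], [118.0, 250.0]])

to_reference = fit_affine(sample_landmarks, reference_landmarks)
aligned_points = to_reference(np.array([[50.0, 60.0], [75.0, 90.0]]))
print(aligned_points)
```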
The pipeline calculates the kernel density estimate (KDE) of the transformed data to determine its spatial probability density.
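A minimal sketch using `scipy.stats.gaussian_kde`; the points array is a placeholder for the transformed positions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
points = rng.random((500, 2)) * 100  # placeholder for aligned positions

kde = gaussian_kde(points.T)  # gaussian_kde expects shape (n_dims, n_points)

# Evaluate the density on a grid covering the reference space.
xs, ys = np.mgrid[0:100:200j, 0:100:200j]
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)
print(density.shape)  # (200, 200)
```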
The final step clusters the transformed data points (e.g., HSCs and random dots) using consensus clustering and selects the optimal number of clusters based on the Silhouette Score. A random forest is then applied to predict cluster labels for the inhibitor data. Heatmaps of the clustered data are generated to visualize spatial relationships; scatter plots and stacked bar plots are also available. Finally, the Mann-Whitney U test is performed to compare cluster compositions between groups.
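The sketch below illustrates the pieces of this step with scikit-learn and scipy. It substitutes plain agglomerative clustering for the pipeline's consensus clustering to keep the example short, and all data arrays are placeholders.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
X = rng.random((300, 2)) * 100  # placeholder for transformed points

# Pick the cluster count with the best Silhouette Score.
scores = {k: silhouette_score(X, AgglomerativeClustering(n_clusters=k).fit_predict(X))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)
labels = AgglomerativeClustering(n_clusters=best_k).fit_predict(X)

# Train a random forest on the clustered points, then predict labels
# for the inhibitor data (placeholder array here).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
X_inhibitor = rng.random((100, 2)) * 100
inhibitor_labels = rf.predict(X_inhibitor)

# Compare cluster compositions between groups
# (hypothetical per-sample cluster fractions).
group_a = rng.random(8)
group_b = rng.random(8)
stat, p = mannwhitneyu(group_a, group_b)
print(f"best k = {best_k}, Mann-Whitney U p-value = {p:.3f}")
```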
- Run each section of the pipeline sequentially, ensuring all paths and parameters are correctly set for your specific dataset.
- Inspect intermediate visualizations to verify data quality and make adjustments as needed.
- Save output figures and clustering data as needed for further analysis or reporting.