diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..ed9a7ea
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,65 @@
+# BEDboss
+bedboss is a command-line pipeline that standardizes and calculates statistics for genomic interval data, and enters the results into a BEDbase database.
+It has 3 components:
+
+1) bedmaker (`bedboss make`);
+2) bedqc (`bedboss qc`);
+3) bedstat (`bedboss stat`).
+
+You may run all 3 pipelines together, or separately.
+
+The pipelines are mainly intended to be run from the command line, but they are also available as Python functions, so that users can integrate them into their own code.
+
+----
+## BEDboss consists of 3 main pipelines:
+
+### bedmaker
+bedmaker is a pipeline that converts supported file types into BED and bigBed formats. Currently supported formats:
+
+- bedGraph
+- bigBed
+- bigWig
+- wig
+
+### bedqc
+bedqc flags BED files for further evaluation to determine whether they should be included in downstream analysis.
+Currently, it flags BED files that are larger than 2 GB, have over 5 million regions, and/or have a mean region width of less than 10 bp.
+These thresholds can be changed through the bedqc function arguments.
+
+### bedstat
+
+bedstat is a pipeline for obtaining statistics about BED files.
+
+It produces the following BED file statistics:
+
+- **GC content**. The average GC content of the region set.
+- **Number of regions**. The total number of regions in the BED file.
+- **Median TSS distance**. The median absolute distance to the Transcription Start Sites (TSS).
+- **Mean region width**. The average region width of the region set.
+- **Exon percentage**. The percentage of the regions in the BED file that are annotated as exon.
+- **Intron percentage**. The percentage of the regions in the BED file that are annotated as intron.
+- **Promoter prox percentage**. The percentage of the regions in the BED file that are annotated as promoter-prox.
+- **Intergenic percentage**. The percentage of the regions in the BED file that are annotated as intergenic.
+- **Promoter core percentage**. The percentage of the regions in the BED file that are annotated as promoter-core.
+- **5' UTR percentage**. The percentage of the regions in the BED file that are annotated as 5'-UTR.
+- **3' UTR percentage**. The percentage of the regions in the BED file that are annotated as 3'-UTR.
+
+# Additional information
+
+## bedmaker
+
+### Additional dependencies
+
+- bedToBigBed: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
+- bigBedToBed: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed
+- bigWigToBedGraph: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph
+- wigToBigWig: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig
+
+## bedstat
+
+### Additional dependencies
+The regionstat.R script is used to calculate the BED file statistics, so the pipeline also depends on several R packages.
+
+All dependencies are listed in the R helper script, which you can use to install the required packages:
+
+- `Rscript scripts/installRdeps.R` (see [How to install R dependencies](./how_to_install_r_dep.md))
diff --git a/docs/changelog.md b/docs/changelog.md
new file mode 100644
index 0000000..5026ad7
--- /dev/null
+++ b/docs/changelog.md
@@ -0,0 +1,7 @@
+# Changelog
+
+This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and follows the [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
+
+## [0.1.0a1] - 2023-08-02
+### Added
+- Initial alpha release
diff --git a/docs/how_to_bedbase_config.md b/docs/how_to_bedbase_config.md
new file mode 100644
index 0000000..0c19ae0
--- /dev/null
+++ b/docs/how_to_bedbase_config.md
@@ -0,0 +1,45 @@
+# How to create a bedbase config file (for bedstat)
+
+### The bedbase config file is a YAML file with the following parts:
+- path to output files (`path`)
+- database credentials (`database`)
+- server and Qdrant connection information (`server`, `qdrant`)
+- remote info (`remotes`)
+
+### Example:
+```yaml
+path:
+  pipeline_output_path: $BEDBOSS_OUTPUT_PATH # do not change it
+  bedstat_dir: bedstat_output
+  remote_url_base: null
+  bedbuncher_dir: bedbucher_output
+  # region2vec: "add/path/here"
+  # vec2vec: "add/path/here"
+database:
+  host: $DB_HOST_URL
+  port: $POSTGRES_PORT
+  password: $POSTGRES_PASSWORD
+  user: $POSTGRES_USER
+  name: $POSTGRES_DB
+  dialect: postgresql
+  driver: psycopg2
+server:
+  host: 0.0.0.0
+  port: 8000
+qdrant:
+  host: localhost
+  port: 6333
+  api_key: None
+  collection: bedbase
+remotes:
+  http:
+    prefix: https://data.bedbase.org/
+    description: HTTP compatible path
+  s3:
+    prefix: s3://data.bedbase.org/
+    description: S3 compatible path
+```
+
+### Download the example bedbase configuration file here: Example bedbase configuration file
+
+.
\ No newline at end of file
diff --git a/docs/how_to_create_database.md b/docs/how_to_create_database.md
new file mode 100644
index 0000000..12d2679
--- /dev/null
+++ b/docs/how_to_create_database.md
@@ -0,0 +1,18 @@
+# How to create a bedbase database
+
+To run bedstat, bedbuncher and bedmbed, we need to create a PostgreSQL database.
+
+We will run the PostgreSQL database in Docker.
+If you don't have Docker installed, you can install it on Debian or Ubuntu with `sudo apt-get update && sudo apt-get install docker.io -y`.
+
+Now, create a persistent volume to house the PostgreSQL data, and then start the database container:
+
+```bash
+docker volume create postgres-data
+```
+
+```bash
+docker run -d --name bedbase-postgres -p 5432:5432 -e POSTGRES_PASSWORD=bedbasepassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -v postgres-data:/var/lib/postgresql/data postgres:13
+```
+
+Now the PostgreSQL container is running, and we can run the pipelines.
diff --git a/docs/how_to_install_r_dep.md b/docs/how_to_install_r_dep.md
new file mode 100644
index 0000000..2059795
--- /dev/null
+++ b/docs/how_to_install_r_dep.md
@@ -0,0 +1,7 @@
+# How to install R dependencies
+
+1. Install R: https://cran.r-project.org/bin/linux/ubuntu/fullREADME.html
+2. Download this script: Install R dependencies
+3. Install the dependencies by running this command in your terminal: `Rscript installRdeps.R`
+4. Run `bash_requirements_test.sh` to check that everything was installed correctly (located in the test folder:
+[Bash requirement tests](https://github.com/bedbase/bedboss/blob/68910f5142a95d92c27ef53eafb9c35599af2fbd/test/bash_requirements_test.sh))
diff --git a/docs/installRdeps.R b/docs/installRdeps.R
new file mode 100644
index 0000000..3cad82f
--- /dev/null
+++ b/docs/installRdeps.R
@@ -0,0 +1,25 @@
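+.install_pkg = function(p, bioc=FALSE) {
+  if(!require(package = p, character.only=TRUE)) {  # install only if the package cannot be loaded
+    if(bioc) {
+      BiocManager::install(pkgs = p)  # Bioconductor package
+    } else {
+      install.packages(pkgs = p, repos = "https://cloud.r-project.org")  # CRAN package; explicit mirror for non-interactive Rscript runs
+    }
+  }
+}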
+
+.install_pkg("R.utils")
+.install_pkg("BiocManager")
+.install_pkg("optparse")
+.install_pkg("devtools")
+.install_pkg("GenomicRanges", bioc=TRUE)
+.install_pkg("GenomicFeatures", bioc=TRUE)
+.install_pkg("ensembldb", bioc=TRUE)
+.install_pkg("LOLA", bioc=TRUE)
+.install_pkg("BSgenome", bioc=TRUE)
+if(!require(package = "GenomicDistributions", character.only=TRUE)) {
+ devtools::install_github("databio/GenomicDistributions")
+}
+if(!require(package = "GenomicDistributionsData", character.only=TRUE)) {
+ install.packages("http://big.databio.org/GenomicDistributionsData/GenomicDistributionsData_0.0.1.tar.gz", repos=NULL)
+}
diff --git a/docs/usage.md b/docs/usage.md
new file mode 100644
index 0000000..a457e59
--- /dev/null
+++ b/docs/usage.md
@@ -0,0 +1,160 @@
+# Usage reference
+
+BEDboss is a command-line tool that bundles 3 pipelines for genomic interval files.
+
+BEDboss includes bedmaker, bedqc, and bedstat. These pipelines can be run using the following positional arguments:
+
+- `bedboss all`: Runs all pipelines in order: bedmaker -> bedqc -> bedstat
+
+- `bedboss make`: Creates BED and bigBed files from other types of genomic interval files [bigwig|bedgraph|bed|bigbed|wig]
+
+- `bedboss qc`: Runs quality control on a BED file (works only with BED files)
+
+- `bedboss stat`: Runs statistics for BED and bigBed files.
+
+Here you can see the command-line usage instructions for the main bedboss command and for each subcommand:
+
+## `bedboss --help`
+```console
+version: 0.1.0
+usage: bedboss [-h] [--version] {all,make,qc,stat} ...
+
+Warehouse of pipelines for BED-like files: bedmaker, bedstat, and bedqc.
+
+positional arguments:
+ {all,make,qc,stat}
+ all Run all bedboss pipelines and insert data into bedbase
+ make A pipeline to convert bed, bigbed, bigwig or bedgraph
+ files into bed and bigbed formats
+ qc Run quality control on bed file (bedqc)
+ stat A pipeline to read a file in BED format and produce
+ metadata in JSON format.
+
+options:
+ -h, --help show this help message and exit
+ --version show program's version number and exit
+```
+
+## `bedboss all --help`
+```console
+usage: bedboss all [-h] -s SAMPLE_NAME -f INPUT_FILE -t INPUT_TYPE -o
+ OUTPUT_FOLDER -g GENOME [-r RFG_CONFIG]
+ [--chrom-sizes CHROM_SIZES] [-n NARROWPEAK]
+ [--standard-chrom] [--check-qc]
+ [--open-signal-matrix OPEN_SIGNAL_MATRIX] [--ensdb ENSDB]
+ --bedbase-config BEDBASE_CONFIG [-y SAMPLE_YAML]
+ [--no-db-commit] [--just-db-commit]
+
+options:
+ -h, --help show this help message and exit
+ -s SAMPLE_NAME, --sample-name SAMPLE_NAME
+ name of the sample used to systematically build the
+ output name
+ -f INPUT_FILE, --input-file INPUT_FILE
+ Input file
+ -t INPUT_TYPE, --input-type INPUT_TYPE
+ Input type [required] options:
+ (bigwig|bedgraph|bed|bigbed|wig)
+ -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
+ Output folder
+ -g GENOME, --genome GENOME
+ reference genome (assembly)
+ -r RFG_CONFIG, --rfg-config RFG_CONFIG
+ file path to the genome config file(refgenie)
+ --chrom-sizes CHROM_SIZES
+ a full path to the chrom.sizes required for the
+ bedtobigbed conversion
+ -n NARROWPEAK, --narrowpeak NARROWPEAK
+ whether the regions are narrow (transcription factor
+ implies narrow, histone mark implies broad peaks)
+ --standard-chrom Standardize chromosome names. Default: False
+ --check-qc Check quality control before processing data. Default:
+ True
+ --open-signal-matrix OPEN_SIGNAL_MATRIX
+ a full path to the openSignalMatrix required for the
+ tissue specificity plots
+ --ensdb ENSDB A full path to the ensdb gtf file required for genomes
+ not in GDdata
+ --bedbase-config BEDBASE_CONFIG
+ a path to the bedbase configuration file
+ -y SAMPLE_YAML, --sample-yaml SAMPLE_YAML
+ a yaml config file with sample attributes to pass on
+ more metadata into the database
+ --no-db-commit skip the JSON commit to the database
+ --just-db-commit just commit the JSON to the database
+```
+
+## `bedboss make --help`
+```console
+usage: bedboss make [-h] -f INPUT_FILE [-n NARROWPEAK] -t INPUT_TYPE -g GENOME
+ -r RFG_CONFIG -o OUTPUT_BED --output-bigbed OUTPUT_BIGBED
+ -s SAMPLE_NAME [--chrom-sizes CHROM_SIZES]
+ [--standard-chrom]
+
+options:
+ -h, --help show this help message and exit
+ -f INPUT_FILE, --input-file INPUT_FILE
+ path to the input file
+ -n NARROWPEAK, --narrowpeak NARROWPEAK
+ whether the regions are narrow (transcription factor
+ implies narrow, histone mark implies broad peaks)
+ -t INPUT_TYPE, --input-type INPUT_TYPE
+ a bigwig or a bedgraph file that will be converted
+ into BED format
+ -g GENOME, --genome GENOME
+ reference genome
+ -r RFG_CONFIG, --rfg-config RFG_CONFIG
+ file path to the genome config file
+ -o OUTPUT_BED, --output-bed OUTPUT_BED
+ path to the output BED files
+ --output-bigbed OUTPUT_BIGBED
+ path to the folder of output bigBed files
+ -s SAMPLE_NAME, --sample-name SAMPLE_NAME
+ name of the sample used to systematically build the
+ output name
+ --chrom-sizes CHROM_SIZES
+ a full path to the chrom.sizes required for the
+ bedtobigbed conversion
+ --standard-chrom Standardize chromosome names. Default: False
+```
+
+## `bedboss qc --help`
+```console
+usage: bedboss qc [-h] --bedfile BEDFILE --outfolder OUTFOLDER
+
+options:
+ -h, --help show this help message and exit
+ --bedfile BEDFILE a full path to bed file to process
+ --outfolder OUTFOLDER
+ a full path to output log folder.
+```
+
+## `bedboss stat --help`
+```console
+usage: bedboss stat [-h] --bedfile BEDFILE
+ [--open-signal-matrix OPEN_SIGNAL_MATRIX] [--ensdb ENSDB]
+ [--bigbed BIGBED] [--bedbase-config BEDBASE_CONFIG]
+ [-y SAMPLE_YAML] --genome GENOME_ASSEMBLY [--no-db-commit]
+ [--just-db-commit]
+
+options:
+ -h, --help show this help message and exit
+ --bedfile BEDFILE a full path to bed file to process
+ --open-signal-matrix OPEN_SIGNAL_MATRIX
+ a full path to the openSignalMatrix required for the
+ tissue specificity plots
+ --ensdb ENSDB a full path to the ensdb gtf file required for genomes
+ not in GDdata
+ --bigbed BIGBED a full path to the bigbed files
+ --bedbase-config BEDBASE_CONFIG
+ a path to the bedbase configuration file
+ -y SAMPLE_YAML, --sample-yaml SAMPLE_YAML
+ a yaml config file with sample attributes to pass on
+ more metadata into the database
+ --genome GENOME_ASSEMBLY
+ genome assembly of the sample
+ --no-db-commit whether the JSON commit to the database should be
+ skipped
+ --just-db-commit whether just to commit the JSON to the database
+```
+