Skip to content

Commit

Permalink
Merge pull request #11 from yfyang86/v0.0.2
Browse files Browse the repository at this point in the history
V0.0.2
  • Loading branch information
yfyang86 authored Dec 4, 2022
2 parents 4317153 + ef7fb1e commit 2b0bc6f
Show file tree
Hide file tree
Showing 7 changed files with 89 additions and 61 deletions.
97 changes: 62 additions & 35 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,31 @@
# PMAT
PMAT detects the differentially methylated regions (DMRs) for unordered pairs.

| Contents | Logs |
|:---------|:-----|
| Date | 2022-11-22 |
| Version | v-0.1 |
| Date | 2022-12-04 |
| Version | v-0.2 |

Pairwise methylation assocation test (PMAT) is a computational tool tailored for identifying DMRs between unordered pairs like twins. In this tool, absolute methylation difference between unorderd pairs was maximized in DMRs identification. A folded normal (FN) test based on the framework of likelihood ratio test was proposed recently and implemented to test the methylation difference between unordered pairs in each methylation region. In order to improve the approximation precision of a FN test when sample size is not large enough, we further established PMAT with Bartlett correction (PMAT-C). In PMAT-C, a folded normal test with Bartlett correction (FN-C) was implemented to test the methylation differences between unordered pairs.

In PMAT with FN test, the asymptotic distribution of a likelihood ratio statistic under the null hypothesis is a mixture of chi-squared distribution. That is

$$LRT \sim 0.5 \chi_0^2 + 0.5 \chi_1^2.$$

In PMAT-C with FN-C test, the mixture proportion of asymptotic distribution with an empirical proportion $\hat{EP}$ is :
$$LRT \sim (1-\hat{EP}) \chi_0^2 + \hat{EP} \chi_1^2,$$ where $\hat{EP}$ is estimated using $$\hat{EP} = 0.60105772-4.0224 n^{(-0.89)}.$$

Specially, PMAT-C with $\hat{EP} =0.5$ leads to PMAT.

**NOTE:** Currently, all the parameters are hard coded in the `./proc/src/mathematics.cpp` file (the leading 4 entries in the `samParam` array):
```cpp
double foldednomalpvalue(double *a, int m, double *b, int n, bool logit=true, double logittuning=0.00001, bool nonZero = true){
bool useBartlette = true;
// config.txt
double samParam[6]={0.60105772, 0., -4.0224, 0.89, 1., 1.};
...
}
```
Users could change the parameter and re-compile to get the corresponding pmat-c.
## C/C++ program
Expand Down Expand Up @@ -41,7 +62,7 @@ Libraries: GSL-2.x
```bash
# debian/ubuntu
apt-install libgsl-dev
apt install libgsl-dev
# Centos/RH
yum install libgsl-devel
```
Expand All @@ -54,7 +75,6 @@ yum install libgsl-devel
cd pmat
cd ./release/Win64
tar vxf PMAT-0.1-win64-rtools42.tar.gz
./pmat --help
```

> **Note**: For MAC OS user, `Home brew` is recommended to install GSL with `brew install gsl` and properly tune the thread numbers. Otherwise, one should compile the GSL from the source and run `pmat` with `-t 1` option.
Expand All @@ -75,7 +95,7 @@ make
- [x] ARM(MAC M1-ARM64/Linux-ARMv7/ARMv8);
- [x] MIPS/LoongArch(龙芯3A5000, UOS).

**NOTE**: Binary release for Mac/Windows/Linux(Ubuntu 18.04) users is also [available](https://github.com/yfyang86/pmat/releases). But due to the library/platform dependencies/limitations, some function may not work as designed (eg, parallel computing on Mac OSX). Please contact the maintainer or raise an [issue](https://github.com/yfyang86/pmat/issues) if you encounter such problems.
**NOTE**: Binary release for Mac/Windows/Linux(Ubuntu 18.04) users is also [available](https://github.com/yfyang86/pmat/releases). But due to the library/platform dependencies/limitations, some function may not work as designed (eg, parallel computing on Mac OSX). Please contact the maintainer or raise an [issue](https://github.com/yfyang86/pmat/issues) if you encounter such problems. Currently, only V0.0.1 is available.

### Test

Expand Down Expand Up @@ -126,47 +146,55 @@ chr1 10542 0.96 0.90625 0.978378378378378 0.969298245614035 0.882352941176471 0.

Run the test code in linux

```
./pmat -t 32 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 ../examples/test.dat > test.mr.DMR
```bash
./pmat -t 32 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 ../examples/test.dat > test.mr.DMR
```
Or use the binary release. The top 10 lines in the output file are:

```
chr1 12505 12679 1 0.108321 7 0.0034718 0.90758 0.83462 0.80487
chr1 661864 661928 1 0.066676 5 0.0016 0.65065 0.75823 0.79141
chr1 662608 662692 1 0.113871 5 0.28328 0.79049 0.70271 0.73315
chr1 715040 715121 1 0.214549 5 0.46519 0.17554 0.56345 0.69286
chr1 545153 545215 1 0.362152 5 0.032756 0.048372 0.59842 0.76797
chr1 713375 713449 1 0.083285 5 0.46782 0.79534 0.68649 0.72001
chr1 842293 842431 1 0.242725 5 0.41702 0.72288 0.27554 0.32715
chr1 136631 136718 1 0.140452 6 0.014309 0.28271 0.85088 0.83067
chr1 136814 136876 1 0.121317 5 0.45345 0.12209 0.87179 0.82835
chr1 136912 137169 1 0.183106 11 0.33916 0.88037 0.62192 0.65789
chr1 661864 661928 0.066676 5 0.0021516 0.75823 0.79141
chr1 662608 662692 0.113871 5 0.33195 0.70271 0.73315
chr1 713375 713453 0.091691 6 0.5 0.67901 0.69787
chr1 842293 842431 0.242725 5 0.5 0.27554 0.32715
chr1 12505 12754 0.119788 8 0.00046499 0.83472 0.81572
chr1 545078 545215 0.371799 11 0.0082285 0.60262 0.71239
chr1 854528 854739 0.159495 5 0.00016186 0.75215 0.67748
chr1 837884 838368 0.106765 14 0.12877 0.84016 0.81546
chr1 848507 848646 0.076560 5 0.5 0.67124 0.65959
chr1 136631 136718 0.140452 6 0.015472 0.85088 0.83067
```
The following table explains the columns in output file.

| Column| Comment | Example |
|:--------|:--------|:--------|
| 1 | chromosome | Example: chr1 |
| 2| start | Example: 10497 |
| 3 | end | Example: 12679 |
| 4 | q-value | Example: 1 |
| 5 | avarge absolute 5mC difference between pairs| Example: 0.108321 |
| 6| #CpGs in a region | Example: 7 |
| 7 | p-value via FN-C test or FN test | Example: 0.0034718 |
| 8 | p-value via 2d KS test | Example: 0.90758 |
| 9 | average 5mC in one group| Example: 0.83462 |
| 10 | average 5mC in the other group | Example: 0.80487 |
| 2| start | Example: 661864 |
| 3 | end | Example: 661928 |
| 4 | avarge absolute 5mC difference between pairs| Example: 0.066676 |
| 5| #CpGs in a region | Example: 5 |
| 6 | p-value via FN-C test or FN test | Example: 0.0021516 |
| 7 | average 5mC in one group| Example: 0.75823 |
| 8 | average 5mC in the other group | Example: 0.79141 |


Run the following command to add column names and FDR value via BH method.
Run the following command to add column names and FDR value via BH method. (Perl script is available upon request.)

```
echo -e "chr\tstart\tstop\tq-value\tabs.methyl.diff\tCpGs\tpFN\tp2DKS\tpre\tpost\tpFN.fdr" > test.mr.DMR.fdr
```bash
echo -e "chr\tstart\tstop\tabs.methyl.diff\tCpGs\tpFN\t5mC.A\t5mC.B" > test.mr.DMR.fdr
num_lines.pl test.mr.DMR > o
pvalues_BP_correction.pl o 0 8 | sort -n -k1 | mycut.pl -v -f1 | format_tab.pl >> test.mr.DMR.fdr
```

Alternatively, readers can add column names and FDR values via BH method using the following R script.
```R
data <- read.table("test.mr.DMR", header = F)
colnames(data) <- c("chr", "start", "stop", "abs.methyl.diff", "CpGs", "pFN", "5mC.A", "5mC.B")
data$FDR <- p.adjust(data$pFN, method = "BH")
write.table(data, "test.mr.DMR.fdr", row.names= F, sep = "\t", quote = F)
```



## License

C/C++ version (in the `./proc` folder):
Expand All @@ -183,9 +211,8 @@ C/C++ version (in the `./proc` folder):
- [ ] 2022-11: Example/Simulation
- [x] Simple demo
- [ ] Full example (in plan)
- [ ] Config file support: In the comming Version
- [ ] Adjusted p-value support : In the comming Version

## Wishing List
- [ ] Config file support for the Bartlett correction : In the comming Version
- [x] Documentation and Examples.
- [ ] `Config.txt` support without re-compile.
- [x] Adjusted p-value support : In the comming Version

- [ ] [Simulation Demos](./TODO.md):
3 changes: 1 addition & 2 deletions proc/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,7 @@ clean:
rm -f test.mr.DMR

test1:
./pmat -t 32 -a A -b B -X 104 -Y 104 -m 5 -d 0.01 ../examples/test.dat > test.mr.DMR

./pmat -t 32 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 ../examples/test.dat > test.mr.DMR

test: test1

Expand Down
4 changes: 2 additions & 2 deletions proc/Makefile.win
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
src/metseg.o


all: metpmatilene
all: pmat

pmat: ${METSEGOBJ}
${CXX} $(CXXFLAGS) ${METSEGOBJ} -o pmat $(LDFLAGS)
Expand All @@ -35,7 +35,7 @@ clean:
rm -f test.mr.DMR

test1:
./pmat -t 2 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 ../examples/test.dat > test.dat.DMR
./pmat -t 2 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 ../examples/test.dat > test.mr.DMR

test: test1

Expand Down
4 changes: 4 additions & 0 deletions proc/config.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
a: 0.6132
b: 0
c: 4.0224
d: -0.89
18 changes: 8 additions & 10 deletions proc/src/manopt-win.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@
*
*/
#define __WIN32__

#include <stdarg.h>
#include <stdlib.h>
#include <stdio.h>
Expand All @@ -35,6 +34,8 @@
#include <windows.h>
#endif



char* getNiceSVNVersion(const char *version) {
int i;
//char *p, *src, *subs[] = {"$Rev: ","$Date: ", " $"};
Expand All @@ -51,8 +52,7 @@ char* getNiceSVNVersion(const char *version) {
return src;
}

int
detectTerminalwidth(void) {
int detectTerminalwidth(void) {
#ifndef __WIN32__
struct winsize termwinsize;
ioctl(0,TIOCGWINSZ, &termwinsize);
Expand All @@ -67,8 +67,8 @@ detectTerminalwidth(void) {
#endif
}

unsigned char
isfloat(char *s) {

unsigned char isfloat(char *s) {
int i=0;
int len=strlen(s);
unsigned char dpt = 0;
Expand All @@ -79,17 +79,15 @@ isfloat(char *s) {
return (len==i);
}

unsigned char
isint(char *s) {
unsigned char isint(char *s) {
int i=0;
int len=strlen(s);
i += (s[i] == 43 || s[i] == 45) ? 1 : 0;
while((s[i] >= 48 && s[i] <= 57)) i++;
return (len==i);
}

void
manopt_usage(manopt_optionset *set) {
void manopt_usage(manopt_optionset *set) {
unsigned int i=0,j=0,k,l,
aptr = 0,
msglen = 0,
Expand Down Expand Up @@ -1136,4 +1134,4 @@ manopt_dumpoptionset(manopt_optionset *set) {
}
}
}
}
}
4 changes: 2 additions & 2 deletions proc/src/mathematics.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -859,8 +859,8 @@ double foldednomal(double *a, int m, double *b, int n ){
double foldednomalpvalue(double *a, int m, double *b, int n, bool logit=true, double logittuning=0.00001, bool nonZero = true){
bool useBartlette = true;
// config.txt
//double samParam[6]={0.60105772, 0., -4.0224, 0.89, 1., 1.};
double samParam[6]={0.5, 0., 0, 0., 1., 1.};
double samParam[6]={0.60105772, 0., -4.0224, 0.89, 1., 1.};
//double samParam[6]={0.5, 0., 0, 0., 1., 1.};

if (m != n)
{
Expand Down
20 changes: 10 additions & 10 deletions proc/src/metseg.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -633,8 +633,7 @@ void kstest(segment_t *seg , int a, int b, char mindiff, char mincpgs, char test
#endif


//ks[0]=faskstest[0];
ks[0] = p;
ks[0]=faskstest[0];
ks[1]=meandiff;
ks[2]=p;

Expand Down Expand Up @@ -2627,27 +2626,29 @@ int main(int argc, char** argv) {


//wait for all threads to terminate
while(idle != nfo.threads);

// PMAT only handles nfo.mode == 1
//if(nfo.mode == 1 || nfo.mode == 2) {
while(idle != nfo.threads)
;

if(nfo.mode == 1 || nfo.mode == 2) {
fprintf(stderr, "Number of Tests: %d\n", nfo.outputList->numberTests);
multiple_testing_correction(nfo.outputList, 1, nfo.mtc);
multiple_testing_correction(nfo.outputList, nfo.mode, nfo.mtc);
for(int i=0;i<nfo.outputList->i;i++){
if(nfo.outputList->segment_out_st[i].meandiff >= nfo.minMethDist || nfo.outputList->segment_out_st[i].meandiff <= -1* nfo.minMethDist) {
fprintf(stdout, "%s\t%d\t%d\t%f\t%d\t%.5g\t%.5g\t%.5g\n",
nfo.outputList->segment_out_st[i].chr,
nfo.outputList->segment_out_st[i].start,
nfo.outputList->segment_out_st[i].stop,
//nfo.outputList->segment_out_st[i].q,
nfo.outputList->segment_out_st[i].meandiff,
nfo.outputList->segment_out_st[i].length,
nfo.outputList->segment_out_st[i].mwu,
//nfo.outputList->segment_out_st[i].p,
nfo.outputList->segment_out_st[i].methA,
nfo.outputList->segment_out_st[i].methB);
}
}
//}
/*
}

if(nfo.mode == 3) {
fprintf(stderr, "Number of Tests: %d\n", nfo.outputList->numberTests);
multiple_testing_correction(nfo.outputList, nfo.mode, nfo.mtc);
Expand All @@ -2667,7 +2668,6 @@ int main(int argc, char** argv) {
}
}
}
*/

fflush(stdout);

Expand Down

0 comments on commit 2b0bc6f

Please sign in to comment.