hg38 support #6

Draft: wants to merge 81 commits into base: master

Commits (81)
37fcb5a
Made ChrConverter more flexible (second hard-coded reference) and cen…
vinjana Dec 13, 2023
703ef68
Build script for sophiaMref.
vinjana Dec 13, 2023
769cce8
Made README more descriptive with texts from Umut's and the SophiaWor…
vinjana Dec 13, 2023
0a9e8f2
More code layout changes. Removed temporary file.
vinjana Dec 13, 2023
a5ebdb6
Removed using namespace std. More code layout changes.
vinjana Dec 13, 2023
57f8be4
Documentation for hg37 chromosome name parser.
vinjana Dec 13, 2023
f4f4dce
Replace obscure -2 into NA variable to get more self-descriptive code.
vinjana Dec 13, 2023
cbcf2dd
Major refactoring to have a map-based hg38 index/converter.
vinjana Dec 15, 2023
fee16d2
Fixed some compile instructions for static compile.
vinjana Dec 15, 2023
5904ab0
Compile fix for static libraries with Makefile.
vinjana Dec 15, 2023
9a894b6
Fixed many errors, but not all yet.
vinjana Dec 20, 2023
fec30e9
Fixed all apparent compile errors. Some semantic errors as warning re…
vinjana Dec 20, 2023
e7100ed
Made compressedMrefIndexToIndex() return an optional to account for f…
vinjana Dec 20, 2023
e12fe90
Makefile simplified. Static compile still doesn't work, though.
vinjana Dec 20, 2023
944fe11
Fully implemented dynamic and static compilation with Makefile. Releas…
vinjana Dec 21, 2023
31ca645
Merge pull request #7 from DKFZ-ODCF/static-compile
vinjana Dec 21, 2023
aca3c6a
Encapsulate implementation detail into ChrConverter.isIgnoredChromoso…
vinjana Dec 22, 2023
f9419f8
Removed unnecessary imports. Wrap main functions into try-catch to ca…
vinjana Dec 22, 2023
e1a0649
Fixes after coderabbit review.
vinjana Dec 22, 2023
dfbacc2
Switched to parse chromosome name function taking start and end itera…
vinjana Dec 22, 2023
6428131
Fixed input check and improved error message. Changed `make STATIC=tr…
vinjana Dec 22, 2023
ee9d46a
Fixed cornercase with supplementary alignment tag (SA:Z:) containing …
vinjana Jan 8, 2024
2073adf
Minor.
vinjana Jan 8, 2024
3476b32
Added comment.
vinjana Jan 8, 2024
4e017b8
Improve chrName parser for breakpoint files, to be able to deal with …
vinjana Jan 11, 2024
c2d56dc
Added googletest/gtest-based unit tests for the chromosome name parse…
vinjana Jan 12, 2024
dc6acf0
Fixed Makefile to build static binaries, except the testRunner (which…
vinjana Jan 12, 2024
bf4d4d6
Fixed wrong initialization order of GlobalAppConfig.
vinjana Jan 12, 2024
78213ee
Bugs fixed, but parser of breakpoints still broken.
vinjana Jan 15, 2024
92d012e
Added boost::stacktrace for better error reporting.
vinjana Jan 15, 2024
55ba762
Documented output BED file and code.
vinjana Jan 16, 2024
2817ae2
Added failing Breakpoint parser tests.
vinjana Jan 16, 2024
adeace4
Hack to prevent failure in tests due to reinitialized singleton.
vinjana Jan 16, 2024
d14705a
Added documentation for sophiaMref output file. Added `binaries` targ…
vinjana Jan 17, 2024
cd83acf
Added tests. Removed TODO.
vinjana Jan 17, 2024
b9e69e6
Added contig classes, also to hg38 (not working yet). Range approach.
vinjana Jan 17, 2024
39e46c0
Migrated Hg38ChrConverter to one that reads configuration file. Yet h…
vinjana Jan 23, 2024
4190cfc
Transformed the Hg38ChrConverter into a simple GenericChrConverter th…
vinjana Jan 23, 2024
b66c253
Changed from hg37 to classic_hg37 as default Hg37ChrConverter.
vinjana Jan 23, 2024
da6fc36
Removed obsolete IndexRange. Fixed is$category methods for classic_hg…
vinjana Jan 24, 2024
604ca78
Much improved error messages for parsing.
vinjana Jan 24, 2024
3917745
Some minor exception tweaks.
vinjana Jan 24, 2024
fabd143
Fixed a bug I introduced. Noted that SuppAlignmentAnno::SuppAlignment…
vinjana Jan 24, 2024
b5ba8ed
Comments and removed strange 'm' printed by `sophiaAnnotate`.
vinjana Jan 24, 2024
c0158d7
Added CONTRIBUTORS.md.
vinjana Jan 24, 2024
e027b40
README update. Added hs37d5+phix.tsv to resources.
vinjana Jan 24, 2024
47eac31
Fix chromosome label.
vinjana Jan 25, 2024
cb551f3
Fixed some chromosome index mappings that could have been entirely av…
vinjana Jan 29, 2024
5aac8c8
Major refactorings. Have unsigned and signed int for ChrIndex and Com…
vinjana Jan 31, 2024
da58046
Commandline fixes created by refactoring.
vinjana Feb 1, 2024
7b7d7ec
Switched types for ChrIndex and CompressedMrefIndex and fixed errors.…
vinjana Feb 1, 2024
aabceca
Comments and layout changes.
vinjana Feb 1, 2024
355f377
Reviewed my own code, to find any bugs.
vinjana Feb 5, 2024
6c58604
Added assertions (can be turned off with -DNDEBUG for production) to …
vinjana Feb 6, 2024
e55b4a5
Refined tests for valid chromosome indices (classic_hg37), at constru…
vinjana Feb 6, 2024
3d54031
Fixed check.
vinjana Feb 6, 2024
8c86dd8
Small refactorings
vinjana Feb 6, 2024
7c07a52
Refactored ChrCategory, wrote some tests, and fixed some issues.
vinjana Feb 6, 2024
7ba7053
Changed all index and position types back to int-length. MrefEntry an…
vinjana Feb 6, 2024
2b59b12
Changed MrefEntry::validity into signed char, to reduce space require…
vinjana Feb 7, 2024
50860d0
Little refactoring for code readability.
vinjana Feb 7, 2024
9c0d5df
Fixed the memory issue.
vinjana Feb 7, 2024
b9926db
Removed all assertions, to see whether that fixes the memory issue.
vinjana Feb 13, 2024
78fda5e
Small changes.
vinjana Feb 14, 2024
e230e17
Added test for internal static constructor function of Hg37ChrConverter.
vinjana Feb 14, 2024
5022eda
Removed assertValid calls completely. Instead now use IndexRange class.
vinjana Feb 14, 2024
9e013ce
Minor.
vinjana Feb 20, 2024
d2bb840
Preallocate full memory for MasterMrefProcessor (a lot), rather than …
vinjana Mar 11, 2024
450ccc7
Some small refactorings, edits, and comments.
vinjana Mar 13, 2024
dc7f525
Added tests for SuppAlignmentAnno.
vinjana Mar 14, 2024
600ffdd
Some debugging output to catch a specific difference
vinjana Mar 15, 2024
b673eca
Fixed some incorrectly translated conditions concerning decoys.
vinjana Mar 15, 2024
4721b35
Split gonosomes into X and Y classes. One condition (CompressedMrefIn…
vinjana Mar 18, 2024
88d4715
Made assemblyName a non-static, const value in ChrConverter.
vinjana Mar 18, 2024
7d2e571
Calculate memory allocated by MasterRefProcessor and report in GB.
vinjana Mar 18, 2024
9a00c6b
Minor.
vinjana Mar 18, 2024
f1c8209
Answers to CodeRabbit.
vinjana Mar 18, 2024
e65cfbf
Fixed incorrect index returned from parse function.
vinjana Mar 18, 2024
95089fe
Merge pull request #22 from DKFZ-ODCF/review-my-code
vinjana Mar 18, 2024
a8ac0ae
Adapted SvEvent conditions again to make them make similar to dealing…
vinjana Mar 18, 2024
3c14f17
Removed all using namespace std.
vinjana Mar 18, 2024
10 changes: 7 additions & 3 deletions .gitignore
@@ -1,6 +1,10 @@
.idea/
*.iml
*.o
Release_sophia/sophia
Release_sophiaAnnotate/sophiaAnnotate
build/*.o
include/strtk.hpp
include/rapidcsv.h
sophia
sophiaAnnotate
sophiaMref
testRunner
boost/
5 changes: 5 additions & 0 deletions CONTRIBUTORS.md
@@ -0,0 +1,5 @@
Individual contributions:

* Umut Toprak: Original implementation for hg37 until SOPHIA 35.
* Naga Paramasivam: Extensive testing for hg38.
* Philip R. Kensche: Development after SOPHIA 35, including documentation, code refactoring and generalization for hg38.
20 changes: 20 additions & 0 deletions CodingConventions.md
@@ -0,0 +1,20 @@
# Coding Conventions

This is a list of some things that contributions should adhere to.
The code still has severe legacy problems with these issues.

1. Keep constructors free of side effects. Prefer using static factory functions.
2. If there are many parameters, use a builder pattern.
3. Never deliberately use `nullptr` values. Prefer `std::optional` instead.
4. Don't use `using namespace std`.
5. Use the type system to your advantage.
6. Separate parsing code. In general, learn something about how to separate concerns in programming, learn the SOLID principles, **and** apply them.
7. Use the standard library, including the Standard Template Library. Prefer searching in the C++ standard library over reinventing the wheel.
8. Use the Boost library. It is already a dependency. Prefer searching in Boost over reinventing the wheel.
9. Always try to leave the code in a better (more readable, understandable, maintainable, safer) state than you found it.
10. C++ is hard to read, so don't make it harder than necessary. Code readability is **not optional**.
    * Use descriptive but concise names for variables, functions, classes, etc.
    * Keep lines short.
    * Prefer vertical lists (e.g. of function arguments) over horizontal lists.
    * Avoid "what" and "how" comments. Prefer "why" comments.
11. If you figure out something really hard and unintuitive, add a comment instead of letting the next programmer figure it out again.
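
As a minimal sketch of conventions 1, 3, and 5, the following shows a static factory function, `std::optional` in place of `nullptr`, and a dedicated index type. All names here (`ContigIndex`, `ContigNameTable`, `create`, `indexOf`, `nameOf`) are hypothetical and chosen for illustration only; they are not the project's actual classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical dedicated index type (convention 5): it cannot be mixed up
// silently with a plain integer or with another kind of index.
struct ContigIndex {
    std::size_t value;
};

// Hypothetical name table, used only to illustrate the conventions.
class ContigNameTable {
  public:
    // Convention 1: validation happens in a static factory function, so the
    // constructor stays free of side effects. Invalid input (no names, or
    // duplicate names) yields std::nullopt.
    static std::optional<ContigNameTable> create(std::vector<std::string> names) {
        if (names.empty()) {
            return std::nullopt;
        }
        for (std::size_t i = 0; i < names.size(); ++i) {
            for (std::size_t j = i + 1; j < names.size(); ++j) {
                if (names[i] == names[j]) {
                    return std::nullopt;
                }
            }
        }
        return ContigNameTable(std::move(names));
    }

    // Convention 3: an unknown name is reported as std::nullopt, not as a
    // null pointer or a magic number.
    std::optional<ContigIndex> indexOf(std::string_view name) const {
        for (std::size_t i = 0; i < names_.size(); ++i) {
            if (names_[i] == name) {
                return ContigIndex{i};
            }
        }
        return std::nullopt;
    }

    // Throws std::out_of_range for an index that is not in the table.
    const std::string& nameOf(ContigIndex index) const {
        return names_.at(index.value);
    }

  private:
    // The constructor only stores state that the factory has already checked.
    explicit ContigNameTable(std::vector<std::string> names)
        : names_(std::move(names)) {
        assert(!names_.empty());
    }

    std::vector<std::string> names_;
};
```

A caller checks the optional before using it, for example `if (auto index = table->indexOf("chr2")) { ... }`, which makes the "not found" case visible in the type rather than in a comment.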
182 changes: 182 additions & 0 deletions Makefile
@@ -0,0 +1,182 @@
# Compiler
CXX = x86_64-conda_cos6-linux-gnu-g++

INCLUDE_DIR = ./include
BUILD_DIR = ./build
SRC_DIR = ./src
TESTS_DIR = ./tests

# Compiler flags
LIBRARY_FLAGS := -lz -lm -lrt -lboost_system -lboost_iostreams -lboost_program_options -ldl -lbacktrace -lboost_stacktrace_backtrace -DBOOST_STACKTRACE_USE_BACKTRACE
LDFLAGS := $(LDFLAGS) -flto=auto -rdynamic -no-pie
# Turned on -Wsign-conversion to get warnings for conversions between signed and unsigned types. This is a cheap alternative to implementing ChrIndex and CompressedMrefIndex as dedicated types.
CXXFLAGS := -I$(INCLUDE_DIR) $(CXXFLAGS) -std=c++20 -flto=auto -Wall -Wextra -Wsign-conversion -Werror -c -fmessage-length=0 -Wno-attributes -lbacktrace -lboost_stacktrace_backtrace -DBOOST_STACKTRACE_USE_BACKTRACE

ifeq ($(static),true)
LD_BEGIN_FLAGS := -L$(boost_lib_dir)
LD_END_FLAGS := $(LDFLAGS) -static -static-libgcc -static-libstdc++ $(LIBRARY_FLAGS)
else
LD_BEGIN_FLAGS :=
LD_END_FLAGS := $(LDFLAGS) $(LIBRARY_FLAGS)
endif

ifeq ($(develop),true)
# NOTE: Compiling with -O0 during development is generally a good idea, because it seems
#       that linking then catches some binary dependencies that would otherwise
#       be missed.
CXXFLAGS := $(CXXFLAGS) -O0 -ggdb3 -DDEBUG -fno-inline -fno-elide-constructors -fno-omit-frame-pointer -fno-optimize-sibling-calls
LD_END_FLAGS := $(LD_END_FLAGS) -Wl,-O0 -ggdb3 -DDEBUG -fno-inline
else
# Ignore some leftover unused variables from SvEvent::assessBreakpointClonalityStatus.
CXXFLAGS := $(CXXFLAGS) -O3 -DNDEBUG
endif

# Source files
SOURCES = $(wildcard $(SRC_DIR)/*.cpp)

# Object files should have .o instead of .cpp.
# Note, we put the objects for production and tests both into the build directory.
OBJECTS = $(SOURCES:$(SRC_DIR)/%.cpp=$(BUILD_DIR)/%.o)

# Binaries
BINARIES = sophiaMref sophia sophiaAnnotate

OBJECTS_WITH_MAIN = $(BUILD_DIR)/sophia.o $(BUILD_DIR)/sophiaMref.o $(BUILD_DIR)/sophiaAnnotate.o

# Default rule
all: test $(BINARIES)

# Define search paths for different file suffixes
vpath %.h $(INCLUDE_DIR)
vpath %.hpp $(INCLUDE_DIR)
vpath %.cpp $(SRC_DIR)
vpath %_test.cpp $(TESTS_DIR)

# Ensure the build/ directory exists.
$(BUILD_DIR):
	mkdir -p $@

# Retrieve StrTK
$(INCLUDE_DIR)/strtk.hpp:
	wget -c https://raw.githubusercontent.com/ArashPartow/strtk/d2b446bf1f7854e8b08f5295ec6f6852cae066a2/strtk.hpp -O $(INCLUDE_DIR)/strtk.hpp

# Retrieve rapidcsv
$(INCLUDE_DIR)/rapidcsv.h:
	wget -c https://github.com/d99kris/rapidcsv/raw/v8.80/src/rapidcsv.h -O $(INCLUDE_DIR)/rapidcsv.h

# General compilation rule for object files that have matching .h files.
$(BUILD_DIR)/%.o: $(SRC_DIR)/%.cpp $(INCLUDE_DIR)/%.h $(INCLUDE_DIR)/strtk.hpp $(INCLUDE_DIR)/rapidcsv.h Makefile | $(BUILD_DIR)
	$(CXX) $(CXXFLAGS) -c $< -o $@

# Test source files with the suffix _test.cpp
TEST_SOURCES = $(wildcard $(TESTS_DIR)/*_test.cpp)

# ... and the corresponding object files, all with the suffix _test.o.
TEST_OBJECTS = $(TEST_SOURCES:$(TESTS_DIR)/%_test.cpp=$(BUILD_DIR)/%_test.o)

# There are usually no .h files for test files, so we need a separate rule for test files.
$(BUILD_DIR)/%_test.o: $(TESTS_DIR)/%_test.cpp $(TESTS_DIR)/Fixtures.h Makefile | $(BUILD_DIR)
	$(CXX) $(CXXFLAGS) -c $< -o $@

# Link the testRunner.
testRunner: $(TEST_OBJECTS) $(filter-out $(OBJECTS_WITH_MAIN),$(OBJECTS))
	$(CXX) $(LD_BEGIN_FLAGS) -o testRunner $^ $(LDFLAGS) $(LIBRARY_FLAGS) -Wl,-Bdynamic -lgtest -lgtest_main -pthread

# Rule for running the tests
test: testRunner
	./testRunner

# Rules for sophia
$(BUILD_DIR)/sophia.o: $(SRC_DIR)/sophia.cpp Makefile | $(BUILD_DIR)
	$(CXX) $(CXXFLAGS) -c $< -o $@
sophia: $(BUILD_DIR)/global.o \
$(BUILD_DIR)/ChrCategory.o \
$(BUILD_DIR)/ChrInfo.o \
$(BUILD_DIR)/ChrInfoTable.o \
$(BUILD_DIR)/Alignment.o \
$(BUILD_DIR)/Breakpoint.o \
$(BUILD_DIR)/ChosenBp.o \
$(BUILD_DIR)/ChrConverter.o \
$(BUILD_DIR)/IndexRange.o \
$(BUILD_DIR)/Hg37ChrConverter.o \
$(BUILD_DIR)/GenericChrConverter.o \
$(BUILD_DIR)/MateInfo.o \
$(BUILD_DIR)/SamSegmentMapper.o \
$(BUILD_DIR)/Sdust.o \
$(BUILD_DIR)/SuppAlignment.o \
$(BUILD_DIR)/HelperFunctions.o \
$(BUILD_DIR)/GlobalAppConfig.o \
$(BUILD_DIR)/sophia.o
	$(CXX) $(LD_BEGIN_FLAGS) -o $@ $^ $(LD_END_FLAGS)

# Rules for sophiaAnnotate
$(BUILD_DIR)/sophiaAnnotate.o: $(SRC_DIR)/sophiaAnnotate.cpp Makefile | $(BUILD_DIR)
	$(CXX) $(CXXFLAGS) -c $< -o $@
sophiaAnnotate: $(BUILD_DIR)/global.o \
$(BUILD_DIR)/ChrCategory.o \
$(BUILD_DIR)/ChrInfo.o \
$(BUILD_DIR)/ChrInfoTable.o \
$(BUILD_DIR)/MateInfo.o \
$(BUILD_DIR)/Alignment.o \
$(BUILD_DIR)/AnnotationProcessor.o \
$(BUILD_DIR)/Breakpoint.o \
$(BUILD_DIR)/BreakpointReduced.o \
$(BUILD_DIR)/ChrConverter.o \
$(BUILD_DIR)/IndexRange.o \
$(BUILD_DIR)/Hg37ChrConverter.o \
$(BUILD_DIR)/GenericChrConverter.o \
$(BUILD_DIR)/DeFuzzier.o \
$(BUILD_DIR)/GermlineMatch.o \
$(BUILD_DIR)/MrefEntry.o \
$(BUILD_DIR)/MrefEntryAnno.o \
$(BUILD_DIR)/MrefMatch.o \
$(BUILD_DIR)/SuppAlignment.o \
$(BUILD_DIR)/SuppAlignmentAnno.o \
$(BUILD_DIR)/Sdust.o \
$(BUILD_DIR)/ChosenBp.o \
$(BUILD_DIR)/SvEvent.o \
$(BUILD_DIR)/HelperFunctions.o \
$(BUILD_DIR)/GlobalAppConfig.o \
$(BUILD_DIR)/sophiaAnnotate.o
	$(CXX) $(LD_BEGIN_FLAGS) -o $@ $^ $(LD_END_FLAGS)

# Rules for sophiaMref
$(BUILD_DIR)/sophiaMref.o: $(SRC_DIR)/sophiaMref.cpp Makefile | $(BUILD_DIR)
	$(CXX) $(CXXFLAGS) -c $< -o $@
sophiaMref: $(BUILD_DIR)/global.o \
$(BUILD_DIR)/ChrCategory.o \
$(BUILD_DIR)/ChrInfo.o \
$(BUILD_DIR)/ChrInfoTable.o \
$(BUILD_DIR)/MateInfo.o \
$(BUILD_DIR)/Alignment.o \
$(BUILD_DIR)/GlobalAppConfig.o \
$(BUILD_DIR)/ChrConverter.o \
$(BUILD_DIR)/IndexRange.o \
$(BUILD_DIR)/Hg37ChrConverter.o \
$(BUILD_DIR)/GenericChrConverter.o \
$(BUILD_DIR)/HelperFunctions.o \
$(BUILD_DIR)/SuppAlignment.o \
$(BUILD_DIR)/SuppAlignmentAnno.o \
$(BUILD_DIR)/Sdust.o \
$(BUILD_DIR)/ChosenBp.o \
$(BUILD_DIR)/MrefEntry.o \
$(BUILD_DIR)/MrefEntryAnno.o \
$(BUILD_DIR)/MrefMatch.o \
$(BUILD_DIR)/MasterRefProcessor.o \
$(BUILD_DIR)/Breakpoint.o \
$(BUILD_DIR)/BreakpointReduced.o \
$(BUILD_DIR)/GermlineMatch.o \
$(BUILD_DIR)/DeFuzzier.o \
$(BUILD_DIR)/sophiaMref.o
	$(CXX) $(LD_BEGIN_FLAGS) -o $@ $^ $(LD_END_FLAGS)

binaries: $(BINARIES)


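# Rules for clean. Targets that do not produce a file of the same name are declared phony.
.PHONY: all binaries test clean clean-all
clean:
	rm -f $(BUILD_DIR)/*.o $(BINARIES) testRunner

# Also remove the downloaded third-party headers.
clean-all: clean
	rm -f $(INCLUDE_DIR)/strtk.hpp $(INCLUDE_DIR)/rapidcsv.h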