Add BCF support and refactor
dpoznik committed Sep 15, 2023
1 parent c0c79d9 commit 2df2fda
Showing 73 changed files with 5,926 additions and 4,014 deletions.
2 changes: 0 additions & 2 deletions .git-blame-ignore-revs

This file was deleted.

1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -0,0 +1 @@
* @dpoznik
61 changes: 56 additions & 5 deletions .gitignore
@@ -1,8 +1,59 @@
__pycache__
*.egg-info
*.pyc
# Binaries, byte compilations, etc.
#----------------------------------
__pycache__/
*.py[cod]
*.so

# Caches
#----------------------------------
.cache
.ipynb_checkpoints
.metaflow
.minio.sys
.mypy_cache
.pytest_cache
.tox

# Distribution & packaging
#----------------------------------
build/
dist/
eggs/
sdist/
wheels/
.eggs/
*.egg
*.egg-info/
_version.py

# Editors & IDEs
#----------------------------------
*~
\#*
.#*
.project
.pydevproject
.settings
.venv
output/
.vscode

# Environments
#----------------------------------
.python-version

# macOS
#----------------------------------
.DS_Store
._*
.Trash*

# Project-specific
#----------------------------------
logs/
output*/
tests/fixtures/input/1000Y.all.bcf
tests/fixtures/input/1000Y.all.bcf.csi
tests/fixtures/input/1000Y.subset.bcf
tests/fixtures/input/1000Y.subset.bcf.csi
tests/fixtures/input/ALL.chrY_10Mbp_mask.glia_freebayes_maxLikGT_siteQC.20130502.60555_biallelic_snps.vcf.gz
tests/fixtures/input/ALL.chrY_10Mbp_mask.glia_freebayes_maxLikGT_siteQC.20130502.60555_biallelic_snps.vcf.gz.tbi
!tests/fixtures/output/
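
Note on the `*.py[cod]` pattern in the `.gitignore` hunk above: the character class covers `.pyc`, `.pyo`, and `.pyd`, so it subsumes the earlier `*.pyc` entry. The sketch below illustrates this with Python's `fnmatch`, which is only an approximation of Git's matcher (Git applies extra rules for slashes, negation, and directories); the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern copied from the .gitignore hunk; the file names are hypothetical.
PATTERN = "*.py[cod]"
names = ["module.pyc", "module.pyo", "module.pyd", "module.py"]

# fnmatchcase does case-sensitive shell-style matching on the whole name.
# This approximates, but is not identical to, Git's ignore matching.
matches = {name: fnmatchcase(name, PATTERN) for name in names}

for name, matched in matches.items():
    print(f"{name}: {'ignored' if matched else 'kept'}")
```

For an authoritative answer in a real checkout, `git check-ignore -v <path>` reports which rule, if any, matches a path.
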
39 changes: 28 additions & 11 deletions .pre-commit-config.yaml
@@ -1,17 +1,34 @@
default_stages: [commit, merge-commit]
fail_fast: true
repos:
- repo: https://gitlab.com/pycqa/flake8
rev: 3.9.0
hooks:
- id: flake8
types: [file, python]
args: [--select, "F401,F841"] # Check for unused imports and variables
- repo: https://github.com/pycqa/isort
rev: 5.8.0
- repo: git@github.com:PyCQA/isort.git
rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/psf/black
rev: 20.8b1
- repo: git@github.com:psf/black.git
rev: 23.7.0
hooks:
- id: black
language_version: python3
- repo: git@github.com:pre-commit/pre-commit-hooks.git
rev: v4.4.0
hooks:
- id: check-yaml
args: [--allow-multiple-documents]
- id: pretty-format-json
- id: trailing-whitespace
exclude: haplogroups.*.txt|isogg_tree/|isogg.[0-9.]*.txt
- repo: git@github.com:PyCQA/flake8.git
rev: 6.1.0
hooks:
- id: flake8
- repo: git@github.com:PyCQA/pydocstyle.git
rev: 6.3.0
hooks:
- id: pydocstyle
additional_dependencies: [tomli]
exclude: tests/
- repo: git@github.com:pre-commit/mirrors-mypy.git
rev: v1.5.1
hooks:
- id: mypy
additional_dependencies: [types-PyYAML]
64 changes: 64 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,64 @@
# Changelog for `yhaplo`

Format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)


## Planned

- Correct ISOGG polarization errors for a few dozen SNPs.


## [Unreleased]

No unreleased changes

[Unreleased]: https://github.com/23andMe/yhaplo/compare/2.0.2...HEAD


## [2.0.2] - 2023-09-15

This is a major clean-up and refactoring release.
Core logic has not changed, and output should be equivalent to prior versions.
The key changes from an end-user perspective are BCF support, a cleaner API,
and faster processing of most input types.

### Added
- BCF support
- Automated tests
- Optional dependencies
- `Sample` subclasses: `TextSample`, `VCFSample`, `AblockSample`
- API for processing ablocks (23andMe internal)
- `Dockerfile` defining image for Batch computes (23andMe internal)
- Compute flow (23andMe internal)
- Script for copying and altering files for open sourcing (23andMe internal)
- `CHANGELOG.md`

### Changed
- Lint and update pre-commit hooks
- Set up Drone CI (23andMe internal)
- Set up `tox` testing (23andMe internal)
- Update `Makefile` and configuration files
- Refactor for PEP-8 compliance (snake case, etc.)
- Update directory structure
- Modernize packaging and infer version dynamically
- Namespace command-line entry points: `yhaplo`, `yhaplo_convert_to_genos`, `yhaplo_plot_tree`
- Replace static methods
- Clean up logging and use file handlers
- Use f-strings
- Reformat docstrings
- Add type annotations
- Use `importlib.resources` to load metadata files
- Move example input from package to `tests/fixtures/`
- Update `README.md`, `README.23andMe.md`, and `yhaplo_manual.pdf`
- Speed up sample-major file processing
- Speed up ablock processing (23andMe internal)
- Use Pysam to process VCF/BCF input
- Map physical coordinates to block indexes (23andMe internal)
- Handle platform SNPs natively (23andMe internal)

### Removed
- Support for Python 2 and Python 3.8
- Use of research-environment utilities (23andMe internal)

[2.0.2]: https://github.com/23andMe/yhaplo/compare/1.1.2..2.0.2
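
The changelog entry "Use `importlib.resources` to load metadata files" can be illustrated with a minimal sketch. This is not the code from this commit: the package name `demo_pkg`, the file `snps.txt`, its contents, and the helper `load_metadata_lines` are all hypothetical, and the throwaway package exists only so the example runs on its own. It assumes Python 3.9 or later, where `importlib.resources.files` is available.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path


def load_metadata_lines(package: str, filename: str) -> list[str]:
    """Return the non-empty lines of a data file bundled inside `package`."""
    resource = files(package).joinpath(filename)
    text = resource.read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


# Build a throwaway package (hypothetical names) so the sketch is runnable.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("", encoding="utf-8")
(pkg_dir / "snps.txt").write_text("snp_a\n\nsnp_b\n", encoding="utf-8")

# Make the new package importable.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

lines = load_metadata_lines("demo_pkg", "snps.txt")
print(lines)
```

Resolving data files through the package, rather than through paths relative to `__file__`, keeps loading independent of where or how the package is installed.
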
