This repository has been archived by the owner on Oct 29, 2023. It is now read-only.

Scrub binary files from git history #162

Open
wants to merge 580 commits into
base: master
580 commits
41f8e42
Fixed counter names to be a bit more consistent.
deflaux Mar 11, 2015
e0d3e82
Merge pull request #34 from iliat/dev-broad
iliat Mar 12, 2015
6871e62
Nuke pipeline that has moved elsewhere.
deflaux Mar 13, 2015
e1b4fa6
Bug fix for dataflow workaround.
deflaux Mar 13, 2015
6ab2cb4
Add 'start' to partial request fields.
deflaux Mar 13, 2015
85f648d
Merge remote-tracking branch 'upstream/master'
deflaux Mar 13, 2015
de9882d
Update changes from upstream/master.
deflaux Mar 13, 2015
c28b005
Allow start/end when reading from the API, improve script documentation.
iliat Mar 13, 2015
feab158
Merge pull request #42 from iliat/dev-broad
deflaux Mar 13, 2015
d6c0453
Merge pull request #41 from deflaux/master
deflaux Mar 13, 2015
9f1d939
Add annotation utilities for determining variant effect
calbach Mar 6, 2015
ad072a2
Implement sample dataflow pipeline for variant annotation
calbach Mar 6, 2015
62e78bb
Merge pull request #37 from googlegenomics/variant-annotation
deflaux Mar 19, 2015
b76cc7b
Bump utils-java version.
deflaux Mar 25, 2015
f92cb4c
Merge pull request #43 from deflaux/master
deflaux Mar 25, 2015
2a515c1
Remove work around for multiple dataflow workers.
deflaux Mar 25, 2015
b3702a9
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Mar 25, 2015
1f05576
[maven-release-plugin] prepare for next development iteration
deflaux Mar 25, 2015
dd9a649
Fixing some bugs in ReadConverter
lbergelson Mar 27, 2015
63ef838
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Mar 25, 2015
821a956
[maven-release-plugin] prepare for next development iteration
deflaux Mar 25, 2015
59ab093
Update to DataflowSDK version 0.3.150326.
deflaux Mar 30, 2015
6ea98b1
Nuke direct dependency on google-api-services-dataflow.
deflaux Mar 30, 2015
d59a9fb
Merge pull request #45 from lbergelson/lb_ReadConverterFix
wbrockman Mar 31, 2015
f9e63a2
Pipeline-specific options are now listed in --help.
deflaux Mar 31, 2015
211b426
Merge remote-tracking branch 'origin/remove-workaround'
deflaux Apr 1, 2015
38ec487
Added instructions for pre-built jar.
deflaux Apr 1, 2015
2ce3cfc
Merge pull request #47 from deflaux/master
deflaux Apr 2, 2015
a1ad861
Fix sign up link.
deflaux Apr 2, 2015
b6652d0
Add support for headless usage.
deflaux Apr 6, 2015
b0437ee
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
jean-philippe-martin Apr 8, 2015
5c0cd2b
[maven-release-plugin] prepare for next development iteration
jean-philippe-martin Apr 8, 2015
a375a25
Rename --headless to --noLaunchBrowser
deflaux Apr 9, 2015
af27f3a
Whitespace fix for java style compliance.
deflaux Apr 9, 2015
4bb3cb8
Merge pull request #51 from googlegenomics/makeRead
deflaux Apr 9, 2015
cc83a44
Merge pull request #50 from deflaux/master
deflaux Apr 9, 2015
b87d77a
Direct readers to the Google Genomics Cookbook.
deflaux Apr 9, 2015
1e187ce
Switch back to lowercase bucket names
deflaux Apr 10, 2015
cb8656e
Merge pull request #52 from deflaux/master
deflaux Apr 10, 2015
1699ce5
Update annotationSetIds.
deflaux Apr 14, 2015
e396870
Merge pull request #53 from deflaux/master
calbach Apr 14, 2015
f206eee
fixed CountReads
jean-philippe-martin Apr 16, 2015
98af774
added GCSFilenameTest
jean-philippe-martin Apr 16, 2015
17f5eca
Merge pull request #55 from googlegenomics/fixCountReads
Apr 16, 2015
f56011c
GenomicsDatasetOptions:validateOptions calls into GCSFilename
jean-philippe-martin Apr 16, 2015
923157b
m
jean-philippe-martin Apr 16, 2015
df746bc
Merge pull request #56 from googlegenomics/refactorSample
jean-philippe-martin Apr 17, 2015
79d3ed6
Updated plink link
gregmcinnes Apr 27, 2015
eba9948
Updated plink link
gregmcinnes Apr 27, 2015
aa1b46c
Merge pull request #58 from gregmcinnes/patch-1
deflaux Apr 28, 2015
f072097
Merge pull request #59 from gregmcinnes/patch-2
deflaux Apr 29, 2015
71cd095
GCSHelper to download files easily
jean-philippe-martin May 4, 2015
93add95
reusing the app name already in offlineAuth
jean-philippe-martin May 4, 2015
d90d87b
using an API key, for continuous testing
jean-philippe-martin May 4, 2015
0ea122b
Remove @Required validation for --output.
deflaux May 4, 2015
a0d6965
Ensure that deletions with a null alt are considered variants.
deflaux May 4, 2015
3864849
Removing GCSFilename, as it is better to use GcsPath
jean-philippe-martin May 5, 2015
107db7c
GSCHelper "integration" test
jean-philippe-martin May 5, 2015
2b9baa4
GCSHelper integration test
jean-philippe-martin May 5, 2015
95bf2b9
Merge pull request #62 from googlegenomics/GCSHelper
deflaux May 5, 2015
81230e0
Merge remote-tracking branch 'upstream/master'
deflaux May 5, 2015
f2b912e
check for empty bucket or file name
jean-philippe-martin May 5, 2015
f29ac44
m
jean-philippe-martin May 6, 2015
eaec01e
Improve readability of predicates.
deflaux May 6, 2015
d649a65
Merge pull request #64 from googlegenomics/RemoveGCSFilename
deflaux May 6, 2015
46d5caa
Merge remote-tracking branch 'upstream/master'
deflaux May 6, 2015
6572cda
Merge pull request #63 from deflaux/master
deflaux May 6, 2015
af2f102
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux May 6, 2015
159df22
[maven-release-plugin] prepare for next development iteration
deflaux May 6, 2015
bc7f3f2
update to Dataflow SDK 0.4.150414
jean-philippe-martin May 6, 2015
6843aaf
Merge pull request #67 from googlegenomics/NewSDKVersion
deflaux May 6, 2015
0dc5649
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux May 7, 2015
e286ebe
[maven-release-plugin] prepare for next development iteration
deflaux May 7, 2015
3b0a6f2
first integration test for CountReads
jean-philippe-martin May 7, 2015
722762a
Merge pull request #68 from googlegenomics/CountReadsTest
deflaux May 7, 2015
1fb131a
add failsafe plugin
jean-philippe-martin May 8, 2015
6fa0d38
Merge pull request #69 from googlegenomics/CountReadsTest2
deflaux May 8, 2015
30a8780
test for running on cloud.
jean-philippe-martin May 8, 2015
83afed8
project is an env var
jean-philippe-martin May 8, 2015
cb82931
Merge pull request #71 from googlegenomics/CountReadsTest2
jean-philippe-martin May 8, 2015
df520fb
testCloudWithAPI
jean-philippe-martin May 11, 2015
9e59fbd
delete files after use
jean-philippe-martin May 11, 2015
8e3cf64
config change that fixes mvn verify
jean-philippe-martin May 11, 2015
43a94b8
Merge pull request #75 from googlegenomics/Failsafe
deflaux May 12, 2015
ee64f69
testCloudWithAPI
jean-philippe-martin May 12, 2015
55ae81f
Merge pull request #73 from googlegenomics/CountReadsTest3
jean-philippe-martin May 12, 2015
0c25f32
merge
jean-philippe-martin May 12, 2015
107f8c0
Merge branch 'CountReadsCleanup'
jean-philippe-martin May 12, 2015
83709c6
delete files after use
jean-philippe-martin May 12, 2015
7227113
Fixing BAM sharding bugs
iliat May 13, 2015
57f2888
type change in a finalize method
iliat May 14, 2015
bddb143
Merge pull request #76 from iliat/count-reads
iliat May 14, 2015
1e8a9a4
Update UCSC transcript set to reimported version
calbach May 18, 2015
95ddaf1
added ValidationStringency option to BAM reader
jean-philippe-martin May 19, 2015
58fa8b3
Merge pull request #82 from googlegenomics/Stringency
jean-philippe-martin May 19, 2015
247411f
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
jean-philippe-martin May 21, 2015
2066566
[maven-release-plugin] prepare for next development iteration
jean-philippe-martin May 22, 2015
855d018
Add integration tests using the NA12877_S1 dataset.
jakeakopp May 22, 2015
a7c4c74
More unit tests for CountReads and BAM functionality
May 24, 2015
08c161b
Merge pull request #85 from iliat/master
iliat May 26, 2015
e3006ad
Merge pull request #86 from jakeakopp/integration-test
deflaux May 27, 2015
1456031
Removing the special case genomicsSecretsFile in favor of secretsFile…
tovanadler May 27, 2015
cb5a245
Remove one extra newline from previous commit.
tovanadler May 27, 2015
ea7f487
Adding and updated jar and removing count_reads.sh after removing
tovanadler May 28, 2015
d4dfa70
fix bug when working across projects
jean-philippe-martin May 29, 2015
7ba9bee
Merge pull request #87 from tovanadler/master
deflaux May 29, 2015
f0b7f1c
Merge pull request #88 from googlegenomics/fix_gcs_auth
jean-philippe-martin May 29, 2015
9b1b7e7
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
jean-philippe-martin May 29, 2015
25b5f51
[maven-release-plugin] prepare for next development iteration
jean-philippe-martin May 29, 2015
45ce54b
Added the CalculateCoverage pipeline/tests.
Jun 3, 2015
5e841d3
Merge pull request #90 from googlegenomics/coveragePipeline
Jun 3, 2015
d44bccd
Moving the PosRgsMq file from pipelines directory to model directory.
Jun 3, 2015
3b49d89
Merge pull request #91 from googlegenomics/movePosRgsMq
deflaux Jun 3, 2015
49398f4
class doc for ReadReader
jean-philippe-martin Jun 3, 2015
73eaa37
Merge pull request #92 from googlegenomics/docReadReader
jean-philippe-martin Jun 4, 2015
8783939
Minor documentation tweaks.
deflaux Jun 4, 2015
6058e28
Update maven to create fat jar in the package phase.
deflaux Jun 4, 2015
a542111
Alert code readers to the relevant cookbook entries.
deflaux Jun 4, 2015
a16affb
Merge pull request #93 from deflaux/master
deflaux Jun 4, 2015
ddf6a0f
Merge pull request #94 from deflaux/bundle
mbookman Jun 5, 2015
33956fd
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Jun 8, 2015
9dd8c4d
[maven-release-plugin] prepare for next development iteration
deflaux Jun 8, 2015
fdbdf7a
Merge pull request #96 from googlegenomics/automate-bundle
deflaux Jun 10, 2015
342cf78
Fix jar name.
deflaux Jun 10, 2015
f50a32a
Adding gRPC integration into CalculateCoverage pipeline
Jun 11, 2015
eea898d
Remove whitespace
Jun 11, 2015
b27ecf8
Merge pull request #97 from Careyjmac/master
Jun 11, 2015
91a8140
Adding classes to assist with streaming from gRPC
Jun 17, 2015
c267856
Updating pom.xml
Jun 19, 2015
fdac679
Bump utils-java version for latest GRPC.
deflaux Jun 22, 2015
509bf1a
Merge pull request #99 from Careyjmac/master
Jun 22, 2015
760637c
mention gcloud auth login
jean-philippe-martin Jun 26, 2015
3f88836
Merge pull request #103 from jean-philippe-martin/gcloud_login
jean-philippe-martin Jun 26, 2015
db0b66e
Merge remote-tracking branch 'upstream/master'
deflaux Jun 29, 2015
0f31ead
Use Dataflow's oauth flow.
deflaux Jun 30, 2015
9fc2bdd
The SamRecord can have attributes that are byte[]. This results in in…
davidadamsphd Jun 26, 2015
4ab8db4
Merge pull request #105 from googlegenomics/ReadConverterAttributeFix
davidadamsphd Jun 30, 2015
c918581
Updating streamers
Jul 1, 2015
19c492e
Have shard size be a constant
Jul 1, 2015
fc1fda5
Revert hard-coded appname.
deflaux Jul 1, 2015
815779d
Merge remote-tracking branch 'upstream/master'
deflaux Jul 1, 2015
3607624
Merge pull request #104 from deflaux/master
deflaux Jul 1, 2015
9ff458a
Removing ReadConverter as it's functionality is now in utils-java.
davidadamsphd Jul 2, 2015
42d8a5e
Merge pull request #106 from Careyjmac/master
Jul 10, 2015
d84ccef
Merge pull request #108 from googlegenomics/UpdateReadConversions
davidadamsphd Jul 10, 2015
bdf1875
Adding option to read unmapped reads as expected by GATK/Picard
iliat Jul 2, 2015
a0a9765
Merge pull request #107 from iliat/bam
iliat Jul 17, 2015
4dbe7b8
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Jul 17, 2015
ad95847
Revert "[maven-release-plugin] prepare release google-genomics-datafl…
deflaux Jul 17, 2015
104e65e
[maven-release-plugin] prepare for next development iteration
deflaux Jul 17, 2015
3838b84
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Jul 17, 2015
d4628c0
[maven-release-plugin] prepare for next development iteration
deflaux Jul 17, 2015
a6c2200
Update travis to check against java8.
deflaux Jul 24, 2015
4c1bdb7
Fix javadoc lint.
deflaux Jul 24, 2015
56e63b4
Adding ReadUtils and updating VariantUtils for upcoming VerifyBamId p…
Jul 24, 2015
411f260
Merge pull request #110 from deflaux/fix-javadoc
Jul 24, 2015
bc1f5c5
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Jul 24, 2015
74c6014
[maven-release-plugin] prepare for next development iteration
deflaux Jul 24, 2015
24e1007
Bump DataflowSDK version.
deflaux Jul 27, 2015
9da2b73
Merge pull request #112 from deflaux/master
deflaux Jul 28, 2015
9433f18
Bump DataflowSDK version.
deflaux Jul 28, 2015
957c21b
Merge branch 'master' of https://github.com/Careyjmac/dataflow-java i…
deflaux Jul 28, 2015
affee08
Merge branch 'Careyjmac-master'
deflaux Jul 28, 2015
61be3ba
Merge remote-tracking branch 'upstream/master'
deflaux Jul 29, 2015
393dc31
Add Apache license header.
deflaux Jul 29, 2015
5d0650c
Merge pull request #113 from deflaux/master
deflaux Jul 29, 2015
ef59009
Add VariantSimilarity integration test.
deflaux Jul 29, 2015
226d7c8
Update expected result to reflect stable sort on indices.
deflaux Jul 30, 2015
5f75155
Switch from string to integer callSet indices.
deflaux Jul 30, 2015
3a88854
Merge pull request #115 from deflaux/master
deflaux Jul 31, 2015
64ffb47
Fix package paths.
deflaux Jul 29, 2015
ab4f48a
Refactor sharding into utils-java.
deflaux Jul 29, 2015
aca8bfb
More refactoring of sharding into utils-java.
deflaux Aug 3, 2015
a9a8503
Update VariantSimilarity pipeline to optionally use streaming.
deflaux Aug 3, 2015
2047ca2
Merge pull request #120 from deflaux/refactor-redo
deflaux Aug 3, 2015
38a2b5a
Remove obsolete option.
deflaux Aug 3, 2015
62ee355
Merge pull request #121 from deflaux/refactor-redo
deflaux Aug 3, 2015
9f586bc
Adding VerifyBamId pipeline
Aug 4, 2015
653eb91
Adding math library to pom.xml
Aug 4, 2015
bfbc797
Fixing compile issue
Aug 4, 2015
28fb4b5
Merge pull request #123 from Careyjmac/master
Aug 6, 2015
c1cc742
Refactor streamers to utils-java for reuse by spark.
deflaux Aug 6, 2015
29b306d
Merge pull request #124 from deflaux/refactor-redo
deflaux Aug 7, 2015
502a1bd
mention mvn install -DskipITs
jean-philippe-martin Aug 7, 2015
11b78c0
Merge pull request #125 from jean-philippe-martin/mention_skiptests
jean-philippe-martin Aug 7, 2015
88ac703
Moved sharding outside DF pipeline to prevent erroneous fusing
iliat Aug 7, 2015
61a2a1f
Merge pull request #126 from iliat/fixing-sharding
iliat Aug 7, 2015
7b584eb
Sharded BAM Writer
iliat Jul 20, 2015
4c855ab
Merge pull request #127 from iliat/sharded-bam-writer
iliat Aug 8, 2015
c4857d8
Shard boundary semantics now supported for gRPC.
deflaux Aug 8, 2015
e7fab16
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
jean-philippe-martin Aug 10, 2015
a3d2603
[maven-release-plugin] prepare for next development iteration
jean-philippe-martin Aug 10, 2015
20dc2a1
Merge pull request #130 from deflaux/refactor-redo
deflaux Aug 10, 2015
c73bb7a
Update to release of utils-java.
deflaux Aug 11, 2015
cc4dd97
Merge pull request #131 from googlegenomics/refactor-redo
deflaux Aug 11, 2015
b1f00ae
stop registerGenomicsCoders from filling up the screen with log messages
jean-philippe-martin Aug 11, 2015
b695ce0
Merge pull request #132 from jean-philippe-martin/log_less
jean-philippe-martin Aug 11, 2015
c2f1e0f
Revert back to string indices for variant similarity.
deflaux Aug 13, 2015
3028a83
Merge pull request #134 from deflaux/master
deflaux Aug 13, 2015
7ea0676
Bump utils-java version.
deflaux Aug 13, 2015
62e88ff
Merge pull request #136 from deflaux/master
ssgross Aug 18, 2015
1531321
Sharded BAM Writer, merged from dev.branch
iliat Jul 20, 2015
f17bbbc
Merge pull request #140 from iliat/sharded-bam-writer-merge
iliat Sep 11, 2015
b318a5a
Bump utils-java.
deflaux Sep 21, 2015
49fb9b1
Merge pull request #144 from deflaux/bump-utils-java
Sep 21, 2015
b7b2d9b
Break out ShardedBAMWriting.java into multiple files.
jakeakopp Sep 16, 2015
4883074
Upgrade to dataflows 1.1.0 and fix all tests broken by the change.
jakeakopp Sep 29, 2015
9203f61
Merge pull request #145 from jakeakopp/upgrade-sdk
jakeakopp Sep 30, 2015
efd10ba
Merge pull request #143 from jakeakopp/sharded-bam-writer
jakeakopp Sep 30, 2015
40c8189
Use a newer version of maven-jar-plugin.
deflaux Oct 6, 2015
0022af3
Minor optimization to skip cloning of non-variant segments.
deflaux Oct 9, 2015
b8f53d8
Merge pull request #146 from deflaux/master
deflaux Oct 13, 2015
085fce0
Refactor read sharding to support streaming of a potentially large nu…
deflaux Oct 13, 2015
7d04f3d
Fix formatting.
deflaux Oct 14, 2015
d7a5396
Merge pull request #147 from deflaux/master
deflaux Oct 16, 2015
f602016
Add a v1 version of JoinNonVariantSegmentsWithVariants.
deflaux Oct 22, 2015
2b920e5
Merge pull request #148 from deflaux/master
deflaux Oct 23, 2015
bd799dd
Fix an off-by-one error in the bam Sharder and bam Reader.
jakeakopp Oct 23, 2015
fea57d6
Merge pull request #150 from jakeakopp/tweak-expecation
jakeakopp Nov 2, 2015
849e78d
Simplify input to this transform.
deflaux Oct 31, 2015
f2dbade
Merge pull request #151 from deflaux/master
deflaux Nov 2, 2015
098fa72
Bump utils-java.
deflaux Nov 13, 2015
3d9b038
Merge pull request #152 from deflaux/master
deflaux Nov 14, 2015
a8d9337
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Nov 16, 2015
dab162d
[maven-release-plugin] prepare for next development iteration
deflaux Nov 16, 2015
b57d023
Bump dataflow and genomics v1beta2 dependencies.
deflaux Nov 20, 2015
4f0678e
Fallback to GenericJsonCoder provider.
deflaux Nov 21, 2015
d7490d4
Correct the details in the error message.
deflaux Nov 23, 2015
e294e0e
Merge pull request #153 from deflaux/master
deflaux Nov 24, 2015
50f5da3
Merge remote-tracking branch 'upstream/verifyBamId'
deflaux Nov 30, 2015
16ced46
Update VerifyBamId for recent code changes.
deflaux Dec 2, 2015
1eaa70d
Fix command line parameter description.
deflaux Dec 3, 2015
930895c
Merge pull request #154 from deflaux/master
deflaux Dec 3, 2015
a2054e9
Bump dataflow version.
deflaux Dec 9, 2015
1b6dc6a
Use Proto2Coder instead of SerializableCoder.
deflaux Dec 9, 2015
7a2fe79
Remove obsolete includes.
deflaux Dec 4, 2015
86faeeb
Switch VerifyBamId to v1 Position.
deflaux Dec 9, 2015
77f5a1e
Merge pull request #157 from deflaux/master
deflaux Dec 10, 2015
8346f31
Update IdentityByState pipeline to gRPC.
deflaux Nov 25, 2015
53321f2
Refactor pipeline auth.
deflaux Dec 9, 2015
4c16c16
Fix whitespace.
deflaux Dec 11, 2015
b177826
Add alpn jars.
deflaux Dec 11, 2015
1336b21
Merge branch 'master' of github.com:deflaux/dataflow-java
deflaux Dec 11, 2015
7c7742e
Merge pull request #159 from deflaux/master
deflaux Dec 11, 2015
00e9396
Refactor options.
deflaux Dec 15, 2015
2b3ebbd
Fix all local variables hiding others.
deflaux Dec 16, 2015
0320af9
Update instructions.
deflaux Dec 17, 2015
1463b59
Merge pull request #160 from deflaux/master
deflaux Dec 17, 2015
89f52dd
[maven-release-plugin] prepare release google-genomics-dataflow-v1bet…
deflaux Dec 17, 2015
3599f15
[maven-release-plugin] prepare for next development iteration
deflaux Dec 17, 2015
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
client_secrets*.json
.idea
*.iml
target
.classpath
.project
.settings
bin
*~
lib/bwa-0.7.9a
.metadata
*TestPipeline.java
.Rproj.user
11 changes: 11 additions & 0 deletions .travis.yml
@@ -0,0 +1,11 @@
language: java

jdk:
- oraclejdk8
- oraclejdk7
- openjdk7

script: mvn test javadoc:javadoc

after_success:
- mvn clean cobertura:cobertura -Dcobertura.report.format=xml coveralls:cobertura
75 changes: 75 additions & 0 deletions CONTRIBUTING.rst
@@ -0,0 +1,75 @@
How to contribute
===================================

First of all, thank you for contributing!

The mailing list
----------------

For general questions or if you are having trouble getting started, try the
`Google Genomics Discuss mailing list <https://groups.google.com/forum/#!forum/google-genomics-discuss>`_.
It's a good way to sync up with other people who use googlegenomics, including the core developers. You can subscribe
by sending an email to ``[email protected]`` or just post using
the `web forum page <https://groups.google.com/forum/#!forum/google-genomics-discuss>`_.


Local development
-----------------

With Maven you can locally install a SNAPSHOT version of the code, to use from other projects
directly without having to wait for the Maven repository. Use:

``mvn install``

to run the full test suite and do a local install. You can also use

``mvn install -DskipITs``

to run only the unit tests and do a local install; skipping the integration tests makes this faster.

Submitting issues
-----------------

If you are encountering a bug in the code or have a feature request in mind - file away!


Submitting a pull request
-------------------------

If you are ready to contribute code, GitHub provides a nice `overview on how to create a pull request
<https://help.github.com/articles/creating-a-pull-request>`_.

Some general rules to follow:

* Do your work in `a fork <https://help.github.com/articles/fork-a-repo>`_ of this repo.
* Create a branch for each update that you're working on.
  These branches are often called "feature" or "topic" branches. Any changes
  that you push to your feature branch will automatically be shown in the pull request.
* Keep your pull requests as small as possible. Large pull requests are hard to review.
  Try to break up your changes into self-contained and incremental pull requests.
* The first line of commit messages should be a short (<80 character) summary,
  followed by an empty line and then any details that you want to share about the commit.
* Please try to follow the existing syntax style
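
The commit message rule above can be checked mechanically. The following is a minimal sketch, not part of this repository: the class ``CommitMessageCheck`` and its method ``isWellFormed`` are hypothetical names, and the sketch reads "<80 character" as "at most 79 characters".

```java
/** Hypothetical helper for illustration; not part of this repository. */
public class CommitMessageCheck {

  /** Assumption: "<80 character" is read as at most 79 characters. */
  public static final int MAX_SUMMARY_LENGTH = 79;

  /**
   * Returns true if the first line is a non-empty summary of at most
   * MAX_SUMMARY_LENGTH characters and, when more text follows, the
   * second line is empty.
   */
  public static boolean isWellFormed(String message) {
    if (message == null) {
      return false;
    }
    // Normalize Windows line endings, then keep trailing empty lines.
    String[] lines = message.replace("\r\n", "\n").split("\n", -1);
    String summary = lines[0];
    if (summary.trim().isEmpty() || summary.length() > MAX_SUMMARY_LENGTH) {
      return false;
    }
    // A summary-only message is acceptable; otherwise line two must be blank.
    return lines.length == 1 || lines[1].isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("Fix jar name.\n\nThe runnable jar was misnamed."));
    System.out.println(isWellFormed("Fix jar name.\nNo blank line before details."));
  }
}
```

A check like this could be wired into a local ``commit-msg`` hook, but that wiring is left out here because it depends on your own setup.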

When you submit or change your pull request, the Travis build system will automatically run tests.
If your pull request fails to pass tests, review the test log, make changes and
then push them to your feature branch to be tested again.


Contributor License Agreements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All pull requests are welcome. Before we can accept them, though, there is a legal hurdle we have to clear.
You'll need to fill out either the individual or corporate Contributor License Agreement
(CLA).

* If you are an individual writing original source code and you're sure you
  own the intellectual property, then you'll need to sign an `individual CLA
  <https://developers.google.com/open-source/cla/individual>`_.
* If you work for a company that wants to allow you to contribute your work,
  then you'll need to sign a `corporate CLA
  <https://developers.google.com/open-source/cla/corporate>`_.

Follow either of the two links above to access the appropriate CLA and
instructions for how to sign and return it. Once we receive it, we'll be able to
accept your pull requests.
341 changes: 341 additions & 0 deletions EclipseCodeFormat.xml

Large diffs are not rendered by default.

2 changes: 0 additions & 2 deletions README.md

This file was deleted.

146 changes: 146 additions & 0 deletions README.rst
@@ -0,0 +1,146 @@
===============================================
dataflow-java |Build Status|_ |Build Coverage|_
===============================================

.. |Build Status| image:: http://img.shields.io/travis/googlegenomics/dataflow-java.svg?style=flat
.. _Build Status: https://travis-ci.org/googlegenomics/dataflow-java

.. |Build Coverage| image:: http://img.shields.io/coveralls/googlegenomics/dataflow-java.svg?style=flat
.. _Build Coverage: https://coveralls.io/r/googlegenomics/dataflow-java?branch=master

If you are ready to start coding, take a look at the information below. But if you are
looking for a task-oriented list (e.g., `How do I compute principal coordinate analysis
with Google Genomics? <http://googlegenomics.readthedocs.org/en/latest/use_cases/compute_principal_coordinate_analysis/index.html>`_),
a better place to start is the `Google Genomics Cookbook <http://googlegenomics.readthedocs.org/en/latest/index.html>`_.

Getting started
---------------

#. First, ``git clone`` this repository.

#. If you have not already done so, follow the Google Genomics `getting started instructions <https://cloud.google.com/genomics/install-genomics-tools>`_ to set up your environment including `installing gcloud <https://cloud.google.com/sdk/>`_ and running ``gcloud init``.

#. If you have not already done so, follow the Google Cloud Dataflow `getting started instructions <https://cloud.google.com/dataflow/getting-started>`_ to set up your environment for Dataflow.

#. This project now includes code for calling the Genomics API using `gRPC <http://www.grpc.io>`_. To use gRPC, you'll need a version of ALPN that matches your JRE version.

   #. See the `ALPN documentation <http://www.eclipse.org/jetty/documentation/9.2.10.v20150310/alpn-chapter.html>`_ for a table of which ALPN jar to use for your JRE version.
   #. Then download the correct version from `here <http://mvnrepository.com/artifact/org.mortbay.jetty.alpn/alpn-boot>`_.
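
Because the ALPN jar has to match the JRE, it helps to confirm which JRE you are actually running before consulting the table. Running ``java -version`` is usually enough. The sketch below does the same from code using the standard ``java.version`` system property; the class name ``ShowJreVersion`` is hypothetical and not part of this repository.

```java
/** Hypothetical helper for illustration; not part of this repository. */
public class ShowJreVersion {

  /** Returns the version string of the JRE that is running this code. */
  public static String jreVersion() {
    return System.getProperty("java.version");
  }

  public static void main(String[] args) {
    // Look this value up in the ALPN documentation table to choose a jar.
    System.out.println("java.version = " + jreVersion());
  }
}
```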

Local Run
---------
To use this code, build the client using `Apache Maven`_::

  cd dataflow-java
  mvn package

Then you can run a pipeline locally with the command line, passing in the Project ID and Google Cloud Storage bucket you made in the first step. This command runs the VariantSimilarity pipeline (which runs PCoA on a dataset)::

  java -Xbootclasspath/p:/YOUR/PATH/TO/alpn-boot-YOUR-VERSION.jar \
    -cp target/google-genomics-dataflow*-runnable.jar \
    com.google.cloud.genomics.dataflow.pipelines.VariantSimilarity \
    --variantSetId=3049512673186936334 \
    --references=chr17:41196311:41277499 \
    --output=gs://your-bucket/output/localtest.txt
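
The ``--references`` value in the command above looks like a reference name followed by a start and an end position, separated by colons. The sketch below parses one value of that shape. It is an illustration only: the class ``ReferenceRange`` is hypothetical, the pipelines' actual parsing may differ, and this README does not say whether positions are 0-based or whether the end is exclusive, so the code only checks that ``start <= end``.

```java
/** Hypothetical value class for illustration; not part of this repository. */
public final class ReferenceRange {
  public final String name;
  public final long start;
  public final long end;

  private ReferenceRange(String name, long start, long end) {
    this.name = name;
    this.start = start;
    this.end = end;
  }

  /** Parses a string of the assumed form name:start:end. */
  public static ReferenceRange parse(String spec) {
    String[] parts = spec.split(":");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected name:start:end but got: " + spec);
    }
    String name = parts[0].trim();
    // Long.parseLong throws NumberFormatException, a kind of
    // IllegalArgumentException, when a position is not a number.
    long start = Long.parseLong(parts[1].trim());
    long end = Long.parseLong(parts[2].trim());
    if (name.isEmpty() || start < 0 || end < start) {
      throw new IllegalArgumentException("Invalid range: " + spec);
    }
    return new ReferenceRange(name, start, end);
  }

  public static void main(String[] args) {
    ReferenceRange range = parse("chr17:41196311:41277499");
    System.out.println(range.name + " from " + range.start + " to " + range.end);
  }
}
```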

Run on Google Compute Engine
----------------------------
To deploy your pipeline (which runs on Google Compute Engine), ALPN is no longer needed but some additional command line arguments are required::

  java -cp target/google-genomics-dataflow*-runnable.jar \
    com.google.cloud.genomics.dataflow.pipelines.VariantSimilarity \
    --project=your-project-id \
    --variantSetId=3049512673186936334 \
    --references=chr17:41196311:41277499 \
    --output=gs://your-bucket/output/test.txt \
    --runner=BlockingDataflowPipelineRunner \
    --stagingLocation=gs://your-bucket/staging \
    --numWorkers=1

**See the** `Google Genomics Cookbook <http://googlegenomics.readthedocs.org/>`_ **for more sample command lines for the various pipelines.**

.. _Apache Maven: http://maven.apache.org/download.cgi

Command Line Options
--------------------

Use ``--help`` to get more information about the command line options. Change
the pipeline class name below to match the one you would like to run::

  java -cp google-genomics-dataflow*-runnable.jar \
    com.google.cloud.genomics.dataflow.pipelines.VariantSimilarity --help

Code layout
-----------

The `Main code directory </src/main/java/com/google/cloud/genomics/dataflow>`_
contains several useful utilities:

coders:
  includes ``Coder`` classes that are useful for Genomics pipelines. ``GenericJsonCoder``
  can be used with any of the Java client library classes (like ``Read``, ``Variant``, etc.).

functions:
  contains common DoFns that can be reused as part of any pipeline.
  ``OutputPCoAFile`` is an example of a complex ``PTransform`` that provides a useful common analysis.

pipelines:
  contains example pipelines which demonstrate how Google Cloud Dataflow can work with Google Genomics.

  * ``VariantSimilarity`` runs a principal coordinates analysis over a dataset containing variants, and
    writes a file of graph results that can be easily displayed by Google Sheets.

  * ``IdentityByState`` runs IBS over a dataset containing variants. See the `results/ibs <results/ibs>`_
    directory for more information on how to use the pipeline's results.

  * and several others!

readers:
  contains functions that perform API calls to read data from the Genomics API.

utils:
  contains utilities for running Dataflow workflows against the Genomics API.

  * ``DataflowWorkarounds``
    contains workarounds needed to use the Google Cloud Dataflow APIs.

  * ``GenomicsOptions`` and ``GenomicsDatasetOptions``:
    extend these classes for your command line options to take advantage of common command
    line functionality.



Maven artifact
--------------
This code is also deployed as Maven artifacts through Sonatype, including both a normal jar and a runnable jar containing all dependencies (a fat jar). The
`utils-java readme <https://github.com/googlegenomics/utils-java#releasing-new-versions>`_
has detailed instructions on how to deploy new versions.

To depend on this code, add the following to your ``pom.xml`` file::

  <project>
    <dependencies>
      <dependency>
        <groupId>com.google.cloud.genomics</groupId>
        <artifactId>google-genomics-dataflow</artifactId>
        <version>LATEST</version>
      </dependency>
    </dependencies>
  </project>

You can find the latest version in
`Maven's central repository <https://search.maven.org/#search%7Cga%7C1%7Ca%3A%22google-genomics-dataflow%22>`_.

For an example pipeline that depends on this code in another GitHub repository, see https://github.com/googlegenomics/codelabs/tree/master/Java/PlatinumGenomes-variant-transformation.

Project status
--------------

Goals
~~~~~
* Provide a Maven artifact which makes it easier to use Google Genomics within Google Cloud Dataflow.
* Provide some example pipelines which demonstrate how Dataflow can be used to analyze Genomics data.

Current status
~~~~~~~~~~~~~~
This code is in active development. See the GitHub issues for more detail.
Binary file added lib/alpn-boot-7.1.3.v20150130.jar
Binary file not shown.
Binary file added lib/alpn-boot-8.1.3.v20150130.jar
Binary file not shown.