
Commit

(1) Update binary version to 0.7.1. Re-run all case studies and updated information.

Note that previously there was a typo in the WGS case study -- the total time should have read 309m 53s instead of 209m 53s.
(2) Remove BIN_VERSION from the *_binaries.sh scripts because it was unused.
(3) Update a few instructions in the training case study.

PiperOrigin-RevId: 220115965
pichuan committed Nov 6, 2018
1 parent 2040913 commit aba5553
Showing 10 changed files with 52 additions and 72 deletions.
12 changes: 6 additions & 6 deletions docs/deepvariant-case-study.md
@@ -138,12 +138,12 @@ fewer cores for this step.
## Resources used by each step

Step | wall time
---------------------------------- | ------------------
`make_examples` | 113m 12s
`call_variants` | 176m 30s
`postprocess_variants` (no gVCF) | 20m 11s
`postprocess_variants` (with gVCF) | 51m 25s
total time (single machine) | 209m 53s - 341m 7s
---------------------------------- | -------------------
`make_examples` | 113m 19s
`call_variants` | 181m 40s
`postprocess_variants` (no gVCF) | 20m 40s
`postprocess_variants` (with gVCF) | 54m 49s
total time (single machine) | 315m 39s - 349m 48s
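The two ends of the total-time range appear to be the no-gVCF and with-gVCF sums of the step timings above; a quick sanity check of that reading (a sketch, using the new timings):

```
# no gVCF: make_examples + call_variants + postprocess_variants (no gVCF)
echo $(( (113*60+19) + (181*60+40) + (20*60+40) ))  # 18939 s = 315m 39s
# with gVCF: make_examples + call_variants + postprocess_variants (with gVCF)
echo $(( (113*60+19) + (181*60+40) + (54*60+49) ))  # 20988 s = 349m 48s
```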

## Variant call quality

12 changes: 6 additions & 6 deletions docs/deepvariant-exome-case-study.md
@@ -91,12 +91,12 @@ More discussion can be found in the
## Resources used by each step

Step | wall time
---------------------------------- | ---------
`make_examples` | 13m 38s
`call_variants` | 1m 55s
`postprocess_variants` (no gVCF) | 0m 12s
`postprocess_variants` (with gVCF) | 1m 17s
total time (single machine) | ~17m
---------------------------------- | -----------------
`make_examples` | 13m 39s
`call_variants` | 2m 0s
`postprocess_variants` (no gVCF) | 0m 13s
`postprocess_variants` (with gVCF) | 1m 18s
total time (single machine) | 15m 52s - 17m 10s

## Variant call quality

2 changes: 1 addition & 1 deletion docs/deepvariant-quick-start.md
@@ -54,7 +54,7 @@ Before you start running, you need to have the following input files:
1. A model checkpoint for DeepVariant. We'll refer to this as `${MODEL}` below.

```bash
BIN_VERSION="0.7.0"
BIN_VERSION="0.7.1"
MODEL_VERSION="0.7.0"

MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wgs_standard"
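For context, `BIN_VERSION` pins which DeepVariant release the rest of the instructions fetch; for example, the training case study below uses it to pull the matching Docker image:

```
sudo docker pull gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}"
```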
92 changes: 37 additions & 55 deletions docs/deepvariant-tpu-training-case-study.md
@@ -39,7 +39,7 @@ YOUR_PROJECT=REPLACE_WITH_YOUR_PROJECT
OUTPUT_GCS_BUCKET=REPLACE_WITH_YOUR_GCS_BUCKET
BUCKET="gs://deepvariant"
BIN_VERSION="0.7.0"
BIN_VERSION="0.7.1"
MODEL_VERSION="0.7.0"
MODEL_BUCKET="${BUCKET}/models/DeepVariant/${MODEL_VERSION}/DeepVariant-inception_v3-${MODEL_VERSION}+data-wgs_standard"
@@ -142,7 +142,7 @@ sudo docker pull gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}"
) >"${LOG_DIR}/training_set.with_label.make_examples.log" 2>&1
```

This took 107m50.530s. We will want to shuffle this on Dataflow later, so I will
This took 107m8.521s. We will want to shuffle this on Dataflow later, so I will
copy it to GCS bucket first:

```
@@ -170,7 +170,7 @@ gsutil -m cp ${OUTPUT_DIR}/training_set.with_label.tfrecord-?????-of-00064.gz \
) >"${LOG_DIR}/validation_set.with_label.make_examples.log" 2>&1
```

This took: 8m10.566s.
This took: 8m49.066s.

Validation set is small here. We will just shuffle locally later, so no need to
copy to our GCS bucket.
@@ -193,7 +193,7 @@ copy to our GCS bucket.
) >"${LOG_DIR}/test_set.no_label.make_examples.log" 2>&1
```

This took: 2m17.151s.
This took: 2m14.576s.

We don't need to shuffle the test set. It will eventually be used in the final
evaluation with `hap.py` on the whole set.
@@ -226,36 +226,22 @@ Here is an example. You might or might not need to install everything below:

```
sudo apt -y install python-dev python-pip
pip install --upgrade pip
pip install --user --upgrade virtualenv
```

A virtual environment is a directory tree containing its own Python
distribution. To create a virtual environment, create a directory and run:
# This will make sure the pip command is bound to python2
python2 -m pip install --user --upgrade --force-reinstall pip
export PATH="$HOME/.local/bin:$PATH"
```
virtualenv ${HOME}/virtualenv_beam
```

A virtual environment needs to be activated for each shell that is to use it.
Activating it sets some environment variables that point to the virtual
environment's directories.

To activate a virtual environment in Bash, run:
Install Beam:

```
. ${HOME}/virtualenv_beam/bin/activate
```

Once this is activated, install Beam:

```
pip install apache-beam
pip install --user apache-beam
```
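With pip now bound to python2 and Beam installed into the user site (no virtualenv), a quick way to confirm the install is visible to the interpreter (a sketch; assumes `apache_beam` exposes `__version__`):

```
python2 -c "import apache_beam; print(apache_beam.__version__)"
```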

Get the code that shuffles:

```
mkdir -p ${SHUFFLE_SCRIPT_DIR}
wget https://raw.githubusercontent.com/google/deepvariant/r0.7/tools/shuffle_tfrecords_beam.py -O ${SHUFFLE_SCRIPT_DIR}/shuffle_tfrecords_beam.py
```

@@ -280,7 +266,7 @@ Output is in

Data config file is in `${OUTPUT_DIR}/validation_set.dataset_config.pbtxt`.

This took 11m15.090s.
This took 10m40.558s.

The training set is too large to run with DirectRunner on this instance, so we
use the DataflowRunner. Before that, please make sure you enable
@@ -290,7 +276,7 @@ http://console.cloud.google.com/flows/enableapi?apiid=dataflow.
Then, install Dataflow:

```
pip install google-cloud-dataflow
pip install --user google-cloud-dataflow
```

Shuffle using Dataflow.
@@ -320,13 +306,7 @@ In order to have the best performance, you might need extra resources such as
machines or IPs within a region. That is beyond the scope of this case study.

My run took about 38m3.435s on Dataflow.

After this is done, deactivate the virtualenv:

```
deactivate
```
My run took about 40m28.401s on Dataflow.

The output path can be found in the dataset_config file by:
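For example, something like this works (a sketch; the exact config file name and location are assumptions and depend on how the shuffle job was invoked):

```
# If the config was written locally:
cat "${OUTPUT_DIR}/training_set.dataset_config.pbtxt"
# If the Dataflow job wrote it to GCS:
gsutil cat "${OUTPUT_GCS_BUCKET}/training_set.dataset_config.pbtxt"
```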

@@ -345,11 +325,12 @@ In the output, the `tfrecord_path` should be valid paths in gs://.
name: "HG001"
tfrecord_path: "YOUR_GCS_BUCKET/training_set.with_label.shuffled-?????-of-?????.tfrecord.gz"
num_examples: 3890596
num_examples: 3857898
```

In my run, it wrote to 341 shards:
`${OUTPUT_BUCKET}/training_set.with_label.shuffled-?????-of-00341.tfrecord.gz`
`${OUTPUT_BUCKET}/training_set.with_label.shuffled-?????-of-00364.tfrecord.gz`

### Start a Cloud TPU

@@ -366,17 +347,18 @@ Here is what I did to start a TPU.
First, check all existing TPUs by running this command:

```
gcloud beta compute tpus list --zone=us-central1-f
gcloud compute tpus list --zone=us-central1-f
```

In my case, I don't see any existing TPUs.

Then, I ran the following command to start a TPU:

```
time gcloud beta compute tpus create ${USER}-demo-tpu \
--range=10.240.2.0/29 \
--version=1.9 \
time gcloud compute tpus create ${USER}-demo-tpu \
--network=default \
--range=10.240.1.0/29 \
--version=1.11 \
--zone=us-central1-f
```

@@ -390,21 +372,21 @@ This command took about 5min to finish.
After the TPU is created, we can query it by:

```
gcloud beta compute tpus list --zone=us-central1-f
gcloud compute tpus list --zone=us-central1-f
```

In my case, I see:

```
NAME ZONE ACCELERATOR_TYPE NETWORK_ENDPOINTS NETWORK RANGE STATUS
pichuan-demo-tpu us-central1-f v2-8 10.240.2.2:8470 default 10.240.2.0/29 READY
NAME ZONE ACCELERATOR_TYPE NETWORK_ENDPOINTS NETWORK RANGE STATUS
pichuan-demo-tpu us-central1-f v2-8 10.240.1.2:8470 default 10.240.1.0/29 READY
```

In this example, I set up these variables:

```
export TPU_NAME="${USER}-demo-tpu"
export TPU_IP="10.240.2.2"
export TPU_IP="10.240.1.2"
```
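Rather than copying the IP out of the list output by hand, it can also be read from the TPU resource, e.g. (a sketch; the field path follows the Cloud TPU API and may need adjusting for your gcloud version):

```
export TPU_IP=$(gcloud compute tpus describe ${TPU_NAME} \
  --zone=us-central1-f \
  --format='value(networkEndpoints[0].ipAddress)')
```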

(One more reminder about
@@ -436,7 +418,7 @@ Pointers for common issues or things you can tune:

1. TPU might not have write access to GCS bucket:

https://cloud.google.com/tpu/docs/storage-buckets#giving_your_product_name_short_access_to_gcs_name_short
https://cloud.google.com/tpu/docs/storage-buckets#storage_access

1. Change `save_interval_secs` to save checkpoints more frequently:

@@ -467,9 +449,9 @@ sudo docker run \
--batch_size=512 > "${LOG_DIR}/eval.log" 2>&1 &
```

`model_eval` will watch the `${TRAINING_DIR}` and start evaluting when there are
newly saved checkpoints. It evaluates the checkpoints on the data specified in
`validation_set.dataset_config.pbtxt`, and saves `*metrics` file to the
`model_eval` will watch the `${TRAINING_DIR}` and start evaluating when there
are newly saved checkpoints. It evaluates the checkpoints on the data specified
in `validation_set.dataset_config.pbtxt`, and saves `*metrics` file to the
directory. These files are used later to pick the best model based on how
accurate they are on the validation set.
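To see which checkpoints have been evaluated so far, you can list the metrics files as they appear, e.g. (a sketch, assuming `${TRAINING_DIR}` is a gs:// path, which Cloud TPU training requires):

```
gsutil ls "${TRAINING_DIR}" | grep metrics
```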

@@ -483,7 +465,7 @@ kill the process after training is no longer producing more checkpoints. And,
command to delete the TPU:

```
gcloud beta compute tpus delete ${TPU_NAME} --zone us-central1-f
gcloud compute tpus delete ${TPU_NAME} --zone us-central1-f
```

### Use TensorBoard to visualize progress
@@ -510,7 +492,7 @@ tensorboard --logdir ${TRAINING_DIR} --port=8080
This gives a message like:

```
TensorBoard 1.9.0 at http://cs-6000-devshell-vm-ddb3cd66-9d0b-4e19-afcc-d4a19ba2ee06:8080 (Press CTRL+C to quit)
TensorBoard 1.11.0 at http://cs-6000-devshell-vm-ec39a769-4665-4f57-bdff-2c9192f44b7e:8080 (Press CTRL+C to quit)
```

But that link is not usable directly. I clicked on the “Web Preview” on the top
@@ -538,7 +520,7 @@ things. In my run, I took these screenshots after the run completed:
When you are done with training, make sure to clean up the TPU:

```
gcloud beta compute tpus delete ${TPU_NAME} --zone us-central1-f
gcloud compute tpus delete ${TPU_NAME} --zone us-central1-f
```

### Pick a model
@@ -562,11 +544,11 @@ python ${SHUFFLE_SCRIPT_DIR}/print_f1.py \
The top line I got was this:

```
43600 96769.0 0.998961331563
44200 96772.0 0.998945823601
```

This means the model checkpoint that performs the best on the validation set is
`${TRAINING_DIR}/model.ckpt-43600`. Based on this result, a few thoughts came
`${TRAINING_DIR}/model.ckpt-44200`. Based on this result, a few thoughts came
to mind:

1. Training more steps didn't seem to help much. Did the training overfit?
@@ -587,7 +569,7 @@ run on CPUs:
/opt/deepvariant/bin/call_variants \
--outfile "${OUTPUT_DIR}/test_set.cvo.tfrecord.gz" \
--examples "${OUTPUT_DIR}/test_set.no_label.tfrecord@${N_SHARDS}.gz" \
--checkpoint "${TRAINING_DIR}/model.ckpt-43600" \
--checkpoint "${TRAINING_DIR}/model.ckpt-44200" \
) >"${LOG_DIR}/test_set.call_variants.log" 2>&1 &
```

@@ -635,8 +617,8 @@ To summarize, the accuracy is:

Type | # FN | # FP | Recall | Precision | F1\_Score
----- | ---- | ---- | -------- | --------- | ---------
INDEL | 225 | 136 | 0.977552 | 0.986827 | 0.982167
SNP | 66 | 49 | 0.999004 | 0.999260 | 0.999132
INDEL | 229 | 141 | 0.977153 | 0.986343 | 0.981726
SNP | 71 | 58 | 0.998928 | 0.999125 | 0.999026
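As a quick consistency check, each F1 score is the harmonic mean of the corresponding precision and recall, e.g. for the new SNP row (a sketch):

```
python -c "p, r = 0.999125, 0.998928; print(2 * p * r / (p + r))"  # ~0.999026
```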

The baseline we're comparing to is to directly use the WGS model (`--checkpoint
${GCS_PRETRAINED_WGS_MODEL}`) to make the calls.
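The numbers above come from comparing the calls on the test set against the truth set with `hap.py`. A minimal sketch of that kind of comparison (the Docker image, in-container path, and the truth/query/reference file names are assumptions here, not the exact command from the case study):

```
sudo docker run -v "${HOME}:${HOME}" pkrusche/hap.py /opt/hap.py/bin/hap.py \
  "${TRUTH_VCF}" \
  "${OUTPUT_DIR}/test_set.vcf.gz" \
  -f "${TRUTH_BED}" \
  -r "${REF}" \
  -o "${OUTPUT_DIR}/happy.output"
```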
Binary file modified docs/images/TensorBoardAccuracy.png
Binary file modified docs/images/TensorBoardSpeed.png
1 change: 0 additions & 1 deletion scripts/run_wes_case_study_binaries.sh
@@ -12,7 +12,6 @@ set -euo pipefail
## Preliminaries
# Set a number of shell variables, to make what follows easier to read.
BASE="${HOME}/exome-case-study"
BIN_VERSION="0.7.0"
MODEL_VERSION="0.7.0"
MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard"
MODEL_HTTP_DIR="https://storage.googleapis.com/deepvariant/models/DeepVariant/${MODEL_VERSION}/${MODEL_NAME}"
2 changes: 1 addition & 1 deletion scripts/run_wes_case_study_docker.sh
@@ -6,7 +6,7 @@ set -euo pipefail
## Preliminaries
# Set a number of shell variables, to make what follows easier to read.
BASE="${HOME}/exome-case-study"
BIN_VERSION="0.7.0"
BIN_VERSION="0.7.1"
MODEL_VERSION="0.7.0"
MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard"
MODEL_HTTP_DIR="https://storage.googleapis.com/deepvariant/models/DeepVariant/${MODEL_VERSION}/${MODEL_NAME}"
1 change: 0 additions & 1 deletion scripts/run_wgs_case_study_binaries.sh
@@ -12,7 +12,6 @@ set -euo pipefail
## Preliminaries
# Set a number of shell variables, to make what follows easier to read.
BASE="${HOME}/case-study"
BIN_VERSION="0.7.0"
MODEL_VERSION="0.7.0"
MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wgs_standard"
MODEL_HTTP_DIR="https://storage.googleapis.com/deepvariant/models/DeepVariant/${MODEL_VERSION}/${MODEL_NAME}"
2 changes: 1 addition & 1 deletion scripts/run_wgs_case_study_docker.sh
@@ -6,7 +6,7 @@ set -euo pipefail
## Preliminaries
# Set a number of shell variables, to make what follows easier to read.
BASE="${HOME}/case-study"
BIN_VERSION="0.7.0"
BIN_VERSION="0.7.1"
MODEL_VERSION="0.7.0"
MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wgs_standard"
MODEL_HTTP_DIR="https://storage.googleapis.com/deepvariant/models/DeepVariant/${MODEL_VERSION}/${MODEL_NAME}"
