Hi,
I am following this notebook (https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb) for classification and running it on Azure Databricks Runtime 7.2 ML (includes Apache Spark 3.0.0, GPU, Scala 2.12). I was able to train a model, but prediction is still taking a very long time even though I am using a 4-GPU cluster. I suspect the cluster is not fully utilized and is in fact running on CPU only. Is there anything I need to change to ensure the GPU cluster is utilized and predictions run in a distributed manner?
I also referred to the Databricks documentation (https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/train-model/tensorflow) and installed the GPU-enabled TensorFlow wheel it mentions:
%pip install https://databricks-prod-cloudfront.cloud.databricks.com/artifacts/tensorflow/runtime-7.x/tensorflow-1.15.3-cp37-cp37m-linux_x86_64.whl
But even after that, print([tf.__version__, tf.test.is_gpu_available()]) still shows False, and there is no improvement in cluster utilization.
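For context, one way to check whether the node's driver actually sees any GPUs at all, independently of TensorFlow, is to query `nvidia-smi` directly; if this lists nothing, no TensorFlow wheel will help. This is a minimal sketch I put together (the parsing helper and its assumed `nvidia-smi -L` output format are my own, not from the notebook):

```python
import subprocess

def parse_gpu_list(smi_output):
    """Parse the output of `nvidia-smi -L`, which prints one line per
    device in the form 'GPU 0: Tesla V100-SXM2-16GB (UUID: ...)'.
    Returns the list of device-name strings."""
    gpus = []
    for line in smi_output.splitlines():
        line = line.strip()
        if line.startswith("GPU "):
            # Keep the name between the first ': ' and the ' (UUID' suffix.
            name = line.split(": ", 1)[1].split(" (UUID", 1)[0]
            gpus.append(name)
    return gpus

def list_gpus():
    """Run nvidia-smi on this node; returns [] if the driver is absent."""
    try:
        out = subprocess.run(["nvidia-smi", "-L"],
                             capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_list(out.stdout)

print(list_gpus())
```

If this lists GPUs on the driver node but `tf.test.is_gpu_available()` still returns False, then presumably the installed wheel is CPU-only or the CUDA libraries are not visible to the Python process.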
Can anyone advise how I can enable full cluster utilization (across the worker nodes) for prediction with my fine-tuned BERT model?
I would really appreciate the help.
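For reference, my understanding is that an `Estimator.predict` loop like the one in the notebook runs only on the driver; to spread scoring across Spark workers I imagine something like `mapPartitions`, where each worker loads the model once and scores its partition in batches. A minimal sketch of the batching side (here `predict_batch` is a hypothetical stand-in for a function that scores a list of texts with the fine-tuned model):

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def score_partition(rows, predict_batch, batch_size=32):
    """Score one partition of rows. `predict_batch` is a placeholder
    (an assumption, not from the notebook) for a function that takes a
    list of inputs and returns one prediction per input. Loading the
    model inside this function, once per partition rather than once per
    row, keeps the per-worker startup cost to a single model load."""
    for batch in batched(rows, batch_size):
        yield from predict_batch(batch)
```

On Spark this would be invoked roughly as `rdd.mapPartitions(lambda rows: score_partition(rows, predict_batch))`, so each worker node scores its own slice of the data.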