Tensorflow 2.6 with GPU aborts R session with reference to MLIR optimization #482

Open
faltinl opened this issue Sep 12, 2021 · 0 comments


faltinl commented Sep 12, 2021

I have installed Tensorflow 2.6 according to the recommendations given in #835, i.e. together with Reticulate and Keras for R, running under RStudio with R 4.0.5. My notebook has an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz processor.

Prior to that, however, I had to update my NVIDIA GeForce RTX 2060 from CUDA 10.1 to CUDA 10.2. This finally succeeded, albeit with numerous problems caused by DLLs that are requested by TF 2.6 but are apparently incomplete in the NVIDIA installation files. Details are given in #577.

A test run under TF2.6 using the Iris toy program shown in #1172 produced the warning message

I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)

at the beginning of the first training epoch, but, despite this warning, the training process continued smoothly with the usual results. Otherwise, during the first run, the GPU produced the usual messages that all required DLLs were found etc., from which I conclude that the GPU is working as it should.

However, if I try to run my own, slightly more elaborate program (4x Conv2D and 2x Dense layers, with x/y dimensions (samples, 5, 5, 1)/(samples, 4) and a very low signal-to-noise ratio), the R session is aborted (!) during the first epoch, right after producing the same MLIR warning shown above. This is extremely inconvenient, as the crash leaves no hints at all about its cause.
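One way to recover some hints after such a crash might be to run the script outside RStudio and keep everything the process prints in a log file, so that the last native messages survive the abort. The following is only a minimal sketch under stated assumptions: a POSIX shell is available (on Windows, for example via Git Bash or WSL), `Rscript` is on the PATH, and `run_logged`, `tf_run.log` and `my_model.R` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of R, Keras or Tensorflow): run a command,
# write its stdout and stderr to a log file, and append the exit status.
run_logged() {
    log_file=$1
    shift
    status=0
    # Redirect both output streams so that messages printed just before
    # an abort are still in the log afterwards.
    "$@" > "$log_file" 2>&1 || status=$?
    echo "exit status: $status" >> "$log_file"
    return "$status"
}
```

With this helper, a call such as `run_logged tf_run.log Rscript my_model.R` would leave the full console output of the run, followed by its exit status, in `tf_run.log`. Whether the log actually contains a useful message depends on what the crashing component prints before it terminates.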

The program worked correctly with the same GPU under CUDA 10.1 and Tensorflow 2.4. In the meantime, as I am unable to find any deficiencies in the whole setup, I have uninstalled the GPU support completely. The MLIR warning still appears, but the program is running flawlessly - though considerably slower, of course 😢.

I have the impression that the control functions surrounding the network part of my program are perhaps not compatible with the MLIR optimization process. I stop training very early to prevent overfitting; depending on the temporal data, runs of just a single epoch are sometimes necessary and occur regularly. This raises the question of whether it is possible to deactivate the MLIR optimization completely.

Any hints are welcome.

faltinl changed the title from "Tensorflow 2.6 with GPU aborts program with reference to MLIR optimization" to "Tensorflow 2.6 with GPU aborts R session with reference to MLIR optimization" on Sep 21, 2021