Tensorflow 2.6 with GPU aborts R session with reference to MLIR optimization #482

faltinl · 2021-09-12T17:18:45Z

I have TensorFlow 2.6 installed according to the recommendations given in #835, i.e. together with reticulate and Keras for R, running under RStudio with R 4.0.5. My notebook has an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz.

Prior to that, however, I had to update my NVIDIA GeForce RTX 2060 from CUDA 10.1 to CUDA 10.2. This finally succeeded, albeit with numerous problems caused by DLLs that TF 2.6 requests but that are apparently missing from, or incomplete in, the NVIDIA installation files. Details are given in #577.

A test run under TF2.6 using the Iris toy program shown in #1172 produced the warning message

I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)

at the beginning of the first training epoch, but, despite this warning, the training process continued smoothly with the usual results. Otherwise, during the first run, the GPU produced the usual messages that all required DLLs were found etc., from which I conclude that the GPU is working as it should.

However, if I try to run my own, slightly more elaborate program (4 x Conv2D and 2 x Dense layers, with x/y dimensions (samples, 5, 5, 1)/(samples, 4) and a very low signal-to-noise ratio), the R session is aborted (!) during the first epoch, right after producing the same MLIR warning shown above. This is extremely inconvenient, as the crash leaves no hints at all about its cause.

The program worked correctly with the same GPU under CUDA 10.1 and TensorFlow 2.4. In the meantime, as I am unable to find any deficiency in the whole setup, I have uninstalled the GPU completely. The MLIR warning still appears, but the program runs flawlessly, though considerably slower, of course 😢.

I have the impression that the control functions surrounding the network part of my program may somehow be incompatible with the MLIR optimization process. I do very early stopping to prevent overfitting; depending on the temporal data, training runs of only a single epoch are sometimes necessary and occur regularly. This raises the question of whether the MLIR optimization can somehow be deactivated completely.

Any hints are welcome.

faltinl changed the title ~~Tensorflow 2.6 with GPU aborts program with reference to MLIR optimization~~ Tensorflow 2.6 with GPU aborts R session with reference to MLIR optimization Sep 21, 2021
