Cannot assign a device for operation embedding/embeddings/Initializer/random_uniform/ #379

Open
KaganSenturk opened this issue Jul 14, 2022 · 5 comments

Comments

@KaganSenturk

System Information

  • Windows 10
  • Python Version (3.6.13)
  • TensorFlow-DirectML Version (1.15.7)
  • Graphics card driver version (AMD Radeon Pro V520 MxGPU)

Hi,
I have an AMD GPU on my local machine and I want to train an LSTM model with TensorFlow. With TensorFlow-DirectML installed, TensorFlow detects the GPU in the system. The code and its output are below:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5162271997438626014,
name: "/device:DML:0"
device_type: "DML"
memory_limit: 6797208279
locality {
}
incarnation: 12883817374713471833
physical_device_desc: "{"name": "AMD Radeon Pro V520 MxGPU", "vendor_id": 4098, "device_id": 29538, "driver_version": "27.20.11025.4019"}"]

No problem so far. But is there an extra step needed to activate this GPU when training the model? I am getting the error below. Without the GPU, the model starts training and I can see the epochs progress, but the model is fairly complex, so it takes a long time to get a result.
TensorFlow detects the GPU, but a device placement error occurs when training the model.
Do you know what the problem is?

InvalidArgumentError: Cannot assign a device for operation embedding/embeddings/Initializer/random_uniform/sub: Could not satisfy explicit device specification '' because the node node embedding/embeddings/Initializer/random_uniform/sub (defined at C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) placed on device Device assignments active during op 'embedding/embeddings/Initializer/random_uniform/sub' creation:
with tf.device(None): <C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1535> was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:DML:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:DML:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:DML:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:DML:0' resource_device_name_='/job:localhost/replica:0/task:0/device:DML:0' supported_device_types_=[CPU] possible_devices_=[]
Add: DML CPU
Const: DML CPU
RandomUniform: DML CPU
Sub: DML CPU
Mul: DML CPU
Sqrt: DML CPU
VarHandleOp: DML CPU
AssignVariableOp: DML CPU
VarIsInitializedOp: DML CPU
ReadVariableOp: DML CPU
ResourceGather: DML CPU
Identity: DML CPU
ResourceScatterAdd: DML CPU
Fill: DML CPU
Shape: DML CPU
Unique: DML CPU
StridedSlice: DML CPU
UnsortedSegmentSum: CPU
AddV2: DML CPU
RealDiv: DML CPU
AssignSubVariableOp: DML CPU
NoOp: DML CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
embedding/embeddings/Initializer/random_uniform/shape (Const)
embedding/embeddings/Initializer/random_uniform/min (Const)
embedding/embeddings/Initializer/random_uniform/max (Const)
embedding/embeddings/Initializer/random_uniform/RandomUniform (RandomUniform) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0
embedding/embeddings/Initializer/random_uniform/sub (Sub)
embedding/embeddings/Initializer/random_uniform/mul (Mul)
embedding/embeddings/Initializer/random_uniform (Add)
embedding/embeddings (VarHandleOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0
embedding/embeddings/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0
embedding/embeddings/Assign (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0
embedding/embeddings/Read/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0
embedding/embedding_lookup (ResourceGather) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0
embedding/embedding_lookup/Identity (Identity)
VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0
training/Adam/embedding/embeddings/m/Initializer/zeros/shape_as_tensor (Const)
training/Adam/embedding/embeddings/m/Initializer/zeros/Const (Const)
training/Adam/embedding/embeddings/m/Initializer/zeros (Fill)
training/Adam/embedding/embeddings/m (VarHandleOp)
training/Adam/embedding/embeddings/m/IsInitialized/VarIsInitializedOp (VarIsInitializedOp)
training/Adam/embedding/embeddings/m/Assign (AssignVariableOp)
training/Adam/embedding/embeddings/m/Read/ReadVariableOp (ReadVariableOp)
training/Adam/embedding/embeddings/v/Initializer/zeros/shape_as_tensor (Const)
training/Adam/embedding/embeddings/v/Initializer/zeros/Const (Const)
training/Adam/embedding/embeddings/v/Initializer/zeros (Fill)
training/Adam/embedding/embeddings/v (VarHandleOp)
training/Adam/embedding/embeddings/v/IsInitialized/VarIsInitializedOp (VarIsInitializedOp)
training/Adam/embedding/embeddings/v/Assign (AssignVariableOp)
training/Adam/embedding/embeddings/v/Read/ReadVariableOp (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/Unique (Unique)
training/Adam/Adam/update_embedding/embeddings/Shape (Shape)
training/Adam/Adam/update_embedding/embeddings/strided_slice/stack (Const)
training/Adam/Adam/update_embedding/embeddings/strided_slice/stack_1 (Const)
training/Adam/Adam/update_embedding/embeddings/strided_slice/stack_2 (Const)
training/Adam/Adam/update_embedding/embeddings/strided_slice (StridedSlice)
training/Adam/Adam/update_embedding/embeddings/UnsortedSegmentSum (UnsortedSegmentSum)
training/Adam/Adam/update_embedding/embeddings/mul (Mul)
training/Adam/Adam/update_embedding/embeddings/ReadVariableOp (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/mul_1 (Mul)
training/Adam/Adam/update_embedding/embeddings/AssignVariableOp (AssignVariableOp)
training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_1 (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/ResourceScatterAdd (ResourceScatterAdd)
training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_2 (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/mul_2 (Mul)
training/Adam/Adam/update_embedding/embeddings/mul_3 (Mul)
training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_3 (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/mul_4 (Mul)
training/Adam/Adam/update_embedding/embeddings/AssignVariableOp_1 (AssignVariableOp)
training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_4 (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/ResourceScatterAdd_1 (ResourceScatterAdd)
training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_5 (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/Sqrt (Sqrt)
training/Adam/Adam/update_embedding/embeddings/mul_5 (Mul)
training/Adam/Adam/update_embedding/embeddings/add (AddV2)
training/Adam/Adam/update_embedding/embeddings/truediv (RealDiv)
training/Adam/Adam/update_embedding/embeddings/AssignSubVariableOp (AssignSubVariableOp)
training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_6 (ReadVariableOp)
training/Adam/Adam/update_embedding/embeddings/group_deps (NoOp)
VarIsInitializedOp_19 (VarIsInitializedOp)
VarIsInitializedOp_37 (VarIsInitializedOp)

 [[node embedding/embeddings/Initializer/random_uniform/sub (defined at C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]Additional information about colocations:No node-device colocations were active during op 'embedding/embeddings/Initializer/random_uniform/sub' creation.

Device assignments active during op 'embedding/embeddings/Initializer/random_uniform/sub' creation:
with tf.device(None): <C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1535>

Original stack trace for 'embedding/embeddings/Initializer/random_uniform/sub':
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel_launcher.py", line 16, in
app.launch_new_instance()
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\traitlets\config\application.py", line 664, in launch_instance
app.start()
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelapp.py", line 612, in start
self.io_loop.start()
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
self.asyncio_loop.run_forever()
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\asyncio\base_events.py", line 442, in run_forever
self._run_once()
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\asyncio\base_events.py", line 1462, in _run_once
handle._run()
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\asyncio\events.py", line 145, in _run
self._callback(*self._args)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\ioloop.py", line 688, in
lambda f: self._run_callback(functools.partial(callback, future))
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\ioloop.py", line 741, in _run_callback
ret = callback()
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 814, in inner
self.ctx_run(self.run)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
return f(*args, **kw)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 775, in run
yielded = self.gen.send(value)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelbase.py", line 365, in process_one
yield gen.maybe_future(dispatch(*args))
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 234, in wrapper
yielded = ctx_run(next, result)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
return f(*args, **kw)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelbase.py", line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 234, in wrapper
yielded = ctx_run(next, result)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
return f(*args, **kw)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelbase.py", line 545, in execute_request
user_expressions, allow_stdin,
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 234, in wrapper
yielded = ctx_run(next, result)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
return f(*args, **kw)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\ipkernel.py", line 306, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 2867, in run_cell
raw_cell, store_history, silent, shell_futures)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 2895, in _run_cell
return runner(coro)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\async_helpers.py", line 68, in pseudo_sync_runner
coro.send(None)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 3072, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 3263, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
concat_lstm = get_model1(tf_idf_train,X_meta_train, results,embedding_dimensions)
File "", line 17, in get_model1
mask_zero=True)(tf_idf_input) # Use masking to handle the variable sequence lengths
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 824, in __call__
self._maybe_build(inputs)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 2146, in _maybe_build
self.build(input_shapes)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\utils\tf_utils.py", line 306, in wrapper
output_shape = fn(instance, input_shape)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\layers\embeddings.py", line 146, in build
constraint=self.embeddings_constraint)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 529, in add_weight
aggregation=aggregation)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\training\tracking\base.py", line 712, in _add_variable_with_custom_getter
**kwargs_for_getter)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer_utils.py", line 139, in make_variable
shape=variable_shape if variable_shape else None)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 258, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 219, in _variable_v1_call
shape=shape)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 197, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 2503, in default_variable_creator
shape=shape)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 262, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py", line 1406, in __init__
distribute_strategy=distribute_strategy)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py", line 1537, in _init_from_args
initial_value() if init_from_fn else initial_value,
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer_utils.py", line 119, in <lambda>
init_val = lambda: initializer(shape, dtype=dtype)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\init_ops.py", line 283, in __call__
shape, self.minval, self.maxval, dtype, seed=self.seed)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\random_ops.py", line 246, in random_uniform
result = math_ops.add(rnd * (maxval - minval), minval, name=name)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 899, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 11926, in sub
"Sub", x=x, y=y, name=name)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3371, in create_op
attrs, op_def, compute_device)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3440, in _create_op_internal
op_def=op_def)
File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1762, in __init__
self._traceback = tf_stack.extract_stack()

@PatriceVignola
Contributor

The problem is that UnsortedSegmentSum hasn't been enabled for DML devices. We actually have an implementation already there, but we noticed that it was using way too much memory so we disabled it to revisit it at a later time:

#define REGISTER_UNSORTED_SEGMENT_REDUCTION_KERNEL_INDEX(type, name, op, \
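
The supported-device table in the error report shows this directly: every op type in the colocation group lists both DML and CPU except UnsortedSegmentSum, which lists only CPU. As a rough illustration, the sketch below scans such a table for op types that lack a given device. The function name ops_without_device and the shortened sample text are made up for this example, and the line format is assumed from the log in this thread, not taken from any documented TensorFlow format.

```python
import re

# One row of the "types and supported devices" table, e.g. "Add: DML CPU".
_OP_LINE = re.compile(r"^\s*(\w+):\s+([A-Z_]+(?:\s+[A-Z_]+)*)\s*$")


def ops_without_device(debug_text, device="DML"):
    """Return the op types whose supported-device list lacks `device`."""
    missing = []
    for line in debug_text.splitlines():
        match = _OP_LINE.match(line)
        if match is None:
            continue  # not a table row (headers, blank lines, node names)
        op_type = match.group(1)
        devices = match.group(2).split()
        if device not in devices:
            missing.append(op_type)
    return missing


# Shortened excerpt of the colocation debug info quoted in the report.
sample = """\
Colocation group had the following types and supported devices:
Add: DML CPU
RandomUniform: DML CPU
Unique: DML CPU
UnsortedSegmentSum: CPU
NoOp: DML CPU
"""

print(ops_without_device(sample))
```

Because all ops in a colocation group must share one device, a single CPU-only op type is enough to make the whole group unplaceable on DML.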

A solution here would be to explicitly place RandomUniform on the CPU instead of letting it go to DML. Random ops are not usually executed that often during one epoch, so it shouldn't affect performance too much.
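
One way to try this workaround, sketched under assumptions: the snippet below builds the embedding layer inside a tf.device("/CPU:0") scope so that the embedding variable, and the ops colocated with it, are requested on the CPU. It pins the whole layer, not only RandomUniform, so it is a broader variant of the suggestion. The helper name build_embedding_on_cpu and its parameters are hypothetical; the reporter's get_model1 is not shown in this thread, so this is not that function, and whether it removes the error on a given setup is untested here.

```python
CPU_DEVICE = "/CPU:0"  # the CPU device reported by list_local_devices() above


def build_embedding_on_cpu(vocab_size, embedding_dim):
    """Hypothetical helper: create a masked Embedding layer pinned to the CPU."""
    # Imported here so the helper can be defined even where TensorFlow is absent.
    import tensorflow as tf

    # Variable-length integer token sequences (an assumed input format).
    inputs = tf.keras.layers.Input(shape=(None,), dtype="int32")

    # The embedding variable and lookup created in this scope are requested
    # on the CPU; ops colocated with the variable follow it there.
    with tf.device(CPU_DEVICE):
        embedded = tf.keras.layers.Embedding(
            input_dim=vocab_size,
            output_dim=embedding_dim,
            mask_zero=True,
        )(inputs)

    return inputs, embedded
```

The rest of the model (for example the LSTM layers) can be built outside the scope, so those ops remain eligible for the DML device.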

@Saipavan790

I am also facing the same issue. Were you able to resolve it?

@PatriceVignola
Contributor

@Saipavan790 We have a few solutions in mind that seem to be working internally, but if you or @KaganSenturk have a sample model that we can run to test accuracy and performance, it would help us make sure that your specific scenarios are covered.

@KaganSenturk
Author

I will switch to NVIDIA drivers. That's my solution.

@PatriceVignola
Contributor

The latest release now supports the UnsortedSegment* ops. We're still optimizing them, so they will get faster in the next version. For the time being, at least, there shouldn't be device placement errors anymore.

Note that most of the latest developments are happening over at the tensorflow-directml-plugin fork and its corresponding pypi package, which are for TensorFlow >= 2.10.
