MaskedLMHead should support dtype=bfloat16 #1195

Closed
g-dspencer opened this issue Aug 4, 2023 · 2 comments · May be fixed by #1197
Labels
type:Bug Something isn't working

Comments

g-dspencer commented Aug 4, 2023

Describe the bug

I claim that MaskedLMHead should support a dtype argument of tf.bfloat16 (and tf.float16) so that users can evaluate the effect of reducing their memory usage. This matters more as the vocabulary gets larger.
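
For a rough sense of the savings (illustrative numbers, not taken from any particular model): the output projection alone holds hidden_dim * vocabulary_size parameters, so halving the element size halves that footprint, and the logits tensor grows with the vocabulary as well.

# Back-of-the-envelope estimate for the output projection only.
# hidden_dim and vocab_size are made-up illustrative values.
hidden_dim = 768
vocab_size = 30_000
params = hidden_dim * vocab_size            # ~23M parameters
print(params * 4 / 1e6, "MB in float32")    # ~92 MB
print(params * 2 / 1e6, "MB in bfloat16")   # ~46 MB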

To Reproduce

In Colab, "File -> Save a copy as GitHub Gist" fails with a "github auth fails" message after I enter an OTP, so I'll just include the code inline:

!pip install keras-nlp --upgrade --quiet

import tensorflow as tf
import keras_nlp

# Based on test_valid_call()
# https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/layers/modeling/masked_lm_head_test.py#L25
def test_dtype(dtype):
  head = keras_nlp.layers.MaskedLMHead(
      vocabulary_size=100,
      activation="softmax",
      dtype=dtype, # this is the point
  )
  encoded_tokens = tf.keras.Input(shape=(10, 16))
  positions = tf.keras.Input(shape=(5,), dtype="int32")
  outputs = head(encoded_tokens, mask_positions=positions)
  model = tf.keras.Model((encoded_tokens, positions), outputs)

  token_data = tf.random.uniform(shape=(4, 10, 16))
  position_data = tf.random.uniform(minval=0, maxval=10, shape=(4, 5), dtype=tf.int32)
  model((token_data, position_data))

  for w in head.weights:
      assert w.dtype == dtype, ("Wrong type: " + w.name)
      # When it fails it fails with:
      # TypeError: Input 'y' of 'AddV2' Op has type float16 that does not match type float32 of argument 'x'.

print("float32")
test_dtype(tf.float32) # this works

print("bfloat16")
test_dtype(tf.bfloat16) # this fails

print("float64")
test_dtype(tf.float64)

Expected behavior

No crash. The loop checking dtypes (assert w.dtype == dtype, ("Wrong type: " + w.name)) should arguably pass as well, unless we are hitting some subtle case where mixed dtypes are actually wanted.

Additional context

The error I get is:

TypeError: Exception encountered when calling layer "masked_lm_head_1" (type MaskedLMHead).

in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras_nlp/src/layers/modeling/masked_lm_head.py", line 196, in call  *
        outputs = outputs + self._bias

    TypeError: Input 'y' of 'AddV2' Op has type bfloat16 that does not match type float32 of argument 'x'.


Call arguments received by layer "masked_lm_head_1" (type MaskedLMHead):
  • inputs=tf.Tensor(shape=(None, 10, 16), dtype=bfloat16)
  • mask_positions=tf.Tensor(shape=(None, 5), dtype=int32)

I suspect we need to pass dtype= through in a few places in the code.
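
For illustration only, here is a small standalone sketch of the two ways the mismatch could be avoided (this is not the keras-nlp implementation; the layer and attribute names are made up): create the bias with the layer's dtype, or cast it at the point of use.

import tensorflow as tf

# Sketch only -- a toy layer, not the actual keras-nlp MaskedLMHead.
class ToyHead(tf.keras.layers.Layer):
    def build(self, input_shape):
        # Option A: create the bias with the layer's dtype so it matches the activations.
        self._bias = self.add_weight(
            name="output_bias", shape=(100,), initializer="zeros", dtype=self.dtype
        )

    def call(self, outputs):
        # Option B: alternatively, cast the variable at the point of use.
        return outputs + tf.cast(self._bias, outputs.dtype)

head = ToyHead(dtype=tf.bfloat16)
x = tf.zeros((2, 5, 100), dtype=tf.bfloat16)
print(head(x).dtype)  # bfloat16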

Would you like to help us fix it?
yes

@vulkomilev

Hi, I have developed a solution to your problem, but it only works for the output kernel and bias:
<dtype: 'bfloat16'> <dtype: 'bfloat16'>
masked_lm_head/output_kernel:0
<dtype: 'bfloat16'> <dtype: 'bfloat16'>
masked_lm_head/output_bias:0
<dtype: 'float32'> <dtype: 'bfloat16'>
masked_lm_head/kernel:0
<dtype: 'float32'> <dtype: 'bfloat16'>
masked_lm_head/bias:0
<dtype: 'float32'> <dtype: 'bfloat16'>
masked_lm_head/gamma:0
<dtype: 'float32'> <dtype: 'bfloat16'>
Is this sufficient?

@mattdangerw
Member

This is fixed by #1242, and we have a test enforcing this for all layers.
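
For readers curious what such a check might look like, here is a minimal sketch of the idea (illustrative only, not the actual test in the repository): build the layer under each dtype and assert on the weight and output dtypes.

import tensorflow as tf
import keras_nlp

# Illustrative dtype check -- not the actual keras-nlp test.
def check_masked_lm_head_dtype(dtype):
    head = keras_nlp.layers.MaskedLMHead(vocabulary_size=100, dtype=dtype)
    tokens = tf.cast(tf.random.uniform(shape=(2, 10, 16)), dtype)
    positions = tf.random.uniform(shape=(2, 5), minval=0, maxval=10, dtype=tf.int32)
    outputs = head(tokens, mask_positions=positions)
    for w in head.weights:
        assert w.dtype == dtype, w.name
    assert outputs.dtype == dtype

for dtype in (tf.float32, tf.bfloat16):
    check_masked_lm_head_dtype(dtype)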
