[WIP] Implements Roberta Model #679

JulienDarve · 2024-07-30T18:26:08Z

No description provided.

dlwh

didn't check correctness. just some style/organization thoughts

dlwh · 2024-07-30T21:15:39Z

src/levanter/models/roberta.py

+        distance_embedding = None
+        position_embedding_type = config.position_embedding_type
+
+        if position_embedding_type == "relative_key" or position_embedding_type == "relative_key_query":


raise if it's not recognized
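A minimal sketch of that check, assuming the accepted values mirror Hugging Face's RoBERTa config (`absolute`, `relative_key`, `relative_key_query`). The helper name `check_position_embedding_type` is hypothetical and not part of this PR.

```python
# Hypothetical helper: fail fast on an unrecognized position_embedding_type.
# The accepted values are assumed to mirror Hugging Face's RoBERTa config.
VALID_POSITION_EMBEDDING_TYPES = ("absolute", "relative_key", "relative_key_query")


def check_position_embedding_type(position_embedding_type: str) -> bool:
    """Return True if relative-position (distance) embeddings are needed."""
    if position_embedding_type not in VALID_POSITION_EMBEDDING_TYPES:
        raise ValueError(
            f"Unrecognized position_embedding_type {position_embedding_type!r}; "
            f"expected one of {VALID_POSITION_EMBEDDING_TYPES}"
        )
    return position_embedding_type in ("relative_key", "relative_key_query")
```

Validating before the `if` keeps a typo in the config from silently falling through to the absolute-position path.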

dlwh · 2024-07-30T22:09:17Z

src/levanter/models/roberta.py

+        key = None
+    ) -> Tuple[NamedArray]:
+
+        query_layer = self.transpose_for_scores(self.q_proj(hidden_states))


in theory you shouldn't need this transpose_for_scores

JulienDarve · 2024-08-01

I did it for compatibility with Hugging Face. I noticed that the Llama code has the linear layer output it directly in the correct shape, but I didn't want to deal with communicating that in the state_dict functions. Do you want me to change it?

dlwh · 2024-08-01

Parameters are best declared in the same order as they are in HF (though that can be worked around), but intermediate values like query_layer can be in any order, really. Haliax will automatically transpose things as needed.
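To illustrate why the explicit transpose is not needed, here is a rough sketch that uses NumPy's `einsum` as a stand-in for named axes. This is not Haliax's API. The shapes, the `(embed, heads * head_size)` weight layout, and all variable names are assumptions made for the example.

```python
import numpy as np

batch, pos, embed, heads, head_size = 2, 5, 8, 4, 2
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(batch, pos, embed))
# Assumed weight layout for this sketch: (embed, heads * head_size), heads-major.
w_q = rng.normal(size=(embed, heads * head_size))

# HF-style: project, split heads, then permute explicitly (transpose_for_scores).
q_hf = (hidden_states @ w_q).reshape(batch, pos, heads, head_size).transpose(0, 2, 1, 3)

# Axis-labelled style: the contraction names the output layout directly,
# so there is no separate transpose step to maintain.
w_q_named = w_q.reshape(embed, heads, head_size)
q_named = np.einsum("bpe,ehd->bhpd", hidden_states, w_q_named)

# Both routes produce the same (batch, heads, pos, head_size) array.
assert np.allclose(q_hf, q_named)
```

The parameter `w_q` keeps a single storage order in both routes. Only the intermediate value's layout differs, which is the part that does not need to match HF.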

dlwh · 2024-07-30T22:10:55Z

src/levanter/models/roberta.py

+
+        attention_scores /= jnp.sqrt(self.HeadSize.size)
+
+        if attention_mask is not None:


masks in Levanter are traditionally binary, which means you need to use something like hax.where(attention_mask, attention_scores, -1E9)
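A small sketch of the binary-mask pattern, with `np.where` standing in for `hax.where` (same argument order: condition, value if true, value if false). The function name `masked_softmax` and the sample arrays are made up for illustration.

```python
import numpy as np


def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over the last axis, ignoring positions where mask is False."""
    # Binary mask: True = attend, False = ignore.
    # Ignored scores are replaced with a large negative value before the softmax.
    masked = np.where(mask, scores, -1e9)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


scores = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
mask = np.array([[True, True, False], [True, False, False]])
weights = masked_softmax(scores, mask)
```

This differs from the additive-mask convention, where the mask itself already holds 0 or a large negative number and is simply added to the scores.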

dlwh · 2024-07-31T03:36:50Z

src/levanter/models/testing.ipynb

rm before merge

JulienDarve · 2024-08-01

Will do. Do you want me to add it somewhere else in levanter or just keep it to myself?

dlwh · 2024-08-01

You can just move it into examples/roberta/ or something. Ideally it would be proper unit tests.

dlwh · 2024-07-31T03:37:31Z

src/levanter/models/roberta.py

+    return q_embed, k_embed
+
+
+def llama_rotary_pos_emb(


you'll want to delete this i think? (if you end up needing it, just import from llama.py)

dlwh · 2024-07-31T03:41:11Z

I should add: looking good! I know this ended up being a big lift and I appreciate you all taking it on!

versae · 2025-01-11T09:37:35Z

With the release of ModernBERT, I was wondering if there are any plans to merge this RoBERTa into levanter and possibly improve it by adding some nice things from ModernBERT, like RoPE.

dlwh · 2025-01-13T17:58:46Z

@versae Yeah I think that's basically the direction they're/we're heading. Basically just use Llama with a different loss function/acausal attention
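As a rough illustration of those two differences (not code from Levanter), the sketch below contrasts a causal mask with an acausal one and computes a masked-LM loss over only the masked positions. It uses NumPy, and every name in it is hypothetical.

```python
import numpy as np

seq_len = 4
# Causal mask (decoder-style): position i attends only to positions <= i.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
# Acausal mask (encoder-style): every position attends to every position.
acausal_mask = np.ones((seq_len, seq_len), dtype=bool)


def masked_lm_loss(logits: np.ndarray, targets: np.ndarray, is_masked: np.ndarray) -> float:
    """Mean cross-entropy over only the positions that were masked in the input.

    Assumes at least one position is masked; otherwise the division is by zero.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    token_nll = -np.take_along_axis(log_probs, targets[:, None], axis=-1)[:, 0]
    return float((token_nll * is_masked).sum() / is_masked.sum())


# Uniform logits over a 3-token vocabulary; two of the four positions are masked.
logits = np.zeros((seq_len, 3))
targets = np.array([0, 1, 2, 0])
is_masked = np.array([True, False, True, False])
loss = masked_lm_loss(logits, targets, is_masked)
```

A causal LM loss would instead score every position against the next token. Here the targets are the original tokens at the masked positions.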

JulienDarve and others added 4 commits July 30, 2024 11:25

[WIP] Implements Roberta Model

d6284ca

Implements dynamic masking objective

8f7402e

Implements dynamic masked dataset

670b053

Reintroduced accidentally deleted CausalLMDataset class

42f5404

dlwh reviewed Jul 31, 2024

JulienDarve and others added 24 commits August 1, 2024 14:43

Everything works except stuck on the final method, RobertaForMaskedLM

9ad06af

[WIP] Re-implements MLM training objective

53fd8d2

Adds error handling and reverts LmExample class to original

dcd45b2

Testing Modifications

6f21e0d

Merge branch 'stanford-crfm:main' into roberta-model

730d847

Sets RobertaConfig as model architecture and creates default config file

027b176

Adds compute_loss to roberta and changes positional ids to begin from 0

399e08c

Investigating precision loss over time within the model using output_hidden_states implementation in jax model

cd4118c

Merge branch 'roberta-model' of https://github.com/JulienDarve/levanter into roberta-model

96522f1

Model can now successfully import weights from huggingface + made attention mask more robust

8a732e5

Merge branch 'roberta-training' into roberta-model-copy-2

5f3d8a2

trial

6c105f5

update 1

ab85079

update 2

5b97400

update 3

bd7d411

update

b5d8e14

update

8717c3f

update

10c130c

update

834d88d

update

47fe23b

update

fb5c55c

update

8594e79

update

3ae80d7

update

de93fc9

JulienDarve and others added 4 commits September 12, 2024 12:10

update

896af7d

update

0be9a83

update

0c94a47

Training works!

7ae681d
