[DO NOT LAND] compile more modules #1938

Draft: wants to merge 2 commits into main
Conversation

felipemello1 (Contributor)

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Currently we compile only the transformer layers. However, we could also compile the token embeddings, the final norm, and the output layer, e.g.:

if hasattr(model, "norm"):
    model.norm.compile(backend=backend)

if hasattr(model, "chunked_output"):
    model.chunked_output = torch.compile(model.chunked_output, backend=backend)

if hasattr(model, "token_embeddings"):
    model.token_embeddings.compile(backend=backend)

Test plan

3B with packing

tune run full_finetune_single_device --config llama3_2/3B_full_single_device \
  optimizer_in_bwd=True enable_activation_checkpointing=True enable_activation_offloading=True \
  optimizer._component_=torch.optim.AdamW optimizer.fused=True compile=True \
  dataset.packed=True dataset.split=train[:5%] tokenizer.max_seq_len=2048 \
  metric_logger=torchtune.training.metric_logging.WandBLogger metric_logger.project=profiling \
  log_every_n_steps=1 log_peak_memory_stats=True gradient_accumulation_steps=1 \
  max_steps_per_epoch=15 epochs=1 batch_size=5 metric_logger.name=baseline \
  loss=torchtune.modules.loss.CEWithChunkedOutputLoss
(results screenshots)

8B with packing
(results screenshot)

11B, NO packing, NO activation offloading
(results screenshot)

Conclusion:

Compiling the extra modules seems to help when the model has tied embeddings. However, without packing there are more graph breaks, which slows down early training. We should fix the graph breaks and then potentially land this PR. Optionally, we could compile the extra layers only when the model has tied embeddings.
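
A minimal sketch of that last option, assuming tied embeddings can be detected by checking whether the output projection shares its weight tensor with the token embedding table (the helper names below are hypothetical, not part of this PR):

import torch


def has_tied_embeddings(model) -> bool:
    # Hypothetical check: treat embeddings as tied when the output projection
    # shares its weight tensor with the token embedding table.
    output_weight = getattr(getattr(model, "output", None), "weight", None)
    embed_weight = getattr(getattr(model, "token_embeddings", None), "weight", None)
    return output_weight is not None and output_weight is embed_weight


def compile_extra_modules(model, backend: str) -> None:
    # Compile the non-transformer-layer modules only in the tied-embedding case,
    # which is where the speedup was observed in this experiment.
    if not has_tied_embeddings(model):
        return
    if hasattr(model, "norm"):
        model.norm.compile(backend=backend)
    if hasattr(model, "token_embeddings"):
        model.token_embeddings.compile(backend=backend)
    if hasattr(model, "chunked_output"):
        model.chunked_output = torch.compile(model.chunked_output, backend=backend)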


pytorch-bot bot commented Nov 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1938

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 4 Cancelled Jobs

As of commit edfa188 with merge base eab21f0:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Nov 1, 2024
@felipemello1 felipemello1 marked this pull request as draft November 1, 2024 03:39
@felipemello1 (Contributor, Author)

Compiling chunked_output will break for tied embeddings + FSDP.
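
A possible guard for that failure mode, sketched under the assumption of FSDP1-style wrapping (detectable by scanning the module tree); the helper name and the tied-embedding check are hypothetical, not from this PR:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def maybe_compile_chunked_output(model, backend: str) -> None:
    # Skip compiling chunked_output for the reported-bad combination of tied
    # embeddings + FSDP; otherwise wrap it with torch.compile as before.
    output_weight = getattr(getattr(model, "output", None), "weight", None)
    embed_weight = getattr(getattr(model, "token_embeddings", None), "weight", None)
    tied = output_weight is not None and output_weight is embed_weight
    is_fsdp = any(isinstance(m, FSDP) for m in model.modules())
    if tied and is_fsdp:
        return
    if hasattr(model, "chunked_output"):
        model.chunked_output = torch.compile(model.chunked_output, backend=backend)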
