
[AutoTuner] Add memory model #147

Merged · 3 commits · Jun 25, 2024

Conversation

@Caozhou1995 (Collaborator) commented Jun 13, 2024

This PR adds a memory model, which can be used to speed up pruning by filtering out strategies that would OOM as well as strategies with low memory usage.
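To make the idea concrete, here is a hypothetical sketch of such a pruning pass. The field names, the assumed device capacity, and the 20% low-utilization threshold are illustrative assumptions, not FlagScale's actual API:

```python
# Hypothetical sketch of memory-model-based pruning; names and thresholds
# are illustrative, not the PR's actual code.

GPU_MEMORY_MB = 80 * 1024    # assumed device capacity (e.g. an 80 GB card)
LOW_UTIL_THRESHOLD = 0.2     # assumed cutoff for "low memory usage"

def prune_strategies(strategies):
    """Keep only strategies whose modeled peak memory fits on the device
    and is not so low that the parallelism is clearly over-sharded."""
    kept = []
    for s in strategies:
        modeled = s["memory_model"]            # modeled peak memory in MB
        if modeled >= GPU_MEMORY_MB:           # would OOM: prune
            continue
        if modeled < LOW_UTIL_THRESHOLD * GPU_MEMORY_MB:  # wasteful: prune
            continue
        kept.append(s)
    return kept

candidates = [
    {"name": "tp8_pp1", "memory_model": 9000},    # under-utilized
    {"name": "tp2_pp2", "memory_model": 52000},   # fits
    {"name": "tp1_pp1", "memory_model": 140000},  # would OOM
]
print([s["name"] for s in prune_strategies(candidates)])  # -> ['tp2_pp2']
```

Pruning on the modeled number avoids launching (and waiting on) trial runs that are guaranteed to fail or to waste the device.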

Contributor

Please try to reuse this impl

@Caozhou1995 (Collaborator, Author) Jun 25, 2024

The impl has been reused and the activation section has been refined.



def prune_by_memory_model_util(config, strategy, history=[]):
    if "modeling_memory" in strategy:
@aoyulong (Contributor) Jun 17, 2024

It's better to rename "modeling_memory" to "memory_model" to match the other places.

Collaborator (Author)

thx, done

Comment on lines 80 to 97
if os.environ.get("AIRS_ACCELERATOR_COUNT", None):
    # Set config
    self.config.experiment.auto_tuner.nproc_per_node = (
        int(os.environ["AIRS_ACCELERATOR_COUNT"]) * 2
        if "luvatar_BI" in os.environ["AIRS_ACCELERATOR_MODEL"]
        else int(os.environ["AIRS_ACCELERATOR_COUNT"])
    )
    # Set original config
    self.orig_config.experiment.runner.nproc_per_node = (
        int(os.environ["AIRS_ACCELERATOR_COUNT"]) * 2
        if "luvatar_BI" in os.environ["AIRS_ACCELERATOR_MODEL"]
        else int(os.environ["AIRS_ACCELERATOR_COUNT"])
    )
    # Set config
    self.config.experiment.runner.nproc_per_node = (
        int(os.environ["AIRS_ACCELERATOR_COUNT"]) * 2
        if "luvatar_BI" in os.environ["AIRS_ACCELERATOR_MODEL"]
        else int(os.environ["AIRS_ACCELERATOR_COUNT"])
    )
Contributor

Is there any way to move this platform-related code into a standalone place? We may support different cloud platforms.

Collaborator (Author)

The platform code has been moved to platform.py, and code for other platforms will also live in that file.
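As an illustration of that refactor, a standalone platform module could look like the following hedged sketch. The function names are assumptions; only the environment variables and the "double nproc for luvatar_BI" rule come from the snippet above:

```python
# Hypothetical platform.py sketch: one home for per-platform quirks, as the
# reviewer suggested. Env-var names come from the reviewed diff; the helper
# names are illustrative.
import os

def _default_nproc():
    return int(os.environ["AIRS_ACCELERATOR_COUNT"])

def _luvatar_bi_nproc():
    # Per the reviewed diff, this accelerator model runs 2 processes
    # per reported accelerator.
    return int(os.environ["AIRS_ACCELERATOR_COUNT"]) * 2

def get_nproc_per_node():
    """Resolve processes per node from the platform environment."""
    model = os.environ.get("AIRS_ACCELERATOR_MODEL", "")
    if "luvatar_BI" in model:
        return _luvatar_bi_nproc()
    return _default_nproc()
```

Callers in the tuner would then set `nproc_per_node = get_nproc_per_node()` once, instead of repeating the conditional in three places.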

@Caozhou1995 force-pushed the memory_model branch 5 times, most recently from d052639 to 69510a0 on June 24, 2024 at 12:25
@aoyulong previously approved these changes Jun 25, 2024
@@ -0,0 +1,351 @@
"""
Computes the theoretical memory footprint for model training, referring to Megatron.
Activation memory is optimized by adding a block-recompute formula.
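For context, the Megatron-style theoretical accounting for weights and optimizer state can be sketched as below. This is a simplified illustration, not the PR's code: it ignores embedding and pipeline asymmetries and activation memory, and assumes mixed-precision Adam (18 bytes per parameter, or 6 + 12/dp when optimizer state is sharded across data-parallel ranks):

```python
def weight_and_optimizer_memory_gb(num_params, tp, pp, dp,
                                   use_distributed_optimizer=False):
    """Rough Megatron-style accounting for mixed-precision Adam:
    18 bytes/param = 2 (fp16 weight) + 4 (fp32 grad)
                     + 12 (fp32 master weight + Adam m and v).
    With a distributed optimizer, the 12 optimizer bytes shard over dp.
    Simplification: parameters are assumed evenly sharded over tp * pp.
    """
    sharded_params = num_params / (tp * pp)
    bytes_per_param = 18 if not use_distributed_optimizer else 6 + 12 / dp
    return sharded_params * bytes_per_param / 1024**3

# Illustrative numbers: a 7B-parameter model with tp=2, pp=2, dp=2.
print(round(weight_and_optimizer_memory_gb(7e9, 2, 2, 2), 1))  # -> 29.3
```

A model like this lets the tuner reject a parallelism strategy before launch whenever the estimate already exceeds device memory.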
Contributor

Please add a reference to the original Megatron implementation.

@@ -161,3 +167,114 @@ def compare_by_recompute(strategy1, strategy2):
result = True

return result


def convert_config_to_megatron_args(config, strategy):
Contributor

Is there any simpler way to implement this?
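One simpler pattern, offered as a hypothetical sketch rather than the PR's actual code, is a table-driven mapping from config keys to Megatron argument names, so each new field costs one table entry instead of one if/assignment pair:

```python
# Hypothetical table-driven config -> Megatron-args adapter.
# The key names are illustrative; the PR's real mapping may differ.
from types import SimpleNamespace

CONFIG_TO_MEGATRON = {
    # source key in the strategy dict -> Megatron argument name
    "tensor_model_parallel_size": "tensor_model_parallel_size",
    "pipeline_model_parallel_size": "pipeline_model_parallel_size",
    "micro_batch_size": "micro_batch_size",
    "recompute_granularity": "recompute_granularity",
}

def convert_config_to_megatron_args(strategy, defaults=None):
    """Build an args namespace from a strategy dict via one lookup table."""
    args = dict(defaults or {})
    for src, dst in CONFIG_TO_MEGATRON.items():
        if src in strategy:
            args[dst] = strategy[src]
    return SimpleNamespace(**args)

args = convert_config_to_megatron_args({"micro_batch_size": 2})
print(args.micro_batch_size)  # -> 2
```

Fields whose names match on both sides collapse into the table; only fields needing real translation (unit changes, derived values) would still need custom code.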

@aoyulong (Contributor) left a comment

LGTM

@aoyulong aoyulong merged commit f47e6d5 into FlagOpen:main Jun 25, 2024
5 checks passed