BUG: time_limit not respected; remaining time shouldn't be negative #158

AnirudhDagar · 2024-11-19T14:45:21Z

We should raise a timeout error as soon as time_remaining < 0. Currently nothing is raised until we reach AutoGluon training.
To be honest, I don't think setting such a short time limit is a very realistic use case, but a user who runs with less than the ideal amount of time can hit unexpected behavior: the program keeps running after the time_limit is reached, and only fails once it gets to AutoGluon training.
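
The early check described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `DeadlineTimer`, `TimeLimitExceeded`, and `check` are hypothetical names that do not exist in the codebase, and only `time_remaining` mirrors the attribute visible in the traceback (`timer.time_remaining`). The idea is to call `check` after every stage so the run stops at the first stage that overruns the budget, instead of passing a negative time limit on to training.

```python
import time


class TimeLimitExceeded(TimeoutError):
    """Raised when a pipeline stage finishes after the overall time limit."""


class DeadlineTimer:
    """Hypothetical helper; not the actual AutoGluon-Assistant timer."""

    def __init__(self, time_limit, clock=time.monotonic):
        self.time_limit = float(time_limit)
        self._clock = clock          # injectable so the timer can be tested
        self._start = clock()

    @property
    def time_remaining(self):
        # Budget minus elapsed time; negative once the limit has passed.
        return self.time_limit - (self._clock() - self._start)

    def check(self, stage):
        """Return the remaining time, or raise if the budget is exhausted."""
        remaining = self.time_remaining
        if remaining < 0:
            raise TimeLimitExceeded(
                f"time_limit of {self.time_limit:.2f}s exceeded after "
                f"'{stage}' (time remaining: {remaining:.2f}s)"
            )
        return remaining
```

With such a helper, the run in this report would call `timer.check("preprocessing task")` right after preprocessing and fail there, rather than starting the AutoGluon fit with a negative time limit. Note that this only detects an overrun after a stage returns; it does not interrupt a stage that is still running.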

Reproduce

aga run knot_theory --config_overrides "time_limit=5"

INFO:root:Starting AutoGluon-Assistant
INFO:root:Presets is not provided or invalid: None
INFO:root:Using default presets: best_quality
INFO:root:Presets: best_quality
INFO:root:Loading default config from: /Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/configs/default.yaml
INFO:root:Merging best_quality config from: /Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/configs/best_quality.yaml
INFO:root:Applying command-line overrides: ['time_limit=5']
INFO:root:Successfully applied command-line overrides
INFO:root:Successfully loaded config
🤖  Welcome to AutoGluon-Assistant
Will use task config:
{
    'infer_eval_metric': True,
    'detect_and_drop_id_column': False,
    'task_preprocessors_timeout': 3600,
    'time_limit': 5,
    'save_artifacts': {'enabled': False, 'append_timestamp': True, 'path': './aga-artifacts'},
    'feature_transformers': {
        'enabled_models': [],
        'models': {
            'CAAFE': {
                '_target_': 'autogluon.assistant.transformer.feature_transformers.caafe.CAAFETransformer',
                'eval_model': 'lightgbm',
                'llm_provider': '${llm.provider}',
                'llm_model': '${llm.model}',
                'num_iterations': 5,
                'optimization_metric': 'roc'
            },
            'OpenFE': {'_target_': 'autogluon.assistant.transformer.feature_transformers.openfe.OpenFETransformer', 'n_jobs': 1, 'num_features_to_keep': 10},
            'PretrainedEmbedding': {'_target_': 'autogluon.assistant.transformer.feature_transformers.scentenceFT.PretrainedEmbeddingTransformer', 'model_name': 'all-mpnet-base-v2'}
        }
    },
    'autogluon': {'predictor_init_kwargs': {}, 'predictor_fit_kwargs': {'presets': 'best_quality'}},
    'llm': {'provider': 'bedrock', 'model': 'anthropic.claude-3-5-sonnet-20241022-v2:0', 'max_tokens': 512, 'proxy_url': None, 'temperature': 0, 'verbose': True}
}
Task path: /Users/anidagar/Desktop/Work/new_code/autogluon-assistant/knot_theory
Task loaded!
TabularPredictionTask(name=knot_theory, description=, 3 datasets)
INFO:botocore.credentials:Found credentials in environment variables.
AGA is using model anthropic.claude-3-5-sonnet-20241022-v2:0 from Bedrock to assist you with the task.
INFO:root:It took 1.01 seconds initializing components. Time remaining: 3.98/5.00
Task understanding starts...
description: data_description_file: You are solving this data science tasks:The dataset presented here (knot theory) comprises a lot of numerical features. Some of the features may be missing, with nan value. Your task is to predict the 'signature', which has 18 unique integers. The evaluation metric is the classification accuracy.\n
train_data: /Users/anidagar/Desktop/Work/new_code/autogluon-assistant/knot_theory/train.csv
Loaded data from: /Users/anidagar/Desktop/Work/new_code/autogluon-assistant/knot_theory/train.csv | Columns = 19 / 19 | Rows = 10000 -> 10000
test_data: /Users/anidagar/Desktop/Work/new_code/autogluon-assistant/knot_theory/test.csv
Loaded data from: /Users/anidagar/Desktop/Work/new_code/autogluon-assistant/knot_theory/test.csv | Columns = 19 / 19 | Rows = 5000 -> 5000
WARNING: Failed to identify the sample_submission_data of the task, it is set to None.
label_column: signature
problem_type: multiclass
eval_metric: accuracy
Total number of prompt tokens: 1620
Total number of completion tokens: 194
Task understanding complete!
Automatic feature generation is disabled.
INFO:root:It took 8.60 seconds preprocessing task. Time remaining: -4.62/5.00
Model training starts...
Fitting AutoGluon TabularPredictor
predictor_init_kwargs: {'learner_kwargs': {'ignored_columns': []}, 'label': 'signature', 'problem_type': 'multiclass', 'eval_metric': 'accuracy'}
predictor_fit_kwargs: {'presets': 'best_quality'}
No path specified. Models will be saved in: "AutogluonModels/ag-20241119_144123"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.1.2b20241111
Python Version:     3.10.15
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 23.6.0: Wed Jul 31 20:49:39 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T6000
CPU Count:          10
Memory Avail:       10.18 GB / 32.00 GB (31.8%)
Disk Space Avail:   52.76 GB / 460.43 GB (11.5%)
===================================================
Presets specified: ['best_quality']
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1
DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
	This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting.
	Running DyStack for up to -1s of the -4.622002124786377s of remaining time (25%).
	Running DyStack sub-fit in a ray process to avoid memory leakage. Enabling ray logging (enable_ray_logging=True). Specify `ds_args={'enable_ray_logging': False}` if you experience logging issues.
2024-11-19 15:41:26,129	INFO worker.py:1743 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Warning: Not enough time to fit DyStack! Skipping...
	1	 = Optimal   num_stack_levels (Stacked Overfitting Occurred: False)
	3s	 = DyStack   runtime |	-8s	 = Remaining runtime
Starting main fit with num_stack_levels=1.
	For future fit calls on this dataset, you can skip DyStack to save time: `predictor.fit(..., dynamic_stacking=False, num_stack_levels=1)`
INFO:root:It took 3.37 seconds training model. Time remaining: -8.00/5.00
Traceback (most recent call last):
  File "/Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/assistant.py", line 131, in fit_predictor
    self.predictor.fit(task, time_limit=time_limit)
  File "/Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/predictor.py", line 92, in fit
    self.predictor = TabularPredictor(**predictor_init_kwargs).fit(
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/autogluon/core/utils/decorators.py", line 31, in _call
    return f(*gargs, **gkwargs)
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/autogluon/tabular/predictor/predictor.py", line 1270, in fit
    raise AssertionError(
AssertionError: Not enough time left to train models for the full fit. Consider specifying a larger time_limit or setting `dynamic_stacking=False`. Time remaining: -7.97s

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anidagar/miniconda3/envs/aga310/bin/aga", line 8, in <module>
    sys.exit(main())
  File "/Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/__init__.py", line 264, in main
    app()
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/typer/main.py", line 342, in __call__
    raise e
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/typer/main.py", line 325, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/typer/core.py", line 728, in main
    return _main(
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/typer/core.py", line 197, in _main
    rv = self.invoke(ctx)
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/anidagar/miniconda3/envs/aga310/lib/python3.10/site-packages/typer/main.py", line 707, in wrapper
    return callback(**use_params)
  File "/Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/__init__.py", line 226, in run_assistant
    assistant.fit_predictor(task, time_limit=timer.time_remaining)
  File "/Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/assistant.py", line 133, in fit_predictor
    self.handle_exception("Predictor Fit", e)
  File "/Users/anidagar/Desktop/Work/new_code/autogluon-assistant/src/autogluon/assistant/assistant.py", line 70, in handle_exception
    raise Exception(str(exception), stage)
Exception: ('Not enough time left to train models for the full fit. Consider specifying a larger time_limit or setting `dynamic_stacking=False`. Time remaining: -7.97s', 'Predictor Fit')
