Training error for "LINEAR" model #512

Open
whoisroop opened this issue Mar 4, 2024 · 1 comment

Comments

@whoisroop

Pipeline Configuration:

pj = PredictionJobDataClass(
    id=102,
    model="linear",
    quantiles=[0.1, 0.3, 0.5, 0.7, 0.9],
    forecast_type="load",
    lat=19.0760,
    lon=72.8777,
    horizon_minutes=24 * 60,
    resolution_minutes=60,
    name="Mumbai",
    # hyper_params={},
    # feature_names=None,
    default_modelspecs=None,
    save_train_forecasts=True,
)

Training Forecast:

start = time.time()
train_model_pipeline(
    pj,
    train_data,
    check_old_model_age=False,
    mlflow_tracking_uri="./mlflow_trained_models",
    artifact_folder="./mlflow_artifacts",
)
end = time.time()

ERROR:

ValueError: Input X contains NaN.
LinearRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
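
To narrow down where the NaNs come from, it may help to count missing values per column before training. The following is a minimal sketch, assuming the training input is a pandas DataFrame; the helper name `summarize_missing` and the example columns are hypothetical and only stand in for the real `train_data`.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.Series:
    """Return the NaN count per column, keeping only columns that have NaNs."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Hypothetical stand-in for train_data; the column names are illustrative only.
example = pd.DataFrame(
    {
        "load": [1.0, 2.0, np.nan, 4.0],
        "temp": [20.0, np.nan, np.nan, 23.0],
        "windspeed": [3.0, 4.0, 5.0, 6.0],
    }
)

missing = summarize_missing(example)
print(missing)
```

If the raw input turns out to be free of NaNs, the missing values are presumably introduced at a later stage of the pipeline, and this check alone will not reveal them.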

@whoisroop whoisroop changed the title Training error for "LINEAR" & "LGB" models Training error for "LINEAR" model Mar 4, 2024
@bartpleiter
Collaborator

Hi Roop,
Thank you for opening this issue. The decision-tree-based models can work with NaNs natively, so I would suggest using xgboost or another GBDT model if possible. If that is not an option, I would happily review your PR to fix the issue! I think adding an optional preprocessing step to impute or drop missing data would be useful!
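
As a rough illustration of what such a preprocessing step could look like, here is a minimal sketch using plain scikit-learn, outside of this project's pipeline. The data, column names, and the choice of median imputation are assumptions made for the example; this is not the project's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical feature matrix with missing values; column names are illustrative.
X = pd.DataFrame(
    {
        "temp": [20.0, np.nan, 22.0, 23.0, 24.0],
        "wind": [3.0, 4.0, np.nan, 6.0, 7.0],
    }
)
y = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Option 1: impute missing features (median here, an arbitrary choice) before fitting.
imputing_model = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
imputing_model.fit(X, y)
pred = imputing_model.predict(X)

# Option 2: drop every sample that has at least one missing feature.
complete_rows = X.notna().all(axis=1)
dropping_model = LinearRegression().fit(X[complete_rows], y[complete_rows])
```

Imputing keeps all samples at the cost of inventing values, while dropping keeps only observed data but can discard a large share of the training set when NaNs are spread across many rows. Making the step optional would let users pick per model.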
