Error when run custom model using benchmark_single_table #327

Open
T0217 opened this issue Aug 4, 2024 · 2 comments
Assignees: srinify
Labels: bug (Something isn't working), under discussion (Issue is currently being discussed)

Comments

T0217 commented Aug 4, 2024

Environment Details

  • SDGym version: 0.8.0
  • Python version: 3.11.5
  • Operating System: Windows 11

Error Description

Running the same code as in #321 produces the following error.

[Image: screenshot of the error; not preserved in this copy]

Steps to reproduce

import os
import shutil
import sdgym
from sdgym import create_single_table_synthesizer
from sdgym.synthesizers import (UniformSynthesizer,
                                GaussianCopulaSynthesizer,
                                TVAESynthesizer)
import warnings
warnings.filterwarnings('ignore')

synthesizers = [
    UniformSynthesizer,
    GaussianCopulaSynthesizer,
    TVAESynthesizer
]


# Custom synthesizer: CTGAN from YData (ydata-synthetic)
def ctgan_get_trained_synthesizer(data, metadata):
    from ydata_synthetic.synthesizers.regular import RegularSynthesizer
    from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

    ctgan_args = ModelParameters(batch_size=500, lr=2e-4, betas=(0.5, 0.9))
    train_args = TrainParameters(epochs=2)

    synthesizer = RegularSynthesizer(modelname='ctgan', model_parameters=ctgan_args)

    # Build the numerical and categorical column lists from the sdtype of each column
    num_cols = [col for col, col_meta in metadata['columns'].items() if col_meta['sdtype'] in ['numerical', 'datetime']]
    cat_cols = [col for col, col_meta in metadata['columns'].items() if col_meta['sdtype'] == 'categorical']

    synthesizer.fit(data=data,
                    train_arguments=train_args,
                    num_cols=num_cols,
                    cat_cols=cat_cols)

    return synthesizer


def sample_from_synthesizer(synthesizer, n_rows):
    synthetic_data = synthesizer.sample(n_rows)
    return synthetic_data


YData_CTGANSynthesizer = create_single_table_synthesizer(
    get_trained_synthesizer_fn=ctgan_get_trained_synthesizer,
    sample_from_synthesizer_fn=sample_from_synthesizer,
    display_name='YData-CTGAN'
)


custom_synthesizers = [YData_CTGANSynthesizer]

# Remove any existing results folder so the benchmark starts from a clean state
detailed_results_folder = r"C:\Users\18840\Desktop\result"

if os.path.isdir(detailed_results_folder):
    print('The detailed results folder already exists; deleting it.')
    shutil.rmtree(detailed_results_folder, ignore_errors=True)
    print('-' * 50)

results = sdgym.benchmark_single_table(
    synthesizers=synthesizers,
    custom_synthesizers=custom_synthesizers,
    show_progress=True,
    multi_processing_config={
        'package_name': 'multiprocessing',
        'num_workers': 8
    },
    sdv_datasets=['adult'],
    detailed_results_folder=detailed_results_folder
)
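
The column split inside ctgan_get_trained_synthesizer can be sketched on its own. The dictionary below is a hypothetical example shaped the way the repro code indexes it (metadata['columns'][name]['sdtype']); the column names and the helper split_columns are illustrative only and are not part of SDGym or ydata-synthetic.

```python
def split_columns(metadata):
    """Split column names into numerical and categorical lists by sdtype."""
    num_cols, cat_cols = [], []
    for name, col_meta in metadata['columns'].items():
        sdtype = col_meta['sdtype']
        if sdtype in ('numerical', 'datetime'):
            num_cols.append(name)
        elif sdtype == 'categorical':
            cat_cols.append(name)
        # Any other sdtype (for example 'id') ends up in neither list.
    return num_cols, cat_cols


# Hypothetical metadata, assumed to mirror the structure used in the repro code
example_metadata = {
    'columns': {
        'age': {'sdtype': 'numerical'},
        'workclass': {'sdtype': 'categorical'},
        'signup_date': {'sdtype': 'datetime'},
        'record_id': {'sdtype': 'id'},
    }
}

num_cols, cat_cols = split_columns(example_metadata)
print(num_cols)
print(cat_cols)
```

A column whose sdtype matches neither branch is left out of both lists, which may be worth checking when adapting the repro to other datasets.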
T0217 added the bug and new labels on Aug 4, 2024
T0217 added a commit to T0217/SDGym that referenced this issue Aug 5, 2024

srinify commented Sep 13, 2024

Hi there @T0217 👋 Do you mind updating SDGym and related libraries in our ecosystem to see if you're still running into this issue? We released some changes, so I'm always curious to validate if it's still relevant!

Second, this is a bit challenging for us to debug because we aren't the authors of Custom:YData-CTGAN etc. Were you able to figure out the source of your error since posting this issue?
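
When retesting after an upgrade, it can help to report the installed versions alongside the result. The following is a minimal sketch using the standard library; the distribution names 'sdgym', 'sdv', and 'ydata-synthetic' are assumed to match what is installed in the environment, and installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to the local environment.
for package in ('sdgym', 'sdv', 'ydata-synthetic'):
    version = installed_version(package)
    print(f'{package}: {version or "not installed"}')
```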

srinify self-assigned this on Sep 13, 2024
srinify added the under discussion label and removed the new label on Sep 13, 2024

T0217 commented Sep 13, 2024

Thanks for the feedback. I've updated SDGym and retested. The TypeError with the YData CTGAN model persists. It appears to come from serialization: some attributes or components of the model likely hold weak references, which pickle cannot serialize. Switching from pickle to dill for serialization, as suggested in #328, or using the CTGAN model from the SDV library instead, resolves this problem. However, the issue reported in #321 remains unresolved whether the SDV or the YData model is used.
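
The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, not the actual ydata-synthetic internals: Component, the state dictionaries, and pickling_error are hypothetical stand-ins, and which attribute of the real model holds the weak reference is not known from this thread.

```python
import pickle
import weakref


class Component:
    """Stand-in for an internal model component (hypothetical)."""


def pickling_error(obj):
    """Return the TypeError message raised by pickle, or None on success."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


component = Component()

# A state dictionary holding a weak reference, as a model object might.
state_with_ref = {'epochs': 2, 'component_ref': weakref.ref(component)}

# The same state with weak references filtered out before serialization.
state_plain = {
    key: value
    for key, value in state_with_ref.items()
    if not isinstance(value, weakref.ReferenceType)
}

print(pickling_error(state_with_ref))  # a TypeError message
print(pickling_error(state_plain))     # None, pickling succeeds
```

Filtering out the weak references is shown only to isolate the cause; whether a model still works after being restored without them depends on how it uses those references.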
