Refactor auto factory #1

Open
wants to merge 22 commits into
base: main

psmyth94
Owner

This pull request adds support for experiment-based processor and plotter mappings.

Changes to Experiment Handling:

Updates to Plotting and Processing Modules:

Minor Adjustments:

Added a comprehensive guide to the `CREATING_PROCESSORS.md` document,
covering the BaseProcessor class, fit and transform methods, batch
processing, and configuration management.
Added functionality to register experiments and map configurations,
processors, and plotters for various experiment types. Updated
auto_factory, configuration_auto, plotting_auto, and processing_auto
modules to support experiment-based registration and mapping.
- Added LassoConfigForOTU to lasso model
- Added UpSamplerConfigForMetagenomics to upsampling
- Added DistanceStatConfigForSNP to distance statistics
- Reordered DistanceStatConfigForOTU in distance statistics
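To illustrate what experiment-based registration could look like, here is a minimal sketch. It is not the code from this PR: `ExperimentRegistry`, `register_experiment`, `lookup`, and the `"otu"`/`"lasso"` keys are hypothetical names chosen for illustration, and the two config classes are empty stand-ins that only borrow their names from the commit message.

```python
class LassoConfig:
    """Stand-in for a generic config class (not the real biofit class)."""
    experiment_name = None


class LassoConfigForOTU(LassoConfig):
    """Stand-in for an experiment-specific config (not the real biofit class)."""
    experiment_name = "otu"


class ExperimentRegistry:
    """Hypothetical registry: experiment name -> processor name -> config class."""

    def __init__(self):
        self._experiment_mapping = {}

    def register_experiment(self, experiment_name, processor_name, config_class):
        # Create the per-experiment table on first use, then store the class.
        table = self._experiment_mapping.setdefault(experiment_name, {})
        table[processor_name] = config_class

    def lookup(self, experiment_name, processor_name, default=None):
        # Fall back to `default` when the experiment or processor is unknown.
        table = self._experiment_mapping.get(experiment_name, {})
        return table.get(processor_name, default)


registry = ExperimentRegistry()
registry.register_experiment("otu", "lasso", LassoConfigForOTU)

otu_config = registry.lookup("otu", "lasso", default=LassoConfig)
snp_config = registry.lookup("snp", "lasso", default=LassoConfig)
```

In this sketch an unregistered experiment falls back to the generic config, which is one plausible design choice; the actual modules may resolve missing experiments differently.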
Renamed all occurrences of `dataset_name` to `experiment_name` across
various configuration classes and related files.
Removed outdated tests for auto plotting and preprocessing. These tests
were no longer relevant due to recent changes in the codebase.
Updated the test_processing.py file to rename config_class to
_config_class for consistency. This change affects the MockModel
and MockPreprocessor classes.
Renamed all instances of 'dataset' to 'experiment' in the codebase to
better reflect the new terminology. This includes variable names,
function names, and comments. Updated import statements and mappings
accordingly.
- Added new test file `test_auto_plotting.py` for testing auto plotting
  functionalities.
- Added new test file `test_auto_preprocessing.py` for testing auto
  preprocessing functionalities.
- Included various test cases for different data formats and scenarios.
- Utilized `unittest` and `pytest` frameworks for writing the tests.
- Ensured integration and unit tests are marked appropriately.
Renamed all instances of genomicsml_module to biofit_module in
auto_factory.py and configuration_auto.py for consistency and
clarity.
- Renamed `dataset_name` to `experiment_name` for consistency across
  the codebase.
- Simplified `_plotter_mapping` assignment by directly accessing
  `_experiment_mapping` dictionary.
Updated the ProcessorConfig class to rename the dataset_name attribute
to experiment_name for better clarity.
- Refactor input column handling logic in TransformationMixin
- Simplify condition checks and improve readability
- Ensure input_columns is properly set based on feature type or unused columns
- Remove redundant code and handle ValueError exceptions
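As a rough illustration of resolving `input_columns` from a feature type or from unused columns, the sketch below shows one possible order of checks. It is an assumption, not the logic in `TransformationMixin`: the function `resolve_input_columns`, its parameters, and the column/type names in the example are all hypothetical.

```python
def resolve_input_columns(column_types, input_columns=None,
                          feature_type=None, used_columns=()):
    """Hypothetical helper: pick the columns a processor should operate on.

    column_types maps column name -> a type label (illustrative strings here).
    """
    # 1. Explicit columns win, but they must exist.
    if input_columns is not None:
        missing = [c for c in input_columns if c not in column_types]
        if missing:
            raise ValueError(f"Unknown input columns: {missing}")
        return list(input_columns)

    # 2. Otherwise select by feature type, if one was requested and matches.
    if feature_type is not None:
        selected = [name for name, kind in column_types.items()
                    if kind == feature_type]
        if selected:
            return selected

    # 3. Otherwise fall back to every column not already used elsewhere.
    used = set(used_columns)
    unused = [name for name in column_types if name not in used]
    if not unused:
        raise ValueError("No input columns could be resolved")
    return unused


column_types = {
    "sample_id": "metadata",
    "otu_1": "abundance",
    "otu_2": "abundance",
    "label": "target",
}
by_type = resolve_input_columns(column_types, feature_type="abundance")
by_unused = resolve_input_columns(column_types, used_columns=("label",))
```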
Renamed the `dataset_name` attribute to `experiment_name` in the
BaseProcessor class to better reflect its purpose.
Replaced the default implementation of the `fit` method with one that
raises `NotImplementedError`, so that subclasses must implement their own
`fit` method. The `fit_transform` and `transform` methods now raise
`NotImplementedError` as well, to enforce their implementation in subclasses.
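The pattern described in this commit can be sketched as follows. This is a simplified illustration, not the actual `BaseProcessor`: the class names `BaseProcessorSketch` and `MeanCenterer` and the method signatures are assumptions.

```python
class BaseProcessorSketch:
    """Simplified stand-in for a base class that forces subclasses to override."""

    def fit(self, X):
        raise NotImplementedError(f"{type(self).__name__} must implement fit()")

    def transform(self, X):
        raise NotImplementedError(f"{type(self).__name__} must implement transform()")

    def fit_transform(self, X):
        raise NotImplementedError(f"{type(self).__name__} must implement fit_transform()")


class MeanCenterer(BaseProcessorSketch):
    """Hypothetical subclass that implements all three methods."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]

    def fit_transform(self, X):
        return self.fit(X).transform(X)


centered = MeanCenterer().fit_transform([1.0, 2.0, 3.0])

# Calling the base class directly fails loudly instead of silently doing nothing.
try:
    BaseProcessorSketch().fit([1.0])
    base_error = None
except NotImplementedError as exc:
    base_error = str(exc)
```

An alternative with the same intent is `abc.ABC` with `@abstractmethod`, which fails at instantiation time rather than at call time.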
- Updated import statement in `test_processing.py` to include `require_polars`.
- Modified test cases to use `numpy` as source format and `pandas` as target format.
- Adjusted assertions to match the new expected values.
- Added `require_polars` decorator to `test_process_transform_batch_output_valid`.
- Ensured fingerprints remain unchanged after fit and predict operations.
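For readers unfamiliar with `require_*` style decorators, the sketch below shows how such a decorator is commonly written. It is a hypothetical reimplementation for illustration; the `require_polars` imported in `test_processing.py` may be implemented differently, and `ExampleTest` is an invented test case.

```python
import importlib.util
import unittest


def require_polars(test_case):
    """Skip the decorated test unless polars can be imported (illustrative only)."""
    if importlib.util.find_spec("polars") is None:
        return unittest.skip("test requires polars")(test_case)
    return test_case


class ExampleTest(unittest.TestCase):
    @require_polars
    def test_uses_polars(self):
        # Only runs when polars is installed; otherwise reported as skipped.
        import polars as pl

        df = pl.DataFrame({"a": [1, 2, 3]})
        self.assertEqual(df.shape, (3, 1))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Either way the run counts as successful: the test passes when polars is available and is skipped when it is not.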
- Modified `tests/auto/test_auto_plotting.py` for improved readability
- Refactored test cases to ensure better coverage and maintainability
- Updated test assertions to reflect recent changes in plotting logic
Removed the `create_omic_dataset` function and related fixtures from
`tests/fixtures/files.py`. These functions were no longer in use and
cluttered the codebase.
- Renamed methods from `for_processor` to `from_processor` for consistency.
- Added `unregister_experiment` and `unregister_pipeline` methods to
  `AutoConfig`, `AutoPlotterConfig`, and `AutoPreprocessorConfig`.
- Updated imports and type hints to reflect changes.
- Modified unit tests to include new unregister methods and ensure proper
  cleanup.
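The value of an unregister method for test cleanup can be sketched like this. It is an illustration under assumptions: `AutoConfigSketch`, `temporarily_registered`, and the `"mock"` experiment are hypothetical, and the real `AutoConfig` classes may store and validate registrations differently.

```python
from contextlib import contextmanager


class AutoConfigSketch:
    """Hypothetical stand-in with a class-level experiment mapping."""

    _experiment_mapping = {}

    @classmethod
    def register_experiment(cls, experiment_name, mapping):
        if experiment_name in cls._experiment_mapping:
            raise ValueError(f"Experiment {experiment_name!r} is already registered")
        cls._experiment_mapping[experiment_name] = dict(mapping)

    @classmethod
    def unregister_experiment(cls, experiment_name):
        # Removing a missing experiment is treated as a no-op in this sketch.
        cls._experiment_mapping.pop(experiment_name, None)


@contextmanager
def temporarily_registered(experiment_name, mapping):
    """Register for the duration of a test, then always clean up."""
    AutoConfigSketch.register_experiment(experiment_name, mapping)
    try:
        yield
    finally:
        AutoConfigSketch.unregister_experiment(experiment_name)


with temporarily_registered("mock", {"mock_processor": dict}):
    registered_inside = "mock" in AutoConfigSketch._experiment_mapping

registered_after = "mock" in AutoConfigSketch._experiment_mapping
```

Because the mapping lives at class level, a registration left behind by one test would leak into the next; the `finally` block prevents that even when the test body raises.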
- Added unit tests for `PlotterPipeline` class in `test_plotter_pipeline.py`.
- Included tests for plotting with valid and invalid datasets.
- Added tests for plotting with and without fitting the processor.
- Added tests for plotting with multiple processors and plotters.
- Added fit_transform method to PCoAFeatureExtractor class
- Removed unnecessary ValueError in AutoPreprocessor class
- Updated test_abundance_filter_otu to include dataset_cached format
- Fixed specificity check to handle list, tuple, and ndarray correctly
- Updated PCoAFeatureExtractor to use config parameter in DistanceStat
- Added input_columns parameter to PCoAFeatureExtractor transform method
- Fixed BaseProcessor to correctly check for fit method
- Removed unnecessary output_dir parameter in test_eval
- Removed unused os import from test_eval.py