Refactor auto factory #1

psmyth94 · 2024-11-22T16:27:50Z

This pull request adds support for experiment-based processor and plotter mappings.

Changes to Experiment Handling:

src/biofit/auto/auto_factory.py: Added _experiment_mapping to the _BaseAutoProcessorClass and introduced a new register_experiment method to register processors for experiments.

Updates to Plotting and Processing Modules:

src/biofit/auto/plotting_auto.py: Added experiment-based plotter mappings and updated methods to handle experiments, including for_experiment, from_dataset, and from_bioset.
src/biofit/auto/processing_auto.py: Added experiment-based processor mappings and updated the for_dataset method to handle experiments.
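To make the experiment-based lookup more concrete, here is a minimal sketch of how a `for_experiment` method might resolve a plotter. The `AutoPlotterSketch` class, the `_default_mapping` fallback, and every plotter name in it are hypothetical; whether the real `plotting_auto.py` falls back to a generic mapping is an assumption.

```python
class AutoPlotterSketch:
    """Hypothetical resolver; the names below are not taken from biofit."""

    # Assumed layout: experiment name -> {processor name -> plotter name}
    _experiment_mapping = {
        "otu": {"abundance_filter": "OTUAbundanceFilterPlotter"},
    }
    # Assumed generic fallback used when no experiment-specific entry exists
    _default_mapping = {
        "abundance_filter": "AbundanceFilterPlotter",
        "pcoa": "PCoAPlotter",
    }

    @classmethod
    def for_experiment(cls, experiment_name, processor_name):
        """Return the plotter registered for this experiment, else the default."""
        experiment_plotters = cls._experiment_mapping.get(experiment_name, {})
        if processor_name in experiment_plotters:
            return experiment_plotters[processor_name]
        if processor_name in cls._default_mapping:
            return cls._default_mapping[processor_name]
        raise KeyError(
            f"No plotter registered for processor '{processor_name}' "
            f"(experiment '{experiment_name}')"
        )


specific = AutoPlotterSketch.for_experiment("otu", "abundance_filter")
generic = AutoPlotterSketch.for_experiment("otu", "pcoa")
```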

Minor Adjustments:

src/biofit/models/lasso/__init__.py: Added LassoConfigForOTU to the imports.
src/biofit/models/lasso/lasso.py: Changed dataset_name to experiment_name in LassoConfigForOTU and updated config_class to _config_class in LassoModel.
src/biofit/models/lightgbm/lightgbm.py: Updated config_class to _config_class in LightGBMModel for consistency.

Added a comprehensive guide to the `CREATING_PROCESSORS.md` document, covering the BaseProcessor class, fit and transform methods, batch processing, and configuration management.

Added functionality to register experiments and map configurations, processors, and plotters for various experiment types. Updated auto_factory, configuration_auto, plotting_auto, and processing_auto modules to support experiment-based registration and mapping.

- Added LassoConfigForOTU to lasso model
- Added UpSamplerConfigForMetagenomics to upsampling
- Added DistanceStatConfigForSNP to distance statistics
- Reordered DistanceStatConfigForOTU in distance statistics

Renamed all occurrences of `dataset_name` to `experiment_name` across various configuration classes and related files.

Removed outdated tests for auto plotting and preprocessing. These tests were no longer relevant due to recent changes in the codebase.

Updated the test_processing.py file to rename config_class to _config_class for consistency. This change affects the MockModel and MockPreprocessor classes.

Renamed all instances of 'dataset' to 'experiment' in the codebase to better reflect the new terminology. This includes variable names, function names, and comments. Updated import statements and mappings accordingly.

- Added new test file `test_auto_plotting.py` for testing auto plotting functionalities.
- Added new test file `test_auto_preprocessing.py` for testing auto preprocessing functionalities.
- Included various test cases for different data formats and scenarios.
- Utilized `unittest` and `pytest` frameworks for writing the tests.
- Ensured integration and unit tests are marked appropriately.

Renamed all instances of genomicsml_module to biofit_module in auto_factory.py and configuration_auto.py for consistency and clarity.

- Renamed `dataset_name` to `experiment_name` for consistency across the codebase.
- Simplified `_plotter_mapping` assignment by directly accessing `_experiment_mapping` dictionary.

Updated the ProcessorConfig class to rename the dataset_name attribute to experiment_name for better clarity.

- Refactor input column handling logic in TransformationMixin
- Simplify condition checks and improve readability
- Ensure input_columns is properly set based on feature type or unused columns
- Remove redundant code and handle ValueError exceptions

Renamed the `dataset_name` attribute to `experiment_name` in the BaseProcessor class to better reflect its purpose.

Replaced the default implementation of the `fit` method with a `NotImplementedError` to ensure that subclasses implement their own `fit` method. Also added `NotImplementedError` for `fit_transform` and `transform` methods to enforce their implementation in subclasses.
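The effect of this change can be sketched with a stripped-down base class. The `BaseProcessor` below is a simplified stand-in rather than the real biofit class, and `MeanCenterer` is a hypothetical subclass written only for this example.

```python
class BaseProcessor:
    """Simplified stand-in: every core method must be overridden."""

    def fit(self, X, *args, **kwargs):
        raise NotImplementedError(f"{type(self).__name__} must implement fit()")

    def transform(self, X, *args, **kwargs):
        raise NotImplementedError(f"{type(self).__name__} must implement transform()")

    def fit_transform(self, X, *args, **kwargs):
        raise NotImplementedError(
            f"{type(self).__name__} must implement fit_transform()"
        )


class MeanCenterer(BaseProcessor):
    """Hypothetical subclass that supplies all three methods."""

    def fit(self, X):
        # Learn the mean of the input values
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        # Subtract the learned mean from every value
        return [x - self.mean_ for x in X]

    def fit_transform(self, X):
        return self.fit(X).transform(X)


centered = MeanCenterer().fit_transform([1.0, 2.0, 3.0])
```

With this layout, calling `BaseProcessor().fit(...)` directly fails loudly instead of silently doing nothing, which is the behavior the commit message describes.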

- Updated import statement in `test_processing.py` to include `require_polars`.
- Modified test cases to use `numpy` as source format and `pandas` as target format.
- Adjusted assertions to match the new expected values.
- Added `require_polars` decorator to `test_process_transform_batch_output_valid`.
- Ensured fingerprints remain unchanged after fit and predict operations.
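For readers unfamiliar with this kind of decorator, the sketch below shows one common way a `require_polars` helper can be written with the standard library. It is an assumption about the general pattern only; the helper shipped with biofit may be implemented differently, and the test body here is a placeholder.

```python
import importlib.util
import unittest


def require_polars(test_case):
    """Skip the decorated test unless polars can be imported (assumed behavior)."""
    polars_available = importlib.util.find_spec("polars") is not None
    return unittest.skipUnless(polars_available, "test requires polars")(test_case)


class ExampleTests(unittest.TestCase):
    @require_polars
    def test_needs_polars(self):
        # Placeholder body; runs only when polars is installed
        import polars as pl

        self.assertEqual(pl.DataFrame({"a": [1, 2]}).height, 2)
```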

- Modified `tests/auto/test_auto_plotting.py` for improved readability
- Refactored test cases to ensure better coverage and maintainability
- Updated test assertions to reflect recent changes in plotting logic

Removed the `create_omic_dataset` function and related fixtures from `tests/fixtures/files.py`. These functions were no longer in use and cluttered the codebase.

- Renamed methods from `for_processor` to `from_processor` for consistency.
- Added `unregister_experiment` and `unregister_pipeline` methods to `AutoConfig`, `AutoPlotterConfig`, and `AutoPreprocessorConfig`.
- Updated imports and type hints to reflect changes.
- Modified unit tests to include new unregister methods and ensure proper cleanup.
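The cleanup motivation for the unregister methods can be illustrated with a small sketch. `AutoConfigSketch`, its method signatures, and `MockConfig` are hypothetical stand-ins; only the method names `register_experiment` and `unregister_experiment` come from this pull request.

```python
import unittest


class AutoConfigSketch:
    """Hypothetical registry with paired register/unregister methods."""

    # Assumed layout: experiment name -> config class
    _experiment_mapping = {}

    @classmethod
    def register_experiment(cls, experiment_name, config_class):
        cls._experiment_mapping[experiment_name] = config_class

    @classmethod
    def unregister_experiment(cls, experiment_name):
        # Return the removed entry, or None if nothing was registered
        return cls._experiment_mapping.pop(experiment_name, None)


class MockConfig:
    """Hypothetical config class used only in the test below."""


class TestRegistration(unittest.TestCase):
    def setUp(self):
        AutoConfigSketch.register_experiment("mock_experiment", MockConfig)
        # Undo the registration even if the test fails, so class-level
        # state does not leak into other tests
        self.addCleanup(AutoConfigSketch.unregister_experiment, "mock_experiment")

    def test_experiment_is_registered(self):
        self.assertIs(
            AutoConfigSketch._experiment_mapping["mock_experiment"], MockConfig
        )
```

Because the mapping lives on the class, a registration made in one test would otherwise remain visible to every later test in the same process.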

- Added unit tests for `PlotterPipeline` class in `test_plotter_pipeline.py`.
- Included tests for plotting with valid and invalid datasets.
- Added tests for plotting with and without fitting the processor.
- Added tests for plotting with multiple processors and plotters.

- Added fit_transform method to PCoAFeatureExtractor class
- Removed unnecessary ValueError in AutoPreprocessor class
- Updated test_abundance_filter_otu to include dataset_cached format

- Fixed specificity check to handle list, tuple, and ndarray correctly
- Updated PCoAFeatureExtractor to use config parameter in DistanceStat
- Added input_columns parameter to PCoAFeatureExtractor transform method
- Fixed BaseProcessor to correctly check for fit method
- Removed unnecessary output_dir parameter in test_eval

- Removed unused os import from test_eval.py

psmyth94 added 22 commits November 22, 2024 07:15

feat(docs): add detailed guide for creating processors

5210f6e

Added a comprehensive guide to the `CREATING_PROCESSORS.md` document, covering the BaseProcessor class, fit and transform methods, batch processing, and configuration management.

feat: add new configurations for OTU and Metagenomics

79671ea

- Added LassoConfigForOTU to lasso model
- Added UpSamplerConfigForMetagenomics to upsampling
- Added DistanceStatConfigForSNP to distance statistics
- Reordered DistanceStatConfigForOTU in distance statistics

refactor: rename dataset_name to experiment_name

b6c376c

Renamed all occurrences of `dataset_name` to `experiment_name` across various configuration classes and related files.

refactor(tests): remove obsolete auto tests

3192f3a

Removed outdated tests for auto plotting and preprocessing. These tests were no longer relevant due to recent changes in the codebase.

refactor(tests): update config_class to _config_class

103379e

Updated the test_processing.py file to rename config_class to _config_class for consistency. This change affects the MockModel and MockPreprocessor classes.

refactor: rename dataset to experiment

b1222a2

Renamed all instances of 'dataset' to 'experiment' in the codebase to better reflect the new terminology. This includes variable names, function names, and comments. Updated import statements and mappings accordingly.

refactor: rename genomicsml_module to biofit_module

170569c

Renamed all instances of genomicsml_module to biofit_module in auto_factory.py and configuration_auto.py for consistency and clarity.

refactor(plotting_auto): simplify type checks and naming

e9a29de

- Renamed `dataset_name` to `experiment_name` for consistency across the codebase.
- Simplified `_plotter_mapping` assignment by directly accessing `_experiment_mapping` dictionary.

refactor(processing): rename dataset_name to experiment_name

94319b6

Updated the ProcessorConfig class to rename the dataset_name attribute to experiment_name for better clarity.

refactor(processing): rename dataset_name to experiment_name

7186f68

Renamed the `dataset_name` attribute to `experiment_name` in the BaseProcessor class to better reflect its purpose.

refactor(tests): update auto plotting test cases

4583bb5

- Modified `tests/auto/test_auto_plotting.py` for improved readability
- Refactored test cases to ensure better coverage and maintainability
- Updated test assertions to reflect recent changes in plotting logic

refactor(tests): remove unused omic dataset functions

5463249

Removed the `create_omic_dataset` function and related fixtures from `tests/fixtures/files.py`. These functions were no longer in use and cluttered the codebase.

feat: add fit_transform method to PCoAFeatureExtractor

10b8717

- Added fit_transform method to PCoAFeatureExtractor class
- Removed unnecessary ValueError in AutoPreprocessor class
- Updated test_abundance_filter_otu to include dataset_cached format

refactor: remove unused os import in test_eval.py

297c3f7

- Removed unused os import from test_eval.py
