Refactor auto factory #1
Open: psmyth94 wants to merge 22 commits into main from refactor-auto-factory
Added a comprehensive guide to the `CREATING_PROCESSORS.md` document, covering the BaseProcessor class, fit and transform methods, batch processing, and configuration management.
Added functionality to register experiments and map configurations, processors, and plotters for various experiment types. Updated auto_factory, configuration_auto, plotting_auto, and processing_auto modules to support experiment-based registration and mapping.
- Added LassoConfigForOTU to lasso model
- Added UpSamplerConfigForMetagenomics to upsampling
- Added DistanceStatConfigForSNP to distance statistics
- Reordered DistanceStatConfigForOTU in distance statistics
Renamed all occurrences of `dataset_name` to `experiment_name` across various configuration classes and related files.
Removed outdated tests for auto plotting and preprocessing. These tests were no longer relevant due to recent changes in the codebase.
Updated the test_processing.py file to rename config_class to _config_class for consistency. This change affects the MockModel and MockPreprocessor classes.
Renamed all instances of 'dataset' to 'experiment' in the codebase to better reflect the new terminology. This includes variable names, function names, and comments. Updated import statements and mappings accordingly.
- Added new test file `test_auto_plotting.py` for testing auto plotting functionalities.
- Added new test file `test_auto_preprocessing.py` for testing auto preprocessing functionalities.
- Included various test cases for different data formats and scenarios.
- Utilized `unittest` and `pytest` frameworks for writing the tests.
- Ensured integration and unit tests are marked appropriately.
Renamed all instances of genomicsml_module to biofit_module in auto_factory.py and configuration_auto.py for consistency and clarity.
- Renamed `dataset_name` to `experiment_name` for consistency across the codebase.
- Simplified `_plotter_mapping` assignment by directly accessing `_experiment_mapping` dictionary.
Updated the ProcessorConfig class to rename the dataset_name attribute to experiment_name for better clarity.
- Refactor input column handling logic in TransformationMixin
- Simplify condition checks and improve readability
- Ensure input_columns is properly set based on feature type or unused columns
- Remove redundant code and handle ValueError exceptions
Renamed the `dataset_name` attribute to `experiment_name` in the BaseProcessor class to better reflect its purpose.
Replaced the default implementation of the `fit` method with a `NotImplementedError` to ensure that subclasses implement their own `fit` method. Also added `NotImplementedError` for `fit_transform` and `transform` methods to enforce their implementation in subclasses.
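The pattern this commit describes can be sketched as follows. This is a minimal illustration only: the `BaseProcessor` below is a hypothetical stand-in, not biofit's actual class, and `MeanCenterer` is an invented subclass used to show what an implementer has to provide.

```python
class BaseProcessor:
    """Hypothetical stand-in for a processor base class (not biofit's real one)."""

    def fit(self, X, *args, **kwargs):
        # No default behaviour: subclasses must supply their own fit().
        raise NotImplementedError(f"{type(self).__name__} must implement fit()")

    def transform(self, X, *args, **kwargs):
        raise NotImplementedError(f"{type(self).__name__} must implement transform()")

    def fit_transform(self, X, *args, **kwargs):
        raise NotImplementedError(f"{type(self).__name__} must implement fit_transform()")


class MeanCenterer(BaseProcessor):
    """Invented example subclass that overrides all three methods."""

    def fit(self, X):
        # Learn the mean of the input values.
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        # Subtract the learned mean from every value.
        return [x - self.mean_ for x in X]

    def fit_transform(self, X):
        return self.fit(X).transform(X)
```

With this layout, calling `fit` on a subclass that forgot to override it fails immediately with a message naming the subclass, rather than silently running a base-class default.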
- Updated import statement in `test_processing.py` to include `require_polars`.
- Modified test cases to use `numpy` as source format and `pandas` as target format.
- Adjusted assertions to match the new expected values.
- Added `require_polars` decorator to `test_process_transform_batch_output_valid`.
- Ensured fingerprints remain unchanged after fit and predict operations.
- Modified `tests/auto/test_auto_plotting.py` for improved readability
- Refactored test cases to ensure better coverage and maintainability
- Updated test assertions to reflect recent changes in plotting logic
Removed the `create_omic_dataset` function and related fixtures from `tests/fixtures/files.py`. These functions were no longer in use and cluttered the codebase.
- Renamed methods from `for_processor` to `from_processor` for consistency.
- Added `unregister_experiment` and `unregister_pipeline` methods to `AutoConfig`, `AutoPlotterConfig`, and `AutoPreprocessorConfig`.
- Updated imports and type hints to reflect changes.
- Modified unit tests to include new unregister methods and ensure proper cleanup.
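A rough sketch of why an unregister method helps test cleanup is given below. `AutoConfigSketch` and `DummyConfig` are hypothetical names, and the method signatures are assumptions made for illustration; the real `AutoConfig` methods may take different arguments.

```python
class AutoConfigSketch:
    """Hypothetical registry; signatures here are assumed, not biofit's API."""

    _experiment_mapping = {}

    @classmethod
    def register_experiment(cls, experiment_name, config_class):
        # Record which config class serves the given experiment.
        cls._experiment_mapping[experiment_name] = config_class

    @classmethod
    def unregister_experiment(cls, experiment_name):
        # Remove the entry; a missing name is ignored so cleanup is idempotent.
        cls._experiment_mapping.pop(experiment_name, None)


class DummyConfig:
    """Placeholder config class used only by the test below."""


def test_register_then_cleanup():
    AutoConfigSketch.register_experiment("dummy", DummyConfig)
    try:
        assert AutoConfigSketch._experiment_mapping["dummy"] is DummyConfig
    finally:
        # Always undo the registration so later tests see a clean registry.
        AutoConfigSketch.unregister_experiment("dummy")
    assert "dummy" not in AutoConfigSketch._experiment_mapping
```

Because the mapping lives on the class, a registration made in one test would otherwise leak into every test that runs afterwards; the `finally` block guards against that even when an assertion fails.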
- Added unit tests for `PlotterPipeline` class in `test_plotter_pipeline.py`.
- Included tests for plotting with valid and invalid datasets.
- Added tests for plotting with and without fitting the processor.
- Added tests for plotting with multiple processors and plotters.
- Added fit_transform method to PCoAFeatureExtractor class
- Removed unnecessary ValueError in AutoPreprocessor class
- Updated test_abundance_filter_otu to include dataset_cached format
- Fixed specificity check to handle list, tuple, and ndarray correctly
- Updated PCoAFeatureExtractor to use config parameter in DistanceStat
- Added input_columns parameter to PCoAFeatureExtractor transform method
- Fixed BaseProcessor to correctly check for fit method
- Removed unnecessary output_dir parameter in test_eval
- Removed unused os import from test_eval.py
This pull request is for supporting experiment-based processor and plotter mappings.
Changes to Experiment Handling:
- `src/biofit/auto/auto_factory.py`: Added `_experiment_mapping` to the `_BaseAutoProcessorClass` and introduced a new `register_experiment` method to register processors for experiments.

Updates to Plotting and Processing Modules:
- `src/biofit/auto/plotting_auto.py`: Added experiment-based plotter mappings and updated methods to handle experiments, including `for_experiment`, `from_dataset`, and `from_bioset`.
- `src/biofit/auto/processing_auto.py`: Added experiment-based processor mappings and updated the `for_dataset` method to handle experiments.

Minor Adjustments:
- `src/biofit/models/lasso/__init__.py`: Added `LassoConfigForOTU` to the imports.
- `src/biofit/models/lasso/lasso.py`: Changed `dataset_name` to `experiment_name` in `LassoConfigForOTU` and updated `config_class` to `_config_class` in `LassoModel`.
- `src/biofit/models/lightgbm/lightgbm.py`: Updated `config_class` to `_config_class` in `LightGBMModel` for consistency.
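
To make the idea of an experiment-based mapping concrete, here is a minimal sketch of how a lookup keyed by experiment name could resolve a config class. Everything in it is an assumption for illustration: the module-level dictionaries, the standalone `for_experiment` function, the `alpha` field and its values, and the fallback to a generic config are not taken from the PR's code. Only the names `LassoConfigForOTU`, `experiment_name`, `_experiment_mapping`, and `for_experiment` appear in the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type


@dataclass
class LassoConfig:
    """Generic config (hypothetical fields)."""
    experiment_name: Optional[str] = None
    alpha: float = 1.0


@dataclass
class LassoConfigForOTU(LassoConfig):
    """Experiment-specific config; the alpha value is illustrative only."""
    experiment_name: Optional[str] = "otu"
    alpha: float = 0.1


# processor name -> generic config class (assumed layout)
_default_mapping: Dict[str, Type[LassoConfig]] = {"lasso": LassoConfig}

# experiment name -> processor name -> experiment-specific config class
_experiment_mapping: Dict[str, Dict[str, Type[LassoConfig]]] = {
    "otu": {"lasso": LassoConfigForOTU},
}


def for_experiment(processor_name, experiment_name=None, **overrides):
    """Build a config, preferring an experiment-specific class when one exists."""
    per_experiment = _experiment_mapping.get(experiment_name, {})
    # Assumed behaviour: fall back to the generic config for unknown experiments.
    config_class = per_experiment.get(processor_name, _default_mapping[processor_name])
    return config_class(**overrides)
```

In this sketch, `for_experiment("lasso", "otu")` returns a `LassoConfigForOTU`, while an experiment with no registered entry gets the generic `LassoConfig`. Keyword overrides are passed straight to the chosen class.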