Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Renewed] Refactor preprocessor architecture to enhance extensibility #694

Merged
merged 5 commits into from
Oct 30, 2023

Conversation

Forthoney
Copy link
Contributor

Addresses parts of #672
Preparing to add a airflow as a preprocessor target, I noticed that the current architecture is very rigid and difficult to expand upon. This refactoring/reorganization hopefully addresses some of these. Notable changes (and absence of changes) are

No AST traversal or transformation logic has been rewritten

Files outside of shell_ast have not been changed*

preprocess/preprocess.py has been changed, but simply to import new files (some new modules were made, as I discuss below).

Differing behavior depending on target (vanilla, spec, airflow) is no longer handled by an if statement in ast_to_ast.

Previously, replace_df_regions was a function that ran a if statement on a property of TransformationState to check which specific TransformationState implementation we were dealing with (this would tell us the output target). Now, replace_df_regions is an instance method of AbstractTransformationState. This means that any variation in AST transformation between different output targets can be addressed in a single class that implements of AbstractTransformationState.

The location of the AST Transformation logic has been moved from replace_df_regions() to a method on TransformationState.

For example, this pull request created three implementations of AbstractTransformationState: TransformationState for the vanilla pash behavior, SpeculativeTransformationState for pash spec, and AirflowTransformationState for airflow (this one is empty). The ast_to_ast package contains no checks in its logic for checking how to handle AST transformations - they only need to call replace_df_regions of whatever trans_options argument they have at hand.

The TransformationState code has been moved to a separate file transformation_options.py. I do suggest renaming "State" with some more appropriate name, but this is a minor nitpick

preprocess_node_<node_type> functions have been

  • moved to its own module
  • no longer reliant on a large dictionary to dispatch the appropriate function
  • type-annotated.

@github-actions
Copy link

OS:ubuntu-20.04
Sat Oct 28 17:17:55 UTC 2023
intro: 2/2 tests passed.
interface: 41/41 tests passed.
compiler: 54/54 tests passed.

@github-actions
Copy link

OS:ubuntu-20.04
Sat Oct 28 17:19:14 UTC 2023
intro: 2/2 tests passed.
interface: 41/41 tests passed.
compiler: 54/54 tests passed.

@github-actions
Copy link

OS = Debian 10
CPU = Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Ram = 15752
Hash = 4ab14e5
Kernel= Linux 4.15.0-197-generic x86_64

benchmark tests passed failed untested unresolved unsupported not_in_use other_status
posix 494 375 41 31 6 40 1 0
intro 2 2 0 0 0 0 0 0
interface 41 41 0 0 0 0 0 0
compiler 54 54 0 0 0 0 0 0

@github-actions
Copy link

OS = Debian 10
CPU = Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Ram = 15752
Hash = 4436c96
Kernel= Linux 4.15.0-197-generic x86_64

benchmark tests passed failed untested unresolved unsupported not_in_use other_status
posix 494 375 41 31 6 40 1 0
intro 2 2 0 0 0 0 0 0
interface 41 41 0 0 0 0 0 0
compiler 54 54 0 0 0 0 0 0

@angelhof
Copy link
Member

This looks great to me!

@angelhof angelhof merged commit 5fd6cdb into future Oct 30, 2023
6 checks passed
@angelhof angelhof deleted the future-airflow branch October 30, 2023 13:58
@Forthoney
Copy link
Contributor Author

Closes #672

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants