-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow df_concat to be created without a linker #2144
Conversation
|
||
pipeline = CTEPipeline(input_dataframes) | ||
# If the user has provided any ColumnExpression, convert to string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 256-264 is a separate bug I found in this code whereby ColumnExpressions were not being converted into strings
@@ -14,7 +15,11 @@ | |||
from .linker import Linker | |||
|
|||
|
|||
def vertically_concatenate_sql(linker: Linker) -> str: | |||
def vertically_concatenate_sql( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main changes are here - rather than accepting a linker, this now accepts arguments, meaning it can be used outside the context of a linker
|
||
tables_as_splink_dataframes = {} | ||
existing_tables = [] | ||
|
||
if not input_aliases: | ||
# If any of the input_tables are strings, this means they refer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work if the user provides e.g. fake_1000.parquet (a filename rather than a table). To be consistent with what happens when we register tables in linker.py, we'll just give every table an alias for now
f"_splink__input_table{i}" for i, _ in enumerate(input_tables)
6bca1ea
to
bc5eb21
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! 👍
splink/vertically_concatenate.py
Outdated
def vertically_concatenate_sql(linker: Linker) -> str: | ||
def vertically_concatenate_sql( | ||
input_tables: Dict[str, SplinkDataFrame], | ||
salting_reqiured: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo of required
here
Closes #2142
The
vertically_concatenate_sql
routine now takes specific arguments rather than accepting a linker. This allows it to be used in cases where a linker is unavailable such asprofile_columns
.This will allow the code to be used in a variety of other scenarios where __splink__df_concat` is needed such as if we're analysing blocking rules outside of the context of a linker.
This is built on top of #2143 which should be merged first.