-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache schema for selected models #4860
Changes from 2 commits
54f69f8
7a89c54
b56534f
b710ea9
5613ec6
345adec
9885b5a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
kind: Features | ||
body: Add `--selected-schema-cache` flag to cache schema object of selected models | ||
only. | ||
time: 2022-03-16T00:38:47.8468296-05:00 | ||
custom: | ||
Author: karunpoudel | ||
Issue: "4688" | ||
PR: "4860" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1084,6 +1084,27 @@ def parse_args(args, cls=DBTArgumentParser): | |
""", | ||
) | ||
|
||
schema_cache_flag = p.add_mutually_exclusive_group() | ||
schema_cache_flag.add_argument( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fairly new to all of the command arguments, would like to hear why you chose to add two arguments that would change the same There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One could set There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right on! The inheritance order for "global" configs is CLI flag > env var ( I would vote for naming this config in a way that's slightly more generic (not every database calls it "schema," not every database caches at the schema level). Perhaps |
||
'--selected-schema-cache', | ||
action='store_const', | ||
const=True, | ||
default=None, | ||
dest='selected_schema_cache', | ||
help=''' | ||
Pre cache objects of schema relevant to selected resource only. | ||
''' | ||
) | ||
schema_cache_flag.add_argument( | ||
'--no-selected-schema-cache', | ||
action='store_const', | ||
const=False, | ||
dest='selected_schema_cache', | ||
help=''' | ||
Pre cache objects of all schema. | ||
''' | ||
) | ||
|
||
subs = p.add_subparsers(title="Available sub-commands") | ||
|
||
base_subparser = _build_base_subparser() | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -436,8 +436,9 @@ def defer_to_manifest(self, adapter, selected_uids: AbstractSet[str]): | |
|
||
def before_run(self, adapter, selected_uids: AbstractSet[str]): | ||
with adapter.connection_named("master"): | ||
self.create_schemas(adapter, selected_uids) | ||
self.populate_adapter_cache(adapter) | ||
required_schemas = self.get_model_schemas(adapter, selected_uids) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jtcohen6 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ChenyuLInx I'm not sure! That's the critical question animating in this issue/PR. I will say, having this PR makes it a lot easier to test.
It's likely that this will be less efficient. So, I think it's important that the behavior remains configurable. I'm not opposed to moving forward with this, in order to test the waters. We may someday wish to turn it on by default. |
||
self.create_schemas(adapter, required_schemas) | ||
self.populate_adapter_cache(adapter, required_schemas) | ||
self.defer_to_manifest(adapter, selected_uids) | ||
self.safe_run_hooks(adapter, RunHookType.Start, {}) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -390,8 +390,11 @@ def _mark_dependent_errors(self, node_id, result, cause): | |
for dep_node_id in self.graph.get_dependent_nodes(node_id): | ||
self._skipped_children[dep_node_id] = cause | ||
|
||
def populate_adapter_cache(self, adapter): | ||
adapter.set_relations_cache(self.manifest) | ||
def populate_adapter_cache(self, adapter, required_schemas: Set[BaseRelation] = None): | ||
if flags.SELECTED_SCHEMA_CACHE is True: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Feels like we can just remove this function and call There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
adapter.set_relations_cache(self.manifest, required_schemas=required_schemas) | ||
else: | ||
adapter.set_relations_cache(self.manifest) | ||
|
||
def before_hooks(self, adapter): | ||
pass | ||
|
@@ -489,8 +492,7 @@ def get_model_schemas(self, adapter, selected_uids: Iterable[str]) -> Set[BaseRe | |
|
||
return result | ||
|
||
def create_schemas(self, adapter, selected_uids: Iterable[str]): | ||
required_schemas = self.get_model_schemas(adapter, selected_uids) | ||
def create_schemas(self, adapter, required_schemas: Set[BaseRelation]): | ||
# we want the string form of the information schema database | ||
required_databases: Set[BaseRelation] = set() | ||
for required in required_schemas: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are here, let's just remove this function and move all of the logic into
set_relations_cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because these methods are defined in the
adapters
module, these are potentially breaking changes to the adapter interface. E.g. the Postgres plugin reimplements this method (currently with two arguments), and would require updating:dbt-core/plugins/postgres/dbt/adapters/postgres/impl.py
Lines 129 to 131 in 5e0a765