Create DB Performance improvements #1899

Merged


tanscorpio7 (Contributor) commented Oct 9, 2023

Description

In multi-db mode, the time to create a new database grows with the number of databases that already exist.
This is because we create three internal roles for each new DB, and the CREATE DATABASE subcommands internally make multiple calls to roles_is_member_of("sysadmin"). The list returned by each call contains all three roles of every DB created so far, so every call gets slower as databases are added. This is the main reason for the performance degradation of the CREATE DB command.

We fix this in four places.

  1. getAvailDbid: this function calls nextval, which by default checks the current user's permissions and therefore calls roles_is_member_of. Instead, we call nextval_internal, which does the same work but takes a flag that controls whether permissions are checked; we set it to false. As a safeguard, we check that the current user is "sysadmin" when getAvailDbid is called. (Currently getAvailDbid is only called when the user is sysadmin.)

  2. Set a temporary user when creating schemas: when we create the dbo and guest schemas for the new database, the CREATE SCHEMA code fetches (recursively) all the roles that the current role is a member of, to check whether the current role can become the target schema owner. To avoid this, we temporarily assume the newdb_dbo role while creating these schemas. Only the roles that newdb_dbo is a member of are then fetched, and that list is much smaller than the one for sysadmin.

  3. Select best grantor: select_best_grantor first fetches roles_list, the list of roles that sysadmin is a member of, and then checks each role for the required permissions. sysadmin itself is always checked first, because it is always at the top of roles_list.
    We add a quick check for this: first check whether the current role is sysadmin and whether it can grant all the required permissions. If so, return immediately. Note: this does not change any behaviour, because the same check would be done in the first iteration of the loop after fetching roles_list. We are simply running that first iteration before fetching the whole list.

  4. Set the newdb_dbo user when creating the sysdatabases view in the new DB.
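Item 1 can be illustrated with a self-contained toy model. This is not the Babelfish code. `toy_nextval_internal` only mirrors the idea behind PostgreSQL's `nextval_internal(relid, check_permissions)`. The sysadmin OID, `current_user_id` and the check counter are hypothetical. The caller verifies once that it is sysadmin, so the generic permission check (and its membership scan) is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;                   /* stand-in for PostgreSQL's Oid */

#define TOY_SYSADMIN_OID ((Oid) 10)     /* hypothetical OID for sysadmin */

static Oid      current_user_id = TOY_SYSADMIN_OID;
static int      permission_checks = 0;  /* counts expensive permission checks */
static int64_t  toy_dbid_seq = 0;       /* toy sequence state */

/* Toy permission check: in the engine, this is where roles_is_member_of
 * would expand the current user's role memberships. */
static bool
toy_has_usage_on_sequence(Oid user)
{
    permission_checks++;
    return user == TOY_SYSADMIN_OID;
}

/* Toy analogue of nextval_internal(relid, check_permissions). */
static int64_t
toy_nextval_internal(bool check_permissions)
{
    if (check_permissions && !toy_has_usage_on_sequence(current_user_id))
        return -1;                      /* the engine would raise an error */
    return ++toy_dbid_seq;
}

/* Toy getAvailDbid: only sysadmin may call it, so verify that cheaply
 * and skip the generic check inside nextval. */
static int64_t
toy_get_avail_dbid(void)
{
    if (current_user_id != TOY_SYSADMIN_OID)
        return -1;                      /* the engine would raise an error */
    return toy_nextval_internal(false);
}
```

The OID comparison in `toy_get_avail_dbid` stands in for whatever "is the current user sysadmin" check the engine performs. It costs the same no matter how many databases exist.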
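The temporary user switch in item 2 follows a save/switch/restore pattern. The sketch below is a toy model with hypothetical names. `current_user_id` stands in for the engine's current-user state, and `toy_create_schema` just records which user was active. In the real command, that is the user whose memberships get expanded.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;                   /* stand-in for PostgreSQL's Oid */

/* Hypothetical session state standing in for the engine's current user. */
static Oid  current_user_id;

/* Records which user was current when each toy schema was created. */
static Oid  schema_created_as[8];
static int  nschemas = 0;

/* Toy CREATE SCHEMA: the real command expands the current user's role
 * memberships to check that it can become the schema owner, so its cost
 * depends on who the current user is. */
static void
toy_create_schema(const char *name)
{
    (void) name;
    if (nschemas < 8)
        schema_created_as[nschemas++] = current_user_id;
}

/* Create the dbo and guest schemas while temporarily acting as newdb_dbo. */
static void
toy_create_db_schemas(Oid newdb_dbo)
{
    Oid saved_user = current_user_id;   /* the caller, e.g. sysadmin */

    current_user_id = newdb_dbo;        /* much smaller membership list */
    toy_create_schema("dbo");
    toy_create_schema("guest");
    current_user_id = saved_user;       /* restore the original user */
}
```

In the engine, the original user would also have to be restored if an error is raised partway through, for example inside PostgreSQL's `PG_TRY`/`PG_FINALLY`. Switching users there is normally done with `GetUserIdAndSecContext`/`SetUserIdAndSecContext`. Whether this PR uses exactly those calls is an assumption.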
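The fast path in item 3 can be modeled in a self-contained way. This is not the engine code. `ToyAclEntry`, `toy_aclmask_direct` and `toy_roles_is_member_of` are hypothetical stand-ins for PostgreSQL's ACLs, `aclmask_direct` and `roles_is_member_of`. The no-match fallback is simplified: the real `select_best_grantor` also remembers the candidate with the most rights. As the description states, the sketch relies on the current role always being the first entry of the membership list. Because of that, checking it before building the list returns the same grantor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;       /* stand-in for PostgreSQL's Oid */
typedef uint32_t AclMode;   /* stand-in for PostgreSQL's AclMode bitmask */

/* Toy ACL entry: the grant options a role holds on one object. */
typedef struct
{
    Oid         role;
    AclMode     goptions;
} ToyAclEntry;

static int  membership_calls = 0;   /* counts expensive membership lookups */

/* Toy aclmask_direct: the subset of "needed" that "role" holds directly. */
static AclMode
toy_aclmask_direct(const ToyAclEntry *acl, size_t nacl, Oid role, AclMode needed)
{
    for (size_t i = 0; i < nacl; i++)
        if (acl[i].role == role)
            return acl[i].goptions & needed;
    return 0;
}

/* Toy roles_is_member_of: "role" first, then every role it belongs to.
 * In the engine this list grows with the number of databases. */
static size_t
toy_roles_is_member_of(Oid role, const Oid *memberships, size_t nmem, Oid *out)
{
    membership_calls++;
    out[0] = role;
    for (size_t i = 0; i < nmem; i++)
        out[i + 1] = memberships[i];
    return nmem + 1;
}

/* Toy select_best_grantor with an optional fast path for "role" itself. */
static Oid
toy_select_best_grantor(Oid role, AclMode needed,
                        const ToyAclEntry *acl, size_t nacl,
                        const Oid *memberships, size_t nmem, int fast_path)
{
    Oid         roles[64];
    size_t      nroles;

    /* Fast path: "role" is checked first anyway, so check it before
     * paying for the full membership list. */
    if (fast_path && toy_aclmask_direct(acl, nacl, role, needed) == needed)
        return role;

    assert(nmem < 64);
    nroles = toy_roles_is_member_of(role, memberships, nmem, roles);
    for (size_t i = 0; i < nroles; i++)
        if (toy_aclmask_direct(acl, nacl, roles[i], needed) == needed)
            return roles[i];
    return role;            /* simplified fallback: no full grantor found */
}
```

With and without the fast path, the function returns the same grantor. The only difference is whether the membership list is built when the current role already holds every needed grant option.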

| Number of DBs | CREATE DB time (after) | CREATE DB time (before) |
|---|---|---|
| 10 | 2 ms | 5 ms |
| 500 | 3 ms | 163 ms |
| 1000 | 4 ms | 740 ms |
| 3000 | 9 ms | N/A (~18 s) |

Engine PR: babelfish-for-postgresql/postgresql_modified_for_babelfish#234

Extension PR: #1899

Issues Resolved

BABEL-3869

Signed-off-by: Tanzeel Khan <[email protected]>

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tanzeel Khan <[email protected]>
shalinilohia50 (Contributor) commented:

Could you please mention how much improvement we have observed after these changes?

tanscorpio7 changed the title from "BABEL_4438 Create DB Performance improvements" to "Create DB Performance improvements" on Oct 9, 2023
forestkeeper merged commit 08c886f into babelfish-for-postgresql:BABEL_3_X_DEV on Oct 12, 2023
ParikshitSarode pushed a commit to amazon-aurora/babelfish_extensions that referenced this pull request Oct 17, 2023
tanscorpio7 deleted the BABEL_4438 branch on October 20, 2023 10:00
tanscorpio7 mentioned this pull request on Oct 21, 2023
Deepesh125 pushed a commit to babelfish-for-postgresql/postgresql_modified_for_babelfish that referenced this pull request Oct 24, 2023
Engine PR: #234
Extension PR: babelfish-for-postgresql/babelfish_extensions#1899
Extension PR: (cache sysadmin oid) babelfish-for-postgresql/babelfish_extensions#1942

Task: BABEL-4438
Signed-off-by: Tanzeel Khan <[email protected]>
Deepesh125 pushed a commit that referenced this pull request Oct 24, 2023
This is a continuation of the original PR #1899. We cache the sysadmin role OID and expose it through a hook for the
engine code changes.
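The caching described above can be sketched as a toy model. This is not the code from #1942. The hook type, `toy_get_sysadmin_oid_hook`, the OID value 10 and the lookup counter are all hypothetical. In PostgreSQL, a by-name lookup would go through something like `get_role_oid()`. The extension resolves the OID once. The engine calls it through a function pointer that stays NULL until the extension installs it, so engine code does not depend on the extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;                   /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

static int  catalog_lookups = 0;        /* counts toy catalog lookups */

/* Toy by-name role lookup; each real lookup would hit the catalog cache. */
static Oid
toy_get_role_oid(const char *rolname)
{
    catalog_lookups++;
    return strcmp(rolname, "sysadmin") == 0 ? (Oid) 10 : InvalidOid;
}

/* Extension side: resolve the sysadmin OID once and reuse it. */
static Oid  cached_sysadmin_oid = InvalidOid;

static Oid
toy_get_sysadmin_oid(void)
{
    if (cached_sysadmin_oid == InvalidOid)
        cached_sysadmin_oid = toy_get_role_oid("sysadmin");
    return cached_sysadmin_oid;
}

/* Engine side: a hook pointer that the extension fills in at load time. */
typedef Oid (*toy_get_sysadmin_oid_hook_type) (void);
static toy_get_sysadmin_oid_hook_type toy_get_sysadmin_oid_hook = NULL;

/* Engine helper: InvalidOid until an extension installs the hook. */
static Oid
toy_engine_sysadmin_oid(void)
{
    return toy_get_sysadmin_oid_hook ? toy_get_sysadmin_oid_hook() : InvalidOid;
}

/* Toy equivalent of the extension's _PG_init() installing the hook. */
static void
toy_extension_init(void)
{
    toy_get_sysadmin_oid_hook = toy_get_sysadmin_oid;
}
```

This sketch never invalidates the cached value. That is acceptable only if the role cannot be dropped and recreated while the process is running.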

Engine PR: babelfish-for-postgresql/postgresql_modified_for_babelfish#234
Extension PR: #1899
Extension PR: (cache sysadmin oid) #1942

Signed-off-by: Tanzeel Khan <[email protected]>
ahmed-shameem pushed a commit to amazon-aurora/babelfish_extensions that referenced this pull request Oct 25, 2023
Sairakan pushed a commit to amazon-aurora/postgresql_modified_for_babelfish that referenced this pull request Nov 16, 2023
priyansx pushed a commit to amazon-aurora/postgresql_modified_for_babelfish that referenced this pull request Nov 22, 2023