Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP:MySQL Directory Format #84

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

shivansh02
Copy link
Contributor

Adds support for directory format for MySQL databases.

To Do:

  • Write new/modify unit tests
  • Modify E2E tests

@shivansh02 shivansh02 marked this pull request as draft August 8, 2024 11:45
@shivansh02
Copy link
Contributor Author

To test the feature, I tried running mysqldump on a database but got the following error:
The MySQL server is running with the --secure-file-priv option so it cannot execute this statement](https://stackoverflow.com/questions/74932524/mysql-error-code-1290-the-mysql-server-is-running-with-the-secure-file-priv)

To fix that, I added the following lines to mysql config files:

[mysqld]
secure-file-priv = ""

Now, the error I'm getting is
unknown variable 'secure-file-priv='
I removed the variable from all the config files I tried setting it in, the error still remains.
Tried to allow AppArmor according to this answer, but to no avail:
https://stackoverflow.com/a/57502632/20831469

@witten
Copy link
Collaborator

witten commented Aug 20, 2024

Maybe try secure_file_priv with underscores instead of dashes? Although these docs say an empty value has "no effect" which is not reassuring. You could also try setting it to the actual directory that you'll be dumping to.

Copy link
Collaborator

@witten witten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of making these changes to the MariaDB database hook as well, as it's pretty identical to the MySQL hook right now? The main difference is that the specific binaries it runs have different names.

@@ -86,6 +86,10 @@ def execute_dump_command(
mysql_dump_command = tuple(
shlex.quote(part) for part in shlex.split(database.get('mysql_dump_command') or 'mysqldump')
)

dump_format = database.get('format')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: This variable could easily inlined given that it's not used anywhere else besides the next line.

extra_environment=extra_environment,
)
return None
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could lose the else: given that the other code path returns early. (Do not feel strongly.)

log_prefix,
dump_path,
dump_database_names,
(dump_name,),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following this changed and the earlier deletions. Don't you still need some of that logic if the database name is all?

if 'restore_options' in data_source
else ()

dump_format = data_source.get('format')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here; could be inlined.

+ (('--port', str(port)) if port else ())
+ (('--protocol', 'tcp') if hostname or port else ())
+ (('--user', username) if username else ())
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing something.. These two commands look identical to me. Maybe it's just still WIP.

)
for file in os.listdir(dump_directory):
file_path = os.path.join(dump_directory, file)
if file.endswith('.sql'):
Copy link
Collaborator

@witten witten Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs say these are .txt files...?

@shivansh02 shivansh02 closed this Aug 21, 2024
@shivansh02 shivansh02 reopened this Aug 21, 2024
@shivansh02
Copy link
Contributor Author

What do you think of making these changes to the MariaDB database hook as well, as it's pretty identical to the MySQL hook right now? The main difference is that the specific binaries it runs have different names.

Yes I proposed making similar changes to MariaDB hook as well. I'll get on it as soon as I can get this PR approved, so it saves time.

@shivansh02 shivansh02 requested a review from witten August 22, 2024 19:44
@shivansh02
Copy link
Contributor Author

@witten I made changes that hopefully resolve all the issues above. Please take a look.

Copy link
Collaborator

@witten witten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got around to giving this a look. Thank you so much for your patience! I'll try to be a little faster on the next round.

@@ -61,27 +61,39 @@ def database_names_to_dump(database, extra_environment, log_prefix, dry_run):
)


def ensure_directory_exists(path):
if not os.path.exists(path):
os.makedirs(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually replace this whole function with a call to:

os.makedirs(path, exist_ok=True)

And that should have the same effect! More info in the docs: https://docs.python.org/3/library/os.html#os.makedirs


if os.path.exists(dump_filename):
logger.warning(
f'{log_prefix}: Skipping duplicate dump of MySQL database "{database_name}" to {dump_filename}'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this duplicate-skipping logic also apply to directory-style dumps? They don't have a single filename, but they do have a dump dir that may or may not already exist. (Do not feel strongly.)


if use_tab:
ensure_directory_exists(dump_path)
dump_dir = os.path.join(dump_path, database_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that dump.make_data_source_dump_filename() doesn't work for this purpose? That would have the benefit of using the standard directory layout that other dump formats use (~/.borgmatic/mysql_databases/[hostname]/[dbname]).

@@ -96,16 +108,17 @@ def execute_dump_command(
+ (('--user', database['username']) if 'username' in database else ())
+ ('--databases',)
+ database_names
+ ('--result-file', dump_filename)
+ result_file_option
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

if os.path.exists(dump_filename):
logger.warning(
f'{log_prefix}: Skipping duplicate dump of MySQL database "{database_name}" to {dump_filename}'
use_tab = '--tab' in (database.get('options') or '')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to look at the database's configured format instead. e.g. database.get('format') == 'directory' or whatever.

Maybe I'm not understanding the schema.

'''
return any(databases)
return not any('--tab' in (db.get('options') or '') for db in databases)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here as above. Shouldn't this use format instead of options? And again, let me know if I'm just missing something.


Return a sequence of subprocess.Popen instances for the dump processes ready to spew to a named
pipe. But if this is a dry run, then don't actually dump anything and return an empty sequence.
Return a sequence of subprocess.Popen instances for the dump processes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect the directory format dumps to return any streaming Popen instances since presumably they won't stream! So that logic might need to change in this function for that case.

The existing PostgreSQL hook does something similar for the directory format.

restore_command = (
mysql_restore_command
+ ('--batch',)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this still needed/beneficial? We certainly don't want interactive behavior from mysql when it's invoked from borgmatic.

return

for filename in os.listdir(dump_directory):
if filename.endswith('.sql'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I might just be reading this wrong, but shouldn't these be .txt files, not .sql files? https://dev.mysql.com/doc/refman/8.4/en/mysqldump.html#option_mysqldump_tab

file_path = os.path.join(dump_directory, filename)
logger.debug(f"{log_prefix}: Restoring from {file_path}")
with open(file_path, 'r') as sql_file:
execute_command_with_processes(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want the _with_processes() variant for this use case. That's intended for passing in, say, an extract_process so as to extract a single file into a restore command via background streaming. But that's not what you're doing here; this is presumably a synchronous call (... it's not streaming from borg extract). So, for that, maybe try just execute_command()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants