Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RunAllAnyTimeScripts set to true can result in a very large RoundhousE.ScriptsRun Table #382

Open
jonreis opened this issue Jul 22, 2019 · 1 comment

Comments

@jonreis
Copy link

jonreis commented Jul 22, 2019

If RunAllAnyTimeScripts is set to true then every time you run a migration, every AnyTimeScript gets run and is recorded in the RoundhousE.ScriptsRun. This can result in a very large database if you have many AnyTimeScripts.

We run scripts against our database fairly frequently in our dev environments (yes a worst case scenario). Since Feb, 2019 we have over 13K records in this table with the total space taken being 205MB due to the number of entries and the size of the text_of_script.

Because of DefaultDatabase.get_all_scripts(), the entire table must be retrieved and since this table is so large, it takes some time.

My current thought is to delete all but the latest run of a script based on its hash. For example:

DELETE FROM [RoundhousE].[ScriptsRun]
WHERE one_time_script = 0 AND id not in(select MAX(Id) FROM [RoundhousE].[ScriptsRun]
where one_time_script = 0
group by text_hash)

Does this seem like a reasonable approach?

Another option might be to skip recording the script in the table unless it has changed. For example,

            if (!is_dryrun)
            {
               if ((run_this_script_once || this_script_has_changed_since_last_run(script_name, sql_to_run)))
               {
                  record_script_in_scripts_run_table(script_name, sql_to_run, run_this_script_once, version_id);
               }
               this_sql_ran = true;
            }
@jonreis jonreis changed the title RunAllAnyTimeScripts set to true can result in a very large RoundhousE.ScriptsRun RunAllAnyTimeScripts set to true can result in a very large RoundhousE.ScriptsRun Table Jul 23, 2019
@jdupont
Copy link

jdupont commented Jun 4, 2020

Because of DefaultDatabase.get_all_scripts(), the entire table must be retrieved and since this table is so large, it takes some time.

We've fixed the long runtime problem in our dev environments by changing the sql query logic in get_all_scripts() to group by script_name and select only the latest ScriptRun for each script:

protected IList<ScriptsRun> get_all_scripts()
{
            var latest_script_run_for_each_script_name = QueryOver.Of<ScriptsRun>()
                .Select(Projections.SqlGroupProjection(
                    $"MAX({nameof(ScriptsRun.id)}) as maxId",
                    nameof(ScriptsRun.script_name), 
                    new[] { "maxId" }, 
                    new IType[] { NHibernateUtil.Int32 }));

            var query_criteria = QueryOver.Of<ScriptsRun>()
                .WithSubquery
                .WhereProperty(s => s.id).In(latest_script_run_for_each_script_name);

            return retry_policy.Execute(() => repository.get_with_criteria(query_criteria));
}

This won't change the size of the table on disk, but that was less important to us than the added run time in each deployment. For context, we deploy continuously to our dev environment. We're deploying many databases (~30) and get_all_scripts was taking 5-10 seconds per database, so we were looking at adding minutes to each deploy. The change above brings us down to less than a half-second per db, since we're loading only a few thousand records instead of hundreds of thousands.

If there's interest, I'll try to open a PR to apply the above fix to main repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants