
Enhancement: delete multiple keys with one call #54

Open
GoogleCodeExporter opened this issue Mar 16, 2015 · 11 comments

Comments

@GoogleCodeExporter
Contributor

This is a lot more efficient than making a separate call to the tracker for each file you want to delete.
I've attached a patch that does this. It has been working well for me.




Original issue reported on code.google.com by [email protected] on 16 Jan 2012 at 5:56

Attachments:

@GoogleCodeExporter
Contributor Author

Just a heads up that the patch you attached is susceptible to an SQL injection attack here:

    my $in = join("','", @$keys);
    my $sth = $self->dbh->prepare("SELECT fid FROM file WHERE dkey IN ('$in')");
    $sth->execute;

You should never interpolate input directly into SQL queries; use bound parameters instead, like this:

    my $sth = $self->dbh->prepare("SELECT fid FROM file WHERE dkey IN (" .
                                  join(',', (('?') x scalar @$keys)) . ")");
    $sth->execute(@$keys);

The expression (('?') x scalar @$keys) generates a list of '?' strings that is the same size as @$keys.
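
For example, with three keys the prepared statement ends up with one placeholder per key (a minimal standalone illustration, not part of the patch):

    my @keys = ('a', 'b', 'c');
    my $sql  = "SELECT fid FROM file WHERE dkey IN (" .
               join(',', ('?') x scalar @keys) . ")";
    # $sql is now: SELECT fid FROM file WHERE dkey IN (?,?,?)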

Original comment by daniel.frett on 17 Feb 2012 at 4:05

@GoogleCodeExporter
Contributor Author

Good point. I've attached a newer version of the patch with that suggestion implemented; it also uses the already existing enqueue_fids_to_delete2 instead of the enqueue_for_delete2_multiple that the original patch supplied.

Original comment by [email protected] on 17 Feb 2012 at 9:17

Attachments:

@GoogleCodeExporter
Contributor Author

Looks better. At a glance, I think it could use some hardcoded (or configurable, fetched via server_setting_cached) limit on how many keys you can request to delete at once, just so people can't shoot themselves in the foot.
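
A rough sketch of what such a guard might look like in the query worker (the setting name delete_multiple_limit matches the later patch, but the too_many_keys error name is an assumption here):

    # hypothetical guard; 'too_many_keys' is an assumed error name
    my $max = MogileFS::Config->server_setting_cached('delete_multiple_limit') || 1000;
    return $self->err_line('too_many_keys')
        if @keys > $max;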

Original comment by [email protected] on 23 Feb 2012 at 8:42

@GoogleCodeExporter
Contributor Author

Good call. Here is a new patch against 2.59 that uses server_setting_cached with a default of 1000 when not set.

Original comment by [email protected] on 9 Mar 2012 at 12:15

Attachments:

@GoogleCodeExporter
Contributor Author

Moving this to "accepted" - hope to pull it in either this cycle or next.

Original comment by [email protected] on 20 Jun 2012 at 12:56

  • Changed state: Accepted

@GoogleCodeExporter
Contributor Author

I found a bug in my code where I'm passing the file ids to enqueue_fids_to_delete2: an array reference is passed where an array should be.

In Store.pm, delete_keys_multiple:

    $self->enqueue_fids_to_delete2(\@fidids);

should be:

    $self->enqueue_fids_to_delete2(@fidids);
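
The distinction comes down to Perl's list flattening; a minimal standalone illustration, independent of the MogileFS code (count_args is a throwaway demo sub, not part of the patch):

    sub count_args { my @fids = @_; return scalar @fids }
    my @fidids = (1, 2, 3);
    print count_args(\@fidids);  # 1 -- the arrayref arrives as a single element
    print count_args(@fidids);   # 3 -- the array flattens into three arguments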

Original comment by [email protected] on 24 Jul 2012 at 6:16

@GoogleCodeExporter
Contributor Author

[deleted comment]


@GoogleCodeExporter
Contributor Author

I found a bug in the original implementation that caused orphaned entries in the file table: when there was a deadlock on file_to_delete2, only that insert was retried, while the delete from file had already been rolled back. I've changed delete_keys_multiple to retry all of the statements on deadlock instead of just the insert into file_to_delete2. I've verified that the code below fixes the issue when there is a deadlock.

sub delete_keys_multiple {
    my ($self, $keys, $limit) = @_;
    # select the fids and keys that actually exist, using bound parameters
    my $sth = $self->dbh->prepare("SELECT fid,dkey FROM file WHERE dkey IN (" .
                                  join(',', (('?') x scalar @$keys)) . ") LIMIT ?");
    $sth->execute(@$keys, $limit);
    my (@fidids, @deleted);
    while (my @row = $sth->fetchrow_array) {
        push @fidids,  $row[0];
        push @deleted, $row[1];
    }
    return \@deleted unless @fidids;  # nothing matched; avoid an invalid "IN ()" clause
    my $in = join(",", @fidids);
    $self->dbh->begin_work();
    my $nexttry = $self->unix_timestamp;
    # retry the whole transaction body on deadlock, not just the final insert,
    # so a rolled-back DELETE FROM file is always re-run
    $self->retry_on_deadlock(sub {
        $self->dbh->do("DELETE FROM file WHERE fid IN ($in)");
        $self->dbh->do("DELETE FROM tempfile WHERE fid IN ($in)");
        $self->dbh->do($self->ignore_replace . " INTO file_to_delete2 (fid, nexttry) VALUES " .
                       join(",", map { "(" . int($_) . ", $nexttry)" } @fidids));
    });
    $self->condthrow;
    $self->dbh->commit();
    return \@deleted;
}
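
A call-site sketch for this version (hypothetical keys and limit; note the later revision adds a $dmid parameter in front):

    my $deleted = Mgd::get_store()->delete_keys_multiple(['key1', 'key2'], 1000);
    # $deleted is an arrayref of the keys that were actually removed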

Original comment by [email protected] on 4 Jan 2013 at 7:06

@GoogleCodeExporter
Contributor Author

Hi

Comment from Eric on the ML:

"This seems to ignore dmid when deleting keys, so it can inadvertantly
delete keys in a different domain than the one specified."

Can this be fixed?

Thanks!

Original comment by [email protected] on 3 Feb 2013 at 12:19

@GoogleCodeExporter
Contributor Author

Good point.

Here is a new function for Store.pm

sub delete_keys_multiple {
    my ($self, $dmid, $keys, $limit) = @_;
    # scope the lookup to the requested domain so keys in other domains are untouched
    my $sth = $self->dbh->prepare("SELECT fid,dkey FROM file WHERE dmid=? AND dkey IN (" .
                                  join(',', (('?') x scalar @$keys)) . ") LIMIT ?");
    $sth->execute($dmid, @$keys, $limit);
    my (@fidids, @deleted);
    while (my @row = $sth->fetchrow_array) {
        push @fidids,  $row[0];
        push @deleted, $row[1];
    }
    return \@deleted unless @fidids;  # nothing matched; avoid an invalid "IN ()" clause
    my $in = join(",", @fidids);
    $self->dbh->trace(2, '/tmp/dbi_trace.out');  # DBI statement tracing, for debugging
    $self->dbh->begin_work();
    my $nexttry = $self->unix_timestamp;
    # as before, retry the whole transaction body on deadlock
    $self->retry_on_deadlock(sub {
        $self->dbh->do("DELETE FROM file WHERE fid IN ($in)");
        $self->dbh->do("DELETE FROM tempfile WHERE fid IN ($in)");
        $self->dbh->do($self->ignore_replace . " INTO file_to_delete2 (fid, nexttry) VALUES " .
                       join(",", map { "(" . int($_) . ", $nexttry)" } @fidids));
    });
    $self->condthrow;
    $self->dbh->commit();
    $self->dbh->trace(0);  # tracing off
    return \@deleted;
}

Here is a new function for Worker/Query.pm

sub cmd_delete_multiple {
    my MogileFS::Worker::Query $self = shift;
    my $args = shift;

    # validate domain for plugins
    $args->{dmid} = $self->check_domain($args)
        or return $self->err_line('domain_not_found');

    # now invoke the plugin, abort if it tells us to
    my $rv = MogileFS::run_global_hook('cmd_delete_multiple', $args);
    return $self->err_line('plugin_aborted')
        if defined $rv && ! $rv;

    # clamp the requested limit to the configured maximum (default 1000)
    my $limit = $args->{limit};
    my $max_limit = MogileFS::Config->server_setting_cached('delete_multiple_limit') || 1000;

    $limit ||= $max_limit;
    $limit += 0;
    $limit = $max_limit if $limit > $max_limit;

    # collect the key1..keyN arguments
    my $dmid = $args->{dmid};
    my @keys;
    for my $arg (keys %$args) {
        push @keys, $args->{$arg} if $arg =~ /^key\d/;
    }
    debug("delete_multiple: $limit: @keys");
    my $deleted = Mgd::get_store()->delete_keys_multiple($dmid, \@keys, $limit);

    # report the deleted keys back as key1..keyN plus a key_count
    my $ret = { key_count => 0 };
    foreach my $del_key (@$deleted) {
        $ret->{key_count}++;
        $ret->{"key$ret->{key_count}"} = $del_key;
    }
    return $self->ok_line($ret);
}
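
For completeness, a client would invoke the new command over the tracker's usual line protocol. A minimal sketch, assuming a tracker on 127.0.0.1:7001 and placeholder domain/keys (the response fields follow the $ret hash built above):

    use strict;
    use IO::Socket::INET;
    my $sock = IO::Socket::INET->new(PeerAddr => '127.0.0.1:7001') or die $!;
    # arguments are URL-encoded key=value pairs, as with other tracker commands
    print $sock "delete_multiple domain=testdomain&key1=foo&key2=bar\r\n";
    my $resp = <$sock>;   # e.g. "OK key_count=2&key1=foo&key2=bar"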

Original comment by [email protected] on 14 May 2013 at 8:28
