Add support for CrateDB bulk operations for improved DML efficiency #143

Merged: amotl merged 2 commits into main from amo/bulk-operations on May 4, 2023.

Member

@amotl amotl commented Apr 26, 2023

About

At GH-139, we discovered that the driver does not yet utilize the CrateDB HTTP bulk operations interface. This patch addresses that shortcoming by adding a proprietary code path, similar to what the Python driver does, and just as easy to use.
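For context, CrateDB's HTTP endpoint accepts a bulk request as one statement plus a bulk_args list holding one parameter set per execution, instead of the single args list of a regular request. A minimal sketch of assembling such a body in plain PHP; buildBulkRequestBody is a hypothetical helper for illustration, not part of crate-pdo:

```php
<?php

/**
 * Hypothetical helper (not part of crate-pdo): build the JSON body that
 * CrateDB's HTTP `_sql` endpoint expects for a bulk operation. One statement
 * is sent together with a list of parameter sets under `bulk_args`.
 */
function buildBulkRequestBody(string $stmt, array $bulkArgs): string
{
    foreach ($bulkArgs as $index => $parameterSet) {
        // Each entry must hold the positional parameters for one execution.
        if (!is_array($parameterSet) || $parameterSet !== array_values($parameterSet)) {
            throw new InvalidArgumentException(
                sprintf('bulk_args[%s] must be a list of positional parameters', $index)
            );
        }
    }

    return json_encode(
        ['stmt' => $stmt, 'bulk_args' => array_values($bulkArgs)],
        JSON_THROW_ON_ERROR
    );
}

$body = buildBulkRequestBody(
    'INSERT INTO test_table (id, name, int_type) VALUES (?, ?, ?)',
    [[5, 'foo', 1], [6, 'bar', 2]]
);
echo $body, PHP_EOL;
// {"stmt":"INSERT INTO test_table (id, name, int_type) VALUES (?, ?, ?)","bulk_args":[[5,"foo",1],[6,"bar",2]]}
```

The server executes the statement once per entry in bulk_args, which is where the DML efficiency gain comes from: one HTTP round trip instead of one per row.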

Documentation

https://crate-pdo--143.org.readthedocs.build/en/143/connect.html#bulk-operations

/cc @hlcianfagna, @mkleen, @hammerhead, @proddata, @seut, @matriv

@amotl amotl changed the title Add support for CrateDB bulk operations for improved DML efficiency [WIP] Add support for CrateDB bulk operations for improved DML efficiency Apr 26, 2023
@@ -42,6 +42,7 @@ class PDO extends BasePDO implements PDOInterface

public const CRATE_ATTR_HTTP_BASIC_AUTH = 1000;
public const CRATE_ATTR_DEFAULT_SCHEMA = 1001;
public const CRATE_ATTR_BULK_MODE = 1009;
Member Author

@amotl amotl Apr 26, 2023

At first, I thought it would be a good idea to introduce bulk mode operations by adding another item, CRATE_ATTR_BULK_MODE, to the custom PDO attributes available for CrateDB.

Routing this attribute through the layers of the machinery worked well so far, and it fulfilled its main goal ....

Comment on lines 351 to 377
/**
* Verify support for CrateDB bulk-operations endpoint.
* https://crate.io/docs/crate/reference/en/5.2/interfaces/http.html#bulk-operations
*/
public function testInsertBulk()
{
// Insert records in bulk mode.
$this->pdo->setAttribute(PDO::CRATE_ATTR_BULK_MODE, true);
$parameters = [[5, 'foo', 1], [6, 'bar', 2], [7, 'foo', 3], [8, 'bar', 4]];
$statement = $this->pdo->prepare('INSERT INTO test_table (id, name, int_type) VALUES (?, ?, ?)');
$this->assertTrue($statement->execute($parameters));

// Verify records have been inserted correctly.
$this->pdo->setAttribute(PDO::CRATE_ATTR_BULK_MODE, false);
$this->pdo->exec("REFRESH TABLE test_table");
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table");
$results = $statement->fetchAll(PDO::FETCH_NUM);
$this->assertEquals([5, 'foo', 1], $results[0]);
$this->assertEquals([8, 'bar', 4], $results[3]);
}

Member Author

@amotl amotl Apr 26, 2023

.... however, it turned out that it has a usability glitch in its current form. Bulk operations only apply to DML statements (INSERT, UPDATE, DELETE) and deliver a response of a different shape, so they are not suitable for SELECT statements.

Because the CRATE_ATTR_BULK_MODE setting is handled at the PDO instance level, it needs to be switched back to non-bulk mode before SELECT statements can be conducted properly.

That's obviously not a good way to work with this mode, so I will try to move this feature to the PDOStatement level only, enabling it through the statement's $options dictionary and a corresponding bulkMode slot, and probably remove CRATE_ATTR_BULK_MODE completely again.
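Since bulk operations only apply to INSERT, UPDATE, and DELETE, a client-side guard could reject other statements before they reach the bulk endpoint. This is a minimal sketch; isBulkCompatible is hypothetical and not part of crate-pdo, and it only looks at the leading keyword:

```php
<?php

/**
 * Hypothetical guard (not part of crate-pdo): decide whether a statement may
 * use bulk mode. Only the leading keyword is inspected, so statements that
 * start with a comment or a WITH clause are conservatively rejected.
 */
function isBulkCompatible(string $sql): bool
{
    if (preg_match('/^\s*([A-Za-z]+)/', $sql, $matches) !== 1) {
        return false;
    }

    return in_array(strtoupper($matches[1]), ['INSERT', 'UPDATE', 'DELETE'], true);
}

var_dump(isBulkCompatible('INSERT INTO test_table (id) VALUES (?)'));          // bool(true)
var_dump(isBulkCompatible('  update test_table SET name = ? WHERE id = ?'));   // bool(true)
var_dump(isBulkCompatible('SELECT id FROM test_table'));                        // bool(false)
```

A keyword check like this is cheap but deliberately strict; letting the server reject unsupported statements is the alternative if false negatives matter.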

Member Author

I've removed PDO::CRATE_ATTR_BULK_MODE again with c7d4b3b. CrateDB bulk operations are now exclusively available through a dedicated PDOStatement instance, created with ->prepare($sql, array("bulkMode" => true)), as outlined in the documentation example. This interface is modeless, and as such simpler and better.
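To illustrate why a per-statement option is modeless: the flag is fixed when the statement is prepared, so a later SELECT on the same connection is unaffected. A minimal sketch under that assumption; SketchStatement and payload() are hypothetical names, only the bulkMode option key comes from this PR, and the args/bulk_args keys follow CrateDB's HTTP interface:

```php
<?php

/**
 * Hypothetical sketch (not crate-pdo's implementation): the bulk flag lives
 * on each prepared statement instead of on the connection.
 */
final class SketchStatement
{
    private string $sql;
    private bool $bulkMode;

    public function __construct(string $sql, array $options = [])
    {
        $this->sql = $sql;
        // Same option key as in this PR: ["bulkMode" => true]; defaults to off.
        $this->bulkMode = (bool) ($options['bulkMode'] ?? false);
    }

    /** Build the JSON-ready payload for CrateDB's `_sql` HTTP endpoint. */
    public function payload(array $parameters): array
    {
        // Bulk requests send a list of parameter sets; regular ones a single list.
        $key = $this->bulkMode ? 'bulk_args' : 'args';

        return ['stmt' => $this->sql, $key => $parameters];
    }
}

$insert = new SketchStatement('INSERT INTO test_table (id, name) VALUES (?, ?)', ['bulkMode' => true]);
$select = new SketchStatement('SELECT id, name FROM test_table WHERE id = ?');

echo json_encode($insert->payload([[5, 'foo'], [6, 'bar']])), PHP_EOL;
echo json_encode($select->payload([5])), PHP_EOL;
```

Because the connection carries no bulk state, there is nothing to reset between the INSERT and the SELECT, which is exactly the glitch the instance-level attribute had.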

-- https://crate-pdo--143.org.readthedocs.build/en/143/connect.html#bulk-operations

Comment on lines +194 to +184
if ($bulk_mode) {
return new BulkResponse(
$responseBody['results'],
$responseBody['cols'],
$responseBody['duration']
);
} else {
return new Collection(
$responseBody['rows'],
$responseBody['cols'],
$responseBody['duration'],
$responseBody['rowcount']
);
}
Member Author

@amotl amotl Apr 26, 2023

As outlined in the documentation of the CrateDB HTTP bulk operations interface, the response for bulk operations is slightly different from the non-bulk response: it carries a results list instead of rows and a rowcount. I've reflected that by following the white rabbit of typedness.

Another option, which I briefly evaluated beforehand for ease of use, would have been to converge both result formats into the same response object type. In the Python DBAPI driver, where the result is just a dictionary [1], it does not matter that this data structure has no enforced schema. In the PHP/PDO world, however, things seem to be different, as both rely heavily on typing in this area. That's why I chose to introduce a dedicated response type in PHP, called BulkResponse.

Nevertheless, if you think converging it into a single response type would be a better option, similar to how the Python driver does it, I will take it into consideration.

Footnotes

  1. https://github.com/crate/crate-python/blob/0.31.1/src/crate/client/cursor.py#L76-L89
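The typed split described above can be sketched in plain PHP: the decoded response body is mapped onto one of two value classes depending on whether the statement ran in bulk mode, mirroring the if/else on $bulk_mode in the diff. RowsResult, BulkResult, and responseFromJson are hypothetical stand-ins, not the driver's Collection and BulkResponse classes:

```php
<?php

/** Hypothetical stand-in for a regular result: rows plus a single rowcount. */
final class RowsResult
{
    public array $rows;
    public array $columns;
    public float $duration;
    public int $rowCount;

    public function __construct(array $rows, array $columns, float $duration, int $rowCount)
    {
        $this->rows = $rows;
        $this->columns = $columns;
        $this->duration = $duration;
        $this->rowCount = $rowCount;
    }
}

/** Hypothetical stand-in for a bulk result: one entry per parameter set. */
final class BulkResult
{
    public array $results;
    public array $columns;
    public float $duration;

    public function __construct(array $results, array $columns, float $duration)
    {
        $this->results = $results;
        $this->columns = $columns;
        $this->duration = $duration;
    }
}

/** Map a `_sql` response body onto the matching typed object. */
function responseFromJson(string $json, bool $bulkMode): object
{
    $body = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if ($bulkMode) {
        return new BulkResult($body['results'], $body['cols'], (float) $body['duration']);
    }

    return new RowsResult($body['rows'], $body['cols'], (float) $body['duration'], (int) $body['rowcount']);
}

$bulk = responseFromJson('{"cols":[],"duration":1.5,"results":[{"rowcount":1},{"rowcount":1}]}', true);
$rows = responseFromJson('{"cols":["id"],"rows":[[5],[6]],"rowcount":2,"duration":0.8}', false);

echo get_class($bulk), ' with ', count($bulk->results), ' results', PHP_EOL; // BulkResult with 2 results
echo get_class($rows), ' with ', $rows->rowCount, ' rows', PHP_EOL;          // RowsResult with 2 rows
```

The cost of this design is a second class with a parallel interface; the benefit is that neither type has to carry fields that are meaningless for its response shape.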

@amotl amotl force-pushed the amo/bulk-operations branch from 0ce9e90 to 53bae4a on May 3, 2023 18:34
@amotl amotl marked this pull request as ready for review May 3, 2023 18:41
@amotl amotl changed the title [WIP] Add support for CrateDB bulk operations for improved DML efficiency Add support for CrateDB bulk operations for improved DML efficiency May 3, 2023

@matriv matriv left a comment

Looks good overall!
Left a comment about the link to the CrateDB docs, and another one about methods that are not covered by tests.

Please also wait for a review by @seut, who has experience with PHP code.

*
* Bulk operations are only supported for `INSERT`, `UPDATE`, and `DELETE` statements.
*
* https://crate.io/docs/crate/reference/en/5.2/interfaces/http.html#bulk-operations

should we maybe replace 5.2 with latest?

Member Author

Fixed with 3e4eb52.

/**
* {@Inheritdoc}
*/
public function map(callable $callback): array

Here and below, it seems that those methods are not covered by the tests currently?

Member Author

BulkResponse is undertested, right. I will improve it.

Member Author

@amotl amotl May 4, 2023

Addressed with 06bd4a5. BulkResponseTest.php is now a 1:1 copy of CollectionTest.php.

Member

@seut seut left a comment

Nice! I've added a few comments mainly related to testing.

Comment on lines 360 to 361
$retval = $statement->execute($parameters);
$this->assertTrue($statement->execute($parameters));
Member

Won't this execute the insert twice?

Member Author

Probably a leftover from moving code around. Thanks for spotting.

// Insert records in bulk mode.
$parameters = [[5, 'foo', 1], [6, 'bar', 2], [7, 'foo', 3], [8, 'bar', 4]];
$statement = $this->pdo->prepare('INSERT INTO test_table (id, name, int_type) VALUES (?, ?, ?)', array("bulkMode" => true));
$retval = $statement->execute($parameters);
Member

Can we assert the result, and thus the BulkResponse? It seems there is no test ensuring that BulkResponse works as expected, or am I missing it?

Member Author

You are right, there is still a gap around testing the newly introduced BulkResponse object. I will try to fill the gap.

Member Author

@amotl amotl May 4, 2023

Addressed with 06bd4a5. BulkResponseTest.php is now a 1:1 copy of CollectionTest.php. In addition, the patch fixes other shortcomings that prevented the BulkResponse from being propagated properly.
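A test could go one step beyond assertTrue(execute()) by inspecting the per-parameter-set entries of the bulk response. A minimal sketch, assuming CrateDB signals a failed parameter set with a rowcount of -2 (as its HTTP bulk operations documentation describes); failedBulkOperations is a hypothetical helper, not part of crate-pdo:

```php
<?php

/**
 * Hypothetical test helper (not part of crate-pdo): given the decoded
 * `results` list of a bulk response, return the indexes of the parameter
 * sets that failed. Assumes a failed parameter set reports rowcount -2.
 */
function failedBulkOperations(array $results): array
{
    $failed = [];
    foreach ($results as $index => $result) {
        if (($result['rowcount'] ?? null) === -2) {
            $failed[] = $index;
        }
    }

    return $failed;
}

$results = [['rowcount' => 1], ['rowcount' => -2], ['rowcount' => 1]];
echo json_encode(failedBulkOperations($results)), PHP_EOL; // [1]
```

In a PHPUnit test, asserting that this list is empty documents the expectation that every row of the bulk INSERT succeeded, which a plain boolean from execute() cannot express.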


// Verify records have been inserted correctly.
$this->pdo->exec("REFRESH TABLE test_table");
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table");
Member

Without an ORDER BY clause, results can be flaky, because the row order is not deterministic.

Suggested change
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table");
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table ORDER BY id");

Member Author

Good idea. Thanks.

Member Author

Fixed with 8a8e369.

@amotl amotl requested review from matriv and seut May 4, 2023 11:39
Member Author

amotl commented May 4, 2023

@seut said:

I've added a few comments mainly related to testing.

Thank you. You were spot on, because some details regarding the processing of BulkResponse instances were not in order yet. It looks better now, as the standalone examples added with GH-145 demonstrate.

Also, the documentation has been improved to emphasize this important detail:

Please note that you must use PDO::FETCH_NUM on the fetch operation, because the response object type BulkResponse is different from the regular response type Collection. [1]

Footnotes

  1. https://crate-pdo--143.org.readthedocs.build/en/143/connect.html#bulk-operations
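One plausible reading of the FETCH_NUM requirement: a bulk DML response carries an empty cols list (as in the example in CrateDB's HTTP documentation), so there are no column names to key an associative row by, while a list of per-operation row counts maps naturally onto numeric rows. A minimal sketch under that assumption; bulkResultsAsRows is hypothetical and not necessarily how BulkResponse::getRows() is implemented:

```php
<?php

/**
 * Hypothetical illustration (not crate-pdo's actual code): expose a bulk
 * response's per-parameter-set row counts as numeric rows, one row per
 * parameter set. With an empty `cols` list, only positional access works.
 */
function bulkResultsAsRows(array $results): array
{
    return array_map(
        static function (array $result): array {
            return [$result['rowcount']];
        },
        $results
    );
}

$body = json_decode('{"cols":[],"duration":2.0,"results":[{"rowcount":1},{"rowcount":1},{"rowcount":1}]}', true);
echo json_encode(bulkResultsAsRows($body['results'])), PHP_EOL; // [[1],[1],[1]]
```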

Member

@seut seut left a comment

👍 I suggest squashing at least some, maybe even all, of the commits.

@amotl amotl force-pushed the amo/bulk-operations branch 4 times, most recently from 67b4c84 to 8c706da on May 4, 2023 14:54
@amotl amotl force-pushed the amo/bulk-operations branch 2 times, most recently from 4d628d2 to e79518c on May 4, 2023 17:21
https://crate.io/docs/crate/reference/en/latest/interfaces/http.html#bulk-operations

- In order to use the bulk operations interface, a `PDOStatement` needs
  to be prepared using the `bulkMode` option, like
  `->prepare($sql, ["bulkMode" => true])`.
- The interface of `BulkResponse` has been made compatible with
  `Collection`, specifically wrt. the `getRows()` method, in order to
  return data from the driver without needing other proprietary methods.
- In order to propagate the non-standard bulk response shape back, the
  user has to select the `PDO::FETCH_NUM` fetch style.
- Documentation: Add two example programs about insert operations.
@amotl amotl force-pushed the amo/bulk-operations branch from e79518c to e092c0d on May 4, 2023 17:28
@amotl amotl merged commit 82d1e2e into main May 4, 2023
@amotl amotl deleted the amo/bulk-operations branch May 4, 2023 17:30

4 participants