-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CrateDB bulk operations for improved DML efficiency #143
Conversation
src/Crate/PDO/PDO.php
Outdated
@@ -42,6 +42,7 @@ class PDO extends BasePDO implements PDOInterface | |||
|
|||
public const CRATE_ATTR_HTTP_BASIC_AUTH = 1000; | |||
public const CRATE_ATTR_DEFAULT_SCHEMA = 1001; | |||
public const CRATE_ATTR_BULK_MODE = 1009; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firsthand, I thought it would be a good idea to introduce bulk mode operations, well, by adding another CRATE_ATTR_BULK_MODE
item to the available custom PDO settings for CrateDB.
In general, it worked well so far to route this attribute through the layers of the machinery, in order to fulfil its main goal....
/** | ||
* Verify support for CrateDB bulk-operations endpoint. | ||
* https://crate.io/docs/crate/reference/en/5.2/interfaces/http.html#bulk-operations | ||
*/ | ||
public function testInsertBulk() | ||
{ | ||
// Insert records in bulk mode. | ||
$this->pdo->setAttribute(PDO::CRATE_ATTR_BULK_MODE, true); | ||
$parameters = [[5, 'foo', 1], [6, 'bar', 2], [7, 'foo', 3], [8, 'bar', 4]]; | ||
$statement = $this->pdo->prepare('INSERT INTO test_table (id, name, int_type) VALUES (?, ?, ?)'); | ||
$this->assertTrue($statement->execute($parameters)); | ||
|
||
// Verify records have been inserted correctly. | ||
$this->pdo->setAttribute(PDO::CRATE_ATTR_BULK_MODE, false); | ||
$this->pdo->exec("REFRESH TABLE test_table"); | ||
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table"); | ||
$results = $statement->fetchAll(PDO::FETCH_NUM); | ||
$this->assertEquals([5, 'foo', 1], $results[0]); | ||
$this->assertEquals([8, 'bar', 4], $results[3]); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.... however, it turned out that it has a usability glitch in its current form. Because bulk operations are only applicable to DML operations (INSERT
, UPDATE
, DELETE
), and deliver a response of a different shape, they are not suitable for SELECT
-type statements.
Because the CRATE_ATTR_BULK_MODE
setting is currently handled on the PDO instance level, it currently needs to be switched back to non-bulk mode, in order to conduct SELECT
statements properly.
That's obviously not a good way to work with that mode, so I will try to push back this feature to be enabled on the PDOStatement
-level only, by using its $options
dictionary and the corresponding bulkMode
slot, probably removing CRATE_ATTR_BULK_MODE
completely again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've dissolved PDO::CRATE_ATTR_BULK_MODE
again with c7d4b3b. CrateDB bulk operations will exclusively be available by creating a dedicated PDOStatement
instance, with ->prepare($sql, array("bulkMode" => true))
, as outlined in the documentation example. This interface is modeless, and as such simpler and better.
-- https://crate-pdo--143.org.readthedocs.build/en/143/connect.html#bulk-operations
if ($bulk_mode) { | ||
return new BulkResponse( | ||
$responseBody['results'], | ||
$responseBody['cols'], | ||
$responseBody['duration'] | ||
); | ||
} else { | ||
return new Collection( | ||
$responseBody['rows'], | ||
$responseBody['cols'], | ||
$responseBody['duration'], | ||
$responseBody['rowcount'] | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As outlined at CrateDB HTTP bulk operations interface, the response for bulk operations is slightly different from the non-bulk response. I've reflected that by following the white rabbit of typedness.
Another option, which I've actually evaluated a bit beforehand, for easyness, would have been to converge both result formats into the same response object type. In Python DBAPI, where this is only a dictionary 1, it is not of any concern that this data structure does not have any kind of enforcing schema -- however, in PHP/PDO world, it's differently as it seems both are heavily based on typing in this area. That's why I chose to dedicate a specific response type in PHP now, called BulkResponse
.
Nevertheless, if you think converging it into a single response type would be a better option, similar to how the Python driver does it, I will take it into consideration.
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall!
Left a comment about link to crate docs and another one about methods not covered by tests?
Please wait for a review by @seut as well who has experience with php code.
* | ||
* Bulk operations are only supported for `INSERT`, `UPDATE`, and `DELETE` statements. | ||
* | ||
* https://crate.io/docs/crate/reference/en/5.2/interfaces/http.html#bulk-operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we maybe replace 5.2
with latest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed with 3e4eb52.
/** | ||
* {@Inheritdoc} | ||
*/ | ||
public function map(callable $callback): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, it seems that those methods are not covered by the tests currently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BulkResponse is undertested, right. I will improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed with 06bd4a5. BulkResponseTest.php
is now a 1:1 copy of CollectionTest.php
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I've added a few comments mainly related to testing.
$retval = $statement->execute($parameters); | ||
$this->assertTrue($statement->execute($parameters)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this execute the insert twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a leftover from moving code around. Thanks for spotting.
// Insert records in bulk mode. | ||
$parameters = [[5, 'foo', 1], [6, 'bar', 2], [7, 'foo', 3], [8, 'bar', 4]]; | ||
$statement = $this->pdo->prepare('INSERT INTO test_table (id, name, int_type) VALUES (?, ?, ?)', array("bulkMode" => true)); | ||
$retval = $statement->execute($parameters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert the result and such the BulkResponse
? Seems like there is no test ensuring that the BulkResponse
work as expected or do I miss it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, there is still a gap around testing the newly introduced BulkResponse object. I will try to fill the gap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed with 06bd4a5. BulkResponseTest.php
is now a 1:1 copy of CollectionTest.php
. Other than this, the patch fixes other shortcomings which prevented the BulkResponse
being propagated properly.
|
||
// Verify records have been inserted correctly. | ||
$this->pdo->exec("REFRESH TABLE test_table"); | ||
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no ORDER BY
is defined, results can be flaky as the ordering isn't deterministic otherwise.
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table"); | |
$statement = $this->pdo->prepare("SELECT id, name, int_type FROM test_table ORDER BY id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed with 8a8e369.
@seut said:
Thank you. You have been right on the spot, because some details wrt. to processing Also, the documentation has been improved to emphasize this important detail:
Footnotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I suggest to squash at least some, maybe even all commits.
67b4c84
to
8c706da
Compare
4d628d2
to
e79518c
Compare
https://crate.io/docs/crate/reference/en/latest/interfaces/http.html#bulk-operations - In order to use the bulk operations interface, a `PDOStatement` needs to be prepared using the `bulkMode` option, like `->prepare($sql, ["bulkMode" => true])`. - The interface of `BulkResponse` has been made compatible with `Collection`, specifically wrt. the `getRows()` method, in order to return data from the driver without needing other proprietary methods. - In order to propagate the non-standard bulk response shape back, the user has to select the `PDO::FETCH_NUM` fetch style. - Documentation: Add two example programs about insert operations
About
At GH-139, we discovered the driver does not utilize the CrateDB HTTP bulk operations interface yet. This patch aims to address this shortcoming, by adding a proprietary code path, similar to what the Python driver does, and similarly easy to use.
Documentation
https://crate-pdo--143.org.readthedocs.build/en/143/connect.html#bulk-operations
/cc @hlcianfagna, @mkleen, @hammerhead, @proddata, @seut, @matriv