Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more DriverExceptionHandler to new error code structure #1394

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
package io.stargate.sgv2.jsonapi.exception;

/**
* Errors related to the authentication in a request.
*
* <p>See {@link APIException} for steps to add a new code.
*/
public class AuthException extends RequestException {

public static final Scope SCOPE = Scope.AUTHENTICATION;

public AuthException(ErrorInstance errorInstance) {
super(errorInstance);
}

public enum Code implements ErrorCode<AuthException> {
INVALID_TOKEN;

private final ErrorTemplate<AuthException> template;

Code() {
template = ErrorTemplate.load(AuthException.class, FAMILY, SCOPE, name());
}

@Override
public ErrorTemplate<AuthException> template() {
return template;
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,14 @@ public DatabaseException(ErrorInstance errorInstance) {
}

public enum Code implements ErrorCode<DatabaseException> {
CLOSED_CONNECTION,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hold on merging this - we need to work out how to hide errors like the ones below from the users.

  • CLOSED_CONNECTION,
  • DRIVER_TIMEOUT,
    NO_NODE_AVAILABLE,

also, it is sloppy to have a TABLE_WRITE_TIMEOUT, and WRITE_TIMEOUT; we need to work out if these are table specific errors or not.

DRIVER_TIMEOUT,
NO_NODE_AVAILABLE,
READ_FAILURE,
READ_TIMEOUT,
TABLE_WRITE_TIMEOUT,
CLOSED_CONNECTION;
WRITE_FAILURE,
WRITE_TIMEOUT;

private final ErrorTemplate<DatabaseException> template;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ public static Map<String, String> errVars(SchemaObject schemaObject) {
* <pre>
* public RuntimeException handle(TableSchemaObject schemaObject, WriteTimeoutException exception) {
* return DatabaseException.Code.TABLE_WRITE_TIMEOUT.get(
* errFmt(
* errVars(
* schemaObject,
* m -> {
* m.put("blockFor", String.valueOf(exception.getBlockFor()));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@
* etc is kept away the "business logic" of how to handle the exceptions.
* </ul>
*
* <p>Users of the interface shoudl call {@link #maybeHandle(SchemaObject, RuntimeException)} and
* <p>Users of the interface should call {@link #maybeHandle(SchemaObject, RuntimeException)} and
* then throw or otherwise deal with the object returned, which wil be the original exception or a
* new one. Example:
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,11 +19,13 @@ public RequestException(ErrorInstance errorInstance) {
}

public enum Scope implements ErrorScope {
DOCUMENT,
/** See {@link AuthException} */
AUTHENTICATION,
/** See {@link DocumentException} */
DOCUMENT,
/** See {@link FilterException} */
FILTER;

/** See {@link FilterException} */
@Override
public String scope() {
return name();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,18 @@

import static io.stargate.sgv2.jsonapi.exception.ErrorFormatters.errVars;

import com.datastax.oss.driver.api.core.AllNodesFailedException;
import com.datastax.oss.driver.api.core.DriverTimeoutException;
import com.datastax.oss.driver.api.core.NoNodeAvailableException;
import com.datastax.oss.driver.api.core.auth.AuthenticationException;
import com.datastax.oss.driver.api.core.connection.ClosedConnectionException;
import com.datastax.oss.driver.api.core.metadata.Node;
import com.datastax.oss.driver.api.core.servererrors.*;
import io.stargate.sgv2.jsonapi.config.constants.ErrorObjectV2Constants.TemplateVars;
import io.stargate.sgv2.jsonapi.exception.AuthException;
import io.stargate.sgv2.jsonapi.exception.DatabaseException;
import java.util.List;
import java.util.Map;

/**
* Default implementation of the {@link DriverExceptionHandler} interface, we keep the interface so
Expand Down Expand Up @@ -33,4 +42,136 @@ public RuntimeException handle(SchemaT schemaObject, ClosedConnectionException e
return DatabaseException.Code.CLOSED_CONNECTION.get(
errVars(schemaObject, map -> map.put(TemplateVars.ERROR_MESSAGE, exception.getMessage())));
}

@Override
public RuntimeException handle(SchemaT schemaObject, DriverTimeoutException exception) {
// TODO(Hazel): Aaron said "DRIVER" is a bad code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that "DRIVER" would be too vague, but "DRIVER_TIMEOUT" might be misleading -- maybe something like "DRIVER_UKNOWN_FAILURE" or something? This is assuming that what we map here is something nothing else recognizes, and that should eventually be mapped to something specific.

return DatabaseException.Code.DRIVER_TIMEOUT.get(
errVars(schemaObject, map -> map.put(TemplateVars.ERROR_MESSAGE, exception.getMessage())));
}

@Override
public RuntimeException handle(SchemaT schemaObject, AllNodesFailedException exception) {
Map<Node, List<Throwable>> allErrors = exception.getAllErrors();
if (!allErrors.isEmpty()) {
List<Throwable> errors = allErrors.values().iterator().next();
if (errors != null && !errors.isEmpty()) {
Throwable error =
errors.stream()
.findAny()
.filter(
t ->
t instanceof AuthenticationException
|| t instanceof IllegalArgumentException
|| t instanceof NoNodeAvailableException
|| t instanceof DriverTimeoutException)
.orElse(null);

if (error == null) {
return exception;
}

return switch (error) {
case AuthenticationException e ->
// connect to OSS Cassandra throws AuthenticationException for invalid credentials
// TODO(Hazel): AuthException and INVALID_TOKEN?
AuthException.Code.INVALID_TOKEN.get(errVars(e));
case IllegalArgumentException e -> {
// AstraDB throws IllegalArgumentException for invalid token/credentials
if (e.getMessage().contains("AUTHENTICATION ERROR")
|| e.getMessage()
.contains("Provided username token and/or password are incorrect")) {
// TODO(Hazel): AuthException and INVALID_TOKEN?
yield AuthException.Code.INVALID_TOKEN.get(errVars(e));
} else {
yield exception;
}
}
case NoNodeAvailableException e -> handle(schemaObject, e);
case DriverTimeoutException e ->
// [data-api#1205] Need to map DriverTimeoutException as well
handle(schemaObject, e);
default -> exception;
};
}
}
return exception;
}

@Override
public RuntimeException handle(SchemaT schemaObject, NoNodeAvailableException exception) {
// TODO(Hazel): Aaron said NO_NODE_AVAILABLE is a bad code.
return DatabaseException.Code.NO_NODE_AVAILABLE.get(
errVars(schemaObject, map -> map.put(TemplateVars.ERROR_MESSAGE, exception.getMessage())));
}

@Override
public RuntimeException handle(SchemaT schemaObject, QueryValidationException exception) {
String message = exception.getMessage();
if (message.contains(
"If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING")
|| message.contains("ANN ordering by vector requires the column to be indexed")) {
// TODO(Hazel): Original code is NO_INDEX_ERROR, I think we need to change but am not sure
// what to change
return exception;
}
if (message.contains("vector<float,")) {
// It is tricky to find the actual vector dimension from the message, include as-is
// TODO(Hazel): the code VECTOR_SIZE_MISMATCH was added recently, should we keep using it?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that'd be good because it is very common failure, from ops/metrics -- we do not want to lose this information wrt metrics (leads to unnecessary investigation for common failures).

return exception;
}
// Reuse the default method in the interface
return DriverExceptionHandler.super.handle(schemaObject, exception);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems somehow like abuse of default interface methods, having to specify interface of which default impl to use (but I don't know of a better way without using base class instead of interface).

}

@Override
public RuntimeException handle(SchemaT schemaObject, UnauthorizedException exception) {
return AuthException.Code.INVALID_TOKEN.get(errVars(exception));
}

@Override
public RuntimeException handle(SchemaT schemaObject, ReadFailureException exception) {
return DatabaseException.Code.READ_FAILURE.get(
errVars(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of this PR, but I really think it'd be cleaner to have errVars that takes schema object and Map for extra variables to add, instead of closure for adding variables.

schemaObject,
m -> {
m.put("blockFor", String.valueOf(exception.getBlockFor()));
m.put("received", String.valueOf(exception.getReceived()));
m.put("numFailures", String.valueOf(exception.getNumFailures()));
}));
}

@Override
public RuntimeException handle(SchemaT schemaObject, ReadTimeoutException exception) {
return DatabaseException.Code.READ_TIMEOUT.get(
errVars(
schemaObject,
m -> {
m.put("blockFor", String.valueOf(exception.getBlockFor()));
m.put("received", String.valueOf(exception.getReceived()));
}));
}

@Override
public RuntimeException handle(SchemaT schemaObject, WriteFailureException exception) {
return DatabaseException.Code.WRITE_FAILURE.get(
errVars(
schemaObject,
m -> {
m.put("blockFor", String.valueOf(exception.getBlockFor()));
m.put("received", String.valueOf(exception.getReceived()));
m.put("numFailures", String.valueOf(exception.getNumFailures()));
}));
}

@Override
public RuntimeException handle(SchemaT schemaObject, WriteTimeoutException exception) {
return DatabaseException.Code.WRITE_TIMEOUT.get(
errVars(
schemaObject,
m -> {
m.put("blockFor", String.valueOf(exception.getBlockFor()));
m.put("received", String.valueOf(exception.getReceived()));
}));
}
}
84 changes: 84 additions & 0 deletions src/main/resources/errors.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,16 @@ request-errors:

${SNIPPET.CONTACT_SUPPORT}

# Authentication request errors
- scope: AUTHENTICATION
code: INVALID_AUTHENTICATION_TOKEN
http-status-override: 401
title: Invalid authentication token
body: |-
The authentication token provided is invalid.

${SNIPPET.CONTACT_SUPPORT}

# DOCUMENT request errors
- scope: DOCUMENT
code: MISSING_PRIMARY_KEY_COLUMNS
Expand Down Expand Up @@ -142,6 +152,7 @@ server-errors:
# DATABASE scope server errors
- scope: DATABASE
code: CLOSED_CONNECTION
http-status-override: 502
title: Database connection was closed while processing the request
body: |-
The Data API connection to the database was closed by the database while processing the request.
Expand All @@ -152,7 +163,80 @@ server-errors:

${SNIPPET.RETRY}
- scope: DATABASE
code: DRIVER_TIMEOUT
http-status-override: 504
title: Database connection was timed out while processing the request
body: |-
The Data API connection to the database was timed out by the database while processing the request.

Writing to the ${schemaType} ${keyspace}.${table} failed to complete successfully. If this request modified data the changes may have been written to by some replicas, but not all. Future read requests may return eventually consistent results.

The detailed response from the database was: ${errorMessage}

${SNIPPET.RETRY}
- scope: DATABASE
code: NO_NODE_AVAILABLE
http-status-override: 502
title: Database connection was timed out while processing the request
body: |-
No node was available to execute the query.

Writing to the ${schemaType} ${keyspace}.${table} failed to complete successfully.

The detailed response from the database was: ${errorMessage}

${SNIPPET.RETRY}
- scope: DATABASE
code: READ_FAILURE
http-status-override: 502
title: Database read failed
body: |-
The Data API failed to read from the database.

Reading from the ${schemaType} ${keyspace}.${table} failed to complete successfully.

The request was was waiting for ${blockFor} replicas to acknowledge the read, but only ${received} replicas responded, ${numFailures} failed.

${SNIPPET.RETRY}
- scope: DATABASE
code: READ_TIMEOUT
http-status-override: 504
title: Database read timed out
body: |-
The Data API timed out while reading from the database.

Reading from the ${schemaType} ${keyspace}.${table} failed to complete successfully.
#TODO(Hazel): ReadTimeoutException has multiple ways to format the error message. What message should we use?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't want to have comment within text block? (it will be included as text, not comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will remove it later - just want to leave the comments at the nearest place

The request was was waiting for ${blockFor} replicas to acknowledge the read, but only ${received} replicas responded within the timeout period.

${SNIPPET.RETRY}
- scope: DATABASE
code: WRITE_FAILURE
http-status-override: 502
title: Database write failed
body: |-
The Data API failed to write from the database.

Writing from the ${schemaType} ${keyspace}.${table} failed to complete successfully.

The request was was waiting for ${blockFor} replicas to acknowledge the write, but only ${received} replicas responded, ${numFailures} failed.

${SNIPPET.RETRY}
- scope: DATABASE
code: WRITE_TIMEOUT
http-status-override: 504
title: Database write timed out
body: |-
The Data API timed out while writing from the database.

Writing from the ${schemaType} ${keyspace}.${table} failed to complete successfully.

The request was was waiting for ${blockFor} replicas to acknowledge the write, but only ${received} replicas responded within the timeout period.

${SNIPPET.RETRY}
- scope: DATABASE
code: TABLE_WRITE_TIMEOUT
http-status-override: 504
title: Timeout writing to table
body: |-
The Data API timed out while writing to the table.
Expand Down
Loading