MSQ profile for Brokers and Historicals. #17140

Merged
merged 30 commits into from
Oct 1, 2024
Commits
30 commits
a27e239
MSQ profile for Brokers and Historicals.
gianm Sep 24, 2024
68c61c2
Style. Tests for LimitedOutputStream.
gianm Sep 24, 2024
f110ad0
Additional tests, consolidation, adjustments.
gianm Sep 25, 2024
df9e8bc
Tests for OutboxImpl.
gianm Sep 25, 2024
f65075c
Additional tests.
gianm Sep 25, 2024
5230880
Fix style, constants.
gianm Sep 25, 2024
b62a149
Style fixes.
gianm Sep 25, 2024
680f741
Additional tests and adjustments.
gianm Sep 26, 2024
b2958c1
Additional javadoc.
gianm Sep 26, 2024
79f5f94
Additional javadoc.
gianm Sep 26, 2024
642f140
Fix missing format arg.
gianm Sep 26, 2024
f5fa4c0
Add coverage.
gianm Sep 26, 2024
35b142f
Additional tests and robustness.
gianm Sep 26, 2024
6d0ff3e
Add ControllerHolder state. Do proper DruidExceptions.
gianm Sep 27, 2024
42e3ff6
DartWorkerRunnerTest.
gianm Sep 28, 2024
796351f
Improved behavior.
gianm Sep 28, 2024
1d2e13b
Merge branch 'master' into msq-dart
gianm Sep 28, 2024
d7c55af
Tests for runtime errors.
gianm Sep 28, 2024
518c21c
Static import assertThat.
gianm Sep 28, 2024
f10be74
Additional tests.
gianm Sep 30, 2024
9177f68
Review comments.
gianm Sep 30, 2024
2c082c9
controllerServerId -> controllerServerHost
gianm Sep 30, 2024
cc28689
Controller -> worker:
gianm Oct 1, 2024
85f4f26
Fix issue with results array not getting closed if an error happened …
gianm Oct 1, 2024
9307407
Merge branch 'master' into msq-dart
gianm Oct 1, 2024
e77610e
Resolve taskLockType() related conflicts
gianm Oct 1, 2024
4157913
NL
gianm Oct 1, 2024
ccfd523
Style.
gianm Oct 1, 2024
92b2d21
Style.
gianm Oct 1, 2024
d123902
Cut loop in tests.
gianm Oct 1, 2024
@@ -0,0 +1,37 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.msq.dart;

import com.google.inject.BindingAnnotation;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Binding annotation for implementations of interfaces that are Dart (MSQ-on-Broker-and-Historicals) focused.
 */
@Target({ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@BindingAnnotation
public @interface Dart
{
}
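
A qualifier annotation like the one above is only useful if it is still visible at runtime, which is why it is declared with `RetentionPolicy.RUNTIME`. The following is a minimal, dependency-free sketch of that idea. It does not use Guice, and `QualifierSketch`, `Tagged`, and `Consumer` are hypothetical names introduced for illustration; they are not part of this PR.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical sketch: none of these names come from Druid or Guice.
final class QualifierSketch
{
  // Stand-in for a qualifier such as @Dart. RUNTIME retention keeps it visible to reflection.
  @Target({ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD})
  @Retention(RetentionPolicy.RUNTIME)
  @interface Tagged
  {
  }

  // Hypothetical consumer with one qualified and one unqualified dependency of the same type.
  static final class Consumer
  {
    @Tagged
    Runnable tagged;

    Runnable plain;
  }

  // Returns true if the named field of Consumer carries the @Tagged qualifier.
  static boolean isTagged(final String fieldName)
  {
    try {
      final Field field = Consumer.class.getDeclaredField(fieldName);
      return field.isAnnotationPresent(Tagged.class);
    }
    catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("No such field: " + fieldName, e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println(isTagged("tagged")); // prints true
    System.out.println(isTagged("plain"));  // prints false
  }
}
```

An injector can make the same distinction when it decides which implementation to supply for a given injection point, so two bindings of the same interface can coexist as long as only one of them is qualified.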
@@ -0,0 +1,57 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.msq.dart;

import com.google.common.collect.ImmutableList;
import org.apache.druid.msq.dart.controller.http.DartSqlResource;
import org.apache.druid.msq.dart.worker.http.DartWorkerResource;
import org.apache.druid.msq.rpc.ResourcePermissionMapper;
import org.apache.druid.msq.rpc.WorkerResource;
import org.apache.druid.server.security.Action;
import org.apache.druid.server.security.Resource;
import org.apache.druid.server.security.ResourceAction;

import java.util.List;

public class DartResourcePermissionMapper implements ResourcePermissionMapper
{
  /**
   * Permissions for admin APIs in {@link DartWorkerResource} and {@link WorkerResource}. Note that queries from
   * end users go through {@link DartSqlResource}, which wouldn't use these mappings.
   */
  @Override
  public List<ResourceAction> getAdminPermissions()
  {
    return ImmutableList.of(
        new ResourceAction(Resource.STATE_RESOURCE, Action.READ),
        new ResourceAction(Resource.STATE_RESOURCE, Action.WRITE)
    );
  }

  /**
   * Permissions for per-query APIs in {@link DartWorkerResource} and {@link WorkerResource}. Note that queries from
   * end users go through {@link DartSqlResource}, which wouldn't use these mappings.
   */
  @Override
  public List<ResourceAction> getQueryPermissions(String queryId)
  {
    return getAdminPermissions();
Contributor:

So only an admin user can run Dart queries right now?

Contributor Author:

This is just for the "internal" APIs, such as the controller->worker APIs. End users would go through the DartSqlResource, which doesn't use this class. I will add comments to clarify.

Contributor Author (gianm, Sep 30, 2024):

Just to be very clear, the permission model for Dart queries is the same as for regular SQL queries. If Dart is enabled, then any regular user can issue queries against the tables that they have permissions for.

  }
}
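
The mapper above applies a single permission set to both the admin and the per-query internal APIs. The sketch below reproduces that delegation with simplified stand-in types so it runs without Druid on the classpath. `PermissionMapperSketch`, `InternalApiMapper`, and the `"STATE"` resource name are assumptions made for illustration and are not Druid's actual security classes.

```java
import java.util.List;

// Hypothetical sketch: these types are simplified stand-ins, not Druid's security classes.
final class PermissionMapperSketch
{
  enum Action { READ, WRITE }

  // Stand-in for ResourceAction: a resource name paired with an action.
  static final class ResourceAction
  {
    final String resource;
    final Action action;

    ResourceAction(final String resource, final Action action)
    {
      this.resource = resource;
      this.action = action;
    }

    @Override
    public String toString()
    {
      return resource + ":" + action;
    }
  }

  // Stand-in for ResourcePermissionMapper.
  interface PermissionMapper
  {
    List<ResourceAction> getAdminPermissions();

    List<ResourceAction> getQueryPermissions(String queryId);
  }

  // Internal (controller-to-worker) APIs: per-query calls need the same permissions as admin calls.
  static final class InternalApiMapper implements PermissionMapper
  {
    @Override
    public List<ResourceAction> getAdminPermissions()
    {
      return List.of(
          new ResourceAction("STATE", Action.READ),
          new ResourceAction("STATE", Action.WRITE)
      );
    }

    @Override
    public List<ResourceAction> getQueryPermissions(final String queryId)
    {
      // The query ID is ignored: every internal per-query call requires the admin permission set.
      return getAdminPermissions();
    }
  }

  public static void main(String[] args)
  {
    final PermissionMapper mapper = new InternalApiMapper();
    System.out.println(mapper.getQueryPermissions("query-1")); // prints [STATE:READ, STATE:WRITE]
  }
}
```

As the review thread notes, this mapping covers only server-to-server calls. End-user authorization happens on a separate path and is not modeled here.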
@@ -0,0 +1,166 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.msq.dart.controller;

import com.google.common.base.Preconditions;
import org.apache.druid.msq.dart.worker.DartWorkerClient;
import org.apache.druid.msq.dart.worker.WorkerId;
import org.apache.druid.msq.exec.Controller;
import org.apache.druid.msq.exec.ControllerContext;
import org.apache.druid.msq.exec.QueryListener;
import org.apache.druid.msq.indexing.error.MSQErrorReport;
import org.apache.druid.msq.indexing.error.WorkerFailedFault;
import org.apache.druid.server.security.AuthenticationResult;
import org.joda.time.DateTime;

import java.util.concurrent.atomic.AtomicReference;

/**
 * Holder for {@link Controller}, stored in {@link DartControllerRegistry}.
 */
public class ControllerHolder
{
  public enum State
  {
    /**
     * Query has been accepted, but not yet {@link Controller#run(QueryListener)}.
     */
    ACCEPTED,

    /**
     * Query has had {@link Controller#run(QueryListener)} called.
     */
    RUNNING,

    /**
     * Query has been canceled.
     */
    CANCELED
Contributor:

Should there be a logical state called finished?

Contributor Author:

Controllers are removed as soon as they finish, so a finished state isn't necessary.

  }

  private final Controller controller;
  private final ControllerContext controllerContext;
  private final String sqlQueryId;
  private final String sql;
  private final AuthenticationResult authenticationResult;
  private final DateTime startTime;
  private final AtomicReference<State> state = new AtomicReference<>(State.ACCEPTED);

  public ControllerHolder(
      final Controller controller,
      final ControllerContext controllerContext,
      final String sqlQueryId,
      final String sql,
      final AuthenticationResult authenticationResult,
      final DateTime startTime
  )
  {
    this.controller = Preconditions.checkNotNull(controller, "controller");
    this.controllerContext = controllerContext;
    this.sqlQueryId = Preconditions.checkNotNull(sqlQueryId, "sqlQueryId");
    this.sql = sql;
    this.authenticationResult = authenticationResult;
    this.startTime = Preconditions.checkNotNull(startTime, "startTime");
  }

  public Controller getController()
  {
    return controller;
  }

  public String getSqlQueryId()
  {
    return sqlQueryId;
  }

  public String getSql()
  {
    return sql;
  }

  public AuthenticationResult getAuthenticationResult()
  {
    return authenticationResult;
  }

  public DateTime getStartTime()
  {
    return startTime;
  }

  public State getState()
  {
    return state.get();
  }

  /**
   * Call when a worker has gone offline. Closes its client and sends a {@link Controller#workerError}
   * to the controller.
   */
  public void workerOffline(final WorkerId workerId)
  {
    final String workerIdString = workerId.toString();

    if (controllerContext instanceof DartControllerContext) {
      // For DartControllerContext, newWorkerClient() returns the same instance every time.
      // This will always be DartControllerContext in production; the instanceof check is here because certain
      // tests use a different context class.
      ((DartWorkerClient) controllerContext.newWorkerClient()).closeClient(workerId.getHostAndPort());
    }

    if (controller.hasWorker(workerIdString)) {
      controller.workerError(
          MSQErrorReport.fromFault(
              workerIdString,
              workerId.getHostAndPort(),
              null,
              new WorkerFailedFault(workerIdString, "Worker went offline")
          )
      );
    }
  }

  /**
   * Places this holder into {@link State#CANCELED}. Calls {@link Controller#stop()} if it was previously in
   * state {@link State#RUNNING}.
   */
  public void cancel()
  {
    if (state.getAndSet(State.CANCELED) == State.RUNNING) {
      controller.stop();
    }
  }

  /**
   * Calls {@link Controller#run(QueryListener)}, and returns true, if this holder was previously in state
   * {@link State#ACCEPTED}. Otherwise returns false.
   *
   * @return whether {@link Controller#run(QueryListener)} was called.
   */
  public boolean run(final QueryListener listener) throws Exception
  {
    if (state.compareAndSet(State.ACCEPTED, State.RUNNING)) {
      controller.run(listener);
      return true;
    } else {
      return false;
    }
  }
}
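
The `run` and `cancel` methods above form a small lock-free state machine: `compareAndSet` ensures the controller is started at most once and never after a cancel, while `getAndSet` ensures `stop()` is only called on a controller that was actually started. The sketch below isolates that logic. `HolderStateSketch` and its counters are hypothetical stand-ins for the real `Controller`, and unlike the real `Controller#run`, the sketch's `run()` returns immediately rather than blocking.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the ControllerHolder state transitions; counters stand in for the real Controller.
final class HolderStateSketch
{
  enum State { ACCEPTED, RUNNING, CANCELED }

  private final AtomicReference<State> state = new AtomicReference<>(State.ACCEPTED);
  private final AtomicInteger runCalls = new AtomicInteger();
  private final AtomicInteger stopCalls = new AtomicInteger();

  // Starts the (simulated) controller only if the holder is still ACCEPTED.
  boolean run()
  {
    if (state.compareAndSet(State.ACCEPTED, State.RUNNING)) {
      runCalls.incrementAndGet();
      return true;
    } else {
      return false;
    }
  }

  // Always ends in CANCELED; stops the (simulated) controller only if it had been started.
  void cancel()
  {
    if (state.getAndSet(State.CANCELED) == State.RUNNING) {
      stopCalls.incrementAndGet();
    }
  }

  State getState()
  {
    return state.get();
  }

  int getRunCalls()
  {
    return runCalls.get();
  }

  int getStopCalls()
  {
    return stopCalls.get();
  }

  public static void main(String[] args)
  {
    // Cancel before run: the controller is never started, so there is nothing to stop.
    final HolderStateSketch canceledEarly = new HolderStateSketch();
    canceledEarly.cancel();
    System.out.println(canceledEarly.run());          // prints false
    System.out.println(canceledEarly.getStopCalls()); // prints 0

    // Run, then cancel: the controller is stopped exactly once.
    final HolderStateSketch canceledLate = new HolderStateSketch();
    System.out.println(canceledLate.run());           // prints true
    canceledLate.cancel();
    System.out.println(canceledLate.getStopCalls());  // prints 1
  }
}
```

Because there is no transition out of `CANCELED`, a cancel that wins the race against `run` permanently prevents the query from starting.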
@@ -0,0 +1,68 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.msq.dart.controller;

import com.google.inject.Inject;
import org.apache.druid.messages.client.MessageListener;
import org.apache.druid.msq.dart.controller.messages.ControllerMessage;
import org.apache.druid.msq.dart.worker.WorkerId;
import org.apache.druid.msq.exec.Controller;
import org.apache.druid.msq.indexing.error.MSQErrorReport;
import org.apache.druid.server.DruidNode;

/**
 * Listener for worker-to-controller messages.
 * Also responsible for calling {@link Controller#workerError(MSQErrorReport)} when a worker server goes away.
 */
public class ControllerMessageListener implements MessageListener<ControllerMessage>
{
  private final DartControllerRegistry controllerRegistry;

  @Inject
  public ControllerMessageListener(final DartControllerRegistry controllerRegistry)
  {
    this.controllerRegistry = controllerRegistry;
  }

  @Override
  public void messageReceived(ControllerMessage message)
  {
    final ControllerHolder holder = controllerRegistry.get(message.getQueryId());
    if (holder != null) {
      message.handle(holder.getController());
    }
  }

  @Override
  public void serverAdded(DruidNode node)
  {
    // Nothing to do.
  }

  @Override
  public void serverRemoved(DruidNode node)
  {
    for (final ControllerHolder holder : controllerRegistry.getAllHolders()) {
Contributor:

Should there be APIs on the worker to cancel all work? Let's say a Broker switch happens and we do not have the controller for whatever reason. In that case, all Dart work on the Historical should stop, no?

Contributor Author:

That happens in the DartWorkerRunner#BrokerListener, which is code on the worker side. It cancels all work associated with a given Broker when that Broker goes offline.

      final Controller controller = holder.getController();
      final WorkerId workerId = WorkerId.fromDruidNode(node, controller.queryId());
      holder.workerOffline(workerId);
    }
  }
}
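
The listener above does two things: it routes each incoming message to the controller registered for that query ID (dropping it if none is registered), and it fans a server-removal event out to every registered controller, with each controller deciding whether the lost server was one of its workers. The sketch below models that flow with plain collections. `ListenerSketch`, `FakeController`, and the string-based events are hypothetical and are not Druid's classes. The `boolean` return value of `messageReceived` is an addition for the sketch; the real method returns `void`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of registry-based dispatch; strings stand in for messages and worker IDs.
final class ListenerSketch
{
  // Stand-in for a controller: knows its workers and records what it was told.
  static final class FakeController
  {
    final String queryId;
    final Set<String> workerHosts;
    final List<String> events = new ArrayList<>();

    FakeController(final String queryId, final Set<String> workerHosts)
    {
      this.queryId = queryId;
      this.workerHosts = workerHosts;
    }

    // Analogous to ControllerHolder#workerOffline: report a failure only for this query's own workers.
    void workerOffline(final String hostAndPort)
    {
      if (workerHosts.contains(hostAndPort)) {
        events.add("workerFailed:" + hostAndPort);
      }
    }
  }

  private final Map<String, FakeController> registry = new ConcurrentHashMap<>();

  void register(final FakeController controller)
  {
    registry.put(controller.queryId, controller);
  }

  // Delivers the payload to the controller for the query. Returns false if the message was dropped.
  boolean messageReceived(final String queryId, final String payload)
  {
    final FakeController controller = registry.get(queryId);
    if (controller == null) {
      return false;
    }
    controller.events.add("message:" + payload);
    return true;
  }

  // Notifies every registered controller that a server went away.
  void serverRemoved(final String hostAndPort)
  {
    for (final FakeController controller : registry.values()) {
      controller.workerOffline(hostAndPort);
    }
  }

  public static void main(String[] args)
  {
    final ListenerSketch listener = new ListenerSketch();
    final FakeController q1 = new FakeController("q1", Set.of("h1:8083"));
    final FakeController q2 = new FakeController("q2", Set.of("h2:8083"));
    listener.register(q1);
    listener.register(q2);

    listener.messageReceived("q1", "hello");
    listener.serverRemoved("h1:8083");

    System.out.println(q1.events); // prints [message:hello, workerFailed:h1:8083]
    System.out.println(q2.events); // prints []
  }
}
```

Dropping messages for unknown query IDs fits the author's earlier note that controllers are removed as soon as they finish: a late message for a finished query simply has no recipient.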