Microsoft FrameworkLauncher User Manual

Index

Concepts
Quick Start
Architecture
Pipeline
Configuration
RestAPI
DataModel and FeatureUsage
EnvironmentVariables
HDFS Published Informations
ExitStatus Convention
RetryPolicy
ApplicationCompletionPolicy
Framework ACL
Best Practices

Concepts

Basic

Different TaskRoles compose a Framework
Identical Tasks compose a TaskRole
A User Service is executed by all Tasks in its corresponding TaskRole

YARN Related

A YARN Application is an execution attempt of a Framework
A YARN Container is an execution attempt of a Task

Quick Start

Prepare Framework

Upload Framework Executable to HDFS

Upload the Example Framework Executable to HDFS:

 hadoop fs -mkdir -p /ExampleFramework/
 hadoop fs -put -f ExampleFramework.sh /ExampleFramework/

Write Framework Description File

Just use the Example Framework Description File.

 Example Framework Description Explanation:
 • The Example Framework with Version 1 contains 1 TaskRole named LRSMaster.
 • LRSMaster contains 2 Tasks and they will be executed for LRSMaster's TaskService.
 • LRSMaster's TaskService with Version 1 is defined by its EntryPoint, SourceLocations and Resource.
 • The EntryPoint and SourceLocations define the Service's corresponding Executable, which needs to be run inside Containers.
 • The Resource defines the Container Resource Guarantee / Limitation.

Launch Framework

The Launcher Service needs to be started before launching a Framework.

See README to Start Launcher Service

See Root URI to get {LauncherAddress}

HTTP PUT the Framework Description File as json to:
```
 http://{LauncherAddress}/v1/Frameworks/ExampleFramework
```
For example, with curl, you can run the command line below:
```
 curl -X PUT http://{LauncherAddress}/v1/Frameworks/ExampleFramework -d @ExampleFramework.json --header "Content-Type: application/json"
```

Monitor Framework

Check all the Requested Frameworks by:

 http://{LauncherAddress}/v1/Frameworks

Check ExampleFramework by:

 http://{LauncherAddress}/v1/Frameworks/ExampleFramework

Architecture

LauncherInterfaces:

RestAPI
Submit Framework Description

LauncherService:

One Central Service
Manages all Frameworks for the whole Cluster.

LauncherAM:

Per-Framework Service
Manages Tasks for a single Framework according to its customized feature requirements.

Configuration

Launcher Service can be configured by LauncherConfiguration. You can check the Type, Specification and FeatureUsage inside it.

We also provide a default configuration for reference: Default LauncherConfiguration File.

RestAPI

Guarantees

All APIs are IDEMPOTENT and STATELESS, to allow trivial Work Preserving Client Restart. In other words, the User does not need to worry about calling one API multiple times from different Client instances (such as after a Client Restart).
All APIs are DISTRIBUTED THREAD SAFE, to allow access from multiple distributed Client instances. In other words, the User does not need to worry about calling them at the same time from multiple Threads/Processes/Nodes.

Best Practices

LauncherService can only handle a finite, limited request volume. The User should try to minimize the overall request frequency and payload, so that the LauncherService is not overloaded. To achieve this, the User can centralize requests, space out requests, filter responses and so on. Moreover, to get information in a more scalable way than RestAPI, see HDFS Published Informations.
Completed Frameworks will ONLY be retained for the most recent FrameworkCompletedRetainSec, in case the Client fails to delete the Framework after FrameworkCompleted. One exception is a Framework Launched by DataDeployment: it will be retained until the corresponding FrameworkDescriptionFile is deleted in the DataDeployment. To avoid missing the CompletedFrameworkStatus, the Client's polling interval in seconds should be less than FrameworkCompletedRetainSec. Check the frameworkCompletedRetainSec by GET LauncherStatus.

Root URI

Configure it as webServerAddress inside LauncherConfiguration File.

Types

Refer to DataModel and FeatureUsage for the Types of the HTTP Request and Response.

Common Request Headers

Headers	Description
UserName	Specifies which User sends the Request. It is effective iff webServerAclEnable is true, see Framework ACL.

API Details

PUT Framework

Request

PUT /v1/Frameworks/{FrameworkName}

Type: application/json

Body: FrameworkDescriptor

Description

Add a NOT Requested Framework or Update a Requested Framework.

Add a NOT Requested Framework: Framework will be Added and Launched (Now it is Requested).
Update a Requested Framework:
1. If FrameworkVersion unchanged:
  1. Framework will be Updated to the FrameworkDescription on the fly (i.e. Work Preserving).
  2. To Update a Framework on the fly, it is better to use the corresponding PartialUpdate (such as PUT TaskNumber) than to PUT the entire FrameworkDescription here, because partially updating the FrameworkDescription avoids the Race Condition (or Transaction Conflict) between two PUT Requests. Besides, the behaviour is undefined when changing parameters in the FrameworkDescription which are not supported by PartialUpdate.
2. Else:
  1. Framework will be NonRolling Upgraded to the new FrameworkVersion (i.e. Not Work Preserving).
  2. NonRolling Upgrade can be used to change parameters in the FrameworkDescription which are not supported by PartialUpdate (such as Framework Queue).
  3. NonRolling Upgrade should be triggered by changing the FrameworkVersion, instead of DELETE then PUT with the same FrameworkVersion.
The User is free to specify, and is responsible for, the FrameworkName of the Framework; however, the FrameworkName should respect the Framework ACL.
After the Accepted Response, the corresponding Status (such as FrameworkStatus and AggregatedFrameworkStatus) exists immediately as well. However, the Status may not immediately reflect the Request (FrameworkDescriptor). So, to check whether it has been updated, the Client still needs to poll the GET Status APIs.

Response

HttpStatusCode	Body	Description
Accepted(202)	NULL	The Request has been recorded for the backend to process. It does not mean that the processing of the Request has completed.
BadRequest(400)	ExceptionMessage	The Request validation failed. So, the Client is expected not to retry for this non-transient failure, and to correct the Request instead.
Forbidden(403)	ExceptionMessage	The Request authorization failed. So, the Client is expected not to retry for this non-transient failure, and to correct the Request or ask the Administrator to grant the Request privilege instead. This Response may happen only if webServerAclEnable is true, see Framework ACL.
TooManyRequests(429)	ExceptionMessage	The Request is rejected because the New Total TaskNumber would exceed the Max Total TaskNumber if the backend accepted it. So, the Client is expected to retry for this transient failure or migrate the whole Framework to another Cluster.
ServiceUnavailable(503)	ExceptionMessage	The Request cannot be recorded for the backend to process. In our system, this only happens when the target Cluster's Zookeeper is down for a long time. So, the Client is expected to retry for this transient failure or migrate the whole Framework to another Cluster.
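
The Response table above can be turned into a small Client-side decision helper. The sketch below is only an illustration: the function name `classify_launcher_status` and its output labels are hypothetical, and 404 (described under GET FrameworkStatus) is grouped with the other non-transient codes.

```shell
# Classify a Launcher HTTP status code following the Response tables
# in this manual. Prints one of: done, fix-request, retry, unknown.
# (Function name and labels are hypothetical, chosen for this sketch.)
classify_launcher_status() {
  case "$1" in
    200|202) echo "done" ;;             # returned (200) or recorded (202)
    400|403|404) echo "fix-request" ;;  # non-transient: do not retry as-is
    429|503) echo "retry" ;;            # transient: retry later or migrate
    *) echo "unknown" ;;                # not listed in this manual
  esac
}

# Example use with curl (needs a reachable {LauncherAddress}):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -X PUT \
#     "http://{LauncherAddress}/v1/Frameworks/ExampleFramework" \
#     -d @ExampleFramework.json --header "Content-Type: application/json")
#   classify_launcher_status "$code"
```

Note that `done` for 202 only means the Request was recorded, so the Client still needs to poll the GET Status APIs afterwards.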

DELETE Framework

Request

DELETE /v1/Frameworks/{FrameworkName}

Description

Delete a Framework, regardless of whether it is Requested or not.

Notes:

Framework will be Stopped and Deleted (Now it is NOT Requested).
After the Accepted Response, its corresponding Status immediately ceases to exist as well.
Only recently completed Frameworks will be kept if the Client fails to DELETE the Framework after FrameworkCompleted. One exception is a Framework Launched by DataDeployment: it will be kept until the corresponding FrameworkDescriptionFile is deleted in the DataDeployment.

Response

HttpStatusCode	Body	Description
Accepted(202)	NULL	Same as PUT Framework
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET FrameworkStatus

Request

GET /v1/Frameworks/{FrameworkName}/FrameworkStatus

Description

Get the FrameworkStatus of a Requested Framework

Recipes:

User Level RetryPolicy (Based on FrameworkState, ApplicationExitCode, ApplicationExitDiagnostics, ApplicationExitType)
Directly Monitor Underlay YARN Application by YARN CLI or RestAPI (Based on ApplicationId or ApplicationTrackingUrl)

Response

HttpStatusCode	Body	Description
OK(200)	FrameworkStatus
NotFound(404)	ExceptionMessage	The specified Framework has not been Requested yet. So, the Client is expected not to retry for this non-transient failure, and to PUT the corresponding Framework first.
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

PUT TaskNumber

Request

PUT /v1/Frameworks/{FrameworkName}/TaskRoles/{TaskRoleName}/TaskNumber

Type: application/json

Body: UpdateTaskNumberRequest

Description

Update TaskNumber for a Requested Framework

Response

HttpStatusCode	Body	Description
Accepted(202)	NULL	Same as PUT Framework
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
TooManyRequests(429)	ExceptionMessage	Same as PUT Framework
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

PUT ExecutionType

Request

PUT /v1/Frameworks/{FrameworkName}/ExecutionType

Type: application/json

Body: UpdateExecutionTypeRequest

Description

Update ExecutionType for a Requested Framework

Notes:

Changing the Framework ExecutionType from STOP to START is ignored if the previous STOP has already caused the Framework of the current FrameworkVersion to enter FINAL_STATES, i.e. FRAMEWORK_COMPLETED. So, to ensure the Framework starts again after STOP, change the FrameworkVersion instead.

Response

HttpStatusCode	Body	Description
Accepted(202)	NULL	Same as PUT Framework
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

PUT MigrateTask

Request

PUT /v1/Frameworks/{FrameworkName}/MigrateTasks/{ContainerId}

Type: application/json

Body: MigrateTaskRequest

Description

Migrate a Task from its current Container to another Container for a Requested Framework. The new Container and the old Container will satisfy the AntiAffinityLevel constraint.

Notes:

The User is responsible for implementing Health/Perf Measurement of the Service, based on monitoring TaskStatuses or on self-contained communication. If a Health/Perf degradation is found, the User can migrate the Task by calling this API with the corresponding ContainerId as parameter.
Currently, only the Any AntiAffinityLevel is supported.

Response

HttpStatusCode	Body	Description
Accepted(202)	NULL	Same as PUT Framework
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

PUT ApplicationProgress

Request

PUT /v1/Frameworks/{FrameworkName}/ApplicationProgress

Type: application/json

Body: OverrideApplicationProgressRequest

Description

Update ApplicationProgress for a Requested Framework

Notes:

If the User does not call this API, the default ApplicationProgress is used, which is calculated as CompletedTaskCount / TotalTaskCount.
The User is responsible for implementing Progress Measurement of the Service, based on monitoring Task logs or on self-contained communication, and then for feeding back the Progress by calling this API to Override the default ApplicationProgress.

Response

HttpStatusCode	Body	Description
Accepted(202)	NULL	Same as PUT Framework
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET Framework

Request

GET /v1/Frameworks/{FrameworkName}

Description

Get the FrameworkInfo of a Requested Framework

FrameworkInfo = SummarizedFrameworkInfo + AggregatedFrameworkRequest + AggregatedFrameworkStatus

Response

HttpStatusCode	Body	Description
OK(200)	FrameworkInfo
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET Frameworks

Request

GET /v1/Frameworks

QueryParameter	Description
UserName	Filter the result to only return Frameworks whose UserName equals the given value.

Description

Get the SummarizedFrameworkInfos of all Requested Frameworks

A Framework's SummarizedFrameworkInfo consists of selected fields from its Status and Request

Response

HttpStatusCode	Body	Description
OK(200)	SummarizedFrameworkInfos
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET AggregatedFrameworkStatus

Request

GET /v1/Frameworks/{FrameworkName}/AggregatedFrameworkStatus

Description

Get the AggregatedFrameworkStatus of a Requested Framework

AggregatedFrameworkStatus = FrameworkStatus + all TaskRoles' (TaskRoleStatus + TaskStatuses)

TaskStatuses Recipes:

ServiceDiscovery (Based on TaskRoleName, ContainerHostName, ContainerIPAddress, ServiceId)
TaskLogForwarding (Based on ContainerLogHttpAddress)
MasterSlave and MigrateTask (Based on ContainerId)
DataPartition (Based on TaskIndex) (Note that TaskIndex will not change after Task Restart, Migration or Upgrade)

Response

HttpStatusCode	Body	Description
OK(200)	AggregatedFrameworkStatus
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET FrameworkRequest

Request

GET /v1/Frameworks/{FrameworkName}/FrameworkRequest

Description

Get the FrameworkRequest of a Requested Framework

Current FrameworkDescriptor for the Framework is included in FrameworkRequest and it can reflect latest updates.

Response

HttpStatusCode	Body	Description
OK(200)	FrameworkRequest
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET AggregatedFrameworkRequest

Request

GET /v1/Frameworks/{FrameworkName}/AggregatedFrameworkRequest

Description

Get the AggregatedFrameworkRequest of a Requested Framework

AggregatedFrameworkRequest = FrameworkRequest + all other feedback Request

Response

HttpStatusCode	Body	Description
OK(200)	AggregatedFrameworkRequest
NotFound(404)	ExceptionMessage	Same as GET FrameworkStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET LauncherRequest

Request

GET /v1/LauncherRequest

Description

Get the LauncherRequest

Response

HttpStatusCode	Body	Description
OK(200)	LauncherRequest
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET LauncherStatus

Request

GET /v1/LauncherStatus

Description

Get the LauncherStatus

Current LauncherConfiguration is included in LauncherStatus and it can reflect latest updates.

Response

HttpStatusCode	Body	Description
OK(200)	LauncherStatus
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

PUT ClusterConfiguration

Request

PUT /v1/LauncherRequest/ClusterConfiguration

Type: application/json

Body: ClusterConfiguration

Description

Update the ClusterConfiguration for all Frameworks on the fly

Besides the cluster information provided by YARN, the Administrator can use this API to provide external information about the current cluster configuration, which helps Launcher to schedule Tasks accordingly. The features below depend on it:

taskGpuType

Response

HttpStatusCode	Body	Description
Accepted(202)	NULL	Same as PUT Framework
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET ClusterConfiguration

Request

GET /v1/LauncherRequest/ClusterConfiguration

Description

Get the ClusterConfiguration

Response

HttpStatusCode	Body	Description
OK(200)	ClusterConfiguration
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

PUT AclConfiguration

Request

PUT /v1/LauncherRequest/AclConfiguration

Type: application/json

Body: AclConfiguration

Description

Update the AclConfiguration

It takes effect immediately after the Response iff webServerAclEnable is true, see Framework ACL.

Response

HttpStatusCode	Body	Description
OK(200)	NULL
BadRequest(400)	ExceptionMessage	Same as PUT Framework
Forbidden(403)	ExceptionMessage	Same as PUT Framework
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

GET AclConfiguration

Request

GET /v1/LauncherRequest/AclConfiguration

Description

Get the AclConfiguration

Response

HttpStatusCode	Body	Description
OK(200)	AclConfiguration
ServiceUnavailable(503)	ExceptionMessage	Same as PUT Framework

DataModel and FeatureUsage

You can check the Type, Specification and FeatureUsage inside Launcher Data Model.

For example:

A Framework is Defined and Requested by the FrameworkDescriptor data structure. To find the FeatureUsage inside FrameworkDescriptor, you can refer to the comments inside FrameworkDescriptor.

EnvironmentVariables

Launcher sets up the EnvironmentVariables below for each User Service to use:

Used to locate itself during the whole Framework life cycle. So, they will not be changed after Migration or Restart.

EnvironmentVariable	Description
LAUNCHER_ADDRESS
FRAMEWORK_NAME
FRAMEWORK_VERSION
TASKROLE_NAME
TASK_INDEX
SERVICE_NAME
SERVICE_VERSION

Used to locate itself during a specific execution attempt. So, they will be changed to a different one after Migration or Restart.

EnvironmentVariable	Description
APP_ID	Framework's current associated Application ID.
CONTAINER_ID	Task's current associated Container ID.
CONTAINER_IP	Task's current associated Container IP address.

Used to access the Launcher managed public HDFS store.

EnvironmentVariable	Description
HDFS_USER_STORE_ROOT_DIR	For the User to store data under a directory which is garbage collected automatically after the Framework is Deleted.
HDFS_FRAMEWORK_INFO_FILE	See HDFS Published Informations: FrameworkInfo File.

Used to get the assigned Resource, only valid when corresponding feature is enabled.

EnvironmentVariable	Description
CONTAINER_GPUS	Only valid when gpuNumber is greater than 0. It is a number in which each bit represents a Gpu. For example, 3 (binary 11) represents gpu0 and gpu1.
CONTAINER_PORTS	Only valid when portDefinitions is not empty. It is a string, a format example is: "portLabel1:port1,port2;portLabel2:port3,port4;".
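
As a rough illustration of how a User Service might decode these two variables, the sketch below assumes exactly the formats described in the table above. The function names `gpu_indices` and `ports_for_label` are hypothetical and not part of Launcher.

```shell
# Print the GPU indices encoded in a CONTAINER_GPUS bitmask, one per line.
# Bit N set means gpuN is assigned (as in the "3 = gpu0 and gpu1" example).
gpu_indices() {
  mask=$1
  index=0
  while [ "$mask" -gt 0 ]; do
    if [ $((mask & 1)) -eq 1 ]; then
      echo "$index"
    fi
    mask=$((mask >> 1))
    index=$((index + 1))
  done
}

# Print the comma-separated port list of one label from a CONTAINER_PORTS
# string such as "portLabel1:port1,port2;portLabel2:port3,port4;".
ports_for_label() {
  label=$1
  ports=$2
  # Split on ';' into "label:list" entries, then match the label.
  printf '%s\n' "$ports" | tr ';' '\n' | while IFS=: read -r name list; do
    if [ "$name" = "$label" ]; then
      echo "$list"
    fi
  done
}

# Inside a Container these would typically be called as:
#   gpu_indices "$CONTAINER_GPUS"
#   ports_for_label portLabel1 "$CONTAINER_PORTS"
```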

HDFS Published Informations

Launcher publishes information to public HDFS files for the User to fetch in a more scalable way than RestAPI.

Notes:

HDFS Published Informations are a subset of what RestAPI provides.
HDFS Published Informations are mainly used to improve the scalability of fetching information. For example, fetching from a per-Framework HDFS FrameworkInfo File is more scalable than fetching from the global single-instance LauncherService by RestAPI.
HDFS Published Informations are not as stable or up-to-date as RestAPI, so the Client is suggested to fetch the HDFS Published Informations first and, if that still fails after several retries, fall back to RestAPI.
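
The "HDFS first, then RestAPI" suggestion in the last note can be sketched as a small wrapper. This is a minimal sketch only: `fetch_with_fallback` is a hypothetical helper, and the commands in the usage comment assume that `LAUNCHER_ADDRESS`, `FRAMEWORK_NAME` and `HDFS_FRAMEWORK_INFO_FILE` are exported as described in EnvironmentVariables and that `LAUNCHER_ADDRESS` is a plain host:port.

```shell
# Run a primary fetch command up to <retries> times, then fall back to a
# second command if every attempt failed.
# Usage: fetch_with_fallback <retries> <primary command> <fallback command>
fetch_with_fallback() {
  retries=$1
  primary=$2
  fallback=$3
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if sh -c "$primary"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  sh -c "$fallback"
}

# Possible use inside a Container (assumptions as stated above):
#   fetch_with_fallback 3 \
#     'hadoop fs -cat "$HDFS_FRAMEWORK_INFO_FILE"' \
#     'curl -sf "http://$LAUNCHER_ADDRESS/v1/Frameworks/$FRAMEWORK_NAME"'
```

A real Client would probably also pause between attempts and discard partial output from a failed attempt. Both are left out here to keep the sketch short.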

LauncherHDFSRootDir

Check the hdfsRootDir by GET LauncherStatus.

File Details

FrameworkInfo File

HDFS Path

/{LauncherHDFSRootDir}/{FrameworkName}/FrameworkInfo.json

And it is also set by below EnvironmentVariable, see EnvironmentVariables:

{HDFS_FRAMEWORK_INFO_FILE}

Description

FrameworkInfo File contains the FrameworkInfo of a Requested Framework, in the same JSON format as returned by GET Framework.

Notes:

The FrameworkInfo File may not be up-to-date if the Framework is not in APPLICATION_RUNNING state. So, Client is suggested to only fetch this file inside the Containers of the Framework.

ExitStatus Convention

You can check all the defined ExitStatus by: ExitType, ExitDiagnostics.

Recipes:

Your LauncherClient can depend on the ExitStatus Convention
If your Service fails, it can optionally return the ExitCode USER_APP_TRANSIENT_ERROR or USER_APP_NON_TRANSIENT_ERROR to help FancyRetryPolicy identify your Service's TRANSIENT_NORMAL or NON_TRANSIENT ExitType, respectively. If neither ExitCode is returned, the Service is considered to have exited with the UNKNOWN ExitType.

RetryPolicy

Overview

RetryPolicy can be configured for the whole Framework and each TaskRole to control:

Framework RetryPolicy:
The conditions to retry the whole Framework after the Framework's current associated Application completed.
It can also be considered as Framework CompletionPolicy, i.e. the conditions to complete the whole Framework.
Task RetryPolicy:
The conditions to retry a single Task in the TaskRole after the Task's current associated Container completed.
It can also be considered as Task CompletionPolicy, i.e. the conditions to complete a single Task in the TaskRole.

Usage

For details, please check: RetryPolicyDescriptor, RetryPolicyState.

Examples

Notes:

Italic Conditions can be inherited from the DEFAULT RetryPolicy, so no need to specify them explicitly.
For the definition of each ExitType, such as transient failure, see ExitStatus Convention.

FrameworkType	Framework RetryPolicy	TaskRole	Task RetryPolicy	Description
DEFAULT	FancyRetryPolicy = false MaxRetryCount = 0	TaskRole1	FancyRetryPolicy = false MaxRetryCount = 0	The default RetryPolicy: Never Retry for any failure or success.
DEFAULT	FancyRetryPolicy = false MaxRetryCount = 0	TaskRole2	FancyRetryPolicy = false MaxRetryCount = 0
Service	FancyRetryPolicy = false MaxRetryCount = -2	TaskRole1	FancyRetryPolicy = false MaxRetryCount = -2	Always Retry for any failure or success.
Blind Batch Job	FancyRetryPolicy = false MaxRetryCount = -1	TaskRole1	FancyRetryPolicy = false MaxRetryCount = -1	Always Retry for any failure. Never Retry for success.
Batch Job with Task Fault Tolerance	FancyRetryPolicy = true MaxRetryCount = 3	TaskRole1	FancyRetryPolicy = true MaxRetryCount = 3	Always Retry for transient failure. Never Retry for non-transient failure or success. Retry up to 3 times for unknown failure.
Batch Job without Task Fault Tolerance	FancyRetryPolicy = true MaxRetryCount = 3	TaskRole1	FancyRetryPolicy = false MaxRetryCount = 0	For Framework RetryPolicy, same as "Batch Job with Task Fault Tolerance". For Task RetryPolicy, the Task cannot tolerate any failed Container (for example, it cannot recover from a previously failed Container), so Never Retry the Task for any failure or success.
Debug Mode	FancyRetryPolicy = true MaxRetryCount = 0	TaskRole1	FancyRetryPolicy = true MaxRetryCount = 0	Always Retry for transient failure. Never Retry for non-transient failure or unknown failure or success. This can help to capture the unexpected exit of User Service itself.
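
The rows of the table above can be read as a small decision procedure. The sketch below reproduces only the behaviour stated in those rows. Everything else is an assumption made for illustration: the function name `should_retry`, the simplified exit type labels, and the handling of combinations that the table does not list (such as FancyRetryPolicy = false with a positive MaxRetryCount, treated here as "retry up to MaxRetryCount times for any failure"). For the authoritative semantics, check RetryPolicyDescriptor.

```shell
# Decide whether to retry after a completion.
# Args: fancy(true|false) maxRetryCount exitType retriedCount
# exitType: success | transient | non_transient | unknown
#   (simplified labels for this sketch, not Launcher's ExitType names)
# Prints "retry" or "complete".
should_retry() {
  fancy=$1; max=$2; exit_type=$3; retried=$4
  # MaxRetryCount = -2: always retry for any failure or success.
  if [ "$max" -eq -2 ]; then echo retry; return; fi
  # Otherwise a success is never retried.
  if [ "$exit_type" = success ]; then echo complete; return; fi
  # FancyRetryPolicy decides transient and non-transient failures itself.
  if [ "$fancy" = true ]; then
    case "$exit_type" in
      transient) echo retry; return ;;
      non_transient) echo complete; return ;;
    esac
  fi
  # MaxRetryCount = -1: always retry the remaining failures.
  if [ "$max" -eq -1 ]; then echo retry; return; fi
  # Remaining failures are bounded by MaxRetryCount.
  if [ "$retried" -lt "$max" ]; then echo retry; else echo complete; fi
}
```

For example, `should_retry true 3 unknown 3` prints `complete`, matching "Retry up to 3 times for unknown failure".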

ApplicationCompletionPolicy

Overview

ApplicationCompletionPolicy can be configured for each TaskRole to control:

The conditions to complete the Application.
The ExitStatus of the completed Application.

Usage

For details, please check: TaskRoleApplicationCompletionPolicyDescriptor.

Examples

Notes:

Italic Conditions can be inherited from the DEFAULT ApplicationCompletionPolicy, so no need to specify them explicitly.

FrameworkType	TaskRole	ApplicationCompletionPolicy	Description
DEFAULT	TaskRole1	MinFailedTaskCount = 1 MinSucceededTaskCount = null	The default ApplicationCompletionPolicy: Fail the Application immediately if any Task failed. Succeed the Application only after all Tasks have succeeded.
DEFAULT	TaskRole2	MinFailedTaskCount = 1 MinSucceededTaskCount = null
Service	TaskRole1	MinFailedTaskCount = 1 MinSucceededTaskCount = null	Actually, any ApplicationCompletionPolicy is fine, since Service's Task will never complete, i.e. its Task's MaxRetryCount is -2, see RetryPolicy Examples.
MapReduce	Map	MinFailedTaskCount = {Map.TaskNumber} * {mapreduce.map.failures.maxpercent} + 1 MinSucceededTaskCount = null	A few failed Tasks are acceptable, but the Application should still wait for all Tasks to complete: Fail the Application immediately if the number of failed Tasks exceeds the limit. Succeed the Application only after all Tasks have completed and the number of failed Tasks is within the limit.
MapReduce	Reduce	MinFailedTaskCount = {Reduce.TaskNumber} * {mapreduce.reduce.failures.maxpercent} + 1 MinSucceededTaskCount = null
TensorFlow	ParameterServer	MinFailedTaskCount = 1 MinSucceededTaskCount = null	The success of a certain TaskRole is enough, and there is no need to wait for all Tasks to succeed: Fail the Application immediately if any Task failed. Succeed the Application immediately if all of Worker's Tasks succeeded.
TensorFlow	Worker	MinFailedTaskCount = 1 MinSucceededTaskCount = {Worker.TaskNumber}
Arbitrator Dominated	Arbitrator	MinFailedTaskCount = 1 MinSucceededTaskCount = 1	The ApplicationCompletionPolicy is fully delegated to the Application's single instance arbitrator: Fail the Application immediately if the arbitrator failed. Succeed the Application immediately if the arbitrator succeeded.
Arbitrator Dominated	TaskRole1	MinFailedTaskCount = null MinSucceededTaskCount = null
Arbitrator Dominated	TaskRole2	MinFailedTaskCount = null MinSucceededTaskCount = null
First Completed Task Dominated	TaskRole1	MinFailedTaskCount = 1 MinSucceededTaskCount = 1	The ApplicationCompletionPolicy is fully delegated to the Application's first completed Task: Fail the Application immediately if any Task failed. Succeed the Application immediately if any Task succeeded.
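
The MapReduce rows above derive MinFailedTaskCount from the TaskNumber and a maximum failure percentage. As a rough sketch of that arithmetic, the function below assumes the percentage is given as an integer between 0 and 100 and that the product is rounded down. The table states neither, and the function name `min_failed_task_count` is hypothetical.

```shell
# MinFailedTaskCount = TaskNumber * maxpercent + 1, with maxpercent taken
# as an integer percentage (0-100) and the product rounded down.
# Both points are assumptions of this sketch.
min_failed_task_count() {
  task_number=$1
  max_percent=$2
  echo $(( task_number * max_percent / 100 + 1 ))
}

# 200 Map Tasks with 5 percent allowed failures: up to 10 failed Tasks are
# tolerated, and the 11th failure fails the Application.
min_failed_task_count 200 5
```

With a percentage of 0 the result is 1, which is the same as the DEFAULT policy.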

Framework ACL

Overview

Framework ACL specifies which Users/Groups are able to access a specific FrameworkName, regardless of whether the FrameworkName exists or not. So, essentially, it is the ACL for Users/Groups against the FrameworkName's Namespace.

Framework ACL helps to:

Prevent one User/Group from occupying the FrameworkName (by Add Framework) reserved for another User/Group.
Prevent one User/Group from modifying the Framework (by Update Framework) launched by another User/Group.

Assumption

No naming conflict among UserNames and GroupNames.
UserNames and GroupNames satisfy regex ^[A-Za-z0-9\-._]{1,254}$.

Usage

Framework ACL is enabled iff the webServerAclEnable is true, check it by GET LauncherStatus.

If it is disabled, any User/Group can Read and Write the whole namespace. We assume it is enabled for simplicity below.
Administrator can always Read and Write the whole namespace, this fact will be omitted for simplicity below.

Namespace Privilege:

Namespace Write Privilege: Add or Update Framework in the Namespace

Namespace Read Privilege: Get Framework in the Namespace

Namespace Mechanism:

{Namespace}~(AnyName)

It is pre-created, so no need to create the namespace in advance.
All Users/Groups can Read the namespace.
Initially, only the User named {Namespace} can Write the namespace. To grant the Write Privilege to more Users, see PUT AclConfiguration.
Based on the Assumption and Namespace Mechanism, the suggested Usage Pattern for Users/Groups can be derived below.
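
The Assumption and the Namespace Mechanism can be checked on the Client side before calling PUT Framework. The sketch below is illustrative only: `is_valid_acl_name` and `framework_namespace` are hypothetical helpers, the hyphen is moved to the end of the bracket expression because `grep -E` treats a backslash inside brackets literally, and the namespace is taken to be everything before the first `~`.

```shell
# Succeed if a UserName or GroupName satisfies the regex from the
# Assumption section: ^[A-Za-z0-9\-._]{1,254}$
is_valid_acl_name() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._-]{1,254}$'
}

# Print the namespace of a FrameworkName of the form {Namespace}~(AnyName).
# Fail if the name contains no '~'.
framework_namespace() {
  case "$1" in
    *'~'*) printf '%s\n' "$1" | cut -d '~' -f 1 ;;
    *) return 1 ;;
  esac
}

# Initially, only the User named like the namespace can Write it:
#   [ "$(framework_namespace 'UA~MyBatchJob')" = "UA" ]
```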

Best Practices: Usage Pattern

User Usage Pattern:

The private namespace for the User named {UserName} is:

{UserName}~(AnyName)

It is pre-created, so no need to create the namespace in advance.
Only this User can Write the namespace. However, all other Users/Groups can Read the namespace.

For example, User UA can fully control the namespace:

UA~(AnyName)

Group Usage Pattern:

The private namespace for the Group named {GroupName}, shared by the Users belonging to {GroupName}, is:

{GroupName}~(AnyName)

It is pre-created, so no need to create the namespace in advance.
Initially, no User can Write the namespace. The Administrator needs to add the UserNames belonging to {GroupName} to the namespace {GroupName}, see PUT AclConfiguration. Only then can these Users Write the namespace. However, all other Users/Groups can Read the namespace.

For example, the Administrator adds Users UA and UB, who belong to Group GA, to the namespace GA, and then UA and UB can work together to fully control the namespace:

GA~(AnyName)

Best Practices: Naming within Namespace

Launcher does not dictate how Users/Groups further partition their private namespace or how they avoid naming conflicts within it. Different Users/Groups might choose to use their private namespace differently, depending on their exact requirements, scenarios and assumptions.

So, here, we just provide the best practices for two scenarios:

Batch Framework:

A Batch Framework User tends to Add a new Framework each time instead of Updating the existing one.

So, they should ensure that the name is not reused within their private namespace each time they call PUT Framework.

Service Framework:

A Service Framework User tends to frequently Update the existing Framework instead of Adding a new one, and tends to specify a well-known name which they might want to expose to the Users of the Service itself.

So, they should ensure that the name is reused within their small set of well-known Services each time they call PUT Framework.

If they really want to ensure that a new one is Added, they need to check their small set of well-known Services and then pick a new name.

Best Practices

The Initial Working Directory of your EntryPoint is the root directory of the EntryPoint. Your Service can read data anywhere, however it can ONLY write data under the Initial Working Directory with the Service Directory excluded. And if the Source is a ZIP file, it will be uncompressed before starting your Service. For example:
```
 EntryPoint=HbaseRS.zip/start.bat
 SourceLocations=hdfs:///HbaseRS.zip, hdfs:///HbaseCom <- HbaseRS.zip is a ZIP file
```
The two Sources HbaseRS.zip and HbaseCom will be downloaded (and uncompressed) to local node as below structure:
```
 ./   <- The Initial Working Directory
 ├─HbaseRS.zip <- Service Directory <- HbaseRS.zip is a directory uncompressed from original ZIP file
 └─HbaseCom <- Service Directory
```
Launcher may restart the exited Service on another node even if the original node is free and healthy. So, if you want to always restart the exited Service on the same node, you need to wrap the original EntryPoint in another script, such as:
```
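 while true; do
     # Call the original EntryPoint (placeholder name, replace with yours).
     ./OriginalEntryPoint.sh
     sleep 1  # brief pause so a crash loop does not spin the CPU
 done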
```
Increase the replication number of your data and binaries on the target HDFS (a higher ReplicationNumber means faster downloading, higher availability and higher durability).
```
 hadoop fs -setrep -w <ReplicationNumber> <HDFS Path>
```
Do not modify your data and binaries on the target HDFS. To use new data and binaries, upload them to a different HDFS Path and then change the FrameworkVersion and SourceLocations.
