Merge pull request #3711 from terascope/job-deletion
[teraslice] - Add ability to soft delete a job
jsnoble authored Sep 6, 2024
2 parents 875137c + dab1481 commit dc89d8c
Showing 48 changed files with 766 additions and 155 deletions.
43 changes: 42 additions & 1 deletion docs/management-apis/endpoints-json.md
Expand Up @@ -197,6 +197,7 @@ Returns an array of all jobs listed in `${clusterName}__jobs` index.
**Query Options:**

- `active: string = [true|false]`
- `deleted: string = [true|false]`
- `from: number = 0`
- `size: number = 100`
- `sort: string = "_updated:desc"`
Expand All @@ -205,7 +206,11 @@ Setting `active` to `true` will return only the jobs considered active, which
includes the jobs that have `active` set to `true` as well as those that do not
have an `active` property. If your query sets `active` to `false` it will only
return the jobs with the `active` property set to false. If the `active` query
parameteris not provided, all jobs will be returned.
parameter is not provided, all jobs will be returned.

Setting `deleted` to `false` or not setting the option will return jobs
where `_deleted` is set to `false` or the `_deleted` key is not present.
Setting `deleted` to `true` will return all `_deleted: true` jobs.
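
The `deleted` semantics described above can be sketched as a small filter. This is only an illustration of the documented behaviour, not the server's implementation; `JobRecord` and `filterByDeleted` are hypothetical names introduced for the example.

```typescript
// Hypothetical record shape: only the `_deleted` flag matters for this sketch.
interface JobRecord {
    job_id: string;
    _deleted?: boolean;
}

/**
 * Mirrors the documented behaviour of the `deleted` query option:
 * - `deleted=true` keeps only records with `_deleted: true`
 * - `deleted=false`, or omitting the option, keeps records where
 *   `_deleted` is `false` or the key is absent
 */
function filterByDeleted<T extends { _deleted?: boolean }>(
    records: T[],
    deleted?: 'true' | 'false'
): T[] {
    const wantDeleted = deleted === 'true';
    return records.filter((record) => (record._deleted === true) === wantDeleted);
}

const sampleJobs: JobRecord[] = [
    { job_id: 'a', _deleted: true },
    { job_id: 'b', _deleted: false },
    { job_id: 'c' }
];

// Default listing hides the soft-deleted job.
const visibleIds = filterByDeleted(sampleJobs).map((record) => record.job_id);
// `deleted=true` returns only the soft-deleted job.
const deletedIds = filterByDeleted(sampleJobs, 'true').map((record) => record.job_id);
```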

The parameter `size` is the number of documents returned, `from` is the number of
documents to skip before the first result, and `sort` is a Lucene query.
Expand Down Expand Up @@ -444,6 +449,8 @@ $ curl -XPOST 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd/_work

## POST /v1/jobs/\{jobId\}/_active

**DEPRECATED** - Jobs should instead be deleted

Sets the `active` property on the specified job as `true`.

**Query Options:**
Expand Down Expand Up @@ -473,6 +480,8 @@ $ curl -XPOST 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd/_acti

## POST /v1/jobs/\{jobId\}/_inactive

**DEPRECATED** - Jobs should instead be deleted

Sets the `active` property on the specified job as `false`.

**Query Options:**
Expand Down Expand Up @@ -563,6 +572,34 @@ $ curl 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd/errors'
]
```

## DELETE /v1/jobs/\{jobId\}

Issues a delete command, deleting the job and all related execution contexts. Deletion is PERMANENT. Once a job is deleted it cannot be started, updated, or recovered. The job must have a terminal status to be deleted. Any orphaned K8s resources associated with the job will also be deleted. The `active` field will automatically be set to `false`.


**Usage:**

```sh
$ curl -XDELETE 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd'
{
    "name": "Example",
    "lifecycle": "persistent",
    "workers": 1,
    "operations": [
        {
            "_op": "noop"
        }
    ],
    "job_id": "5a50580c-4a50-48d9-80f8-ac70a00f3dbd",
    "_context": "job",
    "_created": "2018-09-21T17:49:05.029Z",
    "_updated": "2019-04-12T09:43:18.301Z",
    "_deleted": true,
    "_deleted_on": "2019-04-12T09:43:18.301Z",
    "active": false
}
```
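
For callers that are not using `curl`, the same request can be sketched in TypeScript. This is a minimal sketch that only builds the request and checks the response shape shown above; `buildDeleteJobRequest` and `isSoftDeleted` are hypothetical helpers, not part of any Teraslice client.

```typescript
// Minimal description of an HTTP request; no network call is made here.
interface DeleteJobRequest {
    method: 'DELETE';
    url: string;
}

// Hypothetical helper: builds the DELETE request for the documented endpoint.
function buildDeleteJobRequest(baseUrl: string, jobId: string): DeleteJobRequest {
    // Strip trailing slashes so the path separator is not doubled.
    const root = baseUrl.replace(/\/+$/, '');
    return {
        method: 'DELETE',
        url: `${root}/v1/jobs/${encodeURIComponent(jobId)}`
    };
}

// Fields the documented response sets on a deleted job.
interface DeletedJobFields {
    _deleted: boolean;
    _deleted_on: string;
    active: boolean;
}

// Hypothetical helper: true when a response carries the documented deletion markers.
function isSoftDeleted(job: Partial<DeletedJobFields>): boolean {
    return job._deleted === true
        && typeof job._deleted_on === 'string'
        && job.active === false;
}

const deleteRequest = buildDeleteJobRequest(
    'http://localhost:5678/',
    '5a50580c-4a50-48d9-80f8-ac70a00f3dbd'
);
```

The result could then be passed to any HTTP client, for example `fetch(deleteRequest.url, { method: deleteRequest.method })`.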

## GET /v1/ex

Returns all execution contexts (job invocations).
Expand All @@ -573,9 +610,13 @@ Returns all execution contexts (job invocations).
- `size: number = 100`
- `sort: string = "_updated:desc"`
- `status: string = "*"`
- `deleted: string = [true|false]`

`size` is the number of documents returned, `from` is the number of documents to skip before the first result, and `sort` is a Lucene query.

Setting `deleted` to `false` or not setting the option will return execution contexts
where `_deleted` is set to `false` or the `_deleted` key is not present.
Setting `deleted` to `true` will return all execution contexts with `_deleted: true`.

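
As a rough sketch of how the query options for this endpoint combine into a request path, the helper below serialises whichever options are set. `ExListOptions` and `buildExListPath` are hypothetical names for illustration only; options left unset are simply omitted so the server defaults apply.

```typescript
// Query options documented for GET /v1/ex; all are optional.
interface ExListOptions {
    from?: number;
    size?: number;
    sort?: string;
    status?: string;
    deleted?: boolean;
}

// Hypothetical helper: serialises only the options that were provided.
function buildExListPath(options: ExListOptions = {}): string {
    const pairs: string[] = [];
    for (const [key, value] of Object.entries(options)) {
        if (value === undefined) continue;
        pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
    return pairs.length > 0 ? `/v1/ex?${pairs.join('&')}` : '/v1/ex';
}

// Only deleted execution contexts, ten at a time.
const deletedExPath = buildExListPath({ size: 10, deleted: true });
// No options: the server defaults apply, which exclude deleted records.
const defaultExPath = buildExListPath();
```
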
**Usage:**

```sh
12 changes: 11 additions & 1 deletion docs/management-apis/endpoints-txt.md
Expand Up @@ -92,7 +92,10 @@ Returns a text table of all job listings.
**Query Options:**

- `fields: string`
- `active: [true|false]`
- `active: string = [true|false]`
- `deleted: string = [true|false]`

**Note:** When showing `deleted` records the `_deleted_on` field will be added to the default fields.

The fields parameter is a string that consists of several words, these words will be used to override the default values and only return the values specified
ie `fields="job_id,pid"` or `fields="job_id pid"`.
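
The `fields` handling described above can be sketched as follows. This is an illustration under stated assumptions, not the endpoint's code: `parseFields` and `resolveFields` are hypothetical helpers, the defaults array is made up for the example, and it is assumed that `_deleted_on` is only appended when the default fields are in use.

```typescript
// Accepts either separator form: "job_id,pid" or "job_id pid".
function parseFields(fields: string): string[] {
    return fields.split(/[\s,]+/).filter((fieldName) => fieldName.length > 0);
}

// Hypothetical resolution order: explicit fields override the defaults;
// otherwise the defaults are used, plus `_deleted_on` when listing deleted records.
function resolveFields(defaults: string[], fields?: string, showDeleted = false): string[] {
    if (fields !== undefined && fields.trim().length > 0) {
        return parseFields(fields);
    }
    return showDeleted ? [...defaults, '_deleted_on'] : [...defaults];
}

// Illustrative defaults only; not the endpoint's actual default list.
const exampleDefaults = ['job_id', '_created', '_updated'];

const commaSeparated = parseFields('job_id,pid');
const spaceSeparated = parseFields('job_id pid');
const withDeletedOn = resolveFields(exampleDefaults, undefined, true);
```
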
Expand All @@ -110,6 +113,8 @@ ie `fields="job_id,pid"` or `fields="job_id pid"`.
- `job_id`
- `_created`
- `_updated`
- `_deleted`
- `_deleted_on`

**Default Fields:**

Expand Down Expand Up @@ -139,6 +144,9 @@ Returns a text table of all job execution contexts.
**Query Options:**

- `fields: string`
- `deleted: string = [true|false]`

**Note:** When showing `deleted` records the `_deleted_on` field will be added to the default fields.

The fields parameter is a string that consists of several words, these words will be used to override the default values and only return the values specified
ie `fields="job_id,pid"` or `fields="job_id pid"`.
Expand All @@ -156,6 +164,8 @@ ie `fields="job_id,pid"` or `fields="job_id pid"`.
- `job_id`
- `_created`
- `_updated`
- `_deleted`
- `_deleted_on`

**Default Fields:**

38 changes: 33 additions & 5 deletions docs/packages/teraslice-cli/overview.md
Expand Up @@ -294,19 +294,28 @@ teraslice-cli tjm workers remove 5 JOB.JSON
teraslice-cli tjm workers total 50 JOB.JSON
```

### tjm delete

Delete a job or jobs from a teraslice cluster by referencing the job file. Jobs must be stopped.

```sh
teraslice-cli tjm delete JOB.JSON
teraslice-cli tjm delete JOB1.JSON JOB2.JSON
```

## Jobs

*** Job control commands start, stop, pause, resume, and restart all function with the same syntax.***

- `-all` or `-a` performs action on all the jobs on a given cluster.
- Providing a job_id of `all` will perform the action on all the jobs on a given cluster.
- `--yes` or `-y` answers yes to all prompts.

- When jobs are stopped or paused, their state is saved in `~/.teraslice/job_state_files`.

Commands:

```bash
teraslice-cli jobs <command> <cluster> [-all|-a]
teraslice-cli jobs <command> <cluster> [job_id | all]
# stop
teraslice-cli jobs stop local 99999999-9999-9999-9999-999999999999
# start
Expand All @@ -318,7 +327,7 @@ teraslice-cli jobs resume local 99999999-9999-9999-9999-999999999999
# restart job
teraslice-cli jobs restart local 99999999-9999-9999-9999-999999999999
# restart all jobs, no prompt
teraslice-cli jobs restart local --all -y
teraslice-cli jobs restart local all -y
```

### jobs await
Expand Down Expand Up @@ -372,6 +381,11 @@ Display jobs registered on the cluster
teraslice-cli jobs list <cluster>
# list jobs
teraslice-cli jobs list local
# list only deleted jobs
teraslice-cli jobs list local --deleted=true
# list only active jobs that have not been deleted
teraslice-cli jobs list local --active=true

```

### jobs view
Expand Down Expand Up @@ -405,6 +419,18 @@ teraslice-cli jobs workers add 5 cluster1 99999999-9999-9999-9999-999999999999
teraslice-cli jobs workers remove 5 cluster1 99999999-9999-9999-9999-999999999999
```

### jobs delete

Delete a job or jobs by job_id from a teraslice cluster. Jobs must be in a terminal state.

```sh
teraslice-cli jobs delete <cluster> <job_id>
# delete a job
teraslice-cli jobs delete cluster1 99999999-9999-9999-9999-999999999999
# delete all stopped jobs on a cluster, no prompt. Active jobs will be skipped.
teraslice-cli jobs delete cluster1 all -y
```
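
The "all" case above, where active jobs are skipped, can be pictured as a partition on job status. This is a sketch only, not the CLI's implementation: `partitionDeletable` is a hypothetical helper, and the list of terminal statuses is an assumption that should be checked against the Teraslice version in use.

```typescript
// Assumption: the statuses treated as terminal; verify against your Teraslice version.
const TERMINAL_STATUSES = ['completed', 'stopped', 'rejected', 'failed', 'terminated'];

// Hypothetical shape pairing a job with the status of its latest execution.
interface JobWithStatus {
    job_id: string;
    exStatus: string;
}

// Splits jobs into those eligible for deletion and those that would be skipped.
function partitionDeletable(jobs: JobWithStatus[]): { deletable: string[]; skipped: string[] } {
    const deletable: string[] = [];
    const skipped: string[] = [];
    for (const job of jobs) {
        if (TERMINAL_STATUSES.includes(job.exStatus)) {
            deletable.push(job.job_id);
        } else {
            skipped.push(job.job_id);
        }
    }
    return { deletable, skipped };
}

const partitioned = partitionDeletable([
    { job_id: 'a', exStatus: 'stopped' },
    { job_id: 'b', exStatus: 'running' },
    { job_id: 'c', exStatus: 'failed' }
]);
```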

## Executions

### ex errors
Expand Down Expand Up @@ -436,14 +462,16 @@ teraslice-cli jobs status local --status failing

### ex list

Display execution ids on the cluster, default is `running` and `failing`
Display execution ids on the cluster, default is to exclude deleted and show all statuses

```bash
teraslice-cli ex list <cluster>
# list ex_ids
teraslice-cli ex list local
# list failed ex_ids
teraslice-cli ex list local --status failed
teraslice-cli ex list local --status=failed
# list deleted ex_ids
teraslice-cli ex list local --deleted=true
```

## Nodes
4 changes: 2 additions & 2 deletions e2e/package.json
Expand Up @@ -43,9 +43,9 @@
"ms": "^2.1.3"
},
"devDependencies": {
"@terascope/types": "^1.0.1",
"@terascope/types": "^1.1.0",
"bunyan": "^1.8.15",
"elasticsearch-store": "^1.0.4",
"elasticsearch-store": "^1.1.0",
"fs-extra": "^11.2.0",
"ms": "^2.1.3",
"nanoid": "^3.3.4",
68 changes: 68 additions & 0 deletions e2e/test/cases/cluster/api-spec.ts
Expand Up @@ -3,6 +3,7 @@ import { cloneDeep, pDelay } from '@terascope/utils';
import { JobConfig } from '@terascope/types';
import { TerasliceHarness } from '../../teraslice-harness.js';
import { TEST_PLATFORM } from '../../config.js';
import { Ex, Job } from 'teraslice-client-js';

describe('cluster api', () => {
let terasliceHarness: TerasliceHarness;
Expand Down Expand Up @@ -148,4 +149,71 @@ describe('cluster api', () => {
const response = await terasliceHarness.teraslice.cluster.txt('assets/ex1/0.0.1');
expect(response).toBeString();
});

describe('DELETE /jobs/<jobId>', () => {
// NOTE: every test in this section will use a single job

const deletedJobProperties = {
_deleted: true,
_deleted_on: expect.anything(),
active: false
};

let job: Job;
let jobId: string;
let ex: Ex;
let jobSpec: JobConfig;

beforeAll(async () => {
jobSpec = terasliceHarness.newJob('generator');
// Set resource constraints on workers within CI
if (TEST_PLATFORM === 'kubernetes' || TEST_PLATFORM === 'kubernetesV2') {
jobSpec.resources_requests_cpu = 0.05;
}

job = await terasliceHarness.teraslice.jobs.submit(jobSpec, false);
jobId = job.id();
const { ex_id: exId } = await job.execution();
ex = terasliceHarness.teraslice.executions.wrap(exId);
});

it('will not delete a running job', async () => {
await terasliceHarness.waitForExStatus(ex, 'running', 100, 1000);

await expect(terasliceHarness.teraslice.jobs.delete(`/jobs/${jobId}`)).rejects.toThrow();
});

it('will delete a stopped job', async () => {
await terasliceHarness.teraslice.jobs.post(`/jobs/${jobId}/_stop`);
await terasliceHarness.waitForExStatus(ex, 'stopped', 100, 1000);

await expect(terasliceHarness.teraslice.jobs.delete(`/jobs/${jobId}`)).resolves.toMatchObject(deletedJobProperties);
});

it('will not list a deleted job by default', async () => {
const list = await terasliceHarness.teraslice.jobs.list();
const jobIds = list.map((listedJob) => listedJob.job_id);
// arrayContaining([expect.not.stringMatching(jobId)]) would pass as soon as
// any other job is listed, so assert directly that the deleted id is absent
expect(jobIds).not.toContain(jobId);
});

it('will list a deleted job when passed "{ deleted: true }"', async () => {
const list = await terasliceHarness.teraslice.jobs.list({ deleted: true });
expect(list).toEqual(expect.arrayContaining([expect.objectContaining({ ...jobSpec, job_id: jobId })]));
});

it('will not start a deleted job', async () => {
await expect(terasliceHarness.teraslice.jobs.post(`/jobs/${jobId}/_start`)).rejects.toThrow(`Job ${jobId} has been deleted and cannot be started.`);

});

it('will not update a deleted job', async () => {
await expect(terasliceHarness.teraslice.jobs.put(`/jobs/${jobId}`, { workers: 1 })).rejects.toThrow(`Job ${jobId} has been deleted and cannot be updated.`);

});

it('will not recover a deleted job', async () => {
await expect(terasliceHarness.teraslice.jobs.post(`/jobs/${jobId}/_recover`)).rejects.toThrow(`Job ${jobId} has been deleted and cannot be recovered.`);

});
});
});
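
The last three tests expect a deleted job to reject start, update, and recover requests with a specific message. A guard of roughly the following shape would produce those messages; this is a hedged sketch, with `assertNotDeleted` and `describeAttempt` as hypothetical names, and it does not claim to match how the server implements the check.

```typescript
// The three actions the tests expect to be rejected for a deleted job.
type JobAction = 'started' | 'updated' | 'recovered';

// Hypothetical guard: throws the message format asserted in the tests above.
function assertNotDeleted(job: { job_id: string; _deleted?: boolean }, action: JobAction): void {
    if (job._deleted === true) {
        throw new Error(`Job ${job.job_id} has been deleted and cannot be ${action}.`);
    }
}

// Small wrapper so the outcome can be inspected without an unhandled throw.
function describeAttempt(job: { job_id: string; _deleted?: boolean }, action: JobAction): string {
    try {
        assertNotDeleted(job, action);
        return 'ok';
    } catch (err) {
        return err instanceof Error ? err.message : String(err);
    }
}

const deletedAttempt = describeAttempt({ job_id: 'abc', _deleted: true }, 'started');
const liveAttempt = describeAttempt({ job_id: 'abc' }, 'updated');
```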
2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "teraslice-workspace",
"displayName": "Teraslice",
"version": "2.2.0",
"version": "2.3.0",
"private": true,
"homepage": "https://github.com/terascope/teraslice",
"bugs": {
10 changes: 5 additions & 5 deletions packages/data-mate/package.json
@@ -1,7 +1,7 @@
{
"name": "@terascope/data-mate",
"displayName": "Data-Mate",
"version": "1.0.4",
"version": "1.1.0",
"description": "Library of data validations/transformations",
"homepage": "https://github.com/terascope/teraslice/tree/master/packages/data-mate#readme",
"repository": {
Expand Down Expand Up @@ -30,9 +30,9 @@
"test:watch": "ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/data-types": "^1.0.1",
"@terascope/types": "^1.0.1",
"@terascope/utils": "^1.0.1",
"@terascope/data-types": "^1.1.0",
"@terascope/types": "^1.1.0",
"@terascope/utils": "^1.1.0",
"@types/validator": "^13.11.10",
"awesome-phonenumber": "^2.70.0",
"date-fns": "^2.30.0",
Expand All @@ -47,7 +47,7 @@
"uuid": "^9.0.1",
"valid-url": "^1.0.9",
"validator": "^13.12.0",
"xlucene-parser": "^1.0.3"
"xlucene-parser": "^1.1.0"
},
"devDependencies": {
"@types/ip6addr": "^0.2.6",
6 changes: 3 additions & 3 deletions packages/data-types/package.json
@@ -1,7 +1,7 @@
{
"name": "@terascope/data-types",
"displayName": "Data Types",
"version": "1.0.1",
"version": "1.1.0",
"description": "A library for defining the data structures and mapping",
"homepage": "https://github.com/terascope/teraslice/tree/master/packages/data-types#readme",
"bugs": {
Expand All @@ -27,8 +27,8 @@
"test:watch": "ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/types": "^1.0.1",
"@terascope/utils": "^1.0.1",
"@terascope/types": "^1.1.0",
"@terascope/utils": "^1.1.0",
"graphql": "^14.7.0",
"lodash": "^4.17.21",
"yargs": "^17.7.2"
8 changes: 4 additions & 4 deletions packages/elasticsearch-api/package.json
@@ -1,7 +1,7 @@
{
"name": "@terascope/elasticsearch-api",
"displayName": "Elasticsearch API",
"version": "4.0.1",
"version": "4.1.0",
"description": "Elasticsearch client api used across multiple services, handles retries and exponential backoff",
"homepage": "https://github.com/terascope/teraslice/tree/master/packages/elasticsearch-api#readme",
"bugs": {
Expand All @@ -24,16 +24,16 @@
"test:watch": "TEST_RESTRAINED_ELASTICSEARCH='true' ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/types": "^1.0.1",
"@terascope/utils": "^1.0.1",
"@terascope/types": "^1.1.0",
"@terascope/utils": "^1.1.0",
"bluebird": "^3.7.2",
"setimmediate": "^1.0.5"
},
"devDependencies": {
"@opensearch-project/opensearch": "^1.2.0",
"@types/elasticsearch": "^5.0.43",
"elasticsearch": "^15.4.1",
"elasticsearch-store": "^1.0.4",
"elasticsearch-store": "^1.1.0",
"elasticsearch6": "npm:@elastic/elasticsearch@^6.7.0",
"elasticsearch7": "npm:@elastic/elasticsearch@^7.0.0",
"elasticsearch8": "npm:@elastic/elasticsearch@^8.0.0"