[teraslice] - Add ability to soft delete a job #3711

Merged
merged 41 commits into from
Sep 6, 2024
a595d5d
update types
busma13 Aug 5, 2024
7fda0bf
update mappings
busma13 Aug 5, 2024
009fe97
add job deletion functionality
busma13 Aug 5, 2024
57fb921
add execution deletion functionality
busma13 Aug 5, 2024
bd9e6af
Add 'jobs/:jobId/_delete' api endpoint and update 'get ex' and 'get j…
busma13 Aug 5, 2024
97ede79
fix type
busma13 Aug 6, 2024
7b4822d
update types
busma13 Aug 7, 2024
c3ac591
fix typo
busma13 Aug 7, 2024
26b969e
update teraslice-client-js
busma13 Aug 7, 2024
481e35c
update teraslice cli
busma13 Aug 7, 2024
8aab397
update docs
busma13 Aug 8, 2024
c077598
show deleted_on instead of deleted
busma13 Aug 8, 2024
79065ed
update status codes
busma13 Aug 8, 2024
5f94c41
refactor
busma13 Aug 8, 2024
b1abcb5
missed a rename
busma13 Aug 8, 2024
dd63aee
fix possible undefined cluster alias
busma13 Aug 8, 2024
c39f61a
fix _delete overwritten on update
busma13 Aug 9, 2024
bb4594e
git commit -a -m "release: (minor) [email protected]" -m "bump: (minor)…
busma13 Aug 9, 2024
6b5ecb6
fix buggy skip prompt logic
busma13 Aug 26, 2024
2b0f28b
clarify delete command description
busma13 Aug 26, 2024
7fb510a
fix _deleted key being overwritten on update
busma13 Aug 26, 2024
8941c79
make docs clearer
busma13 Aug 26, 2024
5571b98
fix delete prompt, update docs
busma13 Aug 26, 2024
d5515d0
remove comment
busma13 Aug 26, 2024
8932f11
doc fix
busma13 Aug 26, 2024
ec02b9d
create quesries in requestHandler
busma13 Sep 3, 2024
3a92604
make _deleted an optional field and filtering by _deleted true or false
busma13 Sep 4, 2024
e303400
move query building fns to api_utils.ts
busma13 Sep 4, 2024
990bec3
add tests
busma13 Sep 4, 2024
39fdd97
update docs
busma13 Sep 4, 2024
a300034
deprecate _active and _inactive
busma13 Sep 4, 2024
63db7a1
don't end if job can't be deleted, just skip that job
busma13 Sep 5, 2024
1fef08d
fix docs and examples. Don't use job state file when deleting all
busma13 Sep 5, 2024
d959063
remove unneeded type
busma13 Sep 5, 2024
e3501b5
no need to copy _deleted on job update
busma13 Sep 5, 2024
2db646a
remove unneded type
busma13 Sep 5, 2024
7acd499
release: (minor) [email protected]
busma13 Sep 5, 2024
ac7f6a6
fix test
busma13 Sep 5, 2024
03ee575
fix flaky test
busma13 Sep 5, 2024
26b2349
update docs
busma13 Sep 6, 2024
dab1481
more e2e tests
busma13 Sep 6, 2024
43 changes: 42 additions & 1 deletion docs/management-apis/endpoints-json.md
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,7 @@ Returns an array of all jobs listed in `${clusterName}__jobs` index.
**Query Options:**

- `active: string = [true|false]`
- `deleted: string = [true|false]`
- `from: number = 0`
- `size: number = 100`
- `sort: string = "_updated:desc"`
Expand All @@ -205,7 +206,11 @@ Setting `active` to `true` will return only the jobs considered active, which
includes the jobs that have `active` set to `true` as well as those that do not
have an `active` property. If your query sets `active` to `false` it will only
return the jobs with the `active` property set to false. If the `active` query
parameteris not provided, all jobs will be returned.
parameter is not provided, all jobs will be returned.

Setting `deleted` to `false` or not setting the option will return jobs
where `_deleted` is set to `false` or the `_deleted` key is not present.
Setting `deleted` to `true` will return all `_deleted: true` jobs.

The parameter `size` is the number of documents returned, `from` is the number
of documents to skip before the first result, and `sort` is a lucene query.
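
The `deleted` semantics described above can be sketched as an in-memory filter. This is only an illustration: the `JobRecord` shape and the `filterByDeleted` helper are hypothetical names, and the service presumably applies the equivalent condition as a query against the `${clusterName}__jobs` index rather than filtering in memory.

```typescript
// Minimal record shape; only `_deleted` matters for this sketch.
interface JobRecord {
    job_id: string;
    _deleted?: boolean;
}

// Hypothetical helper mirroring the documented `deleted` query option.
// `deleted` arrives as a string query parameter ("true" | "false") or is absent.
function filterByDeleted<T extends { _deleted?: boolean }>(
    records: T[],
    deleted?: string
): T[] {
    const wantDeleted = deleted === 'true';
    // A missing `_deleted` key is treated the same as `_deleted: false`.
    return records.filter((record) => (record._deleted === true) === wantDeleted);
}

const jobs: JobRecord[] = [
    { job_id: 'a' },
    { job_id: 'b', _deleted: false },
    { job_id: 'c', _deleted: true }
];

const visibleIds = filterByDeleted(jobs).map((job) => job.job_id);
const deletedIds = filterByDeleted(jobs, 'true').map((job) => job.job_id);
console.log(visibleIds, deletedIds);
```

With the sample data, omitting the option selects jobs `a` and `b`, while `deleted=true` selects only job `c`.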
Expand Down Expand Up @@ -444,6 +449,8 @@ $ curl -XPOST 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd/_work

## POST /v1/jobs/{jobId}/_active

**DEPRECATED** - Jobs should instead be deleted

Sets the `active` property on the specified job as `true`.

**Query Options:**
Expand Down Expand Up @@ -473,6 +480,8 @@ $ curl -XPOST 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd/_acti

## POST /v1/jobs/{jobId}/_inactive

**DEPRECATED** - Jobs should instead be deleted

Sets the `active` property on the specified job as `false`.

**Query Options:**
Expand Down Expand Up @@ -563,6 +572,34 @@ $ curl 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd/errors'
]
```

## DELETE /v1/jobs/\{jobId\}

Issues a delete command, deleting the job and all related execution contexts. Deletion is PERMANENT. Once a job is deleted it cannot be started, updated, or recovered. The job must have a terminal status to be deleted. Any orphaned K8s resources associated with the job will also be deleted. The `active` field will automatically be set to `false`.


**Usage:**

```sh
$ curl -XDELETE 'localhost:5678/v1/jobs/5a50580c-4a50-48d9-80f8-ac70a00f3dbd'
{
    "name": "Example",
    "lifecycle": "persistent",
    "workers": 1,
    "operations": [
        {
            "_op": "noop"
        }
    ],
    "job_id": "5a50580c-4a50-48d9-80f8-ac70a00f3dbd",
    "_context": "job",
    "_created": "2018-09-21T17:49:05.029Z",
    "_updated": "2019-04-12T09:43:18.301Z",
    "_deleted": true,
    "_deleted_on": "2019-04-12T09:43:18.301Z",
    "active": false
}
```
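
The rules in this section (terminal status required, `active` forced to `false`, `_deleted` and `_deleted_on` set) can be sketched as a pure function. Everything named here is an assumption for illustration: `softDeleteJob`, the `JobDoc` shape, the error text, and in particular the `TERMINAL_STATUSES` list, which this document does not enumerate. Setting `_updated` to the deletion time follows the sample response above, where both timestamps match.

```typescript
// Assumed list of terminal statuses; not taken from this document.
const TERMINAL_STATUSES = ['completed', 'stopped', 'rejected', 'failed', 'terminated'];

interface JobDoc {
    job_id: string;
    active?: boolean;
    _updated: string;
    _deleted?: boolean;
    _deleted_on?: string;
}

// Hypothetical helper: returns a copy of the job record marked as deleted.
function softDeleteJob(job: JobDoc, latestStatus: string, now: Date = new Date()): JobDoc {
    if (TERMINAL_STATUSES.indexOf(latestStatus) === -1) {
        // Illustrative message only; the real API wording may differ.
        throw new Error(`Job ${job.job_id} has status "${latestStatus}" and cannot be deleted.`);
    }
    const timestamp = now.toISOString();
    return {
        ...job,
        active: false,
        _updated: timestamp,
        _deleted: true,
        _deleted_on: timestamp
    };
}

const deletedJob = softDeleteJob(
    { job_id: '5a50580c-4a50-48d9-80f8-ac70a00f3dbd', active: true, _updated: '2018-09-21T17:49:05.029Z' },
    'stopped',
    new Date('2019-04-12T09:43:18.301Z')
);
console.log(deletedJob);
```

The record is kept and only flagged, which is why deleted jobs can still be listed with `deleted=true`.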

## GET /v1/ex

Returns all execution contexts (job invocations).
Expand All @@ -573,9 +610,13 @@ Returns all execution contexts (job invocations).
- `size: number = 100`
- `sort: string = "_updated:desc"`
- `status: string = "*"`
- `deleted: string = [true|false]`

`size` is the number of documents returned, `from` is the number of documents to skip before the first result, and `sort` is a lucene query.

Setting `deleted` to `false` or not setting the option will return execution contexts
where `_deleted` is set to `false` or the `_deleted` key is not present.
Setting `deleted` to `true` will return all execution contexts with `_deleted: true`.
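
As a rough sketch of how a client might assemble these query options into a request URL: `buildListUrl` and `ListQuery` are hypothetical names, not part of teraslice-client-js, and the host and port are taken from the curl examples in this document.

```typescript
// Query options documented for the list endpoints; all are optional.
interface ListQuery {
    from?: number;
    size?: number;
    sort?: string;
    status?: string;
    deleted?: boolean;
}

// Hypothetical helper: appends only the options that were actually set.
function buildListUrl(baseUrl: string, path: string, query: ListQuery = {}): string {
    const pairs: string[] = [];
    const keys = Object.keys(query) as (keyof ListQuery)[];
    for (const key of keys) {
        const value = query[key];
        if (value !== undefined) {
            pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
        }
    }
    return pairs.length > 0 ? `${baseUrl}${path}?${pairs.join('&')}` : `${baseUrl}${path}`;
}

const deletedFailedUrl = buildListUrl('http://localhost:5678', '/v1/ex', { deleted: true, status: 'failed' });
const defaultUrl = buildListUrl('http://localhost:5678', '/v1/ex');
console.log(deletedFailedUrl);
console.log(defaultUrl);
```

Leaving `deleted` out of the query relies on the server default described above, which excludes deleted execution contexts.
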
**Usage:**

```sh
Expand Down
12 changes: 11 additions & 1 deletion docs/management-apis/endpoints-txt.md
Expand Up @@ -92,7 +92,10 @@ Returns a text table of all job listings.
**Query Options:**

- `fields: string`
- `active: [true|false]`
- `active: string = [true|false]`
- `deleted: string = [true|false]`

**Note:** When showing `deleted` records the `_deleted_on` field will be added to the default fields.

The fields parameter is a string that consists of several words, these words will be used to override the default values and only return the values specified
ie `fields="job_id,pid"` or `fields="job_id pid"`.
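
One way to read the `fields` and `deleted` rules together is sketched below. `resolveFields` is a hypothetical helper, and the default field list passed to it is a placeholder, since the real defaults are not shown in this hunk.

```typescript
// Hypothetical helper: decides which columns the text table should show.
function resolveFields(defaults: string[], fields?: string, deleted?: string): string[] {
    if (fields) {
        // Accept both "job_id,pid" and "job_id pid"; explicit fields override the defaults.
        return fields.split(/[,\s]+/).filter((field) => field.length > 0);
    }
    if (deleted === 'true' && defaults.indexOf('_deleted_on') === -1) {
        // When showing deleted records, `_deleted_on` is added to the default fields.
        return defaults.concat('_deleted_on');
    }
    return defaults.slice();
}

// Placeholder defaults, for illustration only.
const placeholderDefaults = ['name', 'job_id', '_created', '_updated'];

const commaFields = resolveFields(placeholderDefaults, 'job_id,pid');
const spaceFields = resolveFields(placeholderDefaults, 'job_id pid');
const deletedFields = resolveFields(placeholderDefaults, undefined, 'true');
console.log(commaFields, spaceFields, deletedFields);
```

Both separator styles yield the same two fields, and the `deleted=true` case appends `_deleted_on` only when no explicit `fields` value is given.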
Expand All @@ -110,6 +113,8 @@ ie `fields="job_id,pid"` or `fields="job_id pid"`.
- `job_id`
- `_created`
- `_updated`
- `_deleted`
- `_deleted_on`

**Default Fields:**

Expand Down Expand Up @@ -139,6 +144,9 @@ Returns a text table of all job execution contexts.
**Query Options:**

- `fields: string`
- `deleted: string = [true|false]`

**Note:** When showing `deleted` records the `_deleted_on` field will be added to the default fields.

The fields parameter is a string that consists of several words, these words will be used to override the default values and only return the values specified
ie `fields="job_id,pid"` or `fields="job_id pid"`.
Expand All @@ -156,6 +164,8 @@ ie `fields="job_id,pid"` or `fields="job_id pid"`.
- `job_id`
- `_created`
- `_updated`
- `_deleted`
- `_deleted_on`

**Default Fields:**

Expand Down
38 changes: 33 additions & 5 deletions docs/packages/teraslice-cli/overview.md
Expand Up @@ -294,19 +294,28 @@ teraslice-cli tjm workers remove 5 JOB.JSON
teraslice-cli tjm workers total 50 JOB.JSON
```

### tjm delete

Delete a job or jobs from a teraslice cluster by referencing the job file. Jobs must be stopped.

```sh
teraslice-cli tjm delete JOB.JSON
teraslice-cli tjm delete JOB1.JSON JOB2.JSON
```

## Jobs

*** Job control commands start, stop, pause, resume, and restart all function with the same syntax.***

- `-all` or `-a` performs action on all the jobs on a given cluster.
- Providing a job_id of `all` will perform the action on all the jobs on a given cluster.
- `--yes` or `-y` answers yes to all prompts

- When jobs are stopped or paused, their state is saved in `~/.teraslice/job_state_files`

Commands:

```bash
teraslice-cli jobs <command> <cluster> [-all|-a]
teraslice-cli jobs <command> <cluster> [job_id | all]
# stop
teraslice-cli jobs stop local 99999999-9999-9999-9999-999999999999
# start
Expand All @@ -318,7 +327,7 @@ teraslice-cli jobs resume local 99999999-9999-9999-9999-999999999999
# restart job
teraslice-cli jobs restart local 99999999-9999-9999-9999-999999999999
# restart all jobs, no prompt
teraslice-cli jobs restart local --all -y
teraslice-cli jobs restart local all -y
```

### jobs await
Expand Down Expand Up @@ -372,6 +381,11 @@ Display jobs registered on the cluster
teraslice-cli jobs list <cluster>
# list jobs
teraslice-cli jobs list local
# list only deleted jobs
teraslice-cli jobs list local --deleted=true
# list only active jobs that have not been deleted
teraslice-cli jobs list local --active=true

```

### jobs view
Expand Down Expand Up @@ -405,6 +419,18 @@ teraslice-cli jobs workers add 5 cluster1 99999999-9999-9999-9999-999999999999
teraslice-cli jobs workers remove 5 cluster1 99999999-9999-9999-9999-999999999999
```

### jobs delete

Delete a job or jobs by job_id from a teraslice cluster. Jobs must be in a terminal state.

```sh
teraslice-cli jobs delete <cluster> <job_id>
# delete a job
teraslice-cli jobs delete cluster1 99999999-9999-9999-9999-999999999999
# delete all stopped jobs on a cluster, no prompt. Active jobs will be skipped.
teraslice-cli jobs delete cluster1 all -y
```
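
The "active jobs will be skipped" behaviour of `delete ... all` can be sketched as a loop that records a failed deletion and moves on instead of aborting. This is not the CLI's actual code: `deleteJobs` and `DeleteAllResult` are hypothetical names, and the `deleteJob` callback stands in for whatever client call performs the deletion.

```typescript
// Stand-in for the client call that deletes one job; rejects if the job cannot be deleted.
type DeleteJobFn = (jobId: string) => Promise<unknown>;

interface DeleteAllResult {
    deleted: string[];
    skipped: string[];
}

// Hypothetical helper: try every job, skipping the ones that cannot be deleted.
async function deleteJobs(jobIds: string[], deleteJob: DeleteJobFn): Promise<DeleteAllResult> {
    const result: DeleteAllResult = { deleted: [], skipped: [] };
    for (const jobId of jobIds) {
        try {
            await deleteJob(jobId);
            result.deleted.push(jobId);
        } catch (err) {
            // e.g. the job is still running; keep going with the remaining jobs.
            result.skipped.push(jobId);
        }
    }
    return result;
}

// Usage with a stub in which job "b" is still running.
const stubDelete: DeleteJobFn = async (jobId) => {
    if (jobId === 'b') throw new Error(`Job ${jobId} is not in a terminal state`);
};

const pendingResult = deleteJobs(['a', 'b', 'c'], stubDelete);
pendingResult.then((result) => console.log(result));
```

Processing jobs one at a time keeps the outcome per job easy to report, at the cost of being slower than issuing the requests in parallel.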

## Executions

### ex errors
Expand Down Expand Up @@ -436,14 +462,16 @@ teraslice-cli jobs status local --status failing

### ex list

Display execution ids on the cluster, default is `running` and `failing`
Display execution ids on the cluster, default is to exclude deleted and show all statuses

```bash
teraslice-cli ex list <cluster>
# list ex_ids
teraslice-cli ex list local
# list failed ex_ids
teraslice-cli ex list local --status failed
teraslice-cli ex list local --status=failed
# list deleted ex_ids
teraslice-cli ex list local --deleted=true
```

## Nodes
Expand Down
4 changes: 2 additions & 2 deletions e2e/package.json
Expand Up @@ -43,9 +43,9 @@
"ms": "^2.1.3"
},
"devDependencies": {
"@terascope/types": "^1.0.1",
"@terascope/types": "^1.1.0",
"bunyan": "^1.8.15",
"elasticsearch-store": "^1.0.4",
"elasticsearch-store": "^1.1.0",
"fs-extra": "^11.2.0",
"ms": "^2.1.3",
"nanoid": "^3.3.4",
Expand Down
68 changes: 68 additions & 0 deletions e2e/test/cases/cluster/api-spec.ts
Expand Up @@ -3,6 +3,7 @@ import { cloneDeep, pDelay } from '@terascope/utils';
import { JobConfig } from '@terascope/types';
import { TerasliceHarness } from '../../teraslice-harness.js';
import { TEST_PLATFORM } from '../../config.js';
import { Ex, Job } from 'teraslice-client-js';

describe('cluster api', () => {
let terasliceHarness: TerasliceHarness;
Expand Down Expand Up @@ -148,4 +149,71 @@ describe('cluster api', () => {
const response = await terasliceHarness.teraslice.cluster.txt('assets/ex1/0.0.1');
expect(response).toBeString();
});

describe('DELETE /jobs/<jobId>', () => {
// NOTE: every test in this section will use a single job

const deletedJobProperties = {
_deleted: true,
_deleted_on: expect.anything(),
active: false
};

let job: Job;
let jobId: string;
let ex: Ex;
let jobSpec: JobConfig;

beforeAll(async () => {
jobSpec = terasliceHarness.newJob('generator');
// Set resource constraints on workers within CI
if (TEST_PLATFORM === 'kubernetes' || TEST_PLATFORM === 'kubernetesV2') {
jobSpec.resources_requests_cpu = 0.05;
}

job = await terasliceHarness.teraslice.jobs.submit(jobSpec, false);
jobId = job.id();
const { ex_id: exId } = await job.execution();
ex = terasliceHarness.teraslice.executions.wrap(exId);
});

it('will not delete a running job', async () => {
await terasliceHarness.waitForExStatus(ex, 'running', 100, 1000);

await expect(terasliceHarness.teraslice.jobs.delete(`/jobs/${jobId}`)).rejects.toThrow();
});

it('will delete a stopped job', async () => {
await terasliceHarness.teraslice.jobs.post(`/jobs/${jobId}/_stop`);
await terasliceHarness.waitForExStatus(ex, 'stopped', 100, 1000);

await expect(terasliceHarness.teraslice.jobs.delete(`/jobs/${jobId}`)).resolves.toMatchObject(deletedJobProperties);
});

it('will not list a deleted job by default', async () => {
const list = await terasliceHarness.teraslice.jobs.list();
const jobIds = list.map((job) => job.job_id);
expect(jobIds).not.toContain(jobId);
});

it('will list a deleted job when passed "{ deleted: true }"', async () => {
const list = await terasliceHarness.teraslice.jobs.list({ deleted: true });
expect(list).toEqual(expect.arrayContaining([expect.objectContaining({ ...jobSpec, job_id: jobId })]));
});

it('will not start a deleted job', async () => {
await expect(terasliceHarness.teraslice.jobs.post(`/jobs/${jobId}/_start`)).rejects.toThrow(`Job ${jobId} has been deleted and cannot be started.`);

});

it('will not update a deleted job', async () => {
await expect(terasliceHarness.teraslice.jobs.put(`/jobs/${jobId}`, { workers: 1 })).rejects.toThrow(`Job ${jobId} has been deleted and cannot be updated.`);

});

it('will not recover a deleted job', async () => {
await expect(terasliceHarness.teraslice.jobs.post(`/jobs/${jobId}/_recover`)).rejects.toThrow(`Job ${jobId} has been deleted and cannot be recovered.`);

});
});
});
2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "teraslice-workspace",
"displayName": "Teraslice",
"version": "2.2.0",
"version": "2.3.0",
"private": true,
"homepage": "https://github.com/terascope/teraslice",
"bugs": {
Expand Down
10 changes: 5 additions & 5 deletions packages/data-mate/package.json
@@ -1,7 +1,7 @@
{
"name": "@terascope/data-mate",
"displayName": "Data-Mate",
"version": "1.0.4",
"version": "1.1.0",
"description": "Library of data validations/transformations",
"homepage": "https://github.com/terascope/teraslice/tree/master/packages/data-mate#readme",
"repository": {
Expand Down Expand Up @@ -30,9 +30,9 @@
"test:watch": "ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/data-types": "^1.0.1",
"@terascope/types": "^1.0.1",
"@terascope/utils": "^1.0.1",
"@terascope/data-types": "^1.1.0",
"@terascope/types": "^1.1.0",
"@terascope/utils": "^1.1.0",
"@types/validator": "^13.11.10",
"awesome-phonenumber": "^2.70.0",
"date-fns": "^2.30.0",
Expand All @@ -47,7 +47,7 @@
"uuid": "^9.0.1",
"valid-url": "^1.0.9",
"validator": "^13.12.0",
"xlucene-parser": "^1.0.3"
"xlucene-parser": "^1.1.0"
},
"devDependencies": {
"@types/ip6addr": "^0.2.6",
Expand Down
6 changes: 3 additions & 3 deletions packages/data-types/package.json
@@ -1,7 +1,7 @@
{
"name": "@terascope/data-types",
"displayName": "Data Types",
"version": "1.0.1",
"version": "1.1.0",
"description": "A library for defining the data structures and mapping",
"homepage": "https://github.com/terascope/teraslice/tree/master/packages/data-types#readme",
"bugs": {
Expand All @@ -27,8 +27,8 @@
"test:watch": "ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/types": "^1.0.1",
"@terascope/utils": "^1.0.1",
"@terascope/types": "^1.1.0",
"@terascope/utils": "^1.1.0",
"graphql": "^14.7.0",
"lodash": "^4.17.21",
"yargs": "^17.7.2"
Expand Down
8 changes: 4 additions & 4 deletions packages/elasticsearch-api/package.json
@@ -1,7 +1,7 @@
{
"name": "@terascope/elasticsearch-api",
"displayName": "Elasticsearch API",
"version": "4.0.1",
"version": "4.1.0",
"description": "Elasticsearch client api used across multiple services, handles retries and exponential backoff",
"homepage": "https://github.com/terascope/teraslice/tree/master/packages/elasticsearch-api#readme",
"bugs": {
Expand All @@ -24,16 +24,16 @@
"test:watch": "TEST_RESTRAINED_ELASTICSEARCH='true' ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/types": "^1.0.1",
"@terascope/utils": "^1.0.1",
"@terascope/types": "^1.1.0",
"@terascope/utils": "^1.1.0",
"bluebird": "^3.7.2",
"setimmediate": "^1.0.5"
},
"devDependencies": {
"@opensearch-project/opensearch": "^1.2.0",
"@types/elasticsearch": "^5.0.43",
"elasticsearch": "^15.4.1",
"elasticsearch-store": "^1.0.4",
"elasticsearch-store": "^1.1.0",
"elasticsearch6": "npm:@elastic/elasticsearch@^6.7.0",
"elasticsearch7": "npm:@elastic/elasticsearch@^7.0.0",
"elasticsearch8": "npm:@elastic/elasticsearch@^8.0.0"
Expand Down