
Improved docs #157

Merged
merged 4 commits into from
Jun 11, 2024
461 changes: 139 additions & 322 deletions README.md

Large diffs are not rendered by default.

Binary file added docs/assets/work_dark.png
Binary file added docs/assets/work_light.png
6 changes: 3 additions & 3 deletions docs/concepts/map.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,21 +12,21 @@ to run on every hyperparameter.
success([Success]):::green

subgraph one[Parameter 1]
process_chunk1([Process Chunk]):::yellow
process_chunk1([Train model]):::yellow
success_chunk1([Success]):::yellow

process_chunk1 --> success_chunk1
end

subgraph two[Parameter ...]
process_chunk2([Process Chunk]):::yellow
process_chunk2([Train model]):::yellow
success_chunk2([Success]):::yellow

process_chunk2 --> success_chunk2
end

subgraph three[Parameter n]
process_chunk3([Process Chunk]):::yellow
process_chunk3([Train model]):::yellow
success_chunk3([Success]):::yellow

process_chunk3 --> success_chunk3
Expand Down
51 changes: 2 additions & 49 deletions docs/configurations/overview.md
@@ -1,49 +1,2 @@
**runnable** is designed to enable effective collaboration between data scientists/researchers
and infrastructure engineers.

All the features described in the [concepts](../concepts/the-big-picture.md) are
aimed at the *research* side of data science projects, while configurations add *scaling* features to them.


Configurations are provided at execution time:

For ```yaml```-based pipelines, use the ```--config-file, -c``` option in the [runnable CLI](../usage.md/#usage).

For the [python SDK](../sdk.md/#runnable.Pipeline.execute), use the ```configuration_file``` option or the
environment variable ```runnable_CONFIGURATION_FILE```.

## Default configuration

```yaml
--8<-- "examples/configs/default.yaml"
```

1. Execute the pipeline in the local compute environment.
2. The run log is not persisted but present in-memory and flushed at the end of execution.
3. No catalog functionality, all catalog operations are effectively no-op.
4. No secrets functionality, all secrets are effectively no-op.
5. No experiment tracking tools; all interactions with experiment tracking tools are effectively no-op.
The run log still captures the metrics, but they are not passed to the experiment tracking tools.

By default, all pipeline executions run on the
[local compute](executors/local.md), using a
[buffered run log](run-log.md/#buffered) store with
[no catalog](catalog.md/#do-nothing),
[secrets](secrets.md/#do-nothing), or
[experiment tracking functionality](experiment-tracking.md/).



## Format

The configuration file is in yaml format and the typical structure is:

```yaml
service:
  type: service provider
  config:
    ...
```

where service is one of ```executor```, ```catalog```, ```experiment_tracker```,
```secrets``` or ```run_log_store```.
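For illustration, a configuration following this structure might look like the sketch below. The ```local``` executor and ```buffered``` run log store mirror the defaults described above; this is a shape sketch, and the exact ```type``` values accepted by each service should be taken from that service's documentation.

```yaml
# Sketch only: run on local compute with an in-memory run log.
executor:
  type: local

run_log_store:
  type: buffered
```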
**runnable** is designed to enable pipeline execution in varied computational environments without changing the
infrastructure patterns.
32 changes: 31 additions & 1 deletion docs/index.md
Expand Up @@ -79,11 +79,41 @@ The difference between native driver and runnable orchestration:
- [x] The pipeline is `runnable` in any environment.


## But why runnable?
## Why runnable?

There are, of course, many orchestration tools. A well-maintained and curated [list is
available here](https://github.com/EthicalML/awesome-production-machine-learning/).

Broadly, they can be classified as ```native``` or ```meta``` orchestrators.

<figure markdown>
![Image title](assets/work_light.png#only-light){ width="600" height="300"}
![Image title](assets/work_dark.png#only-dark){ width="600" height="300"}
</figure>


### __native orchestrators__

- Focus on resource management, job scheduling, robustness, and scalability.
- Offer fewer features for domain (data engineering, data science) activities.
- Are difficult to run locally.
- Are not ideal for quick experimentation or research activities.

### __meta orchestrators__

- An abstraction over native orchestrators.
- Oriented towards domain (data engineering, data science) features.
- Easy to get started and run locally.
- Ideal for quick experimentation or research activities.

```runnable``` is a _meta_ orchestrator with a simple API, geared towards data engineering and data science activities.
It works in conjunction with _native_ orchestrators and is an alternative to [kedro](https://docs.kedro.org/en/stable/index.html)
or [metaflow](https://metaflow.org/).





```runnable``` stands out based on these design principles.

<div class="grid cards" markdown>
Expand Down
72 changes: 41 additions & 31 deletions docs/reference.md
@@ -1,3 +1,8 @@
Please read this reference alongside the ```examples``` from
[the repo](https://github.com/AstraZeneca/runnable-core).



## PythonTask

=== "sdk"
Expand Down Expand Up @@ -75,18 +80,40 @@
<hr style="border:2px dotted orange">


## Catalog
## ShellTask

=== "sdk"

    ::: runnable.Catalog
    ::: runnable.ShellTask
        options:
          show_root_heading: true
          show_bases: false
          show_docstring_description: true
          heading_level: 3

=== "yaml"

Attributes:

- ```name```: the name of the task
- ```command```: the shell command to execute.
- ```next```: the next node to call if the task succeeds. Use ```success``` to terminate
the pipeline successfully or ```fail``` to terminate with failure.
- ```on_failure```: The next node in case of failure.
- ```catalog```: mapping of cataloging items
- ```overrides```: mapping of step overrides from global configuration.

```yaml
dag:
  steps:
    name: <>
    type: task
    command: <>
    next: <>
    on_failure: <>
    catalog: # Any cataloging to be done.
    overrides: # mapping of overrides of global configuration
```
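For illustration, a filled-in shell task using the attributes above might look like the sketch below. The step name and command are made up, and the nesting under ```steps``` follows the template above; treat this as a shape sketch rather than a verified schema.

```yaml
# Sketch only: a shell step that lists files, then terminates successfully.
dag:
  steps:
    name: list_files      # hypothetical step name
    type: task
    command: ls -lrt      # shell command to execute
    next: success
```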


<hr style="border:2px dotted orange">
Expand All @@ -108,16 +135,14 @@
<hr style="border:2px dotted orange">



## ShellTask
## Catalog

=== "sdk"

    ::: runnable.ShellTask
    ::: runnable.Catalog
        options:
          show_root_heading: true
          show_bases: false
          show_docstring_description: true
          heading_level: 3

=== "yaml"
Expand All @@ -128,30 +153,29 @@




## Parallel

## Pipeline

=== "sdk"

    ::: runnable.Parallel
    ::: runnable.Pipeline
        options:
          show_root_heading: true
          show_bases: false
          show_docstring_description: true
          heading_level: 3
          members:
            - execute

=== "yaml"



<hr style="border:2px dotted orange">
## Parallel

## Map

=== "sdk"

    ::: runnable.Map
    ::: runnable.Parallel
        options:
          show_root_heading: true
          show_bases: false
Expand All @@ -160,35 +184,21 @@

=== "yaml"

<hr style="border:2px dotted orange">



::: runnable.Success
    options:
      show_root_heading: true
      show_bases: false
      show_docstring_description: true

<hr style="border:2px dotted orange">

::: runnable.Fail
    options:
      show_root_heading: true
      show_bases: false
      show_docstring_description: true

<hr style="border:2px dotted orange">

## Pipeline
## Map

=== "sdk"

    ::: runnable.Pipeline
    ::: runnable.Map
        options:
          show_root_heading: true
          show_bases: false
          show_docstring_description: true
          heading_level: 3

=== "yaml"

<hr style="border:2px dotted orange">
5 changes: 1 addition & 4 deletions examples/01-tasks/stub.py
Expand Up @@ -29,10 +29,7 @@ def main():

step3 = Stub(name="step3", terminate_with_success=True)

pipeline = Pipeline(
steps=[step1, step2, step3],
add_terminal_nodes=True,
)
pipeline = Pipeline(steps=[step1, step2, step3])

pipeline.execute()

Expand Down
2 changes: 1 addition & 1 deletion examples/02-sequential/on_failure_fail.py
Expand Up @@ -31,7 +31,7 @@ def main():
step_1.on_failure = step_4.name

pipeline = Pipeline(
steps=[step_1, step_2, step_3, [step_4]],
steps=[step_1, step_2, step_3],
)
pipeline.execute()

Expand Down
1 change: 0 additions & 1 deletion examples/03-parameters/static_parameters_python.py
Expand Up @@ -64,7 +64,6 @@ def read_initial_params_as_json(

pipeline = Pipeline(
steps=[read_params_as_pydantic, read_params_as_json],
add_terminal_nodes=True,
)

_ = pipeline.execute(parameters_file="examples/common/initial_parameters.yaml")
Expand Down
1 change: 0 additions & 1 deletion examples/07-map/custom_reducer.py
Expand Up @@ -85,7 +85,6 @@ def iterable_branch(execute: bool = True):

pipeline = Pipeline(
steps=[process_chunk_task_python, process_chunk_task_notebook, process_chunk_task_shell, read_chunk],
add_terminal_nodes=True,
)

if execute:
Expand Down
1 change: 0 additions & 1 deletion examples/07-map/map.py
Expand Up @@ -88,7 +88,6 @@ def iterable_branch(execute: bool = True):

pipeline = Pipeline(
steps=[process_chunk_task_python, process_chunk_task_notebook, process_chunk_task_shell, read_chunk],
add_terminal_nodes=True,
)

if execute:
Expand Down
1 change: 0 additions & 1 deletion examples/comparisions/README.md

This file was deleted.
