diff --git a/CHANGELOG/CHANGELOG-v0.17.0.md b/CHANGELOG/CHANGELOG-v0.17.0.md index 2304834992..68c83e8479 100644 --- a/CHANGELOG/CHANGELOG-v0.17.0.md +++ b/CHANGELOG/CHANGELOG-v0.17.0.md @@ -10,7 +10,7 @@ 1. Great Expectations Integration ([docs](https://docs.flyte.org/en/latest/flytesnacks/examples/greatexpectations_plugin/index.html)). 1. Access to durable blob stores (AWS/GCS/etc) are now pluggable. 1. Local task execution has been updated to also trigger the type engine. -1. Tasks that have `cache=True` should now be cached when running locally as well ([docs](https://docs.flyte.org/en/latest/flytesnacks/examples/development_lifecycle/task_cache.html#how-does-local-caching-work)). +1. Tasks that have `cache=True` should now be cached when running locally as well ([docs](https://docs.flyte.org/en/latest/user_guide/development_lifecycle/caching.html#how-does-local-caching-work)). Please see the [flytekit release](https://github.com/flyteorg/flytekit/releases/tag/v0.22.0) for the full list and more details. diff --git a/CHANGELOG/CHANGELOG-v0.5.0.md b/CHANGELOG/CHANGELOG-v0.5.0.md index 20382f7050..87a4831f7f 100644 --- a/CHANGELOG/CHANGELOG-v0.5.0.md +++ b/CHANGELOG/CHANGELOG-v0.5.0.md @@ -6,7 +6,7 @@ - Enable CI system to run on forks. ## Core Platform -- [Single Task Execution](https://docs.flyte.org/en/latest/flytesnacks/examples/development_lifecycle/remote_task.html) to enable registering and launching tasks outside the scope of a workflow to enable faster iteration and a more intuitive development workflow. +- [Single Task Execution](https://docs.flyte.org/en/latest/user_guide/development_lifecycle/running_tasks.html) to enable registering and launching tasks outside the scope of a workflow to enable faster iteration and a more intuitive development workflow. 
- [Run to completion](https://docs.flyte.org/en/latest/protos/docs/core/core.html#ref-flyteidl-core-workflowmetadata-onfailurepolicy) to enable workflows to continue executing even if one or more branches fail. - Fixed retries for dynamically yielded nodes. - PreAlpha Support for Raw container with FlyteCoPilot. (docs coming soon). [Sample Notebooks](https://github.com/lyft/flytekit/blob/master/sample-notebooks/raw-container-shell.ipynb). This makes it possible to run workflows with arbitrary containers diff --git a/CHANGELOG/CHANGELOG-v1.1.0.md b/CHANGELOG/CHANGELOG-v1.1.0.md index ebcee3739a..9236270965 100644 --- a/CHANGELOG/CHANGELOG-v1.1.0.md +++ b/CHANGELOG/CHANGELOG-v1.1.0.md @@ -4,7 +4,7 @@ ### User Improvements Support for [Optional types](https://github.com/flyteorg/flyte/issues/2426). With the inclusion of Union types in flytekit, we can now support optional types. -[Flyte Deck](https://github.com/flyteorg/flyte/issues/2175) is now available. Please take a look at the [documentation](https://docs.flyte.org/en/latest/flytesnacks/examples/development_lifecycle/decks.html) and also the [OSS presentation](https://www.youtube.com/watch?v=KqyBYIaAZ7c) that was done a few weeks back. +[Flyte Deck](https://github.com/flyteorg/flyte/issues/2175) is now available. Please take a look at the [documentation](https://docs.flyte.org/en/latest/user_guide/development_lifecycle/decks.html) and also the [OSS presentation](https://www.youtube.com/watch?v=KqyBYIaAZ7c) that was done a few weeks back. 
### Backend Improvements diff --git a/CHANGELOG/CHANGELOG-v1.10.0.md b/CHANGELOG/CHANGELOG-v1.10.0.md index 48d298ccf7..7791a6fd20 100644 --- a/CHANGELOG/CHANGELOG-v1.10.0.md +++ b/CHANGELOG/CHANGELOG-v1.10.0.md @@ -8,7 +8,7 @@ Programmatically consuming inputs and outputs using flyteremote became a lot eas ![Usage snippet](./images/v1.10.0-flyteconsole-programmatic-access.png) -You'll now be able to use offloaded types in [eager workflows](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/eager_workflows.html). +You'll now be able to use offloaded types in [eager workflows](https://docs.flyte.org/en/latest/user_guide/advanced_composition/eager_workflows.html). More ergonomic improvements to [pyflyte](https://docs.flyte.org/en/latest/api/flytekit/pyflyte.html), including the inclusion of a progress bar, the ability to activate launchplans, and the ability to interact with gate nodes in local executions. diff --git a/CHANGELOG/CHANGELOG-v1.2.0.md b/CHANGELOG/CHANGELOG-v1.2.0.md index 00a3d8c735..d83bfa4f28 100644 --- a/CHANGELOG/CHANGELOG-v1.2.0.md +++ b/CHANGELOG/CHANGELOG-v1.2.0.md @@ -18,7 +18,7 @@ - dbt plugin (https://github.com/flyteorg/flyte/issues/2202) - cache overriding behavior is now open to all types (https://github.com/flyteorg/flyte/issues/2912) - Bug: Fallback to pickling in the case of unknown types used Unions (https://github.com/flyteorg/flyte/issues/2823) -- [pyflyte run](https://docs.flyte.org/en/latest/api/flytekit/design/clis.html#pyflyte-run) now supports [imperative workflows](https://docs.flyte.org/en/latest/flytesnacks/examples/basics/imperative_workflow.html) +- [pyflyte run](https://docs.flyte.org/en/latest/api/flytekit/design/clis.html#pyflyte-run) now supports [imperative workflows](https://docs.flyte.org/en/latest/user_guide/basics/imperative_workflows.html) - Newlines are now stripped from client secrets (https://github.com/flyteorg/flytekit/pull/1163) - Ensure repeatability in the generation of cache keys in 
the case of dictionaries (https://github.com/flyteorg/flytekit/pull/1126) - Support for multiple images in the yaml config file (https://github.com/flyteorg/flytekit/pull/1106) diff --git a/CHANGELOG/CHANGELOG-v1.5.0.md b/CHANGELOG/CHANGELOG-v1.5.0.md index a711e38835..1cd809c867 100644 --- a/CHANGELOG/CHANGELOG-v1.5.0.md +++ b/CHANGELOG/CHANGELOG-v1.5.0.md @@ -63,7 +63,7 @@ def wf(a: int) -> str: Notice how calls to `t1_fixed_b` do not need to specify the `b` parameter. -This also works for [Map Tasks](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/map_task.html) in a limited capacity. For example: +This also works for [Map Tasks](https://docs.flyte.org/en/latest/user_guide/advanced_composition/map_tasks.html) in a limited capacity. For example: ``` from flytekit import task, workflow, partial, map_task @@ -107,5 +107,5 @@ Map tasks do not support partial tasks with lists as inputs. ## Flyteconsole -Multiple bug fixes around [waiting for external inputs](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/waiting_for_external_inputs.html#waiting-for-external-inputs). +Multiple bug fixes around [waiting for external inputs](https://docs.flyte.org/en/latest/user_guide/advanced_composition/waiting_for_external_inputs.html). Better support for dataclasses in the launch form. diff --git a/CHANGELOG/CHANGELOG-v1.9.0.md b/CHANGELOG/CHANGELOG-v1.9.0.md index 90371e5c11..dd7a8f93a3 100644 --- a/CHANGELOG/CHANGELOG-v1.9.0.md +++ b/CHANGELOG/CHANGELOG-v1.9.0.md @@ -1,11 +1,11 @@ # Flyte v1.9.0 Release -In this release we're announcing two experimental features, namely (1) ArrayNode map tasks, and (2) Execution Tags. +In this release we're announcing two experimental features, namely (1) ArrayNode map tasks, and (2) Execution Tags. 
### ArrayNode map tasks -ArrayNodes are described more fully in [RFC 3346](https://github.com/flyteorg/flyte/blob/master/rfc/system/3346-array-node.md), but the summary is that ArrayNode map tasks are a drop-in replacement for [regular map tasks](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/map_task.html), the only difference being the submodule used to import the `map_task` function. +ArrayNodes are described more fully in [RFC 3346](https://github.com/flyteorg/flyte/blob/master/rfc/system/3346-array-node.md), but the summary is that ArrayNode map tasks are a drop-in replacement for [regular map tasks](https://docs.flyte.org/en/latest/user_guide/advanced_composition/map_tasks.html), the only difference being the submodule used to import the `map_task` function. More explicitly, let's say you have this code: ```python @@ -15,7 +15,7 @@ from flytekit import map_task, task, workflow @task def t(a: int) -> int: ... - + @workflow def wf(xs: List[int]) -> List[int]: return map_task(t)(a=xs) @@ -31,7 +31,7 @@ from flytekit.experimental import map_task @task def t(a: int) -> int: ... - + @workflow def wf(xs: List[int]) -> List[int]: return map_task(t)(a=xs) @@ -119,7 +119,7 @@ As mentioned before, this feature is shipped in an experimental capacity, the id * chore: remove release git step by @FrankFlitton in https://github.com/flyteorg/flyteconsole/pull/811 * fix: union value handling in launch form by @ursucarina in https://github.com/flyteorg/flyteconsole/pull/812 -## New Contributors +## New Contributors * @Nan2018 made their first contribution in https://github.com/flyteorg/flytekit/pull/1751 * @oliverhu made their first contribution in https://github.com/flyteorg/flytekit/pull/1727 * @DavidMertz made their first contribution in https://github.com/flyteorg/flytekit/pull/1761 diff --git a/README.md b/README.md index 43c9e72a82..6049f262ef 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@

- :building_construction: :rocket: :chart_with_upwards_trend: + :building_construction: :rocket: :chart_with_upwards_trend:

@@ -24,7 +24,7 @@ OpenSSF Best Practices label Flyte Helm Chart label - + Flyte Slack label @@ -36,7 +36,7 @@ Flyte is an open-source orchestrator that facilitates building production-grade Build

-Write code in Python or any other language and leverage a robust type engine. +Write code in Python or any other language and leverage a robust type engine.

Getting started with Flyte @@ -48,7 +48,7 @@ Write code in Python or any other language and leverage a robust type engine. Either locally or on a remote cluster, execute your models with ease.

Getting started with Flyte - +

Get Started @@ -107,24 +107,24 @@ Go to the [Deployment guide](https://docs.flyte.org/en/latest/deployment/deploym 🌐 **Any language**: Write code in any language using raw containers, or choose [Python](https://github.com/flyteorg/flytekit), [Java](https://github.com/flyteorg/flytekit-java), [Scala](https://github.com/flyteorg/flytekit-java) or [JavaScript](https://github.com/NotMatthewGriffin/pterodactyl) SDKs to develop your Flyte workflows.
🔒 **Immutability**: Immutable executions help ensure reproducibility by preventing any changes to the state of an execution.
🧬 **Data lineage**: Track the movement and transformation of data throughout the lifecycle of your data and ML workflows.
-📊 **Map tasks**: Achieve parallel code execution with minimal configuration using [map tasks](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/map_task.html).
+📊 **Map tasks**: Achieve parallel code execution with minimal configuration using [map tasks](https://docs.flyte.org/en/latest/user_guide/advanced_composition/map_tasks.html).
🌎 **Multi-tenancy**: Multiple users can share the same platform while maintaining their own distinct data and configurations.
-🌟 **Dynamic workflows**: [Build flexible and adaptable workflows](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/dynamic_workflow.html) that can change and evolve as needed, making it easier to respond to changing requirements.
-⏯️ [Wait](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/waiting_for_external_inputs.html) for **external inputs** before proceeding with the execution.
-🌳 **Branching**: [Selectively execute branches](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/conditional.html) of your workflow based on static or dynamic data produced by other tasks or input data.
+🌟 **Dynamic workflows**: [Build flexible and adaptable workflows](https://docs.flyte.org/en/latest/user_guide/advanced_composition/dynamic_workflows.html) that can change and evolve as needed, making it easier to respond to changing requirements.
+⏯️ [Wait](https://docs.flyte.org/en/latest/user_guide/advanced_composition/waiting_for_external_inputs.html) for **external inputs** before proceeding with the execution.
+🌳 **Branching**: [Selectively execute branches](https://docs.flyte.org/en/latest/user_guide/advanced_composition/conditionals.html) of your workflow based on static or dynamic data produced by other tasks or input data.
📈 **Data visualization**: Visualize data, monitor models and view training history through plots.
-📂 **FlyteFile & FlyteDirectory**: Transfer [files](https://docs.flyte.org/en/latest/flytesnacks/examples/data_types_and_io/file.html#file) and [directories](https://docs.flyte.org/en/latest/flytesnacks/examples/data_types_and_io/folder.html) between local and cloud storage.
-🗃️ **Structured dataset**: Convert dataframes between types and enforce column-level type checking using the abstract 2D representation provided by [Structured Dataset](https://docs.flyte.org/en/latest/flytesnacks/examples/data_types_and_io/structured_dataset.html).
+📂 **FlyteFile & FlyteDirectory**: Transfer [files](https://docs.flyte.org/en/latest/user_guide/data_types_and_io/flytefile.html) and [directories](https://docs.flyte.org/en/latest/user_guide/data_types_and_io/flytedirectory.html) between local and cloud storage.
+🗃️ **Structured dataset**: Convert dataframes between types and enforce column-level type checking using the abstract 2D representation provided by [Structured Dataset](https://docs.flyte.org/en/latest/user_guide/data_types_and_io/structureddataset.html).
🛡️ **Recover from failures**: Recover only the failed tasks.
🔁 **Rerun a single task**: Rerun workflows at the most granular level without modifying the previous state of a data/ML workflow.
🔍 **Cache outputs**: Cache task outputs by passing `cache=True` to the task decorator.
-🚩 **Intra-task checkpointing**: [Checkpoint progress](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/checkpoint.html) within a task execution.
+🚩 **Intra-task checkpointing**: [Checkpoint progress](https://docs.flyte.org/en/latest/user_guide/advanced_composition/intratask_checkpoints.html) within a task execution.
⏰ **Timeout**: Define a timeout period, after which the task is marked as failure.
🏭 **Dev to prod**: As simple as changing your [domain](https://docs.flyte.org/en/latest/concepts/domains.html) from development or staging to production.
💸 **Spot or preemptible instances**: Schedule your workflows on spot instances by setting `interruptible` to `True` in the task decorator.
☁️ **Cloud-native deployment**: Deploy Flyte on AWS, GCP, Azure and other cloud services.
-📅 **Scheduling**: [Schedule](https://docs.flyte.org/en/latest/flytesnacks/examples/productionizing/lp_schedules.html) your data and ML workflows to run at a specific time.
-📢 **Notifications**: Stay informed about changes to your workflow's state by configuring [notifications](https://docs.flyte.org/en/latest/flytesnacks/examples/productionizing/lp_notifications.html) through Slack, PagerDuty or email.
+📅 **Scheduling**: [Schedule](https://docs.flyte.org/en/latest/user_guide/productionizing/schedules.html) your data and ML workflows to run at a specific time.
+📢 **Notifications**: Stay informed about changes to your workflow's state by configuring [notifications](https://docs.flyte.org/en/latest/user_guide/productionizing/notifications.html) through Slack, PagerDuty or email.
⌛️ **Timeline view**: Evaluate the duration of each of your Flyte tasks and identify potential bottlenecks.
💨 **GPU acceleration**: Enable and control your tasks’ GPU demands by requesting resources in the task decorator.
🐳 **Dependency isolation via containers**: Maintain separate sets of dependencies for your tasks so no dependency conflicts arise.
diff --git a/docs/community/contribute.rst b/docs/community/contribute.rst index 12cbf38b01..e866be5a2c 100644 --- a/docs/community/contribute.rst +++ b/docs/community/contribute.rst @@ -282,7 +282,7 @@ The resulting ``html`` files will be in ``docs/_build/html``. You can view them * - **Purpose**: Examples, Tips, and Tricks to use Flytekit SDKs * - **Language**: Python (In the future, Java examples will be added) * - **Guidelines**: Refer to the `Flytesnacks Contribution Guide `__ - + ``flytectl`` ************ @@ -291,7 +291,7 @@ The resulting ``html`` files will be in ``docs/_build/html``. You can view them * - `Repo `__ * - **Purpose**: A standalone Flyte CLI * - **Language**: Go - * - **Guidelines**: Refer to the `FlyteCTL Contribution Guide `__ + * - **Guidelines**: Refer to the `FlyteCTL Contribution Guide `__ 🔮 Development Environment Setup Guide @@ -677,7 +677,7 @@ You can access it via http://localhost:30080/console. Core Flyte components, such as admin, propeller, and datacatalog, as well as user runtime containers rely on an object store (in this case, minio) to hold files. -During development, you might need to examine files such as `input.pb/output.pb `__, or `deck.html `__ stored in minio. +During development, you might need to examine files such as `input.pb/output.pb `__, or `deck.html `__ stored in minio. Access the minio console at: http://localhost:30080/minio/login. The default credentials are: diff --git a/docs/concepts/tasks.rst b/docs/concepts/tasks.rst index f3ae87709e..94807d3632 100644 --- a/docs/concepts/tasks.rst +++ b/docs/concepts/tasks.rst @@ -30,7 +30,7 @@ When deciding if a unit of execution constitutes a Flyte task, consider these qu - Is there a well-defined graceful/successful exit criteria for the task? A task is expected to exit after completion of input processing. - Is it repeatable? Under certain circumstances, a task might be retried, rerun, etc. with the same inputs.
It is expected - to produce the same output every single time. For example, avoid using random number generators with current clock as seed. Use a system-provided clock as the seed instead. + to produce the same output every single time. For example, avoid using random number generators with current clock as seed. Use a system-provided clock as the seed instead. - Is it a pure function, i.e., does it have side effects that are unknown to the system (calls a web-service)? It is recommended to avoid side-effects in tasks. When side-effects are evident, ensure that the operations are idempotent. Dynamic Tasks @@ -38,7 +38,7 @@ Dynamic Tasks "Dynamic tasks" is a misnomer. Flyte is one-of-a-kind workflow engine that ships with the concept of truly `Dynamic Workflows `__! -Users can generate workflows in reaction to user inputs or computed values at runtime. +Users can generate workflows in reaction to user inputs or computed values at runtime. These executions are evaluated to generate a static graph before execution. Extending Task @@ -47,9 +47,9 @@ Extending Task Plugins ^^^^^^^ -Flyte exposes an extensible model to express tasks in an execution-independent language. -It contains first-class task plugins (for example: `Papermill `__, -`Great Expectations `__, and :ref:`more `.) +Flyte exposes an extensible model to express tasks in an execution-independent language. +It contains first-class task plugins (for example: `Papermill `__, +`Great Expectations `__, and :ref:`more `.) that execute the Flyte tasks. Almost any action can be implemented and introduced into Flyte as a "Plugin", which includes: @@ -58,7 +58,7 @@ Almost any action can be implemented and introduced into Flyte as a "Plugin", wh - Tasks that call web services. Flyte ships with certain defaults, for example, running a simple Python function does not need any hosted service. Flyte knows how to -execute these kinds of tasks on Kubernetes. 
It turns out these are the vast majority of tasks in machine learning, and Flyte is adept at +execute these kinds of tasks on Kubernetes. It turns out these are the vast majority of tasks in machine learning, and Flyte is adept at handling an enormous scale on Kubernetes. This is achieved by implementing a unique scheduler on Kubernetes. Types @@ -74,14 +74,14 @@ Inherent Features Fault tolerance ^^^^^^^^^^^^^^^ -In any distributed system, failure is inevitable. Allowing users to design a fault-tolerant system (e.g. workflow) is an inherent goal of Flyte. +In any distributed system, failure is inevitable. Allowing users to design a fault-tolerant system (e.g. workflow) is an inherent goal of Flyte. At a high level, tasks offer two parameters to achieve fault tolerance: **Retries** - -Tasks can define a retry strategy to let the system know how to handle failures (For example: retry 3 times on any kind of error). -There are two kinds of retries: +Tasks can define a retry strategy to let the system know how to handle failures (For example: retry 3 times on any kind of error). + +There are two kinds of retries: 1. System retry: It is a system-defined, recoverable failure that is used when system failures occur. The number of retries is validated against the number of system retries. @@ -91,7 +91,7 @@ System retry can be of two types: - **Downstream System Retry**: When a downstream system (or service) fails, or remote service is not contactable, the failure is retried against the number of retries set `here `__. This performs end-to-end system retry against the node whenever the task fails with a system error. This is useful when the downstream service throws a 500 error, abrupt network failure, etc. -- **Transient Failure Retry**: This retry mechanism offers resiliency against transient failures, which are opaque to the user. It is tracked across the entire duration of execution. 
It helps Flyte entities and the additional services connected to Flyte like S3, to continue operating despite a system failure. Indeed, all transient failures are handled gracefully by Flyte! Moreover, in case of a transient failure retry, Flyte does not necessarily retry the entire task. “Retrying an entire task” means that the entire pod associated with the Flyte task would be rerun with a clean slate; instead, it just retries the atomic operation. For example, Flyte tries to persist the state until it can, exhausts the max retries, and backs off. +- **Transient Failure Retry**: This retry mechanism offers resiliency against transient failures, which are opaque to the user. It is tracked across the entire duration of execution. It helps Flyte entities and the additional services connected to Flyte like S3, to continue operating despite a system failure. Indeed, all transient failures are handled gracefully by Flyte! Moreover, in case of a transient failure retry, Flyte does not necessarily retry the entire task. “Retrying an entire task” means that the entire pod associated with the Flyte task would be rerun with a clean slate; instead, it just retries the atomic operation. For example, Flyte tries to persist the state until it can, exhausts the max retries, and backs off. To set a transient failure retry: @@ -102,17 +102,17 @@ System retry can be of two types: 2. User retry: If a task fails to execute, it is retried for a specific number of times, and this number is set by the user in `TaskMetadata `__. The number of retries must be less than or equal to 10. .. note:: - + Recoverable vs. Non-Recoverable failures: Recoverable failures will be retried and counted against the task's retry count. Non-recoverable failures will just fail, i.e., the task isn’t retried irrespective of user/system retry configurations. All user exceptions are considered non-recoverable unless the exception is a subclass of FlyteRecoverableException. ..
note:: - `RFC 3902 `_ implements an alternative, simplified retry behaviour with which both system and user retries are counted towards a single retry budget defined in the task decorator (thus, without a second retry budget defined in the platform configuration). The last retries are always performed on non-spot instances to guarantee completion. To activate this behaviour, set ``configmap.core.propeller.node-config.ignore-retry-cause`` to ``true`` in the helm values. + `RFC 3902 `_ implements an alternative, simplified retry behavior with which both system and user retries are counted towards a single retry budget defined in the task decorator (thus, without a second retry budget defined in the platform configuration). The last retries are always performed on non-spot instances to guarantee completion. To activate this behaviour, set ``configmap.core.propeller.node-config.ignore-retry-cause`` to ``true`` in the helm values. **Timeouts** - + To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout can be handled in the `TaskMetadata `__. @@ -120,4 +120,4 @@ Caching/Memoization ^^^^^^^^^^^^^^^^^^^ Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example, if you wish to run the same piece of code multiple times, you can reuse the output instead of re-computing it. -For more information on memoization, refer to the :std:doc:`Caching Example `. +For more information on memoization, refer to the :std:doc:`/user_guide/development_lifecycle/caching`. 
diff --git a/docs/conf.py b/docs/conf.py index d9e38e5806..63a1ec9483 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -313,6 +313,7 @@ # These patterns are used to replace values in source files that are imported # from other repos. REPLACE_PATTERNS = { + r"": r"", r"": r"", INTERSPHINX_REFS_PATTERN: INTERSPHINX_REFS_REPLACE, @@ -328,16 +329,16 @@ PROTO_REF_PATTERN: PROTO_REF_REPLACE, r"/protos/docs/service/index": r"/protos/docs/service/service", r"": r"", - r"": r"" } +# r"": r"", + import_projects_config = { "clone_dir": "_projects", "flytekit_api_dir": "_src/flytekit/", "source_regex_mapping": REPLACE_PATTERNS, "list_table_toc": [ - "flytesnacks/userguide", - "flytesnacks/tutorials", + "flytesnacks/tutorials", "flytesnacks/integrations", ], "dev_build": bool(int(os.environ.get("MONODOCS_DEV_BUILD", 1))), @@ -369,6 +370,25 @@ "flytesnacks/_build", "flytesnacks/_tags", "flytesnacks/getting_started", + "flytesnacks/userguide.md", + "flytesnacks/environment_setup.md", + "flytesnacks/index.md", + "examples/advanced_composition", + "examples/basics", + "examples/customizing_dependencies", + "examples/data_types_and_io", + "examples/development_lifecycle", + "examples/extending", + "examples/productionizing", + "examples/testing", + "flytesnacks/examples/advanced_composition", + "flytesnacks/examples/basics", + "flytesnacks/examples/customizing_dependencies", + "flytesnacks/examples/data_types_and_io", + "flytesnacks/examples/development_lifecycle", + "flytesnacks/examples/extending", + "flytesnacks/examples/productionizing", + "flytesnacks/examples/testing", ] ], "local": flytesnacks_local_path is not None, diff --git a/docs/deployment/configuration/customizable_resources.rst b/docs/deployment/configuration/customizable_resources.rst index 2b785d31f6..6fb1318ac6 100644 --- a/docs/deployment/configuration/customizable_resources.rst +++ b/docs/deployment/configuration/customizable_resources.rst @@ -187,7 +187,7 @@ etc. 
in the `Workflow execution config `__: configures the pod identity and auth credentials for task pods at execution time - `raw_output_data_config`: where offloaded user data is stored -- `interruptible`: whether to use [spot instances](https://docs.flyte.org/en/latest/flytesnacks/examples/productionizing/spot_instances.html#using-spot-preemptible-instances) +- `interruptible`: whether to use [spot instances](https://docs.flyte.org/en/latest/user_guide/productionizing/spot_instances.html) - `overwrite_cache`: Allows for all cached values of a workflow and its tasks to be overwritten for a single execution. - `envs`: Custom environment variables to apply for task pods brought up during execution diff --git a/docs/deployment/configuration/notifications.rst b/docs/deployment/configuration/notifications.rst index 386e19a406..2e4a77ac53 100644 --- a/docs/deployment/configuration/notifications.rst +++ b/docs/deployment/configuration/notifications.rst @@ -39,7 +39,7 @@ For example ) -See detailed usage examples in the :std:doc:`User Guide ` +See detailed usage examples in the :std:doc:`/user_guide/productionizing/notifications` Notifications can be combined with schedules to automatically alert you when a scheduled job succeeds or fails. diff --git a/docs/deployment/plugins/aws/batch.rst b/docs/deployment/plugins/aws/batch.rst index f640b7907a..a3cad36d0e 100644 --- a/docs/deployment/plugins/aws/batch.rst +++ b/docs/deployment/plugins/aws/batch.rst @@ -8,7 +8,7 @@ and single tasks running on AWS Batch. .. note:: - For single [non-map] task use, please take note of + For single [non-map] task use, please take note of the additional code when updating the flytepropeller config. AWS Batch simplifies the process for developers, scientists and engineers to run @@ -21,7 +21,7 @@ optimizing AWS Batch job queues for load distribution and priority coordination.
Set up AWS Batch ---------------- -Follow the guide `Running batch jobs +Follow the guide `Running batch jobs at scale for less `__. By the end of this step, your AWS Account should have a configured compute environment @@ -30,7 +30,7 @@ and one or more AWS Batch Job Queues. Modify users' AWS IAM role trust policy document ------------------------------------------------ -Follow the guide `AWS Batch Execution +Follow the guide `AWS Batch Execution IAM role `__. When running workflows in Flyte, users can specify a Kubernetes service account and/or an IAM Role to run as. @@ -40,11 +40,11 @@ to allow elastic container service (ECS) to assume the role. Modify system's AWS IAM role policies ------------------------------------- -Follow the guide `Granting a user permissions to pass a +Follow the guide `Granting a user permissions to pass a role to an AWS service `__. The best practice for granting permissions to Flyte components is by utilizing OIDC, -as described in the +as described in the `OIDC documentation `__. This approach entails assigning an IAM Role to each service account being used. To proceed, identify the IAM Role associated with the flytepropeller's Kubernetes service account, @@ -145,10 +145,10 @@ These configurations reside within FlytePropeller's configMap. Modify the config .. note:: - To register the `map task - `__ on Flyte, + To register the `map task + `__ on Flyte, use the command ``pyflyte register ``. - Launch the execution through the FlyteConsole by selecting the appropriate ``IAM Role`` and entering the full + Launch the execution through the FlyteConsole by selecting the appropriate ``IAM Role`` and entering the full ``AWS Arn`` of an IAM Role configured according to the above guide. Once the task starts executing, you'll find a link for the AWS Array Job in the log links section of the Flyte Console. 
diff --git a/docs/index.md b/docs/index.md index 3a8d38e6ba..cb49256803 100644 --- a/docs/index.md +++ b/docs/index.md @@ -79,7 +79,7 @@ contribute its architecture and design. You can also access the * - {doc}`🔤 Introduction to Flyte ` - Get your first workflow running, learn about the Flyte development lifecycle and core use cases. -* - {doc}`📖 User Guide ` +* - {doc}`📖 User Guide ` - A comprehensive view of Flyte's functionality for data and ML practitioners. * - {doc}`📚 Tutorials ` - End-to-end examples of Flyte for data/feature engineering, machine learning, @@ -147,7 +147,7 @@ Core use cases :name: examples-guides :hidden: -User Guide +User Guide Tutorials Integrations ``` diff --git a/docs/user_guide/advanced_composition/chaining_flyte_entities.md b/docs/user_guide/advanced_composition/chaining_flyte_entities.md new file mode 100644 index 0000000000..f51b45a2d0 --- /dev/null +++ b/docs/user_guide/advanced_composition/chaining_flyte_entities.md @@ -0,0 +1,112 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(chain_flyte_entities)= + +# Chaining Flyte entities + +```{eval-rst} +.. tags:: Basic +``` + +Flytekit offers a mechanism for chaining Flyte entities using the `>>` operator. +This is particularly valuable when chaining tasks and subworkflows without the need for data flow between the entities. + +## Tasks + +Let's establish a sequence where `t1()` occurs after `t0()`, and `t2()` follows `t1()`.
+ +```{code-cell} +from flytekit import task, workflow + + +@task +def t2(): + print("Running t2") + return + + +@task +def t1(): + print("Running t1") + return + + +@task +def t0(): + print("Running t0") + return + + +@workflow +def chain_tasks_wf(): + t2_promise = t2() + t1_promise = t1() + t0_promise = t0() + + t0_promise >> t1_promise + t1_promise >> t2_promise +``` + ++++ {"lines_to_next_cell": 0} + +(chain_subworkflow)= +## Subworkflows + +Just like tasks, you can chain {ref}`subworkflows `. + +```{code-cell} +:lines_to_next_cell: 2 + +@workflow +def sub_workflow_1(): + t1() + + +@workflow +def sub_workflow_0(): + t0() + + +@workflow +def chain_workflows_wf(): + sub_wf1 = sub_workflow_1() + sub_wf0 = sub_workflow_0() + + sub_wf0 >> sub_wf1 +``` + +To run the provided workflows on the Flyte cluster, use the following commands: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/chain_entities.py \ + chain_tasks_wf +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/chain_entities.py \ + chain_workflows_wf +``` + +:::{note} +Chaining tasks and subworkflows is not supported in local environments. +Follow the progress of this issue [here](https://github.com/flyteorg/flyte/issues/4080). 
+::: diff --git a/docs/user_guide/advanced_composition/conditionals.md b/docs/user_guide/advanced_composition/conditionals.md new file mode 100644 index 0000000000..88c447a05c --- /dev/null +++ b/docs/user_guide/advanced_composition/conditionals.md @@ -0,0 +1,323 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(conditional)= + +# Conditionals + +```{eval-rst} +.. tags:: Intermediate +``` + +Flytekit elevates conditions to a first-class construct named `conditional`, providing a powerful mechanism for selectively +executing branches in a workflow. Conditions leverage static or dynamic data generated by tasks or +received as workflow inputs. While conditions are highly performant in their evaluation, +it's important to note that they are restricted to specific binary and logical operators +and are applicable only to primitive values. + +To begin, import the necessary libraries. + +```{code-cell} +import random + +from flytekit import conditional, task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +## Simple branch + +In this example, we introduce two tasks, `calculate_circle_circumference` and +`calculate_circle_area`. The workflow dynamically chooses between these tasks based on whether the input +falls within the fraction range (0-1) or not. 
+ +```{code-cell} +@task +def calculate_circle_circumference(radius: float) -> float: + return 2 * 3.14 * radius # Task to calculate the circumference of a circle + + +@task +def calculate_circle_area(radius: float) -> float: + return 3.14 * radius * radius # Task to calculate the area of a circle + + +@workflow +def shape_properties(radius: float) -> float: + return ( + conditional("shape_properties") + .if_((radius >= 0.1) & (radius < 1.0)) + .then(calculate_circle_circumference(radius=radius)) + .else_() + .then(calculate_circle_area(radius=radius)) + ) + + +if __name__ == "__main__": + radius_small = 0.5 + print(f"Circumference of circle (radius={radius_small}): {shape_properties(radius=radius_small)}") + + radius_large = 3.0 + print(f"Area of circle (radius={radius_large}): {shape_properties(radius=radius_large)}") +``` + ++++ {"lines_to_next_cell": 0} + +## Multiple branches + +We establish an `if` condition with multiple branches, which will result in a failure if none of the conditions is met. +It's important to note that any `conditional` statement in Flyte is expected to be complete, +meaning that all possible branches must be accounted for. + +```{code-cell} +@workflow +def shape_properties_with_multiple_branches(radius: float) -> float: + return ( + conditional("shape_properties_with_multiple_branches") + .if_((radius >= 0.1) & (radius < 1.0)) + .then(calculate_circle_circumference(radius=radius)) + .elif_((radius >= 1.0) & (radius <= 10.0)) + .then(calculate_circle_area(radius=radius)) + .else_() + .fail("The input must be within the range of 0 to 10.") + ) +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +Take note of the usage of bitwise operators (`&`). Due to Python's PEP-335, +the logical `and`, `or` and `not` operators cannot be overloaded. +Flytekit employs bitwise `&` and `|` as equivalents for logical `and` and `or` operators, +a convention also observed in other libraries. 
+::: + +## Consuming the output of a conditional +Here, we write a task that consumes the output returned by a `conditional`. + +```{code-cell} +@workflow +def shape_properties_accept_conditional_output(radius: float) -> float: + result = ( + conditional("shape_properties_accept_conditional_output") + .if_((radius >= 0.1) & (radius < 1.0)) + .then(calculate_circle_circumference(radius=radius)) + .elif_((radius >= 1.0) & (radius <= 10.0)) + .then(calculate_circle_area(radius=radius)) + .else_() + .fail("The input must exist between 0 and 10.") + ) + return calculate_circle_area(radius=result) + + +if __name__ == "__main__": + print(f"Circumference of circle x Area of circle (radius={radius_small}): {shape_properties(radius=5.0)}") +``` + ++++ {"lines_to_next_cell": 0} + +## Using the output of a previous task in a conditional + +You can check if a boolean returned from the previous task is `True`, +but unary operations are not supported directly. Instead, use the `is_true`, +`is_false` and `is_none` methods on the result. + +```{code-cell} +@task +def coin_toss(seed: int) -> bool: + """ + Mimic a condition to verify the successful execution of an operation + """ + r = random.Random(seed) + if r.random() < 0.5: + return True + return False + + +@task +def failed() -> int: + """ + Mimic a task that handles failure + """ + return -1 + + +@task +def success() -> int: + """ + Mimic a task that handles success + """ + return 0 + + +@workflow +def boolean_wf(seed: int = 5) -> int: + result = coin_toss(seed=seed) + return conditional("coin_toss").if_(result.is_true()).then(success()).else_().then(failed()) +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +*How do output values acquire these methods?* In a workflow, direct access to outputs is not permitted. +Inputs and outputs are automatically encapsulated in a special object known as {py:class}`flytekit.extend.Promise`. 
+::: + +## Using boolean workflow inputs in a conditional +You can directly pass a boolean to a workflow. + +```{code-cell} +@workflow +def boolean_input_wf(boolean_input: bool) -> int: + return conditional("boolean_input_conditional").if_(boolean_input.is_true()).then(success()).else_().then(failed()) +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +Observe that the passed boolean possesses a method called `is_true`. +This boolean resides within the workflow context and is encapsulated in a specialized Flytekit object. +This special object enables it to exhibit additional behavior. +::: + +You can run the workflows locally as follows: + +```{code-cell} +if __name__ == "__main__": + print("Running boolean_wf a few times...") + for index in range(0, 5): + print(f"The output generated by boolean_wf = {boolean_wf(seed=index)}") + print( + f"Boolean input: {True if index < 2 else False}; workflow output: {boolean_input_wf(boolean_input=True if index < 2 else False)}" + ) +``` + ++++ {"lines_to_next_cell": 0} + +## Nested conditionals + +You can nest conditional sections arbitrarily inside other conditional sections. +However, these nested sections can only be in the `then` part of a `conditional` block. 
+ +```{code-cell} +@workflow +def nested_conditions(radius: float) -> float: + return ( + conditional("nested_conditions") + .if_((radius >= 0.1) & (radius < 1.0)) + .then( + conditional("inner_nested_conditions") + .if_(radius < 0.5) + .then(calculate_circle_circumference(radius=radius)) + .elif_((radius >= 0.5) & (radius < 0.9)) + .then(calculate_circle_area(radius=radius)) + .else_() + .fail("0.9 is an outlier.") + ) + .elif_((radius >= 1.0) & (radius <= 10.0)) + .then(calculate_circle_area(radius=radius)) + .else_() + .fail("The input must be within the range of 0 to 10.") + ) + + +if __name__ == "__main__": + print(f"nested_conditions(0.4): {nested_conditions(radius=0.4)}") +``` + ++++ {"lines_to_next_cell": 0} + +## Using the output of a task in a conditional + +Let's write a fun workflow that triggers the `calculate_circle_circumference` task in the event of a "heads" outcome, +and alternatively, runs the `calculate_circle_area` task in the event of a "tail" outcome. + +```{code-cell} +@workflow +def consume_task_output(radius: float, seed: int = 5) -> float: + is_heads = coin_toss(seed=seed) + return ( + conditional("double_or_square") + .if_(is_heads.is_true()) + .then(calculate_circle_circumference(radius=radius)) + .else_() + .then(calculate_circle_area(radius=radius)) + ) +``` + ++++ {"lines_to_next_cell": 0} + +You can run the workflow locally as follows: + +```{code-cell} +if __name__ == "__main__": + default_seed_output = consume_task_output(radius=0.4) + print( + f"Executing consume_task_output(0.4) with default seed=5. Expected output: calculate_circle_circumference => {default_seed_output}" + ) + + custom_seed_output = consume_task_output(radius=0.4, seed=7) + print(f"Executing consume_task_output(0.4, seed=7). 
Expected output: calculate_circle_area => {custom_seed_output}") +``` + +## Run the example on the Flyte cluster + +To run the provided workflows on the Flyte cluster, use the following commands: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ + shape_properties --radius 3.0 +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ + shape_properties_with_multiple_branches --radius 11.0 +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ + shape_properties_accept_conditional_output --radius 0.5 +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ + boolean_wf +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ + boolean_input_wf --boolean_input +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ + nested_conditions --radius 0.7 +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ + consume_task_output --radius 0.4 --seed 7 +``` diff --git a/docs/user_guide/advanced_composition/decorating_tasks.md b/docs/user_guide/advanced_composition/decorating_tasks.md new file mode 100644 index 0000000000..50135ee8ab --- /dev/null +++ b/docs/user_guide/advanced_composition/decorating_tasks.md @@ -0,0 +1,152 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: 
python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(decorating_tasks)= + +# Decorating tasks + +```{eval-rst} +.. tags:: Intermediate +``` + +You can easily change how tasks behave by using decorators to wrap your task functions. + +In order to make sure that your decorated function contains all the type annotation and docstring +information that Flyte needs, you will need to use the built-in {py:func}`~functools.wraps` decorator. + +To begin, import the required dependencies. + +```{code-cell} +import logging +from functools import partial, wraps + +from flytekit import task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +Create a logger to monitor the execution's progress. + +```{code-cell} +logger = logging.getLogger(__file__) +``` + ++++ {"lines_to_next_cell": 0} + +## Using a single decorator + +We define a decorator that logs the input and output details for a decorated task. + +```{code-cell} +def log_io(fn): + @wraps(fn) + def wrapper(*args, **kwargs): + logger.info(f"task {fn.__name__} called with args: {args}, kwargs: {kwargs}") + out = fn(*args, **kwargs) + logger.info(f"task {fn.__name__} output: {out}") + return out + + return wrapper +``` + ++++ {"lines_to_next_cell": 0} + +We create a task named `t1` that is decorated with `log_io`. + +:::{note} +The order of invoking the decorators is important. `@task` should always be the outer-most decorator. +::: + +```{code-cell} +@task +@log_io +def t1(x: int) -> int: + return x + 1 +``` + ++++ {"lines_to_next_cell": 0} + +(stacking_decorators)= + +## Stacking multiple decorators + +You can also stack multiple decorators on top of each other as long as `@task` is the outer-most decorator. 
+ +We define a decorator that verifies if the output from the decorated function is a positive number before it's returned. +If this assumption is violated, it raises a `ValueError` exception. + +```{code-cell} +def validate_output(fn=None, *, floor=0): + @wraps(fn) + def wrapper(*args, **kwargs): + out = fn(*args, **kwargs) + if out <= floor: + raise ValueError(f"output of task {fn.__name__} must be a positive number, found {out}") + return out + + if fn is None: + return partial(validate_output, floor=floor) + + return wrapper +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +The `validate_output` decorator uses {py:func}`~functools.partial` to implement parameterized decorators. +::: + +We define a function that uses both the logging and validator decorators. + +```{code-cell} +@task +@log_io +@validate_output(floor=10) +def t2(x: int) -> int: + return x + 10 +``` + ++++ {"lines_to_next_cell": 0} + +Finally, we compose a workflow that calls `t1` and `t2`. + +```{code-cell} +@workflow +def decorating_task_wf(x: int) -> int: + return t2(x=t1(x=x)) + + +if __name__ == "__main__": + print(f"Running decorating_task_wf(x=10) {decorating_task_wf(x=10)}") +``` + +## Run the example on the Flyte cluster + +To run the provided workflow on the Flyte cluster, use the following command: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py \ + decorating_task_wf --x 10 +``` + +In this example, you learned how to modify the behavior of tasks via function decorators using the built-in +{py:func}`~functools.wraps` decorator pattern. To learn more about how to extend Flyte at a deeper level, for +example creating custom types, custom tasks or backend plugins, +see {ref}`Extending Flyte `. 
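As a quick illustration of why `functools.wraps` matters for decorated tasks, here is a minimal plain-Python sketch that does not use Flytekit. The names `log_calls`, `log_calls_unwrapped`, `add_one` and `add_two` are hypothetical and exist only for this comparison. It shows that a wrapper built with `wraps` keeps the original name, docstring and signature, while one built without it exposes only the generic wrapper.

```python
import inspect
from functools import wraps


def log_calls(fn):
    # Hypothetical decorator that preserves metadata via functools.wraps.
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


def log_calls_unwrapped(fn):
    # Same decorator without wraps: the metadata of fn is not copied.
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


@log_calls
def add_one(x: int) -> int:
    """Return x plus one."""
    return x + 1


@log_calls_unwrapped
def add_two(x: int) -> int:
    """Return x plus two."""
    return x + 2


if __name__ == "__main__":
    # With wraps, the typed signature of the original function is visible.
    print(add_one.__name__, inspect.signature(add_one))
    # Without wraps, only the generic wrapper signature is visible.
    print(add_two.__name__, inspect.signature(add_two))
```

Without the typed signature, a tool that inspects annotations has nothing to build an interface from, which is presumably why the decorators in this guide all apply `@wraps(fn)`.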
diff --git a/docs/user_guide/advanced_composition/decorating_workflows.md b/docs/user_guide/advanced_composition/decorating_workflows.md new file mode 100644 index 0000000000..3a369cc433 --- /dev/null +++ b/docs/user_guide/advanced_composition/decorating_workflows.md @@ -0,0 +1,180 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(decorating_workflows)= + +# Decorating workflows + +```{eval-rst} +.. tags:: Intermediate +``` + +The behavior of workflows can be modified in a light-weight fashion by using the built-in {py:func}`~functools.wraps` +decorator pattern, similar to using decorators to +{ref}`customize task behavior `. However, unlike in the case of +tasks, we need to do a little extra work to make sure that the DAG underlying the workflow executes tasks in the +correct order. + +## Setup-teardown pattern + +The main use case of decorating `@workflow`-decorated functions is to establish a setup-teardown pattern to execute tasks +before and after your main workflow logic. This is useful when integrating with other external services +like [wandb](https://wandb.ai/site) or [clearml](https://clear.ml/), which enable you to track metrics of model +training runs. + +To begin, import the necessary libraries. + +```{code-cell} +from functools import partial, wraps +from unittest.mock import MagicMock + +import flytekit +from flytekit import FlyteContextManager, task, workflow +from flytekit.core.node_creation import create_node +``` + ++++ {"lines_to_next_cell": 0} + +Let's define the tasks we need for setup and teardown. 
In this example, we use the +{py:class}`unittest.mock.MagicMock` class to create a fake external service that we want to initialize at the +beginning of our workflow and finish at the end. + +```{code-cell} +external_service = MagicMock() + + +@task +def setup(): + print("initializing external service") + external_service.initialize(id=flytekit.current_context().execution_id) + + +@task +def teardown(): + print("finish external service") + external_service.complete(id=flytekit.current_context().execution_id) +``` + ++++ {"lines_to_next_cell": 0} + +As you can see, you can even use Flytekit's current context to access the `execution_id` of the current workflow +if you need to link Flyte with the external service so that you reference the same unique identifier in both the +external service and Flyte. + +## Workflow decorator + +We create a decorator that we want to use to wrap our workflow function. + +```{code-cell} +def setup_teardown(fn=None, *, before, after): + @wraps(fn) + def wrapper(*args, **kwargs): + # get the current flyte context to obtain access to the compilation state of the workflow DAG. + ctx = FlyteContextManager.current_context() + + # defines before node + before_node = create_node(before) + # ctx.compilation_state.nodes == [before_node] + + # under the hood, flytekit compiler defines and threads + # together nodes within the `my_workflow` function body + outputs = fn(*args, **kwargs) + # ctx.compilation_state.nodes == [before_node, *nodes_created_by_fn] + + # defines the after node + after_node = create_node(after) + # ctx.compilation_state.nodes == [before_node, *nodes_created_by_fn, after_node] + + # compile the workflow correctly by making sure `before_node` + # runs before the first workflow node and `after_node` + # runs after the last workflow node. 
+ if ctx.compilation_state is not None: + # ctx.compilation_state.nodes is a list of nodes defined in the + # order of execution above + workflow_node0 = ctx.compilation_state.nodes[1] + workflow_node1 = ctx.compilation_state.nodes[-2] + before_node >> workflow_node0 + workflow_node1 >> after_node + return outputs + + if fn is None: + return partial(setup_teardown, before=before, after=after) + + return wrapper +``` + ++++ {"lines_to_next_cell": 0} + +There are a few key pieces to note in the `setup_teardown` decorator above: + +1. It takes a `before` and `after` argument, both of which need to be `@task`-decorated functions. These + tasks will run before and after the main workflow function body. +2. The [create_node](https://github.com/flyteorg/flytekit/blob/9e156bb0cf3d1441c7d1727729e8f9b4bbc3f168/flytekit/core/node_creation.py#L18) function + creates the nodes associated with the `before` and `after` tasks. +3. When `fn` is called, under the hood Flytekit creates all the nodes associated with the workflow function body. +4. The code within the `if ctx.compilation_state is not None:` conditional is executed at compile time, which + is where we extract the first and last nodes associated with the workflow function body at index `1` and `-2`. +5. The `>>` right shift operator ensures that `before_node` executes before the + first node and `after_node` executes after the last node of the main workflow function body. + +## Defining the DAG + +We define two tasks that will constitute the workflow. 
+ +```{code-cell} +@task +def t1(x: float) -> float: + return x - 1 + + +@task +def t2(x: float) -> float: + return x**2 +``` + ++++ {"lines_to_next_cell": 0} + +And then create our decorated workflow: + +```{code-cell} +:lines_to_next_cell: 2 + +@workflow +@setup_teardown(before=setup, after=teardown) +def decorating_workflow(x: float) -> float: + return t2(x=t1(x=x)) + + +if __name__ == "__main__": + print(decorating_workflow(x=10.0)) +``` + +## Run the example on the Flyte cluster + +To run the provided workflow on the Flyte cluster, use the following command: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_workflows.py \ + decorating_workflow --x 10.0 +``` + +To define workflows imperatively, refer to {ref}`this example `, +and to learn more about how to extend Flyte at a deeper level, for example creating custom types, custom tasks or +backend plugins, see {ref}`Extending Flyte `. diff --git a/docs/user_guide/advanced_composition/dynamic_workflows.md b/docs/user_guide/advanced_composition/dynamic_workflows.md new file mode 100644 index 0000000000..99bc88a372 --- /dev/null +++ b/docs/user_guide/advanced_composition/dynamic_workflows.md @@ -0,0 +1,292 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(dynamic_workflow)= + +# Dynamic workflows + +```{eval-rst} +.. tags:: Intermediate +``` + +A workflow whose directed acyclic graph (DAG) is computed at run-time is a {py:func}`~flytekit.dynamic` workflow. +The tasks in a dynamic workflow are executed at runtime using dynamic inputs. 
+This type of workflow shares similarities with the {py:func}`~flytekit.workflow`, as it employs a Python-esque DSL +to declare dependencies between the tasks or define new workflows. A key distinction lies in the dynamic workflow being evaluated at runtime. +This means that the inputs are initially materialized and forwarded to the dynamic workflow, resembling the behavior of a task. +However, the return value from a dynamic workflow is a {py:class}`~flytekit.extend.Promise` object, +which can be materialized by the subsequent tasks. + +Think of a dynamic workflow as a combination of a task and a workflow. +It is used to dynamically decide the parameters of a workflow at runtime. +It is both compiled and executed at run-time. You can define a dynamic workflow using the `@dynamic` decorator. + +Within the `@dynamic` context, each invocation of a {py:func}`~flytekit.task` or a derivative of +{py:class}`~flytekit.core.base_task.Task` class leads to deferred evaluation using a promise, +rather than the immediate materialization of the actual value. While nesting other `@dynamic` and +`@workflow` constructs within this task is possible, direct interaction with the outputs of a task/workflow is limited, +as they are lazily evaluated. If interaction with the outputs is desired, it is recommended to separate the +logic in a dynamic workflow and create a new task to read and resolve the outputs. + +Dynamic workflows become essential when you require: + +- Modifying the logic of the code at runtime +- Changing or deciding on feature extraction parameters on-the-go +- Building AutoML pipelines +- Tuning hyperparameters during execution + +This example uses a dynamic workflow to count the common characters between any two strings. + +To begin, we import the required libraries. + +```{code-cell} +from flytekit import dynamic, task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +We define a task that returns the index of a character, where A-Z/a-z is equivalent to 0-25. 
+ +```{code-cell} +@task +def return_index(character: str) -> int: + if character.islower(): + return ord(character) - ord("a") + else: + return ord(character) - ord("A") +``` + ++++ {"lines_to_next_cell": 0} + +We also create a task that prepares a list of 26 characters by populating the frequency of each character. + +```{code-cell} +@task +def update_list(freq_list: list[int], list_index: int) -> list[int]: + freq_list[list_index] += 1 + return freq_list +``` + ++++ {"lines_to_next_cell": 0} + +We define a task to calculate the number of common characters between the two strings. + +```{code-cell} +@task +def derive_count(freq1: list[int], freq2: list[int]) -> int: + count = 0 + for i in range(26): + count += min(freq1[i], freq2[i]) + return count +``` + ++++ {"lines_to_next_cell": 0} + +We define a dynamic workflow to accomplish the following: + +1. Initialize an empty 26-character list to be passed to the `update_list` task +2. Iterate through each character of the first string (`s1`) and populate the frequency list +3. Iterate through each character of the second string (`s2`) and populate the frequency list +4. Determine the number of common characters by comparing the two frequency lists + +The looping process is contingent on the number of characters in both strings, which is unknown until runtime. 
+ +```{code-cell} +@dynamic +def count_characters(s1: str, s2: str) -> int: + # s1 and s2 should be accessible + + # Initialize empty lists with 26 slots each, corresponding to every alphabet (lower and upper case) + freq1 = [0] * 26 + freq2 = [0] * 26 + + # Loop through characters in s1 + for i in range(len(s1)): + # Calculate the index for the current character in the alphabet + index = return_index(character=s1[i]) + # Update the frequency list for s1 + freq1 = update_list(freq_list=freq1, list_index=index) + # index and freq1 are not accessible as they are promises + + # looping through the string s2 + for i in range(len(s2)): + # Calculate the index for the current character in the alphabet + index = return_index(character=s2[i]) + # Update the frequency list for s2 + freq2 = update_list(freq_list=freq2, list_index=index) + # index and freq2 are not accessible as they are promises + + # Count the common characters between s1 and s2 + return derive_count(freq1=freq1, freq2=freq2) +``` + ++++ {"lines_to_next_cell": 0} + +A dynamic workflow is modeled as a task in the backend, +but the body of the function is executed to produce a workflow at run-time. +In both dynamic and static workflows, the output of tasks are promise objects. + +Propeller executes the dynamic task within its Kubernetes pod, resulting in a compiled DAG, which is then accessible in the console. +It utilizes the information acquired during the dynamic task's execution to schedule and execute each node within the dynamic task. +Visualization of the dynamic workflow's graph in the UI becomes available only after the dynamic task has completed its execution. + +When a dynamic task is executed, it generates the entire workflow as its output, termed the *futures file*. +This nomenclature reflects the anticipation that the workflow is yet to be executed, and all subsequent outputs are considered futures. 
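The computation that `count_characters` spreads across task nodes can be sanity-checked with a plain-Python sketch. It does not use Flytekit, and the helper names `char_index` and `common_characters` are illustrative rather than part of the example. Like the tasks above, it assumes the inputs contain only ASCII letters.

```python
def char_index(character: str) -> int:
    # Map a-z / A-Z to 0-25, mirroring the `return_index` task.
    base = "a" if character.islower() else "A"
    return ord(character) - ord(base)


def common_characters(s1: str, s2: str) -> int:
    # One frequency slot per letter of the alphabet, case-insensitive.
    freq1 = [0] * 26
    freq2 = [0] * 26
    for ch in s1:
        freq1[char_index(ch)] += 1
    for ch in s2:
        freq2[char_index(ch)] += 1
    # A letter counts as common as many times as it appears in both strings.
    return sum(min(a, b) for a, b in zip(freq1, freq2))


if __name__ == "__main__":
    # "Pear" and "Earth" share the letters e, a and r.
    print(common_characters("Pear", "Earth"))  # prints 3
```

In the dynamic workflow, each loop iteration becomes a pair of task nodes instead of an in-place list update, so the number of nodes depends on the lengths of the two strings.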
+ +:::{note} +Local execution works when a `@dynamic` decorator is used because Flytekit treats it as a task that runs with native Python inputs. +::: + +Define a workflow that triggers the dynamic workflow. + +```{code-cell} +@workflow +def dynamic_wf(s1: str, s2: str) -> int: + return count_characters(s1=s1, s2=s2) +``` + ++++ {"lines_to_next_cell": 0} + +You can run the workflow locally as follows: + +```{code-cell} +:lines_to_next_cell: 2 + +if __name__ == "__main__": + print(dynamic_wf(s1="Pear", s2="Earth")) +``` + ++++ {"lines_to_next_cell": 0} + +## Why use dynamic workflows? + +### Flexibility + +Dynamic workflows streamline the process of building pipelines, offering the flexibility to design workflows +according to the unique requirements of your project. This level of adaptability is not achievable with static workflows. + +### Lower pressure on etcd + +The workflow Custom Resource Definition (CRD) and the states associated with static workflows are stored in etcd, +the Kubernetes database. This database maintains Flyte workflow CRDs as key-value pairs, tracking the status of each node's execution. + +However, there is a limitation with etcd: a hard limit on data size, encompassing the workflow and node status sizes. +Consequently, it's crucial to ensure that static workflows don't excessively consume memory. + +In contrast, dynamic workflows offload the workflow specification (including node/task definitions and connections) to the blobstore. +Still, the statuses of nodes are stored in the workflow CRD within etcd. + +Dynamic workflows help alleviate some of the pressure on etcd storage space, providing a solution to mitigate storage constraints. + +## Dynamic workflows vs. map tasks + +Dynamic tasks come with overhead for large fan-out tasks as they store metadata for the entire workflow. 
+In contrast, {ref}`map tasks ` prove efficient for such extensive fan-out scenarios since they refrain from storing metadata, +resulting in less noticeable overhead. + +(advanced_merge_sort)= +## Merge sort + +Merge sort is a perfect example to showcase how to seamlessly achieve recursion using dynamic workflows. +Flyte imposes limitations on the depth of recursion to prevent misuse and potential impacts on the overall stability of the system. + +```{code-cell} +:lines_to_next_cell: 2 + +from typing import Tuple + +from flytekit import conditional, dynamic, task, workflow + + +@task +def split(numbers: list[int]) -> Tuple[list[int], list[int], int, int]: + return ( + numbers[0 : int(len(numbers) / 2)], + numbers[int(len(numbers) / 2) :], + int(len(numbers) / 2), + int(len(numbers)) - int(len(numbers) / 2), + ) + + +@task +def merge(sorted_list1: list[int], sorted_list2: list[int]) -> list[int]: + result = [] + while len(sorted_list1) > 0 and len(sorted_list2) > 0: + # Compare the current element of the first array with the current element of the second array. + # If the element in the first array is smaller, append it to the result and increment the first array index. + # Otherwise, do the same with the second array. 
+ if sorted_list1[0] < sorted_list2[0]: + result.append(sorted_list1.pop(0)) + else: + result.append(sorted_list2.pop(0)) + + # Extend the result with the remaining elements from both arrays + result.extend(sorted_list1) + result.extend(sorted_list2) + + return result + + +@task +def sort_locally(numbers: list[int]) -> list[int]: + return sorted(numbers) + + +@dynamic +def merge_sort_remotely(numbers: list[int], run_local_at_count: int) -> list[int]: + split1, split2, new_count1, new_count2 = split(numbers=numbers) + sorted1 = merge_sort(numbers=split1, numbers_count=new_count1, run_local_at_count=run_local_at_count) + sorted2 = merge_sort(numbers=split2, numbers_count=new_count2, run_local_at_count=run_local_at_count) + return merge(sorted_list1=sorted1, sorted_list2=sorted2) + + +@workflow +def merge_sort(numbers: list[int], numbers_count: int, run_local_at_count: int = 5) -> list[int]: + return ( + conditional("terminal_case") + .if_(numbers_count <= run_local_at_count) + .then(sort_locally(numbers=numbers)) + .else_() + .then(merge_sort_remotely(numbers=numbers, run_local_at_count=run_local_at_count)) + ) +``` + +By simply adding the `@dynamic` annotation, the `merge_sort_remotely` function transforms into a plan of execution, +generating a Flyte workflow with four distinct nodes. These nodes run remotely on potentially different hosts, +with Flyte ensuring proper data reference passing and maintaining execution order with maximum possible parallelism. + +`@dynamic` is essential in this context because the number of times `merge_sort` needs to be triggered is unknown at compile time. +The dynamic workflow calls a static workflow, which subsequently calls the dynamic workflow again, +creating a recursive and flexible execution structure. 
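To make the recursive structure easier to follow, here is a minimal plain-Python sketch of the same split, sort and merge recursion without Flytekit. The function names `merge_sorted` and `merge_sort_plain` are hypothetical. The `run_local_at_count` threshold mirrors the parameter in the example and is assumed to be at least 1.

```python
def merge_sorted(left: list[int], right: list[int]) -> list[int]:
    # Merge two already-sorted lists into one sorted list.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of the two lists still has elements left.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort_plain(numbers: list[int], run_local_at_count: int = 5) -> list[int]:
    # Terminal case: small inputs are sorted directly, like `sort_locally`.
    if len(numbers) <= run_local_at_count:
        return sorted(numbers)
    # Recursive case: split, sort each half, then merge, like `merge_sort_remotely`.
    mid = len(numbers) // 2
    left = merge_sort_plain(numbers[:mid], run_local_at_count)
    right = merge_sort_plain(numbers[mid:], run_local_at_count)
    return merge_sorted(left, right)


if __name__ == "__main__":
    print(merge_sort_plain([1813, 3105, 3260, 2634, 383, 7037, 3291, 2403, 315, 7164]))
```

The difference in the Flyte version is that the recursion depth is not resolved inside a single Python process: each recursive call is a node whose subgraph is compiled only once its inputs are known at run time.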
+ +## Run the example on the Flyte cluster + +To run the provided workflows on the Flyte cluster, you can use the following commands: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py \ + dynamic_wf --s1 "Pear" --s2 "Earth" +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py \ + merge_sort --numbers '[1813, 3105, 3260, 2634, 383, 7037, 3291, 2403, 315, 7164]' --numbers_count 10 +``` diff --git a/docs/user_guide/advanced_composition/eager_workflows.md b/docs/user_guide/advanced_composition/eager_workflows.md new file mode 100644 index 0000000000..c2cc1dc542 --- /dev/null +++ b/docs/user_guide/advanced_composition/eager_workflows.md @@ -0,0 +1,495 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(eager_workflows)= + +# Eager workflows + +```{eval-rst} +.. tags:: Intermediate +``` + +```{important} +This feature is experimental and the API is subject to breaking changes. +If you encounter any issues please consider submitting a +[bug report](https://github.com/flyteorg/flyte/issues/new?assignees=&labels=bug%2Cuntriaged&projects=&template=bug_report.yaml&title=%5BBUG%5D+). +``` + +So far, the two types of workflows you've seen are static workflows, which +are defined with `@workflow`-decorated functions or imperative `Workflow` class, +and dynamic workflows, which are defined with the `@dynamic` decorator. + +{ref}`Static workflows ` are created at compile time when you call `pyflyte run`, +`pyflyte register`, or `pyflyte serialize`. 
This means that the workflow is static +and cannot change its shape at any point: all of the variables defined as an input +to the workflow or as an output of a task or subworkflow are promises. +{ref}`Dynamic workflows `, on the other hand, are compiled +at runtime so that they can materialize the inputs of the workflow as Python values +and use them to determine the shape of the execution graph. + +In this guide you'll learn how to use eager workflows, which allow you to +create extremely flexible workflows that give you run-time access to +intermediary task/subworkflow outputs. + +## Why eager workflows? + +Both static and dynamic workflows have a key limitation: while they provide +compile-time and run-time type safety, respectively, they both suffer from +inflexibility in expressing asynchronous execution graphs that many Python +programmers may be accustomed to by using, for example, the +[asyncio](https://docs.python.org/3/library/asyncio.html) library. + +Unlike static and dynamic workflows, eager workflows allow you to use all of +the python constructs that you're familiar with via the `asyncio` API. To +understand what this looks like, let's define a very basic eager workflow +using the `@eager` decorator. + +```{code-cell} +:lines_to_next_cell: 2 + +from flytekit import task, workflow +from flytekit.experimental import eager + + +@task +def add_one(x: int) -> int: + return x + 1 + + +@task +def double(x: int) -> int: + return x * 2 + + +@eager +async def simple_eager_workflow(x: int) -> int: + out = await add_one(x=x) + if out < 0: + return -1 + return await double(x=out) +``` + ++++ {"lines_to_next_cell": 2} + +As we can see in the code above, we're defining an `async` function called +`simple_eager_workflow` that takes an integer as input and returns an integer. 
+By decorating this function with `@eager`, we now have the ability to invoke +tasks, static subworkflows, and even other eager subworkflows in an _eager_ +fashion such that we can materialize their outputs and use them inside the +parent eager workflow itself. + +In the `simple_eager_workflow` function, we can see that we're `await`ing +the output of the `add_one` task and assigning it to the `out` variable. If +`out` is a negative integer, the workflow will return `-1`. Otherwise, it +will double the output of `add_one` and return it. + +Unlike in static and dynamic workflows, this variable is actually +the Python integer that is the result of `x + 1` and not a promise. + +## How it works + +When you decorate a function with `@eager`, any function invoked within it +that's decorated with `@task`, `@workflow`, or `@eager` becomes +an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) +object within the lifetime of the parent eager workflow execution. Note that +this happens automatically and you don't need to use the `async` keyword when +defining a task or workflow that you want to invoke within an eager workflow. + +```{important} +With eager workflows, you basically have access to the Python `asyncio` +interface to define extremely flexible execution graphs! The trade-off is that +you lose the compile-time type safety that you get with regular static workflows +and to a lesser extent, dynamic workflows. + +We're leveraging Python's native `async` capabilities in order to: + +1. Materialize the output of flyte tasks and subworkflows so you can operate + on them without spinning up another pod and also determine the shape of the + workflow graph in an extremely flexible manner. +2. Provide an alternative way of achieving concurrency in Flyte. Flyte has + concurrency built into it, so all tasks/subworkflows will execute concurrently + assuming that they don't have any dependencies on each other. 
However, eager + workflows provide a Python-native way of doing this, with the main downside + being that you lose the benefits of statically compiled workflows such as + compile-time analysis and first-class data lineage tracking. +``` + +Similar to {ref}`dynamic workflows `, eager workflows are +actually tasks. The main difference is that, while dynamic workflows compile +a static workflow at runtime using materialized inputs, eager workflows do +not compile any workflow at all. Instead, they use the {py:class}`~flytekit.remote.remote.FlyteRemote` +object together with Python's `asyncio` API to kick off task and subworkflow +executions eagerly whenever you `await` on a coroutine. This means that eager +workflows can materialize an output of a task or subworkflow and use it as a +Python object in the underlying runtime environment. We'll see how to configure +`@eager` functions to run on a remote Flyte cluster +{ref}`later in this guide `. + +## What can you do with eager workflows? + +In this section we'll cover a few of the use cases that you can accomplish +with eager workflows, some of which you can't accomplish with static or dynamic +workflows. + +### Operating on task and subworkflow outputs + +One of the biggest benefits of eager workflows is that you can now materialize +task and subworkflow outputs as Python values and do operations on them just +like you would in any other Python function. Let's look at another example: + +```{code-cell} +@eager +async def another_eager_workflow(x: int) -> int: + out = await add_one(x=x) + + # out is a Python integer + out = out - 1 + + return await double(x=out) +``` + ++++ {"lines_to_next_cell": 0} + +Since `out` is an actual Python integer and not a promise, we can do operations +on it at runtime, inside the eager workflow function body. This is not possible +with static or dynamic workflows. 
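The same pattern can be tried without a Flyte installation by letting ordinary coroutines stand in for tasks. This is a rough analogy rather than a description of how flytekit implements `@eager`: the `fake_add_one`, `fake_double`, and `plain_async_pipeline` names are hypothetical, and nothing here runs on a cluster.

```python
import asyncio


async def fake_add_one(x: int) -> int:
    # Stand-in for the `add_one` task (hypothetical, runs in-process).
    return x + 1


async def fake_double(x: int) -> int:
    # Stand-in for the `double` task (hypothetical, runs in-process).
    return x * 2


async def plain_async_pipeline(x: int) -> int:
    # Awaiting yields a concrete int, so ordinary arithmetic works on it.
    out = await fake_add_one(x)
    out = out - 1
    return await fake_double(out)


result = asyncio.run(plain_async_pipeline(5))
```

The point of the sketch is only that `await` hands back a real value, which is what lets an eager workflow mix plain Python operations with task calls.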
+ +### Pythonic conditionals + +As you saw in the `simple_eager_workflow` workflow above, you can use regular +Python conditionals in your eager workflows. Let's look at a more complicated +example: + +```{code-cell} +:lines_to_next_cell: 2 + +@task +def gt_100(x: int) -> bool: + return x > 100 + + +@eager +async def eager_workflow_with_conditionals(x: int) -> int: + out = await add_one(x=x) + + if out < 0: + return -1 + elif await gt_100(x=out): + return 100 + else: + out = await double(x=out) + + assert out >= -1 + return out +``` + +In the above example, we're using the eager workflow's Python runtime +to check if `out` is negative, but we're also using the `gt_100` task in the +`elif` statement, which will be executed in a separate Flyte task. + +### Loops + +You can also gather the outputs of multiple tasks or subworkflows into a list: + +```{code-cell} +import asyncio + + +@eager +async def eager_workflow_with_for_loop(x: int) -> int: + outputs = [] + + for i in range(x): + outputs.append(add_one(x=i)) + + outputs = await asyncio.gather(*outputs) + return await double(x=sum(outputs)) +``` + ++++ {"lines_to_next_cell": 0} + +### Static subworkflows + +You can also invoke static workflows from within an eager workflow: + +```{code-cell} +:lines_to_next_cell: 2 + +@workflow +def subworkflow(x: int) -> int: + out = add_one(x=x) + return double(x=out) + + +@eager +async def eager_workflow_with_static_subworkflow(x: int) -> int: + out = await subworkflow(x=x) + assert out == (x + 1) * 2 + return out +``` + ++++ {"lines_to_next_cell": 0} + +### Eager subworkflows + +You can nest eager subworkflows inside a parent eager workflow: + +```{code-cell} +:lines_to_next_cell: 2 + +@eager +async def eager_subworkflow(x: int) -> int: + return await add_one(x=x) + + +@eager +async def nested_eager_workflow(x: int) -> int: + out = await eager_subworkflow(x=x) + return await double(x=out) +``` + ++++ {"lines_to_next_cell": 0} + +### Catching exceptions + +You can also catch 
exceptions in eager workflows through `EagerException`: + +```{code-cell} +:lines_to_next_cell: 2 + +from flytekit.experimental import EagerException + + +@task +def raises_exc(x: int) -> int: + if x <= 0: + raise TypeError + return x + + +@eager +async def eager_workflow_with_exception(x: int) -> int: + try: + return await raises_exc(x=x) + except EagerException: + return -1 +``` + +Even though the `raises_exc` task raises a `TypeError`, the +`eager_workflow_with_exception` runtime will raise an `EagerException` and +you'll need to specify `EagerException` as the exception type in your `try...except` +block. + +```{note} +This is a current limitation in the `@eager` workflow implementation. +``` + +## Executing eager workflows + +As with most Flyte constructs, you can execute eager workflows both locally +and remotely. + +### Local execution + +You can execute eager workflows locally by simply calling them like a regular +`async` function: + +```{code-cell} +if __name__ == "__main__": + result = asyncio.run(simple_eager_workflow(x=5)) + print(f"Result: {result}") # "Result: 12" +``` + +This uses the `asyncio.run` function to execute the eager workflow like +any other Python async code. This is useful for local debugging as you're +developing your workflows and tasks. + +(eager_workflows_remote)= + +### Remote Flyte cluster execution + +Under the hood, `@eager` workflows use the {py:class}`~flytekit.remote.remote.FlyteRemote` +object to kick off task, static workflow, and eager workflow executions. + +In order to actually execute them on a Flyte cluster, you'll need to configure +eager workflows with a `FlyteRemote` object and a secrets configuration that +allows you to authenticate into the cluster via a client secret key. 
+ +```{code-block} python +from flytekit.remote import FlyteRemote +from flytekit.configuration import Config + +@eager( + remote=FlyteRemote( + config=Config.auto(config_file="config.yaml"), + default_project="flytesnacks", + default_domain="development", + ), + client_secret_group="my_client_secret_group", + client_secret_key="my_client_secret_key", +) +async def eager_workflow_remote(x: int) -> int: + ... +``` + ++++ + +Here, `config.yaml` is a +[flytectl](https://docs.flyte.org/projects/flytectl/en/latest/#configuration)-compatible +config file, and `my_client_secret_group` and `my_client_secret_key` are placeholders for the +{ref}`secret group and key ` that you've configured for your Flyte +cluster to authenticate via a client key. + ++++ + +### Sandbox Flyte cluster execution + +When using a sandbox cluster started with `flytectl demo start`, however, the +`client_secret_group` and `client_secret_key` are not required, since the +default sandbox configuration does not require key-based authentication. + +```{code-cell} +:lines_to_next_cell: 2 + +from flytekit.configuration import Config +from flytekit.remote import FlyteRemote + + +@eager( + remote=FlyteRemote( + config=Config.for_sandbox(), + default_project="flytesnacks", + default_domain="development", + ) +) +async def eager_workflow_sandbox(x: int) -> int: + out = await add_one(x=x) + if out < 0: + return -1 + return await double(x=out) +``` + +```{important} +When you execute an eager workflow on a remote Flyte cluster, it runs the +latest version of the tasks, static workflows, and eager workflows that are registered in +the `default_project` and `default_domain` specified in the `FlyteRemote` +object. This means that you need to pre-register all Flyte entities that are +invoked inside of the eager workflow. 
+``` + +### Registering and running + +Assuming that your `flytekit` code is configured correctly, you will need to +register all of the task and subworkflows that are used with your eager +workflow with `pyflyte register`: + +```{prompt} bash +pyflyte --config register \ + --project \ + --domain \ + --image \ + path/to/eager_workflows.py +``` + +And then run it with `pyflyte run`: + +```{prompt} bash +pyflyte --config run \ + --project \ + --domain \ + --image \ + path/to/eager_workflows.py simple_eager_workflow --x 10 +``` + +```{note} +You need to register the tasks/workflows associated with your eager workflow +because eager workflows are actually flyte tasks under the hood, which means +that `pyflyte run` has no way of knowing what tasks and subworkflows are +invoked inside of it. +``` + +## Eager workflows on Flyte console + +Since eager workflows are an experimental feature, there is currently no +first-class representation of them on Flyte Console, the UI for Flyte. +When you register an eager workflow, you'll be able to see it in the task view: + +:::{figure} https://github.com/flyteorg/static-resources/blob/main/flytesnacks/user_guide/flyte_eager_workflow_ui_view.png?raw=true +:alt: Eager Workflow UI View +:class: with-shadow +::: + +When you execute an eager workflow, the tasks and subworkflows invoked within +it **won't show up** on the node, graph, or timeline view. As mentioned above, +this is because eager workflows are actually Flyte tasks under the hood and +Flyte has no way of knowing the shape of the execution graph before actually +executing them. 
+ +:::{figure} https://github.com/flyteorg/static-resources/blob/main/flytesnacks/user_guide/flyte_eager_workflow_execution.png?raw=true +:alt: Eager Workflow Execution +:class: with-shadow +::: + +However, at the end of execution, you'll be able to use {ref}`Flyte Decks ` +to see a list of all the tasks and subworkflows that were executed within the +eager workflow: + +:::{figure} https://github.com/flyteorg/static-resources/blob/main/flytesnacks/user_guide/flyte_eager_workflow_deck.png?raw=true +:alt: Eager Workflow Deck +:class: with-shadow +::: + +## Limitations + +As this feature is still experimental, there are a few limitations that you +need to keep in mind: + +- You cannot invoke {ref}`dynamic workflows `, + {ref}`map tasks `, or {ref}`launch plans ` inside an + eager workflow. +- [Context managers](https://docs.python.org/3/library/contextlib.html) will + only work on locally executed functions within the eager workflow, i.e. using a + context manager to modify the behavior of a task or subworkflow will not work + because the task or subworkflow is executed on a completely different pod. +- All exceptions raised by Flyte tasks or workflows will be caught and raised + as an {py:class}`~flytekit.experimental.EagerException` at runtime. +- All task/subworkflow outputs are materialized as Python values, so + offloaded types like `FlyteFile`, `FlyteDirectory`, `StructuredDataset`, and + `pandas.DataFrame` are fully downloaded into the pod running the eager workflow. + This prevents you from incrementally downloading or streaming very large datasets + in eager workflows. +- Flyte entities that are invoked inside of an eager workflow must be registered + under the same project and domain as the eager workflow itself. The eager + workflow will execute the latest version of these entities. 
+- Flyte Console currently does not have a first-class way of viewing eager + workflows, but they can be accessed via the task list view and the execution + graph is viewable via Flyte Decks. + +## Summary of workflows + +Eager workflows are a powerful new construct that trades off compile-time type +safety for flexibility in the shape of the execution graph. The table below +will help you to reason about the different workflow constructs in Flyte in terms +of promises and materialized values: + +| Construct | Description | Flyte Promises | Pro | Con | +|--------|--------|--------|----|----| +| `@workflow` | Compiled at compile-time | All inputs and intermediary outputs are promises | Type errors caught at compile-time | Constrained by Flyte DSL | +| `@dynamic` | Compiled at run-time | Inputs are materialized, but outputs of all Flyte entities are promises | More flexible than `@workflow`, e.g. can do Python operations on inputs | Can't use a lot of Python constructs (e.g. try/except) | +| `@eager` | Never compiled | Everything is materialized! | Can effectively use all Python constructs via `asyncio` syntax | No compile-time benefits, this is the wild west 🏜 | diff --git a/docs/user_guide/advanced_composition/index.md b/docs/user_guide/advanced_composition/index.md new file mode 100644 index 0000000000..26eb8df33c --- /dev/null +++ b/docs/user_guide/advanced_composition/index.md @@ -0,0 +1,24 @@ +(advanced_composition)= + +# Advanced composition + +This section of the user guide introduces the advanced features of the Flytekit Python SDK. +These examples cover more complex aspects of Flyte, including conditionals, subworkflows, +dynamic workflows, map tasks, gate nodes and more. 
+ +```{toctree} +:maxdepth: -1 +:name: advanced_composition_toc +:hidden: + +conditionals +chaining_flyte_entities +subworkflows +dynamic_workflows +map_tasks +eager_workflows +decorating_tasks +decorating_workflows +intratask_checkpoints +waiting_for_external_inputs +``` diff --git a/docs/user_guide/advanced_composition/intratask_checkpoints.md b/docs/user_guide/advanced_composition/intratask_checkpoints.md new file mode 100644 index 0000000000..703279abcb --- /dev/null +++ b/docs/user_guide/advanced_composition/intratask_checkpoints.md @@ -0,0 +1,137 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +# Intratask checkpoints + +```{eval-rst} +.. tags:: MachineLearning, Intermediate +``` + +A checkpoint in Flyte serves to recover a task from a previous failure by preserving the task's state before the failure +and resuming from the latest recorded state. + +## Why intratask checkpoints? + +The inherent design of Flyte, being a workflow engine, allows users to break down operations, programs or ideas +into smaller tasks within workflows. In the event of a task failure, the workflow doesn't need to rerun the +previously completed tasks. Instead, it can retry the specific task that encountered an issue. +Once the problematic task succeeds, it won't be rerun. Consequently, the natural boundaries between tasks act as implicit checkpoints. + +However, there are scenarios where breaking a task into smaller tasks is either challenging or undesirable due to the associated overhead. +This is especially true when running a substantial computation in a tight loop. +In such cases, users may consider splitting each loop iteration into individual tasks using dynamic workflows. 
+Yet, the overhead of spawning new tasks, recording intermediate results, and reconstructing the state can incur additional expenses. + +### Use case: Model training + +An exemplary scenario illustrating the utility of intra-task checkpointing is during model training. +In situations where executing multiple epochs or iterations with the same dataset might be time-consuming, +setting task boundaries can incur a high bootstrap time and be costly. + +Flyte addresses this challenge by providing a mechanism to checkpoint progress within a task execution, +saving it as a file or set of files. In the event of a failure, the checkpoint file can be re-read to +resume most of the state without rerunning the entire task. +This feature opens up possibilities to leverage alternate, more cost-effective compute systems, +such as [AWS spot instances](https://aws.amazon.com/ec2/spot/), +[GCP pre-emptible instances](https://cloud.google.com/compute/docs/instances/preemptible) and others. + +These instances offer great performance at significantly lower price points compared to their on-demand or reserved counterparts. +This becomes feasible when tasks are constructed in a fault-tolerant manner. +For tasks running within a short duration, e.g., less than 10 minutes, the likelihood of failure is negligible, +and task-boundary-based recovery provides substantial fault tolerance for successful completion. + +However, as the task execution time increases, the cost of re-running it also increases, +reducing the chances of successful completion. This is precisely where Flyte's intra-task checkpointing proves to be highly beneficial. + +Here's an example illustrating how to develop tasks that leverage intra-task checkpointing. +It's important to note that Flyte currently offers the low-level API for checkpointing. 
+Future integrations aim to incorporate higher-level checkpointing APIs from popular training frameworks +like Keras, PyTorch, Scikit-learn, and big-data frameworks such as Spark and Flink, enhancing their fault-tolerance capabilities. + +To begin, import the necessary libraries and set the number of task retries to `3`. + +```{code-cell} +from flytekit import current_context, task, workflow +from flytekit.exceptions.user import FlyteRecoverableException + +RETRIES = 3 +``` + ++++ {"lines_to_next_cell": 0} + +We define a task to iterate precisely `n_iterations`, checkpoint its state, and recover from simulated failures. + +```{code-cell} +@task(retries=RETRIES) +def use_checkpoint(n_iterations: int) -> int: + cp = current_context().checkpoint + prev = cp.read() + + start = 0 + if prev: + start = int(prev.decode()) + + # Create a failure interval to simulate failures across 'n' iterations and then succeed after configured retries + failure_interval = n_iterations // RETRIES + index = 0 + for index in range(start, n_iterations): + # Simulate a deterministic failure for demonstration. Showcasing how it eventually completes within the given retries + if index > start and index % failure_interval == 0: + raise FlyteRecoverableException(f"Failed at iteration {index}, failure_interval {failure_interval}.") + # Save progress state. It is also entirely possible to save state every few intervals + cp.write(f"{index + 1}".encode()) + return index +``` + ++++ {"lines_to_next_cell": 0} + +The checkpoint system offers additional APIs, documented in the code accessible at +[checkpointer code](https://github.com/flyteorg/flytekit/blob/master/flytekit/core/checkpointer.py). + +Create a workflow that invokes the task. +The task will automatically undergo retries in the event of a {ref}`FlyteRecoverableException `. 
+ +```{code-cell} +@workflow +def checkpointing_example(n_iterations: int) -> int: + return use_checkpoint(n_iterations=n_iterations) +``` + ++++ {"lines_to_next_cell": 0} + +The local checkpoint is not utilized here because retries are not supported. + +```{code-cell} +if __name__ == "__main__": + try: + checkpointing_example(n_iterations=10) + except RuntimeError as e: # noqa : F841 + # Since no retries are performed, an exception is expected when run locally + pass +``` + +## Run the example on the Flyte cluster + +To run the provided workflow on the Flyte cluster, use the following command: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/checkpoint.py \ + checkpointing_example --n_iterations 10 +``` diff --git a/docs/user_guide/advanced_composition/map_tasks.md b/docs/user_guide/advanced_composition/map_tasks.md new file mode 100644 index 0000000000..6449b6d124 --- /dev/null +++ b/docs/user_guide/advanced_composition/map_tasks.md @@ -0,0 +1,278 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(map_task)= + +# Map tasks + +```{eval-rst} +.. tags:: Intermediate +``` + +Using a map task in Flyte allows for the execution of a pod task or a regular task across a series of inputs within a single workflow node. +This capability eliminates the need to create individual nodes for each instance, leading to substantial performance improvements. + +Map tasks find utility in diverse scenarios, such as: + +1. Executing the same code logic on multiple inputs +2. Concurrent processing of multiple data batches +3. 
Hyperparameter optimization + +The following examples demonstrate how to use map tasks with both single and multiple inputs. + +To begin, import the required libraries. + +```{code-cell} +from flytekit import map_task, task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +Here's a simple workflow that uses {py:func}`map_task `. + +```{code-cell} +threshold = 11 + + +@task +def detect_anomalies(data_point: int) -> bool: + return data_point > threshold + + +@workflow +def map_workflow(data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10]) -> list[bool]: + # Use the map task to apply the anomaly detection function to each data point + return map_task(detect_anomalies)(data_point=data) + + +if __name__ == "__main__": + print(f"Anomalies Detected: {map_workflow()}") +``` + ++++ {"lines_to_next_cell": 0} + +To customize resource allocations, such as memory usage for individual map tasks, +you can leverage `with_overrides`. Here's an example using the `detect_anomalies` map task within a workflow: + +```python +from flytekit import Resources + + +@workflow +def map_workflow_with_resource_overrides(data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10]) -> list[bool]: + return map_task(detect_anomalies)(data_point=data).with_overrides(requests=Resources(mem="2Gi")) +``` + +You can use {py:class}`~flytekit.TaskMetadata` to set attributes such as `cache`, `cache_version`, `interruptible`, `retries` and `timeout`. +```python +from flytekit import TaskMetadata + + +@workflow +def map_workflow_with_metadata(data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10]) -> list[bool]: + return map_task(detect_anomalies, metadata=TaskMetadata(cache=True, cache_version="0.1", retries=1))( + data_point=data + ) +``` + +You can also configure `concurrency` and `min_success_ratio` for a map task: +- `concurrency` limits the number of mapped tasks that can run in parallel to the specified batch size. 
+If the input size exceeds the concurrency value, multiple batches will run serially until all inputs are processed. +If left unspecified, concurrency is unbounded. +- `min_success_ratio` determines the minimum fraction of total jobs that must complete successfully before terminating +the map task and marking it as successful. + +```python +@workflow +def map_workflow_with_additional_params(data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10]) -> list[bool]: + return map_task(detect_anomalies, concurrency=1, min_success_ratio=0.75)(data_point=data) +``` + +A map task internally uses a compression algorithm (bitsets) to handle every Flyte workflow node's metadata, +which would otherwise have been on the order of hundreds of bytes. + +When defining a map task, avoid calling other tasks in it. Flyte +can't accurately register tasks that call other tasks. While Flyte +will correctly execute a task that calls other tasks, it will not be +able to give full performance advantages. This is +especially true for map tasks. + +In this example, the map task `suboptimal_mappable_task` would not +give you the best performance. + +```{code-cell} +@task +def upperhalf(a: int) -> int: + # Integer division keeps the result consistent with the declared int return type + return a // 2 + 1 + + +@task +def suboptimal_mappable_task(a: int) -> str: + inc = upperhalf(a=a) + stringified = str(inc) + return stringified +``` + ++++ {"lines_to_next_cell": 0} + +By default, the map task utilizes the Kubernetes array plugin for execution. +However, map tasks can also be run on alternate execution backends. +For example, you can configure the map task to run on +[AWS Batch](https://docs.flyte.org/en/latest/deployment/plugin_setup/aws/batch.html#deployment-plugin-setup-aws-array), +a provisioned service that offers scalability for handling large-scale tasks. + +## Map a task with multiple inputs + +You might need to map a task with multiple inputs. + +For instance, consider a task that requires three inputs. 
+ +```{code-cell} +@task +def multi_input_task(quantity: int, price: float, shipping: float) -> float: + return quantity * price * shipping +``` + ++++ {"lines_to_next_cell": 0} + +You may want to map this task with only the ``quantity`` input, while keeping the other inputs unchanged. +Since a map task accepts only one input, you can achieve this by partially binding values to the map task. +This can be done using the {py:func}`functools.partial` function. + +```{code-cell} +import functools + + +@workflow +def multiple_inputs_map_workflow(list_q: list[int] = [1, 2, 3, 4, 5], p: float = 6.0, s: float = 7.0) -> list[float]: + partial_task = functools.partial(multi_input_task, price=p, shipping=s) + return map_task(partial_task)(quantity=list_q) +``` + ++++ {"lines_to_next_cell": 0} + +Another possibility is to bind the outputs of a task to partials. + +```{code-cell} +@task +def get_price() -> float: + return 7.0 + + +@workflow +def map_workflow_partial_with_task_output(list_q: list[int] = [1, 2, 3, 4, 5], s: float = 6.0) -> list[float]: + p = get_price() + partial_task = functools.partial(multi_input_task, price=p, shipping=s) + return map_task(partial_task)(quantity=list_q) +``` + ++++ {"lines_to_next_cell": 0} + +You can also provide multiple lists as input to a ``map_task``. + +```{code-cell} +:lines_to_next_cell: 2 + +@workflow +def map_workflow_with_lists( + list_q: list[int] = [1, 2, 3, 4, 5], list_p: list[float] = [6.0, 9.0, 8.7, 6.5, 1.2], s: float = 6.0 +) -> list[float]: + partial_task = functools.partial(multi_input_task, shipping=s) + return map_task(partial_task)(quantity=list_q, price=list_p) +``` + +```{note} +It is important to note that you cannot provide a list as an input to a partial task. 
+``` + +## Run the example on the Flyte cluster + +To run the provided workflows on the Flyte cluster, use the following commands: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py \ + map_workflow +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py \ + map_workflow_with_additional_params +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py \ + multiple_inputs_map_workflow +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py \ + map_workflow_partial_with_task_output +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py \ + map_workflow_with_lists +``` + +## ArrayNode + +:::{important} +This feature is experimental and the API is subject to breaking changes. +If you encounter any issues please consider submitting a +[bug report](https://github.com/flyteorg/flyte/issues/new?assignees=&labels=bug%2Cuntriaged&projects=&template=bug_report.yaml&title=%5BBUG%5D+). +::: + +ArrayNode map tasks serve as a seamless substitution for regular map tasks, differing solely in the submodule +utilized to import the `map_task` function. Specifically, you will need to import `map_task` from the experimental module as illustrated below: + +```python +from flytekit import task, workflow +from flytekit.experimental import map_task + +@task +def t(a: int) -> int: + ... 
+ +@workflow +def array_node_wf(xs: list[int]) -> list[int]: + return map_task(t)(a=xs) +``` + +Flyte introduces map task to enable parallelization of homogeneous operations, +offering efficient evaluation and a user-friendly API. Because it's implemented as a backend plugin, +its evaluation is independent of core Flyte logic, which generates subtask executions that lack full Flyte functionality. +ArrayNode tackles this issue by offering robust support for subtask executions. +It also extends mapping capabilities across all plugins and Flyte node types. +This enhancement will be a part of our move from the experimental phase to general availability. + +In contrast to map tasks, an ArrayNode provides the following enhancements: + +- **Wider mapping support**. ArrayNode extends mapping capabilities beyond Kubernetes tasks, encompassing tasks such as Python tasks, container tasks and pod tasks. +- **Cache management**. It supports both cache serialization and cache overwriting for subtask executions. +- **Intra-task checkpointing**. ArrayNode enables intra-task checkpointing, contributing to improved execution reliability. +- **Workflow recovery**. Subtasks remain recoverable during the workflow recovery process. (This is a work in progress.) +- **Subtask failure handling**. The mechanism handles subtask failures effectively, ensuring that running subtasks are appropriately aborted. +- **Multiple input values**. Subtasks can be defined with multiple input values, enhancing their versatility. + +We expect the performance of ArrayNode map tasks to compare closely to standard map tasks.
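The partial-binding pattern from the map task examples on this page can be previewed without a Flyte cluster. The following is a minimal plain-Python sketch, not flytekit code: it assumes a list comprehension as a stand-in for `map_task`, and the helper names `run_partial_map` and `run_multi_list_map` are hypothetical. It only mirrors the arithmetic that `multiple_inputs_map_workflow` and `map_workflow_with_lists` would perform.

```python
import functools


def multi_input_task(quantity: int, price: float, shipping: float) -> float:
    # Same arithmetic as the Flyte task on this page, minus the @task decorator.
    return quantity * price * shipping


def run_partial_map(list_q: list[int], p: float, s: float) -> list[float]:
    # Bind the inputs that stay fixed; only `quantity` is mapped over.
    partial_task = functools.partial(multi_input_task, price=p, shipping=s)
    return [partial_task(quantity=q) for q in list_q]


def run_multi_list_map(list_q: list[int], list_p: list[float], s: float) -> list[float]:
    # Bind only `shipping`; `quantity` and `price` are mapped element-wise.
    partial_task = functools.partial(multi_input_task, shipping=s)
    return [partial_task(quantity=q, price=p) for q, p in zip(list_q, list_p)]


if __name__ == "__main__":
    print(run_partial_map([1, 2, 3, 4, 5], p=6.0, s=7.0))
    print(run_multi_list_map([1, 2], [6.0, 9.0], s=6.0))
```

Note that the scalar values are bound through `functools.partial`, while the lists are passed at call time, which matches the restriction that a list cannot be given as an input to a partial task.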
diff --git a/docs/user_guide/advanced_composition/subworkflows.md b/docs/user_guide/advanced_composition/subworkflows.md new file mode 100644 index 0000000000..59826aa491 --- /dev/null +++ b/docs/user_guide/advanced_composition/subworkflows.md @@ -0,0 +1,182 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(subworkflow)= + +# Subworkflows + +```{eval-rst} +.. tags:: Intermediate +``` + +Subworkflows share similarities with {ref}`launch plans `, as both enable users to initiate one workflow from within another. +The distinction lies in the analogy: think of launch plans as "pass by pointer" and subworkflows as "pass by value." + +## When to use subworkflows? + +Subworkflows offer an elegant solution for managing parallelism between a workflow and its launched sub-flows, +as they execute within the same context as the parent workflow. +Consequently, all nodes of a subworkflow adhere to the overall constraints imposed by the parent workflow. + +Consider this scenario: when workflow `A` is integrated as a subworkflow of workflow `B`, +running workflow `B` results in the entire graph of workflow `A` being duplicated into workflow `B` at the point of invocation. + +Here's an example illustrating the calculation of slope, intercept and the corresponding y-value. 
+ +```{code-cell} +from flytekit import task, workflow + + +@task +def slope(x: list[int], y: list[int]) -> float: + sum_xy = sum([x[i] * y[i] for i in range(len(x))]) + sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) + n = len(x) + return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) + + +@task +def intercept(x: list[int], y: list[int], slope: float) -> float: + mean_x = sum(x) / len(x) + mean_y = sum(y) / len(y) + intercept = mean_y - slope * mean_x + return intercept + + +@workflow +def slope_intercept_wf(x: list[int], y: list[int]) -> (float, float): + slope_value = slope(x=x, y=y) + intercept_value = intercept(x=x, y=y, slope=slope_value) + return (slope_value, intercept_value) + + +@task +def regression_line(val: int, slope_value: float, intercept_value: float) -> float: + return (slope_value * val) + intercept_value # y = mx + c + + +@workflow +def regression_line_wf(val: int = 5, x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> float: + slope_value, intercept_value = slope_intercept_wf(x=x, y=y) + return regression_line(val=val, slope_value=slope_value, intercept_value=intercept_value) +``` + ++++ {"lines_to_next_cell": 0} + +The `slope_intercept_wf` computes the slope and intercept of the regression line. +Subsequently, the `regression_line_wf` triggers `slope_intercept_wf` and then computes the y-value. + +To execute the workflow locally, use the following: + +```{code-cell} +if __name__ == "__main__": + print(f"Executing regression_line_wf(): {regression_line_wf()}") +``` + ++++ {"lines_to_next_cell": 0} + +It's possible to nest a workflow that contains a subworkflow within another workflow. +Workflows can be easily constructed from other workflows, even if they function as standalone entities. +Each workflow in this module has the capability to exist and run independently. 
+ +```{code-cell} +@workflow +def nested_regression_line_wf() -> float: + return regression_line_wf() +``` + ++++ {"lines_to_next_cell": 0} + +You can run the nested workflow locally as well. + +```{code-cell} +if __name__ == "__main__": + print(f"Running nested_regression_line_wf(): {nested_regression_line_wf()}") +``` + ++++ {"lines_to_next_cell": 0} + +## External workflow + +When launch plans are employed within a workflow to initiate the execution of a pre-defined workflow, +a new external execution is triggered. This results in a distinct execution ID and can be identified +as a separate entity. + +These external invocations of a workflow, initiated using launch plans from a parent workflow, +are termed as external workflows. They may have separate parallelism constraints since the context is not shared. + +:::{tip} +If your deployment uses {ref}`multiple Kubernetes clusters `, +external workflows may offer a way to distribute the workload of a workflow across multiple clusters. +::: + +Here's an example that illustrates the concept of external workflows: + +```{code-cell} + +from flytekit import LaunchPlan + +launch_plan = LaunchPlan.get_or_create( + regression_line_wf, "regression_line_workflow", default_inputs={"val": 7, "x": [-3, 0, 3], "y": [7, 4, -2]} +) + + +@workflow +def nested_regression_line_lp() -> float: + # Trigger launch plan from within a workflow + return launch_plan() +``` + ++++ {"lines_to_next_cell": 0} + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_external_workflow_execution.png +:alt: External workflow execution +:class: with-shadow +::: + +In the console screenshot above, note that the launch plan execution ID differs from that of the workflow. 
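To know what value to expect from these executions, the arithmetic behind the workflows can be checked in plain Python. This is a minimal sketch without flytekit; `predict` is a hypothetical helper that chains the three task bodies from this page in the same order as `regression_line_wf`.

```python
def slope(x: list[int], y: list[int]) -> float:
    # Least-squares slope, same formula as the `slope` task.
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x_squared = sum(xi**2 for xi in x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2)


def intercept(x: list[int], y: list[int], slope_value: float) -> float:
    # intercept = mean(y) - slope * mean(x)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return mean_y - slope_value * mean_x


def regression_line(val: int, slope_value: float, intercept_value: float) -> float:
    return (slope_value * val) + intercept_value  # y = mx + c


def predict(val: int, x: list[int], y: list[int]) -> float:
    # Hypothetical helper: mirrors the call order of regression_line_wf.
    m = slope(x, y)
    c = intercept(x, y, m)
    return regression_line(val, m, c)


if __name__ == "__main__":
    print(predict(5, [-3, 0, 3], [7, 4, -2]))
    print(predict(7, [-3, 0, 3], [7, 4, -2]))
```

With the default inputs (`val=5`) this evaluates to `-4.5`, and with `val=7`, the value set in the launch plan's default inputs, it evaluates to `-7.5`.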
+ +You can run a workflow containing an external workflow locally as follows: + +```{code-cell} +if __name__ == "__main__": + print(f"Running nested_regression_line_lp(): {nested_regression_line_lp()}") +``` + +## Run the example on a Flyte cluster + +To run the provided workflows on a Flyte cluster, use the following commands: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py \ + regression_line_wf +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py \ + nested_regression_line_wf +``` + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py \ + nested_regression_line_lp +``` diff --git a/docs/user_guide/advanced_composition/waiting_for_external_inputs.md b/docs/user_guide/advanced_composition/waiting_for_external_inputs.md new file mode 100644 index 0000000000..d694b62443 --- /dev/null +++ b/docs/user_guide/advanced_composition/waiting_for_external_inputs.md @@ -0,0 +1,314 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +# Waiting for external inputs + +*New in Flyte 1.3.0* + +There are use cases where you may want a workflow execution to pause, only to continue +when some time has passed or when it receives some inputs that are external to +the workflow execution inputs. You can think of these as execution-time inputs, +since they need to be supplied to the workflow after it's launched. Examples of +this use case would be: + +1.
**Model Deployment**: A hyperparameter-tuning workflow that + trains `n` models, where a human needs to inspect a report before approving + the model for downstream deployment to some serving layer. +2. **Data Labeling**: A workflow that iterates through an image dataset, + presenting individual images to a human annotator for them to label. +3. **Active Learning**: An active learning + workflow that trains a model, then shows examples for a human annotator to label + based on which examples it's least/most certain about or which would provide the most + information to the model. + +These use cases can be achieved in Flyte with the {func}`~flytekit.sleep`, +{func}`~flytekit.wait_for_input`, and {func}`~flytekit.approve` workflow nodes. +Although all of the examples above are human-in-the-loop processes, these +constructs allow you to pass inputs into a workflow from some arbitrary external +process (👩 human or 🤖 machine) in order to continue. + +:::{important} +These functions can only be used inside {func}`@workflow `-decorated +functions, {func}`@dynamic `-decorated functions, or +{ref}`imperative workflows `. +::: + +## Pause executions with the `sleep` node + +The simplest case is when you want your workflow to {py:func}`~flytekit.sleep` +for some specified amount of time before continuing. + +Though this type of node may not be used often in a production setting, +you might want to use it, for example, if you want to simulate a delay in +your workflow to mock out the behavior of some long-running computation.
+ +```{code-cell} +from datetime import timedelta + +from flytekit import sleep, task, workflow + + +@task +def long_running_computation(num: int) -> int: + """A mock task pretending to be a long-running computation.""" + return num + + +@workflow +def sleep_wf(num: int) -> int: + """Simulate a "long-running" computation with sleep.""" + + # increase the sleep duration to actually make it long-running + sleeping = sleep(timedelta(seconds=10)) + result = long_running_computation(num=num) + sleeping >> result + return result +``` + ++++ {"lines_to_next_cell": 0} + +As you can see above, we define a simple `long_running_computation` task and a `sleep_wf` +workflow. We first create a `sleeping` and `result` node, then +order the dependencies with the `>>` operator such that the workflow sleeps +for 10 seconds before kicking off the `result` computation. Finally, we +return the `result`. + +:::{note} +You can learn more about the `>>` chaining operator +{ref}`here `. +::: + +Now that you have a general sense of how this works, let's move on to the +{func}`~flytekit.wait_for_input` workflow node. + +## Supply external inputs with `wait_for_input` + +With the {py:func}`~flytekit.wait_for_input` node, you can pause a +workflow execution that requires some external input signal. For example, +suppose that you have a workflow that publishes an automated analytics report, +but before publishing it you want to give it a custom title.
You can achieve +this by defining a `wait_for_input` node that takes a `str` input and +finalizes the report: + +```{code-cell} +import typing + +from flytekit import wait_for_input + + +@task +def create_report(data: typing.List[float]) -> dict: # o0 + """A toy report task.""" + return { + "mean": sum(data) / len(data), + "length": len(data), + "max": max(data), + "min": min(data), + } + + +@task +def finalize_report(report: dict, title: str) -> dict: + return {"title": title, **report} + + +@workflow +def reporting_wf(data: typing.List[float]) -> dict: + report = create_report(data=data) + title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str) + return finalize_report(report=report, title=title_input) +``` + +Let's break down what's happening in the code above: + +- In `reporting_wf` we first create the raw `report`. +- Then, we define a `title` node that will wait for a string to be provided + through the Flyte API, which can be done through the Flyte UI or through + `FlyteRemote` (more on that later). This node will time out after 1 hour. +- Finally, we pass the `title_input` promise into `finalize_report`, which + attaches the custom title to the report. + +:::{note} +The `create_report` task is just a toy example. In a realistic example, this +report might be an HTML file or a set of visualizations. This can be rendered +in the Flyte UI with {ref}`Flyte Decks `. +::: + +As mentioned in the beginning of this page, this construct can be used for +selecting the best-performing model in cases where there isn't a clear single +metric to determine the best model, or if you're doing data labeling using +a Flyte workflow. + +## Continue executions with `approve` + +Finally, the {py:func}`~flytekit.approve` workflow node allows you to wait on +an explicit approval signal before continuing execution. Going back to our +report-publishing use case, suppose that we want to block the publishing of +a report for some reason (e.g.
if it doesn't appear to be valid): + +```{code-cell} +from flytekit import approve + + +@workflow +def reporting_with_approval_wf(data: typing.List[float]) -> dict: + report = create_report(data=data) + title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str) + final_report = finalize_report(report=report, title=title_input) + + # approve the final report, where the output of approve is the final_report + # dictionary. + return approve(final_report, "approve-final-report", timeout=timedelta(hours=2)) +``` + ++++ {"lines_to_next_cell": 0} + +The `approve` node will pass the `final_report` promise through as the +output of the workflow, provided that the `approve-final-report` node gets an +approval input via the Flyte UI or Flyte API. + +You can also use the output of the `approve` function as a promise, feeding +it to a subsequent task. Let's create a version of our report-publishing +workflow where the approval happens after `create_report`: + +```{code-cell} +@workflow +def approval_as_promise_wf(data: typing.List[float]) -> dict: + report = create_report(data=data) + title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str) + + # wait for report to run so that the user can view it before adding a custom + # title to the report + report >> title_input + + final_report = finalize_report( + report=approve(report, "raw-report-approval", timeout=timedelta(hours=2)), + title=title_input, + ) + return final_report +``` + ++++ {"lines_to_next_cell": 0} + +## Working with conditionals + +The node constructs by themselves are useful, but they become even more +useful when we combine them with other Flyte constructs, like {ref}`conditionals `.
+ +To illustrate this, let's extend the report-publishing use case so that we +produce an "invalid report" output in case we don't approve the final report: + +```{code-cell} +:lines_to_next_cell: 2 + +from flytekit import conditional + + +@task +def invalid_report() -> dict: + return {"invalid_report": True} + + +@workflow +def conditional_wf(data: typing.List[float]) -> dict: + report = create_report(data=data) + title_input = wait_for_input("title-input", timeout=timedelta(hours=1), expected_type=str) + + # Define a "review-passes" wait_for_input node so that a human can review + # the report before finalizing it. + review_passed = wait_for_input("review-passes", timeout=timedelta(hours=2), expected_type=bool) + report >> review_passed + + # This conditional returns the finalized report if the review passes, + # otherwise it returns an invalid report output. + return ( + conditional("final-report-condition") + .if_(review_passed.is_true()) + .then(finalize_report(report=report, title=title_input)) + .else_() + .then(invalid_report()) + ) +``` + +Here, the `review-passes` node produces the boolean `review_passed` promise, +which we use in the `conditional` to determine which branch to execute: +`finalize_report` if the review passes, and `invalid_report` otherwise.
+ +## Sending inputs to `wait_for_input` and `approve` nodes + +Assuming that you've registered the above workflows on a Flyte cluster that's +been started with {ref}`flytectl demo start `, +there are two ways of using `wait_for_input` and `approve` nodes: + +### Using the Flyte UI + +If you launch the `reporting_wf` workflow on the Flyte UI, you'll see a +**Graph** view of the workflow execution like this: + +```{image} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/wait_for_input_graph.png +:alt: reporting workflow wait for input graph +``` + +Clicking on the {fa}`play-circle,style=far` icon of the `title` task node or the +**Resume** button on the sidebar will create a modal form that you can use to +provide the custom title input. + +```{image} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/wait_for_input_form.png +:alt: reporting workflow wait for input form +``` + +### Using `FlyteRemote` + +For many cases it's enough to use Flyte UI to provide inputs/approvals on +gate nodes. However, if you want to pass inputs to `wait_for_input` and +`approve` nodes programmatically, you can use the +{py:meth}`FlyteRemote.set_signal ` +method. Using the `conditional_wf` workflow, the example +below allows you to set values for `title-input` and `review-passes` nodes.
+ +```python +import typing +from flytekit.remote.remote import FlyteRemote +from flytekit.configuration import Config + +remote = FlyteRemote( + Config.for_sandbox(), + default_project="flytesnacks", + default_domain="development", +) + +# First fetch the workflow +flyte_workflow = remote.fetch_workflow( + name="core.control_flow.waiting_for_external_inputs.conditional_wf" +) + +# Execute the workflow +execution = remote.execute(flyte_workflow, inputs={"data": [1.0, 2.0, 3.0, 4.0, 5.0]}) + +# Get a list of signals available for the execution +signals = remote.list_signals(execution.id.name) + +# Set a signal value for the "title-input" node. Make sure that the "title-input" +# node is in the `signals` list above +remote.set_signal("title-input", execution.id.name, "my report") + +# Set a signal value for the "review-passes" node. Make sure that the "review-passes" +# node is in the `signals` list above +remote.set_signal("review-passes", execution.id.name, True) +``` diff --git a/docs/user_guide/basics/documenting_workflows.md b/docs/user_guide/basics/documenting_workflows.md new file mode 100644 index 0000000000..d6a561c532 --- /dev/null +++ b/docs/user_guide/basics/documenting_workflows.md @@ -0,0 +1,157 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +# Documenting workflows + +```{eval-rst} +.. tags:: Basic +``` + +Well-documented code significantly improves code readability. +Flyte enables the use of docstrings to document your code. +Docstrings are stored in [FlyteAdmin](https://docs.flyte.org/en/latest/concepts/admin.html) +and displayed on the UI. + +To begin, import the relevant libraries.
+ +```{code-cell} +from typing import Tuple + +from flytekit import workflow +``` + ++++ {"lines_to_next_cell": 0} + +We import the `slope` and `intercept` tasks from the `workflow.py` file. + +```{code-cell} +from .workflow import intercept, slope +``` + ++++ {"lines_to_next_cell": 0} + +## Sphinx-style docstring + +An example to demonstrate Sphinx-style docstring. + +The initial section of the docstring provides a concise overview of the workflow. +The subsequent section provides a comprehensive explanation. +The last part of the docstring outlines the parameters and return type. + +```{code-cell} +@workflow +def sphinx_docstring_wf(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> Tuple[float, float]: + """ + Slope and intercept of a regression line + + This workflow accepts a list of coefficient pairs for a regression line. + It calculates both the slope and intercept of the regression line. + + :param x: List of x-coefficients + :param y: List of y-coefficients + :return: Slope and intercept values + """ + slope_value = slope(x=x, y=y) + intercept_value = intercept(x=x, y=y, slope=slope_value) + return slope_value, intercept_value +``` + ++++ {"lines_to_next_cell": 0} + +## NumPy-style docstring + +An example to demonstrate NumPy-style docstring. + +The first part of the docstring provides a concise overview of the workflow. +The next section offers a comprehensive description. +The third section of the docstring details all parameters along with their respective data types. +The final section of the docstring explains the return type and its associated data type. + +```{code-cell} +@workflow +def numpy_docstring_wf(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> Tuple[float, float]: + """ + Slope and intercept of a regression line + + This workflow accepts a list of coefficient pairs for a regression line. + It calculates both the slope and intercept of the regression line. 
+ + Parameters + ---------- + x : list[int] + List of x-coefficients + y : list[int] + List of y-coefficients + + Returns + ------- + out : Tuple[float, float] + Slope and intercept values + """ + slope_value = slope(x=x, y=y) + intercept_value = intercept(x=x, y=y, slope=slope_value) + return slope_value, intercept_value +``` + ++++ {"lines_to_next_cell": 0} + +## Google-style docstring + +An example to demonstrate Google-style docstring. + +The initial section of the docstring offers a succinct one-liner summary of the workflow. +The subsequent section of the docstring provides an extensive explanation. +The third segment of the docstring outlines the parameters and return type, +including their respective data types. + +```{code-cell} +:lines_to_next_cell: 2 + +@workflow +def google_docstring_wf(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> Tuple[float, float]: + """ + Slope and intercept of a regression line + + This workflow accepts a list of coefficient pairs for a regression line. + It calculates both the slope and intercept of the regression line. + + Args: + x (list[int]): List of x-coefficients + y (list[int]): List of y-coefficients + + Returns: + Tuple[float, float]: Slope and intercept values + """ + slope_value = slope(x=x, y=y) + intercept_value = intercept(x=x, y=y, slope=slope_value) + return slope_value, intercept_value +``` + +Here are two screenshots showcasing how the description appears on the UI: +1. On the workflow page, you'll find the short description: +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/document_wf_short.png +:alt: Short description +:class: with-shadow +::: + +2. 
If you click into the workflow, you'll see the long description in the basic information section: +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/document_wf_long.png +:alt: Long description +:class: with-shadow +::: diff --git a/docs/user_guide/basics/hello_world.md b/docs/user_guide/basics/hello_world.md new file mode 100644 index 0000000000..45e5e89c4d --- /dev/null +++ b/docs/user_guide/basics/hello_world.md @@ -0,0 +1,75 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + + +# Hello, World! + +```{eval-rst} +.. tags:: Basic +``` + +Let's write a Flyte {py:func}`~flytekit.workflow` that invokes a +{py:func}`~flytekit.task` to generate the output "Hello, World!". + +Flyte tasks are the core building blocks of larger, more complex workflows. +Workflows compose multiple tasks – or other workflows – +into meaningful steps of computation to produce some useful set of outputs or outcomes. + +To begin, import `task` and `workflow` from the `flytekit` library. + +```{code-cell} +from flytekit import task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +Define a task that produces the string "Hello, World!". +Simply use the `@task` decorator to annotate the Python function. + +```{code-cell} +@task +def say_hello() -> str: + return "Hello, World!" +``` + ++++ {"lines_to_next_cell": 0} + +You can handle the output of a task in the same way you would with a regular Python function. +Store the output in a variable and use it as a return value for a Flyte workflow.
+ +```{code-cell} +@workflow +def hello_world_wf() -> str: + res = say_hello() + return res +``` + ++++ {"lines_to_next_cell": 0} + +Run the workflow by simply calling it like a Python function. + +```{code-cell} +:lines_to_next_cell: 2 + +if __name__ == "__main__": + print(f"Running hello_world_wf() {hello_world_wf()}") +``` + +Next, let's delve into the specifics of {ref}`tasks `, +{ref}`workflows ` and {ref}`launch plans `. diff --git a/docs/user_guide/basics/imperative_workflows.md b/docs/user_guide/basics/imperative_workflows.md new file mode 100644 index 0000000000..b5da5b6336 --- /dev/null +++ b/docs/user_guide/basics/imperative_workflows.md @@ -0,0 +1,119 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(imperative_workflow)= + +# Imperative workflows + +```{eval-rst} +.. tags:: Basic +``` + +Workflows are commonly created by applying the `@workflow` decorator to Python functions. +During compilation, this involves processing the function's body and utilizing subsequent calls to +underlying tasks to establish and record the workflow structure. This approach is known as declarative +and is suitable when manually drafting the workflow. + +However, in cases where workflows are constructed programmatically, an imperative style is more appropriate. +For instance, if tasks have been defined already, their sequence and dependencies might have been specified +in textual form (perhaps during a transition from a legacy system). +In such scenarios, you want to orchestrate these tasks. +This is where Flyte's imperative workflows come into play, allowing you to programmatically construct workflows. + +To begin, import the necessary dependencies. 
+ +```{code-cell} +from flytekit import Workflow +``` + ++++ {"lines_to_next_cell": 0} + +We import the `slope` and `intercept` tasks from the `workflow.py` file. + +```{code-cell} +from .workflow import intercept, slope +``` + ++++ {"lines_to_next_cell": 0} + +Create an imperative workflow. + +```{code-cell} +imperative_wf = Workflow(name="imperative_workflow") +``` + ++++ {"lines_to_next_cell": 0} + +Add the workflow inputs to the imperative workflow. + +```{code-cell} +imperative_wf.add_workflow_input("x", list[int]) +imperative_wf.add_workflow_input("y", list[int]) +``` + ++++ {"lines_to_next_cell": 0} + +::: {note} +If you want to assign default values to the workflow inputs, +you can create a {ref}`launch plan `. +::: + +Add the tasks that need to be triggered from within the workflow. + +```{code-cell} +node_t1 = imperative_wf.add_entity(slope, x=imperative_wf.inputs["x"], y=imperative_wf.inputs["y"]) +node_t2 = imperative_wf.add_entity( + intercept, x=imperative_wf.inputs["x"], y=imperative_wf.inputs["y"], slope=node_t1.outputs["o0"] +) +``` + ++++ {"lines_to_next_cell": 0} + +Lastly, add the workflow output. + +```{code-cell} +imperative_wf.add_workflow_output("wf_output", node_t2.outputs["o0"]) +``` + ++++ {"lines_to_next_cell": 0} + +You can execute the workflow locally as follows: + +```{code-cell} +if __name__ == "__main__": + print(f"Running imperative_wf() {imperative_wf(x=[-3, 0, 3], y=[7, 4, -2])}") +``` + +:::{note} +You also have the option to provide a list of inputs and +retrieve a list of outputs from the workflow. 
+ +```python +wf_input_y = imperative_wf.add_workflow_input("y", list[str]) +node_t3 = wf.add_entity(some_task, a=[wf.inputs["x"], wf_input_y]) +``` + +```python +wf.add_workflow_output( + "list_of_outputs", + [node_t1.outputs["o0"], node_t2.outputs["o0"]], + python_type=list[str], +) +``` +::: diff --git a/docs/user_guide/basics/index.md b/docs/user_guide/basics/index.md new file mode 100644 index 0000000000..bc97b74cc9 --- /dev/null +++ b/docs/user_guide/basics/index.md @@ -0,0 +1,25 @@ +# Basics + +This section introduces you to the basic building blocks of Flyte +using `flytekit`. `flytekit` is a Python SDK for developing Flyte workflows and +tasks, and can be used generally, whenever stateful computation is desirable. +`flytekit` workflows and tasks are completely runnable locally, unless they need +some advanced backend functionality like starting a distributed Spark cluster. + +Here, you will learn how to write Flyte tasks, assemble them into workflows, +run bash scripts, and document workflows. + +```{toctree} +:maxdepth: -1 +:name: basics_toc +:hidden: + +hello_world +tasks +workflows +launch_plans +imperative_workflows +documenting_workflows +shell_tasks +named_outputs +``` diff --git a/docs/user_guide/basics/launch_plans.md b/docs/user_guide/basics/launch_plans.md new file mode 100644 index 0000000000..01eb9d1051 --- /dev/null +++ b/docs/user_guide/basics/launch_plans.md @@ -0,0 +1,116 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(launch_plan)= + +# Launch plans + +```{eval-rst} +.. 
tags:: Basic +``` + +Launch plans link a partial or complete list of inputs required to initiate a workflow, +accompanied by optional run-time overrides like notifications, schedules and more. +They serve various purposes: + +- Schedule the same workflow multiple times, with optional predefined inputs. +- Run a specific workflow but with altered notifications. +- Share a workflow with predefined inputs, allowing another user to initiate an execution. +- Share a workflow with the option for the other user to override certain inputs. +- Share a workflow, ensuring specific inputs remain unchanged. + +Launch plans are the only means for invoking workflow executions. +When a workflow is serialized and registered, a _default launch plan_ is generated. +This default launch plan can bind default workflow inputs and runtime options defined +in the project's flytekit configuration (such as user role). + +To begin, import the necessary libraries. + +```{code-cell} +from flytekit import LaunchPlan, current_context +``` + ++++ {"lines_to_next_cell": 0} + +We import the workflow from the `workflow.py` file for which we're going to create a launch plan. + +```{code-cell} +from .workflow import simple_wf +``` + ++++ {"lines_to_next_cell": 0} + +Create a default launch plan with no inputs during serialization. + +```{code-cell} +default_lp = LaunchPlan.get_default_launch_plan(current_context(), simple_wf) +``` + ++++ {"lines_to_next_cell": 0} + +You can run the launch plan locally as follows: + +```{code-cell} +default_lp(x=[-3, 0, 3], y=[7, 4, -2]) +``` + ++++ {"lines_to_next_cell": 0} + +Create a launch plan and specify the default inputs. 
+ +```{code-cell} +simple_wf_lp = LaunchPlan.create( + name="simple_wf_lp", workflow=simple_wf, default_inputs={"x": [-3, 0, 3], "y": [7, 4, -2]} +) +``` + ++++ {"lines_to_next_cell": 0} + +You can trigger the launch plan locally as follows: + +```{code-cell} +simple_wf_lp() +``` + ++++ {"lines_to_next_cell": 0} + +You can override the defaults as follows: + +```{code-cell} +simple_wf_lp(x=[3, 5, 3], y=[-3, 2, -2]) +``` + ++++ {"lines_to_next_cell": 0} + +It's possible to lock launch plan inputs, preventing them from being overridden during execution. + +```{code-cell} +simple_wf_lp_fixed_inputs = LaunchPlan.get_or_create( + name="fixed_inputs", workflow=simple_wf, fixed_inputs={"x": [-3, 0, 3]} +) +``` + +Attempting to modify the inputs will result in an error being raised by Flyte. + +:::{note} +You can employ default and fixed inputs in conjunction in a launch plan. +::: + +Launch plans can also be used to run workflows on a specific cadence. +For more information, refer to the {ref}`scheduling_launch_plan` documentation. diff --git a/docs/user_guide/basics/named_outputs.md b/docs/user_guide/basics/named_outputs.md new file mode 100644 index 0000000000..a609cd50a9 --- /dev/null +++ b/docs/user_guide/basics/named_outputs.md @@ -0,0 +1,116 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(named_outputs)= + +# Named outputs + +```{eval-rst} +.. tags:: Basic +``` + +By default, Flyte employs a standardized convention to assign names to the outputs of tasks or workflows. +Each output is sequentially labeled as `o0`, `o1`, `o2`, ... `on`, where `o` serves as the standard prefix, +and `0`, `1`, ... `n` indicates the zero-based positional index within the returned values. 
+ +However, Flyte allows the customization of output names for tasks or workflows. +This customization becomes beneficial when you're returning multiple outputs +and you wish to assign a distinct name to each of them. + +The following example illustrates the process of assigning names to outputs for both a task and a workflow. + +To begin, import the required dependencies. + +```{code-cell} +from typing import NamedTuple + +from flytekit import task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +Define a `NamedTuple` and assign it as an output to a task. + +```{code-cell} +slope_value = NamedTuple("slope_value", [("slope", float)]) + + +@task +def slope(x: list[int], y: list[int]) -> slope_value: + sum_xy = sum([x[i] * y[i] for i in range(len(x))]) + sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) + n = len(x) + return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) +``` + ++++ {"lines_to_next_cell": 0} + +Likewise, assign a `NamedTuple` to the output of `intercept` task. + +```{code-cell} +intercept_value = NamedTuple("intercept_value", [("intercept", float)]) + + +@task +def intercept(x: list[int], y: list[int], slope: float) -> intercept_value: + mean_x = sum(x) / len(x) + mean_y = sum(y) / len(y) + intercept = mean_y - slope * mean_x + return intercept +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +While it's possible to create `NamedTuple`s directly within the code, +it's often better to declare them explicitly. This helps prevent potential linting errors in tools like mypy. + +``` +def slope() -> NamedTuple("slope_value", slope=float): + pass +``` +::: + +You can easily unpack the `NamedTuple` outputs directly within a workflow. +Additionally, you can also have the workflow return a `NamedTuple` as an output. + +:::{note} +Remember that we are extracting individual task execution outputs by dereferencing them. +This is necessary because `NamedTuple`s function as tuples and require this dereferencing. 
+::: + +```{code-cell} +slope_and_intercept_values = NamedTuple("slope_and_intercept_values", [("slope", float), ("intercept", float)]) + + +@workflow +def simple_wf_with_named_outputs(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> slope_and_intercept_values: + slope_value = slope(x=x, y=y) + intercept_value = intercept(x=x, y=y, slope=slope_value.slope) + return slope_and_intercept_values(slope=slope_value.slope, intercept=intercept_value.intercept) +``` + ++++ {"lines_to_next_cell": 0} + +You can run the workflow locally as follows: + +```{code-cell} +if __name__ == "__main__": + print(f"Running simple_wf_with_named_outputs() {simple_wf_with_named_outputs()}") +``` diff --git a/docs/user_guide/basics/shell_tasks.md b/docs/user_guide/basics/shell_tasks.md new file mode 100644 index 0000000000..73cc5ab6b8 --- /dev/null +++ b/docs/user_guide/basics/shell_tasks.md @@ -0,0 +1,145 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(shell_task)= + +# Shell tasks + +```{eval-rst} +.. tags:: Basic +``` + +To execute bash scripts within Flyte, you can utilize the {py:class}`~flytekit.extras.tasks.shell.ShellTask` class. +This example includes three shell tasks to execute bash commands. + +First, import the necessary libraries. + +```{code-cell} +from pathlib import Path +from typing import Tuple + +import flytekit +from flytekit import kwtypes, task, workflow +from flytekit.extras.tasks.shell import OutputLocation, ShellTask +from flytekit.types.directory import FlyteDirectory +from flytekit.types.file import FlyteFile +``` + ++++ {"lines_to_next_cell": 0} + +With the required imports in place, you can proceed to define a shell task. 
+To create a shell task, provide a name for it, specify the bash script to be executed, +and define inputs and outputs if needed. + +```{code-cell} +t1 = ShellTask( + name="task_1", + debug=True, + script=""" + set -ex + echo "Hey there! Let's run some bash scripts using Flyte's ShellTask." + echo "Showcasing Flyte's Shell Task." >> {inputs.x} + if grep "Flyte" {inputs.x} + then + echo "Found it!" >> {inputs.x} + else + echo "Not found!" + fi + """, + inputs=kwtypes(x=FlyteFile), + output_locs=[OutputLocation(var="i", var_type=FlyteFile, location="{inputs.x}")], +) + + +t2 = ShellTask( + name="task_2", + debug=True, + script=""" + set -ex + cp {inputs.x} {inputs.y} + tar -zcvf {outputs.j} {inputs.y} + """, + inputs=kwtypes(x=FlyteFile, y=FlyteDirectory), + output_locs=[OutputLocation(var="j", var_type=FlyteFile, location="{inputs.y}.tar.gz")], +) + + +t3 = ShellTask( + name="task_3", + debug=True, + script=""" + set -ex + tar -zxvf {inputs.z} + cat {inputs.y}/$(basename {inputs.x}) | wc -m > {outputs.k} + """, + inputs=kwtypes(x=FlyteFile, y=FlyteDirectory, z=FlyteFile), + output_locs=[OutputLocation(var="k", var_type=FlyteFile, location="output.txt")], +) +``` + ++++ {"lines_to_next_cell": 0} + +Here's a breakdown of the parameters of the `ShellTask`: + +- The `inputs` parameter allows you to specify the types of inputs that the task will accept +- The `output_locs` parameter is used to define the output locations, which can be `FlyteFile` or `FlyteDirectory` +- The `script` parameter contains the actual bash script that will be executed + (`{inputs.x}`, `{outputs.j}`, etc. will be replaced with the actual input and output values). +- The `debug` parameter is helpful for debugging purposes + +We define a task to instantiate `FlyteFile` and `FlyteDirectory`. +A `.gitkeep` file is created in the FlyteDirectory as a placeholder to ensure the directory exists. 
+ +```{code-cell} +@task +def create_entities() -> Tuple[FlyteFile, FlyteDirectory]: + working_dir = Path(flytekit.current_context().working_directory) + flytefile = working_dir / "test.txt" + flytefile.touch() + + flytedir = working_dir / "testdata" + flytedir.mkdir(exist_ok=True) + + flytedir_file = flytedir / ".gitkeep" + flytedir_file.touch() + return flytefile, flytedir +``` + ++++ {"lines_to_next_cell": 0} + +We create a workflow to define the dependencies between the tasks. + +```{code-cell} +@workflow +def shell_task_wf() -> FlyteFile: + x, y = create_entities() + t1_out = t1(x=x) + t2_out = t2(x=t1_out, y=y) + t3_out = t3(x=x, y=y, z=t2_out) + return t3_out +``` + ++++ {"lines_to_next_cell": 0} + +You can run the workflow locally. + +```{code-cell} +if __name__ == "__main__": + print(f"Running shell_task_wf() {shell_task_wf()}") +``` diff --git a/docs/user_guide/basics/tasks.md b/docs/user_guide/basics/tasks.md new file mode 100644 index 0000000000..3f9fcb493d --- /dev/null +++ b/docs/user_guide/basics/tasks.md @@ -0,0 +1,108 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(task)= + +# Tasks + +```{eval-rst} +.. tags:: Basic +``` + +A task serves as the fundamental building block and an extension point within Flyte. +It exhibits the following characteristics: + +1. Versioned (typically aligned with the git sha) +2. Strong interfaces (annotated inputs and outputs) +3. Declarative +4. Independently executable +5. Suitable for unit testing + +A Flyte task operates within its own container and runs on a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/). +It can be classified into two types: + +1. A task associated with a Python function. 
Executing the task is the same as executing the function. +2. A task without a Python function, such as a SQL query or a portable task like prebuilt + algorithms in SageMaker, or a service calling an API. + +Flyte offers numerous plugins for tasks, including backend plugins like +[Athena](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-aws-athena/flytekitplugins/athena/task.py). + +This example demonstrates how to write and execute a +[Python function task](https://github.com/flyteorg/flytekit/blob/master/flytekit/core/python_function_task.py#L75). + +To begin, import `task` from the `flytekit` library. + +```{code-cell} +from flytekit import task +``` + ++++ {"lines_to_next_cell": 0} + +The use of the {py:func}`~flytekit.task` decorator is mandatory for a ``PythonFunctionTask``. +A task is essentially a regular Python function, with the exception that all inputs and outputs must be clearly annotated with their types. +Learn more about the supported types in the {ref}`type-system section `. + +We create a task that computes the slope of a regression line. + +```{code-cell} +@task +def slope(x: list[int], y: list[int]) -> float: + sum_xy = sum([x[i] * y[i] for i in range(len(x))]) + sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) + n = len(x) + return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +Flytekit will assign a default name to the output variable like `o0`. +In case of multiple outputs, each output will be numbered in order, +starting with 0, e.g., `o0, o1, o2, ...`. +::: + +You can execute a Flyte task just like any regular Python function. + +```{code-cell} +if __name__ == "__main__": + print(slope(x=[-3, 0, 3], y=[7, 4, -2])) +``` + +:::{note} +When invoking a Flyte task, you need to use keyword arguments to specify +the values for the corresponding parameters. 
+::: + +(single_task_execution)= + +To run it locally, you can use the following `pyflyte run` command: +``` +pyflyte run \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/task.py \ + slope --x '[-3,0,3]' --y '[7,4,-2]' +``` + +If you want to run it remotely on the Flyte cluster, +simply add the `--remote` flag to the `pyflyte run` command: +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/task.py \ + slope --x '[-3,0,3]' --y '[7,4,-2]' +``` diff --git a/docs/user_guide/basics/workflows.md b/docs/user_guide/basics/workflows.md new file mode 100644 index 0000000000..1f750c9da8 --- /dev/null +++ b/docs/user_guide/basics/workflows.md @@ -0,0 +1,151 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(workflow)= + +# Workflows + +```{eval-rst} +.. tags:: Basic +``` + +Workflows link multiple tasks together. They can be written as Python functions, +but it's important to distinguish tasks and workflows. + +A task's body executes at run-time on a Kubernetes cluster, in a query engine like BigQuery, +or on hosted services like AWS Batch or SageMaker. + +In contrast, a workflow's body doesn't perform computations; it's used to structure tasks. +A workflow's body executes at registration time, during the workflow's registration process. +Registration involves uploading the packaged (serialized) code to the Flyte backend, +enabling the workflow to be triggered. + +For more information, see the {std:ref}`registration documentation `. + +To begin, import {py:func}`~flytekit.task` and {py:func}`~flytekit.workflow` from the flytekit library. 
+ +```{code-cell} +from flytekit import task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +We define `slope` and `intercept` tasks to compute the slope and +intercept of the regression line, respectively. + +```{code-cell} +@task +def slope(x: list[int], y: list[int]) -> float: + sum_xy = sum([x[i] * y[i] for i in range(len(x))]) + sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) + n = len(x) + return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) + + +@task +def intercept(x: list[int], y: list[int], slope: float) -> float: + mean_x = sum(x) / len(x) + mean_y = sum(y) / len(y) + intercept = mean_y - slope * mean_x + return intercept +``` + ++++ {"lines_to_next_cell": 0} + +Define a workflow to establish the task dependencies. +Just like a task, a workflow is also strongly typed. + +```{code-cell} +@workflow +def simple_wf(x: list[int], y: list[int]) -> float: + slope_value = slope(x=x, y=y) + intercept_value = intercept(x=x, y=y, slope=slope_value) + return intercept_value +``` + ++++ {"lines_to_next_cell": 0} + +The {py:func}`~flytekit.workflow` decorator encapsulates Flyte tasks, +essentially representing lazily evaluated promises. +During parsing, function calls are deferred until execution time. +These function calls generate {py:class}`~flytekit.extend.Promise`s that can be propagated to downstream functions, +yet remain inaccessible within the workflow itself. +The actual evaluation occurs when the workflow is executed. + +Workflows can be executed locally, resulting in immediate evaluation, or through tools like +[`pyflyte`](https://docs.flyte.org/projects/flytekit/en/latest/pyflyte.html), +[`flytectl`](https://docs.flyte.org/projects/flytectl/en/latest/index.html) or UI, triggering evaluation. +While workflows decorated with `@workflow` resemble Python functions, +they function as python-esque Domain Specific Language (DSL). +When encountering a @task-decorated Python function, a promise object is created. 
+This promise doesn't store the task's actual output. Its fulfillment only occurs during execution. +Additionally, the inputs to a workflow are also promises; you can only pass promises into +tasks, workflows and other Flyte constructs. + +:::{note} +You can learn more about creating dynamic Flyte workflows by referring +to {ref}`dynamic workflows `. +In a dynamic workflow, unlike a simple workflow, the inputs are pre-materialized. +However, each task invocation within the dynamic workflow still generates a promise that is evaluated lazily. +Bear in mind that a workflow can have tasks, other workflows and dynamic workflows. +::: + +You can run a workflow by calling it as you would with a Python function and providing the necessary inputs. + +```{code-cell} +if __name__ == "__main__": + print(f"Running simple_wf() {simple_wf(x=[-3, 0, 3], y=[7, 4, -2])}") +``` + ++++ {"lines_to_next_cell": 0} + +To run the workflow locally, you can use the following `pyflyte run` command: +``` +pyflyte run \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \ + simple_wf --x '[-3,0,3]' --y '[7,4,-2]' +``` + +If you want to run it remotely on the Flyte cluster, +simply add the `--remote` flag to the `pyflyte run` command: +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \ + simple_wf --x '[-3,0,3]' --y '[7,4,-2]' +``` + +While workflows are usually constructed from multiple tasks with dependencies established through +shared inputs and outputs, there are scenarios where isolating the execution of a single task +proves advantageous during the development and iteration of its logic. +Crafting a new workflow definition each time for this purpose can be cumbersome. +However, {ref}`executing an individual task ` independently, +without the confines of a workflow, offers a convenient approach for iterating on task logic effortlessly. 
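The lazy-evaluation model described in this section can be sketched in plain Python. This is a toy illustration only: `ToyPromise` and `toy_task` are hypothetical names invented for this sketch and are not part of flytekit, whose real promise handling is considerably more involved. The sketch shows a call returning a placeholder instead of a value, with the computation deferred until the placeholder is resolved.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyPromise:
    """Hypothetical placeholder for a task output that has not been computed yet."""

    fn: Callable[..., Any]
    kwargs: dict

    def resolve(self) -> Any:
        # Resolve any upstream placeholders first, then run the function itself.
        resolved = {
            name: value.resolve() if isinstance(value, ToyPromise) else value
            for name, value in self.kwargs.items()
        }
        return self.fn(**resolved)


def toy_task(fn: Callable[..., Any]) -> Callable[..., ToyPromise]:
    """Hypothetical decorator: calling the function records the call instead of running it."""

    def wrapper(**kwargs: Any) -> ToyPromise:
        return ToyPromise(fn=fn, kwargs=kwargs)

    return wrapper


@toy_task
def slope(x: list[int], y: list[int]) -> float:
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x_squared = sum(xi**2 for xi in x)
    n = len(x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2)


@toy_task
def intercept(x: list[int], y: list[int], slope: float) -> float:
    return sum(y) / len(y) - slope * (sum(x) / len(x))


x, y = [-3, 0, 3], [7, 4, -2]
# Building the graph: nothing is computed here, only placeholders are created.
intercept_promise = intercept(x=x, y=y, slope=slope(x=x, y=y))
# Evaluation happens only when the placeholder is resolved.
result = intercept_promise.resolve()
print(result)
```

The keyword-only calls in the sketch mirror the keyword-argument requirement for Flyte tasks; the deferred `resolve()` step stands in, very loosely, for what happens when a workflow is actually executed.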
+ +## Use `partial` to provide default arguments to tasks +You can use the {py:func}`functools.partial` function to assign default or constant values to the parameters of your tasks. + +```{code-cell} +import functools + + +@workflow +def simple_wf_with_partial(x: list[int], y: list[int]) -> float: + partial_task = functools.partial(slope, x=x) + return partial_task(y=y) +``` diff --git a/docs/user_guide/customizing_dependencies/imagespec.md b/docs/user_guide/customizing_dependencies/imagespec.md new file mode 100644 index 0000000000..5a9ef93736 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/imagespec.md @@ -0,0 +1,162 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(image_spec_example)= + +# ImageSpec + +```{eval-rst} +.. tags:: Containerization, Intermediate +``` + +:::{note} +This is an experimental feature, and its API is subject to change in the future. +::: + +`ImageSpec` is a way to specify how to build a container image without a Dockerfile. The `ImageSpec` by default will be +converted to an [Envd](https://envd.tensorchord.ai/) config, and the [Envd builder](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-envd/flytekitplugins/envd/image_builder.py#L12-L34) will build the image for you. However, you can also register your own builder to build +the image using other tools. + +For every {py:class}`flytekit.PythonFunctionTask` task or a task decorated with the `@task` decorator, +you can specify rules for binding container images. By default, flytekit binds a single container image, i.e., +the [default Docker image](https://ghcr.io/flyteorg/flytekit), to all tasks. 
To modify this behavior, +use the `container_image` parameter available in the {py:func}`flytekit.task` decorator, and pass an +`ImageSpec`. + +Before building the image, Flytekit checks the container registry first to see if the image already exists. By doing +so, it avoids having to rebuild the image over and over again. If the image does not exist, flytekit will build the +image before registering the workflow, and replace the image name in the task template with the newly built image name. + +```{code-cell} +import typing + +import pandas as pd +from flytekit import ImageSpec, Resources, task, workflow +``` + +:::{admonition} Prerequisites +:class: important + +- Install [flytekitplugins-envd](https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-envd) to build the `ImageSpec`. +- To build the image on a remote machine, check this [doc](https://envd.tensorchord.ai/teams/context.html#start-remote-buildkitd-on-builder-machine). +- When using a registry in ImageSpec, `docker login` is required to push the image. +::: + ++++ {"lines_to_next_cell": 0} + +You can specify Python packages, apt packages, and environment variables in the `ImageSpec`. +These specified packages will be added on top of the [default image](https://github.com/flyteorg/flytekit/blob/master/Dockerfile), which can be found in the Flytekit Dockerfile. +More specifically, flytekit invokes the [DefaultImages.default_image()](https://github.com/flyteorg/flytekit/blob/f2cfef0ec098d4ae8f042ab915b0b30d524092c6/flytekit/configuration/default_images.py#L26-L27) function. +This function determines and returns the default image based on the Python version and flytekit version. For example, if you are using Python 3.8 and flytekit 1.6.0, the default image assigned will be `ghcr.io/flyteorg/flytekit:py3.8-1.6.0`. +If desired, you can also override the default image by providing a custom `base_image` parameter when using the `ImageSpec`. 
+ +```{code-cell} +pandas_image_spec = ImageSpec( + base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.2", + packages=["pandas", "numpy"], + python_version="3.9", + apt_packages=["git"], + env={"Debug": "True"}, + registry="ghcr.io/flyteorg", +) + +sklearn_image_spec = ImageSpec( + base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.2", + packages=["scikit-learn"], + registry="ghcr.io/flyteorg", +) +``` + ++++ {"lines_to_next_cell": 0} + +:::{important} +Replace `ghcr.io/flyteorg` with a container registry you have permission to publish to. +To upload the image to the local registry in the demo cluster, indicate the registry as `localhost:30000`. +::: + +`is_container` is used to determine whether the task is utilizing the image constructed from the `ImageSpec`. +If the task is indeed using the image built from the `ImageSpec`, it will then import scikit-learn's `LogisticRegression`. +This approach helps minimize module loading time and prevents unnecessary dependency installation within a single image. + +```{code-cell} +if sklearn_image_spec.is_container(): + from sklearn.linear_model import LogisticRegression +``` + ++++ {"lines_to_next_cell": 0} + +To enable tasks to utilize the images built with `ImageSpec`, you can specify the `container_image` parameter for those tasks. + +```{code-cell} +@task(container_image=pandas_image_spec) +def get_pandas_dataframe() -> typing.Tuple[pd.DataFrame, pd.Series]: + df = pd.read_csv("https://storage.googleapis.com/download.tensorflow.org/data/heart.csv") + print(df.head()) + return df[["age", "thalach", "trestbps", "chol", "oldpeak"]], df.pop("target") + + +@task(container_image=sklearn_image_spec, requests=Resources(cpu="1", mem="1Gi")) +def get_model(max_iter: int, multi_class: str) -> typing.Any: + return LogisticRegression(max_iter=max_iter, multi_class=multi_class) + + +# Get a basic model to train. 
+@task(container_image=sklearn_image_spec, requests=Resources(cpu="1", mem="1Gi")) +def train_model(model: typing.Any, feature: pd.DataFrame, target: pd.Series) -> typing.Any: + model.fit(feature, target) + return model + + +# Lastly, let's define a workflow to capture the dependencies between the tasks. +@workflow() +def wf(): + feature, target = get_pandas_dataframe() + model = get_model(max_iter=3000, multi_class="auto") + train_model(model=model, feature=feature, target=target) + + +if __name__ == "__main__": + wf() +``` + +You can override the container image by providing an `ImageSpec` YAML file to the `pyflyte run` or `pyflyte register` command. +This allows for greater flexibility in specifying a custom container image. For example: + +```yaml +# imageSpec.yaml +python_version: 3.11 +registry: pingsutw +packages: + - scikit-learn +env: + Debug: "True" +``` + +``` +# Use pyflyte to run the workflow with the ImageSpec YAML file +pyflyte run --remote --image imageSpec.yaml image_spec.py wf +``` + ++++ + +If you only want to build the image without registering the workflow, you can use the `pyflyte build` command. + +``` +pyflyte build --remote image_spec.py wf +``` diff --git a/docs/user_guide/customizing_dependencies/index.md b/docs/user_guide/customizing_dependencies/index.md new file mode 100644 index 0000000000..0c5262dd67 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/index.md @@ -0,0 +1,17 @@ +# Customizing dependencies + +In this section, you will uncover how Flyte utilizes Docker images to construct containers under the hood, +and you'll learn how to craft your own images to encompass all the necessary dependencies for your tasks or workflows. +You will explore how to execute a raw container with custom commands, +indicate multiple container images within a single workflow, +and get familiar with the ins and outs of `ImageSpec`! 
+ +```{toctree} +:maxdepth: -1 +:name: customizing_dependencies_toc +:hidden: + +imagespec +raw_containers +multiple_images_in_a_workflow +``` diff --git a/docs/user_guide/customizing_dependencies/multiple_images_in_a_workflow.md b/docs/user_guide/customizing_dependencies/multiple_images_in_a_workflow.md new file mode 100644 index 0000000000..0c323cada9 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/multiple_images_in_a_workflow.md @@ -0,0 +1,110 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(multi_images)= + +# Multiple images in a workflow + +```{eval-rst} +.. tags:: Containerization, Intermediate +``` + +For every {py:class}`flytekit.PythonFunctionTask` task or a task decorated with the `@task` decorator, you can specify rules for binding container images. +By default, flytekit binds a single container image, i.e., the [default Docker image](https://ghcr.io/flyteorg/flytekit), to all tasks. +To modify this behavior, use the `container_image` parameter available in the {py:func}`flytekit.task` decorator. + +:::{note} +If the Docker image is not available publicly, refer to {ref}`Pulling Private Images`. 
+::: + +```{code-cell} +:lines_to_next_cell: 2 + +import numpy as np +from flytekit import task, workflow + + +@task(container_image="{{.image.mindmeld.fqn}}:{{.image.mindmeld.version}}") +def get_data() -> np.ndarray: + # here we're importing scikit learn within the Flyte task + from sklearn import datasets + + iris = datasets.load_iris() + X = iris.data[:, :2] + return X + + +@task(container_image="{{.image.borebuster.fqn}}:{{.image.borebuster.version}}") +def normalize(X: np.ndarray) -> np.ndarray: + return (X - X.mean(axis=0)) / X.std(axis=0) + + +@workflow +def multi_images_wf() -> np.ndarray: + X = get_data() + X = normalize(X=X) + return X +``` + +Observe how the `sklearn` library is imported in the context of a Flyte task. +This approach is beneficial when creating tasks in a single module, where some tasks have dependencies that others do not require. + +## Configuring image parameters + +The following parameters can be used to configure images in the `@task` decorator: + +1. `image` refers to the name of the image in the image configuration. The name `default` is a reserved keyword and will automatically apply to the default image name for this repository. +2. `fqn` refers to the fully qualified name of the image. For example, it includes the repository and domain URL of the image. Example: docker.io/my_repo/xyz. +3. `version` refers to the tag of the image. For example: latest, or python-3.9 etc. If `container_image` is not specified, then the default configured image for the project is used. + +## Sending images to `pyflyte` command + +You can pass Docker images to the `pyflyte run` or `pyflyte register` command. 
+For instance: + +``` +pyflyte run --remote --image mindmeld="ghcr.io/flyteorg/flytecookbook:core-latest" --image borebuster="ghcr.io/flyteorg/flytekit:py3.9-latest" multi_images.py multi_images_wf +``` + +## Configuring images in `$HOME/.flyte/config.yaml` + +To specify images in your `$HOME/.flyte/config.yaml` file (or whichever configuration file you are using), include an "images" section in the configuration. +For example: + +```{code-block} yaml +:emphasize-lines: 6-8 + +admin: + # For GRPC endpoints you might want to use dns:///flyte.myexample.com + endpoint: localhost:30080 + authType: Pkce + insecure: true +images: + mindmeld: ghcr.io/flyteorg/flytecookbook:core-latest + borebuster: ghcr.io/flyteorg/flytekit:py3.9-latest +console: + endpoint: http://localhost:30080 +logger: + show-source: true + level: 0 +``` + +Send the name of the configuration file to your `pyflyte run` command as follows: + +``` +pyflyte --config $HOME/.flyte/config.yaml run --remote multi_images.py multi_images_wf +``` diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/haskell/Dockerfile b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/haskell/Dockerfile new file mode 100644 index 0000000000..8d9679088d --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/haskell/Dockerfile @@ -0,0 +1,7 @@ +FROM haskell:9 + +WORKDIR /root + +COPY calculate-ellipse-area.hs /root + +RUN ghc calculate-ellipse-area.hs diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/haskell/calculate-ellipse-area.hs b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/haskell/calculate-ellipse-area.hs new file mode 100644 index 0000000000..c31d4a10ad --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/haskell/calculate-ellipse-area.hs @@ -0,0 +1,19 
@@ +import System.IO +import System.Environment +import Text.Read +import Text.Printf + +calculateEllipseArea :: Float -> Float -> Float +calculateEllipseArea a b = pi * a * b + +main = do + args <- getArgs + let a = args!!0 + b = args!!1 + + let area = calculateEllipseArea (read a::Float) (read b::Float) + + let output_area = args!!2 ++ "/area" + output_metadata = args!!2 ++ "/metadata" + writeFile output_area (show area) + writeFile output_metadata "[from haskell rawcontainer]" diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/julia/Dockerfile b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/julia/Dockerfile new file mode 100644 index 0000000000..caaf85a2ab --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/julia/Dockerfile @@ -0,0 +1,5 @@ +FROM julia:1.6.4-buster + +WORKDIR /root + +COPY calculate-ellipse-area.jl /root diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/julia/calculate-ellipse-area.jl b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/julia/calculate-ellipse-area.jl new file mode 100644 index 0000000000..c26ffdfea7 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/julia/calculate-ellipse-area.jl @@ -0,0 +1,31 @@ + +using Printf + +function calculate_area(a, b) + π * a * b +end + +function write_output(output_dir, output_file, v) + output_path = @sprintf "%s/%s" output_dir output_file + open(output_path, "w") do file + write(file, string(v)) + end +end + +function main(a, b, output_dir) + a = parse.(Float64, a) + b = parse.(Float64, b) + + area = calculate_area(a, b) + + write_output(output_dir, "area", area) + write_output(output_dir, "metadata", "[from julia rawcontainer]") +end + +# the keyword ARGS is a special value that contains the command-line arguments +# 
julia arrays are 1-indexed +a = ARGS[1] +b = ARGS[2] +output_dir = ARGS[3] + +main(a, b, output_dir) diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/python/Dockerfile b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/python/Dockerfile new file mode 100644 index 0000000000..2174b2d997 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/python/Dockerfile @@ -0,0 +1,5 @@ +FROM python:3.10-slim-buster + +WORKDIR /root + +COPY *.py /root/ diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/python/calculate-ellipse-area.py b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/python/calculate-ellipse-area.py new file mode 100644 index 0000000000..7c589da7c8 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/python/calculate-ellipse-area.py @@ -0,0 +1,29 @@ +import math +import sys + + +def write_output(output_dir, output_file, v): + with open(f"{output_dir}/{output_file}", "w") as f: + f.write(str(v)) + + +def calculate_area(a, b): + return math.pi * a * b + + +def main(a, b, output_dir): + a = float(a) + b = float(b) + + area = calculate_area(a, b) + + write_output(output_dir, "area", area) + write_output(output_dir, "metadata", "[from python rawcontainer]") + + +if __name__ == "__main__": + a = sys.argv[1] + b = sys.argv[2] + output_dir = sys.argv[3] + + main(a, b, output_dir) diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/Dockerfile b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/Dockerfile new file mode 100644 index 0000000000..b1dad09c08 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/Dockerfile @@ -0,0 +1,5 @@ +FROM r-base + +WORKDIR 
/root + +COPY *.R /root/ diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/calculate-ellipse-area.R b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/calculate-ellipse-area.R new file mode 100644 index 0000000000..d2650d826b --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/calculate-ellipse-area.R @@ -0,0 +1,13 @@ +#!/usr/bin/env Rscript + +args = commandArgs(trailingOnly=TRUE) + +a = args[1] +b = args[2] +output_dir = args[3] + +area <- pi * as.double(a) * as.double(b) +print(area) + +writeLines(as.character(area), sprintf("%s/%s", output_dir, 'area')) +writeLines("[from R rawcontainer]", sprintf("%s/%s", output_dir, 'metadata')) diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/install-readr.R b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/install-readr.R new file mode 100644 index 0000000000..3308314e4e --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/r/install-readr.R @@ -0,0 +1 @@ +install.packages("readr") diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/shell/Dockerfile b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/shell/Dockerfile new file mode 100644 index 0000000000..856160ba11 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/shell/Dockerfile @@ -0,0 +1,6 @@ +FROM alpine + +WORKDIR /root + +COPY calculate-ellipse-area.sh /root +RUN chmod +x /root/calculate-ellipse-area.sh diff --git a/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/shell/calculate-ellipse-area.sh 
b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/shell/calculate-ellipse-area.sh new file mode 100755 index 0000000000..5096e14035 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw-containers-supporting-files/per-language/shell/calculate-ellipse-area.sh @@ -0,0 +1,5 @@ +#! /usr/bin/env sh + +echo "4*a(1) * $1 * $2" | bc -l | tee "$3/area" + +echo "[from shell rawcontainer]" | tee "$3/metadata" diff --git a/docs/user_guide/customizing_dependencies/raw_containers.md b/docs/user_guide/customizing_dependencies/raw_containers.md new file mode 100644 index 0000000000..2ba6cfec55 --- /dev/null +++ b/docs/user_guide/customizing_dependencies/raw_containers.md @@ -0,0 +1,227 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(raw_container)= + +# Raw containers + +```{eval-rst} +.. tags:: Containerization, Advanced +``` + +This example demonstrates how to use arbitrary containers in 5 different languages, all orchestrated in flytekit seamlessly. +Flyte mounts an input data volume where all the data needed by the container is available, and an output data volume +for the container to write all the data which will be stored away. + +The data is written as separate files, one per input variable. The format of the file is serialized strings. +Refer to the raw protocol to understand how to leverage this. + +```{code-cell} +import logging + +from flytekit import ContainerTask, kwtypes, task, workflow + +logger = logging.getLogger(__file__) +``` + ++++ {"lines_to_next_cell": 0} + +## Container tasks + +A {py:class}`flytekit.ContainerTask` denotes an arbitrary container. 
In the following example, the first task is assigned to the variable +`calculate_ellipse_area_shell` and is named `ellipse-area-metadata-shell`. This name has to be unique in the entire project. Users can specify: + +- `input_data_dir` -> where inputs will be written to. +- `output_data_dir` -> where Flyte will expect the outputs to exist. + +`inputs` and `outputs` specify the interface for the task, so each should be an ordered dictionary of typed input and +output variables. + +```{code-cell} +calculate_ellipse_area_shell = ContainerTask( + name="ellipse-area-metadata-shell", + input_data_dir="/var/inputs", + output_data_dir="/var/outputs", + inputs=kwtypes(a=float, b=float), + outputs=kwtypes(area=float, metadata=str), + image="ghcr.io/flyteorg/rawcontainers-shell:v2", + command=[ + "./calculate-ellipse-area.sh", + "{{.inputs.a}}", + "{{.inputs.b}}", + "/var/outputs", + ], +) + +calculate_ellipse_area_python = ContainerTask( + name="ellipse-area-metadata-python", + input_data_dir="/var/inputs", + output_data_dir="/var/outputs", + inputs=kwtypes(a=float, b=float), + outputs=kwtypes(area=float, metadata=str), + image="ghcr.io/flyteorg/rawcontainers-python:v2", + command=[ + "python", + "calculate-ellipse-area.py", + "{{.inputs.a}}", + "{{.inputs.b}}", + "/var/outputs", + ], +) + +calculate_ellipse_area_r = ContainerTask( + name="ellipse-area-metadata-r", + input_data_dir="/var/inputs", + output_data_dir="/var/outputs", + inputs=kwtypes(a=float, b=float), + outputs=kwtypes(area=float, metadata=str), + image="ghcr.io/flyteorg/rawcontainers-r:v2", + command=[ + "Rscript", + "--vanilla", + "calculate-ellipse-area.R", + "{{.inputs.a}}", + "{{.inputs.b}}", + "/var/outputs", + ], +) + +calculate_ellipse_area_haskell = ContainerTask( + name="ellipse-area-metadata-haskell", + input_data_dir="/var/inputs", + output_data_dir="/var/outputs", + inputs=kwtypes(a=float, b=float), + outputs=kwtypes(area=float, metadata=str), + image="ghcr.io/flyteorg/rawcontainers-haskell:v2", + command=[ + "./calculate-ellipse-area", + "{{.inputs.a}}", + 
"{{.inputs.b}}", + "/var/outputs", + ], +) + +calculate_ellipse_area_julia = ContainerTask( + name="ellipse-area-metadata-julia", + input_data_dir="/var/inputs", + output_data_dir="/var/outputs", + inputs=kwtypes(a=float, b=float), + outputs=kwtypes(area=float, metadata=str), + image="ghcr.io/flyteorg/rawcontainers-julia:v2", + command=[ + "julia", + "calculate-ellipse-area.jl", + "{{.inputs.a}}", + "{{.inputs.b}}", + "/var/outputs", + ], +) + + +@task +def report_all_calculated_areas( + area_shell: float, + metadata_shell: str, + area_python: float, + metadata_python: str, + area_r: float, + metadata_r: str, + area_haskell: float, + metadata_haskell: str, + area_julia: float, + metadata_julia: str, +): + logger.info(f"shell: area={area_shell}, metadata={metadata_shell}") + logger.info(f"python: area={area_python}, metadata={metadata_python}") + logger.info(f"r: area={area_r}, metadata={metadata_r}") + logger.info(f"haskell: area={area_haskell}, metadata={metadata_haskell}") + logger.info(f"julia: area={area_julia}, metadata={metadata_julia}") +``` + ++++ {"lines_to_next_cell": 0} + +As can be seen in this example, `ContainerTask`s can be interacted with like normal Python functions, whose inputs +correspond to the declared input variables. All data returned by the tasks are consumed and logged by a Flyte task. 
+ +```{code-cell} +:lines_to_next_cell: 2 + +@workflow +def wf(a: float, b: float): + # Calculate area in all languages + area_shell, metadata_shell = calculate_ellipse_area_shell(a=a, b=b) + area_python, metadata_python = calculate_ellipse_area_python(a=a, b=b) + area_r, metadata_r = calculate_ellipse_area_r(a=a, b=b) + area_haskell, metadata_haskell = calculate_ellipse_area_haskell(a=a, b=b) + area_julia, metadata_julia = calculate_ellipse_area_julia(a=a, b=b) + + # Report on all results in a single task to simplify comparison + report_all_calculated_areas( + area_shell=area_shell, + metadata_shell=metadata_shell, + area_python=area_python, + metadata_python=metadata_python, + area_r=area_r, + metadata_r=metadata_r, + area_haskell=area_haskell, + metadata_haskell=metadata_haskell, + area_julia=area_julia, + metadata_julia=metadata_julia, + ) +``` + +One of the benefits of raw container tasks is that Flytekit does not need to be installed in the target container. + +:::{note} +Raw containers cannot be run locally at the moment. 
+::: + +## Scripts + +The contents of each script specified in the `ContainerTask` are as follows: + +### calculate-ellipse-area.sh + +```{literalinclude} raw-containers-supporting-files/per-language/shell/calculate-ellipse-area.sh +:language: shell +``` + +### calculate-ellipse-area.py + +```{literalinclude} raw-containers-supporting-files/per-language/python/calculate-ellipse-area.py +:language: python +``` + +### calculate-ellipse-area.R + +```{literalinclude} raw-containers-supporting-files/per-language/r/calculate-ellipse-area.R +:language: r +``` + +### calculate-ellipse-area.hs + +```{literalinclude} raw-containers-supporting-files/per-language/haskell/calculate-ellipse-area.hs +:language: haskell +``` + +### calculate-ellipse-area.jl + +```{literalinclude} raw-containers-supporting-files/per-language/julia/calculate-ellipse-area.jl +:language: julia +``` diff --git a/docs/user_guide/data_types_and_io/accessing_attributes.md b/docs/user_guide/data_types_and_io/accessing_attributes.md new file mode 100644 index 0000000000..42706a3d1d --- /dev/null +++ b/docs/user_guide/data_types_and_io/accessing_attributes.md @@ -0,0 +1,176 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(attribute_access)= + +# Accessing attributes + +```{eval-rst} +.. tags:: Basic +``` + +You can directly access attributes on output promises for lists, dicts, dataclasses and combinations of these types in Flyte. +This lets you pass output attributes directly within workflows, +which makes it easier to work with complex data structures. + +To begin, import the required dependencies and define a common task for subsequent use. 
+ +```{code-cell} +from dataclasses import dataclass + +from dataclasses_json import dataclass_json +from flytekit import task, workflow + + +@task +def print_message(message: str): + print(message) + return +``` + ++++ {"lines_to_next_cell": 0} + +## List +You can access an output list using index notation. + +:::{important} +Flyte currently does not support output promise access through list slicing. +::: + +```{code-cell} +@task +def list_task() -> list[str]: + return ["apple", "banana"] + + +@workflow +def list_wf(): + items = list_task() + first_item = items[0] + print_message(message=first_item) +``` + ++++ {"lines_to_next_cell": 0} + +## Dictionary +Access the output dictionary by specifying the key. + +```{code-cell} +@task +def dict_task() -> dict[str, str]: + return {"fruit": "banana"} + + +@workflow +def dict_wf(): + fruit_dict = dict_task() + print_message(message=fruit_dict["fruit"]) +``` + ++++ {"lines_to_next_cell": 0} + +## Data class +Directly access an attribute of a dataclass. + +```{code-cell} +@dataclass_json +@dataclass +class Fruit: + name: str + + +@task +def dataclass_task() -> Fruit: + return Fruit(name="banana") + + +@workflow +def dataclass_wf(): + fruit_instance = dataclass_task() + print_message(message=fruit_instance.name) +``` + ++++ {"lines_to_next_cell": 0} + +## Complex type +Combinations of list, dict and dataclass also work effectively. 
+ +```{code-cell} +@task +def advance_task() -> (dict[str, list[str]], list[dict[str, str]], dict[str, Fruit]): + return {"fruits": ["banana"]}, [{"fruit": "banana"}], {"fruit": Fruit(name="banana")} + + +@task +def print_list(fruits: list[str]): + print(fruits) + + +@task +def print_dict(fruit_dict: dict[str, str]): + print(fruit_dict) + + +@workflow +def advanced_workflow(): + dictionary_list, list_dict, dict_dataclass = advance_task() + print_message(message=dictionary_list["fruits"][0]) + print_message(message=list_dict[0]["fruit"]) + print_message(message=dict_dataclass["fruit"].name) + + print_list(fruits=dictionary_list["fruits"]) + print_dict(fruit_dict=list_dict[0]) +``` + ++++ {"lines_to_next_cell": 0} + +You can run all the workflows locally as follows: + +```{code-cell} +:lines_to_next_cell: 2 + +if __name__ == "__main__": + list_wf() + dict_wf() + dataclass_wf() + advanced_workflow() +``` + +## Failure scenario +The following workflow fails because it attempts to access indices and keys that are out of range: + +```python +from flytekit import WorkflowFailurePolicy + + +@task +def failed_task() -> (list[str], dict[str, str], Fruit): + return ["apple", "banana"], {"fruit": "banana"}, Fruit(name="banana") + + +@workflow( + # The workflow remains unaffected if one of the nodes encounters an error, as long as other executable nodes are still available + failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE +) +def failed_workflow(): + fruits_list, fruit_dict, fruit_instance = failed_task() + print_message(message=fruits_list[100]) # Accessing an index that doesn't exist + print_message(message=fruit_dict["fruits"]) # Accessing a non-existent key + print_message(message=fruit_instance.fruit) # Accessing a non-existent param +``` diff --git a/docs/user_guide/data_types_and_io/dataclass.md b/docs/user_guide/data_types_and_io/dataclass.md new file mode 100644 index 0000000000..fdb9f1d992 --- /dev/null +++ 
b/docs/user_guide/data_types_and_io/dataclass.md @@ -0,0 +1,172 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(dataclass)= + +# Dataclass + +```{eval-rst} +.. tags:: Basic +``` + +When you have multiple values that you want to send across Flyte entities, you can use a `dataclass`. + +Flytekit uses the [Mashumaro library](https://github.com/Fatal1ty/mashumaro) +to serialize and deserialize dataclasses. + +:::{important} +If you're using a Flytekit version below v1.10, you'll need to decorate your dataclass with `@dataclass_json` using +`from dataclasses_json import dataclass_json` instead of inheriting from Mashumaro's `DataClassJSONMixin`. +::: + +To begin, import the necessary dependencies. + +```{code-cell} +import os +import tempfile +from dataclasses import dataclass + +import pandas as pd +from flytekit import task, workflow +from flytekit.types.directory import FlyteDirectory +from flytekit.types.file import FlyteFile +from flytekit.types.structured import StructuredDataset +from mashumaro.mixins.json import DataClassJSONMixin +``` + ++++ {"lines_to_next_cell": 0} + +## Python types +We define a `dataclass` with `int`, `str` and `dict` as the data types. + +```{code-cell} +@dataclass +class Datum(DataClassJSONMixin): + x: int + y: str + z: dict[int, str] +``` + ++++ {"lines_to_next_cell": 0} + +You can send a `dataclass` between different tasks written in various languages, and input it through the Flyte console as raw JSON. + +:::{note} +All variables in a data class should be **annotated with their type**. Failure to do so will result in an error. +::: + +Once declared, a dataclass can be returned as an output or accepted as an input. 
+ +```{code-cell} +@task +def stringify(s: int) -> Datum: + """ + A dataclass return will be treated as a single complex JSON return. + """ + return Datum(x=s, y=str(s), z={s: str(s)}) + + +@task +def add(x: Datum, y: Datum) -> Datum: + """ + Flytekit automatically converts the provided JSON into a data class. + If the structures don't match, it triggers a runtime failure. + """ + x.z.update(y.z) + return Datum(x=x.x + y.x, y=x.y + y.y, z=x.z) +``` + ++++ {"lines_to_next_cell": 0} + +## Flyte types +We also define a data class that accepts {std:ref}`StructuredDataset `, +{std:ref}`FlyteFile ` and {std:ref}`FlyteDirectory `. + +```{code-cell} +@dataclass +class FlyteTypes(DataClassJSONMixin): + dataframe: StructuredDataset + file: FlyteFile + directory: FlyteDirectory + + +@task +def upload_data() -> FlyteTypes: + """ + Flytekit will upload FlyteFile, FlyteDirectory and StructuredDataset to the blob store, + such as GCP or S3. + """ + # 1. StructuredDataset + df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]}) + + # 2. FlyteDirectory + temp_dir = tempfile.mkdtemp(prefix="flyte-") + df.to_parquet(temp_dir + "/df.parquet") + + # 3. FlyteFile + file_path = tempfile.NamedTemporaryFile(delete=False) + file_path.write(b"Hello, World!") + + fs = FlyteTypes( + dataframe=StructuredDataset(dataframe=df), + file=FlyteFile(file_path.name), + directory=FlyteDirectory(temp_dir), + ) + return fs + + +@task +def download_data(res: FlyteTypes): + assert pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]}).equals(res.dataframe.open(pd.DataFrame).all()) + f = open(res.file, "r") + assert f.read() == "Hello, World!" + assert os.listdir(res.directory) == ["df.parquet"] +``` + ++++ {"lines_to_next_cell": 0} + +A data class supports the usage of data associated with Python types, data classes, +flyte file, flyte directory and structured dataset. + +We define a workflow that calls the tasks created above. 
+ +```{code-cell} +@workflow +def dataclass_wf(x: int, y: int) -> (Datum, FlyteTypes): + o1 = add(x=stringify(s=x), y=stringify(s=y)) + o2 = upload_data() + download_data(res=o2) + return o1, o2 +``` + ++++ {"lines_to_next_cell": 0} + +You can run the workflow locally as follows: + +```{code-cell} +if __name__ == "__main__": + dataclass_wf(x=10, y=20) +``` + +To trigger a task that accepts a dataclass as an input with `pyflyte run`, you can provide a JSON file as an input: +``` +pyflyte run \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py \ + add --x dataclass_input.json --y dataclass_input.json +``` diff --git a/docs/user_guide/data_types_and_io/enum_type.md b/docs/user_guide/data_types_and_io/enum_type.md new file mode 100644 index 0000000000..b4727c508f --- /dev/null +++ b/docs/user_guide/data_types_and_io/enum_type.md @@ -0,0 +1,100 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +# Enum type + +```{eval-rst} +.. tags:: Basic +``` + +At times, you might need to limit the acceptable values for inputs or outputs to a predefined set. +This common requirement is usually met by using Enum types in programming languages. + +You can create a Python Enum type and utilize it as an input or output for a task. +Flytekit will automatically convert it and constrain the inputs and outputs to the predefined set of values. + +:::{important} +Currently, only string values are supported as valid enum values. +Flyte assumes the first value in the list as the default, and Enum types cannot be optional. +Therefore, when defining enums, it's important to design them with the first value as a valid default. 
+::: + +To begin, import the dependencies. + +```{code-cell} +from enum import Enum + +from flytekit import task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +We define an enum and a simple coffee maker workflow that accepts an order and brews coffee ☕️ accordingly. +The assumption is that the coffee maker only understands enum inputs. + +```{code-cell} +class Coffee(Enum): + ESPRESSO = "espresso" + AMERICANO = "americano" + LATTE = "latte" + CAPPUCCINO = "cappuccino" + + +@task +def take_order(coffee: str) -> Coffee: + return Coffee(coffee) + + +@task +def prep_order(coffee_enum: Coffee) -> str: + return f"Preparing {coffee_enum.value} ..." + + +@workflow +def coffee_maker(coffee: str) -> str: + coffee_enum = take_order(coffee=coffee) + return prep_order(coffee_enum=coffee_enum) +``` + ++++ {"lines_to_next_cell": 0} + +The workflow can also accept an enum value. + +```{code-cell} +@workflow +def coffee_maker_enum(coffee_enum: Coffee) -> str: + return prep_order(coffee_enum=coffee_enum) +``` + ++++ {"lines_to_next_cell": 0} + +You can send a string to the `coffee_maker_enum` workflow during its execution, like this: +``` +pyflyte run \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/enum_type.py \ + coffee_maker_enum --coffee_enum="latte" +``` + +You can run the workflows locally. 
+ +```{code-cell} +if __name__ == "__main__": + print(coffee_maker(coffee="latte")) + print(coffee_maker_enum(coffee_enum=Coffee.LATTE)) +``` diff --git a/docs/user_guide/data_types_and_io/flytedirectory.md b/docs/user_guide/data_types_and_io/flytedirectory.md new file mode 100644 index 0000000000..6dd75ed159 --- /dev/null +++ b/docs/user_guide/data_types_and_io/flytedirectory.md @@ -0,0 +1,199 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(folder)= + +# FlyteDirectory + +```{eval-rst} +.. tags:: Data, Basic +``` + +In addition to files, folders are another fundamental operating system primitive. +Flyte supports folders in the form of +[multi-part blobs](https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/core/types.proto#L73). + +To begin, import the libraries. + +```{code-cell} +import csv +import os +import urllib.request +from collections import defaultdict +from pathlib import Path +from typing import List + +import flytekit +from flytekit import task, workflow +from flytekit.types.directory import FlyteDirectory +``` + ++++ {"lines_to_next_cell": 0} + +Building upon the previous example demonstrated in the {std:ref}`file ` section, +let's continue by considering the normalization of columns in a CSV file. + +The following task downloads a list of URLs pointing to CSV files +and returns the folder path in a `FlyteDirectory` object. 
+ +```{code-cell} +@task +def download_files(csv_urls: List[str]) -> FlyteDirectory: + working_dir = flytekit.current_context().working_directory + local_dir = Path(os.path.join(working_dir, "csv_files")) + local_dir.mkdir(exist_ok=True) + + # get the number of digits needed to preserve the order of files in the local directory + zfill_len = len(str(len(csv_urls))) + for idx, remote_location in enumerate(csv_urls): + local_image = os.path.join( + # prefix the file name with the index location of the file in the original csv_urls list + local_dir, + f"{str(idx).zfill(zfill_len)}_{os.path.basename(remote_location)}", + ) + urllib.request.urlretrieve(remote_location, local_image) + return FlyteDirectory(path=str(local_dir)) +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +You can annotate a `FlyteDirectory` when you want to download or upload the contents of the directory in batches. +For example, + +```{code-block} +@task +def t1(directory: Annotated[FlyteDirectory, BatchSize(10)]) -> Annotated[FlyteDirectory, BatchSize(100)]: + ... + return FlyteDirectory(...) +``` + +Flytekit efficiently downloads files from the specified input directory in 10-file chunks. +It then loads these chunks into memory before writing them to the local disk. +The process repeats for subsequent sets of 10 files. +Similarly, for outputs, Flytekit uploads the resulting directory in chunks of 100. +::: + +We define a helper function to normalize the columns in-place. + +:::{note} +This is a plain Python function that will be called in a subsequent Flyte task. This example +demonstrates how Flyte tasks are simply entrypoints of execution, which can themselves call +other functions and routines that are written in pure Python. 
+::: + +```{code-cell} +def normalize_columns( + local_csv_file: str, + column_names: List[str], + columns_to_normalize: List[str], +): + # read the data from the raw csv file + parsed_data = defaultdict(list) + with open(local_csv_file, newline="\n") as input_file: + reader = csv.DictReader(input_file, fieldnames=column_names) + for row in (x for i, x in enumerate(reader) if i > 0): + for column in columns_to_normalize: + parsed_data[column].append(float(row[column].strip())) + + # normalize the data + normalized_data = defaultdict(list) + for colname, values in parsed_data.items(): + mean = sum(values) / len(values) + std = (sum([(x - mean) ** 2 for x in values]) / len(values)) ** 0.5 + normalized_data[colname] = [(x - mean) / std for x in values] + + # overwrite the csv file with the normalized columns + with open(local_csv_file, mode="w") as output_file: + writer = csv.DictWriter(output_file, fieldnames=columns_to_normalize) + writer.writeheader() + for row in zip(*normalized_data.values()): + writer.writerow({k: row[i] for i, k in enumerate(columns_to_normalize)}) +``` + ++++ {"lines_to_next_cell": 0} + +We then define a task that accepts the previously downloaded folder, along with some metadata about the +column names of each file in the directory and the column names that we want to normalize. 
+ +```{code-cell} +@task +def normalize_all_files( + csv_files_dir: FlyteDirectory, + columns_metadata: List[List[str]], + columns_to_normalize_metadata: List[List[str]], +) -> FlyteDirectory: + for local_csv_file, column_names, columns_to_normalize in zip( + # make sure we sort the files in the directory to preserve the original order of the csv urls + [os.path.join(csv_files_dir, x) for x in sorted(os.listdir(csv_files_dir))], + columns_metadata, + columns_to_normalize_metadata, + ): + normalize_columns(local_csv_file, column_names, columns_to_normalize) + return FlyteDirectory(path=csv_files_dir.path) +``` + ++++ {"lines_to_next_cell": 0} + +Compose all of the above tasks into a workflow. This workflow accepts a list +of URL strings pointing to a remote location containing a CSV file, a list of column names +associated with each CSV file, and a list of columns that we want to normalize. + +```{code-cell} +@workflow +def download_and_normalize_csv_files( + csv_urls: List[str], + columns_metadata: List[List[str]], + columns_to_normalize_metadata: List[List[str]], +) -> FlyteDirectory: + directory = download_files(csv_urls=csv_urls) + return normalize_all_files( + csv_files_dir=directory, + columns_metadata=columns_metadata, + columns_to_normalize_metadata=columns_to_normalize_metadata, + ) +``` + ++++ {"lines_to_next_cell": 0} + +You can run the workflow locally as follows: + +```{code-cell} +if __name__ == "__main__": + csv_urls = [ + "https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", + "https://people.sc.fsu.edu/~jburkardt/data/csv/faithful.csv", + ] + columns_metadata = [ + ["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], + ["Index", "Eruption length (mins)", "Eruption wait (mins)"], + ] + columns_to_normalize_metadata = [ + ["Age"], + ["Eruption length (mins)"], + ] + + print(f"Running {__file__} main...") + directory = download_and_normalize_csv_files( + csv_urls=csv_urls, + columns_metadata=columns_metadata, + 
columns_to_normalize_metadata=columns_to_normalize_metadata, + ) + print(f"Running download_and_normalize_csv_files on {csv_urls}: " f"{directory}") +``` diff --git a/docs/user_guide/data_types_and_io/flytefile.md b/docs/user_guide/data_types_and_io/flytefile.md new file mode 100644 index 0000000000..474cad4041 --- /dev/null +++ b/docs/user_guide/data_types_and_io/flytefile.md @@ -0,0 +1,168 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(file)= + +# FlyteFile + +```{eval-rst} +.. tags:: Data, Basic +``` + +Files are one of the most fundamental entities that users of Python work with, +and they are fully supported by Flyte. In the IDL, they are known as +[Blob](https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/core/literals.proto#L33) +literals which are backed by the +[blob type](https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/core/types.proto#L47). + +Let's assume our mission here is pretty simple. We download a few CSV file +links, read them with the python built-in {py:class}`csv.DictReader` function, +normalize some pre-specified columns, and output the normalized columns to +another csv file. + +First, import the libraries. + +```{code-cell} +import csv +import os +from collections import defaultdict +from typing import List + +import flytekit +from flytekit import task, workflow +from flytekit.types.file import FlyteFile +``` + ++++ {"lines_to_next_cell": 0} + +Define a task that accepts {py:class}`~flytekit.types.file.FlyteFile` as an input. +The following is a task that accepts a `FlyteFile`, a list of column names, +and a list of column names to normalize. The task then outputs a CSV file +containing only the normalized columns. 
For this example, we use z-score normalization, +which involves mean-centering and standard-deviation-scaling. + +:::{note} +The `FlyteFile` literal can be scoped with a string, which gets inserted +into the format of the Blob type ("jpeg" is the string in +`FlyteFile[typing.TypeVar("jpeg")]`). The format is entirely optional, +and if not specified, defaults to `""`. +Predefined aliases for commonly used Flyte file formats are also available. +You can find them [here](https://github.com/flyteorg/flytekit/blob/master/flytekit/types/file/__init__.py). +::: + +```{code-cell} +@task +def normalize_columns( + csv_url: FlyteFile, + column_names: List[str], + columns_to_normalize: List[str], + output_location: str, +) -> FlyteFile: + # read the data from the raw csv file + parsed_data = defaultdict(list) + with open(csv_url, newline="\n") as input_file: + reader = csv.DictReader(input_file, fieldnames=column_names) + next(reader) # Skip header + for row in reader: + for column in columns_to_normalize: + parsed_data[column].append(float(row[column].strip())) + + # normalize the data + normalized_data = defaultdict(list) + for colname, values in parsed_data.items(): + mean = sum(values) / len(values) + std = (sum([(x - mean) ** 2 for x in values]) / len(values)) ** 0.5 + normalized_data[colname] = [(x - mean) / std for x in values] + + # write to local path + out_path = os.path.join( + flytekit.current_context().working_directory, + f"normalized-{os.path.basename(csv_url.path).rsplit('.')[0]}.csv", + ) + with open(out_path, mode="w") as output_file: + writer = csv.DictWriter(output_file, fieldnames=columns_to_normalize) + writer.writeheader() + for row in zip(*normalized_data.values()): + writer.writerow({k: row[i] for i, k in enumerate(columns_to_normalize)}) + + if output_location: + return FlyteFile(path=out_path, remote_path=output_location) + else: + return FlyteFile(path=out_path) +``` + ++++ {"lines_to_next_cell": 0} + +When the CSV URL is sent to the task, the 
Flytekit engine translates it into a `FlyteFile` object on the local +drive (but doesn't download it). Calling the `download()` method should trigger the download, and the `path` +attribute lets you `open` the file. + +If the `output_location` argument is specified, it will be passed to the `remote_path` argument of `FlyteFile`, +which will use that path as the storage location instead of a random location (Flyte's object store). + +When this task finishes, the Flytekit engine returns the `FlyteFile` instance, uploads the file to the location, and +creates a blob literal pointing to it. + +Lastly, define a workflow. The `normalize_csv_file` workflow has an `output_location` argument which is passed +to the `output_location` input of the task. If it's not an empty string, the task attempts to +upload its file to that location. + +```{code-cell} +@workflow +def normalize_csv_file( + csv_url: FlyteFile, + column_names: List[str], + columns_to_normalize: List[str], + output_location: str = "", +) -> FlyteFile: + return normalize_columns( + csv_url=csv_url, + column_names=column_names, + columns_to_normalize=columns_to_normalize, + output_location=output_location, + ) +``` + ++++ {"lines_to_next_cell": 0} + +You can run the workflow locally as follows: + +```{code-cell} +if __name__ == "__main__": + default_files = [ + ( + "https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", + ["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], + ["Age"], + ), + ( + "https://people.sc.fsu.edu/~jburkardt/data/csv/faithful.csv", + ["Index", "Eruption length (mins)", "Eruption wait (mins)"], + ["Eruption length (mins)"], + ), + ] + print(f"Running {__file__} main...") + for index, (csv_url, column_names, columns_to_normalize) in enumerate(default_files): + normalized_columns = normalize_csv_file( + csv_url=csv_url, + column_names=column_names, + columns_to_normalize=columns_to_normalize, + ) + print(f"Running normalize_csv_file workflow on {csv_url}: " 
f"{normalized_columns}") +``` diff --git a/docs/user_guide/data_types_and_io/index.md b/docs/user_guide/data_types_and_io/index.md new file mode 100644 index 0000000000..f419c99bd3 --- /dev/null +++ b/docs/user_guide/data_types_and_io/index.md @@ -0,0 +1,152 @@ +(data_types_and_io)= + +# Data Types and IO + +Flyte being a data-aware orchestration platform, types play a vital role within it. +This section provides an introduction to the wide range of data types that Flyte supports. +These types serve a dual purpose by not only validating the data but also enabling seamless +transfer of data between local and cloud storage. +They enable: + +- Data lineage +- Memoization +- Auto parallelization +- Simplifying access to data +- Auto generated CLI and launch UI + +For a more comprehensive understanding of how Flyte manages data, refer to the +{std:ref}`Understand How Flyte Handles Data ` guide. + +(python_to_flyte_type_mapping)= + +## Mapping Python to Flyte types + +Flytekit automatically translates most Python types into Flyte types. +Here's a breakdown of these mappings: + +```{eval-rst} +.. list-table:: + :widths: auto + :header-rows: 1 + + * - Python Type + - Flyte Type + - Conversion + - Comment + * - ``int`` + - ``Integer`` + - Automatic + - Use Python 3 type hints. + * - ``float`` + - ``Float`` + - Automatic + - Use Python 3 type hints. + * - ``str`` + - ``String`` + - Automatic + - Use Python 3 type hints. + * - ``bool`` + - ``Boolean`` + - Automatic + - Use Python 3 type hints. + * - ``bytes``/``bytearray`` + - ``Binary`` + - Not Supported + - You have the option to employ your own custom type transformer. + * - ``complex`` + - NA + - Not Supported + - You have the option to employ your own custom type transformer. + * - ``datetime.timedelta`` + - ``Duration`` + - Automatic + - Use Python 3 type hints. + * - ``datetime.datetime`` + - ``Datetime`` + - Automatic + - Use Python 3 type hints. 
+ * - ``datetime.date`` + - ``Datetime`` + - Automatic + - Use Python 3 type hints. + * - ``typing.List[T]`` / ``list[T]`` + - ``Collection [T]`` + - Automatic + - Use ``typing.List[T]`` or ``list[T]``, where ``T`` can represent one of the other supported types listed in the table. + * - ``typing.Iterator[T]`` + - ``Collection [T]`` + - Automatic + - Use ``typing.Iterator[T]``, where ``T`` can represent one of the other supported types listed in the table. + * - File / file-like / ``os.PathLike`` + - ``FlyteFile`` + - Automatic + - If you're using ``file`` or ``os.PathLike`` objects, Flyte will default to the binary protocol for the file. + When using ``FlyteFile["protocol"]``, it is assumed that the file is in the specified protocol, such as 'jpg', 'png', 'hdf5', etc. + * - Directory + - ``FlyteDirectory`` + - Automatic + - When using ``FlyteDirectory["protocol"]``, it is assumed that all the files belong to the specified protocol. + * - ``typing.Dict[str, V]`` / ``dict[str, V]`` + - ``Map[str, V]`` + - Automatic + - Use ``typing.Dict[str, V]`` or ``dict[str, V]``, where ``V`` can be one of the other supported types in the table, + including a nested dictionary. + * - ``dict`` + - JSON (``struct.pb``) + - Automatic + - Use ``dict``. It's assumed that the untyped dictionary can be converted to JSON. + However, this may not always be possible and could result in a ``RuntimeError``. + * - ``@dataclass`` + - ``Struct`` + - Automatic + - The class should be a pure value class that inherits from Mashumaro's DataClassJSONMixin, + and be annotated with the ``@dataclass`` decorator. + * - ``np.ndarray`` + - File + - Automatic + - Use ``np.ndarray`` as a type hint. + * - ``pandas.DataFrame`` + - Structured Dataset + - Automatic + - Use ``pandas.DataFrame`` as a type hint. Pandas column types aren't preserved. + * - ``pyspark.DataFrame`` + - Structured Dataset + - To utilize the type, install the ``flytekitplugins-spark`` plugin. + - Use ``pyspark.DataFrame`` as a type hint. 
+ * - ``pydantic.BaseModel`` + - ``Map`` + - To utilize the type, install the ``flytekitplugins-pydantic`` plugin. + - Use ``pydantic.BaseModel`` as a type hint. + * - ``torch.Tensor`` / ``torch.nn.Module`` + - File + - To utilize the type, install the ``torch`` library. + - Use ``torch.Tensor`` or ``torch.nn.Module`` as a type hint, and you can use their derived types. + * - ``tf.keras.Model`` + - File + - To utilize the type, install the ``tensorflow`` library. + - Use ``tf.keras.Model`` and its derived types. + * - ``sklearn.base.BaseEstimator`` + - File + - To utilize the type, install the ``scikit-learn`` library. + - Use ``sklearn.base.BaseEstimator`` and its derived types. + * - User defined types + - Any + - Custom transformers + - The ``FlytePickle`` transformer is the default option, but you can also define custom transformers. + **For instructions on building custom type transformers, please refer to :ref:`this section `**. +``` + +```{toctree} +:maxdepth: -1 +:name: data_types_and_io_toc +:hidden: + +flytefile +flytedirectory +structureddataset +dataclass +accessing_attributes +pytorch_type +enum_type +pickle_type +``` diff --git a/docs/user_guide/data_types_and_io/pickle_type.md b/docs/user_guide/data_types_and_io/pickle_type.md new file mode 100644 index 0000000000..b5cbb89f5a --- /dev/null +++ b/docs/user_guide/data_types_and_io/pickle_type.md @@ -0,0 +1,131 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(pickle_type)= + +# Pickle type + +```{eval-rst} +.. tags:: Basic +``` + +Flyte enforces type safety by utilizing type information for compiling tasks and workflows, +enabling various features such as static analysis and conditional branching. 
+ +However, we also strive to offer flexibility to end-users so they don't have to invest heavily +in understanding their data structures upfront before experiencing the value Flyte has to offer. + +Flyte supports the `FlytePickle` transformer, which converts any unrecognized type hint into `FlytePickle`, +enabling the serialization/deserialization of Python values to/from a pickle file. + +:::{important} +Pickle can only be used to send objects between the exact same Python version. +For optimal performance, it's advisable to either employ Python types that are supported by Flyte +or register a custom transformer, as using pickle types can result in lower performance. +::: + +This example demonstrates how you can utilize custom objects without registering a transformer. + +```{code-cell} +from flytekit import task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +`Superhero` represents a user-defined complex type that can be serialized to a pickle file by Flytekit +and transferred between tasks as both input and output data. + +:::{note} +Alternatively, you can {ref}`turn this object into a dataclass ` for improved performance. +We have used a simple object here for demonstration purposes. +::: + +```{code-cell} +class Superhero: + def __init__(self, name, power): + self.name = name + self.power = power + + +@task +def welcome_superhero(name: str, power: str) -> Superhero: + return Superhero(name, power) + + +@task +def greet_superhero(superhero: Superhero) -> str: + return f"👋 Hello {superhero.name}! Your superpower is {superhero.power}." + + +@workflow +def superhero_wf(name: str = "Thor", power: str = "Flight") -> str: + superhero = welcome_superhero(name=name, power=power) + return greet_superhero(superhero=superhero) +``` + ++++ {"lines_to_next_cell": 0} + +## Batch size + +By default, if the list subtype is unrecognized, a single pickle file is generated. 
+To optimize serialization and deserialization performance for scenarios involving a large number of items +or significant list elements, you can specify a batch size. +This feature allows for the processing of each batch as a separate pickle file. +The following example demonstrates how to set the batch size. + +```{code-cell} +from typing import Iterator + +from flytekit.types.pickle.pickle import BatchSize +from typing_extensions import Annotated + + +@task +def welcome_superheroes(names: list[str], powers: list[str]) -> Annotated[list[Superhero], BatchSize(3)]: + return [Superhero(name, power) for name, power in zip(names, powers)] + + +@task +def greet_superheroes(superheroes: list[Superhero]) -> Iterator[str]: + for superhero in superheroes: + yield f"👋 Hello {superhero.name}! Your superpower is {superhero.power}." + + +@workflow +def superheroes_wf( + names: list[str] = ["Thor", "Spiderman", "Hulk"], + powers: list[str] = ["Flight", "Surface clinger", "Shapeshifting"], +) -> Iterator[str]: + superheroes = welcome_superheroes(names=names, powers=powers) + return greet_superheroes(superheroes=superheroes) +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +The `welcome_superheroes` task will generate two pickle files: one containing two superheroes and the other containing one superhero. 
+::: + +You can run the workflows locally as follows: + +```{code-cell} +if __name__ == "__main__": + print(f"Superhero wf: {superhero_wf()}") + print(f"Superhero(es) wf: {superheroes_wf()}") +``` diff --git a/docs/user_guide/data_types_and_io/pytorch_type.md b/docs/user_guide/data_types_and_io/pytorch_type.md new file mode 100644 index 0000000000..4e5715d128 --- /dev/null +++ b/docs/user_guide/data_types_and_io/pytorch_type.md @@ -0,0 +1,219 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(pytorch_type)= + +# PyTorch type + +```{eval-rst} +.. tags:: MachineLearning, Basic +``` + +Flyte advocates for the use of strongly-typed data to simplify the development of robust and testable pipelines. +In addition to its application in data engineering, Flyte is primarily used for machine learning. +To streamline the communication between Flyte tasks, particularly when dealing with tensors and models, +we have introduced support for PyTorch types. + +## Tensors and modules + +At times, you may find the need to pass tensors and modules (models) within your workflow. +Without native support for PyTorch tensors and modules, Flytekit relies on {std:ref}`pickle ` for serializing +and deserializing these entities, as well as any unknown types. +However, this approach isn't the most efficient. As a result, we've integrated PyTorch's +serialization and deserialization support into the Flyte type system. 
+ +```{code-cell} +import torch +from flytekit import task, workflow + + +@task +def generate_tensor_2d() -> torch.Tensor: + return torch.tensor([[1.0, -1.0, 2], [1.0, -1.0, 9], [0, 7.0, 3]]) + + +@task +def reshape_tensor(tensor: torch.Tensor) -> torch.Tensor: + # convert 2D to 3D + tensor.unsqueeze_(-1) + return tensor.expand(3, 3, 2) + + +@task +def generate_module() -> torch.nn.Module: + bn = torch.nn.BatchNorm1d(3, track_running_stats=True) + return bn + + +@task +def get_model_weight(model: torch.nn.Module) -> torch.Tensor: + return model.weight + + +class MyModel(torch.nn.Module): + def __init__(self): + super(MyModel, self).__init__() + self.l0 = torch.nn.Linear(4, 2) + self.l1 = torch.nn.Linear(2, 1) + + def forward(self, input): + out0 = self.l0(input) + out0_relu = torch.nn.functional.relu(out0) + return self.l1(out0_relu) + + +@task +def get_l1() -> torch.nn.Module: + model = MyModel() + return model.l1 + + +@workflow +def pytorch_native_wf(): + reshape_tensor(tensor=generate_tensor_2d()) + get_model_weight(model=generate_module()) + get_l1() +``` + ++++ {"lines_to_next_cell": 0} + +Passing around tensors and modules is no more a hassle! + +## Checkpoint + +`PyTorchCheckpoint` is a specialized checkpoint used for serializing and deserializing PyTorch models. +It checkpoints `torch.nn.Module`'s state, hyperparameters and optimizer state. + +This module checkpoint differs from the standard checkpoint as it specifically captures the module's `state_dict`. +Therefore, when restoring the module, the module's `state_dict` must be used in conjunction with the actual module. +According to the PyTorch [docs](https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-load-entire-model), +it's recommended to store the module's `state_dict` rather than the module itself, +although the serialization should work in either case. 
+ +```{code-cell} +:lines_to_next_cell: 2 + +from dataclasses import dataclass + +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +from dataclasses_json import dataclass_json +from flytekit.extras.pytorch import PyTorchCheckpoint + + +@dataclass_json +@dataclass +class Hyperparameters: + epochs: int + loss: float + + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(3, 6, 5) + self.pool = nn.MaxPool2d(2, 2) + self.conv2 = nn.Conv2d(6, 16, 5) + self.fc1 = nn.Linear(16 * 5 * 5, 120) + self.fc2 = nn.Linear(120, 84) + self.fc3 = nn.Linear(84, 10) + + def forward(self, x): + x = self.pool(F.relu(self.conv1(x))) + x = self.pool(F.relu(self.conv2(x))) + x = x.view(-1, 16 * 5 * 5) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return x + + +@task +def generate_model(hyperparameters: Hyperparameters) -> PyTorchCheckpoint: + bn = Net() + optimizer = optim.SGD(bn.parameters(), lr=0.001, momentum=0.9) + return PyTorchCheckpoint(module=bn, hyperparameters=hyperparameters, optimizer=optimizer) + + +@task +def load(checkpoint: PyTorchCheckpoint): + new_bn = Net() + new_bn.load_state_dict(checkpoint["module_state_dict"]) + optimizer = optim.SGD(new_bn.parameters(), lr=0.001, momentum=0.9) + optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) + + +@workflow +def pytorch_checkpoint_wf(): + checkpoint = generate_model(hyperparameters=Hyperparameters(epochs=10, loss=0.1)) + load(checkpoint=checkpoint) +``` + +:::{note} +`PyTorchCheckpoint` supports serializing hyperparameters of types `dict`, `NamedTuple` and `dataclass`. +::: + +## Auto GPU to CPU and CPU to GPU conversion + +Not all PyTorch computations require a GPU. In some cases, it can be advantageous to transfer the +computation to a CPU, especially after training the model on a GPU. +To utilize the power of a GPU, the typical construct to use is: `to(torch.device("cuda"))`. 
+ +When working with GPU variables on a CPU, variables need to be transferred to the CPU using the `to(torch.device("cpu"))` construct. +However, this manual conversion recommended by PyTorch may not be very user-friendly. +To address this, we added support for automatic GPU to CPU conversion (and vice versa) for PyTorch types. + +```python +from flytekit import Resources +from typing import Tuple + + +@task(requests=Resources(gpu="1")) +def train() -> Tuple[PyTorchCheckpoint, torch.Tensor, torch.Tensor, torch.Tensor]: + ... + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + model = Model(X_train.shape[1]) + model.to(device) + ... + X_train, X_test = X_train.to(device), X_test.to(device) + y_train, y_test = y_train.to(device), y_test.to(device) + ... + return PyTorchCheckpoint(module=model), X_train, X_test, y_test + +@task +def predict( + checkpoint: PyTorchCheckpoint, + X_train: torch.Tensor, + X_test: torch.Tensor, + y_test: torch.Tensor, +): + new_bn = Model(X_train.shape[1]) + new_bn.load_state_dict(checkpoint["module_state_dict"]) + + accuracy_list = np.zeros((5,)) + + with torch.no_grad(): + y_pred = new_bn(X_test) + correct = (torch.argmax(y_pred, dim=1) == y_test).type(torch.FloatTensor) + accuracy_list = correct.mean() +``` + +The `predict` task will run on a CPU, and +the device conversion from GPU to CPU will be automatically handled by Flytekit. 
diff --git a/docs/user_guide/data_types_and_io/structureddataset.md b/docs/user_guide/data_types_and_io/structureddataset.md new file mode 100644 index 0000000000..e37c006d0e --- /dev/null +++ b/docs/user_guide/data_types_and_io/structureddataset.md @@ -0,0 +1,365 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(structured_dataset)= + +# StructuredDataset + +```{eval-rst} +.. tags:: Basic, DataFrame +``` +```{currentmodule} flytekit.types.structured +``` + +As with most type systems, Python has primitives, container types like maps and tuples, and support for user-defined structures. +However, while there's a rich variety of dataframe classes (Pandas, Spark, Pandera, etc.), there's no native Python type that +represents a dataframe in the abstract. This is the gap that the {py:class}`StructuredDataset` type is meant to fill. +It offers the following benefits: + +- Eliminate boilerplate code you would otherwise need to write to serialize/deserialize from file objects into dataframe instances, +- Eliminate additional inputs/outputs that convey metadata around the format of the tabular data held in those files, +- Add flexibility around how dataframe files are loaded, +- Offer a range of dataframe specific functionality - enforce compatibility of different schemas + (not only at compile time, but also runtime since type information is carried along in the literal), + store third-party schema definitions, and potentially in the future, render sample data, provide summary statistics, etc. + +This example demonstrates how to work with a structured dataset using Flyte entities. + +To begin, import the necessary dependencies. 
+ +```{code-cell} +import os +import typing + +import numpy as np +import pandas as pd +import pyarrow as pa +import pyarrow.parquet as pq +from flytekit import FlyteContext, StructuredDatasetType, kwtypes, task, workflow +from flytekit.models import literals +from flytekit.models.literals import StructuredDatasetMetadata +from flytekit.types.structured.structured_dataset import ( + PARQUET, + StructuredDataset, + StructuredDatasetDecoder, + StructuredDatasetEncoder, + StructuredDatasetTransformerEngine, +) +from typing_extensions import Annotated +``` + ++++ {"lines_to_next_cell": 0} + +Define a task that returns a Pandas DataFrame. +Flytekit will detect the Pandas dataframe return signature and +convert the interface for the task to the new {py:class}`StructuredDataset` type. + +```{code-cell} +@task +def generate_pandas_df(a: int) -> pd.DataFrame: + return pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [a, 22], "Height": [160, 178]}) +``` + ++++ {"lines_to_next_cell": 0} + +Using this simplest form, however, the user is not able to set the additional dataframe information alluded to above, + +- Column type information +- Serialized byte format +- Storage driver and location +- Additional third party schema information + +This is by design as we wanted the default case to suffice for the majority of use-cases, and to require +as few changes to existing code as possible. Specifying these is simple, however, and relies on Python variable annotations, +which is designed explicitly to supplement types with arbitrary metadata. + +## Column type information +If you want to extract a subset of actual columns of the dataframe and specify their types for type validation, +you can just specify the column names and their types in the structured dataset type annotation. + +First, initialize column types you want to extract from the `StructuredDataset`. 
+ +```{code-cell} +all_cols = kwtypes(Name=str, Age=int, Height=int) +col = kwtypes(Age=int) +``` + ++++ {"lines_to_next_cell": 0} + +Define a task that opens a structured dataset by calling `all()`. +When you invoke `all()` with ``pandas.DataFrame``, the Flyte engine downloads the parquet file on S3, and deserializes it to `pandas.DataFrame`. +Keep in mind that you can invoke ``open()`` with any dataframe type that's supported or added to structured dataset. +For instance, you can use ``pa.Table`` to convert the Pandas DataFrame to a PyArrow table. + +```{code-cell} +@task +def get_subset_pandas_df(df: Annotated[StructuredDataset, all_cols]) -> Annotated[StructuredDataset, col]: + df = df.open(pd.DataFrame).all() + df = pd.concat([df, pd.DataFrame([[30]], columns=["Age"])]) + return StructuredDataset(dataframe=df) + + +@workflow +def simple_sd_wf(a: int = 19) -> Annotated[StructuredDataset, col]: + pandas_df = generate_pandas_df(a=a) + return get_subset_pandas_df(df=pandas_df) +``` + ++++ {"lines_to_next_cell": 0} + +The code may result in runtime failures if the columns do not match. +The input ``df`` has ``Name``, ``Age`` and ``Height`` columns, whereas the output structured dataset will only have the ``Age`` column. + +## Serialized byte format +You can use a custom serialization format to serialize your dataframes. 
+Here's how you can register the Pandas to CSV handler, which is already available, +and enable the CSV serialization by annotating the structured dataset with the CSV format: + +```{code-cell} +from flytekit.types.structured import register_csv_handlers +from flytekit.types.structured.structured_dataset import CSV + +register_csv_handlers() + + +@task +def pandas_to_csv(df: pd.DataFrame) -> Annotated[StructuredDataset, CSV]: + return StructuredDataset(dataframe=df) + + +@workflow +def pandas_to_csv_wf() -> Annotated[StructuredDataset, CSV]: + pandas_df = generate_pandas_df(a=19) + return pandas_to_csv(df=pandas_df) +``` + ++++ {"lines_to_next_cell": 0} + +## Storage driver and location +By default, the data will be written to the same place that all other pointer-types (FlyteFile, FlyteDirectory, etc.) are written to. +This is controlled by the output data prefix option in Flyte which is configurable on multiple levels. + +That is to say, in the simple default case, Flytekit will, + +- Look up the default format for say, Pandas dataframes, +- Look up the default storage location based on the raw output prefix setting, +- Use these two settings to select an encoder and invoke it. + +So what's an encoder? To understand that, let's look into how the structured dataset plugin works. + +## Inner workings of a structured dataset plugin + +Two things need to happen with any dataframe instance when interacting with Flyte: + +- Serialization/deserialization from/to the Python instance to bytes (in the format specified above). +- Transmission/retrieval of those bits to/from somewhere. + +Each structured dataset plugin (called encoder or decoder) needs to perform both of these steps. +Flytekit decides which of the loaded plugins to invoke based on three attributes: + +- The byte format +- The storage location +- The Python type in the task or workflow signature. 
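The three-attribute selection described above can be pictured as a lookup keyed on (Python type, storage protocol, byte format). The following is only a toy sketch under stated assumptions: `HandlerRegistry`, `GENERIC_FORMAT`, and the handler names are hypothetical and are not Flytekit's `StructuredDatasetTransformerEngine` API. It also assumes that a handler registered with an empty format acts as a generic fallback when no more specific format is found.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry key: (Python type, storage protocol, byte format).
HandlerKey = Tuple[type, str, str]

# Assumption: an empty format string marks a format-agnostic (generic) handler.
GENERIC_FORMAT = ""


class HandlerRegistry:
    """Toy lookup table for illustration; not Flytekit's real engine."""

    def __init__(self) -> None:
        self._handlers: Dict[HandlerKey, str] = {}

    def register(self, python_type: type, protocol: str, fmt: str, name: str) -> None:
        # Store the handler name under the three-part key.
        self._handlers[(python_type, protocol, fmt)] = name

    def find(self, python_type: type, protocol: str, fmt: str) -> Optional[str]:
        # Prefer an exact match on all three attributes.
        exact = self._handlers.get((python_type, protocol, fmt))
        if exact is not None:
            return exact
        # Otherwise fall back to a handler registered with the generic format.
        return self._handlers.get((python_type, protocol, GENERIC_FORMAT))


registry = HandlerRegistry()
registry.register(dict, "s3", "parquet", "dict-to-parquet-on-s3")
registry.register(dict, "bq", GENERIC_FORMAT, "dict-to-bigquery")

print(registry.find(dict, "s3", "parquet"))  # exact match
print(registry.find(dict, "bq", "parquet"))  # generic-format fallback
print(registry.find(list, "s3", "parquet"))  # nothing registered for this type
```

Built-in types stand in for dataframe classes so the sketch runs without any third-party packages; a real registry would key on classes such as `pandas.DataFrame`.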
+ +These three keys uniquely identify which encoder (used when converting a dataframe in Python memory to a Flyte value, +e.g. when a task finishes and returns a dataframe) or decoder (used when hydrating a dataframe in memory from a Flyte value, +e.g. when a task starts and has a dataframe input) to invoke. + +However, it is awkward to require users to use `typing.Annotated` on every signature. +Therefore, Flytekit has a default byte-format for every Python dataframe type registered with Flytekit. + +## The `uri` argument + +A BigQuery `uri` lets you load and retrieve data from the cloud through the `uri` argument. +The `uri` comprises the bucket name and the filename prefixed with `gs://`. +If you specify a BigQuery `uri` for a structured dataset, BigQuery creates a table in the location specified by the `uri`. +The `uri` in a structured dataset reads from or writes to S3, GCP, BigQuery or any storage. + +Before writing a DataFrame to a BigQuery table: + +1. Create a [GCP account](https://cloud.google.com/docs/authentication/getting-started) and create a service account. +2. Create a project and add the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to your `.bashrc` file. +3. Create a dataset in your project. + +Here's how you can define a task that converts a pandas DataFrame to a BigQuery table: + +```python +@task +def pandas_to_bq() -> StructuredDataset: + df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]}) + return StructuredDataset(dataframe=df, uri="gs:///") +``` + +Replace `BUCKET_NAME` with the name of your GCS bucket and `FILE_NAME` with the name of the file the dataframe should be copied to. + +### Note that no format was specified in the structured dataset constructor, or in the signature. So how did the BigQuery encoder get invoked? +This is because the stock BigQuery encoder is loaded into Flytekit with an empty format. 
+The Flytekit `StructuredDatasetTransformerEngine` interprets that to mean that it is a generic encoder +(or decoder) and can work across formats, if a more specific format is not found. + +And here's how you can define a task that converts the BigQuery table to a pandas DataFrame: + +```python +@task +def bq_to_pandas(sd: StructuredDataset) -> pd.DataFrame: + return sd.open(pd.DataFrame).all() +``` + +:::{note} +Flyte creates a table inside the dataset in the project upon BigQuery query execution. +::: + +## How to return multiple dataframes from a task? +For instance, how would a task return say two dataframes: +- The first dataframe be written to BigQuery and serialized by one of their libraries, +- The second needs to be serialized to CSV and written at a specific location in GCS different from the generic pointer-data bucket + +If you want the default behavior (which is itself configurable based on which plugins are loaded), +you can work just with your current raw dataframe classes. + +```python +@task +def t1() -> typing.Tuple[StructuredDataset, StructuredDataset]: + ... + return StructuredDataset(df1, uri="bq://project:flyte.table"), \ + StructuredDataset(df2, uri="gs://auxiliary-bucket/data") +``` + +If you want to customize the Flyte interaction behavior, you'll need to wrap your dataframe in a `StructuredDataset` wrapper object. + +## How to define a custom structured dataset plugin? + +`StructuredDataset` ships with an encoder and a decoder that handles the conversion of a +Python value to a Flyte literal and vice-versa, respectively. +Here is a quick demo showcasing how one might build a NumPy encoder and decoder, +enabling the use of a 2D NumPy array as a valid type within structured datasets. + +### NumPy encoder + +Extend `StructuredDatasetEncoder` and implement the `encode` function. +The `encode` function converts NumPy array to an intermediate format (parquet file format in this case). 
+ +```{code-cell} +class NumpyEncodingHandler(StructuredDatasetEncoder): + def encode( + self, + ctx: FlyteContext, + structured_dataset: StructuredDataset, + structured_dataset_type: StructuredDatasetType, + ) -> literals.StructuredDataset: + df = typing.cast(np.ndarray, structured_dataset.dataframe) + name = ["col" + str(i) for i in range(len(df))] + table = pa.Table.from_arrays(df, name) + path = ctx.file_access.get_random_remote_directory() + local_dir = ctx.file_access.get_random_local_directory() + local_path = os.path.join(local_dir, f"{0:05}") + pq.write_table(table, local_path) + ctx.file_access.upload_directory(local_dir, path) + return literals.StructuredDataset( + uri=path, + metadata=StructuredDatasetMetadata(structured_dataset_type=StructuredDatasetType(format=PARQUET)), + ) +``` + ++++ {"lines_to_next_cell": 0} + +### NumPy decoder + +Extend {py:class}`StructuredDatasetDecoder` and implement the {py:meth}`~StructuredDatasetDecoder.decode` function. +The {py:meth}`~StructuredDatasetDecoder.decode` function converts the parquet file to a `numpy.ndarray`. + +```{code-cell} +class NumpyDecodingHandler(StructuredDatasetDecoder): + def decode( + self, + ctx: FlyteContext, + flyte_value: literals.StructuredDataset, + current_task_metadata: StructuredDatasetMetadata, + ) -> np.ndarray: + local_dir = ctx.file_access.get_random_local_directory() + ctx.file_access.get_data(flyte_value.uri, local_dir, is_multipart=True) + table = pq.read_table(local_dir) + return table.to_pandas().to_numpy() +``` + ++++ {"lines_to_next_cell": 0} + +### NumPy renderer + +Create a default renderer for numpy array, then Flytekit will use this renderer to +display schema of NumPy array on the Flyte deck. 
+ +```{code-cell} +class NumpyRenderer: + def to_html(self, df: np.ndarray) -> str: + assert isinstance(df, np.ndarray) + name = ["col" + str(i) for i in range(len(df))] + table = pa.Table.from_arrays(df, name) + return pd.DataFrame(table.schema).to_html(index=False) +``` + ++++ {"lines_to_next_cell": 0} + +In the end, register the encoder, decoder and renderer with the `StructuredDatasetTransformerEngine`. +Specify the Python type you want to register this encoder with (`np.ndarray`), +the storage engine to register this against (if not specified, it is assumed to work for all the storage backends), +and the byte format, which in this case is `PARQUET`. + +```{code-cell} +StructuredDatasetTransformerEngine.register(NumpyEncodingHandler(np.ndarray, None, PARQUET)) +StructuredDatasetTransformerEngine.register(NumpyDecodingHandler(np.ndarray, None, PARQUET)) +StructuredDatasetTransformerEngine.register_renderer(np.ndarray, NumpyRenderer()) +``` + ++++ {"lines_to_next_cell": 0} + +You can now use `numpy.ndarray` to deserialize the parquet file to NumPy and serialize a task's output (NumPy array) to a parquet file. + +```{code-cell} +@task +def generate_pd_df_with_str() -> pd.DataFrame: + return pd.DataFrame({"Name": ["Tom", "Joseph"]}) + + +@task +def to_numpy(sd: StructuredDataset) -> Annotated[StructuredDataset, None, PARQUET]: + numpy_array = sd.open(np.ndarray).all() + return StructuredDataset(dataframe=numpy_array) + + +@workflow +def numpy_wf() -> Annotated[StructuredDataset, None, PARQUET]: + return to_numpy(sd=generate_pd_df_with_str()) +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +`pyarrow` raises an `Expected bytes, got a 'int' object` error when the dataframe contains integers. 
+::: + +You can run the code locally as follows: + +```{code-cell} +if __name__ == "__main__": + sd = simple_sd_wf() + print(f"A simple Pandas dataframe workflow: {sd.open(pd.DataFrame).all()}") + print(f"Using CSV as the serializer: {pandas_to_csv_wf().open(pd.DataFrame).all()}") + print(f"NumPy encoder and decoder: {numpy_wf().open(np.ndarray).all()}") +``` diff --git a/docs/user_guide/development_lifecycle/agents.md b/docs/user_guide/development_lifecycle/agents.md new file mode 100644 index 0000000000..2ec1309272 --- /dev/null +++ b/docs/user_guide/development_lifecycle/agents.md @@ -0,0 +1,234 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(extend-agent-service)= + +# Agents + +```{eval-rst} +.. tags:: Extensibility, Contribute, Intermediate +``` + +:::{note} +This is an experimental feature, which is subject to change the API in the future. +::: + +## What is an agent? + +In Flyte, an agent is a long-running stateless service that can be used to execute tasks. It reduces the overhead of creating a pod for each task. +In addition, it's easy to scale up and down the agent service based on the workload. Agent services are designed to be language-agnostic. +For now, we only support Python agent, but we may support other languages in the future. + +Agent is designed to run a specific type of task. For example, you can create a BigQuery agent to run BigQuery task. Therefore, if you create a new type of task, you can +either run the task in the pod, or you can create a new agent to run it. You can determine how the task will be executed in the FlytePropeller configMap. 
+ +Key goals of the agent service include: + +- Support for communication with external services: The focus is on enabling agents that seamlessly interact with external services. +- Independent testing and private deployment: Agents can be tested independently and deployed privately, providing flexibility and control over development. +- Flyte Agent usage in local development: Users, especially in `flytekit` and `unionml`, can leverage backend agents for local development, streamlining the development process. +- Language-agnostic: Agents can be authored in any programming language, allowing users to work with their preferred language and tools. +- Scalability: Agents are designed to be scalable, ensuring they can handle large-scale workloads effectively. +- Simple API: Agents offer a straightforward API, making integration and usage straightforward for developers. + +## Why do we need an agent service? + +Without agents, people need to implement a backend plugin in the propeller. The backend plugin is responsible for +creating a CRD and submitting a http request to the external service. However, it increases the complexity of flytepropeller, and +it's hard to maintain the backend plugin. For example, if we want to add a new plugin, we need to update and compile +flytepropeller, and it's also hard to test. In addition, the backend plugin is running in flytepropeller itself, so it +increases the load of the flytepropeller engine. + +Furthermore, implementing backend plugins can be challenging, particularly for data scientists and ML engineers who may lack proficiency in +Golang. Additionally, managing performance requirements, maintenance, and development can be burdensome. +To address these issues, we introduced the "Agent Service" in Flyte. This system enables rapid plugin +development while decoupling them from the core flytepropeller engine. + +## Overview + +The Flyte agent service is a Python-based agent registry powered by a gRPC server. 
It allows users and flytepropeller +to send gRPC requests to the registry for executing jobs such as BigQuery and Databricks. Each Agent service is a Kubernetes +deployment. You can create two different Agent services hosting different Agents. For example, you can create one production +agent service and one development agent service. + +:::{figure} https://i.ibb.co/vXhBDjP/Screen-Shot-2023-05-29-at-2-54-14-PM.png +:alt: Agent Service +:class: with-shadow +::: + +## How to register a new agent + +### Flytekit interface specification + +To register a new agent, you can extend the `AgentBase` class in the flytekit backend module. Implementing the following three methods is necessary, and it's important to ensure that all calls are idempotent: + +- `create`: This method is used to initiate a new task. Users have the flexibility to use gRPC, REST, or an SDK to create a task. +- `get`: This method allows retrieving the job Resource (jobID or output literal) associated with the task, such as a BigQuery Job ID or Databricks task ID. +- `delete`: Invoking this method will send a request to delete the corresponding job. + +```python +import json +import typing +from dataclasses import asdict, dataclass + +import grpc +import requests +from flyteidl.admin.agent_pb2 import CreateTaskResponse, DeleteTaskResponse, GetTaskResponse, Resource +from flytekit import logger +from flytekit.extend.backend.base_agent import AgentBase, AgentRegistry +from flytekit.models.literals import LiteralMap +from flytekit.models.task import TaskTemplate + +@dataclass +class Metadata: + # You can add any metadata you want; propeller will pass the metadata to the agent to get the job status. + # For example, you can add the job_id to the metadata, and the agent will use the job_id to get the job status. + # You could also add the s3 file path, and the agent can check if the file exists. + job_id: str + +class CustomAgent(AgentBase): + def __init__(self, task_type: str): + # Each agent should have a unique task type. Agent service will use the task type to find the corresponding agent.
+ self._task_type = task_type + + def create( + self, + context: grpc.ServicerContext, + output_prefix: str, + task_template: TaskTemplate, + inputs: typing.Optional[LiteralMap] = None, + ) -> CreateTaskResponse: + # 1. Submit the task to the external service (BigQuery, Databricks, etc.) + # 2. Create task metadata such as the jobID. + # 3. Return the task metadata, and keep in mind that the metadata should be serialized to bytes. + # `url` and `data` are placeholders for your service's endpoint and request payload. + res = requests.post(url, json=data) + return CreateTaskResponse(resource_meta=json.dumps(asdict(Metadata(job_id=str(res.job_id)))).encode("utf-8")) + + def get(self, context: grpc.ServicerContext, resource_meta: bytes) -> GetTaskResponse: + # 1. Deserialize the metadata. + # 2. Use the metadata to get the job status. + # 3. Return the job status. + metadata = Metadata(**json.loads(resource_meta.decode("utf-8"))) + res = requests.get(url, json={"job_id": metadata.job_id}) + return GetTaskResponse(resource=Resource(state=res.state)) + + def delete(self, context: grpc.ServicerContext, resource_meta: bytes) -> DeleteTaskResponse: + # 1. Deserialize the metadata. + # 2. Use the metadata to delete the job. + # 3. If the job cannot be deleted, add the error message to the grpc context: + # context.set_code(grpc.StatusCode.INTERNAL) + # context.set_details(f"failed to delete task with error {e}") + try: + metadata = Metadata(**json.loads(resource_meta.decode("utf-8"))) + requests.delete(url, json={"job_id": metadata.job_id}) + except Exception as e: + logger.error(f"failed to delete task with error {e}") + context.set_code(grpc.StatusCode.INTERNAL) + context.set_details(f"failed to delete task with error {e}") + return DeleteTaskResponse() + +# To register the custom agent +AgentRegistry.register(CustomAgent(task_type="custom_task")) +``` + +Here is an example of a [BigQuery agent](https://github.com/flyteorg/flytekit/blob/9977aac26242ebbede8e00d476c2fbc59ac5487a/plugins/flytekit-bigquery/flytekitplugins/bigquery/agent.py#L35) implementation.
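
The `resource_meta` round trip that `create`, `get`, and `delete` rely on can be tried in isolation, without flytekit or a running agent. The following is a minimal sketch using only the standard library; `encode_metadata` and `decode_metadata` are hypothetical helper names introduced here for illustration, and `job-123` is a made-up job ID.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Metadata:
    # Hypothetical metadata: only the external job identifier.
    job_id: str


def encode_metadata(metadata: Metadata) -> bytes:
    """Serialize the metadata to UTF-8 JSON bytes, as `create` does for `resource_meta`."""
    return json.dumps(asdict(metadata)).encode("utf-8")


def decode_metadata(resource_meta: bytes) -> Metadata:
    """Rebuild the metadata from bytes, as `get` and `delete` do."""
    return Metadata(**json.loads(resource_meta.decode("utf-8")))


# Round trip: what `create` returns is what `get` and `delete` later receive.
resource_meta = encode_metadata(Metadata(job_id="job-123"))
restored = decode_metadata(resource_meta)
```

Keeping the metadata a flat dataclass of JSON-friendly fields means the bytes stay small and the agent itself can remain stateless between calls.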
+ +### How to test the agent + +Agents can be tested locally without running the backend server, which makes agent development easier. + +A task that inherits from `AsyncAgentExecutorMixin` can be executed locally, allowing flytekit to mimic the propeller's behavior when calling the agent. +In some cases, you need to store credentials in your local environment when testing locally. +For example, you need to set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable when testing the BigQuery task. +After setting up the credentials, you can run the task locally. Flytekit will automatically call the agent to create, get, or delete the task. + +```python +from flytekit import StructuredDataset, kwtypes +from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask + +bigquery_doge_coin = BigQueryTask( + name="bigquery.doge_coin", + inputs=kwtypes(version=int), + query_template="SELECT * FROM `bigquery-public-data.crypto_dogecoin.transactions` WHERE version = @version LIMIT 10;", + output_structured_dataset_type=StructuredDataset, + task_config=BigQueryConfig(ProjectID="flyte-test-340607") +) +``` + +Taking the task above as an example, you can run it locally and test the agent with the following command: + +```bash +pyflyte run wf.py bigquery_doge_coin --version 10 +``` + +### Build a new image + +The following is a sample Dockerfile for building an image for a Flyte agent. + +```Dockerfile +FROM python:3.9-slim-buster + +MAINTAINER Flyte Team +LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytekit + +WORKDIR /root +ENV PYTHONPATH /root + +# flytekit will autoload the agent if the package is installed. +RUN pip install flytekitplugins-bigquery +CMD pyflyte serve agent --port 8000 +``` + +:::{note} +For flytekit versions `<=v1.10.2`, use `pyflyte serve`. +For flytekit versions `>v1.10.2`, use `pyflyte serve agent`. +::: + +### Update FlyteAgent + +1. Update the FlyteAgent deployment's [image](https://github.com/flyteorg/flyte/blob/c049865cba017ad826405c7145cd3eccbc553232/charts/flyteagent/templates/agent/deployment.yaml#L26) +2. Update the FlytePropeller configmap.
+ +```YAML +tasks: + task-plugins: + enabled-plugins: + - agent-service + default-for-task-types: + - bigquery_query_job_task: agent-service + - custom_task: agent-service + +plugins: + agent-service: + supportedTaskTypes: + - bigquery_query_job_task + - default_task + - custom_task + # By default, all the request will be sent to the default agent. + defaultAgent: + endpoint: "dns:///flyteagent.flyte.svc.cluster.local:8000" + insecure: true + timeouts: + GetTask: 200ms + defaultTimeout: 50ms + agents: + custom_agent: + endpoint: "dns:///custom-flyteagent.flyte.svc.cluster.local:8000" + insecure: false + defaultServiceConfig: '{"loadBalancingConfig": [{"round_robin":{}}]}' + timeouts: + GetTask: 100ms + defaultTimeout: 20ms + agentForTaskTypes: + # It will override the default agent for custom_task, which means propeller will send the request to this agent. + - custom_task: custom_agent +``` + +3. Restart the FlytePropeller + +``` +kubectl rollout restart deployment flytepropeller -n flyte +``` diff --git a/docs/user_guide/development_lifecycle/cache_serializing.md b/docs/user_guide/development_lifecycle/cache_serializing.md new file mode 100644 index 0000000000..f570b1c351 --- /dev/null +++ b/docs/user_guide/development_lifecycle/cache_serializing.md @@ -0,0 +1,74 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Cache serializing + +```{eval-rst} +.. tags:: Intermediate +``` + +Serializing means only executing a single instance of a unique cacheable task (determined by the cache_version parameter and task signature) at a time. 
Using this mechanism, Flyte ensures that during multiple concurrent executions of a task only a single instance is evaluated and all others wait until completion and reuse the resulting cached outputs. + +Ensuring serialized evaluation requires a small degree of overhead to coordinate executions using a lightweight artifact reservation system. Therefore, this should be viewed as an extension to rather than a replacement for non-serialized cacheable tasks. It is particularly well suited to long-running or otherwise computationally expensive tasks executed in scenarios similar to the following examples: + +- Periodically scheduled workflow where a single task evaluation duration may span multiple scheduled executions. +- Running a commonly shared task within different workflows (which receive the same inputs). + ++++ {"lines_to_next_cell": 0} + +For any {py:func}`flytekit.task` in Flyte, there is always one required import, which is: + +```{code-cell} +from flytekit import task +``` + ++++ {"lines_to_next_cell": 0} + +Task cache serializing is disabled by default to avoid unexpected behavior for task executions. To enable it, use the `cache_serialize` parameter. +`cache_serialize` is a switch to enable or disable serialization of the task. +This operation is only useful for cacheable tasks, where one may reuse output from a previous execution. Flyte requires the `cache` parameter to be enabled on all cache-serializable tasks. +Cache key definitions follow the same rules as non-serialized cache tasks. It is important to understand the implications of the task signature and `cache_version` parameter in defining cached results. + +```{code-cell} +:lines_to_next_cell: 2 + +@task(cache=True, cache_serialize=True, cache_version="1.0") +def square(n: int) -> int: + """ + Parameters: + n (int): name of the parameter for the task will be derived from the name of the input variable.
+ The type will be automatically deduced to Types.Integer + + Return: + int: The label for the output will be automatically assigned, and the type will be deduced from the annotation + + """ + return n * n +``` + +In the above example, calling `square(n=2)` multiple times concurrently (even in different executions or workflows) will only execute the multiplication operation once. +Concurrently evaluated tasks will wait for completion of the first instance before reusing the cached results, and subsequent evaluations will instantly reuse existing cache results. + ++++ + +## How does serializing caches work? + +The cache serialize paradigm introduces a new artifact reservation system. Tasks may use this reservation system to acquire an artifact reservation, indicating that they are actively evaluating the task, and release the reservation once the execution is completed. Flyte uses a clock-skew algorithm to define reservation timeouts. Therefore, tasks are required to periodically extend the reservation during execution. + +The first execution of a serializable cached task will successfully acquire the artifact reservation. Execution will be performed as usual and, upon completion, the results are written to the cache and the reservation is released. Concurrently executed task instances (i.e. in parallel with the initial execution) will observe an active reservation, in which case each of them will wait until the next reevaluation and perform another check. Once the initial execution completes, the waiting instances will reuse the cached results. Subsequently executed task instances (i.e. after an execution has already completed successfully) will immediately reuse the existing cached results. + +Flyte handles task execution failures using a timeout on the reservation. If the task currently holding the reservation fails to extend it before it times out, another task may acquire the reservation and begin executing the task.
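
The acquire, extend, and timeout behavior described above can be sketched with a toy in-memory table. This is only an illustration of the idea under simplifying assumptions (a single process, explicit `now` timestamps, no clock skew handling); `ReservationTable` and the `exec-a`/`exec-b` owner names are hypothetical and do not correspond to Flyte's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Reservation:
    owner: str
    expires_at: float


class ReservationTable:
    """Toy in-memory stand-in for an artifact reservation service."""

    def __init__(self, timeout: float) -> None:
        self._timeout = timeout
        self._reservations: Dict[str, Reservation] = {}

    def acquire_or_extend(self, cache_key: str, owner: str, now: float) -> str:
        """Return the owner that holds the reservation for `cache_key` after this call."""
        current = self._reservations.get(cache_key)
        if current is None or current.expires_at <= now or current.owner == owner:
            # Free, expired, or already ours: take it, or extend it (heartbeat).
            self._reservations[cache_key] = Reservation(owner, now + self._timeout)
            return owner
        # Someone else holds a live reservation: the caller has to wait and retry.
        return current.owner

    def release(self, cache_key: str, owner: str) -> None:
        """Drop the reservation once the owner has written its results to the cache."""
        current = self._reservations.get(cache_key)
        if current is not None and current.owner == owner:
            del self._reservations[cache_key]


table = ReservationTable(timeout=10.0)
first = table.acquire_or_extend("square-n2", "exec-a", now=0.0)     # exec-a acquires
blocked = table.acquire_or_extend("square-n2", "exec-b", now=5.0)   # exec-b must wait
extended = table.acquire_or_extend("square-n2", "exec-a", now=8.0)  # exec-a extends
waiting = table.acquire_or_extend("square-n2", "exec-b", now=15.0)  # still held by exec-a
takeover = table.acquire_or_extend("square-n2", "exec-b", now=30.0) # expired, exec-b takes over
```

The last call models the failure case: `exec-a` stopped extending its reservation, so after the timeout another execution can acquire it and run the task.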
diff --git a/docs/user_guide/development_lifecycle/caching.md b/docs/user_guide/development_lifecycle/caching.md new file mode 100644 index 0000000000..b0cc202cd9 --- /dev/null +++ b/docs/user_guide/development_lifecycle/caching.md @@ -0,0 +1,240 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Caching + +```{eval-rst} +.. tags:: Basic +``` + +Flyte provides the ability to cache the output of task executions to make the subsequent executions faster. A well-behaved Flyte task should generate deterministic output given the same inputs and task functionality. + +Task caching is useful when a user knows that many executions with the same inputs may occur. For example, consider the following scenarios: + +- Running a task periodically on a schedule +- Running the code multiple times when debugging workflows +- Running the commonly shared tasks amongst different workflows, which receive the same inputs + +Let's watch a brief explanation of caching and a demo in this video, followed by how task caching can be enabled . + +```{eval-rst} +.. youtube:: WNkThCp-gqo + +``` + ++++ {"lines_to_next_cell": 0} + +Import the necessary libraries. + +```{code-cell} +import time + +import pandas +``` + ++++ {"lines_to_next_cell": 0} + +For any {py:func}`flytekit.task` in Flyte, there is always one required import, which is: + +```{code-cell} +:lines_to_next_cell: 1 + +from flytekit import HashMethod, task, workflow +from flytekit.core.node_creation import create_node +from typing_extensions import Annotated +``` + +Task caching is disabled by default to avoid unintended consequences of caching tasks with side effects. 
To enable caching and control its behavior, use the `cache` and `cache_version` parameters when constructing a task. +`cache` is a switch to enable or disable the cache, and `cache_version` pertains to the version of the cache. +`cache_version` field indicates that the task functionality has changed. +Bumping the `cache_version` is akin to invalidating the cache. +You can manually update this version and Flyte caches the next execution instead of relying on the old cache. + +```{code-cell} +@task(cache=True, cache_version="1.0") # noqa: F841 +def square(n: int) -> int: + """ + Parameters: + n (int): name of the parameter for the task will be derived from the name of the input variable. + The type will be automatically deduced to ``Types.Integer``. + + Return: + int: The label for the output will be automatically assigned, and the type will be deduced from the annotation. + + """ + return n * n +``` + +In the above example, calling `square(n=2)` twice (even if it's across different executions or different workflows) will only execute the multiplication operation once. +The next time, the output will be made available immediately since it is captured from the previous execution with the same inputs. + ++++ + +If in a subsequent code update, you update the signature of the task to return the original number along with the result, it'll automatically invalidate the cache (even though the cache version remains the same). + +```python +@task(cache=True, cache_version="1.0") +def square(n: int) -> Tuple[int, int]: + ... +``` + ++++ + +:::{note} +If the user changes the task interface in any way (such as adding, removing, or editing inputs/outputs), Flyte treats that as a task functionality change. In the subsequent execution, Flyte runs the task and stores the outputs as newly cached values. +::: + +## How does caching work? + +Caching is implemented differently depending on the mode the user is running, i.e. whether they are running locally or using remote Flyte. 
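
As a rough mental model of the invalidation rules described earlier (a new `cache_version` or a changed task signature leads to a cache miss), the sketch below derives a key by hashing those pieces together with the inputs. `toy_cache_key` is a hypothetical helper written for this illustration; it is not how Flyte actually computes cache keys.

```python
import hashlib
import json
from typing import Any, Dict


def toy_cache_key(cache_version: str, signature: str, inputs: Dict[str, Any]) -> str:
    """Hash the cache version, task signature, and input values into one key."""
    payload = json.dumps(
        {"cache_version": cache_version, "signature": signature, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = toy_cache_key("1.0", "square(n: int) -> int", {"n": 2})
same = toy_cache_key("1.0", "square(n: int) -> int", {"n": 2})
bumped = toy_cache_key("2.0", "square(n: int) -> int", {"n": 2})
new_signature = toy_cache_key("1.0", "square(n: int) -> Tuple[int, int]", {"n": 2})
```

Here `base` and `same` are equal, so the second call would be a cache hit, while `bumped` and `new_signature` differ from `base` and would trigger a fresh execution.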
+ +### How does remote caching work? + +The cache keys for remote task execution are composed of **Project**, **Domain**, **Cache Version**, **Task Signature**, and **Inputs** associated with the execution of the task, as per the following definitions: + +- **Project:** A task run under one project cannot use the cached task execution from another project which would cause inadvertent results between project teams that could result in data corruption. +- **Domain:** To separate test, staging, and production data, task executions are not shared across these environments. +- **Cache Version:** When task functionality changes, you can change the `cache_version` of the task. Flyte will know not to use older cached task executions and create a new cache entry on the subsequent execution. +- **Task Signature:** The cache is specific to the task signature associated with the execution. The signature constitutes the task name, input parameter names/types, and the output parameter name/type. +- **Task Input Values:** A well-formed Flyte task always produces deterministic outputs. This means, given a set of input values, every execution should have identical outputs. When task execution is cached, the input values are part of the cache key. + +The remote cache for a particular task is invalidated in two ways: + +1. Modifying the `cache_version`; +2. Updating the task signature. + +:::{note} +Task executions can be cached across different versions of the task because a change in SHA does not necessarily mean that it correlates to a change in the task functionality. +::: + +### How does local caching work? + +The flytekit package uses the [diskcache](https://github.com/grantjenks/python-diskcache) package, specifically [diskcache.Cache](http://www.grantjenks.com/docs/diskcache/tutorial.html#cache), to aid in the memoization of task executions. 
The results of local task executions are stored under `~/.flyte/local-cache/` and cache keys are composed of **Cache Version**, **Task Signature**, and **Task Input Values**. + +Similar to the remote case, a local cache entry for a task will be invalidated if either the `cache_version` or the task signature is modified. In addition, the local cache can also be emptied by running the following command: `pyflyte local-cache clear`, which essentially obliterates the contents of the `~/.flyte/local-cache/` directory. + +:::{note} +The format used by the store is opaque and not meant to be inspectable. +::: + +(cache-offloaded-objects)= + +## Caching of non-Flyte offloaded objects + +The default behavior displayed by Flyte's memoization feature might not match the user intuition. For example, this code makes use of pandas dataframes: + +```{code-cell} +@task +def foo(a: int, b: str) -> pandas.DataFrame: + df = pandas.DataFrame(...) + ... + return df + + +@task(cache=True, cache_version="1.0") +def bar(df: pandas.DataFrame) -> int: + ... + + +@workflow +def wf(a: int, b: str): + df = foo(a=a, b=b) + v = bar(df=df) # noqa: F841 +``` + +If run twice with the same inputs, one would expect that `bar` would trigger a cache hit, but it turns out that's not the case because of how dataframes are represented in Flyte. +However, with release 1.2.0, Flyte provides a new way to control memoization behavior of literals. This is done via a `typing.Annotated` call on the task signature. +For example, in order to cache the result of calls to `bar`, you can rewrite the code above like this: + +```{code-cell} +def hash_pandas_dataframe(df: pandas.DataFrame) -> str: + return str(pandas.util.hash_pandas_object(df)) + + +@task +def foo_1( # noqa: F811 + a: int, b: str # noqa: F821 +) -> Annotated[pandas.DataFrame, HashMethod(hash_pandas_dataframe)]: # noqa: F821 # noqa: F821 + df = pandas.DataFrame(...) # noqa: F821 + ... 
+ return df + + +@task(cache=True, cache_version="1.0") # noqa: F811 +def bar_1(df: pandas.DataFrame) -> int: # noqa: F811 + ... # noqa: F811 + + +@workflow +def wf_1(a: int, b: str): # noqa: F811 + df = foo_1(a=a, b=b) # noqa: F811 + v = bar_1(df=df) # noqa: F841 +``` + +Note how the output of task `foo_1` is annotated with an object of type `HashMethod`. Essentially, it represents a function that produces a hash that is used as part of the cache key calculation in calling the task `bar_1`. + +### How does caching of offloaded objects work? + +Recall how task input values are taken into account to derive a cache key. +This is done by turning the literal representation into a string and using that string as part of the cache key. In the case of dataframes annotated with `HashMethod`, we use the hash as the representation of the Literal. In other words, the literal hash is used in the cache key. + +This feature also works in local execution. + ++++ + +Here's a complete example of the feature: + +```{code-cell} +def hash_pandas_dataframe(df: pandas.DataFrame) -> str: + return str(pandas.util.hash_pandas_object(df)) + + +@task +def uncached_data_reading_task() -> Annotated[pandas.DataFrame, HashMethod(hash_pandas_dataframe)]: + return pandas.DataFrame({"column_1": [1, 2, 3]}) + + +@task(cache=True, cache_version="1.0") +def cached_data_processing_task(df: pandas.DataFrame) -> pandas.DataFrame: + time.sleep(1) + return df * 2 + + +@task +def compare_dataframes(df1: pandas.DataFrame, df2: pandas.DataFrame): + assert df1.equals(df2) + + +@workflow +def cached_dataframe_wf(): + raw_data = uncached_data_reading_task() + + # Execute `cached_data_processing_task` twice, but force those + # two executions to happen serially to demonstrate how the second run + # hits the cache.
+ t1_node = create_node(cached_data_processing_task, df=raw_data) + t2_node = create_node(cached_data_processing_task, df=raw_data) + t1_node >> t2_node + + # Confirm that the dataframes actually match + compare_dataframes(df1=t1_node.o0, df2=t2_node.o0) + + +if __name__ == "__main__": + df1 = cached_dataframe_wf() + print(f"Running cached_dataframe_wf once : {df1}") +``` diff --git a/docs/user_guide/development_lifecycle/creating_a_new_project.md b/docs/user_guide/development_lifecycle/creating_a_new_project.md new file mode 100644 index 0000000000..7741b810d2 --- /dev/null +++ b/docs/user_guide/development_lifecycle/creating_a_new_project.md @@ -0,0 +1,28 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Creating a new project + +Creates project to be used as a home for the flyte resources of tasks and workflows. +Refer to the [flytectl API reference](https://docs.flyte.org/projects/flytectl/en/stable/gen/flytectl_create_project.html) +for more details. + +```{eval-rst} +.. 
prompt:: bash + + flytectl create project --id "my-flyte-project-name" --labels "my-label=my-project-label" --description "my-flyte-project-name" --name "my-flyte-project-name" +``` diff --git a/docs/user_guide/development_lifecycle/debugging_executions.md b/docs/user_guide/development_lifecycle/debugging_executions.md new file mode 100644 index 0000000000..7c8f9562d3 --- /dev/null +++ b/docs/user_guide/development_lifecycle/debugging_executions.md @@ -0,0 +1,46 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Debugging executions + +Inspecting task and workflow executions provides log links that you can use to debug further. + +Using the `--details` flag shows the node executions view with log links. + +``` +└── n1 - FAILED - 2021-06-30 08:51:07.3111846 +0000 UTC - 2021-06-30 08:51:17.192852 +0000 UTC +│ ├── Attempt :0 +│ └── Task - FAILED - 2021-06-30 08:51:07.3111846 +0000 UTC - 2021-06-30 08:51:17.192852 +0000 UTC +│ └── Logs : +│ └── Name :Kubernetes Logs (User) +│ └── URI :http://localhost:30082/#/log/flytectldemo-development/f3a5a4034960f4aa1a09-n1-0/pod?namespace=flytectldemo-development +``` + +Additionally, you can check the pods launched by Flyte in the `<project>-<domain>` namespace: + +``` +kubectl get pods -n <project>-<domain> +``` + +The launched pods have the execution name as a prefix and the node ID as a suffix: + +``` +NAME READY STATUS RESTARTS AGE +f65009af77f284e50959-n0-0 0/1 ErrImagePull 0 18h +``` + +From here, the investigation can continue by describing the pod and checking what went wrong with the image pull.
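
As a small illustration of that next step, the sketch below assembles a `kubectl describe pod` command for a given pod. It assumes the namespace follows the `<project>-<domain>` pattern, as the `flytectldemo-development` namespace in the log URI above suggests; `describe_pod_command` is a hypothetical helper, and the project, domain, and pod name are example values only.

```python
import shlex
from typing import List


def describe_pod_command(project: str, domain: str, pod_name: str) -> List[str]:
    """Build the kubectl invocation for describing a pod launched by an execution."""
    # Assumption: the namespace is "<project>-<domain>".
    namespace = f"{project}-{domain}"
    return ["kubectl", "describe", "pod", pod_name, "-n", namespace]


command = describe_pod_command("flytectldemo", "development", "f65009af77f284e50959-n0-0")
# Print the command so it can be copied into a shell.
print(shlex.join(command))
```

The `Events` section of the `kubectl describe pod` output is usually where image pull failures are reported.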
diff --git a/docs/user_guide/development_lifecycle/decks.md b/docs/user_guide/development_lifecycle/decks.md new file mode 100644 index 0000000000..5aae4955b1 --- /dev/null +++ b/docs/user_guide/development_lifecycle/decks.md @@ -0,0 +1,316 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(decks)= + +# Decks + +```{eval-rst} +.. tags:: UI, Intermediate +``` + +The Decks feature enables you to obtain customizable and default visibility into your tasks. +Think of it as a visualization tool that you can utilize within your Flyte tasks. + +Decks are equipped with a variety of {ref}`renderers `, +such as FrameRenderer and MarkdownRenderer. These renderers produce HTML files. +As an example, FrameRenderer transforms a DataFrame into an HTML table, and MarkdownRenderer converts Markdown text into HTML. + +Each task has a minimum of three decks: input, output and default. +The input/output decks are used to render the input/output data of tasks, +while the default deck can be used to render line plots, scatter plots or Markdown text. +Additionally, you can create new decks to render your data using custom renderers. + +:::{note} +Flyte Decks is an opt-in feature; to enable it, set `enable_deck` to `True` in the task parameters. +::: + +To begin, import the dependencies. + +```{code-cell} +import flytekit +from flytekit import ImageSpec, task +from flytekitplugins.deck.renderer import MarkdownRenderer +from sklearn.decomposition import PCA +``` + ++++ {"lines_to_next_cell": 0} + +We create a new deck named `pca` and render Markdown content along with a +[PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) plot. 
+ +You can begin by initializing an {ref}`ImageSpec ` object to encompass all the necessary dependencies. +This approach automatically triggers a Docker build, alleviating the need for you to manually create a Docker image. + +```{code-cell} +custom_image = ImageSpec(name="flyte-decks-example", packages=["plotly"], registry="ghcr.io/flyteorg") + +if custom_image.is_container(): + import plotly + import plotly.express as px +``` + ++++ {"lines_to_next_cell": 0} + +:::{important} +Replace `ghcr.io/flyteorg` with a container registry you've access to publish to. +To upload the image to the local registry in the demo cluster, indicate the registry as `localhost:30000`. +::: + +Note the usage of `append` to append the Plotly deck to the Markdown deck. + +```{code-cell} +@task(enable_deck=True, container_image=custom_image) +def pca_plot(): + iris_df = px.data.iris() + X = iris_df[["sepal_length", "sepal_width", "petal_length", "petal_width"]] + pca = PCA(n_components=3) + components = pca.fit_transform(X) + total_var = pca.explained_variance_ratio_.sum() * 100 + fig = px.scatter_3d( + components, + x=0, + y=1, + z=2, + color=iris_df["species"], + title=f"Total Explained Variance: {total_var:.2f}%", + labels={"0": "PC 1", "1": "PC 2", "2": "PC 3"}, + ) + main_deck = flytekit.Deck("pca", MarkdownRenderer().to_html("### Principal Component Analysis")) + main_deck.append(plotly.io.to_html(fig)) +``` + ++++ {"lines_to_next_cell": 0} + +:::{Important} +To view the log output locally, the `FLYTE_SDK_LOGGING_LEVEL` environment variable should be set to 20. 
+::: + +The following is the expected output containing the path to the `deck.html` file: + +``` +{"asctime": "2023-07-11 13:16:04,558", "name": "flytekit", "levelname": "INFO", "message": "pca_plot task creates flyte deck html to file:///var/folders/6f/xcgm46ds59j7g__gfxmkgdf80000gn/T/flyte-0_8qfjdd/sandbox/local_flytekit/c085853af5a175edb17b11cd338cbd61/deck.html"} +``` + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_deck_plot_local.webp +:alt: Flyte deck plot +:class: with-shadow +::: + +Once you execute this task on the Flyte cluster, you can access the deck by clicking the _Flyte Deck_ button: + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_deck_button.png +:alt: Flyte deck button +:class: with-shadow +::: + +(deck_renderer)= + +## Deck renderer + +The Deck renderer is an integral component of the Deck plugin, which offers both default and personalized task visibility. +Within the Deck, an array of renderers is present, responsible for generating HTML files. + +These renderers showcase HTML in the user interface, facilitating the visualization and documentation of task-associated data. + +In the Flyte context, a collection of deck objects is stored. +When the task connected with a deck object is executed, these objects employ renderers to transform data into HTML files. + +### Available renderers + +#### Frame renderer + +Creates a profile report from a Pandas DataFrame. 
+ +```{code-cell} +import pandas as pd +from flytekitplugins.deck.renderer import FrameProfilingRenderer + + +@task(enable_deck=True) +def frame_renderer() -> None: + df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]}) + flytekit.Deck("Frame Renderer", FrameProfilingRenderer().to_html(df=df)) +``` + ++++ {"lines_to_next_cell": 0} + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_frame_renderer.png +:alt: Frame renderer +:class: with-shadow +::: + ++++ {"lines_to_next_cell": 0} + +#### Top-frame renderer + +Renders DataFrame as an HTML table. +This renderer doesn't necessitate plugin installation since it's accessible within the flytekit library. + +```{code-cell} +from typing import Annotated + +from flytekit.deck import TopFrameRenderer + + +@task(enable_deck=True) +def top_frame_renderer() -> Annotated[pd.DataFrame, TopFrameRenderer(1)]: + return pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]}) +``` + ++++ {"lines_to_next_cell": 0} + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_top_frame_renderer.png +:alt: Top frame renderer +:class: with-shadow +::: + +#### Markdown renderer + +Converts a Markdown string into HTML, producing HTML as a Unicode string. + +```{code-cell} +@task(enable_deck=True) +def markdown_renderer() -> None: + flytekit.current_context().default_deck.append( + MarkdownRenderer().to_html("You can install flytekit using this command: ```import flytekit```") + ) +``` + ++++ {"lines_to_next_cell": 0} + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_markdown_renderer.png +:alt: Markdown renderer +:class: with-shadow +::: + +#### Box renderer + +Groups rows of DataFrame together into a +box-and-whisker mark to visualize their distribution. + +Each box extends from the first quartile (Q1) to the third quartile (Q3). 
+The median (Q2) is indicated by a line within the box. +Typically, the whiskers extend to the edges of the box, +plus or minus 1.5 times the interquartile range (IQR: Q3-Q1). + +```{code-cell} +from flytekitplugins.deck.renderer import BoxRenderer + + +@task(enable_deck=True) +def box_renderer() -> None: + iris_df = px.data.iris() + flytekit.Deck("Box Plot", BoxRenderer("sepal_length").to_html(iris_df)) +``` + ++++ {"lines_to_next_cell": 0} + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_box_renderer.png +:alt: Box renderer +:class: with-shadow +::: + +#### Image renderer + +Converts a {ref}`FlyteFile ` or `PIL.Image.Image` object into an HTML string, +where the image data is encoded as a base64 string. + +```{code-cell} +from flytekit import workflow +from flytekit.types.file import FlyteFile +from flytekitplugins.deck.renderer import ImageRenderer + + +@task(enable_deck=True) +def image_renderer(image: FlyteFile) -> None: + flytekit.Deck("Image Renderer", ImageRenderer().to_html(image_src=image)) + + +@workflow +def image_renderer_wf( + image: FlyteFile = "https://bit.ly/3KZ95q4", +) -> None: + image_renderer(image=image) +``` + ++++ {"lines_to_next_cell": 0} + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_image_renderer.png +:alt: Image renderer +:class: with-shadow +::: + +#### Table renderer + +Converts a Pandas dataframe into an HTML table. 
+ +```{code-cell} +from flytekitplugins.deck.renderer import TableRenderer + + +@task(enable_deck=True) +def table_renderer() -> None: + flytekit.Deck( + "Table Renderer", + TableRenderer().to_html(df=pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]}), table_width=50), + ) +``` + ++++ {"lines_to_next_cell": 0} + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_table_renderer.png +:alt: Table renderer +:class: with-shadow +::: + +#### Source code renderer + +Converts source code to HTML and renders it as a Unicode string on the deck. + +```{code-cell} +:lines_to_next_cell: 2 + +import inspect + +from flytekitplugins.deck.renderer import SourceCodeRenderer + + +@task(enable_deck=True) +def source_code_renderer() -> None: + file_path = inspect.getsourcefile(frame_renderer.__wrapped__) + with open(file_path, "r") as f: + source_code = f.read() + flytekit.Deck( + "Source Code Renderer", + SourceCodeRenderer().to_html(source_code), + ) +``` + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_source_code_renderer.png +:alt: Source code renderer +:class: with-shadow +::: + +### Contribute to renderers + +Don't hesitate to integrate a new renderer into +[renderer.py](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-deck-standard/flytekitplugins/deck/renderer.py) +if your deck renderers can enhance data visibility. +Feel encouraged to open a pull request and play a part in enhancing the Flyte deck renderer ecosystem! diff --git a/docs/user_guide/development_lifecycle/index.md b/docs/user_guide/development_lifecycle/index.md new file mode 100644 index 0000000000..693740c661 --- /dev/null +++ b/docs/user_guide/development_lifecycle/index.md @@ -0,0 +1,22 @@ +# Development lifecycle + +In this section, you will discover Flyte's features that aid in local workflow development. 
+You will gain an understanding of concepts like caching, the Flyte remote API, Agents, Decks and more. + +```{toctree} +:maxdepth: -1 +:name: development_lifecycle_toc +:hidden: + +agents +private_images +caching +cache_serializing +decks +creating_a_new_project +running_tasks +running_workflows +running_launch_plans +inspecting_executions +debugging_executions +``` diff --git a/docs/user_guide/development_lifecycle/inspecting_executions.md b/docs/user_guide/development_lifecycle/inspecting_executions.md new file mode 100644 index 0000000000..1ce09ae155 --- /dev/null +++ b/docs/user_guide/development_lifecycle/inspecting_executions.md @@ -0,0 +1,83 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Inspecting executions + +## Flytectl + +Flytectl supports inspecting an execution by retrieving its details. For a deeper dive, refer to the +[API reference](https://docs.flyte.org/projects/flytectl/en/stable/gen/flytectl_get_execution.html) guide. + +Monitor the execution by providing the execution ID returned by the create command; this can be a task or a workflow execution. + +``` +flytectl get execution -p flytesnacks -d development +``` + +For more details, use the `--details` flag, which shows node executions along with their task executions. + +``` +flytectl get execution -p flytesnacks -d development --details +``` + +If you prefer a YAML or JSON view of the details, change the output format using the `-o` flag. + +``` +flytectl get execution -p flytesnacks -d development --details -o yaml +``` + +To see the results of the execution, inspect the node closure `outputUri` in the detailed YAML output.
+ +``` +"outputUri": "s3://my-s3-bucket/metadata/propeller/flytesnacks-development-/n0/data/0/outputs.pb" +``` + +## FlyteRemote + +With FlyteRemote, you can fetch the inputs and outputs of executions and inspect them. + +```python +from flytekit.configuration import Config +from flytekit.remote import FlyteRemote + +# FlyteRemote object is the main entrypoint to the API +remote = FlyteRemote( + config=Config.for_endpoint(endpoint="flyte.example.net"), + default_project="flytesnacks", + default_domain="development", +) + +execution = remote.fetch_execution( + name="fb22e306a0d91e1c6000", project="flytesnacks", domain="development" +) + +input_keys = execution.inputs.keys() +output_keys = execution.outputs.keys() + +# The inputs and outputs correspond to the top-level execution or the workflow itself. +# To fetch a specific output, say, a model file: +model_file = execution.outputs["model_file"] +with open(model_file) as f: + ... + +# You can use FlyteRemote.sync() to sync the entity object's state with the remote state during the execution run. +synced_execution = remote.sync(execution, sync_nodes=True) +node_keys = synced_execution.node_executions.keys() + +# node_executions will fetch all the underlying node executions recursively.
+# To fetch output of a specific node execution: +node_execution_output = synced_execution.node_executions["n1"].outputs["model_file"] +``` diff --git a/docs/user_guide/development_lifecycle/private_images.md b/docs/user_guide/development_lifecycle/private_images.md new file mode 100644 index 0000000000..5ebd41ea73 --- /dev/null +++ b/docs/user_guide/development_lifecycle/private_images.md @@ -0,0 +1,72 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(private_images)= + +# Private images + +As we learned in the {ref}`Flyte Fundamentals ` guide, +Flyte uses OCI-compatible containers to package up your code and third-party +dependencies. For production use-cases your images may require proprietary code +and configuration that you want to keep private. + +You can use different private container registries to host your images, such as +[AWS ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html), +[Docker Hub](https://docs.docker.com/docker-hub/repos/#private-repositories), +[GitLab Container Registry](https://docs.gitlab.com/ee/ci/docker/using_docker_images.html#access-an-image-from-a-private-container-registry), +and [GitHub Container Registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry). + +To pull private images, ensure that you have the command line tools and login +information associated with the registry. + +## Create a secret + +First [create a secret](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) +that contains all the credentials needed to log into the registry. 
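
The secret described above can be sketched in Python as a manifest generator. This is a minimal illustration, not part of Flyte: the helper `build_registry_secret`, the secret name `regcred`, the registry, the credentials, and the `flytesnacks-development` namespace are all placeholder assumptions. It relies only on the standard `kubernetes.io/dockerconfigjson` Secret layout that the linked Kubernetes guide describes.

```python
import base64
import json


def build_registry_secret(name, registry, username, password, namespace="default"):
    """Return a Kubernetes Secret manifest (as a dict) holding registry credentials.

    Hypothetical helper for illustration; it is not a Flyte or Kubernetes API.
    """
    # The "auth" entry is "username:password", base64-encoded.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {
        "auths": {registry: {"username": username, "password": password, "auth": auth}}
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under "data" must themselves be base64-encoded.
        "data": {
            ".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()
        },
    }


if __name__ == "__main__":
    # Placeholder values only; never hard-code real credentials.
    manifest = build_registry_secret(
        "regcred", "ghcr.io", "my-user", "my-token", namespace="flytesnacks-development"
    )
    print(json.dumps(manifest, indent=2))
```

Because JSON is valid YAML, the printed manifest could be saved to a file and applied with `kubectl apply -f`. In practice, the `kubectl create secret docker-registry` command from the linked guide is usually the simpler route.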
+ +## Configure `imagePullSecrets` + +Then, you'll need to specify an `imagePullSecrets` configuration to pull a +private image using one of the two methods below. + +```{eval-rst} +.. tabs:: + + .. tab:: Service Account + + You can use the default service account or a new one for this option: + + 1. Add your ``imagePullSecrets`` configuration to the + `service account `__. + 2. Use this service account to log into the private registry and pull the image. + 3. When you create a task or workflow execution, specify this service account + to access the private image. + + .. tab:: Custom Pod Template + + This option uses a `custom pod template `__ + to create a pod. This template is automatically added to every ``pod`` that + Flyte creates. + + 1. Add your ``imagePullSecrets`` configuration to this custom pod template. + 2. Update `FlytePropeller `__ about the pod created in the previous step. + 3. FlytePropeller adds ``imagePullSecrets``, along with other customization for the pod, + to the PodSpec, which should look similar to this + `manifest `__. + 4. The pods with their keys can log in and access the images in the private registry. + Once you set up the token to authenticate with the private registry, you can pull images from it.
+``` diff --git a/docs/user_guide/development_lifecycle/running_launch_plans.md b/docs/user_guide/development_lifecycle/running_launch_plans.md new file mode 100644 index 0000000000..1fb8bb4c2c --- /dev/null +++ b/docs/user_guide/development_lifecycle/running_launch_plans.md @@ -0,0 +1,84 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(remote_launchplan)= + +# Running launch plans + +## Flytectl + +This is a multi-step process where we create an execution spec file, update the spec file, and then create the execution. +More details can be found in the [Flytectl API reference](https://docs.flyte.org/projects/flytectl/en/stable/gen/flytectl_create_execution.html). + +**Generate an execution spec file** + +``` +flytectl get launchplan -p flytesnacks -d development myapp.workflows.example.my_wf --execFile exec_spec.yaml +``` + +**Update the input spec file for arguments to the workflow** + +``` +.... +inputs: + name: "adam" +.... +``` + +**Create an execution using the exec spec file** + +``` +flytectl create execution -p flytesnacks -d development --execFile exec_spec.yaml +``` + +**Monitor the execution by providing the execution ID from the create command** + +``` +flytectl get execution -p flytesnacks -d development +``` + +## FlyteRemote + +A launch plan can be launched via FlyteRemote programmatically.
+ +```python +from flytekit.remote import FlyteRemote +from flytekit.configuration import Config +from flytekit import LaunchPlan + +# FlyteRemote object is the main entrypoint to API +remote = FlyteRemote( + config=Config.for_endpoint(endpoint="flyte.example.net"), + default_project="flytesnacks", + default_domain="development", +) + +# Fetch launch plan +flyte_lp = remote.fetch_launch_plan( + name="workflows.example.wf", version="v1", project="flytesnacks", domain="development" +) + +# Execute +execution = remote.execute( + flyte_lp, inputs={"mean": 1}, execution_name="lp-execution", wait=True +) + +# Or use execution_name_prefix to avoid repeated execution names +execution = remote.execute( + flyte_lp, inputs={"mean": 1}, execution_name_prefix="flyte", wait=True +) +``` diff --git a/docs/user_guide/development_lifecycle/running_tasks.md b/docs/user_guide/development_lifecycle/running_tasks.md new file mode 100644 index 0000000000..882380109d --- /dev/null +++ b/docs/user_guide/development_lifecycle/running_tasks.md @@ -0,0 +1,98 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(remote_task)= + +# Running tasks + +## Flytectl + +This is a multi-step process where we create an execution spec file, update the spec file, and then create the execution. +More details can be found in the [Flytectl API reference](https://docs.flyte.org/projects/flytectl/en/stable/gen/flytectl_create_execution.html). 
+ +**Generate execution spec file** + +``` +flytectl get tasks -d development -p flytesnacks workflows.example.generate_normal_df --latest --execFile exec_spec.yaml +``` + +**Update the input spec file for arguments to the workflow** + +``` +iamRoleARN: 'arn:aws:iam::12345678:role/defaultrole' +inputs: + n: 200 + mean: 0.0 + sigma: 1.0 +kubeServiceAcct: "" +targetDomain: "" +targetProject: "" +task: workflows.example.generate_normal_df +version: "v1" +``` + +**Create execution using the exec spec file** + +``` +flytectl create execution -p flytesnacks -d development --execFile exec_spec.yaml +``` + +**Monitor the execution by providing the execution id from create command** + +``` +flytectl get execution -p flytesnacks -d development +``` + +## FlyteRemote + +A task can be launched via FlyteRemote programmatically. + +```python +from flytekit.remote import FlyteRemote +from flytekit.configuration import Config, SerializationSettings + +# FlyteRemote object is the main entrypoint to API +remote = FlyteRemote( + config=Config.for_endpoint(endpoint="flyte.example.net"), + default_project="flytesnacks", + default_domain="development", +) + +# Get Task +flyte_task = remote.fetch_task(name="workflows.example.generate_normal_df", version="v1") + +flyte_task = remote.register_task( + entity=flyte_task, + serialization_settings=SerializationSettings(image_config=None), + version="v2", +) + +# Run Task +execution = remote.execute( + flyte_task, inputs={"n": 200, "mean": 0.0, "sigma": 1.0}, execution_name="task-execution", wait=True +) + +# Or use execution_name_prefix to avoid repeated execution names +execution = remote.execute( + flyte_task, inputs={"n": 200, "mean": 0.0, "sigma": 1.0}, execution_name_prefix="flyte", wait=True +) + +# Inspecting execution +# The 'inputs' and 'outputs' correspond to the task execution. 
+input_keys = execution.inputs.keys() +output_keys = execution.outputs.keys() +``` diff --git a/docs/user_guide/development_lifecycle/running_workflows.md b/docs/user_guide/development_lifecycle/running_workflows.md new file mode 100644 index 0000000000..2e04714adc --- /dev/null +++ b/docs/user_guide/development_lifecycle/running_workflows.md @@ -0,0 +1,59 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Running workflows + +Workflows on their own are not runnable directly. However, a launchplan is always bound to a workflow, and you can use +launchplans to **launch** a workflow. If you want the launchplan to have the same arguments as the workflow +and you author your workflows with one of the SDKs (such as flytekit or flytekit-java), the SDK should +automatically create a `default launchplan` for the workflow. + +A `default launchplan` has the same name as the workflow and all argument defaults are similar. See +{ref}`Launch Plans` to run a workflow via the default launchplan. + +{ref}`Tasks can also be executed ` using the launch command. +One difference between running a task and running a workflow via launchplans is that launchplans cannot be associated with a +task. This is to avoid triggers and scheduling. + +## FlyteRemote + +Workflows can be executed with FlyteRemote because under the hood it fetches and triggers a default launch plan.
+ +```python +from flytekit.remote import FlyteRemote +from flytekit.configuration import Config + +# FlyteRemote object is the main entrypoint to API +remote = FlyteRemote( + config=Config.for_endpoint(endpoint="flyte.example.net"), + default_project="flytesnacks", + default_domain="development", +) + +# Fetch workflow +flyte_workflow = remote.fetch_workflow(name="workflows.example.wf", version="v1") + +# Execute +execution = remote.execute( + flyte_workflow, inputs={"mean": 1}, execution_name="workflow-execution", wait=True +) + +# Or use execution_name_prefix to avoid repeated execution names +execution = remote.execute( + flyte_workflow, inputs={"mean": 1}, execution_name_prefix="flyte", wait=True +) +``` diff --git a/docs/user_guide/environment_setup.md b/docs/user_guide/environment_setup.md new file mode 100644 index 0000000000..1fc41e749b --- /dev/null +++ b/docs/user_guide/environment_setup.md @@ -0,0 +1,245 @@ +(env_setup)= + +# Environment setup + +## Prerequisites + +- Make sure you have [docker](https://docs.docker.com/get-docker/) and [git](https://git-scm.com/) installed. +- Install {doc}`flytectl `, the commandline interface for Flyte. + +## Repo setup + +As we intend to execute the code locally, duplicate this code block into `hello_world.py`. + +```python +from flytekit import task, workflow + +@task +def say_hello() -> str: + return "Hello, World!" + +@workflow +def hello_world_wf() -> str: + res = say_hello() + return res + +if __name__ == "__main__": + print(f"Running hello_world_wf() {hello_world_wf()}") +``` + +To install `flytekit`, run the following command: + +``` +pip install flytekit +``` + +:::{tip} +**Recommended**: Create a new python virtual environment to make sure it doesn't interfere with your +development environment. 
You can do this by running the following commands in your terminal: + +```{prompt} bash +python -m venv ~/venvs/flyte-examples +source ~/venvs/flyte-examples/bin/activate +``` + +::: + +To make sure everything is working in your virtual environment, run `hello_world.py` locally: + +```{prompt} bash +python hello_world.py +``` + +Expected output: + +```{prompt} +Running hello_world_wf() Hello, World! +``` + +## Create a local demo Flyte cluster + +```{important} +Make sure the Docker daemon is running before starting the demo cluster. +``` + +Use `flytectl` to start a demo Flyte cluster: + +```{prompt} bash +flytectl demo start +``` + +After this completes, be sure to export the Flyte config as it will be essential later. Run the command in the output that looks like this: + +```{prompt} bash +export FLYTECTL_CONFIG= ~//.flyte/config-sandbox.yaml +``` + +## Running workflows + +Now you can run the example workflow locally using the default Docker image bundled with `flytekit`: + +```{prompt} bash +pyflyte run hello_world.py hello_world_wf +``` + +:::{note} +The initial arguments of `pyflyte run` take the form of +`path/to/script.py `, where `` +refers to the function decorated with `@task` or `@workflow` that you wish to run. +::: + +To run the workflow on the demo Flyte cluster, all you need to do is supply the `--remote` flag: + +``` +pyflyte run --remote hello_world.py hello_world_wf +``` + +You can also run the code directly from a remote source: + +``` +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py \ + hello_world_wf +``` + +You should see an output that looks like: + +```{prompt} +Go to https:///console/projects/flytesnacks/domains/development/executions/ to see execution in the console. 
+``` + +You can visit this URL to inspect the execution as it runs: + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/common/first_run_console.gif +:alt: A quick visual tour for launching your first Workflow. +::: + +Finally, run a workflow that takes some inputs, for example the `workflow.py` example: + +```{prompt} bash +pyflyte run --remote \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \ + simple_wf --x '[-3,0,3]' --y '[7,4,-2]' +``` + +:::{note} +We're passing in the workflow inputs as additional options to `pyflyte run`. In the above example, the +inputs are `--x '[-3,0,3]'` and `--y '[7,4,-2]'`. For snake-case argument names like `arg_name`, you can provide the +option as `--arg-name`. +::: + +## Visualizing workflows + +Workflows can be visualized as DAGs on the UI. +However, you can visualize workflows on the browser and in the terminal by _just_ using your terminal. + +To view workflow on the browser: + +```{prompt} bash $ +flytectl get workflows \ + --project flytesnacks \ + --domain development \ + --version \ + -o doturl \ + basics.workflow.simple_wf +``` + +To view workflow as a `strict digraph` on the command line: + +```{prompt} bash $ +flytectl get workflows \ + --project flytesnacks \ + --domain development \ + --version \ + -o dot \ + basics.workflow.simple_wf +``` + +Replace `` with the version obtained from the console UI, +which might resemble something like `BLrGKJaYsW2ME1PaoirK1g==`. + +:::{tip} +Running most of the examples in the **User guide** only requires the default Docker image that ships with Flyte. +Many examples in the {ref}`tutorials` and {ref}`integrations` section depend on additional libraries such as +`sklearn`, `pytorch` or `tensorflow`, which will not work with the default docker image used by `pyflyte run`. 
+ +These examples will explicitly show you which images to use for running these examples by passing in the Docker +image you want to use with the `--image` option in `pyflyte run`. +::: + +🎉 Congrats! Now you can run all the examples in the {ref}`userguide` 🎉! + +## Configuring the demo cluster to use additional resources + +Depending on how resource intensive your workflows are, you may encounter errors such as +OOM (Out of Memory) errors or find pods with the status OOMKilled. +It is crucial to understand that the demo cluster is not set up to immediately accommodate +all workflow requirements, and some resource requests may be ignored based on the cluster's limits. + +:::{tip} +Keep in mind that, for production deployments, you should give careful consideration to +these configurations rather than simply setting large numbers. +::: + +Here's how you can go about modifying the configurations: + +1. Add cluster resource attributes to `cra.yaml`: + +``` +attributes: + projectQuotaCpu: "1000" + projectQuotaMemory: 5Ti +project: flytesnacks +domain: development +``` + +2. Add task resource attributes to `tra.yaml`: + +``` +defaults: + cpu: "2" + memory: 1Gi +limits: + cpu: "1000" + memory: 5Ti +project: flytesnacks +domain: development +``` + +3. Apply the two configuration files using the following commands: + +``` +$ flytectl update task-resource-attribute --attrFile tra.yaml +$ flytectl update cluster-resource-attribute --attrFile cra.yaml +``` + +4. Confirm that the configuration is applied using the following commands: + +``` +$ flytectl get task-resource-attribute -p flytesnacks -d development +{"project":"flytesnacks","domain":"development","defaults":{"cpu":"2","memory":"1Gi"},"limits":{"cpu":"1000","memory":"5Ti"}} + +$ flytectl get cluster-resource-attribute -p flytesnacks -d development +{"project":"flytesnacks","domain":"development","attributes":{"projectQuotaCpu":"1000","projectQuotaMemory":"5Ti"}} +``` + +And that's it!
You have successfully modified your Flyte demo cluster to accommodate resource intensive workloads. + +For more information, refer to the +[Configuring Custom K8s Resources](https://docs.flyte.org/en/latest/deployment/configuration/general.html) guide. + +## Local registry + +If you find yourself using tasks dependent on `ImageSpec` containers built with `envd` on the demo cluster, +before you submit your workflow, you will need to inform `envd` how to push the images it builds to the cluster. +This can be done via: + +``` +envd context create --name flyte-sandbox --builder tcp --builder-address localhost:30003 --use +``` + +You will also need to update your `ImageSpec` instances to set `registry="localhost:30000"`. + +## What's next? + +Try out the examples in the {doc}`Basics ` section. diff --git a/docs/user_guide/extending/backend_plugins.md b/docs/user_guide/extending/backend_plugins.md new file mode 100644 index 0000000000..876ef8019e --- /dev/null +++ b/docs/user_guide/extending/backend_plugins.md @@ -0,0 +1,91 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +%% [markdown] +(extend-plugin-flyte-backend)= + +# Backend plugins + +```{eval-rst} +.. tags:: Extensibility, Contribute, Intermediate +``` + +This guide will take you through the why and how of writing a backend plugin for +Flyte. + +To recap, here are a few examples of why you would want to implement a backend plugin: + +1. We want to add a new capability to the Flyte Platform, for example we might want to: + - Talk to a new service like AWS Sagemaker, Snowflake, Redshift, Athena, BigQuery, etc. + - Orchestrate a set of containers in a new way like Spark, Flink, Distributed + training on Kubernetes (usually using a Kubernetes operator). 
+ - Use a new container orchestration engine like AWS Batch/ECS, HashiCorp Nomad, etc. + - Use a completely new runtime like AWS Lambda, KNative, etc. +2. You want to retain the capability to update the plugin implementation and roll + out new changes and fixes without affecting the users' code or requiring them to update + versions of their plugins. +3. You want the same plugin to be accessible across multiple language SDKs. + +```{note} +Talking to a new service can be done using flytekit extensions, and this is usually the better way to get started. Once matured, however, most of these extensions are better migrated to the backend. For the rest of the cases, it is possible to extend flytekit to achieve these scenarios, but this is less desirable because of the associated overhead of first launching a container that launches these jobs downstream. +``` + +## Basics + +In this section we'll go through the components of a backend plugin using the {ref}`K8s Spark plugin` as a reference. A Flyte backend extension consists of 3 parts: interface +specification, `flytekit` plugin implementation, and `flytepropeller` plugin implementation. + +### Interface specification + +Usually Flyte extensions need information that is not covered by a {std:ref}`Flyte TaskTemplate `. The TaskTemplate consists of +the interface, task_type identifier, some metadata and other fields. + +```{note} +An important field to note here is {std:ref}`custom `. The custom field is essentially an unstructured JSON. This makes it possible to extend a task-template beyond the default supported targets {std:ref}`container `. + +The motivation for the `custom` field is to marshal a JSON structure that specifies information beyond what a regular TaskTemplate can capture. The actual structure of the JSON is known only to the implemented backend-plugin and the SDK components. The core Flyte platform does not understand or look into the specifics of this structure.
+``` + +It is highly recommended to use an interface definition language like Protobuf, OpenAPISpec, etc. to specify the structure of the JSON. From here on, we refer to this as the **Plugin Specification**. + +```{note} +For Spark we decided to use Protobuf to specify the plugin as can be seen [here](https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/plugins/spark.proto). Note it isn't necessary to have the Plugin structure specified in `flyteidl`, but we do it for simplicity, ease of maintenance alongside the core platform, and the convenience of leveraging existing tooling to generate code for protobuf. +``` + +### Flytekit plugin implementation + +Now that you have a specification, you have to implement a method to generate this new TaskTemplate, with the special `custom` field. This is also where the UX design comes into play. You want to write the best possible interface in the SDK that users are delighted to use. The end goal is to create the TaskTemplate with the `custom` field populated with the actual JSON structure. + +Here, we use the Python `flytekit` SDK as an example for extending and +implementing the SDK. + +The SDK task should be implemented as an extension of {py:class}`flytekit.core.base_task.PythonTask`, or more commonly {py:class}`flytekit.PythonFunctionTask`. +In the case of Spark, we extend the {py:class}`flytekit.PythonFunctionTask`, as shown [here](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-spark/flytekitplugins/spark/task.py#L77-L123). + +The `SparkTask` is implemented as a regular flytekit plugin, with one exception: the `custom` field is now actually the `SparkJob` protocol buffer. When serializing a task, `flytekit` base classes will automatically invoke the [`get_custom` method](https://github.com/flyteorg/flytekit/blob/c02075d472b5587d199630bcfc7f9937673c6a0e/flytekit/core/base_task.py#L255).
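
The role of `get_custom` described above can be illustrated with a small, flytekit-free sketch. Everything here is a hypothetical stand-in: `ExampleJobConfig` plays the part of a plugin specification (which a real plugin would more likely define in Protobuf), `ExampleTask` is not a flytekit base class, and `serialize` only loosely mimics what happens at serialization time. In flytekit itself, `get_custom` also receives the serialization settings, which this sketch omits.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class ExampleJobConfig:
    """Hypothetical plugin specification; stands in for a Protobuf message."""

    main_class: str
    conf: Dict[str, str] = field(default_factory=dict)


class ExampleTask:
    """Stand-in for an SDK task class (NOT a real flytekit class)."""

    task_type = "example-job"  # hypothetical task_type identifier

    def __init__(self, task_config: ExampleJobConfig):
        self.task_config = task_config

    def get_custom(self) -> Dict[str, Any]:
        # Produce the JSON-compatible structure that fills the `custom` field.
        return asdict(self.task_config)


def serialize(task: ExampleTask) -> Dict[str, Any]:
    # Loosely mimics serialization: the SDK calls get_custom() and embeds
    # the result in the task template next to the task_type identifier.
    return {"type": task.task_type, "custom": task.get_custom()}


if __name__ == "__main__":
    task = ExampleTask(
        ExampleJobConfig(main_class="org.example.Main", conf={"spark.executor.instances": "2"})
    )
    print(serialize(task))
```

The design point is that only the SDK class and the backend plugin need to agree on the shape of the `custom` dictionary; everything in between treats it as opaque JSON.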
+ +### FlytePropeller backend plugin + +The backend plugin is where the actual logic of the execution is implemented. The backend plugin uses the **Flyte PluginMachinery** interface to implement a plugin which can be one of the following supported types: + +1. [Kubernetes operator Plugin](https://pkg.go.dev/github.com/lyft/flyteplugins@v0.5.26/go/tasks/pluginmachinery/k8s#Plugin): The demo in the video below shows two examples of K8s backend plugins: flytekit `Athena` & `Spark`, and Flyte K8s `Pod` & `Spark`. + + ```{youtube} oK2RGQuP94k + ``` + +2. **A Web API plugin:** [Async](https://pkg.go.dev/github.com/lyft/flyteplugins@v0.5.26/go/tasks/pluginmachinery/webapi#AsyncPlugin) or [Sync](https://pkg.go.dev/github.com/lyft/flyteplugins@v0.5.26/go/tasks/pluginmachinery/webapi#SyncPlugin). +3. [Core Plugin](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/core#Plugin): if none of the above fits diff --git a/docs/user_guide/extending/container_interface.md b/docs/user_guide/extending/container_interface.md new file mode 100644 index 0000000000..c0be559d76 --- /dev/null +++ b/docs/user_guide/extending/container_interface.md @@ -0,0 +1,81 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(core-extend-flyte-container-interface)= + +# Container interface + +```{eval-rst} +.. tags:: Extensibility, Contribute, Intermediate +``` + +Flyte typically interacts with containers in the course of its task execution (since most tasks are container +tasks). This is what that process looks like: + +1. At compilation time for a container task, the arguments to that container (and the container image itself) are set. + + 1. This is done by flytekit for instance for your run of the mill `@task`. 
This step is **crucial** - the task needs to specify an image available in the registry configured in the flyte installation. + +2. At runtime, Flyte will execute your task via a plugin. The default container plugin will do the following: + + 1. Set a series of environment variables. + 2. Before running the container, search/replace values in the container arguments. The command templating section below details how this happens. + + :::{note} + This templating process *should* be done by **all** plugins, even plugins that don't run a container but need + some information from the execution side. For example, a query task that submits a query to an engine that + writes the output to the raw output location. Or a query that uses the unique retry key as a temp table name, etc. + ::: + +## Command templating + +The templating of container arguments at run-time is one of the more advanced constructs of Flyte, but one that +authors of new task types should be aware of. For example, when looking at the hello world task in the UI, +if you click the Task tab, you'd see JSON that contains something like the following: + +```json +"container": { + "command": [], + "args": [ + "pyflyte-execute", + "--inputs", + "{{.input}}", + "--output-prefix", + "{{.outputPrefix}}", + "--raw-output-data-prefix", + "{{.rawOutputDataPrefix}}", + "--resolver", + "flytekit.core.python_auto_container.default_task_resolver", + "--", + "task-module", + "core.basic.hello_world", + "task-name", + "say_hello" + ], +``` + +The following table explains what each of the `{{}}` items mean, along with some others. 
+ +| Template | Example | Description | +| ------------------------ | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| {{.Input}} | `s3://my-bucket/inputs.pb` | Pb file containing a LiteralMap containing the inputs | +| {{.InputPrefix}} | `s3://my-bucket` | Just the bucket where the inputs.pb file can be found | +| {{.Inputs.\}} | `"hello world"` | For primitive inputs, the task can request that Flyte unpack the actual literal value, saving the task from having to download the file. Note that for Blob, Schema and StructuredDataset types, the uri where the data is stored will be filled in as the value. | +| {{.OutputPrefix}} | `s3://my-bucket/abc/data` | Location where the task should write a LiteralMap of output values in a file called `outputs.pb` | +| {{.RawOutputDataPrefix}} | `s3://your-data/` | Bucket where off-loaded data types (schemas, files, structureddatasets, etc.) are written. | +| {{.PerRetryUniqueKey}} | (random characters) | This is a random string that allows the task to differentiate between different executions of a task. Values will be unique per retry as well. | +| {{.TaskTemplatePath}} | `s3://my-bucket/task.pb` | For tasks that need the full task definition, use this template to access the full TaskTemplate IDL message. To ensure performance, propeller will not upload this file if this template was not requested by the task. 
| diff --git a/docs/user_guide/extending/custom_types.md b/docs/user_guide/extending/custom_types.md new file mode 100644 index 0000000000..af82d1a6ec --- /dev/null +++ b/docs/user_guide/extending/custom_types.md @@ -0,0 +1,195 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(advanced_custom_types)= + +# Custom types + +```{eval-rst} +.. tags:: Extensibility, Contribute, Intermediate +``` + +Flyte is a strongly-typed framework for authoring tasks and workflows. But there are situations when the existing +types do not directly work. This is true with any programming language! + +Similar to a programming language enabling higher-level concepts to describe user-specific objects such as classes in Python/Java/C++, struct in C/Golang, etc., +Flytekit allows modeling user classes. The idea is to make an interface that is more productive for the +use case, while writing a transformer that converts the user-defined type into one of the generic constructs in Flyte's type system. + +This example will try to model an example user-defined dataset and show how it can be seamlessly integrated with Flytekit's type engine. + +The example is demonstrated in the video below: + +```{eval-rst} +.. youtube:: 1xExpRzz8Tw + + +``` + ++++ {"lines_to_next_cell": 0} + +First, we import the dependencies. + +```{code-cell} +import os +import tempfile +import typing +from typing import Type + +from flytekit import Blob, BlobMetadata, BlobType, FlyteContext, Literal, LiteralType, Scalar, task, workflow +from flytekit.extend import TypeEngine, TypeTransformer +``` + ++++ {"lines_to_next_cell": 0} + +:::{note} +`FlyteContext` is used to access a random local directory. +::: + +Defined type here represents a list of files on the disk. 
We will refer to it as `MyDataset`. + +```{code-cell} +class MyDataset(object): + """ + ``MyDataset`` is a collection of files. In Flyte, this maps to a multi-part blob or directory. + """ + + def __init__(self, base_dir: str = None): + if base_dir is None: + self._tmp_dir = tempfile.TemporaryDirectory() + self._base_dir = self._tmp_dir.name + self._files = [] + else: + self._base_dir = base_dir + files = os.listdir(base_dir) + self._files = [os.path.join(base_dir, f) for f in files] + + @property + def base_dir(self) -> str: + return self._base_dir + + @property + def files(self) -> typing.List[str]: + return self._files + + def new_file(self, name: str) -> str: + new_file = os.path.join(self._base_dir, name) + self._files.append(new_file) + return new_file +``` + ++++ {"lines_to_next_cell": 0} + +`MyDataset` represents a set of files locally. However, when a workflow consists of multiple steps, we want the data to +flow between different steps. To achieve this, it is necessary to explain how the data will be transformed to +Flyte's remote references. To do this, we create a new instance of +{py:class}`~flytekit:flytekit.extend.TypeTransformer`, for the type `MyDataset` as follows: + +:::{note} +The `TypeTransformer` is a Generic abstract base class. The `Generic` type argument refers to the actual object +that we want to work with. In this case, it is the `MyDataset` object. +::: + +```{code-cell} +class MyDatasetTransformer(TypeTransformer[MyDataset]): + _TYPE_INFO = BlobType(format="binary", dimensionality=BlobType.BlobDimensionality.MULTIPART) + + def __init__(self): + super(MyDatasetTransformer, self).__init__(name="mydataset-transform", t=MyDataset) + + def get_literal_type(self, t: Type[MyDataset]) -> LiteralType: + """ + This is useful to tell the Flytekit type system that ``MyDataset`` actually refers to what corresponding type. + In this example, we say its of format binary (do not try to introspect) and there is more than one file in it. 
+ """ + return LiteralType(blob=self._TYPE_INFO) + + def to_literal( + self, + ctx: FlyteContext, + python_val: MyDataset, + python_type: Type[MyDataset], + expected: LiteralType, + ) -> Literal: + """ + This method is used to convert from the given python type object ``MyDataset`` to the Literal representation. + """ + # Step 1: Upload all the data into a remote place recommended by Flyte + remote_dir = ctx.file_access.get_random_remote_directory() + ctx.file_access.upload_directory(python_val.base_dir, remote_dir) + # Step 2: Return a pointer to this remote_dir in the form of a Literal + return Literal(scalar=Scalar(blob=Blob(uri=remote_dir, metadata=BlobMetadata(type=self._TYPE_INFO)))) + + def to_python_value(self, ctx: FlyteContext, lv: Literal, expected_python_type: Type[MyDataset]) -> MyDataset: + """ + In this method, we want to be able to re-hydrate the custom object from Flyte Literal value. + """ + # Step 1: Download remote data locally + local_dir = ctx.file_access.get_random_local_directory() + ctx.file_access.download_directory(lv.scalar.blob.uri, local_dir) + # Step 2: Create the ``MyDataset`` object + return MyDataset(base_dir=local_dir) +``` + ++++ {"lines_to_next_cell": 0} + +Before we can use MyDataset in our tasks, we need to let Flytekit know that `MyDataset` should be considered as a valid type. +This is done using {py:class}`~flytekit:flytekit.extend.TypeEngine`'s `register` method. + +```{code-cell} +TypeEngine.register(MyDatasetTransformer()) +``` + ++++ {"lines_to_next_cell": 0} + +The new type should be ready to use! Let us write an example generator and consumer for this new datatype. 
+ +```{code-cell} +@task +def generate() -> MyDataset: + d = MyDataset() + for i in range(3): + fp = d.new_file(f"x{i}") + with open(fp, "w") as f: + f.write(f"Contents of file{i}") + + return d + + +@task +def consume(d: MyDataset) -> str: + s = "" + for f in d.files: + with open(f) as fp: + s += fp.read() + s += "\n" + return s + + +@workflow +def wf() -> str: + return consume(d=generate()) +``` + ++++ {"lines_to_next_cell": 0} + +This workflow can be executed and tested locally. Flytekit will exercise the entire path even if you run it locally. + +```{code-cell} +if __name__ == "__main__": + print(wf()) +``` diff --git a/docs/user_guide/extending/index.md b/docs/user_guide/extending/index.md new file mode 100644 index 0000000000..19a553ddfc --- /dev/null +++ b/docs/user_guide/extending/index.md @@ -0,0 +1,218 @@ +(plugins_extend)= + +# Extending Flyte + +The core of Flyte is a container execution engine, where you can write one or more tasks and compose them together to +form a data dependency DAG, called a `workflow`. If your work involves writing simple Python or Java tasks that can +either perform operations on their own or call out to {ref}`Supported external services `, +then there's _no need to extend Flyte_. + +## Define a Custom Type + +Flyte, just like a programming language, has a core type-system, which can be extended by adding user-defined data types. +For example, Flyte supports adding support for a dataframe type from a new library, a custom user data structure, or a +grouping of images in a specific encoding. + +Flytekit natively supports structured data like {py:func}`~dataclasses.dataclass` using JSON as the +representation format (see {ref}`Using Custom Python Objects `). + +Flytekit allows users to extend Flyte's type system and implement types in Python that are not representable as JSON documents. 
The user has to implement a {py:class}`~flytekit.extend.TypeTransformer` +class to enable the translation of type from user type to Flyte-understood type. + +As an example, instead of using {py:class}`pandas.DataFrame` directly, you may want to use +[Pandera](https://pandera.readthedocs.io/en/stable/) to perform validation of an input or output dataframe +(see {ref}`Basic Schema Example `). + +To extend the type system, refer to {ref}`advanced_custom_types`. + +## Add a New Task Plugin + +Often you want to interact with services like: + +- Databases (e.g., Postgres, MySQL, etc.) +- DataWarehouses (e.g., Snowflake, BigQuery, Redshift etc.) +- Computation (e.g., AWS EMR, Databricks etc.) + +You might want this interaction to be available as a template for the open-source community or in your organization. This +can be done by creating a task plugin, which makes it possible to reuse the task's underlying functionality within Flyte +workflows. + +If you want users to write code simply using the {py:func}`~flytekit.task` decorator, but want to provide the +capability of running the function as a spark job or a sagemaker training job, then you can extend Flyte's task system. + +```{code-block} python +@task(task_config=MyContainerExecutionTask( + plugin_specific_config_a=..., + plugin_specific_config_b=..., + ... +)) +def foo(...) -> ...: + ... +``` + +Alternatively, you can provide an interface like this: + +```{code-block} python +query_task = SnowflakeTask( + query="Select * from x where x.time < {{.inputs.time}}", + inputs=kwtypes(time=datetime), + output_schema_type=pandas.DataFrame, +) + +@workflow +def my_wf(t: datetime) -> ...: + df = query_task(time=t) + return process(df=df) +``` + +There are two options when writing a new task plugin: you can write a task plugin as an extension in Flytekit or you can go deeper and write a plugin in the Flyte backend. + +## Flytekit-Only Task Plugin + +Flytekit is designed to be extremely extensible. 
You can add new task-types that are useful only for your use-case. +Flyte does come with the capability of extending the backend, but that is only required if you want the capability to be +extended to all users of Flyte, or there is a cost/visibility benefit of doing so. + +Writing your own Flytekit plugin is simple and is typically where you want to start when enabling custom task functionality. + +```{list-table} +:widths: 50 50 +:header-rows: 1 + +* - Pros + - Cons +* - Simple to write: implement in Python. Flyte will treat it like a container execution and blindly pass + control to the plugin. + - Limited ways of providing additional visibility in progress, external links, etc. +* - Simple to publish: `flytekitplugins` can be published as independent libraries and they follow a simple API. + - Has to be implemented in every language as these are SDK-side plugins only. +* - Simple to perform testing: test locally in flytekit. + - In case of side-effects, it could lead to resource leaks. For example, if the plugin runs a BigQuery job, + it is possible that the plugin may crash after running the job and Flyte cannot guarantee that the BigQuery job + will be successfully terminated. +* - + - Potentially expensive: in cases where the plugin runs a remote job, running a new pod for every task execution + causes severe strain on Kubernetes and the task itself uses almost no CPUs. Also because of its stateful nature, + using spot-instances is not trivial. +* - + - A bug fix to the runtime needs a new library version of the plugin. +* - + - Not trivial to implement resource controls, like throttling, resource pooling, etc. +``` + +### User Container vs. Pre-built Container Task Plugin + +A Flytekit-only task plugin can be a {ref}`user container ` or {ref}`pre-built container ` task plugin.
+ +```{list-table} +:widths: 10 50 50 +:header-rows: 1 + +* - + - User Container + - Pre-built Container +* - Serialization + - At serialization time, a Docker container image is required. The assumption is that this Docker image has the task code. + - The Docker container image is hardcoded at serialization time into the task definition by the author of that task plugin. +* - Serialization + - The serialized task contains instructions to the container on how to reconstitute the task. + - Serialized task should contain all the information needed to run that task instance (but not necessarily to reconstitute it). +* - Run-time + - When Flyte runs the task, the container is launched, and the user-given instructions recreate a Python object representing the task. + - When Flyte runs the task, the container is launched. The container should have an executor built into it that knows how to execute the task. +* - Run-time + - The task object that gets serialized at compile-time is recreated using the user's code at run time. + - The task object that gets serialized at compile-time does not exist at run time. +* - Run-time + - At platform-run-time, the user-decorated function is executed. + - At platform-run-time, there is no user function, and the executor is responsible for producing outputs, given the inputs to the task. +``` + +### Backend Plugin + +{ref}`Writing a Backend plugin ` makes it possible for users to write extensions for FlytePropeller - Flyte's scheduling engine. This enables complete control of the visualization and availability +of the plugin. + +```{list-table} +:widths: 50 50 +:header-rows: 1 + +* - Pros + - Cons +* - Service oriented way of deploying new plugins - strong contracts. Maintainers can deploy new versions of the backend plugin, fix bugs, without needing the users to upgrade libraries, etc. + - Need to be implemented in Golang. +* - Drastically cheaper and more efficient to execute. 
FlytePropeller is written in Golang and uses an event loop model. Each process of FlytePropeller can execute thousands of tasks concurrently. + - Needs a FlytePropeller build (*currently*). +* - Flyte guarantees resource cleanup. + - Need to implement contract in a spec language like protobuf, OpenAPI, etc. +* - Flyteconsole plugins (capability coming soon!) can be added to customize visualization and progress tracking of the execution. + - Development cycle can be much slower than flytekit-only plugins. +* - Resource controls and backpressure management is available. + - +* - Implement once, use in any SDK or language! + - +``` + +#### Flyte Agent Service + +_New in Flyte 1.7.0_ + +{ref}`Flyte Agent Service ` allows you to write backend +plugins in Python. + +### Summary + +```{mermaid} + +flowchart LR + U{Use Case} + F([Python Flytekit Plugin]) + B([Golang
Backend Plugin]) + + subgraph WFTP[Writing Flytekit Task Plugins] + UCP([User Container Plugin]) + PCP([Pre-built Container Plugin]) + end + + subgraph WBE[Writing Backend Extensions] + K8S([K8s Plugin]) + WP([WebAPI Plugin]) + CP([Complex Plugin]) + end + + subgraph WCFT[Writing Custom Flyte Types] + T([Flytekit
Type Transformer]) + end + + U -- Light-weight
Extensions --> F + U -- Performant
Multi-language
Extensions --> B + U -- Specialized
Domain-specific Types --> T + F -- Require
user-defined
container --> UCP + F -- Provide
prebuilt
container --> PCP + B --> K8S + B --> WP + B --> CP + + style WCFT fill:#eee,stroke:#aaa + style WFTP fill:#eee,stroke:#aaa + style WBE fill:#eee,stroke:#aaa + style U fill:#fff2b2,stroke:#333 + style B fill:#EAD1DC,stroke:#333 + style K8S fill:#EAD1DC,stroke:#333 + style WP fill:#EAD1DC,stroke:#333 + style CP fill:#EAD1DC,stroke:#333 +``` + +Use the flow-chart above to point you to one of these examples: + +```{toctree} +:maxdepth: -1 +:name: extending_toc +:hidden: + +custom_types +prebuilt_container_task_plugins +user_container_task_plugins +backend_plugins +container_interface +``` diff --git a/docs/user_guide/extending/prebuilt_container_task_plugins.md b/docs/user_guide/extending/prebuilt_container_task_plugins.md new file mode 100644 index 0000000000..ed51f1b7a6 --- /dev/null +++ b/docs/user_guide/extending/prebuilt_container_task_plugins.md @@ -0,0 +1,105 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(prebuilt_container)= + +# Prebuilt container task plugins + +```{eval-rst} +.. tags:: Extensibility, Contribute, Intermediate +``` + +A prebuilt container task plugin runs a prebuilt container. The following are the advantages of using a prebuilt container in comparison to a user-defined container: + +- Shifts the burden of writing Dockerfile from the user who uses the task in workflows to the author of the task type. +- Allows the author to optimize the image that the task runs on. +- Makes it possible to (largely) extend the Flyte task execution behavior without using the backend Golang plugin. + The caveat is that these tasks can't access the K8s cluster, so you'll still need a backend plugin if you want a custom task type that generates CRD. 
+ +## Usage + +Take a look at the [example PR](https://github.com/flyteorg/flytekit/pull/470), where we switched the built-in SQLite3 task from the old (user-container) to the new style of writing tasks. + +There aren't many changes from the user's standpoint: +\- Install whichever Python library has the task type definition (in the case of SQLite3, it's bundled in Flytekit, but this isn't always the case (for example, [SQLAlchemy](https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-sqlalchemy))). +\- Import and instantiate the task as you would for any other type of non-function-based task. + +## How to write a task + +Writing a pre-built container task consists of three steps: + +1. Defining a Task class +2. Defining an Executor class +3. Creating a Dockerfile that is executed when any user runs your task. It'll most likely include Flytekit, Python, and your task extension code. + +To follow along, use the [PR (mentioned above)](https://github.com/flyteorg/flytekit/pull/470) where we migrate the SQLite3 task. + +## Python library + +### Defining a task + +New tasks of this type must be created as a subclass of the `PythonCustomizedContainerTask` class. + +Specifically, you need to customize the following three arguments which would be sent to the parent class constructor: + +- `container_image`: This is the container image that will run on a Flyte platform when the user invokes the job. +- `executor_type`: This should be the Python class that inherits the `ShimTaskExecutor`. +- `task_type`: All types have a task type. Flyte engine uses this string to identify which plugin to use when running a task. + +The container plugin will be used for everything that doesn't have an explicit match (which is correct in this case). +So you may call it whatever you want, just not something that's already been claimed (like "spark"). 
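
To picture the fallback described above, here is a toy lookup. It is only an illustration under assumed names (`REGISTERED_PLUGINS`, `resolve_plugin`, and the plugin labels are hypothetical) and does not reflect how the Flyte engine is actually implemented.

```python
from typing import Dict

# Hypothetical registry: task types that some backend plugin has claimed.
REGISTERED_PLUGINS: Dict[str, str] = {
    "spark": "spark-plugin",
}

# Assumed label for the default container plugin.
DEFAULT_PLUGIN = "container-plugin"


def resolve_plugin(task_type: str) -> str:
    """Return the plugin for a task type, falling back to the container plugin."""
    return REGISTERED_PLUGINS.get(task_type, DEFAULT_PLUGIN)


for task_type in ("spark", "sqlite"):
    print(f"{task_type} -> {resolve_plugin(task_type)}")
```

Because `"sqlite"` has no explicit match in this sketch, it resolves to the container plugin, which is the behavior a pre-built container task relies on.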
+ +Referring to the SQLite3 example, + +``` +container_image="ghcr.io/flyteorg/flytekit:py38-v0.19.0b7", +executor_type=SQLite3TaskExecutor, +task_type="sqlite", +``` + +Note that the container is special in this case since we utilize the Flytekit image. + +Furthermore, you need to override the `get_custom` function to include all the information the executor will need to run. + +Keep in mind that the task's execution behavior is entirely defined by the task's serialized form (that is, the serialized `TaskTemplate`). +This function stores and inserts the data into the template's [custom field](https://github.com/flyteorg/flyteidl/blob/7302971c064b6061a148f2bee79f673bc8cf30ee/protos/flyteidl/core/tasks.proto#L114). +However, keep the task template's overall size to a minimum. + +### Executor + +You must subclass and override the `execute_from_model` function for the `ShimTaskExecutor` abstract class. +This function will be invoked in both local workflow execution and platform-run-time execution, and will include all of the business logic of your task. + +The signature of this execute function differs from the `execute` functions of most other tasks since the `TaskTemplate` determines all the business logic, including how the task is run. + +### Image + +This is the custom image that you specified in the subclass `PythonCustomizedContainerTask`. Out of the box, when Flyte runs the container, these tasks will run a command that looks like this + +``` +pyflyte-execute --inputs s3://inputs.pb --output-prefix s3://outputs --raw-output-data-prefix s3://user-data --resolver flytekit.core.python_customized_container_task.default_task_template_resolver -- {{.taskTemplatePath}} path.to.your.executor.subclass +``` + +This means that your [Docker image](https://github.com/flyteorg/flytekit/blob/master/Dockerfile) will need Python and Flytekit installed. 
+The container's Python interpreter should be able to find your custom executor class at the import path `path.to.your.executor.subclass`. + +______________________________________________________________________ + +The key takeaways of a pre-built container task plugin are: + +- The task object serialized at compile time does not exist at run time. +- There is no user function at platform run time, and the executor is responsible for producing outputs based on the task's inputs. diff --git a/docs/user_guide/extending/user_container_task_plugins.md b/docs/user_guide/extending/user_container_task_plugins.md new file mode 100644 index 0000000000..68cc1a859d --- /dev/null +++ b/docs/user_guide/extending/user_container_task_plugins.md @@ -0,0 +1,164 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +(user_container)= + +# User container task plugins + +```{eval-rst} +.. tags:: Extensibility, Contribute, Intermediate +``` + +A user container task plugin runs a user-defined container that has the user code. + +This tutorial will walk you through writing your own sensor-style plugin that allows users to wait for a file to land +in the object store. Remember that if you follow the flyte/flytekit constructs, you will automatically make your plugin portable +across all cloud platforms that Flyte supports. + +## Sensor plugin + +A sensor plugin waits for some event to happen before marking the task as success. You need not worry about the +timeout as that will be handled by the flyte engine itself when running in production. 
+ +### Plugin API + +```python +sensor = WaitForObjectStoreFile(metadata=metadata(timeout="1H", retries=10)) + +@workflow +def wait_and_run(path: str) -> int: + # To demonstrate how to create outputs, we will also + # return the output from the sensor. The output will be the + # same as the path + path = sensor(path=path) + return do_next(path=path) +``` + +```{code-cell} +import typing +from datetime import timedelta +from time import sleep + +from flytekit import TaskMetadata, task, workflow +from flytekit.extend import Interface, PythonTask, context_manager +``` + ++++ {"lines_to_next_cell": 0} + +### Plugin structure + +As illustrated above, to achieve this structure we need to create a class named `WaitForObjectStoreFile`, which +derives from {py:class}`flytekit.core.base_task.PythonTask` as follows. + +```{code-cell} +class WaitForObjectStoreFile(PythonTask): + """ + Add documentation here for your plugin. + This plugin creates an object store file sensor that waits and exits only when the file exists. + """ + + _VAR_NAME: str = "path" + + def __init__( + self, + name: str, + poll_interval: timedelta = timedelta(seconds=10), + **kwargs, + ): + super(WaitForObjectStoreFile, self).__init__( + task_type="object-store-sensor", + name=name, + task_config=None, + interface=Interface(inputs={self._VAR_NAME: str}, outputs={self._VAR_NAME: str}), + **kwargs, + ) + self._poll_interval = poll_interval + + def execute(self, **kwargs) -> typing.Any: + # No need to check for existence, as that is guaranteed.
+ path = kwargs[self._VAR_NAME] + ctx = context_manager.FlyteContext.current_context() + user_context = ctx.user_space_params + while True: + user_context.logging.info(f"Sensing file in path {path}...") + if ctx.file_access.exists(path): + user_context.logging.info(f"file in path {path} exists!") + return path + user_context.logging.warning(f"file in path {path} does not exist!") + # total_seconds() honors the full interval, including days and fractions of a second + sleep(self._poll_interval.total_seconds()) +``` + +#### Config objects + +Flytekit routes to the right plugin based on the type of `task_config` class if using the `@task` decorator. +Config is very useful for cases when you want to customize the behavior of the plugin or pass the config information +to the backend plugin; however, in this case there's no real configuration. The config object can be any class that your +plugin understands. + +:::{note} +Observe that the base class is Generic; it is parameterized with the desired config class. +::: + +:::{note} +To create a task decorator-based plugin, `task_config` is required. +In this example, we are creating a named class plugin, and hence, this construct does not need a `task_config`. +::: + +Refer to the [spark plugin](https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-spark) for an example of a config object. + ++++ + +### Actual usage + +```{code-cell} +sensor = WaitForObjectStoreFile( + name="my-objectstore-sensor", + metadata=TaskMetadata(retries=10, timeout=timedelta(minutes=20)), + poll_interval=timedelta(seconds=1), +) + + +@task +def print_file(path: str) -> str: + print(path) + return path + + +@workflow +def my_workflow(path: str) -> str: + return print_file(path=sensor(path=path)) +``` + ++++ {"lines_to_next_cell": 0} + +And of course, you can run the workflow locally using your own new shiny plugin!
+ +```{code-cell} +if __name__ == "__main__": + f = "/tmp/some-file" + with open(f, "w") as w: + w.write("Hello World!") + + print(my_workflow(path=f)) +``` + +The key takeaways of a user container task plugin are: + +- The task object that gets serialized at compile-time is recreated using the user's code at run time. +- At platform-run-time, the user-decorated function is executed. diff --git a/docs/user_guide/index.md b/docs/user_guide/index.md new file mode 100644 index 0000000000..9c4009c9fe --- /dev/null +++ b/docs/user_guide/index.md @@ -0,0 +1,72 @@ +--- +:next-page: environment_setup +:next-page-title: Environment Setup +:prev-page: getting_started/analytics +:prev-page-title: Analytics +--- + +(userguide)= + +# User guide + +If this is your first time using Flyte, check out the {doc}`Getting Started ` guide. + +This _User guide_, the {ref}`Tutorials ` and the {ref}`Integrations ` examples cover all of +the key features of Flyte for data analytics, data science and machine learning practitioners, organized by topic. Each +section below introduces a core feature of Flyte and how you can use it to address specific use cases. Code for all +of the examples can be found in the [flytesnacks repo](https://github.com/flyteorg/flytesnacks). + +It comes with a specific environment to make running, documenting +and contributing samples easy. If this is your first time running these examples, follow the +{doc}`environment setup guide ` to get started. + +```{tip} +To learn about how to spin up and manage a Flyte cluster in the cloud, see the +{doc}`Deployment Guides `. +``` + +```{note} +Want to contribute or update an example? Check out the {doc}`Contribution Guide <../flytesnacks/contribute>`. +``` + +## Table of contents + +```{list-table} +:header-rows: 0 +:widths: 20 30 + +* - {doc}`🌳 Environment Setup ` + - Set up a development environment to run the examples in the user guide.
+* - {doc}`🔤 Basics ` + - Learn about tasks, workflows, launch plans, caching and managing files and directories. +* - {doc}`⌨️ Data Types and IO ` + - Improve pipeline robustness with Flyte's portable and extensible type system. +* - {doc}`🔮 Advanced Composition ` + - Implement conditionals, nested and dynamic workflows, map tasks and even recursion! +* - {doc}`🧩 Customizing Dependencies ` + - Provide custom dependencies to run your Flyte entities. +* - {doc}`🏡 Development Lifecycle ` + - Develop and test locally on the demo cluster. +* - {doc}`⚗️ Testing ` + - Test tasks and workflows with Flyte's testing utilities. +* - {doc}`🚢 Productionizing ` + - Ship and configure your machine learning pipelines on a production Flyte installation. +* - {doc}`🏗 Extending ` + - Define custom plugins that aren't currently supported in the Flyte ecosystem. +``` + +```{toctree} +:maxdepth: -1 +:name: user_guide_toc +:hidden: + +environment_setup +basics/index +data_types_and_io/index +advanced_composition/index +customizing_dependencies/index +development_lifecycle/index +testing/index +productionizing/index +extending/index +``` \ No newline at end of file diff --git a/docs/user_guide/productionizing/configuring_access_to_gpus.md b/docs/user_guide/productionizing/configuring_access_to_gpus.md new file mode 100644 index 0000000000..f2575b5adb --- /dev/null +++ b/docs/user_guide/productionizing/configuring_access_to_gpus.md @@ -0,0 +1,53 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(configure-gpus)= + +# Configuring access to GPUs + +```{eval-rst} +.. 
tags:: Deployment, Infrastructure, GPU, Intermediate +``` + +Along with the simpler resources like CPU/Memory, you may want to configure and access GPU resources. Flyte +allows you to configure the GPU access policy for your cluster. GPUs are expensive and it would not be ideal to +treat machines with GPUs and machines with CPUs equally. You may want to reserve machines with GPUs for tasks +that explicitly request GPUs. To achieve this, Flyte uses the Kubernetes concept of [taints and tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/). + +Kubernetes can automatically apply tolerations for extended resources like GPUs using the [ExtendedResourceToleration plugin](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#extendedresourcetoleration), enabled by default in some cloud environments. Make sure the GPU nodes are tainted with a key matching the resource name, i.e., `key: nvidia.com/gpu`. + +You can also configure the Flyte backend to apply specific tolerations. This configuration lives under the generic k8s plugin configuration, which can be found [here](https://github.com/flyteorg/flyteplugins/blob/5a00b19d88b93f9636410a41f81a73356a711482/go/tasks/pluginmachinery/flytek8s/config/config.go#L120). + +The idea of this configuration is that whenever a task that can execute on Kubernetes requests GPUs, Flyte automatically +adds the matching toleration for that resource (in this case, `gpu`) to the generated PodSpec. +In the same way, you can configure tolerations for any other resource supported by +Kubernetes. + +Here's an example configuration: + +```yaml +plugins: + k8s: + resource-tolerations: + - nvidia.com/gpu: + - key: "key1" + operator: "Equal" + value: "value1" + effect: "NoSchedule" +``` + +Getting this configuration into your deployment will depend on how Flyte is deployed on your cluster. 
If you use the default Opta/Helm route, you'll need to amend your Helm chart values ([example](https://github.com/flyteorg/flyte/blob/cc127265aec490ad9537d29bd7baff828043c6f5/charts/flyte-core/values.yaml#L629)) so that they end up [here](https://github.com/flyteorg/flyte/blob/3d265f166fcdd8e20b07ff82b494c0a7f6b7b108/deployment/eks/flyte_helm_generated.yaml#L521). diff --git a/docs/user_guide/productionizing/configuring_logging_links_in_the_ui.md b/docs/user_guide/productionizing/configuring_logging_links_in_the_ui.md new file mode 100644 index 0000000000..57502a8c77 --- /dev/null +++ b/docs/user_guide/productionizing/configuring_logging_links_in_the_ui.md @@ -0,0 +1,138 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(configure-logging)= + +# Configuring logging links in the UI + +```{eval-rst} +.. tags:: Deployment, Intermediate, UI +``` + +To debug your workflows in production, you want to access logs from your tasks as they run. +These logs are different from the core Flyte platform logs, are specific to execution, and may vary from plugin to plugin; +for example, Spark may have driver and executor logs. + +Every organization potentially uses different log aggregators, making it hard to create a one-size-fits-all solution. +Some examples of the log aggregators include cloud-hosted solutions like AWS CloudWatch, GCP Stackdriver, Splunk, Datadog, etc. + +Flyte provides a simplified interface to configure your log provider. Flyte-sandbox +ships with the Kubernetes dashboard to visualize the logs. This may not be safe for production, hence we recommend users +explore other log aggregators. + +## How to configure? 
+ +To configure your log provider, the provider needs to support `URL` links that are shareable and can be templatized. +The templating engine has access to [these](https://github.com/flyteorg/flyteplugins/blob/b0684d97a1cf240f1a44f310f4a79cc21844caa9/go/tasks/pluginmachinery/tasklog/plugin.go#L7-L16) parameters. + +The parameters can be used to generate a unique URL to the logs of a specific task from a templated URI. The templated URI has access to the following parameters: + +```{eval-rst} +.. list-table:: Parameters to generate a templated URI + :widths: 25 50 + :header-rows: 1 + + * - Parameter + - Description + * - ``{{ .podName }}`` + - The pod name as it appears in the k8s dashboard + * - ``{{ .podUID }}`` + - The pod UID generated by k8s at runtime + * - ``{{ .namespace }}`` + - The k8s namespace where the pod runs + * - ``{{ .containerName }}`` + - The name of the container that generated the log + * - ``{{ .containerId }}`` + - The container ID generated by docker/crio at runtime + * - ``{{ .logName }}`` + - A deployment-specific name for where the logs are expected to be + * - ``{{ .hostname }}`` + - The hostname where the pod is running and the logs reside + * - ``{{ .podRFC3339StartTime }}`` + - The pod creation time (in RFC3339 format, e.g. "2021-01-01T02:07:14Z", also conforming to ISO 8601) + * - ``{{ .podRFC3339FinishTime }}`` + - The pod finish time in RFC3339 format (no reliable mechanism exists for this yet, so it is approximated with ``time.Now``) + * - ``{{ .podUnixStartTime }}`` + - The pod creation time (in unix seconds, not millis) + * - ``{{ .podUnixFinishTime }}`` + - The pod finish time in unix seconds (no reliable mechanism exists for this yet, so it is approximated with ``time.Now``) +``` + +The parameterization engine uses Golang's native templating format and hence uses `{{ }}`. 
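
To make the substitution concrete, here is a minimal Python sketch of what filling in such a template amounts to. It is an illustration only: Flyte renders log links with Go's template engine, and `render_log_uri` is a hypothetical helper that does not exist in Flyte or flytekit. The URL and parameter values are made up for the example.

```python
import re

# Matches placeholders such as `{{ .podName }}` or `{{.podName}}`.
# Assumption: only simple `.name` lookups are handled, not full Go template syntax.
_PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_log_uri(template: str, params: dict) -> str:
    """Hypothetical helper: replace each `{{ .key }}` with params[key]."""

    def substitute(match):
        key = match.group(1)
        if key not in params:
            # Fail loudly rather than emit a broken link.
            raise KeyError(f"no value for template parameter {key!r}")
        return params[key]

    return _PLACEHOLDER.sub(substitute, template)


template = "https://someexample.com/app/podName={{ .podName }}&containerName={{.containerName}}"
uri = render_log_uri(template, {"podName": "pname", "containerName": "cname"})
print(uri)  # https://someexample.com/app/podName=pname&containerName=cname
```

The sketch raises an error on an unknown parameter so that a typo in a template is noticed early; how the real engine treats unknown parameters is not covered here.
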
An example configuration can be seen as follows: + +```yaml +task_logs: + plugins: + logs: + templates: + - displayName: + templateUris: + - "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/flyte-production/kubernetes;stream=var.log.containers.{{.podName}}_{{.namespace}}_{{.containerName}}-{{.containerId}}.log" + - "https://some-other-source/home?region=us-east-1#logEventViewer:group=/flyte-production/kubernetes;stream=var.log.containers.{{.podName}}_{{.namespace}}_{{.containerName}}-{{.containerId}}.log" + messageFormat: 0 # this parameter is optional, but use 0 for "unknown", 1 for "csv", or 2 for "json" +``` + +:::{tip} +Since helm chart uses the same templating syntax for args (like `{{ }}`), compiling the chart results in helm replacing Flyte log link templates as well. To avoid this, you can use escaped templating for Flyte logs in the helm chart. +This ensures that Flyte log link templates remain in place during helm chart compilation. +For example: + +If your configuration looks like this: + +`https://someexample.com/app/podName={{ "{{" }} .podName {{ "}}" }}&containerName={{ .containerName }}` + +Helm chart will generate: + +`https://someexample.com/app/podName={{.podName}}&containerName={{.containerName}}` + +Flytepropeller pod would be created as: + +`https://someexample.com/app/podName=pname&containerName=cname` +::: + +This code snippet will output two logs per task that use the log plugin. +However, not all task types use the log plugin; for example, the SageMaker plugin uses the log output provided by Sagemaker, and the Snowflake plugin will use a link to the snowflake console. + +## Datadog integration + +To send your Flyte workflow logs to Datadog, you can follow these steps: + +1. Enable collection of logs from containers and collection of logs using files. The precise configuration steps will vary depending on your specific setup. 
+ +For instance, if you're using Helm, use the following config: + +```yaml +logs: + enabled: true + containerCollectAll: true + containerCollectUsingFiles: true +``` + +If you're using environment variables, use the following config: + +```yaml +DD_LOGS_ENABLED: "true" +DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true" +DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE: "true" +DD_CONTAINER_EXCLUDE_LOGS: "name:datadog-agent" # This is to avoid tracking logs produced by the datadog agent itself +``` + +:::{warning} +The boolean values have to be represented as strings. +::: + +2. The Datadog [guide](https://docs.datadoghq.com/containers/kubernetes/log/?tab=daemonset) includes a section on mounting volumes. It is essential (and a prerequisite for proper functioning) to map the volumes "logpodpath" and "logcontainerpath" as illustrated in the linked example. While the "pointerdir" volume is optional, it is recommended that you map it to prevent the loss of container logs during restarts or network issues (as stated in the guide). diff --git a/docs/user_guide/productionizing/customizing_task_resources.md b/docs/user_guide/productionizing/customizing_task_resources.md new file mode 100644 index 0000000000..39fee64d55 --- /dev/null +++ b/docs/user_guide/productionizing/customizing_task_resources.md @@ -0,0 +1,181 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Customizing task resources + +```{eval-rst} +.. tags:: Deployment, Infrastructure, Basic +``` + +One of the reasons to use a hosted Flyte environment is the potential of leveraging CPU, memory and storage resources, far greater than what's available locally. 
+Flytekit makes it possible to specify these requirements declaratively and close to where the task itself is declared. + ++++ + +In this example, the memory required by the function increases as the dataset size increases. +Large datasets may not be able to run locally, so we would want to provide hints to the Flyte backend to request for more memory. +This is done by decorating the task with the hints as shown in the following code sample. + +Tasks can have `requests` and `limits` which mirror the native [equivalents in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits). +A task can possibly be allocated more resources than it requests, but never more than its limit. +Requests are treated as hints to schedule tasks on nodes with available resources, whereas limits +are hard constraints. + +For either a request or limit, refer to the {py:class}`flytekit:flytekit.Resources` documentation. + +The following attributes can be specified for a `Resource`. + +1. `cpu` +2. `mem` +3. `gpu` + +To ensure that regular tasks that don't require GPUs are not scheduled on GPU nodes, a separate node group for GPU nodes can be configured with [taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/). + +To ensure that tasks that require GPUs get the needed tolerations on their pods, set up FlytePropeller using the following [configuration](https://github.com/flyteorg/flytepropeller/blob/v0.10.5/config.yaml#L51,L56). Ensure that this toleration config matches the taint config you have configured to protect your GPU providing nodes from dealing with regular non-GPU workloads (pods). + +The actual values follow the [Kubernetes convention](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-units-in-kubernetes). +Let's look at an example to understand how to customize resources. + ++++ {"lines_to_next_cell": 0} + +Import the dependencies. 
+ +```{code-cell} +import typing + +from flytekit import Resources, task, workflow +``` + ++++ {"lines_to_next_cell": 0} + +Define a task and configure the resources to be allocated to it. + +```{code-cell} +@task(requests=Resources(cpu="1", mem="100Mi"), limits=Resources(cpu="2", mem="150Mi")) +def count_unique_numbers(x: typing.List[int]) -> int: + s = set() + for i in x: + s.add(i) + return len(s) +``` + ++++ {"lines_to_next_cell": 0} + +Define a task that computes the square of a number. + +```{code-cell} +@task +def square(x: int) -> int: + return x * x +``` + ++++ {"lines_to_next_cell": 0} + +You can use the tasks decorated with memory and storage hints like regular tasks in a workflow. + +```{code-cell} +@workflow +def my_workflow(x: typing.List[int]) -> int: + return square(x=count_unique_numbers(x=x)) +``` + ++++ {"lines_to_next_cell": 0} + +You can execute the workflow locally. + +```{code-cell} +if __name__ == "__main__": + print(count_unique_numbers(x=[1, 1, 2])) + print(my_workflow(x=[1, 1, 2])) +``` + +:::{note} +To alter the limits of the default platform configuration, change the [admin config](https://github.com/flyteorg/flyte/blob/b16ffd76934d690068db1265ac9907a278fba2ee/deployment/eks/flyte_helm_generated.yaml#L203-L213) and [namespace level quota](https://github.com/flyteorg/flyte/blob/b16ffd76934d690068db1265ac9907a278fba2ee/deployment/eks/flyte_helm_generated.yaml#L214-L240) on the cluster. +::: + ++++ + +(resource_with_overrides)= + +## Using `with_overrides` + +You can use the `with_overrides` method to override the resources allocated to the tasks dynamically. +Let's understand how the resources can be initialized with an example. + ++++ {"lines_to_next_cell": 0} + +Import the dependencies. + +```{code-cell} +import typing # noqa: E402 + +from flytekit import Resources, task, workflow # noqa: E402 +``` + ++++ {"lines_to_next_cell": 0} + +Define a task and configure the resources to be allocated to it. 
+You can use tasks decorated with memory and storage hints like regular tasks in a workflow. + +```{code-cell} +@task(requests=Resources(cpu="1", mem="200Mi"), limits=Resources(cpu="2", mem="350Mi")) +def count_unique_numbers_1(x: typing.List[int]) -> int: + s = set() + for i in x: + s.add(i) + return len(s) +``` + ++++ {"lines_to_next_cell": 0} + +Define a task that computes the square of a number. + +```{code-cell} +@task +def square_1(x: int) -> int: + return x * x +``` + ++++ {"lines_to_next_cell": 0} + +The `with_overrides` method overrides the old resource allocations. + +```{code-cell} +@workflow +def my_pipeline(x: typing.List[int]) -> int: + return square_1(x=count_unique_numbers_1(x=x)).with_overrides(limits=Resources(cpu="6", mem="500Mi")) +``` + ++++ {"lines_to_next_cell": 0} + +You can execute the workflow locally. + +```{code-cell} +if __name__ == "__main__": + print(count_unique_numbers_1(x=[1, 1, 2])) + print(my_pipeline(x=[1, 1, 2])) +``` + +You can see the memory allocation below. The memory limit is `500Mi` rather than `350Mi`, and the +CPU limit is 4, whereas it should have been 6 as specified using `with_overrides`. +This is because the default platform CPU quota for every pod is 4. + +:::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/core/resource_allocation.png +:alt: Resource allocated using "with_overrides" method + +Resource allocated using "with_overrides" method +::: diff --git a/docs/user_guide/productionizing/index.md b/docs/user_guide/productionizing/index.md new file mode 100644 index 0000000000..a7748b1256 --- /dev/null +++ b/docs/user_guide/productionizing/index.md @@ -0,0 +1,24 @@ +(deployment_workflow)= + +# Productionize + +In this section, you will learn how to take Flyte pipelines into production. +You will explore concepts such as customizing resources, notifications, scheduling, +GPU configuration, secrets, spot instances and more. 
+ +```{toctree} +:maxdepth: -1 +:name: productionizing_toc +:hidden: + +customizing_task_resources +reference_tasks +reference_launch_plans +notifications +schedules +configuring_logging_links_in_the_ui +configuring_access_to_gpus +spot_instances +secrets +workflow_labels_and_annotations +``` diff --git a/docs/user_guide/productionizing/notifications.md b/docs/user_guide/productionizing/notifications.md new file mode 100644 index 0000000000..133a402c43 --- /dev/null +++ b/docs/user_guide/productionizing/notifications.md @@ -0,0 +1,225 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Notifications + +```{eval-rst} +.. tags:: Intermediate + +``` + ++++ + +When a workflow is completed, users can be notified by: + +- Email +- [Pagerduty](https://support.pagerduty.com/docs/email-integration-guide#integrating-with-a-pagerduty-service) +- [Slack](https://slack.com/help/articles/206819278-Send-emails-to-Slack) + +The content of these notifications is configurable at the platform level. + +## Code example + +When a workflow reaches a specified [terminal workflow execution phase](https://github.com/flyteorg/flytekit/blob/v0.16.0b7/flytekit/core/notification.py#L10,L15), +the {py:class}`flytekit:flytekit.Email`, {py:class}`flytekit:flytekit.PagerDuty`, or {py:class}`flytekit:flytekit.Slack` +objects can be used in the construction of a {py:class}`flytekit:flytekit.LaunchPlan`. 
+ +```{code-cell} +from datetime import timedelta +``` + ++++ {"lines_to_next_cell": 0} + +Consider the following example workflow: + +```{code-cell} +from flytekit import Email, FixedRate, LaunchPlan, PagerDuty, Slack, WorkflowExecutionPhase, task, workflow + + +@task +def double_int_and_print(a: int) -> str: + return str(a * 2) + + +@workflow +def int_doubler_wf(a: int) -> str: + doubled = double_int_and_print(a=a) + return doubled +``` + ++++ {"lines_to_next_cell": 0} + +Here are three scenarios that can help deepen your understanding of how notifications work: + +1. Launch Plan triggers email notifications when the workflow execution reaches the `SUCCEEDED` phase. + +```{code-cell} +int_doubler_wf_lp = LaunchPlan.get_or_create( + name="email_notifications_lp", + workflow=int_doubler_wf, + default_inputs={"a": 4}, + notifications=[ + Email( + phases=[WorkflowExecutionPhase.SUCCEEDED], + recipients_email=["admin@example.com"], + ) + ], +) +``` + ++++ {"lines_to_next_cell": 0} + +2. Notifications shine when used for scheduled workflows to alert for failures. + +```{code-cell} +:lines_to_next_cell: 2 + +int_doubler_wf_scheduled_lp = LaunchPlan.get_or_create( + name="int_doubler_wf_scheduled", + workflow=int_doubler_wf, + default_inputs={"a": 4}, + notifications=[ + PagerDuty( + phases=[WorkflowExecutionPhase.FAILED, WorkflowExecutionPhase.TIMED_OUT], + recipients_email=["abc@pagerduty.com"], + ) + ], + schedule=FixedRate(duration=timedelta(days=1)), +) +``` + +3. Notifications can be combined with different permutations of terminal phases and recipient targets. 
+ +```{code-cell} +wacky_int_doubler_lp = LaunchPlan.get_or_create( + name="wacky_int_doubler", + workflow=int_doubler_wf, + default_inputs={"a": 4}, + notifications=[ + Email( + phases=[WorkflowExecutionPhase.FAILED], + recipients_email=["me@example.com", "you@example.com"], + ), + Email( + phases=[WorkflowExecutionPhase.SUCCEEDED], + recipients_email=["myboss@example.com"], + ), + Slack( + phases=[ + WorkflowExecutionPhase.SUCCEEDED, + WorkflowExecutionPhase.ABORTED, + WorkflowExecutionPhase.TIMED_OUT, + ], + recipients_email=["myteam@slack.com"], + ), + ], +) +``` + +4. You can use `pyflyte register` to register the launch plan and then launch it in the web console to get the notifications. + +``` +pyflyte register lp_notifications.py +``` + +Choose the launch plan with notifications config +:::{figure} https://i.ibb.co/cLT5tRX/lp.png +:alt: Notifications Launch Plan +:class: with-shadow +::: + ++++ + +### Future work + +Work is ongoing to support a generic event egress system that can be used to publish events for tasks, workflows, and +workflow nodes. When this is complete, generic event subscribers can asynchronously process these events for a rich +and fully customizable experience. + +## Platform configuration changes + +The `notifications` top-level portion of the Flyteadmin config specifies how to handle notifications. + +As in schedules, the handling of notifications is composed of two parts: one part enqueues notifications asynchronously, and the other processes pending notifications and sends out emails and alerts. + +This is only supported for Flyte instances running on AWS. + +### Config +#### For Sandbox +To publish notifications, you'll need to register a SendGrid API key from [sendgrid](https://sendgrid.com/); it's free for 100 emails per day. +You have to add notifications config in your sandbox config file. 
+ +```yaml +# config-sandbox.yaml +notifications: + type: sandbox # noqa: F821 + emailer: + emailServerConfig: + serviceName: sendgrid + apiKeyEnvVar: SENDGRID_API_KEY + subject: "Notice: Execution \"{{ workflow.name }}\" has {{ phase }} in \"{{ domain }}\"." + sender: "flyte-notifications@company.com" + body: > + Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". View details at + + http://flyte.company.com/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}. {{ error }} +``` + +Note that you should set and export the `SENDGRID_API_KEY` environment variable in your shell. + +#### For AWS +To publish notifications, you'll need to set up an [SNS topic](https://aws.amazon.com/sns/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc). + +To process notifications, you'll need to set up an [AWS SQS](https://aws.amazon.com/sqs/) queue to consume notification events. This queue must be configured as a subscription to your SNS topic you created above. + +To publish notifications, you'll need a [verified SES email address](https://docs.aws.amazon.com/ses/latest/DeveloperGuide/verify-addresses-and-domains.html) which will be used to send notification emails and alerts using email APIs. + +The role you use to run Flyteadmin must have permissions to read and write to your SNS topic and SQS queue. + +Let's look into the following config section and explain what each value represents: + +```bash +notifications: + type: "aws" # noqa: F821 + region: "us-east-1" + publisher: + topicName: "arn:aws:sns:us-east-1:{{ YOUR ACCOUNT ID }}:{{ YOUR TOPIC }}" + processor: + queueName: "{{ YOUR QUEUE NAME }}" + accountId: "{{ YOUR ACCOUNT ID }}" + emailer: + subject: "Notice: Execution \"{{ workflow.name }}\" has {{ phase }} in \"{{ domain }}\"." + sender: "flyte-notifications@company.com" + body: > + Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". 
View details at + + http://flyte.company.com/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}. {{ error }} +``` + +- **type**: AWS is the only cloud back-end supported for executing scheduled workflows; hence `"aws"` is the only valid value. By default, the no-op executor is used. +- **region**: Specifies the region AWS clients should use when creating SNS and SQS clients. +- **publisher**: Handles pushing notification events to your SNS topic. + : - **topicName**: This is the arn of your SNS topic. +- **processor**: Handles recording notification events and enqueueing them to be processed asynchronously. + : - **queueName**: Name of the SQS queue which will capture pending notification events. + - **accountId**: AWS [account id](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html#FindingYourAWSId). +- **emailer**: Encloses config details for sending and formatting emails used as notifications. + : - **subject**: Configurable subject line used in notification emails. + - **sender**: Your verified SES email sender. + - **body**: Configurable email body used in notifications. + +The complete set of parameters that can be used for email templating are checked in [here](https://github.com/flyteorg/flyteadmin/blob/a84223dab00dfa52d8ba1ed2d057e77b6c6ab6a7/pkg/async/notifications/email.go#L18,L30). diff --git a/docs/user_guide/productionizing/reference_launch_plans.md b/docs/user_guide/productionizing/reference_launch_plans.md new file mode 100644 index 0000000000..8ea476ef3e --- /dev/null +++ b/docs/user_guide/productionizing/reference_launch_plans.md @@ -0,0 +1,88 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Reference launch plans + +```{eval-rst} +.. 
tags:: Intermediate +``` + +A {py:func}`flytekit.reference_launch_plan` references previously defined, serialized, and registered Flyte launch plans. +You can reference launch plans from other projects and create workflows that use launch plans declared by others. + +The following example illustrates how to use reference launch plans. + +:::{note} +Reference launch plans cannot be run locally. You must mock them out. +::: + +```{code-cell} +:lines_to_next_cell: 2 + +from typing import List + +from flytekit import reference_launch_plan, workflow +from flytekit.types.file import FlyteFile + + +@reference_launch_plan( + project="flytesnacks", + domain="development", + name="data_types_and_io.file.normalize_csv_file", + version="{{ registration.version }}", +) +def normalize_csv_file( + csv_url: FlyteFile, + column_names: List[str], + columns_to_normalize: List[str], + output_location: str, +) -> FlyteFile: + ... + + +@workflow +def reference_lp_wf() -> FlyteFile: + return normalize_csv_file( + csv_url="https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", + column_names=["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], + columns_to_normalize=["Age"], + output_location="", + ) +``` + +It's important to verify that the interface declared for the reference launch plan matches that of the launch plan it refers to. + +:::{note} +The macro `{{ registration.version }}` is populated by `flytectl register` during registration. +Generally, it is unnecessary for reference launch plans, as it is preferable to bind to a specific version of the task or launch plan. +However, in this example, we are registering both the launch plan `data_types_and_io.file.normalize_csv_file` and the workflow that references it. +Therefore, we need the macro to be updated to the version of a specific Flytesnacks release. +This is why `{{ registration.version }}` is used. 
+ +A typical reference launch plan would resemble the following: + +```python +@reference_launch_plan( + project="flytesnacks", + domain="development", + name="core.flyte_basics.files.normalize_csv_file", + version="d06cebcfbeabc02b545eefa13a01c6ca992940c8", # If using GIT for versioning OR 0.16.0, if semver +) +def normalize_csv_file(...): + ... +``` +::: diff --git a/docs/user_guide/productionizing/reference_tasks.md b/docs/user_guide/productionizing/reference_tasks.md new file mode 100644 index 0000000000..057986d74a --- /dev/null +++ b/docs/user_guide/productionizing/reference_tasks.md @@ -0,0 +1,89 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +# Reference tasks + +```{eval-rst} +.. tags:: Intermediate +``` + +A {py:func}`flytekit.reference_task` references the Flyte tasks that have already been defined, serialized, and registered. +You can reference tasks from other projects and create workflows that use tasks declared by others. +These tasks can be in their own containers, python runtimes, flytekit versions, and even different languages. + +The following example illustrates how to use reference tasks. + +:::{note} +Reference tasks cannot be run locally. You must mock them out. +::: + +```{code-cell} +:lines_to_next_cell: 2 + +from typing import List + +from flytekit import reference_task, workflow +from flytekit.types.file import FlyteFile + + +@reference_task( + project="flytesnacks", + domain="development", + name="data_types_and_io.file.normalize_columns", + version="{{ registration.version }}", +) +def normalize_columns( + csv_url: FlyteFile, + column_names: List[str], + columns_to_normalize: List[str], + output_location: str, +) -> FlyteFile: + ... 
+ + +@workflow +def wf() -> FlyteFile: + return normalize_columns( + csv_url="https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", + column_names=["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], + columns_to_normalize=["Age"], + output_location="", + ) +``` + +:::{note} +The macro `{{ registration.version }}` is populated by `flytectl register` during registration. +Generally, it is unnecessary for reference tasks, as it is preferable to bind to a specific version of the task or launch plan. +However, in this example, we are registering both the task `data_types_and_io.file.normalize_columns` and the workflow that references it. +Therefore, we need the macro to be updated to the version of a specific Flytesnacks release. +This is why `{{ registration.version }}` is used. + +A typical reference task would resemble the following: + +```python +@reference_task( + project="flytesnacks", + domain="development", + name="core.flyte_basics.files.normalize_columns", + version="d06cebcfbeabc02b545eefa13a01c6ca992940c8", # If using GIT for versioning OR 0.16.0, if semver +) +def normalize_columns(...): + ... +``` +::: diff --git a/docs/user_guide/productionizing/schedules.md b/docs/user_guide/productionizing/schedules.md new file mode 100644 index 0000000000..0deda30d04 --- /dev/null +++ b/docs/user_guide/productionizing/schedules.md @@ -0,0 +1,227 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(scheduling_launch_plan)= + +# Schedules + +```{eval-rst} +.. tags:: Basic +``` + +{ref}`flyte:divedeep-launchplans` can be set to run automatically on a schedule using the Flyte Native Scheduler. 
+For workflows that depend on knowing the kick-off time, Flyte supports passing in the scheduled time (not the actual time, which may be a few seconds off) as an argument to the workflow. + +Check out a demo of how the Native Scheduler works: + +```{eval-rst} +.. youtube:: sQoCp2qSQK4 +``` + +:::{note} +Native scheduler doesn't support [AWS syntax](http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html#CronExpressions). +::: + ++++ {"lines_to_next_cell": 0} + +Consider the following example workflow: + +```{code-cell} +from datetime import datetime + +from flytekit import task, workflow + + +@task +def format_date(run_date: datetime) -> str: + return run_date.strftime("%Y-%m-%d %H:%M") + + +@workflow +def date_formatter_wf(kickoff_time: datetime): + formatted_kickoff_time = format_date(run_date=kickoff_time) + print(formatted_kickoff_time) +``` + ++++ {"lines_to_next_cell": 0} + +The `date_formatter_wf` workflow can be scheduled using either the `CronSchedule` or the `FixedRate` object. + +(cron-schedules)= + +## Cron schedules + +[Cron](https://en.wikipedia.org/wiki/Cron) expression strings use this {ref}`syntax `. +An incorrect cron schedule expression would lead to failure in triggering the schedule. + +```{code-cell} +from flytekit import CronSchedule, LaunchPlan # noqa: E402 + +# creates a launch plan that runs every minute. +cron_lp = LaunchPlan.get_or_create( + name="my_cron_scheduled_lp", + workflow=date_formatter_wf, + schedule=CronSchedule( + # Note that the ``kickoff_time_input_arg`` matches the workflow input we defined above: kickoff_time + # But in case you are using the AWS scheme of schedules and not using the native scheduler then switch over the schedule parameter with cron_expression + schedule="*/1 * * * *", # Following schedule runs every min + kickoff_time_input_arg="kickoff_time", + ), +) +``` + +The `kickoff_time_input_arg` corresponds to the workflow input `kickoff_time`. 
+Specifying this argument means that Flyte will pass in the kick-off time of the +cron schedule into the `kickoff_time` argument of the `date_formatter_wf` workflow. + ++++ + +## Fixed rate intervals + +If you prefer to use an interval rather than a cron scheduler to schedule your workflows, you can use the fixed-rate scheduler. +A fixed-rate scheduler runs at the specified interval. + +Here's an example: + +```{code-cell} +from datetime import timedelta # noqa: E402 + +from flytekit import FixedRate, LaunchPlan # noqa: E402 + + +@task +def be_positive(name: str) -> str: + return f"You're awesome, {name}" + + +@workflow +def positive_wf(name: str): + reminder = be_positive(name=name) + print(f"{reminder}") + + +fixed_rate_lp = LaunchPlan.get_or_create( + name="my_fixed_rate_lp", + workflow=positive_wf, + # Note that the workflow above doesn't accept any kickoff time arguments. + # We just omit the ``kickoff_time_input_arg`` from the FixedRate schedule invocation + schedule=FixedRate(duration=timedelta(minutes=10)), + fixed_inputs={"name": "you"}, +) +``` + +This fixed-rate scheduler runs every ten minutes. Similar to a cron scheduler, a fixed-rate scheduler also accepts `kickoff_time_input_arg` (which is omitted in this example). + +(activating-schedules)= + +## Activating a schedule + +After initializing your launch plan, [activate the specific version of the launch plan](https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_update_launchplan.html) so that the schedule runs. + +```bash +flytectl update launchplan -p flyteexamples -d development {{ name_of_lp }} --version --activate +``` + ++++ + +Verify if your launch plan was activated: + +```bash +flytectl get launchplan -p flytesnacks -d development +``` + ++++ + +## Deactivating a schedule + +You can [archive/deactivate the launch plan](https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_update_launchplan.html) to deschedule any scheduled job associated with it. 
+ +```bash +flytectl update launchplan -p flyteexamples -d development {{ name_of_lp }} --version --archive +``` + ++++ + +## Platform configuration changes for AWS scheduler + +The Scheduling feature can be run using the Flyte native scheduler which comes with Flyte. If you intend to use the AWS scheduler then it requires additional infrastructure to run, so these will have to be created and configured. The following sections are only required if you use the AWS scheme for the scheduler. You can still run the Flyte native scheduler on AWS. + +### Setting up scheduled workflows + +To run workflow executions based on user-specified schedules, you'll need to fill out the top-level `scheduler` portion of the flyteadmin application configuration. + +In particular, you'll need to configure the two components responsible for scheduling workflows and processing schedule event triggers. + +:::{note} +This functionality is currently only supported for AWS installs. +::: + +#### Event scheduler + +To schedule workflow executions, you'll need to set up an [AWS SQS](https://aws.amazon.com/sqs/) queue. A standard-type queue should suffice. The flyteadmin event scheduler creates [AWS CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/Create-CloudWatch-Events-Scheduled-Rule.html) event rules that invoke your SQS queue as a target. + +With that in mind, let's take a look at an example `eventScheduler` config section and dive into what each value represents: + +```bash +scheduler: + eventScheduler: + scheme: "aws" + region: "us-east-1" + scheduleRole: "arn:aws:iam::{{ YOUR ACCOUNT ID }}:role/{{ ROLE }}" + targetName: "arn:aws:sqs:us-east-1:{{ YOUR ACCOUNT ID }}:{{ YOUR QUEUE NAME }}" + scheduleNamePrefix: "flyte" +``` + ++++ + +- **scheme**: in this case because AWS is the only cloud back-end supported for scheduling workflows, only `"aws"` is a valid value. By default, the no-op scheduler is used. 
+- **region**: this specifies which region initialized AWS clients should use when creating CloudWatch rules. +- **scheduleRole** This is the IAM role ARN with permissions set to `Allow` + : - `events:PutRule` + - `events:PutTargets` + - `events:DeleteRule` + - `events:RemoveTargets` +- **targetName** this is the ARN for the SQS Queue you've allocated to scheduling workflows. +- **scheduleNamePrefix** this is an entirely optional prefix used when creating schedule rules. Because of AWS naming length restrictions, scheduled rules are a random hash and having a shared prefix makes these names more readable and indicates who generated the rules. + +#### Workflow executor + +Scheduled events which trigger need to be handled by the workflow executor, which subscribes to triggered events from the SQS queue configured above. + +:::{NOTE} +Failure to configure a workflow executor will result in all your scheduled events piling up silently without ever kicking off workflow executions. +::: + +Again, let's break down a sample config: + +```bash +scheduler: + eventScheduler: + ... + workflowExecutor: + scheme: "aws" + region: "us-east-1" + scheduleQueueName: "{{ YOUR QUEUE NAME }}" + accountId: "{{ YOUR ACCOUNT ID }}" +``` + ++++ + +- **scheme**: in this case because AWS is the only cloud back-end supported for executing scheduled workflows, only `"aws"` is a valid value. By default, the no-op executor is used and in case of sandbox we use `"local"` scheme which uses the Flyte native scheduler. +- **region**: this specifies which region AWS clients should use when creating an SQS subscriber client. +- **scheduleQueueName**: this is the name of the SQS Queue you've allocated to scheduling workflows. +- **accountId**: Your AWS [account id](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html#FindingYourAWSId). 
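The `targetName` under `eventScheduler` and the `region`, `accountId`, and `scheduleQueueName` under `workflowExecutor` should all describe the same SQS queue. As a minimal sketch, the check below rebuilds the queue ARN from the executor fields and compares it with the scheduler target. The helper names `sqs_queue_arn` and `check_scheduler_config` are hypothetical and not part of flyteadmin, the account ID and queue name are made-up sample values, and the config is assumed to have already been loaded into a plain Python dict.

```python
from typing import Dict, List


def sqs_queue_arn(region: str, account_id: str, queue_name: str) -> str:
    """Build an SQS queue ARN of the form arn:aws:sqs:<region>:<account>:<queue>."""
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name}"


def check_scheduler_config(scheduler: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical sanity check: return a list of human-readable problems."""
    problems: List[str] = []
    event = scheduler.get("eventScheduler", {})
    executor = scheduler.get("workflowExecutor", {})

    # Both components must use the "aws" scheme for the AWS scheduler.
    for name, section in (("eventScheduler", event), ("workflowExecutor", executor)):
        if section.get("scheme") != "aws":
            problems.append(f"{name}.scheme should be 'aws'")

    # The executor must subscribe to the queue that the event rules target.
    expected = sqs_queue_arn(
        executor.get("region", ""),
        executor.get("accountId", ""),
        executor.get("scheduleQueueName", ""),
    )
    if event.get("targetName") != expected:
        problems.append(f"eventScheduler.targetName does not match {expected}")
    return problems


# Sample values only; substitute your own account ID and queue name.
example = {
    "eventScheduler": {
        "scheme": "aws",
        "region": "us-east-1",
        "targetName": "arn:aws:sqs:us-east-1:123456789012:flyte-schedules",
    },
    "workflowExecutor": {
        "scheme": "aws",
        "region": "us-east-1",
        "scheduleQueueName": "flyte-schedules",
        "accountId": "123456789012",
    },
}
print(check_scheduler_config(example))
```

A check like this could run before deploying the flyteadmin configuration, since a mismatch between the two sections would otherwise only show up as schedules that never start executions.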
diff --git a/docs/user_guide/productionizing/secrets.md b/docs/user_guide/productionizing/secrets.md new file mode 100644 index 0000000000..dba145c483 --- /dev/null +++ b/docs/user_guide/productionizing/secrets.md @@ -0,0 +1,447 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(secrets)= + +# Secrets + +```{eval-rst} +.. tags:: Kubernetes, Intermediate +``` + +Flyte supports running a variety of tasks, from containers to SQL queries and +service calls, and it provides a native Secret construct to request and access +secrets. + +This example explains how you can access secrets in a Flyte Task. Flyte provides +different types of secrets, but for users writing Python tasks, you can only access +secure secrets either as environment variables or as a file injected into the +running container. + ++++ + +## Creating secrets with a secrets manager + +:::{admonition} Prerequisites +:class: important + +- Install [kubectl](https://kubernetes.io/docs/tasks/tools/). +- Have access to a Flyte cluster, for e.g. with `flytectl demo start` as + described {ref}`here `. +::: + +The first step to using secrets in Flyte is to create one on the backend. +By default, Flyte uses the K8s-native secrets manager, which we'll use in this +example, but you can also {ref}`configure different secret managers `. + +First, we use `kubectl` to create a secret called `user-info` with a +`user_secret` key: + +```{eval-rst} +.. prompt:: bash $ + + kubectl create secret -n - generic user-info --from-literal=user_secret=mysecret +``` + +:::{note} +Be sure to specify the correct Kubernetes namespace when creating a secret. 
If you plan on accessing +the secret in the `flytesnacks` project under the `development` domain, replace `-` +with `flytesnacks-development`. This is because secrets need to be in the same namespace as the +workflow execution. +::: + +:::{important} +The imperative command above is useful for creating secrets in an ad hoc manner, +but it may not be the most secure or sustainable way to do so. You can, however, +define secrets using a [configuration file](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-config-file/) +or tools like [Kustomize](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kustomize/). +::: + ++++ + +## Using secrets in tasks + +Once you've defined a secret on the Flyte backend, `flytekit` exposes a class +called {py:class}`~flytekit.Secret`, which allows you to request a secret +from the configured secret manager. + +```{code-cell} +import os +from typing import Tuple + +import flytekit +from flytekit import Secret, task, workflow +from flytekit.testing import SecretsManager + +secret = Secret( + group="", + key="", + mount_requirement=Secret.MountType.ENV_VAR, +) +``` + +A secret consists of `group`, `key`, and `mount_requirement` arguments, +where a secret group can have multiple secrets associated with it. +If the `mount_requirement` argument is not specified, the secret will +be injected as an environment variable by default. + +In the code below we specify two variables, `SECRET_GROUP` and +`SECRET_NAME`, which map onto the `user-info` secret that we created +with `kubectl` above, with a key called `user_secret`. + +```{code-cell} +SECRET_GROUP = "user-info" +SECRET_NAME = "user_secret" +``` + +Now we declare the secret in the `secret_requests` argument of the +{py:func}`@task ` decorator. The request tells Flyte to make +the secret available to the task. 
+ +The secret can then be accessed inside the task using the +{py:class}`~flytekit.ExecutionParameters` object, which is returned by +invoking the {py:func}`flytekit.current_context` function, as shown below. + +At runtime, flytekit looks inside the task pod for an environment variable or +a mounted file with a predefined name/path and loads the value. + +```{code-cell} +@task(secret_requests=[Secret(group=SECRET_GROUP, key=SECRET_NAME)]) +def secret_task() -> str: + context = flytekit.current_context() + secret_val = context.secrets.get(SECRET_GROUP, SECRET_NAME) + print(secret_val) + return secret_val +``` + +:::{warning} +Never print secret values! The example above is just for demonstration purposes. +::: + +:::{note} +- If Flyte fails to access the secret, an error is raised. +- The `Secret` group and key are required parameters during declaration + and usage. Omitting either one raises a {py:class}`ValueError`. +::: + +### Multiple keys grouped into one secret + +In some cases you may have multiple secrets, and they may be grouped +as one secret in the secret store. + +For example, in Kubernetes secrets, it is possible to nest multiple keys under +the same secret: + +```{eval-rst} +.. prompt:: bash $ + + kubectl create secret generic user-info \ + --from-literal=user_secret=mysecret \ + --from-literal=username=my_username \ + --from-literal=password=my_password +``` + +In this case, the secret group will be `user-info`, with three available +secret keys: `user_secret`, `username`, and `password`. 
+ +```{code-cell} +USERNAME_SECRET = "username" +PASSWORD_SECRET = "password" +``` + ++++ {"lines_to_next_cell": 0} + +The Secret structure allows passing two fields, matching the key and the group, as previously described: + +```{code-cell} +@task( + secret_requests=[ + Secret(key=USERNAME_SECRET, group=SECRET_GROUP), + Secret(key=PASSWORD_SECRET, group=SECRET_GROUP), + ] +) +def user_info_task() -> Tuple[str, str]: + context = flytekit.current_context() + secret_username = context.secrets.get(SECRET_GROUP, USERNAME_SECRET) + secret_pwd = context.secrets.get(SECRET_GROUP, PASSWORD_SECRET) + print(f"{secret_username}={secret_pwd}") + return secret_username, secret_pwd +``` + +:::{warning} +Never print secret values! The example above is just for demonstration purposes. +::: + +### Mounting secrets as files or environment variables + +It is also possible to make Flyte mount the secret as a file or an environment +variable. + +The file type is useful for large secrets that do not fit in environment variables, +which are typically asymmetric keys (like certs, etc). Another reason may be that a +dependent library requires the secret to be available as a file. +In these scenarios you can specify the `mount_requirement=Secret.MountType.FILE`. 
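To make the file-versus-environment-variable distinction concrete, here is a minimal sketch, using only the standard library, of how a lookup could resolve a secret from either location. It is not flytekit's implementation: the helper names are hypothetical, and the naming scheme (an upper-cased `GROUP_KEY` after the `_FSEC_` prefix, and `<dir>/<group>/<key>` for files) is an assumption modeled on the defaults listed under "Secret discovery" on this page.

```python
import os
import tempfile
from pathlib import Path

# Defaults as documented under "Secret discovery"; how they are combined
# into names below is an assumption made for illustration.
ENV_PREFIX = "_FSEC_"
DEFAULT_DIR = "/etc/secrets"


def secret_env_var(group: str, key: str, prefix: str = ENV_PREFIX) -> str:
    """Hypothetical helper: environment variable name for a secret."""
    return f"{prefix}{group.upper()}_{key.upper()}"


def secret_file(group: str, key: str, base_dir: str = DEFAULT_DIR) -> str:
    """Hypothetical helper: file path where a mounted secret would live."""
    return str(Path(base_dir) / group.lower() / key.lower())


def read_secret(group: str, key: str, base_dir: str = DEFAULT_DIR) -> str:
    """Try the environment variable first, then fall back to the mounted file."""
    env_name = secret_env_var(group, key)
    value = os.environ.get(env_name)
    if value is not None:
        return value
    path = Path(secret_file(group, key, base_dir))
    if path.is_file():
        return path.read_text().strip()
    raise ValueError(f"secret {group}/{key} not found in {env_name} or {path}")


# Simulate a file-mounted secret in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    mounted = Path(tmp) / "user-info" / "tls_cert"
    mounted.parent.mkdir(parents=True)
    mounted.write_text("large-secret-payload\n")
    print(read_secret("user-info", "tls_cert", base_dir=tmp))
```

The file fallback is what makes the `FILE` mount type practical for certificates and other large values: the task code reads a path instead of relying on the size limits of the process environment.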
+ +In the following example we force the mounting to be an environment variable: + +```{code-cell} +@task( + secret_requests=[ + Secret( + group=SECRET_GROUP, + key=SECRET_NAME, + mount_requirement=Secret.MountType.ENV_VAR, + ) + ] +) +def secret_file_task() -> Tuple[str, str]: + secret_manager = flytekit.current_context().secrets + + # get the secrets filename + f = secret_manager.get_secrets_file(SECRET_GROUP, SECRET_NAME) + + # get secret value from an environment variable + secret_val = secret_manager.get(SECRET_GROUP, SECRET_NAME) + + # returning the filename and the secret_val + return f, secret_val +``` + ++++ {"lines_to_next_cell": 0} + +These tasks can be used in your workflow as usual + +```{code-cell} +@workflow +def my_secret_workflow() -> Tuple[str, str, str, str, str]: + x = secret_task() + y, z = user_info_task() + f, s = secret_file_task() + return x, y, z, f, s +``` + +### Testing with mock secrets + +The simplest way to test secret accessibility is to export the secret as an +environment variable. There are some helper methods available to do so: + +```{code-cell} +if __name__ == "__main__": + sec = SecretsManager() + os.environ[sec.get_secrets_env_var(SECRET_GROUP, SECRET_NAME)] = "value" + os.environ[sec.get_secrets_env_var(SECRET_GROUP, USERNAME_SECRET)] = "username_value" + os.environ[sec.get_secrets_env_var(SECRET_GROUP, PASSWORD_SECRET)] = "password_value" + x, y, z, f, s = my_secret_workflow() + assert x == "value" + assert y == "username_value" + assert z == "password_value" + assert f == sec.get_secrets_file(SECRET_GROUP, SECRET_NAME) + assert s == "value" +``` + +## Using secrets in task templates + +For task types that connect to a remote database, you'll need to specify +secret request as well. For example, for the {py:class}`~flytekitplugins.sqlalchemy.SQLAlchemyTask` +you need to: + +1. Specify the `secret_requests` argument. +2. 
Configure the {py:class}`~flytekitplugins.sqlalchemy.SQLAlchemyConfig` to + declare which secret maps onto which connection argument. + +```python +from flytekit import kwtypes +from flytekitplugins.sqlalchemy import SQLAlchemyTask, SQLAlchemyConfig + + +# define the secrets +secrets = { + "username": Secret(group="", key=""), + "password": Secret(group="", key=""), +} + + +sql_query = SQLAlchemyTask( + name="sql_query", + query_template="""SELECT * FROM my_table LIMIT {{ .inputs.limit }}""", + inputs=kwtypes(limit=int), + + # request secrets + secret_requests=[*secrets.values()], + + # specify username and password credentials in the configuration + task_config=SQLAlchemyConfig( + uri="", + secret_connect_args=secrets, + ), +) +``` + ++++ + +:::{note} +Here the `secret_connect_args` map to the +[SQLAlchemy engine configuration](https://docs.sqlalchemy.org/en/20/core/engines.html) +argument names for the username and password. +::: + +You can then use the `sql_query` task inside a workflow to grab data and +perform downstream transformations on it. + ++++ + +## How secrets injection works + +The rest of this page describes how secrets injection works under the hood. 
+For a simple task that launches a Pod, the flow would look something like this: + +```{image} https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K1BsdWdpbnM6IENyZWF0ZSBLOHMgUmVzb3VyY2VcbiAgICBQbHVnaW5zLT4-LVByb3BlbGxlcjogUmVzb3VyY2UgT2JqZWN0XG4gICAgUHJvcGVsbGVyLT4-K1Byb3BlbGxlcjogU2V0IExhYmVscyAmIEFubm90YXRpb25zXG4gICAgUHJvcGVsbGVyLT4-K0FwaVNlcnZlcjogQ3JlYXRlIE9iamVjdCAoZS5nLiBQb2QpXG4gICAgQXBpU2VydmVyLT4-K1BvZCBXZWJob29rOiAvbXV0YXRlXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IExvb2t1cCBnbG9iYWxzXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IEluamVjdCBTZWNyZXQgQW5ub3RhdGlvbnMgKGUuZy4gSzhzLCBWYXVsdC4uLiBldGMuKVxuICAgIFBvZCBXZWJob29rLT4-LUFwaVNlcnZlcjogTXV0YXRlZCBQb2RcbiAgICBcbiAgICAgICAgICAgICIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ +:target: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K1BsdWdpbnM6IENyZWF0ZSBLOHMgUmVzb3VyY2VcbiAgICBQbHVnaW5zLT4-LVByb3BlbGxlcjogUmVzb3VyY2UgT2JqZWN0XG4gICAgUHJvcGVsbGVyLT4-K1Byb3BlbGxlcjogU2V0IExhYmVscyAmIEFubm90YXRpb25zXG4gICAgUHJvcGVsbGVyLT4-K0FwaVNlcnZlcjogQ3JlYXRlIE9iamVjdCAoZS5nLiBQb2QpXG4gICAgQXBpU2VydmVyLT4-K1BvZCBXZWJob29rOiAvbXV0YXRlXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IExvb2t1cCBnbG9iYWxzXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IEluamVjdCBTZWNyZXQgQW5ub3RhdGlvbnMgKGUuZy4gSzhzLCBWYXVsdC4uLiBldGMuKVxuICAgIFBvZCBXZWJob29rLT4-LUFwaVNlcnZlcjogTXV0YXRlZCBQb2RcbiAgICBcbiAgICAgICAgICAgICIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ +``` + +Breaking down this sequence diagram: + +1. Flyte invokes a plugin to create the K8s object. This can be a Pod or a more complex CRD (e.g. Spark, PyTorch, etc.) + + :::{note} + The plugin ensures that the labels and annotations are passed to any Pod that is spawned due to the creation of the CRD. + ::: + +2. Flyte applies labels and annotations that are referenced to all secrets the task is requesting access to. Note that secrets are not case sensitive. + +3. 
Flyte sends a `POST` request to `ApiServer` to create the object. + +4. Before persisting the Pod, `ApiServer` invokes all the registered Pod Webhooks and Flyte's Pod Webhook is called. + +5. Using the labels and annotations attached in **step 2**, the Flyte Pod Webhook looks up globally mounted secrets for each of the requested secrets. + +6. If found, the Pod Webhook mounts them directly in the Pod. If not found, the Pod Webhook injects the appropriate annotations to load the secrets for K8s (or Vault or Confidant or any secret management system plugin configured) into the task pod. + +Once the secret is injected into the task pod, Flytekit can read it using the secret manager. + +The webhook is included in all overlays in the Flytekit repo. The deployment file creates two things: a **Job** and a **Deployment**. + +1. `flyte-pod-webhook-secrets` **Job**: This job runs the `flytepropeller webhook init-certs` command, which issues a self-signed CA certificate as well as a derived TLS certificate and its private key. Ensure that the private key is in lower case, that is, `my_token` in contrast to `MY_TOKEN`. It stores them into a new secret `flyte-pod-webhook-secret`. +2. `flyte-pod-webhook` **Deployment**: This deployment creates the Webhook pod, which creates a MutatingWebhookConfiguration on startup. This serves as the registration contract with the ApiServer to know about the Webhook before it starts serving traffic. + +## Secret discovery + +Flyte identifies secrets using a secret group and a secret key, which can +be accessed by {py:func}`flytekit.current_context` in the task function +body, as shown in the code examples above. + +Flytekit relies on the following environment variables to load secrets (defined [here](https://github.com/flyteorg/flytekit/blob/9d313429c577a919ec0ad4cd397a5db356a1df0d/flytekit/configuration/internal.py#L141-L159)). 
When running tasks and workflows locally you should make sure to store your secrets accordingly or to modify these: + +- `FLYTE_SECRETS_DEFAULT_DIR`: The directory Flytekit searches for secret files. **Default:** `"/etc/secrets"` +- `FLYTE_SECRETS_FILE_PREFIX`: a common file prefix for Flyte secrets. **Default:** `""` +- `FLYTE_SECRETS_ENV_PREFIX`: a common env var prefix for Flyte secrets. **Default:** `"_FSEC_"` + +When running a workflow on a Flyte cluster, the configured secret manager will use the secret Group and Key to try and retrieve a secret. +If successful, it will make the secret available as either file or environment variable and will if necessary modify the above variables automatically so that the task can load and use the secrets. + +(configure_secret_management)= + +## Configuring a secret management system plugin + +When a task requests a secret, Flytepropeller will try to retrieve secrets in the following order: + +1. Checking for global secrets, i.e. secrets mounted as files or environment variables on the `flyte-pod-webhook` pod +2. Checking with an additional configurable secret manager. + +:::{important} +The global secrets take precedence over any secret discoverable by the secret manager plugins. +::: + +The following secret managers are available at the time of writing: + +- [K8s secrets](https://kubernetes.io/docs/concepts/configuration/secret/#creating-a-secret) (**default**): `flyte-pod-webhook` will try to look for a K8s secret named after the secret Group and retrieve the value for the secret Key. +- [AWS Secret Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html): `flyte-pod-webhook` will add the AWS Secret Manager sidecar container to a task Pod which will mount the secret. 
+- [Vault Agent Injector](https://developer.hashicorp.com/vault/tutorials/getting-started/getting-started-first-secret#write-a-secret) : `flyte-pod-webhook` will annotate the task Pod with the respective Vault annotations that trigger an existing Vault Agent Injector to retrieve the specified secret Key from a vault path defined as secret Group. + +You can configure the additional secret manager by defining `secretManagerType` to be either 'K8s', 'AWS' or 'Vault' in +the [core config](https://github.com/flyteorg/flyte/blob/master/kustomize/base/single_cluster/headless/config/propeller/core.yaml#L34) of the Flytepropeller. + +When using the K8s secret manager plugin, which is enabled by default, the secrets need to be available in the same namespace as the task execution +(for example `flytesnacks-development`). K8s secrets can be mounted as either files or injected as environment variables into the task pod, +so if you need to make larger files available to the task, then this might be the better option. + +Furthermore, this method also allows you to have separate credentials for different domains but still using the same name for the secret. + +### AWS secrets manager + +When using the AWS secret management plugin, secrets need to be specified by naming them in the format +`:`, where the secret string is a plain-text value, **not** key/value json. + +### Vault secrets manager + +When using the Vault secret manager, make sure you have Vault Agent deployed on your cluster as described in this +[step-by-step tutorial](https://learn.hashicorp.com/tutorials/vault/kubernetes-sidecar). +Vault secrets can only be mounted as files and will become available under `"/etc/flyte/secrets/SECRET_GROUP/SECRET_NAME"`. + +Vault comes with various secrets engines. Currently Flyte supports working with both version 1 and 2 of the `Key Vault engine ` as well as the `databases secrets engine `. 
+You can use the `group_version` parameter to specify which secret backend engine to use. Available choices are: "kv1", "kv2", "db": + ++++ {"lines_to_next_cell": 0} + +How to request secrets with the Vault secret manager: + +```{code-cell} +secret = Secret( + group="", + key="", + group_version="", +) +``` + +The group parameter is used to specify the path to the secret in the Vault backend. For example, if you have a secret stored in Vault at `"secret/data/flyte/secret"` then the group parameter should be `"secret/data/flyte"`. +When using either of the Key Vault engine versions, the secret key is the name of a specific secret entry to be retrieved from the group path. +When using the database secrets engine, the secret key itself is arbitrary but is required by Flyte to name and identify the secret file. It is arbitrary because the database secrets engine always returns two keys, `username` and `password`, and we need to retrieve a matching pair in one request. + +**Configuration** + +You can configure the Vault role under which Flyte will try to read the secret by setting `webhook.vaultSecretManager.role` (default: `"flyte"`). +There is also a deprecated `webhook.vaultSecretManager.kvVersion` setting in the configmap that can be used to specify the version but only for the Key Vault backend engine. +Available choices are: "1", "2". Note that the version number needs to be an explicit string (e.g. `"1"`). + +**Annotations** + +By default, `flyte-pod-webhook` injects the following annotations into the task pod: + +1. `vault.hashicorp.com/agent-inject` to configure whether injection is explicitly enabled or disabled for a pod. +2. `vault.hashicorp.com/secret-volume-path` to configure where on the filesystem a secret will be rendered. +3. `vault.hashicorp.com/role` to configure the Vault role used by the Vault Agent auto-auth method. +4. `vault.hashicorp.com/agent-pre-populate-only` to configure whether an init container is the only injected container. +5. 
`vault.hashicorp.com/agent-inject-secret` to configure Vault Agent to retrieve the secrets from Vault required by the container. +6. `vault.hashicorp.com/agent-inject-file` to configure the filename and path in the secrets volume where a Vault secret will be written. +7. `vault.hashicorp.com/agent-inject-template` to configure the template Vault Agent should use for rendering a secret. + +It is possible to add extra annotations or override the existing ones in Flyte either at the task level using pod annotations or at the installation level. +If Flyte administrator wants to set up annotations for the entire system, they can utilize `webhook.vaultSecretManager.annotations` to accomplish this. + +## Scaling the webhook + +### Vertical scaling + +To scale the Webhook to be able to process the number/rate of pods you need, you may need to configure a vertical [pod +autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler). + +### Horizontal scaling + +The Webhook does not make any external API Requests in response to Pod mutation requests. It should be able to handle traffic +quickly. For horizontal scaling, adding additional replicas for the Pod in the +deployment should be sufficient. A single `MutatingWebhookConfiguration` object will be used, the same TLS certificate +will be shared across the pods and the Service created will automatically load balance traffic across the available pods. 
diff --git a/docs/user_guide/productionizing/spot_instances.md b/docs/user_guide/productionizing/spot_instances.md new file mode 100644 index 0000000000..85024f6996 --- /dev/null +++ b/docs/user_guide/productionizing/spot_instances.md @@ -0,0 +1,92 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + ++++ {"lines_to_next_cell": 0} + +# Spot instances + +```{eval-rst} +.. tags:: AWS, GCP, Intermediate + +``` + ++++ + +## What are spot instances? + +Spot instances are unused EC2 capacity in AWS. [Spot instances](https://aws.amazon.com/ec2/spot/?cards.sort-by=item.additionalFields.startDateTime&cards.sort-order=asc) can result in up to 90% savings on on-demand prices. The caveat is that these instances can be preempted at any point and no longer be available for use. This can happen due to: + +- Price – The spot price is greater than your maximum price. +- Capacity – If there are not enough unused EC2 instances to meet the demand for spot instances, Amazon EC2 interrupts spot instances. Amazon EC2 determines the order in which the instances are interrupted. +- Constraints – If your request includes a constraint such as a launch group or an Availability Zone group, these spot instances are terminated as a group when the constraint can no longer be met. + +Generally, most spot instances are obtained for around 2 hours (median), with the floor being about 20 minutes and the ceiling of unbounded duration. + +:::{note} +Spot Instances are called `Preemptible Instances` in the GCP terminology. +::: + +### Setting up spot instances + +- AWS: +- GCP: + +If an auto-scaling group (ASG) is set up, you may want to isolate the tasks you want to trigger on spot/preemptible instances from the regular workloads. 
+This can be done by setting taints and tolerations using the [config](https://github.com/flyteorg/flyteplugins/blob/60b94c688ef2b98aa53a9224b529ac672af04540/go/tasks/pluginmachinery/flytek8s/config/config.go#L84-L92) available in the `flyteorg/flyteplugins` repo. + +:::{admonition} What's an ASG for a spot/preemptible instance? +When your spot/preemptible instance is terminated, ASG attempts to launch a replacement instance to maintain the desired capacity for the group. +::: + ++++ + +## What are interruptible tasks? + +If specified, the `interruptible` flag is added to the task definition and signals to the Flyte engine that it may be scheduled on machines that may be preempted, such as AWS spot instances. This is low-hanging fruit for any cost-savings initiative. + +### Setting interruptible tasks + +To run your workload on a spot/preemptible instance, you can set `interruptible` to `True`. If you would like the task to be retried automatically when the node is preempted, also make sure to set at least one retry. For example: + +```python +from flytekit import task + + +@task(cache_version='1', interruptible=True, retries=1) +def add_one_and_print(value_to_print: int) -> int: + return value_to_print + 1 +``` + ++++ + +When this value is set, Flyte schedules your task on an auto-scaling group (ASG) with only spot instances. + +:::{note} +If you set `retries=n`, for instance, and the task gets preempted repeatedly, Flyte will retry on a preemptible/spot instance `n-1` times and for the last attempt will retry your task on a non-spot (regular) instance. Please note that tasks will only be retried if at least one retry is allowed using the `retries` parameter in the `task` decorator. +::: + +### Which tasks should be set to interruptible? + +Most Flyte workloads should be good candidates for spot instances. +If your task does NOT exhibit any of the following properties, you can set `interruptible` to true. + +- Time-sensitive: It needs to run now and cannot have any unexpected delays. 
+- Side Effects: The task is not idempotent, and retrying will cause issues. +- Long-Running Tasks: The task takes > 2 hours. Having an interruption during this time frame could potentially waste a lot of computation. + +In a nutshell, you should use spot/preemptible instances when you want to reduce the total cost of running jobs at the expense of potential delays in execution due to restarts. + ++++ + +% TODO: Write "How to Recover From Interruptions?" section diff --git a/docs/user_guide/productionizing/workflow_labels_and_annotations.md b/docs/user_guide/productionizing/workflow_labels_and_annotations.md new file mode 100644 index 0000000000..5b9fe6d4c6 --- /dev/null +++ b/docs/user_guide/productionizing/workflow_labels_and_annotations.md @@ -0,0 +1,75 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Workflow labels and annotations + +```{eval-rst} +.. tags:: Kubernetes, Intermediate +``` + +In Flyte, workflow executions are created as Kubernetes resources. These can be extended with +[labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/) and +[annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/). + +**Labels** and **annotations** are key value pairs which can be used to identify workflows for your own uses. + +:::{Warning} +Note that adding labels and annotations to your K8s resources may have side-effects depending on webhook behavior on your execution clusters. +::: + +Labels are meant to be used as identifying attributes, whereas annotations are arbitrary, *non-identifying* metadata. + +Using labels and annotations is entirely optional. They can be used to categorize and identify workflow executions. 
+ +Labels and annotations are optional parameters to launch plan and execution invocations. When an execution +defines labels and/or annotations *and* the launch plan does as well, the execution spec values will be preferred. + +## Launch plan usage example + +```python +from flytekit import Labels, Annotations + +@workflow +class MyWorkflow(object): + ... + +my_launch_plan = MyWorkflow.create_launch_plan( + labels=Labels({"myexecutionlabel": "bar", ...}), + annotations=Annotations({"region": "SEA", ...}), + ... +) + +my_launch_plan.execute(...) +``` + +## Execution example + +```python +from flytekit import Labels, Annotations + +@workflow +class MyWorkflow(object): + ... + +my_launch_plan = MyWorkflow.create_launch_plan(...) + +my_launch_plan.execute( + labels=Labels({"myexecutionlabel": "bar", ...}), + annotations=Annotations({"region": "SEA", ...}), + ... +) +``` diff --git a/docs/user_guide/testing/index.md b/docs/user_guide/testing/index.md new file mode 100644 index 0000000000..80d3f6ef4e --- /dev/null +++ b/docs/user_guide/testing/index.md @@ -0,0 +1,16 @@ +(testing)= + +# Testing + +The `flytekit` python SDK provides a few utilities for making it easier to test +your tasks and workflows in your test suite. For more details, you can also refer +to the {py:mod}`~flytekit.testing` module in the API reference. 
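The general shape of such a test can be sketched with the standard library alone. The example below is not flytekit's API; it only illustrates the mock-and-assert pattern using `unittest.mock.MagicMock`, and the names `query_backend`, `pipeline`, and `run_with_mock` are hypothetical stand-ins for a backend-only task, a workflow that calls it, and a test body.

```python
from unittest.mock import MagicMock


def query_backend(ds: str) -> list:
    """Hypothetical stand-in for a task that only works against a remote backend."""
    raise RuntimeError("backend is not reachable from a local environment")


def pipeline(ds: str, query=query_backend) -> int:
    """Hypothetical stand-in for a workflow: counts the rows the query returns."""
    rows = query(ds)
    return len(rows)


def run_with_mock() -> int:
    # Swap the backend call for a MagicMock with a canned return value.
    fake_query = MagicMock(return_value=[{"x": 1}, {"x": 2}])
    result = pipeline("2024-01-01", query=fake_query)
    # Verify the pipeline forwarded its input to the (mocked) backend call.
    fake_query.assert_called_once_with("2024-01-01")
    return result
```

Calling `pipeline("2024-01-01")` without the mock raises `RuntimeError`, while `run_with_mock()` completes locally. Passing the dependency as a parameter is a simplification chosen for this sketch so that it stays self-contained.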
+ + +```{toctree} +:maxdepth: -1 +:name: testing_toc +:hidden: + +mocking_tasks +``` diff --git a/docs/user_guide/testing/mocking_tasks.md b/docs/user_guide/testing/mocking_tasks.md new file mode 100644 index 0000000000..07b4f0239e --- /dev/null +++ b/docs/user_guide/testing/mocking_tasks.md @@ -0,0 +1,101 @@ +--- +jupytext: + cell_metadata_filter: all + formats: md:myst + main_language: python + notebook_metadata_filter: all + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Mocking tasks + +Many of the tasks that you write can run locally, but some cannot, usually because they +depend on a third-party service that is only available on the backend. Hive tasks are a common example, as most users +will not have access to the service that executes Hive queries from their development environment. However, it's still +useful to be able to locally run a workflow that calls such a task. For these cases, flytekit provides a couple +of utilities. + +```{code-cell} +import datetime + +import pandas +from flytekit import SQLTask, TaskMetadata, kwtypes, task, workflow +from flytekit.testing import patch, task_mock +from flytekit.types.schema import FlyteSchema +``` + ++++ {"lines_to_next_cell": 0} + +This is a generic SQL task. By default, it is not hooked up to any datastore or handled by any plugin, so it must +be mocked. 
+ +```{code-cell} +sql = SQLTask( + "my-query", + query_template="SELECT * FROM hive.city.fact_airport_sessions WHERE ds = '{{ .Inputs.ds }}' LIMIT 10", + inputs=kwtypes(ds=datetime.datetime), + outputs=kwtypes(results=FlyteSchema), + metadata=TaskMetadata(retries=2), +) +``` + ++++ {"lines_to_next_cell": 0} + +This is a task that can run locally + +```{code-cell} +@task +def t1() -> datetime.datetime: + return datetime.datetime.now() +``` + ++++ {"lines_to_next_cell": 0} + +Declare a workflow that chains these two tasks together. + +```{code-cell} +@workflow +def my_wf() -> FlyteSchema: + dt = t1() + return sql(ds=dt) +``` + ++++ {"lines_to_next_cell": 0} + +Without a mock, calling the workflow would typically raise an exception, but with the `task_mock` construct, which +returns a `MagicMock` object, we can override the return value. + +```{code-cell} +def main_1(): + with task_mock(sql) as mock: + mock.return_value = pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]}) + assert (my_wf().open().all() == pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]})).all().all() +``` + ++++ {"lines_to_next_cell": 0} + +There is another utility as well called `patch` which offers the same functionality, but in the traditional Python +patching style, where the first argument is the `MagicMock` object. 
+ +```{code-cell} +def main_2(): + @patch(sql) + def test_user_demo_test(mock_sql): + mock_sql.return_value = pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]}) + assert (my_wf().open().all() == pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]})).all().all() + + test_user_demo_test() + + +if __name__ == "__main__": + main_1() + main_2() +``` diff --git a/rfc/system/1893-caching-of-offloaded-objects.md b/rfc/system/1893-caching-of-offloaded-objects.md index 806b6cfd36..9a748df02d 100644 --- a/rfc/system/1893-caching-of-offloaded-objects.md +++ b/rfc/system/1893-caching-of-offloaded-objects.md @@ -6,7 +6,7 @@ ## 1 Executive Summary -We propose a way to override the default behavior of [caching task executions](https://docs.flyte.org/en/latest/flytesnacks/examples/development_lifecycle/task_cache.html), enabling cache-by-value semantics for certain categories of objects. +We propose a way to override the default behavior of [caching task executions](https://docs.flyte.org/en/latest/user_guide/development_lifecycle/caching.html), enabling cache-by-value semantics for certain categories of objects. ## 2 Motivation diff --git a/rfc/system/2633-eviction-of-cached-task-outputs.md b/rfc/system/2633-eviction-of-cached-task-outputs.md index 45c3cd8d73..52b4adebf6 100644 --- a/rfc/system/2633-eviction-of-cached-task-outputs.md +++ b/rfc/system/2633-eviction-of-cached-task-outputs.md @@ -158,7 +158,7 @@ The potential for malicious exploitation is deemed non-existent as no access to 3. Which Flyte tools (`flyteconsole`/`flytectl`) should support the proposed `AdminService` API extension for `flyteadmin`, if any? - **RESOLVED**: `flytectl`, `flytekit.remote`, `flyteconsole` 4. Should we support automatic eviction of cached results on workflow archival (opt-out via `flyteconsole`)? -5. 
Should we evict [Intratask Checkpoints](https://docs.flyte.org/en/latest/flytesnacks/examples/advanced_composition/checkpoint.html#intratask-checkpoints) from the cache as well since they might return cached results? If so, should we evict them from the backend side or pass the `cache_override` flag along to `flytekit`/its `Checkpointer` to skip any available entries? +5. Should we evict [Intratask Checkpoints](https://docs.flyte.org/en/latest/user_guide/advanced_composition/intratask_checkpoints.html) from the cache as well since they might return cached results? If so, should we evict them from the backend side or pass the `cache_override` flag along to `flytekit`/its `Checkpointer` to skip any available entries? - **RESOLVED**: not for the initial implementation. Intratask checkpoints are only relevant for consecutive retries of a task - their results would not be considered when launching another execution with a `cache_override` flag set. ## 9 Conclusion