add PipesLambdaClient #17924
Conversation
What is the behavior here when unstructured logs are >4k? Does it still pick up all structured messages?
```python
@experimental
class PipesLambdaLogsMessageReader(PipesMessageReader):
```
Can you document its capabilities? In particular, the limitation on log length and what the product experience would be if you use this.
Is it possible to implement this using the PipesLogReader API used for Databricks? If I'm reading this correctly, this doesn't support continuous polling of the logs. Or does that not work / is it otherwise inappropriate for Lambda?
The Lambda sync invoke API gives you back the last 4 KiB of logs directly in the response, so this initial version is exploring that. While it does have its limitations, it's very simple, and as long as the meaningful messages are towards the end of the computation (which I believe would be common) it should work.
The full logs are available via CloudWatch, so we need to explore the exact constraints of using those APIs to tail the logs (likely via PipesLogReader).
```python
log_result = base64.b64decode(response["LogResult"]).decode("utf-8")

for log_line in log_result.splitlines():
    extract_message_or_forward_to_stdout(handler, log_line)
```
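For context, here is a runnable sketch of the tail-log decoding step above, using a hand-built response dict in place of a real `boto3` `invoke(..., LogType="Tail")` call (so no AWS credentials are needed):

```python
import base64


def extract_tail_logs(response: dict) -> list:
    # "LogResult" is the base64-encoded last 4 KiB of logs that a
    # synchronous Lambda invoke with LogType="Tail" returns.
    log_result = base64.b64decode(response["LogResult"]).decode("utf-8")
    return log_result.splitlines()


# Hand-built stand-in for the boto3 invoke response.
fake_response = {
    "LogResult": base64.b64encode(
        b'ordinary stdout line\n{"method": "closed", "params": null}\n'
    ).decode("utf-8")
}
print(extract_tail_logs(fake_response))
# → ['ordinary stdout line', '{"method": "closed", "params": null}']
```

Each decoded line is then either parsed as a structured pipes message or forwarded as plain log output.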
Hmmmm should this be stderr?
```diff
@@ -75,7 +75,7 @@ class _PipesDockerClient(PipesClient):
             the docker client.
         context_injector (Optional[PipesContextInjector]): A context injector to use to inject
             context into the docker container process. Defaults to :py:class:`PipesEnvContextInjector`.
-        message_reader (Optional[PipesContextInjector]): A message reader to use to read messages
+        message_reader (Optional[PipesMessageReader]): A message reader to use to read messages
```
thx
> The Lambda sync invoke API gives you back the last 4 KiB of logs directly in the response, so this initial version is exploring that. While it does have its limitations, it's very simple, and as long as the meaningful messages are towards the end of the computation it should work.
This seems very unreliable. I suggest doing something less rickety.
Ideas:
- Structured messages via s3
- Wait to serialize structured messages until the very end to guarantee availability.
Open to other suggestions.
> Wait to serialize structured messages until the very end to guarantee availability.

I did this with [1].

> Structured messages via s3

This is under test / demonstrated in [2].
A maybe-not-well-grounded motivation for keeping the log-tail-based default is avoiding the need to get the boto3 dep installed in your Lambda. Anecdotally, copy-pasting the single file for access to dagster-pipes is pretty low friction, whereas getting boto3 in leads to https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-dependencies. I assume most mature users of Lambda have some tooling to manage and package deps, but it's not clear to me at this time how prevalent that maturity is.
```python
class PipesBufferedStreamMessageWriterChannel(PipesMessageWriterChannel):
    """Message writer channel that buffers messages and then writes them all out to a
    `TextIO` stream on close.
    """

    def __init__(self, stream: TextIO):
        self._buffer = []
        self._stream = stream

    def write_message(self, message: PipesMessage) -> None:
        self._buffer.append(message)

    def flush(self):
        for message in self._buffer:
            self._stream.writelines((json.dumps(message), "\n"))
        self._buffer = []
```
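A self-contained sketch of the buffer-then-flush behavior, using a simplified stand-in class (without the `PipesMessageWriterChannel` base class) so it runs without dagster-pipes installed:

```python
import io
import json


class BufferedStreamChannel:
    """Simplified stand-in: buffer messages, write them all out on flush()."""

    def __init__(self, stream):
        self._buffer = []
        self._stream = stream

    def write_message(self, message: dict) -> None:
        self._buffer.append(message)

    def flush(self) -> None:
        for message in self._buffer:
            self._stream.writelines((json.dumps(message), "\n"))
        self._buffer = []


stream = io.StringIO()
channel = BufferedStreamChannel(stream)
channel.write_message({"method": "opened"})
channel.write_message({"method": "closed"})
assert stream.getvalue() == ""  # nothing is written until flush
channel.flush()
print(stream.getvalue())
```

Deferring all writes to a single flush at exit is what keeps the structured messages inside the final 4 KiB window the Lambda response returns.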
[1]
```python
PipesLambdaClient(
    FakeLambdaClient(),
    message_reader=PipesS3MessageReader(
        client=s3_client, bucket=_S3_TEST_BUCKET, interval=0.001
    ),
).run(
    context=context,
    function_name=LambdaFunctions.pipes_s3_messages.__name__,
    event={},
).get_materialize_result()
```
[2]
```python
s3_client = boto3.client(
    "s3", region_name="us-east-1", endpoint_url="http://localhost:5193"
)
with open_dagster_pipes(
    params_loader=PipesMappingParamsLoader(event),
    message_writer=PipesS3MessageWriter(client=s3_client, interval=0.001),
```
[2]
```python
class PipesMappingParamsLoader(PipesParamsLoader):
    """Params loader that extracts params from a Mapping provided at init time."""

    def __init__(self, mapping: Mapping[str, str]):
        self._mapping = mapping

    def is_dagster_pipes_process(self) -> bool:
        # use the presence of DAGSTER_PIPES_CONTEXT to discern if we are in a pipes process
        return DAGSTER_PIPES_CONTEXT_ENV_VAR in self._mapping

    def load_context_params(self) -> PipesParams:
        raw_value = self._mapping[DAGSTER_PIPES_CONTEXT_ENV_VAR]
        return decode_env_var(raw_value)

    def load_messages_params(self) -> PipesParams:
        raw_value = self._mapping[DAGSTER_PIPES_MESSAGES_ENV_VAR]
        return decode_env_var(raw_value)


class PipesEnvVarParamsLoader(PipesMappingParamsLoader):
    """Params loader that extracts params from environment variables."""

    def __init__(self):
        super().__init__(mapping=os.environ)
```
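A runnable sketch of the mapping-vs-env-var pattern above, with simplified stand-in classes: the real dagster-pipes loaders decode base64/zlib-encoded params via `decode_env_var`, but plain JSON is used here for illustration (an assumption, not the actual wire format):

```python
import json
import os

# Stand-in constant matching the real env var name used by dagster-pipes.
DAGSTER_PIPES_CONTEXT_ENV_VAR = "DAGSTER_PIPES_CONTEXT"


class MappingParamsLoader:
    """Pull pipes params out of any Mapping, e.g. a Lambda event dict."""

    def __init__(self, mapping):
        self._mapping = mapping

    def is_dagster_pipes_process(self) -> bool:
        # presence of the context key signals a pipes-orchestrated process
        return DAGSTER_PIPES_CONTEXT_ENV_VAR in self._mapping

    def load_context_params(self) -> dict:
        # The real loader base64/zlib-decodes; plain JSON here for illustration.
        return json.loads(self._mapping[DAGSTER_PIPES_CONTEXT_ENV_VAR])


class EnvVarParamsLoader(MappingParamsLoader):
    """Env-var loading is just the mapping case with mapping=os.environ."""

    def __init__(self):
        super().__init__(mapping=os.environ)


# In a Lambda handler, the invoke event itself carries the params:
event = {DAGSTER_PIPES_CONTEXT_ENV_VAR: json.dumps({"run_id": "abc123"})}
loader = MappingParamsLoader(event)
print(loader.is_dagster_pipes_process())  # → True
print(loader.load_context_params())       # → {'run_id': 'abc123'}
```

Reframing the env-var loader as a special case of the mapping loader is what lets the Lambda client pass pipes params through the invoke event instead of the process environment.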
Potentially a behavior change if someone smashes `os.environ` by assigning a new dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any action is worth taking on this, but just wanted to call it out.
Ok great. Let's write some docs on this and post it to #dagster-pipes for this week's release.
```python
"""Message reader that consumes buffered pipes messages that were flushed on exit from the
final 4k of logs that are returned from issuing a sync lambda invocation.

Limitations: If the volume of pipes messages exceeds 4k, messages will be lost and it is
recommended to switch to PipesS3MessageWriter & PipesS3MessageReader.
"""
```
Also note explicitly that messages emitted during the computation will only appear in the Dagster event log after the Lambda completes.
## Summary & Motivation

Create a Pipes client for AWS Lambda. A companion PR adds a guide for using Pipes with AWS Lambda, referencing this PR (#17924).

## How I Tested These Changes

- Added unit tests that use a fake version of Lambda.
- Manually tested against a real Lambda instance.