add PipesLambdaClient #17924

Merged: @alangenfeld merged 1 commit into master from al/11-07-_prototype_lambda_pipes_client on Dec 4, 2023


@alangenfeld (Member) commented Nov 10, 2023

Create a Pipes client for AWS Lambda.

How I Tested These Changes

Added unit tests that use a fake version of Lambda
Manually tested against a real Lambda instance
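The fake Lambda used by the unit tests is not shown on this page. As a hedged illustration of the idea only, here is a hypothetical in-memory stand-in (`InMemoryLambdaClient`, not the PR's `FakeLambdaClient`) that runs a Python handler in-process and returns a response shaped like a synchronous boto3 `invoke` call with `LogType="Tail"`: the last 4 KB of captured stdout, base64-encoded under `LogResult`.

```python
import base64
import io
import json
from contextlib import redirect_stdout
from typing import Any, Callable, Dict, Mapping

# Hypothetical in-memory stand-in for a boto3 Lambda client, for unit tests only.
# It mimics the shape of a synchronous invoke(..., LogType="Tail") response.
LOG_TAIL_LIMIT_BYTES = 4096  # a sync invoke returns at most the last 4 KB of logs


class InMemoryLambdaClient:
    def __init__(self, functions: Mapping[str, Callable[[Dict[str, Any], Any], Any]]):
        # Maps a function name to a Python handler with the Lambda (event, context) signature
        self._functions = functions

    def invoke(self, FunctionName: str, Payload: str = "{}", **_: Any) -> Dict[str, Any]:
        handler = self._functions[FunctionName]
        captured = io.StringIO()
        with redirect_stdout(captured):
            result = handler(json.loads(Payload), None)  # no real Lambda context object
        # Keep only the final 4 KB of output, as the real log tail would
        log_tail = captured.getvalue().encode("utf-8")[-LOG_TAIL_LIMIT_BYTES:]
        return {
            "StatusCode": 200,
            "LogResult": base64.b64encode(log_tail).decode("ascii"),
            # boto3 returns a streaming body here; a BytesIO offers the same .read()
            "Payload": io.BytesIO(json.dumps(result).encode("utf-8")),
        }
```

Because this sketch only captures stdout via `redirect_stdout`, output written to stderr never reaches its simulated tail; a real Lambda log would include more than this.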

@alangenfeld (Member Author) commented Nov 10, 2023

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

@alangenfeld force-pushed the al/11-07-_prototype_lambda_pipes_client branch from 120e6d0 to 57fa0ab, November 10, 2023 22:17
@alangenfeld changed the title from [prototype] lambda pipes client to add PipesLambdaClient, Nov 10, 2023
@alangenfeld force-pushed the al/11-07-_prototype_lambda_pipes_client branch 2 times, most recently from 9854f8d to a5233d6, November 10, 2023 22:20
@alangenfeld marked this pull request as ready for review November 13, 2023 21:47
@schrockn (Member) left a comment

What is the behavior here when unstructured logs are >4k? Does it still pick up all structured messages?

python_modules/dagster-pipes/dagster_pipes/__init__.py (outdated)


@experimental
class PipesLambdaLogsMessageReader(PipesMessageReader):
Member


Can you document its capabilities? In particular, the limitation on log length and what the product experience would be if you use this.

Collaborator


Is it possible to implement this using the PipesLogReader API used for Databricks? If I'm reading this correctly, this doesn't support continuous polling of the logs.

Or does that not work, or is it otherwise inappropriate for Lambda?

Member Author


The Lambda sync invoke API gives you back the last 4k of logs directly in the response, so this initial version is exploring that. While it does have its limitations, it's very simple, and as long as the meaningful messages are towards the end of the computation (which I believe would be common), it should work.

The full logs are available via CloudWatch, so I need to explore the exact constraints of using those APIs to tail the logs (likely via PipesLogReader).
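A minimal sketch of the synchronous invoke path described above, assuming a boto3 Lambda client (or any object accepting the same `invoke` keyword arguments). `invoke_and_read_log_tail` is a hypothetical helper, not part of the PR; `InvocationType="RequestResponse"` requests a synchronous call and `LogType="Tail"` asks AWS to include the base64-encoded log tail in the response.

```python
import base64
import json
from typing import Any, Dict, List


def invoke_and_read_log_tail(lambda_client: Any, function_name: str, event: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: invoke a function synchronously and return its log-tail lines.

    Assumes `lambda_client` behaves like a boto3 Lambda client. AWS returns at most
    the last 4 KB of the execution log, base64-encoded, under "LogResult".
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous invoke
        LogType="Tail",  # include the log tail in the response
        Payload=json.dumps(event),
    )
    # The 4 KB cut can land mid-line (or mid-character), so decode leniently
    log_tail = base64.b64decode(response["LogResult"]).decode("utf-8", errors="replace")
    return log_tail.splitlines()
```

With a real client this might be called as `invoke_and_read_log_tail(boto3.client("lambda"), "my-function", {})`, where `"my-function"` is a placeholder. Because of the cut, the first returned line may be a fragment, which is why anything that must survive should be written near the end of the run.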

log_result = base64.b64decode(response["LogResult"]).decode("utf-8")

for log_line in log_result.splitlines():
    extract_message_or_forward_to_stdout(handler, log_line)
Member


Hmm, should this be stderr?
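One way to keep the stdout-versus-stderr decision open is to make the forwarding stream a parameter. This is a sketch only: `extract_message_or_forward` is a hypothetical variant, not the dagster helper, and recognizing pipes messages by a `__dagster_pipes_version` key is an assumption about the message format.

```python
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO

# Assumption: a pipes message is a JSON object carrying this protocol-version key.
PIPES_PROTOCOL_VERSION_FIELD = "__dagster_pipes_version"


def extract_message_or_forward(
    handle_message: Callable[[Dict[str, Any]], None],
    log_line: str,
    forward_to: Optional[TextIO] = None,
) -> None:
    """Hypothetical helper: route pipes messages to a handler, forward everything else."""
    try:
        message = json.loads(log_line)
    except json.JSONDecodeError:
        message = None
    if isinstance(message, dict) and PIPES_PROTOCOL_VERSION_FIELD in message:
        handle_message(message)
    else:
        # Not a pipes message: forward the raw line, resolving sys.stdout at call
        # time so that redirected stdout is respected
        stream = forward_to if forward_to is not None else sys.stdout
        stream.write(log_line + "\n")
```

A caller that prefers stderr would pass `forward_to=sys.stderr`; the default keeps the current stdout behavior.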

python_modules/dagster-pipes/dagster_pipes/__init__.py (outdated)
@@ -75,7 +75,7 @@ class _PipesDockerClient(PipesClient):
             the docker client.
         context_injector (Optional[PipesContextInjector]): A context injector to use to inject
             context into the docker container process. Defaults to :py:class:`PipesEnvContextInjector`.
-        message_reader (Optional[PipesContextInjector]): A message reader to use to read messages
+        message_reader (Optional[PipesMessageReader]): A message reader to use to read messages
Collaborator


thx

@schrockn (Member) left a comment

> The Lambda sync invoke API gives you back the last 4k of logs directly in the response, so this initial version is exploring that. While it does have its limitations, it's very simple, and as long as the meaningful messages are towards the end of the computation (which I believe would be common), it should work.

This seems very unreliable. I suggest doing something less rickety.

Ideas:

  1. Structured messages via S3
  2. Wait to serialize structured messages until the very end to guarantee availability.

Open to other suggestions.

@alangenfeld changed the base branch from master to al/11-27-_pipes_move_s3_tests_to_dagster_aws November 27, 2023 21:31
@alangenfeld force-pushed the al/11-07-_prototype_lambda_pipes_client branch from a5233d6 to efb6de5, November 27, 2023 21:31
@alangenfeld force-pushed the al/11-07-_prototype_lambda_pipes_client branch from efb6de5 to 32dc6e2, November 27, 2023 21:39
@alangenfeld (Member Author) left a comment

> Wait to serialize structured messages until the very end to guarantee availability.

I did this with [1]

> Structured messages via S3

This is tested and demonstrated in [2].

A possibly ungrounded motivation for keeping the log-tail-based reader as the default is avoiding the need to install the boto3 dependency in your Lambda function. Anecdotally, copy-pasting the single file for access to dagster-pipes is pretty low friction, whereas getting boto3 leads to https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-dependencies. I assume most mature Lambda users have some tooling to manage and package dependencies, but it's not clear to me at this time how prevalent that maturity is.

Comment on lines +738 to +755
class PipesBufferedStreamMessageWriterChannel(PipesMessageWriterChannel):
    """Message writer channel that buffers messages and then writes them all out to a
    `TextIO` stream on close.
    """

    def __init__(self, stream: TextIO):
        self._buffer = []
        self._stream = stream

    def write_message(self, message: PipesMessage) -> None:
        self._buffer.append(message)

    def flush(self):
        for message in self._buffer:
            self._stream.writelines((json.dumps(message), "\n"))
        self._buffer = []
Member Author


[1]
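To see why flushing at the end matters for the 4 KB log tail, here is a standalone simulation. `BufferedStreamChannel` and `recovered_from_tail` are hypothetical names that mirror the buffer-then-flush idea of the channel in [1] without depending on dagster-pipes; the message payloads are illustrative, and real pipes messages carry additional fields.

```python
import io
import json
from typing import Any, Dict, List, TextIO

LOG_TAIL_LIMIT_BYTES = 4096  # log tail size returned by a synchronous Lambda invoke


class BufferedStreamChannel:
    """Standalone sketch: hold messages in memory and write them as JSON lines on flush()."""

    def __init__(self, stream: TextIO):
        self._buffer: List[Dict[str, Any]] = []
        self._stream = stream

    def write_message(self, message: Dict[str, Any]) -> None:
        self._buffer.append(message)

    def flush(self) -> None:
        for message in self._buffer:
            self._stream.write(json.dumps(message) + "\n")
        self._buffer = []


def recovered_from_tail(buffered: bool, noise_bytes: int = 20_000, n_messages: int = 10) -> List[int]:
    """Interleave unstructured output with messages and return the ids of the messages
    that can still be parsed out of the final 4 KB of output."""
    stream = io.StringIO()
    channel = BufferedStreamChannel(stream)
    for i in range(n_messages):
        stream.write("x" * (noise_bytes // n_messages) + "\n")  # unstructured log noise
        channel.write_message({"method": "log", "params": {"i": i}})
        if not buffered:
            channel.flush()  # write each message as soon as it is emitted
    channel.flush()  # buffered mode: every message lands at the very end
    tail_bytes = stream.getvalue().encode("utf-8")[-LOG_TAIL_LIMIT_BYTES:]
    recovered = []
    for line in tail_bytes.decode("utf-8", errors="replace").splitlines():
        try:
            recovered.append(json.loads(line)["params"]["i"])
        except (json.JSONDecodeError, TypeError, KeyError):
            pass  # noise, or a line cut in half by the 4 KB boundary
    return recovered
```

With about 20 KB of noise, the buffered run recovers every message while the unbuffered run only recovers the last few. This holds only while the serialized messages themselves fit in 4 KB, which is the limitation the reader's docstring calls out.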

Comment on lines 257 to 270
PipesLambdaClient(
FakeLambdaClient(),
message_reader=PipesS3MessageReader(
client=s3_client, bucket=_S3_TEST_BUCKET, interval=0.001
),
)
.run(
context=context,
function_name=LambdaFunctions.pipes_s3_messages.__name__,
event={},
)
.get_materialize_result()
Member Author


[2]

Comment on lines 39 to 44
s3_client = boto3.client(
"s3", region_name="us-east-1", endpoint_url="http://localhost:5193"
)
with open_dagster_pipes(
params_loader=PipesMappingParamsLoader(event),
message_writer=PipesS3MessageWriter(client=s3_client, interval=0.001),
Member Author


[2]

@alangenfeld force-pushed the al/11-27-_pipes_move_s3_tests_to_dagster_aws branch from cdff2b3 to ef4166e, November 29, 2023 15:25
@alangenfeld force-pushed the al/11-07-_prototype_lambda_pipes_client branch from 32dc6e2 to 3746da4, November 29, 2023 15:25
Base automatically changed from al/11-27-_pipes_move_s3_tests_to_dagster_aws to master November 29, 2023 15:53
@alangenfeld force-pushed the al/11-07-_prototype_lambda_pipes_client branch from 3746da4 to 0cd8dc5, November 29, 2023 15:53
Comment on lines +760 to +785
 class PipesMappingParamsLoader(PipesParamsLoader):
     """Params loader that extracts params from a Mapping provided at init time."""
 
     def __init__(self, mapping: Mapping[str, str]):
         self._mapping = mapping
 
     def is_dagster_pipes_process(self) -> bool:
         # use the presence of DAGSTER_PIPES_CONTEXT to discern if we are in a pipes process
-        return DAGSTER_PIPES_CONTEXT_ENV_VAR in os.environ
+        return DAGSTER_PIPES_CONTEXT_ENV_VAR in self._mapping
 
     def load_context_params(self) -> PipesParams:
-        return _param_from_env_var(DAGSTER_PIPES_CONTEXT_ENV_VAR)
+        raw_value = self._mapping[DAGSTER_PIPES_CONTEXT_ENV_VAR]
+        return decode_env_var(raw_value)
 
     def load_messages_params(self) -> PipesParams:
-        return _param_from_env_var(DAGSTER_PIPES_MESSAGES_ENV_VAR)
+        raw_value = self._mapping[DAGSTER_PIPES_MESSAGES_ENV_VAR]
+        return decode_env_var(raw_value)
 
 
 class PipesEnvVarParamsLoader(PipesMappingParamsLoader):
     """Params loader that extracts params from environment variables."""
 
     def __init__(self):
         super().__init__(mapping=os.environ)
Member


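This is potentially a behavior change if someone smashes os.environ by assigning a new dictionary to it: the loader keeps a reference to the mapping it received at init time, so it would not see the replacement.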

Member


I don't think any action is worth taking on this, but just wanted to call it out.
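A standalone sketch of both points in this thread: a mapping-based loader reads params from any string mapping (such as a Lambda event), and an env-var subclass that passes `os.environ` at init time holds a reference to that object. Item assignment like `os.environ["X"] = ...` mutates the same object and stays visible; rebinding `os.environ` to a new dict does not. The class and helper names are hypothetical, and the base64-of-zlib-compressed-JSON encoding is an assumption about how pipes params are passed, not the dagster-pipes functions themselves.

```python
import base64
import json
import os
import zlib
from typing import Any, Mapping, Tuple

CONTEXT_KEY = "DAGSTER_PIPES_CONTEXT"


# Assumption: params travel as base64-encoded, zlib-compressed JSON strings.
# These are local stand-ins, not the dagster-pipes encode/decode functions.
def encode_param(value: Any) -> str:
    return base64.b64encode(zlib.compress(json.dumps(value).encode("utf-8"))).decode("utf-8")


def decode_param(raw_value: str) -> Any:
    return json.loads(zlib.decompress(base64.b64decode(raw_value)).decode("utf-8"))


class MappingParamsLoader:
    """Reads params from whatever mapping it was given (e.g. a Lambda event)."""

    def __init__(self, mapping: Mapping[str, str]):
        self._mapping = mapping  # a reference, not a copy

    def is_pipes_process(self) -> bool:
        return CONTEXT_KEY in self._mapping

    def load_context_params(self) -> Any:
        return decode_param(self._mapping[CONTEXT_KEY])


class EnvVarParamsLoader(MappingParamsLoader):
    def __init__(self) -> None:
        super().__init__(mapping=os.environ)  # captures the os.environ object seen now


def compare_after_rebinding() -> Tuple[bool, bool]:
    """Return (captured-reference view, live os.environ view) after rebinding os.environ."""
    loader = EnvVarParamsLoader()
    original = os.environ
    try:
        os.environ = {CONTEXT_KEY: encode_param({"illustrative": True})}  # "smash" it
        return loader.is_pipes_process(), CONTEXT_KEY in os.environ
    finally:
        os.environ = original  # restore the real environment mapping
```

The two views disagree only after a rebinding, which matches the reviewer's read that this is an edge case rather than something requiring action.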

@schrockn (Member) left a comment

Ok great. Let's write some docs on this and post it to #dagster-pipes for this week's release.

Comment on lines 123 to 129
"""Message reader that consumes buffered pipes messages that were flushed on exit from the
final 4k of logs that are returned from issuing a sync lambda invocation.

Limitations: If the volume of pipes messages exceeds 4k, messages will be lost and it is
recommended to switch to PipesS3MessageWriter & PipesS3MessageReader.
"""
Member


Also note explicitly that messages emitted during the computation will not appear in the Dagster event log until after the Lambda function completes.

@alangenfeld force-pushed the al/11-07-_prototype_lambda_pipes_client branch from 0cd8dc5 to 3419ee6, December 4, 2023 17:13

github-actions bot commented Dec 4, 2023

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-qic7klf09-elementl.vercel.app
https://al-11-07--prototype-lambda-pipes-client.components-storybook.dagster-docs.io

Built with commit 3419ee6.
This pull request is being automatically deployed with vercel-action


github-actions bot commented Dec 4, 2023

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-m8yrrxoos-elementl.vercel.app
https://al-11-07--prototype-lambda-pipes-client.dagster.dagster-docs.io

Direct link to changed pages:


github-actions bot commented Dec 4, 2023

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-ai5qvzk75-elementl.vercel.app
https://al-11-07--prototype-lambda-pipes-client.core-storybook.dagster-docs.io

Built with commit 3419ee6.
This pull request is being automatically deployed with vercel-action

@alangenfeld merged commit 57f56c6 into master Dec 4, 2023
3 checks passed
@alangenfeld deleted the al/11-07-_prototype_lambda_pipes_client branch December 4, 2023 18:54
erinkcochran87 added a commit that referenced this pull request Dec 4, 2023
## Summary & Motivation

This PR adds a guide for using Pipes with AWS Lambda (#17924)

## How I Tested These Changes

👀 , help from Alex
zyd14 pushed a commit to zyd14/dagster that referenced this pull request Jan 20, 2024
Create a pipes client for AWS lambda. 

## How I Tested These Changes

added unit tests that use a fake version of lambda
manually tested against a real lambda instance
zyd14 pushed a commit to zyd14/dagster that referenced this pull request Jan 20, 2024
## Summary & Motivation

This PR adds a guide for using Pipes with AWS Lambda (dagster-io#17924)

## How I Tested These Changes

👀 , help from Alex
3 participants