feat: Enable inference serving capabilities on sagemaker endpoint #536

Open: wants to merge 1 commit into main

Conversation


@gwang111 gwang111 commented Dec 27, 2024

Description of changes:
Added source code to enable serving capabilities on SageMaker Endpoint.

  • When the serve command is passed at container startup, the inference server script runs.
  • The script then starts a Tornado web server in either async or sync mode.
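
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the Tornado wiring is left out as a placeholder, and the names `resolve_mode`, `invoke`, and `main`, as well as the rule that a coroutine handler selects async mode, are assumptions made for the example.

```python
import asyncio
import inspect
from typing import Any, Callable, List


def resolve_mode(handler: Callable[..., Any]) -> str:
    """Return "async" for coroutine functions, "sync" otherwise.

    Assumption: the mode follows the handler's type. The PR may choose
    the mode differently (for example, from configuration).
    """
    return "async" if inspect.iscoroutinefunction(handler) else "sync"


def invoke(handler: Callable[..., Any], payload: bytes) -> Any:
    """Call the handler in the mode chosen by resolve_mode()."""
    if resolve_mode(handler) == "async":
        # A real server would await the handler on its own event loop;
        # asyncio.run() stands in for that here.
        return asyncio.run(handler(payload))
    return handler(payload)


def main(argv: List[str]) -> str:
    """Start serving only when the container is launched with `serve`."""
    if argv[:1] != ["serve"]:
        return "skipped"
    # Placeholder: the web server (Tornado in this PR) would start here.
    return "serving"
```

Keeping the mode decision in one small function would make it easy to unit test once the implementation details are settled.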

Testing
TODO:

  • Formal tests will be added once the implementation details are agreed on.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@gwang111 gwang111 changed the title [Feat] Enable inference serving capabilities on sagemaker endpoint using tor… Feat: Enable inference serving capabilities on sagemaker endpoint using tor… Dec 27, 2024
@gwang111 gwang111 changed the title Feat: Enable inference serving capabilities on sagemaker endpoint using tor… feat: Enable inference serving capabilities on sagemaker endpoint Dec 27, 2024
@gwang111 gwang111 force-pushed the inference-serving branch 3 times, most recently from 9cee77b to bb132ea Compare December 27, 2024 21:29
if requirements_txt.is_file():
    try:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-r", str(requirements_txt)]

SMD recommends that we use micromamba instead of pip for installing dependencies.

Author

I couldn't find any micromamba resources on installing runtime dependencies without pip. I will reach out to Tian about this.
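
For reference, one possible shape for a micromamba-based install is sketched below. It is an untested assumption, not a recommendation from SMD: it relies on `micromamba install` accepting `--yes`, `--name`, and `--file`, the helper names and the `base` environment are hypothetical, and a pip-style requirements.txt may not parse as a conda spec file (extras, URLs, and PyPI-only packages are the likely trouble spots).

```python
import subprocess
from pathlib import Path
from typing import List


def build_install_command(requirements_txt: Path, env_name: str = "base") -> List[str]:
    """Build the micromamba command without running it.

    Assumption: the target environment is called "base" and the file
    lists conda-compatible package specs, one per line.
    """
    return [
        "micromamba", "install", "--yes",
        "--name", env_name,
        "--file", str(requirements_txt),
    ]


def install_requirements(requirements_txt: Path) -> bool:
    """Install the listed dependencies; return False when the file is absent."""
    if not requirements_txt.is_file():
        return False
    # check_call raises CalledProcessError on a non-zero exit status,
    # so a failed install surfaces to the caller instead of passing silently.
    subprocess.check_call(build_install_command(requirements_txt))
    return True
```

Building the command separately from running it keeps the flag choice easy to review and to swap back to pip if micromamba turns out not to fit.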

Comment on lines 10 to 11
CODE_DIRECTORY = "SAGEMAKER_INFERENCE_CODE_DIRECTORY"
CODE = "SAGEMAKER_INFERENCE_CODE"

Let's discuss these inputs offline.


We will review these with Saurabh/PM since they will be customer-facing.

Author

I will schedule a meeting next week.

build_artifacts/v2/v2.2/v2.2.0/gpu.env.in (outdated, resolved)
@aws-tianquaw
Contributor

Can you move your changes under the "template" folder?

For any changes or additions to the cpu/gpu.env.in files, please create an issue (example) so that we can add them to the next major/minor image release.

@gwang111
Author

gwang111 commented Jan 6, 2025

Can you move your changes under the "template" folder?

For any changes or additions to the cpu/gpu.env.in files, please create an issue (example) so that we can add them to the next major/minor image release.

Moved the code to the v3 template folder, since it will launch in the next major version release.

@gwang111 gwang111 force-pushed the inference-serving branch 10 times, most recently from 271755a to 9865a75 Compare January 10, 2025 22:31
astream() runs a provided async generator fn in an async manner.
"""

async def stream(self, generator: Generator):
@cj-zhang cj-zhang Jan 10, 2025

Can we accept Iterator instead? It will work with Generators too, since they're a subtype of Iterator, and LangChain stream implementations return [Async]Iterator.

Author

Ack, sounds good. Will update.
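
To illustrate the suggestion, here is a small sketch showing that a parameter typed as Iterator (or AsyncIterator) accepts generators as well as plain iterators. The function names are hypothetical and are not the methods in this PR.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def collect(chunks: Iterator[str]) -> List[str]:
    """Consume any iterator of chunks, including a generator."""
    return list(chunks)


async def acollect(chunks: AsyncIterator[str]) -> List[str]:
    """Consume any async iterator of chunks, including an async generator."""
    return [chunk async for chunk in chunks]


def count_up(n: int) -> Iterator[str]:
    """A generator function; calling it returns a generator, which is an Iterator."""
    for i in range(n):
        yield str(i)


async def acount_up(n: int) -> AsyncIterator[str]:
    """An async generator function; its result is an AsyncIterator."""
    for i in range(n):
        yield str(i)


if __name__ == "__main__":
    print(collect(count_up(3)))                # generator input
    print(collect(iter(["a", "b"])))           # plain iterator input
    print(asyncio.run(acollect(acount_up(2)))) # async generator input
```

Typing the parameter as the broader Iterator keeps callers free to pass either a generator or an iterator returned by another library.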


4 participants