[flytepropeller][flyteadmin] Streaming Decks V2 #6053

Open: wants to merge 18 commits into base: master

Future-Outlier (Member) commented Nov 27, 2024

Tracking issue

#5574

Why are the changes needed?

To enhance user visibility into Flyte Decks at different stages of workflow execution (running, failing, and succeeding), enabling better debugging and analysis.

What changes were proposed in this pull request?

Concept:

  1. propeller converts the node info into a NodeExecutionEvent and sends it to admin.

nev, err := ToNodeExecutionEvent(
    nCtx.NodeExecutionMetadata().GetNodeExecutionID(),
    p,
    nCtx.InputReader().GetInputPath().String(),
    nCtx.NodeStatus(),
    nCtx.ExecutionContext().GetEventVersion(),
    nCtx.ExecutionContext().GetParentInfo(),
    nCtx.Node(),
    c.clusterID,
    nCtx.NodeStateReader().GetDynamicNodeState().Phase,
    c.eventConfig,
    targetEntity)

Life Cycle:

With new flytekit (> 1.14.0)

summary:

  1. No HEAD request is made (saves resources).
  2. The config in the task template determines whether the deck is enabled.

details:

  1. propeller keeps adding the DeckURI while the task is running if FLYTE_ENABLE_DECK=true in the task template.
  2. propeller puts the DeckURI into the node info and sends it to flyte admin as a NodeExecutionEvent.
  3. flyte admin adds the DeckURI to the Closure.
  4. flyte console gets the DeckURI by sending a request to admin.
    nativeURL = node.GetClosure().GetDeckUri()
    }
    } else {
    return nil, errors.NewFlyteAdminErrorf(codes.InvalidArgument, "unsupported source [%v]", reflect.TypeOf(req.GetSource()))
    }
    if len(nativeURL) == 0 {
    return nil, errors.NewFlyteAdminErrorf(codes.Internal, "no deckUrl found for request [%+v]", req)
    }
    ref := storage.DataReference(nativeURL)
    meta, err := s.dataStore.Head(ctx, ref)
    if err != nil {
    return nil, errors.NewFlyteAdminErrorf(codes.Internal, "failed to head object before signing url. Error: %v", err)
    }
  5. if flyte console can't get the DeckURI from the node Closure, it will not show the Flyte Deck button.

With old flytekit (<= 1.14.0)

summary:

  1. Backward compatibility is preserved (the deck is shown when the task succeeds).

details:

  1. In the terminal state, propeller uses a HEAD request to check whether the Deck URI exists.
    If it exists, the URI is added to the node info.
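
The two life cycles above can be sketched roughly as follows. This is a minimal Python illustration, not the propeller implementation: `FakeStore`, `build_event_deck_uri`, the phase names, and the example URI are all hypothetical stand-ins. The only behavior taken from the description is that the new path never issues a HEAD request, while the old path issues one in a terminal state.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Assumed phase names, chosen for illustration only.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED"}


@dataclass
class FakeStore:
    """In-memory stand-in for blob storage that counts HEAD calls."""
    objects: Set[str] = field(default_factory=set)
    head_calls: int = 0

    def head(self, uri: str) -> bool:
        self.head_calls += 1
        return uri in self.objects


def build_event_deck_uri(deck_enabled: bool, phase: str, deck_uri: str,
                         store: FakeStore) -> Optional[str]:
    """Return the deck URI to attach to the node event, or None."""
    if deck_enabled:
        # New flytekit: trust the task template, no HEAD request needed.
        return deck_uri
    if phase in TERMINAL_PHASES and store.head(deck_uri):
        # Old flytekit: confirm the deck exists once the task has finished.
        return deck_uri
    return None


store = FakeStore(objects={"s3://bucket/deck.html"})
print(build_event_deck_uri(True, "RUNNING", "s3://bucket/deck.html", store))
print(build_event_deck_uri(False, "RUNNING", "s3://bucket/deck.html", store))
print(build_event_deck_uri(False, "SUCCEEDED", "s3://bucket/deck.html", store))
print("HEAD calls:", store.head_calls)
```

In this sketch only the last call touches the store, which is the saving the summary refers to.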

How was this patch tested?

  1. unit test and remote execution.

python code:

from flytekit import ImageSpec, task, workflow
from flytekit.deck import Deck

flytekit_hash = "6b55930d0a77efc3594ebaac056f2c75024e61b5"
flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"

# Define custom image for the task
custom_image = ImageSpec(packages=[flytekit],
                            apt_packages=["git"],
                            registry="localhost:30000",
                            env={"FLYTE_SDK_LOGGING_LEVEL": 10},
                         )

@task(enable_deck=False, container_image=custom_image)
def t_no_deck():
    # Deck.publish()
    print("No Deck")

@task(enable_deck=True, container_image=custom_image)
def t_deck():
    """
    1st deck only show timeline deck
    2nd will show
    """
    import time

    for i in range(3):
        Deck.publish()
        time.sleep(1)

@task(enable_deck=True, container_image=custom_image)
def t_fail_deck():
    import time

    for i in range(3):
        Deck.publish()
        time.sleep(3)
    time.sleep(10)
    raise ValueError("Failed Deck")

@workflow
def wf():
    t_no_deck()
    t_deck()
    t_fail_deck()

if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner
    import os

    runner = CliRunner()
    path = os.path.realpath(__file__)

    result = runner.invoke(pyflyte.main,
                           ["run", path, "t_no_deck"])
    print("Local Execution: ", result.output)

    result = runner.invoke(pyflyte.main,
                           ["run", "--remote", path,"wf"])
    print("Remote Execution: ", result.output)

Setup process

single binary.

flyte: this branch
flytekit: flyteorg/flytekit#2779
flyteconsole: flyteorg/flyteconsole#890

Screenshots

flytekit branch:
flyteorg/flytekit#2779

NEW FLYTEKIT, NO DECK, RUNNING With Deck, SUCCEED, and FAILED

OSS-STREAMING-DECK-small.mov

OLD FLYTEKIT, NO DECK, RUNNING With Deck, SUCCEED, and FAILED

OSS-STREAMING-DECK-OLD-FLYTEKIT-small.mov

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

follow up questions

  1. Should we support the Abort phase for the streaming deck? Specifically, should we support EPhaseAbort in this file?

https://github.com/flyteorg/flyte/blob/b3330ba4430538f91ae9fc7d868a29a2e96db8bd/flytepropeller/pkg/controller/nodes/handler/transition_info.go

  2. How can we support the auto-refresh UX?

Summary by Bito

Implementation of enhanced Flyte Deck streaming functionality for improved workflow execution visibility. Introduces real-time deck URI handling in Flytepropeller and Flyteadmin, supporting deck information display across various execution states. Maintains backward compatibility with Flytekit <=1.14.0 while optimizing for newer versions through FLYTE_ENABLE_DECK configuration.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 2

Future-Outlier and others added 2 commits November 27, 2024 23:36
Signed-off-by: Future-Outlier
Co-authored-by: Yi Cheng
Co-authored-by: pingsutw

codecov bot commented Nov 27, 2024

Codecov Report

Attention: Patch coverage is 39.78495% with 56 lines in your changes missing coverage. Please review.

Project coverage is 37.04%. Comparing base (b8fb68d) to head (74f595f).
Report is 1 commits behind head on master.

Files with missing lines | Patch % | Lines
...lytepropeller/pkg/controller/nodes/task/handler.go | 41.37% | 42 Missing and 9 partials ⚠️
flyteidl/gen/pb-go/flyteidl/core/tasks.pb.go | 0.00% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6053      +/-   ##
==========================================
+ Coverage   37.02%   37.04%   +0.01%     
==========================================
  Files        1317     1318       +1     
  Lines      132534   132630      +96     
==========================================
+ Hits        49067    49127      +60     
- Misses      79221    79246      +25     
- Partials     4246     4257      +11     
Flag | Coverage | Δ
unittests-datacatalog | 51.58% <ø> | (ø)
unittests-flyteadmin | 54.34% <100.00%> | (+0.10%) ⬆️
unittests-flytecopilot | 30.99% <ø> | (ø)
unittests-flytectl | 62.29% <ø> | (-0.05%) ⬇️
unittests-flyteidl | 7.23% <0.00%> | (ø)
unittests-flyteplugins | 53.85% <ø> | (ø)
unittests-flytepropeller | 42.61% <41.37%> | (-0.03%) ⬇️
unittests-flytestdlib | 55.13% <ø> | (ø)


switch pluginTrns.pInfo.Phase() {
case pluginCore.PhaseSuccess:
// This is to prevent the console from potentially checking the deck URI that does not exist if in final phase(PhaseSuccess).
err = pluginTrns.RemoveNonexistentDeckURI(ctx, tCtx)
Contributor

Does this do a head call on the deck URI for every task that succeeds? Two thoughts here:
(1) does the flyteadmin merge algorithm then remove the deckURI from the execution metadata?
(2) this incurs a 20-30ms performance degradation on every task execution

Member Author

Will take a look tomorrow, thank you!

Member Author

@Future-Outlier Future-Outlier Nov 28, 2024

Does this do a head call on the deck URI for every task that succeeds?

Yes, it does a HEAD call through RemoteFileOutputReader:

func (r RemoteFileOutputReader) DeckExists(ctx context.Context) (bool, error) {
    md, err := r.store.Head(ctx, r.outPath.GetDeckPath())
    if err != nil {
        return false, err
    }
    return md.Exists(), nil
}

Member Author

How did you measure the performance degradation?
Did you use Grafana or another performance tool?

Member Author

does the flyteadmin merge algorithm then remove the deckURI from the execution metadata?

flyteadmin will set the deckURI in the execution metadata to nil if the propeller removes it.
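
To illustrate the behavior described in this reply, the sketch below models a node execution closure whose deck URI simply mirrors the latest event. It is a hypothetical Python model, not flyteadmin's transformer code: `Closure`, `apply_event`, and the dictionary-shaped event are invented here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Closure:
    """Hypothetical, simplified stand-in for a node execution closure."""
    deck_uri: Optional[str] = None


def apply_event(closure: Closure, event: dict) -> Closure:
    # Mirror the event: a missing or empty deck_uri clears the stored value.
    closure.deck_uri = event.get("deck_uri") or None
    return closure


closure = Closure()
apply_event(closure, {"phase": "RUNNING", "deck_uri": "s3://bucket/deck.html"})
print(closure.deck_uri)
apply_event(closure, {"phase": "SUCCEEDED"})  # propeller removed the URI
print(closure.deck_uri)
```

Under this assumption, the console would stop seeing a deck URI as soon as propeller sends a terminal event without one.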

Future-Outlier (Member Author) commented Nov 27, 2024

How to test it?

  1. Start a new sandbox:
flytectl demo start --image futureoutlier/sandbox:deck-1205-1138 --force
  2. Check out the streaming deck flytekit branch:
cd flytekit
gh pr checkout 2779
  3. Run a failing task (the deck is shown after it fails):
from flytekit import ImageSpec, task, workflow
from flytekit.deck import Deck

flytekit_hash = "473ae1119af6f86c26c0790dee0affa3eb29be64"
flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"

# Define custom image for the task
custom_image = ImageSpec(packages=[flytekit],
                            apt_packages=["git"],
                            registry="localhost:30000",
                            env={"FLYTE_SDK_LOGGING_LEVEL": 10},
                         )

@task(enable_deck=True, container_image=custom_image)
def t_deck():
    """
    1st deck only show timeline deck
    2nd will show
    """
    import time

    for i in range(5):
        Deck.publish()
        # # raise Exception("This is an exception")
        time.sleep(3)

@workflow
def wf():
    t_deck()

if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner
    import os

    runner = CliRunner()
    path = os.path.realpath(__file__)

    # result = runner.invoke(pyflyte.main,
    #                        ["run", path, "wf"])
    # print("Local Execution: ", result.output)

    result = runner.invoke(pyflyte.main,
                           ["run", "--remote", path,"wf"])
    # "--remote"
    print("Remote Execution: ", result.output)

EngHabu (Contributor) commented Nov 28, 2024

Mind adding screenshots for the rendered deck and refresh to the PR description?

Future-Outlier (Member Author)

Mind adding screenshots for the rendered deck and refresh to the PR description?

Yes no problem

Future-Outlier (Member Author)

Mind adding screenshots for the rendered deck and refresh to the PR description?

It's provided!
#6053 (comment)

flyte-bot (Collaborator) commented Jan 2, 2025

Code Review Agent Run #f3ef5e

Actionable Suggestions - 1
  • flytepropeller/pkg/controller/nodes/task/handler.go - 1
    • Consider impact of removing deckPath parameter · Line 120-120
Additional Suggestions - 1
  • flytepropeller/pkg/controller/nodes/task/handler.go - 1
    • Consider removing unnecessary string cast · Line 43-43
Review Details
  • Files reviewed - 3 · Commit Range: 54aa165..65b6efe
    • flyteadmin/pkg/repositories/transformers/node_execution.go
    • flyteadmin/pkg/repositories/transformers/node_execution_test.go
    • flytepropeller/pkg/controller/nodes/task/handler.go
  • Files skipped - 0
  • Tools
    • Golangci-lint (Linter) - ✖︎ Failed
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • OWASP (Security Vulnerability) - ✔︎ Successful
    • GOVULNCHECK (Security Vulnerability) - ✖︎ Failed
    • SNYK (Security Vulnerability) - ✔︎ Successful

flyte-bot (Collaborator)

Changelist by Bito

This pull request implements the following key changes.

Key Change: Feature Improvement - Enhanced Flyte Deck Streaming Support

Files Impacted:

  • node_execution.go - Added DeckUri support in node execution transformers
  • node_execution_test.go - Added tests for DeckUri functionality
  • handler.go - Implemented real-time deck streaming with backward compatibility

return nil
}

func (p *pluginRequestedTransition) CacheHit(outputPath storage.DataReference, entry catalog.Entry) {
Collaborator

Consider impact of removing deckPath parameter

The method signature for CacheHit has been modified to remove the deckPath parameter, but it seems this parameter may still be needed based on the usage in the code. Consider if this change could cause issues with deck path handling.

Code suggestion
Check the AI-generated fix before applying
Suggested change
- func (p *pluginRequestedTransition) CacheHit(outputPath storage.DataReference, entry catalog.Entry) {
+ func (p *pluginRequestedTransition) CacheHit(outputPath storage.DataReference, deckPath *storage.DataReference, entry catalog.Entry) {

Code Review Run #f3ef5e


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

// - We relied on a HEAD request to check if the deck file exists, then added the URI to the event.
//
// After (new behavior):
// - If `FLYTE_ENABLE_DECK = true` is set in the task template config (requires Flytekit > 1.14.0),
Contributor

This comment is no longer correct, right?

Member Author

yes super nice catch

@@ -380,6 +430,27 @@ func (t Handler) fetchPluginTaskMetrics(pluginID, taskType string) (*taskMetrics
return t.taskMetricsMap[metricNameKey], nil
}

func GetDeckStatus(ctx context.Context, tCtx *taskExecutionContext) (DeckStatus, error) {
// FLYTE_ENABLE_DECK is used when flytekit > 1.14.0
Contributor

update this comment


metadata := template.GetMetadata()
if metadata == nil {
return DeckUnknown, nil
Contributor

Is this correct for older versions of flytekit? Didn't tasks in the past also have this field? That means this function will always return Disabled for older versions of flytekit, so the condition on line 567 won't get triggered because the status will be disabled instead of unknown.

Member Author

Oh yes, you are right. Thinking about a solution.

flyte-bot (Collaborator) commented Jan 9, 2025

Code Review Agent Run Status

  • Limitations and other issues: ❌ Failure - Bito Code Review Agent didn't review this pull request automatically because it exceeded the size limit. No action is needed if you didn't intend for the agent to review it. Otherwise, you can initiate the review by typing /review in a comment below.

Comment on lines 432 to 445
// GetDeckStatus determines whether a task generates a deck based on its execution context.
//
// This function ensures backward compatibility with older Flytekit versions using the following logic:
// 1. For Flytekit > 1.14.3, the task template's metadata includes the `generates_deck` flag:
// - If `generates_deck` is set to true, it indicates that the task generates a deck, and DeckEnabled is returned.
// 2. If `generates_deck` is set to false or is not set (likely from older Flytekit versions):
// - DeckUnknown is returned as a placeholder status.
// - In terminal states, a HEAD request can be made to check if the deck file exists.
//
// In future implementations, a `DeckDisabled` status could be introduced for better performance optimization:
// - This would eliminate the need for a HEAD request in the final phase.
// - However, the tradeoff is that a new field would need to be added to FlyteIDL to support this behavior.

template, err := tCtx.tr.Read(ctx)
Member Author

Better comments!
cc @wild-endeavor
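
The logic described in the GetDeckStatus comment above can be sketched as a small decision function. This is a hedged Python illustration, not the Go code in the PR: the `DeckStatus` enum values, the dict-shaped `metadata`, and the `needs_head_check` helper are assumptions made for this example. Only the mapping itself (true gives enabled; false, unset, or missing metadata gives unknown) comes from the comment and the surrounding diff.

```python
from enum import Enum
from typing import Optional


class DeckStatus(Enum):
    UNKNOWN = "unknown"
    ENABLED = "enabled"


def get_deck_status(metadata: Optional[dict]) -> DeckStatus:
    """Map task-template metadata to a deck status.

    `metadata` is a plain dict standing in for the task template metadata;
    None models a template that carries no metadata at all.
    """
    if metadata is None:
        return DeckStatus.UNKNOWN
    if metadata.get("generates_deck") is True:
        return DeckStatus.ENABLED
    # False or unset: possibly an older flytekit, so the caller may fall
    # back to a HEAD request in a terminal state.
    return DeckStatus.UNKNOWN


def needs_head_check(status: DeckStatus, is_terminal: bool) -> bool:
    """Hypothetical helper: HEAD is only worth doing when status is unknown."""
    return status is DeckStatus.UNKNOWN and is_terminal


print(get_deck_status({"generates_deck": True}))
print(get_deck_status({"generates_deck": False}))
print(needs_head_check(get_deck_status(None), is_terminal=True))
```

A separate disabled status, as the comment notes, would let the last case skip the HEAD request, at the cost of a new FlyteIDL field.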

Future-Outlier (Member Author)

Streaming Decks

#!/usr/bin/env bash

set -ex

ARCH="$(uname -m)"
case ${ARCH} in
x86_64|amd64)
  IMAGE_ARCH=amd64
  ;;
aarch64|arm64)
  IMAGE_ARCH=arm64
  ;;
*)
  >&2 echo "ERROR: Unsupported architecture: ${ARCH}"
  exit 1
  ;;
esac

FLYTECONSOLE_IMAGE="localhost:30000/flyteconsole:1216-2134"
IMAGE_DIGEST="$(docker manifest inspect --verbose "${FLYTECONSOLE_IMAGE}" | \
    jq --arg IMAGE_ARCH "${IMAGE_ARCH}" --raw-output \
      '.[] | select(.Descriptor.platform.architecture == $IMAGE_ARCH) | .Descriptor.digest')"

# Short circuit if we already have the correct distribution
[ -f cmd/single/dist/.digest ] && grep -Fxq "${IMAGE_DIGEST}" cmd/single/dist/.digest && exit 0

# Create container from desired image
CONTAINER_ID="$(docker create "${FLYTECONSOLE_IMAGE}")"
trap 'docker rm -f "${CONTAINER_ID}"' EXIT

# Copy distribution
rm -rf cmd/single/dist
docker cp "${CONTAINER_ID}:/app" cmd/single/dist
printf '%q' "${IMAGE_DIGEST}" > cmd/single/dist/.digest

Future-Outlier and others added 4 commits January 13, 2025 23:59
Signed-off-by: Future-Outlier
Co-authored-by: Eduardo Apolinario