[feature] display main.log as artifact for each step #10036
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I read that this issue was not included in the previous release, as mentioned in this comment. Just wanted to ask whether we have an update on this or not.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen Is re-adding this feature to v2 planned for future releases? It would be very helpful.
@AnnKatrinBecker: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I'm a bit unfamiliar with the v1 version of this, did this come from argo's archive logging feature? In v2 kfp aims to be agnostic to the backing engine, so if this is an argo specific feature, we may need to consider a separate feature/proposal on how to handle persistent logging in general for kfp. |
Yes, Argo Workflows stores the logs separately, and given only kfp metadata, there currently seems to be no way to obtain the logs' locations. You'd need to access the Argo Workflow object directly to obtain the info. Including this feature right in kfp would make sense and imo is much needed, as otherwise users aren't able to view logs after the step's pod has been deleted.
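To make "access the Argo Workflow object directly" concrete, here's a minimal Python sketch (not KFP code; the node name and S3 key below are made-up examples) of pulling archived log locations out of a Workflow's status. When `archiveLogs` is enabled, Argo records a `main-logs` artifact under each node's outputs, which is roughly the structure `kubectl get wf -o json` returns:

```python
# Trimmed example of an Argo Workflow object with archived logs.
# Node name and key are hypothetical.
workflow = {
    "status": {
        "nodes": {
            "hello-world-abc123": {
                "outputs": {
                    "artifacts": [
                        {"name": "main-logs",
                         "s3": {"key": "artifacts/hello-world-abc123/main.log"}},
                    ]
                }
            }
        }
    }
}

def archived_log_keys(wf: dict) -> dict:
    """Map each node name to the S3 key of its archived main.log, if any."""
    keys = {}
    for node_name, node in wf["status"]["nodes"].items():
        for art in node.get("outputs", {}).get("artifacts", []):
            if art["name"] == "main-logs":
                keys[node_name] = art["s3"]["key"]
    return keys
```

This is exactly the kind of engine-specific digging that kfp metadata alone can't do today, which is the point of the comment above.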
Hi @HumairAK, what do you think about this?
Thoughts:
/reopen
@HumairAK: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I agree persisted logging is very important, but we should clarify whether we need the logs as artifacts, or whether users just want to see logs after pods have been removed. For example, if we have persisted logs, the log viewer could fetch the logs from the persisted location instead of from pods (I believe this logic is currently present in the UI, though it may not be utilized in v2). Could it be confusing to see two different locations where logs are being shown?
Not disagreeing, but could you elaborate on specifics? I like the idea of decoupling the implementation to existing solutions (or at least giving the user the option to, if we opt to natively implement).
I think we should also allow users to disable/enable this not only in the SDK, but via the API & UI as well.
@sanchesoon Can you give more details on this workaround? I am currently using KFP
Currently, the log artifact URI is not saved in MLMD. The UI is just guessing, and guessing incorrectly, heh. If we store it as an output artifact in MLMD, that removes any guesswork but substantially broadens the scope of the changes and will probably require design review from Google, which is doable ofc, it just delays things.
@droctothorpe jfc...
Another argument in favor of storing the log as an output artifact: what if an admin elects to update the keyFormat? The logs for historical runs will no longer be accessible, and there won't be a way for the UI to support both the old and new patterns at once.
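The keyFormat concern can be sketched in a few lines of Python. `{{workflow.name}}` and `{{pod.name}}` are real Argo keyFormat template variables, but the two formats, run name, and pod name below are invented for illustration: once the admin switches templates, a key guessed from the new template no longer matches objects written under the old one.

```python
# Hypothetical illustration of a keyFormat change breaking historical runs.
def render_key(key_format: str, workflow_name: str, pod_name: str) -> str:
    """Render an Argo-style keyFormat template for a given workflow/pod."""
    return (key_format
            .replace("{{workflow.name}}", workflow_name)
            .replace("{{pod.name}}", pod_name))

old_fmt = "artifacts/{{workflow.name}}/{{pod.name}}"        # what was configured last year
new_fmt = "logs/{{workflow.name}}/{{pod.name}}/main.log"    # the admin's new setting

stored_key = render_key(old_fmt, "run-1", "run-1-step-42")   # where the log actually lives
guessed_key = render_key(new_fmt, "run-1", "run-1-step-42")  # where the UI now looks
# stored_key != guessed_key, so the historical log is unreachable
```

Recording the actual URI at write time (e.g. in MLMD) sidesteps the guessing entirely.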
My personal preference for maturity and stability would be to go with the MLMD and Google review, or an alternative store if possible. I'd even investigate whether creating an additional table in the mlpipeline db makes sense, just to store the log artifact URIs for a given Step ID. Considering all the discussion, I don't feel competent enough to suggest if the workaround with the env var in the
I'm starting to lean in that direction as well, but what the heck do I know, I am just a
Do y'all think it makes sense to submit a smaller, discrete PR that corrects the UI's incorrect guess for the time being, until we can implement a broader fix that adds the log URI to MLMD? We can aim to submit the PR tomorrow (at AWS Summit all day today) to give you a sense of the scope.
If this can be added to KF 1.9, then maybe it makes sense to unblock this with the idea you've proposed with
I just want to highlight what I said in #10036 (comment) again. This is very clearly an accidental regression, because KFP v1 stores the logs in an MLMD artifact (it just did not automatically forward the logs tab to the appropriate artifacts, and required users to click the step and view it). The only additional complexity (which is probably why this was overlooked) is that in KFP v2 each step now runs more than one container, each with its own logs. Assuming we store the logs for all steps as artifacts (even the non-executor steps), we should show them separately in the logs tab, probably with some kind of sub-tab for each container.
@chensun @zijianjoy @connor-mccarthy Do you have any objections to adding the driver and executor logs to the list of output artifacts for each component? I just want to make sure that removing that functionality was not a deliberate design choice before we re-implement it in v2. Thank you!
I think the executor outputs are being extracted from the IR, and currently the IR doesn't include information about the container logs. We need to decide if we want to include the container logs as executor outputs, which is quite complicated because (a) the path is dynamically inferred by
POC for the interim solution: #11010. This will make it possible for anyone to implement a solution in a way that accommodates their unique
I plan to join the community call tomorrow to hear from Google about the proposed target-state solution of storing logs as output artifacts.
#11010 was just merged 🎉. It might make sense to keep this issue open to continue discussing whether or not to, and how to, surface driver and executor logs as explicit outputs in the GUI.
Hey, I noticed the PR is already included in the The
In the short period of time when the
Do you know what might be causing the issue? |
FYI, I was testing with kubeflow/manifests 1.9.0 and only changed the container image tags for pipeline components to
@kromanow94, someone in the (deprecated) Kubeflow slack workspace recently DMed me reporting what looks like a very similar error and a corresponding solution:
That error is raised here. It is raised when the workflow corresponding to the run has been garbage collected. When that fails, it should move on to trying to retrieve the logs directly from the artifact store. That sequencing (K8s API lookup -> WF manifest lookup -> artifact store lookup) is defined here. Note that the third step only executes if |
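The lookup sequencing described above can be sketched as a simple fallback chain. This is a pseudo-implementation for discussion, not the actual frontend code (which is TypeScript in `workflow-helper.ts`); the fetcher names and messages are invented, but the ordering and the `ARGO_ARCHIVE_LOGS` gating mirror the described behavior:

```python
# Fallback chain: K8s pod API -> Workflow manifest -> artifact store,
# with the last step gated on ARGO_ARCHIVE_LOGS.
import os

os.environ["ARGO_ARCHIVE_LOGS"] = "true"  # simulate the ml-pipeline-ui config

def from_pod(pod_name):
    raise RuntimeError("pod deleted")            # pod already garbage-collected

def from_workflow_manifest(pod_name):
    raise RuntimeError("workflow garbage-collected")

def from_artifact_store(pod_name):
    return "archived main.log contents"          # stand-in for an S3/MinIO read

def get_logs(pod_name):
    fetchers = [from_pod, from_workflow_manifest]
    if os.environ.get("ARGO_ARCHIVE_LOGS") == "true":
        fetchers.append(from_artifact_store)
    errors = []
    for fetch in fetchers:
        try:
            return fetch(pod_name)
        except RuntimeError as err:
            errors.append(str(err))
    raise RuntimeError("; ".join(errors))
```

Without the env var, the chain exhausts the first two fetchers and surfaces the garbage-collection error, which matches the failure mode reported above.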
Thanks, I solved the issue with the
It seems that I forgot about the env
Now, with those two issues resolved, there is another issue I'm running into. It seems the frontend uses the workflow pod name in place of the workflow name to get the logs. Argo Workflows and ml-pipeline-ui are set with the following key format:
Now, I'm trying a scenario where I manually delete just one Workflow Pod and see if I can still get the logs. I'm running that with curl, with the following command and error:
Looking at logs from the
But, having a look at the storage in minio, the path differs in the workflow name:
I also ran the same test when both the Argo Workflow and its Pods were deleted, with the same result.
I also ran another scenario with one of the default pipelines. Using the
With the pipeline above, I confirmed the mechanism works and I was able to get to the logs through the artifacts when the Pod was no longer available on the cluster. For my other test, I used a pipeline created with the following script:
Which creates Pods with the following names (the names differ from the example in the previous comment because it's a new run):
If I understand correctly, this is because of how the Workflow Name is derived by reformatting the pod name:
https://github.com/kubeflow/pipelines/blob/2.3.0/frontend/server/workflow-helper.ts#L169
https://github.com/kubeflow/pipelines/blob/2.3.0/frontend/server/workflow-helper.ts#L214
Is this because of the difference in how KFP v1 and KFP v2 work?
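The mismatch can be illustrated with a simplified Python rendition of that name-trimming logic (the real code is the TypeScript linked above; the pod names and template segment here are hypothetical). Stripping the trailing `-<id>` segment recovers the workflow name from a v1-style pod name, but leaves the template segment behind for a v2-style one:

```python
# Simplified stand-in for the frontend's pod-name -> workflow-name trimming.
def derive_workflow_name(pod_name: str) -> str:
    """Drop the trailing '-<id>' segment from the pod name."""
    return pod_name.rsplit("-", 1)[0]

# v1-style pod name: {workflow}-{id}
v1_pod = "my-run-abc123-1234567890"
# v2-style pod name: {workflow}-{template}-{id}
v2_pod = "my-run-abc123-system-container-impl-1234567890"

derive_workflow_name(v1_pod)  # "my-run-abc123" -- matches the workflow name
derive_workflow_name(v2_pod)  # "my-run-abc123-system-container-impl" -- does not
```

So a key built from the derived name only lines up with the archived path for v1-style pod names, which would explain why the default pipeline worked and the v2 run did not.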
v2 pod names include suffixes like the first example you shared. Is the |
Also, in case it's related, please make note of this issue: #11019. |
That being said, I'm continuing to do some extra validation on my end to see if I can repro the error.
Hey @droctothorpe , were you able to have a closer look here? TBH, I'm not exactly sure if the Pipeline I created using the script from #10036 (comment) is KFP v1 or KFP v2. I deployed it using |
FYI, adding the following env var to the ml-pipeline-ui might be an easier fix for this error:

```yaml
- name: DISABLE_GKE_METADATA
  value: "true"
```

Shout out to @boarder7395 for sharing it.
@kromanow94 I think you might be having the same issue I did. The two changes required for everything to work are
What I missed was the second item, which resulted in the issues you had above.
Hey, I was able to get the
I only had to put those two env variables for the

```yaml
- name: ARGO_ARCHIVE_LOGS
  value: "true"
- name: DISABLE_GKE_METADATA
  value: "true"
```

Though still a workaround in code, it works. Are there any plans for the proper solution that registers the
I started working on turning the outputs into proper artifacts. There are a lot of complex design obstacles. There's no simple way to do it that preserves backend agnosticism. We have to decide if we want to rely on AWF to archive the logs and just record the relationship between the execution and the log files in MLMD, which is extremely backend-specific, or just have the launcher capture and publish logs directly, which is kind of reinventing something that AWF already does, and also doesn't account for driver logs. Bandwidth permitting, I may author a design doc. Maybe we can discuss it in the community call.
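For discussion's sake, the second option (the launcher capturing and publishing logs itself) could be sketched like this. This is a rough simplification with invented names, not the actual KFP launcher code, and as noted above it would not cover driver logs:

```python
# Rough sketch: a launcher-like wrapper that captures executor stdout
# into a log file, which a real launcher would then upload and record
# as an output artifact in MLMD. All names here are hypothetical.
import contextlib
import io
import os
import tempfile

def run_and_capture(user_main, log_path: str) -> str:
    """Run the user code while capturing its stdout into a log file."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        user_main()
    with open(log_path, "w") as f:
        f.write(buf.getvalue())
    # A real launcher would now upload log_path to the artifact store
    # and publish its URI as an output artifact in MLMD.
    return log_path

def user_main():
    print("hello from the executor")

path = run_and_capture(user_main, os.path.join(tempfile.mkdtemp(), "main.log"))
```

The trade-off against the AWF-archiving option is visible even at this scale: the launcher owns the log end to end (backend-agnostic), but duplicates archiving that AWF already performs.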
Amazing, thank you so much for your time and effort!
Feature Area
/area frontend
/area backend
What feature would you like to see?
A feature similar to v1 behavior. In the details page of each step, the log appears as one of the artifacts that users can view:
In v2, this main log artifact is no longer displayed. It would be great if we could add a similar section to show the logs, or use the log artifact as a source for the "logs" panel after the pod is deleted, instead of directly pulling from the (already deleted) pod.
What is the use case or pain point?
Because all the logs of completed pods are auto-deleted by Kubernetes after 24 hours, users can no longer access the logs of earlier pipeline runs.
Is there a workaround currently?
No workaround.
/cc @kromanow94
Love this idea? Give it a 👍.