Probes test failure: index out of range #225

Open
vqmarkman opened this issue Sep 1, 2023 · 6 comments
Labels
high high priority testing
Milestone

Comments

@vqmarkman
Contributor

Happened in PR: https://github.com/sundeck-io/OpsCenter/actions/runs/6047474866/job/16413995185
During initial loading of the page.

probes.spec.cy.js.mp4

cc @rymurr

@vqmarkman vqmarkman added the high high priority label Sep 1, 2023
@vqmarkman vqmarkman added this to the Sprint 24 milestone Sep 1, 2023
@vqmarkman vqmarkman self-assigned this Sep 1, 2023
@vqmarkman
Contributor Author

My conclusion is that this is an OpsCenter bug, and I can't reproduce it locally :(
The worst part is that the video does not show the last (and most important) part of the stack trace.

@doronrosenberg , @susannahmarcus - is there any other way to debug this type of failure? Where should I be looking for the logs?

@susannahmarcus
Contributor

susannahmarcus commented Sep 1, 2023

I can't say much since I haven't worked on this, but I can weigh in on the viewport. For future instances like this, we could make it so that it isn't cut off. Additionally, we might be able to add a check for the type of an error message and then have it spit that into the cypress log, so we can get more information.

If you'd like to work on that together, feel free to reach out and we can try some stuff out locally 😊 👍

@joshelser
Contributor

> The worst part about it is that video does not show last (most important part) of the stack trace.

Well, I think we know roughly where it is failing. It's inside the _call(..) method, so we can assume it's stuck on this snippet in probes.py:

_ = self.snowflake.call(
    "INTERNAL.REPORT_ACTION",
    "probes",
    "list",
)

self.snowflake here is a snowflake-snowpark-python Session instance, so I'm guessing we're on the line https://github.com/snowflakedb/snowpark-python/blob/v1.7.0/src/snowflake/snowpark/session.py#L2447 inside the Snowpark library.

I'd guess that we didn't get any rows back (for whatever reason), as in #35, and that caused this. Maybe we need to figure out why this Session seems to return bogus results sometimes?

@vqmarkman
Contributor Author

@joshelser, the "interesting" part about this thing is ... of course this particular case (probes test) is failing in automation and is not failing locally for me. I'm not sure what the next steps here are ...

@vqmarkman vqmarkman removed their assignment Sep 1, 2023
@joshelser
Contributor

I wonder if this is related to a2f5b67

https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/api/snowflake.snowpark.Session mentions that Session is not threadsafe. My thinking is that adding the extra chart on the index page has caused general de-stabilization of the rest of the tests (since they start by going to / and then finding the next thing to click on).

I don't know for sure how the native app framework works, but given what we're seeing, I'd believe this falls into the "Session is not thread-safe" bucket. As a quick band-aid, could we wait for the warehouse heatmap to fully load before we click past it? Maybe that would help stabilize things.
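
If the thread-safety theory holds, one possible mitigation on the app side is to serialize access to the Session. The sketch below assumes a single Session object is shared by concurrently running page code, which is unverified for the native app framework. LockedSession and _FakeSession are hypothetical names, not part of OpsCenter or Snowpark.

```python
import threading
import time
from typing import Any


class LockedSession:
    """Serialize calls to a wrapped session object with a single lock."""

    def __init__(self, session: Any) -> None:
        self._session = session
        self._lock = threading.Lock()

    def call(self, name: str, *args: Any) -> Any:
        # Only one thread at a time may use the underlying session.
        with self._lock:
            return self._session.call(name, *args)


class _FakeSession:
    """Stand-in session that records how many calls overlap in time."""

    def __init__(self) -> None:
        self.active = 0
        self.max_active = 0
        self._guard = threading.Lock()

    def call(self, name: str, *args: Any) -> Any:
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # simulate a slow round trip
        with self._guard:
            self.active -= 1
        return name


fake = _FakeSession()
locked = LockedSession(fake)
threads = [
    threading.Thread(target=locked.call, args=("INTERNAL.REPORT_ACTION", "probes", "list"))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock only helps if every caller goes through the same wrapper instance, so it would have to live somewhere shared by all pages.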

@vqmarkman
Contributor Author

vqmarkman commented Sep 1, 2023

@joshelser, here is more of what we've discovered:

  • We were able to reproduce this locally by clicking on the tabs in the left sidebar.
  • It does look like all sorts of random things are failing (not only on the probes pages) when other pages are supposedly still being rendered.
  • I'm going to put in a temporary band-aid (a hardcoded sleep for each tab) just to eliminate random failures
    (until I find out how to correctly wait in Cypress for the Streamlit page to finish loading).
  • This is NOT going to fix OpsCenter; users can still get random failures while clicking through pages manually.

vqmarkman added a commit to vqmarkman/OpsCenter that referenced this issue Sep 1, 2023
    Bit heavy handed, but I don't want this issue to block any further development.
    Will have a better fix when I find the solution.
vqmarkman added a commit to vqmarkman/OpsCenter that referenced this issue Sep 1, 2023
…e to issue described in issue sundeck-io#225

    Bit heavy handed, but I don't want this issue to block any further development.
    Will have a better fix when I find the solution.
joshelser pushed a commit that referenced this issue Sep 2, 2023
… described in issue #225 (#229)

    Bit heavy handed, but I don't want this issue to block any further development.
    Will have a better fix when I find the solution.
@rymurr rymurr modified the milestones: Sprint 24, Sprint 25 Sep 12, 2023
@rymurr rymurr added the testing label Sep 12, 2023
@rymurr rymurr modified the milestones: Sprint 25, Sprint 26 Sep 25, 2023