
Question about large Spark applications being throttled #661

Open
KiritoCurry opened this issue Mar 7, 2020 · 0 comments

Comments

@KiritoCurry

Hi, I'm fairly new to Dr. Elephant. While deploying and testing it on my machines, I found that large Spark application event logs (larger than 100 MB by default) are ignored and never show up in the UI because of the throttling behavior. I may have missed something, but based on the code, does this mean Dr. Elephant skips every Spark application whose event log is larger than eventLogSizeLimitMb (100 MB by default)? Is this how dataCollection.throttle() is expected to behave?

If my understanding is wrong, can someone explain how throttling works for large Spark applications? If my understanding is right, is there a significant bottleneck in Dr. Elephant for large Spark applications? A Spark application's event log can easily grow beyond several GB, so how is Dr. Elephant going to handle that?

Thanks in advance for any help and suggestions!
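The behavior described in the question amounts to a size gate placed in front of the event-log parser. Below is a minimal Python sketch of that reading, under stated assumptions: `exceeds_limit`, `collect`, and `CollectedData` are hypothetical names, not Dr. Elephant's API; treating 1 MB as 1024 × 1024 bytes is an assumption; and the sketch models only the question's interpretation of eventLogSizeLimitMb, not what dataCollection.throttle() actually does.

```python
from dataclasses import dataclass

# Hypothetical default mirroring the 100 MB limit mentioned in the question.
DEFAULT_EVENT_LOG_SIZE_LIMIT_MB = 100.0


@dataclass
class CollectedData:
    app_id: str
    throttled: bool  # True when the log was skipped instead of parsed


def exceeds_limit(log_size_bytes: int,
                  limit_mb: float = DEFAULT_EVENT_LOG_SIZE_LIMIT_MB) -> bool:
    """Return True if the log is strictly larger than the limit (assumes 1 MB = 1024 * 1024 bytes)."""
    return log_size_bytes > limit_mb * 1024 * 1024


def collect(app_id: str, log_size_bytes: int,
            limit_mb: float = DEFAULT_EVENT_LOG_SIZE_LIMIT_MB) -> CollectedData:
    # Hypothetical gate: an oversized log is marked throttled and never parsed,
    # which is the behavior the question asks about.
    if exceeds_limit(log_size_bytes, limit_mb):
        return CollectedData(app_id, throttled=True)
    # A real fetcher would parse the event log here; this sketch only records the decision.
    return CollectedData(app_id, throttled=False)


if __name__ == "__main__":
    print(collect("app-small", 50 * 1024 * 1024))
    print(collect("app-large", 2 * 1024 ** 3))
```

With the default limit, a 2 GB log comes back throttled while a 50 MB log does not. In this sketch the only way to get a multi-GB log analyzed is to raise `limit_mb`, which trades memory and parse time for coverage. That trade-off is exactly what the question asks about.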
