Build time regression #14256

Open
waynexia opened this issue Jan 24, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@waynexia
Member

Is your feature request related to a problem or challenge?

We observed a huge build-time increase after upgrading datafusion (GreptimeTeam/greptimedb#5417). I ran a script to measure the build time day by day since v38.0, and here is the result:

Image

The build time keeps increasing and has almost doubled since v38.0. Since the codebase keeps growing, an upward trend is expected, but there are still some obvious "plateaus" and "jumps", which might be abnormal.

This is the raw data:

build-times.csv
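
For readers who want to locate the "jumps" in such data themselves, here is a minimal sketch that ranks day-over-day increases. The column names `date` and `seconds` are assumptions for illustration; the actual layout of build-times.csv is not shown in this thread and may differ.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into (date, seconds) pairs, in file order.

    Assumes hypothetical columns named `date` and `seconds`.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(row["date"], float(row["seconds"])) for row in reader]


def largest_jumps(rows, top=3):
    """Return the `top` largest increases between consecutive rows.

    Each result is (date_of_later_row, increase_in_seconds).
    """
    deltas = [
        (rows[i][0], rows[i][1] - rows[i - 1][1])
        for i in range(1, len(rows))
    ]
    return sorted(deltas, key=lambda d: d[1], reverse=True)[:top]
```

Calling `largest_jumps(read_rows(open("build-times.csv").read()))` would list the dates where the build time grew the most, which are natural starting points for a closer look.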

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@waynexia
Member Author

A few more details:

The build command I use is cargo build --release --timings --lib --quiet

The latest timing file (rename it to .html; the .txt extension is due to a GitHub attachment restriction): Cargo Timing Jan 23 2025.txt

And a screenshot:
Image

The datafusion crate (core) spends a long time compiling on a single thread, which is likely a major reason for the slower build. I can observe this bottleneck in the downstream greptimedb project as well.
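
A minimal sketch of how a single from-scratch build could be timed with the command quoted above. This is not the script used for the chart (it was not posted); the helper names `time_command` and `time_clean_build` are made up for illustration.

```python
import subprocess
import time

# The build command quoted in the comment above.
BUILD_CMD = ["cargo", "build", "--release", "--timings", "--lib", "--quiet"]


def time_command(cmd, cwd=None):
    """Run `cmd` to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def time_clean_build(repo_dir):
    """Time one build of the checkout at `repo_dir`, starting from scratch."""
    # `cargo clean` removes the target directory, so no artifacts are reused.
    subprocess.run(["cargo", "clean"], cwd=repo_dir, check=True)
    return time_command(BUILD_CMD, cwd=repo_dir)
```

`time_clean_build("path/to/datafusion")` would return the elapsed seconds for one clean build; repeating it per checked-out commit gives one data point per day.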

@jayzhan211
Contributor

After physical-optimizer, datasource could be a potential target to move out of core.

@findepi
Member

findepi commented Jan 24, 2025

datasource could be a potential target to move out of core.

related

The build time keeps increasing and is almost doubled since v38.0. Given the codebase keeps adding new code, it's expected to see a trend of increasing

Did the amount of code double since v38? I doubt that, but @waynexia can you maybe chart build time against amount of code?
Also, can you make sure there is no bias in the measurement? If you build in reverse order and run cargo clean between steps, will you get the same results? (I believe we regressed on the build time, but let's double check facts before jumping to conclusions.)
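
One way to sketch the order-bias check: measure every commit once in forward order and once in reverse, then compare the two timings per commit. The `measure` callable is hypothetical; it is assumed to check out the given commit, run cargo clean, and return the build time in seconds.

```python
def measure_both_orders(commits, measure):
    """Measure each commit in forward and then in reverse order.

    Returns {commit: (forward_seconds, reverse_seconds)}.
    """
    forward = {c: measure(c) for c in commits}
    backward = {c: measure(c) for c in reversed(commits)}
    return {c: (forward[c], backward[c]) for c in commits}


def max_relative_gap(pairs):
    """Largest relative difference between the two timings of any commit."""
    return max(abs(f - b) / min(f, b) for f, b in pairs.values())
```

If `max_relative_gap` stays small (say a few percent, a threshold chosen here only as an example), the measurement order is unlikely to explain the trend.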

and "jumps", which might be abnormal.

The biggest jump is at the beginning of August 2024.
Can you perhaps bisect those few days to determine the particular change that caused this?
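
A possible way to automate such a bisect is `git bisect run`, which treats exit code 0 as good, 1 as bad, and 125 as "skip this commit". The sketch below assumes a hypothetical time threshold that separates "before the jump" from "after the jump"; the function names are illustrative only.

```python
import subprocess
import time

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def verdict(elapsed_seconds, threshold_seconds):
    """Classify a build time as GOOD (fast enough) or BAD (too slow)."""
    return GOOD if elapsed_seconds <= threshold_seconds else BAD


def check_current_commit(threshold_seconds, build_cmd):
    """Clean-build the current checkout and return a bisect exit code."""
    subprocess.run(["cargo", "clean"], check=False)
    start = time.perf_counter()
    result = subprocess.run(build_cmd)
    if result.returncode != 0:
        # A commit that does not build cannot be judged; let git skip it.
        return SKIP
    return verdict(time.perf_counter() - start, threshold_seconds)
```

Wrapped in a small script that ends with `sys.exit(check_current_commit(...))`, this could be driven by `git bisect start <bad> <good>` followed by `git bisect run python3 <script>`. The threshold would need to sit clearly between the build times on either side of the jump, otherwise noise can mislead the bisect.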

Is the X axis position based on Commit Date of the current tip commit?

@alamb
Contributor

alamb commented Jan 24, 2025

After physical-optimizer, datasource could be a potential target to move out of core.

Yes, 100%. Splitting out datasource is the next thing I would love to see (and I think it will help build times massively).

Thank you @waynexia for working on this project. It will be appreciated by all 🙏

As @jayzhan211 says, once we complete this epic (@buraksenn and @berkaysynnada are pretty close), I can help organize an effort to break out the data sources.
