[EPIC] A collection of items to improve developer / CI speed #13813
Comments
If this could be improved it would be wonderful. Arrow and DataFusion are very important projects using lexical, and I want to test all my code against them and other high-impact projects as an additional precaution on top of our own internal checks, so being able to run these tests locally would be a huge benefit on my end.
Is there a way to test changes to the CI pipeline without actually checking in files to DF?
We could in theory change the triggering rules for the jobs to run on any changes to the workflow files.
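One way to do that (a sketch; the file name is hypothetical and the path filter assumes the workflows live under `.github/workflows/`) is to add the workflow definitions themselves to the `paths` trigger, so any PR that edits CI also exercises CI:

```yaml
# Hypothetical excerpt from .github/workflows/rust.yml:
# run on PRs that touch Rust sources OR the CI definitions themselves
on:
  pull_request:
    paths:
      - "**/*.rs"
      - ".github/workflows/**"
```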
If no one beats me to this I can likely take a look at the CI pipeline during the holiday break. I still want to finish the SQLite test integration first if possible, though.
I also hope to hack on various parts of this EPIC during the break. I have some other ideas (like reducing the number of distinct binaries, for example).
@Omega359 you can enable the workflow(s) on your fork |
LOL your screenshots show 37m --> 19m (which is a 2x improvement by my accounting) so that is a somewhat modest description :)
I think the hash collisions job fails so irregularly that we should consider simply running it on main (not on all PRs), as discussed in #13845. While that does indeed defer the potential for finding issues, I think the tradeoff might be worth it in this specific case.
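Restricting a job to post-merge runs is a small trigger change; a sketch of what it could look like (the workflow, job, and feature names here are illustrative, not DataFusion's actual configuration):

```yaml
# Hypothetical workflow: run the hash-collision tests only on pushes to main,
# not on every PR
on:
  push:
    branches: [main]

jobs:
  hash-collisions:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --features force_hash_collisions
```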
Is your feature request related to a problem or challenge?
As DataFusion becomes more mature and gains more features and tests, the amount of time it takes to build and test the system has been increasing.
This has several downsides:
Barrier to new contributors is higher
The resources required to build / link DataFusion are now very large which means some people may not be able to run them. For example @Alexhuszagh reports on #13693 (comment)
We are likely wasting lots of resources running CI tests
By my count, the CI checks on the most recent commit (link) require over 200 hours of runner time.
We are lucky the ASF gets many runner credits from GitHub, but this level of usage is very wasteful in my opinion, and likely unsustainable as we contemplate adding additional testing.
Larger binary size
I have noticed that the datafusion-cli binary is now almost 100MB; it used to be 60MB.
Also, people such as @g3blv have noticed that the WASM build size has increased 50%: see #9834 (comment).
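Binary size is partly a function of the release profile; as a sketch, a size-oriented profile could look like the following (these are generic Cargo options, not DataFusion's current settings, and `opt-level = "z"` plus LTO trade compile time and some runtime speed for size):

```toml
# Sketch: size-oriented release profile (generic Cargo options,
# not DataFusion's actual configuration)
[profile.release]
opt-level = "z"    # optimize for size rather than speed
lto = true         # link-time optimization across the whole program
codegen-units = 1  # fewer codegen units: smaller code, slower builds
strip = true       # strip debug symbols from the final binary
```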
Decreased developer productivity due to longer cycle times
As DataFusion gets bigger, I have noticed that recompiling DataFusion and running `cargo test ...` takes longer and longer for me.
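A large share of that cycle time is typically linking; one common mitigation (a sketch, assuming a Linux x86_64 host with clang and lld installed; this is not something the repository ships) is a `.cargo/config.toml` that swaps in a faster linker, combined with scoping commands to a single crate via `-p`:

```toml
# Sketch: .cargo/config.toml using lld for faster links
# (assumes Linux x86_64 with clang and lld installed)
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```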
Describe the solution you'd like
I would like to make building and testing DataFusion "easier" and leaner. This ticket tracks work to improve things in this area.
Describe alternatives you've considered
datafusion core crate) #13814

Additional context
No response