Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for batch builders #1086

Closed
bdbaddog opened this issue Jan 2, 2018 · 0 comments
Closed

Add support for batch builders #1086

bdbaddog opened this issue Jan 2, 2018 · 0 comments

Comments

@bdbaddog
Copy link
Contributor

bdbaddog commented Jan 2, 2018

This issue was originally created at: 2005-02-28 13:08:34.
This issue was reported by: sbaranov.
sbaranov said at 2005-02-28 13:08:34

Batch builders is a major speed optimization for tools
that support batch mode, such as MSVC compiler.

sbaranov said at 2005-02-28 14:04:15

Here's a brief summary:

  1. Known issue is that some original unit tests fail due to
    Executor now using current() and get_calculator() which are
    not defined in the tests. I think this should be fixed before
    checking in.

  2. There's no new unit tests written yet for the added
    functionality. I'm looking forward to doing this.

  3. Specifying one target on the command line rebuilds all
    targets in the same batch. Seems like a very minor minor
    issue, but still should be fixed sometime in the future.

Additional changes in this patch:

  1. Fixed bug in Node.FS.Dir.current()
  2. Fixed bug in Platform.win32.escape
  3. Proposing to change one call of display() with
    progress_display() in Script.Main - see diff
  4. Requesting to add etc/.cvsignore that has one line: *.pyc -
    no diff for this one

Thanks.

stevenknight said at 2005-04-26 03:58:45

Hi Stanislov--

No, I haven't forgotten this patch. I finally had a chance
to dig into it over this past weekend while travelling.

I realize now that I think I gave you some
poorly-thought-out advice in suggesting that this should be
a builder attribute. As I was working on integrating your
patch with the recent changes, I stumbled on the fact that
the ability to batch object file builds is really better
associated with the Action, not the Builder. For example,
someone would likely use MSVC to build your C/C++ object
files but a different compiler to build Fortran object files
in the same environment, and MSVC can batch its object
builds, but the other compiler might not be able to.

I've started down the path of refactoring the internals to
let the Builder call the underlying Action to generate the
Executor, so that the Action will be able to return a
batched builder or not as appropriate. I will be CC:ing you
on the messages I send to the scons-review mailing list
covering these changes, so you're in the loop.

One other point: the mechanism for selecting the directory
instead of the individual .obj file for the /Fo argument
will end up being a more direct variable expansion, instead
of something that looks for the /Fo and changes the string
after the fact.

Thanks,

   --SK

sbaranov said at 2005-06-22 16:15:34

Steven, I integrated your patch into my (older) version of
SCons, and fixed several bugs with batch mode, including
that last issue that I mentioned in the original description: "3.
Specifying one target on the command line rebuilds all
targets in the same batch."

Steven, what else can I do to help speed up the integration of
this patch?

Stanislav

chehrlic said at 2005-10-30 11:34:17

Any news about the current state of this patch? I maybe
would test it with kdelibs...

issues@scons said at 2005-10-30 11:34:17

Converted from SourceForge tracker item 1153786

stevenknight said at 2006-05-26 04:13:20

Created an attachment (id=22)
Patch, copied from SourceForge tracker.

gregnoel said at 2008-03-16 02:45:39

Created an attachment (id=334)
Patch from issue 1381; see there for details

gregnoel said at 2008-03-16 02:47:55

*** Issue 1381 has been marked as a duplicate of this issue. ***

garyo said at 2008-03-17 20:28:15

Milestone: 2.x.

gregnoel said at 2008-04-09 23:33:21

This issue is being usurped to handle all aspects of batch builders, not only
the issues raised by the original poster, but also the batching requirements for
languages such as Java, FORTRAN, and D.

There are two primary rationales for batch builders. One is simply for better
performance: it's significantly faster for some programs to process all the
source files in a single action. The other is that some applications must
process related files together: the files contain references to each other in a
dependency cycle.

The batched builder mechanism must cater to both requirements, but note that a
facility that handles the second will also handle the first.

Here's a description of how the second requirement could be handled:

At parse time, files would be added to a batch. At build time, the files would
be scanned not only for implicit "include" dependencies but also implicit
"source" dependencies. An implicit "source" dependency is a processed result
that is referenced by the source being scanned, with the implication that the
result cannot be generated after the scanned unit is processed (viz., must be
generated before or at the same time).

Implicit source dependencies should be cached and reused in later runs in the
same way as implicit include dependencies.

After all the batch has been scanned, it undergoes additional analysis. The
directed graph formed by the batch can contain cycles which must be collapsed
into an acyclic graph in order to be processed.

Cyclic source dependencies form "strongly-connected components" [wp-scc] that
can be analyzed by any one of several algorithms; three such links are in the
Wikipedia reference. The set of nodes in a strongly-connected component is
treated as a unit: if any one of them are out-of-date, they are all considered
out-of-date. In the null case, only a single file is in a unit.

[wp-scc] http://en.wikipedia.org/wiki/Strongly_connected_component

For our purposes, Kosaraju's algorithm may be the best choice (although all of
them should be considered) as the transverse graph that is constructed can be
used for compile scheduling.

The resultant DAG is walked leaves to roots. File units that are out-of-date
and any downstream file units are collected. The files that are collected are
those that are passed to the batch command.

This description assumes that implicit source dependencies are all within the
batch. If a source dependency is outside the batch, it probably should be
assumed not to require further analysis. Probably.

dsvensson said at 2008-04-10 04:45:43

"The resultant DAG is walked leaves to roots. File units that are out-of-date
and any downstream file units are collected. The files that are collected are
those that are passed to the batch command."

What does the above mean?

Lets say I have a program:

env.Program('testprog', ['test1.vala', 'test2.vala', 'test3.vala'])

If everything is up to date, and then I edit test2.vala, all sources should be
passed to the builder. Is that what "downstream file units are collected" refer
to? In the case of Vala a scanner doesn't make sense as all the sources must
always be passed to the compiler each time it's invoked.

gregnoel said at 2008-04-10 13:20:01

The vala Builder or the vala scanner can simply mark all the sources as members
of the same cohort (a better word than "unit", why didn't I think of it before?).

Or we can add a flag to accomplish the same effect.

Or, better, create a public scanner that does it.

It's a relatively small detail, but it's worth pointing out that it will be needed.

gregnoel said at 2008-05-30 16:12:22

Bug party triage: take a shot at this in the 1.x timeframe since it's
fundamental to several other issues and would help performance, but be prepared
to push it back to 2.x if it's too hairy.

stevenknight said at 2008-12-08 17:11:07

Setting target milestone to 1.3.

I have batch builders working on a private branch that I'll be uploading to a
public SVN branch in the next day or two. There were a number of moderately
thorny corner-case issues w.r.t. Repositories, SideEffects, identifying the
changed sources in a generally useful way (I ended up adding a $CHANGED_SOURCES
variable), and creating a generalizable mechanism for separating builds into
batches that wasn't just a hard-coding of MSVC's way of doing it.

I'll be working on documentation on the branch and we plan to integrate it for
1.3 (which may end up getting renumbered 2.0, depending on how much we put in
there and whether we finally drop the Python 1.5.2 compatibility).

stevenknight said at 2008-12-10 21:42:14

This is now available in a development branch:

http://scons.tigris.org/svn/scons/branches/sgk_batch

I'll hold this issue open until I integrate it into trunk.

stevenknight said at 2009-01-09 09:25:21

Generic support for batch build actions, along with specific support for MSVC
compilation, added:

http://scons.tigris.org/ds/viewMessage.do?dsForumId=1268&dsMessageId=1014012
http://scons.tigris.org/ds/viewMessage.do?dsForumId=5062&dsMessageId=1014028

Thanks,

gregnoel said at 2009-01-09 09:44:06

I'm not sure if this point belongs in this issue or in one of the follow-on
issues (see the list of blocked issues). A batch builder for Java, FORTRAN, D,
or Vala will need (a) something very similar to the current source scanner, but
which provides a list of additional sources that must appear in the same batch,
and (b) implement Kosaraju's algorithm[1] or similar (see the references at the
bottom of the linked page) to collapse the graph produced by (a) into a DAG that
can be processed by the Taskmaster.

[1] http://en.wikipedia.org/wiki/Kosaraju%27s_algorithm

For FORTRAN, D, and probably Vala, (a) is probably sufficient, since all the
files will probably fit one command invocation. But if there are too many files
for one invocation or for Java, (b) is needed as well.

gregnoel said at 2009-04-15 12:02:29

The batch builder suffers severely from the MxN problem. It accumulates all M
sources into a common list and all the corresponding N targets into a single
list. It then checks every source against every object to see if it is out of
date, requiring MxN comparisons. This is both expensive and unnecessary, as
foo.tgt does not generally depend on bar.src.

Batch clusters provide a framework for fixing this by breaking a long list of
sources into clusters. This need be extended only slightly to associate only
related targets with a given source within the cluster. The out-of-date logic
would examine at most M sources against their corresponding target(s); as soon
as one was detected as out-of-date, the cluster itself needs to be rebuild, so
there's no reason to examine any further sources.

For cases where there really are multiple sources for multiple targets, sources
and targets can be scanned independently of each other to get a consolidated
result and a single comparison between the two to determine if the build need be
done. (There's some short circuiting that can be used; for example, an MD5
comparison can determine that a target is out of date regardless of the state of
any targets and a missing target must be rebuilt regardless of the state of any
source.) As long as the batch clustering work has this area open, this would be
a useful addition.

Votes for this issue: 4.

gregnoel attached scons-batching2.diff at 2008-03-16 02:45:39.

Patch from issue 1381; see there for details

stevenknight attached scons-batching.diff at 2008-03-25 11:08:15.

Patch, copied from SourceForge tracker.

gregnoel said this issue blocks #1923 at 2008-04-14 19:08:27.

gregnoel said this issue blocks #1949 at 2008-04-14 20:24:56.

gregnoel said this issue blocks #1888 at 2008-04-15 00:20:47.

gregnoel said this issue blocks #1766 at 2008-05-30 14:27:15.

gregnoel said this issue blocks #2147 at 2008-07-29 06:00:40.

gregnoel said this issue is duplicated by #1381 at 2008-03-16 02:47:55.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants