FromMakeToScons
Table of Contents
[TOC]
Evolution is a slow process. Getting rid of old bad habits is never easy. This article is a critique of the Make build tool. We list its shortcomings and we suggest a few more modern alternatives.
The Make build tool has been with us for about 20 years. It is somewhat sad to see complex new projects today considering Make as their only choice. I often get the question "What is wrong with Make?". So often that the obvious answer becomes "The first wrong thing is people's ignorance of Make's limitations with respect to their requirements". As a matter of fact, I will start my critique as a list of wrong beliefs around this classic tool.
Several Make clones are open-source C software and all build platforms have a Make preinstalled. Make-based builds are known to be portable, and they really are compared to some IDE solutions. The problem is that the Make tool, as it was originally implemented, has to rely on shell commands and on features of the file system. These two, the shell and the file system, are notorious sources of platform incompatibilities. Consequently, the fact that every system has a Make is probably true but not relevant. The statement "Every system has a shell" is also true. That doesn't mean that shell scripts are portable, and nobody claims that. Another problem is that the original Make lacked some fundamental features and later clones added the missing features in different ways. The obvious example is the if-then-else construct. It is not really possible to build real-life projects without if-then-else. One has to use workarounds, based either on the "include" directive or on conditional macro expansion. In the best case you get the desired functionality, but you lose the readability of the build description. In short, Make-based builds are not portable: they rely too much on features of the surrounding system, and Make clones are incompatible with each other.
Many of us have the experience of expanding a given code base by adding a few source files or a static library. In carefully designed builds, it is not a lot of work to make the new code part of the final product: you just add a short new Make-file in a new directory and rely on a recursive call of Make. The problem comes when the project gets really big. In such big projects, keeping the build fast enough is a challenging task. Changing the binary deployment of the product also becomes far more complex than it needs to be. With typical recursive Make, the architecture of the software product is nailed down in the directory structure. Adding or removing one single piece is still OK, but other restructuring is much more disruptive. A sad side effect is that, over time, developers come to think about the software not as a set of abstract components and their interfaces but as a set of directories with source code, and that builds up even more resistance to change.

The speed issues in large builds are well documented in Peter Miller's seminal article "Recursive Make Considered Harmful" (http://aegis.sourceforge.net/auug97.pdf). When using recursive Make, there is no simple way to guarantee that every component is visited only once during the build. Indeed, the dependency tree is split in memory between several processes that know very little about each other, which makes it hard to optimize for speed. Another problem is that Make uses just one namespace, and recursion is never easy with only one global namespace. If you want to configure sub-targets differently (meaning to pass parameters to Make processes started by Make), you'll get an error-prone build setup.

But recursive Make is not the only way to use Make. You may wonder, if it is harmful, why is it so widely used? One argument people mention is that recursive Make is easier to set up. This is not a very solid argument. For example, John Graham-Cumming demonstrates in a recent DDJ article (http://www.electric-cloud.com/resources/DDJCrossPlatformBuilds.pdf) a simple non-recursive Make system. There is also a more subtle argument in favor of recursive Make: it allows developers to use their knowledge about the project to easily start only a smaller part of the build. In non-recursive systems, you can start a smaller part, but the system will usually first "collect" all the build descriptions, which is slower, has undesired side effects, or both. It is also possible to design systems that sit somewhere in between. Some homegrown Make-based systems use clever checks outside Make to avoid starting new Make processes when there is nothing to be built, and then act recursively in the remaining cases. Anyway, for good or bad reasons, the fact remains that Make-based builds scale up painfully.
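To make the point about the split dependency tree concrete, here is a minimal Python sketch of the opposite approach: the whole dependency graph held in one process, so that every target is provably visited at most once. This is my own illustration, not code from any tool discussed here, and the target names are made up.

```python
# Minimal sketch: a whole-project dependency graph held in one process.
# Target names and the "build action" are hypothetical placeholders.

graph = {
    "app":    ["liba.a", "libb.a", "main.o"],
    "liba.a": ["a1.o", "a2.o"],
    "libb.a": ["b1.o"],
    "main.o": [], "a1.o": [], "a2.o": [], "b1.o": [],
}

def build(target, visited=None):
    """Build `target`, visiting every node of the graph at most once."""
    if visited is None:
        visited = set()
    if target in visited:          # already handled in this run
        return
    visited.add(target)
    for dep in graph[target]:      # bring prerequisites up to date first
        build(dep, visited)
    print("building", target)      # stand-in for the real build action

build("app")
```

With recursive Make, each sub-Make holds only its own slice of such a graph, so this "visit once" guarantee has to be reinvented outside the tool, if at all.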
The Make tool paradigm is elegant and powerful. I have seen nice expert systems implemented only with Make-files. People like things that are elegant and powerful. Unfortunately, the Make implementation is poor and dirty by today's standards. Make-file syntax is obscure and, as a consequence, more difficult than you may think. Make wins hands down the contest for the messiest namespace of any programming tool I have ever seen. File targets and command targets (also known as phony targets) share the same namespace, making it close to impossible to build files with certain names. The shell environment variables and the Make macros also share the same namespace. Make macros cover both the variable and the function concepts (as in functional programming languages). Macros can be defined inside the Make-file or on the command line. All this makes for complex scope rules and for some bad surprises for both the novice user and the expert user. Make-based builds rely heavily on shell environment variables, and this has many other shortcomings as well. The best known is that the build is difficult to reproduce for another user on another build machine. A more subtle issue is that it is difficult to document the build. Most modern systems allow you to ask what the parameters of the build are and what their meaningful values are; Make doesn't help you provide this feature, despite the fact that Make-based builds definitely need it. Even without several sources for variables, one single namespace is still too messy. One namespace means that target-platform-wide variables, build-machine-wide variables, project-wide variables and individual user customizations sit right next to each other. Without solid naming conventions, you simply don't know what you can change without endangering the entire build. Another area of misleading simplicity is that many Make macros have one-letter names. Many Make clones improved on that by introducing more understandable aliases for those names. Of course they did that in incompatible ways, so you cannot benefit from the improvement if you want to keep the build description portable across Make clones.
Make and many of its clones are implemented in C/C++. They are speed-efficient implementations. Indeed, correctly designed Make-based builds show little overhead while building. But, before you enjoy the raw speed of Make too much, remember that fast is not safe and safe is not fast. Because of this fundamental tension, you should be suspicious of anyone claiming amazing speed. Indeed, Make achieves some of its speed by performing only superficial checks. This kind of speed gain strikes back because you'll feel the need for complete builds from scratch more often. There is another way in which Make appears to be fast: Make expects you to know how to build the smaller part of the project that you are currently working on. This strikes back when more time is spent in product integration across several developers.
Besides the characteristics described above, often considered strong points of Make, there are also a few characteristics that are acknowledged shortcomings of Make. Here is a short list of them.
Make-based builds are not safe and nobody claims that they are. The main reason for this is that Make relies on time stamps, not on content signatures, to detect file changes. On local area networks, it is common for several computers, each with its own clock, to share the file system containing the sources for the build. When those clocks get out of sync (yes, that happens) you may get inaccurate builds, especially when running parallel builds. Moreover, Make takes the approach of not storing any stamp between builds. This is a very bad choice because it forces Make to use risky heuristics for change detection. This is how Make fails to detect that a file has been replaced by one older than before (which happens quite often in virtual file systems). Not using content signatures is especially painful when an automatically generated header file is included in all the sources (like config.h in the GNU build system). Complex solutions involving dummy stamp files have been developed to prevent Make from rebuilding the entire project when that central header file is regenerated with the same content as before. An even more insidious safety issue is that Make does not detect changes in the environment variables that are used or in the binary executables of the tool chain that is used. This is usually compensated for with proprietary logic in homegrown Make systems, which makes the build more complex and more fragile.
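To show why stored content signatures are safer than timestamp heuristics, here is a minimal Python sketch of signature-based change detection. It is only an illustration of the idea, not code from Make or any of its competitors; the cache file name and the demo file names are made up.

```python
import hashlib
import json
import os

STAMP_FILE = ".signatures.json"   # hypothetical signature cache kept between builds

def digest(path):
    """Content signature of a file, independent of its timestamp."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def changed_sources(sources):
    """Return the sources whose content differs from the previous build."""
    old = {}
    if os.path.exists(STAMP_FILE):
        with open(STAMP_FILE) as f:
            old = json.load(f)
    new = {src: digest(src) for src in sources}
    with open(STAMP_FILE, "w") as f:
        json.dump(new, f)
    return [src for src in sources if old.get(src) != new[src]]

if __name__ == "__main__":
    # Tiny demonstration with throwaway files.
    for name, text in [("config.h", "#define VERSION 1\n"),
                       ("main.c", "int main(){return 0;}\n")]:
        with open(name, "w") as f:
            f.write(text)
    print(changed_sources(["config.h", "main.c"]))  # both reported on the first run
    print(changed_sources(["config.h", "main.c"]))  # nothing reported on the second run
```

A config.h regenerated with identical content reports no change, and a file replaced by an older copy with different content is still caught, regardless of what the clocks say.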
One cannot criticize the Make mechanism for implicit dependency detection for a good reason: Make doesn't have such a mechanism. To deal with header file inclusion in C sources, several separate tools exist, as well as special options in some compilers. High-level tools wrapping Make use them to provide a more or less portable "automatic dependencies" feature. Despite the efforts and the good will of those higher-level tools, Make blocks good solutions for automatic dependencies for the following two reasons:
- In the first place, Make doesn't really support dynamic many-to-one relationships. It does support many-to-one, but not if the "many" part changes from one build to the next. For example, Make will not notice a newly added dependency if that dependency is new in the list but old on disk (meaning older than the target according to its timestamp); the sketch after this list illustrates what rediscovering dependencies on every run means. By the way, Make also lacks support for dynamic one-to-many, which makes it inappropriate for Java builds (because with Java one single file can produce a variable number of outputs).
- Secondly, Make doesn't really support using automatic dependencies and updating those automatic dependencies in one run. This forces you to issue multiple Make calls for a complete build (did you ever wonder why the ubiquitous sequence "make depend; make; make install" has never been folded into just one Make call?).
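What "dynamic" means here is that the dependency list is rediscovered on every run instead of being frozen in the Make-file. Here is a minimal Python sketch of such rediscovery, my own illustration rather than code from any real dependency scanner; it only handles local #include directives and ignores the many corner cases a real scanner must cover.

```python
import re

# Matches lines of the form:  #include "header.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def implicit_deps(c_source_text):
    """Return the local headers a C file depends on, rediscovered on every run."""
    return INCLUDE_RE.findall(c_source_text)

# A dependency added in the source is picked up immediately, even if the header
# itself is older than the target -- exactly the case plain Make misses.
print(implicit_deps('#include "config.h"\n#include "util.h"\nint main(){return 0;}\n'))
```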
The lack of portable if-then-else was already mentioned. There are many other idiosyncrasies in the Make-file syntax. Some of them have been fixed in later Make clones (in incompatible ways, as you may expect). Here is a list:
- First, space characters are significant in a painful way. For example, spaces at the end of a line (between a macro value and the comment sign following the macro definition) are kept. This has always generated, and still generates, very hard to track bugs.
- Secondly, in the original Make there was no way to print out something at parsing time. Moreover, Make has a bad habit of silently ignoring what it doesn't understand inside the Make-file. The combination of the two (no print and silent ignore) is a killer. In the past I have stared for a while at Make-file code not working as expected, only to discover that the incriminated code was silently and entirely ignored because of a typo somewhere above.
- In the third place, there are no "and"/"or" operators in logical expressions. This forces you to deeply nest the non-portable if-then-else or its portable but unreadable equivalent constructions. All this is annoying, but it is far less serious compared to Make's decision to rely on shell syntax when describing the executable part of a rule. Decent build tools gained, over the years, libraries of rules to support various tool chains. Not so with the Make tool. Make has instead a hard-coded set of rules rarely used in real-life-sized projects. Actually, Make's hard-coded set of rules hinders predictability more than it helps anything: one of those rules may, and sometimes does, trigger by accident when the Make-file author doesn't intend it. Lack of good support for a portable library of rules is, in my opinion, the biggest shortcoming of Make and its direct clones.
The inference process of Make may be elegant, but its trace and debug features date back to the Stone Age. Most clones improved on that. Nevertheless, when the Make tool decides to rebuild something contrary to the user's expectation, most users will find the time needed to understand that behavior not worth the effort (unless it yields an immediate fatal error when running the result of the build, of course). In my experience, Make-file authors tend to forget the following:
- Which rule is preferred when more than one rule can build the same target.
- How and when to inhibit the default built-in rules.
- How the scope rules for macros work in general and in their own build setup.

While not completely impossible, Make-based builds are tedious to trace and debug. As a consequence, Make-file authors will continue to spend too much time fixing their mistakes or, under high time pressure, they will just ignore any behavior that they don't understand.
You may argue that all this discussion about the syntax of Make-files is pointless today, and that would be fair. Indeed, many Make-files in existence today are automatically generated from higher-level build descriptions and the manual maintenance burden is gone. So, with these higher-level tools, can I forget about those Make problems and just use it? Yes and no. Yes for the syntax-related problems, yes for the portability problems (to some extent), and definitely no for all the rest: reliability issues, debugging problematic builds, etc. See also the discussion on build systems below to understand how the shortcomings of Make affect the tools built on top of it.
The Make build tool was and still is a nice tool if not stretched beyond its limits. It fits best in projects of several dozen source files working in homogeneous environments (always the same tool chain, always the same target platform, etc.). But it cannot really keep up with today's requirements of large projects.
After I tell people what is wrong with the Make tool, the next question is always the same: "If it is so awkward, then how come it is so widely used?". The answer does not pertain to technology; it pertains to economy. In short, the reason is that workarounds are cheaper than complete solutions. In order to displace something, the new thing has to be all that the old thing was and then some more (some more crucial features, not just some more sugar). And then, on top of that, it has to be cheaper. Despite the difficulty of being so much more, in my humble opinion, the time of retreat has come for the Make tool. Let's look at the alternatives.
Almost like text editors, build tools are a large population. You can find many of them cataloged at the Free Software Foundation (http://directory.fsf.org/devel/build), at DMOZ (http://dmoz.org/Computers/Software/Build_Management/Make_Tools) or at Google (http://directory.google.com/Top/Computers/Software/Build_Management/Make_Tools). You can also find many build tools by digging in well-known open-source repositories like SourceForge (http://sourceforge.net/softwaremap/trove_list.php?form_cat=46) or Tigris (http://construction.tigris.org/), but it is much more difficult to spot what you are looking for on those sites. I will discuss here a small selection. This selection is highly subjective, and the coverage is biased towards open-source tools. I have grouped the tools in several categories to help you get a better overview.
The tools in this category are not really alternatives. They are what people currently understand as Make (virtually nobody is using today the original BSD UNIX Make). I mention three tools in this category: GNU Make, Opus Make and NMake. They all share some compatibility with the format of the original Make-files.
Opus Make is not widely used today but, in my opinion, it is the best Make clone ever made. It has a few outstanding features that set it apart from the crowd. It has the richest set of directives allowed in the Make-file (including its own "cd", "echo", "copy", "delete" and other frequent shell commands). Even more important, these directives can take effect at parsing time or at rule execution time. The latter makes for a more portable execution part in rules, greatly reducing the dependence on the underlying shell. Opus Make always had logical operators in conditional expressions, regular expression substitutions, the possibility to trace the parsing, to stop on lines not understood, and many more. With its "inference restart" command, it has support for one-pass builds, a feature rarely seen around. Next to its comprehensive set of native features, Opus Make also has a fair set of emulations of other Make tools. Unfortunately, its development seems to have stopped back in 1998 (http://www.opussoftware.com/). A version of it is still distributed today with the IBM Rational ClearCase SCM tool.
GNU Make (http://www.gnu.org/software/make/make.html) is very likely the most widely used build tool. Being available as open source and being an integral part of the GNU tool set made it a popular choice on many Unix-like platforms. This tool is actively maintained (the last version, 3.80, came out in October 2002). GNU Make is a vast improvement over the original Make, especially through its large set of new macros. Over the last 10 years it gained some important features (that should have been available from the beginning): printing at parsing time, stopping the parsing and exiting, forcing the build, defining new function-like macros, case-insensitive file name comparisons on some platforms, etc. Unfortunately GNU Make also kept most of the problems of the original Make, and much of the criticism of Make that you'll hear around is actually directed at GNU Make.
NMake (http://www.research.att.com/~gsf/nmake/nmake.html) originates at and is maintained by the AT&T laboratories. The large set of open-source tools coming from AT&T (Korn shell, graphviz, etc.) is built with it. NMake is open source itself. There is a commercial version from Lucent as well (http://www.bell-labs.com/project/nmake). The AT&T-style source distribution packages rely on NMake and on their own configuration tool, iffe. This build system from AT&T is very much Unix-centric, exactly like the GNU build system. Note that Microsoft also has a Make clone called "nmake" bundled with their development environments (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcug98/html/asug_overview.3a.nmake_reference.asp). Take care, they are incompatible; don't expect a Make-file written for one to work with the other.
The tools in this category move away from the old syntax of Make-files but didn't break with the spirit of the Make tool. That is, they are still developed in C or C++ and they still use some kind of text file located close to the sources to describe what has to be built. I will mention two tools in this category: Jam and Cook.
Jam is a build tool maintained and promoted by the people providing the Perforce SCM tool (http://www.perforce.com/jam/jam.html). Perforce is commercial software but Jam is fully open source. Despite some small issues with its syntax, Jam files are expressive enough for the Jam tool to come standard with a decent database of rules. Jam was so influential that it generated a set of clones (FTJam, Boost.Jam, Boost.Jam.v2). The most interesting is Boost.Jam (http://www.boost.org/tools/build/v1/build_system.htm) from the maintainers of the Boost C++ library. This one introduces quite a few syntax extensions compared to the original tool. You can tell by those extensions that the authors were C++ programmers (for example, the scoping of variables looks like C++ namespaces). Boost.Jam is not the only or the first attempt to raise the level of the build description but, more than other Make evolutions, Boost.Jam focuses on real build issues. For example, it provides canned variant builds, it provides a dynamic loading library abstraction for Unix and MS Windows target platforms, it cares about testing the result of the build, etc. If you ever used Make for a real-life-sized project, you had to provide such things through your own effort and you will immediately understand what savings this approach of Boost.Jam brings. Unfortunately, Jam doesn't take the jump to content signatures and to command-line signatures. The problems related to the weak timestamp heuristic are still present.
Cook (http://miller.emu.id.au/pmiller/software/cook/) is a build tool designed by Peter Miller, with quite a long history. Like Jam, Cook is open source and supports parallel builds. Like Jam, it avoids recursion. Like GNU Make and unlike Jam, it relies on a separate tool, c_incl, to get implicit dependencies in C code. The build description syntax used by Cook takes some from Make-files and quite a bit from LISP. As a matter of fact, quite a few build tools chose a LISP-like syntax. There is a good reason for that: manipulating lists (e.g. lists of dependencies) is a frequent task when describing a build. Cook can use content signatures ("fingerprints" in its parlance).
I cannot express a clear preference in this category. Boost.Jam and Cook come close. But I can say that I prefer either of them over GNU Make. The reason is that they allow you to focus on build design and forget some of the low-level build implementation details. Unfortunately, they never achieved the widespread use that they deserve. Maybe because they are not part of a more comprehensive build system like GNU Make is.
You may argue that my preference does not make a lot of sense. Indeed, both Cook and Jam (the original) rely on the GNU build system to get themselves built, which in turn relies on GNU Make. So it is not a matter of preference; you'll have to have GNU Make. This gives me the opportunity to introduce a thorny issue: bootstrapping. The question is how you build when you don't have the build tool, in particular how you build the build tool itself. One solution is to cross-compile and then distribute binaries for the new platform. Another solution is to go back to shell scripts for the build description of the build tool itself (like Boost.Jam does). Or you can just rely on "good old GNU Make" like many self-proclaimed modern build tools eventually do.
Some build tools are able to automatically generate shell scripts from their native build descriptions. Of course, the generated scripts don't have the full functionality of the build tool: they are usually able to do a build from scratch and nothing more. And they usually require manual setup of the shell environment. Nevertheless, those generated shell build scripts help a lot with the bootstrapping issue.
The build tool is always just a piece in the larger software development system. Let us define more precisely the terms used below. The build tool reads a build description and then directly starts other tools to actually produce a result of the build: a report, a compiled file, etc. A build system is a set of tools, including a build tool (by the way, a good build system will support several competing build tools). A build system may produce or adapt the build descriptions before they are used by the build tool. A build system may manage in some way the result of the build (for example, it may post the summary of an automatic build on a web site). Next to the build tool, the other important tool in a build system is the configuration tool. We call the configuration tool the one that adapts or automatically generates the build descriptions used by the build tool.
To understand the paradigm shift introduced by build systems, it is important to first know the new requirement addressed by build systems: software distribution as a package of sources. The typical usage scenario for a build tool is C source code development. There the genuine input changes frequently and locally. Avoidance is a crucial feature in this use case. How building works elsewhere is less important. Also, the addition of new components to a build has to be easy. By contrast, for software distribution, the crucial need is the possibility to customize the software. It has to be adapted to new build platforms, to new run-time platforms, to site-wide conventions, etc. The builds are far less frequent; avoidance is not really a requirement, for example. The difference in requirements had an influence on the evolution of the build tools (some being used mostly from within build systems and some being used mostly standalone).
The key advance with build systems is the fact that they put some executable code in the source code distribution of the software to be built. We name the executable part in the source distribution the configuration tool. This changes the nature of the source code distribution; now it is more like an installer package. "Installer" is a term stolen from binary distributions (For software distributed in binary form, it is a long-time established practice that the package has to "execute" on the system that receives the software distribution). Because now you are supposed to execute some program delivered with the sources before you start the build, the source distribution gets a chance to adapt itself to the current build machine. This is what makes build systems so helpful with source code portability. The configuration tool may change build descriptions or source code or both. Sometimes the configuration tool evolved in a complex interactive tool (see the several tools available to configure the Linux kernel compilation).
The configuration tool inspects the build platform in small steps, each an individual check for one feature. Hereafter, we will call these steps probes. How the results of probes are represented differs from one tool to another, but most of them have a caching mechanism in place so that you don't need to run a probe each time you need its result. Probes can have various granularities, and can be independent or depend on the results of other probes. Roughly speaking, the available set of probes is the way a configuration tool represents its knowledge about the platforms it needs to adapt to, and the result of the entire set of probes defines, de facto, a model of a given platform. That model has to span the whole plethora of operating systems (OS), with more or less support from those OSes' maintainers. Given the diversity of platforms to cover (yes, the world is a mess and we cannot change that by tomorrow), configuration tools in general take on a tremendous job. This unavoidably generates frustration in different places and at different levels. Try to be positive and not judge a configuration tool only by its failures in some cases.
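As an illustration of what a probe with result caching might look like, here is a minimal Python sketch. It is not taken from autoconf or any other configuration tool; the cache file name and the probed tools are arbitrary examples, and a real probe would typically try to compile a test program rather than merely look for an executable.

```python
import json
import os
import shutil

CACHE = "config.cache.json"   # hypothetical probe-result cache kept between runs

def probe_tool(name):
    """One probe: is the tool `name` available on the build machine?"""
    cache = {}
    if os.path.exists(CACHE):
        with open(CACHE) as f:
            cache = json.load(f)
    if name not in cache:                       # run the probe only once
        cache[name] = shutil.which(name) is not None
        with open(CACHE, "w") as f:
            json.dump(cache, f)
    return cache[name]

# The accumulated probe results are, in effect, the tool's model of the platform.
for tool in ("cc", "ranlib", "lex"):
    print(tool, "found" if probe_tool(tool) else "missing")
```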
Before we look at some more Make alternatives, let us first compare a few build systems.
The GNU build system (GBS) is the system of choice in the open-source community. It is based on GNU Make as the build tool. It has a few remarkable features that greatly help with software portability (http://sources.redhat.com/autobook). The configuration tool of GBS is a shell script named "configure". That script inspects the build platform and then produces build descriptions for GNU Make as well as C source code (actually one central header file, to be included by the C source files).
What makes GBS so successful? Certainly not the fact that it was the first attempt or the only attempt. Long before it, the IMake tool (http://www.oreilly.com/catalog/imake2/index.html) introduced higher-level build descriptions. IMake comes from the C code base of the X Window System. IMake's higher-level make files made the build description both more portable and easier to write. Yet IMake was never as successful as GBS. This is probably due to the way the knowledge about the platform was implemented for the IMake tool.
The configuration tool of GBS, that is, the "configure" shell script, is automatically generated by a tool named "autoconf" (http://www.gnu.org/software/autoconf/manual/autoconf.pdf). As you see, autoconf is not a build tool, it is a script compiler. In my opinion, autoconf is the most valuable part of GBS and it is largely responsible for GBS's success. The script produced by autoconf is made very portable (it doesn't have heavy requirements on the underlying system). As a result, unlike with IMake, with GBS you have a fair chance to successfully build on a platform that was never seen by the author of the software you build.
Equally important, the authors of GBS were the first to seriously consider the bootstrapping issue. Indeed, it is possible to use GBS to build parts of GBS. After a few iterations, you'll get a complete new version of GBS. It is not an easy process but at least it is possible. You may wonder "What is the big deal? I can already use Make to build Make!". The big difference is that, in order to build Make with Make, a version of Make has to be installed on your build machine first. By contrast, you can build autoconf with GBS on a build machine where autoconf was never installed before.
If GBS is so smart, then why isn't everybody using it? One reason is that GBS is very UNIX-centric and C/C++ dedicated. And even if you build C programs on UNIX, GBS only works effectively if your source code complies with the GNU coding guidelines. This is a showstopper if you have a large body of code with its own coding guidelines (for example, not using the central header file config.h but some other scheme, maybe automatically generated as well).
Another reason why GBS is not used everywhere is the price it pays for its portability. For example, the probes of autoconf are encoded as M4 macros. M4 is a powerful text processor but its syntax is pretty low level and difficult. Difficult enough to be a major obstacle to extending the reach of GBS. While being a godsend for less capable platforms, GBS has a much harder time winning the favor of developers on the mainstream platforms. People simply don't want to give up their convenience (when writing probes) for the sake of some obscure platform nobody has heard of. Yes, I know this is not a technical argument, but the world is such that non-technical arguments are sometimes more important (http://osx.freshmeat.net/articles/view/889).
Another thorny issue is the weak support for embedded software development and the associated issues. In the embedded world, everybody is cross-compiling; the build platform and the run-time platform are separate and very different in nature. In that case, automatic inspection of the build platform before the build will not get you very far. In the embedded world, some kind of database holding the characteristics of the different platforms turns out to work better. The database is painful to maintain but it is the only solution that works. Autoconf later added some features to support cross-compilation.
Last but not least, one issue with GBS is that it uses GNU Make and nothing other than GNU Make as the build tool. If you didn't get my point by now, that means that you will frequently have inconsistent builds, that you will have a hard time debugging an inconsistent or broken build, that you will not be able to build a program called "clean" or "install", etc. All these issues with GNU Make undermine the important achievements of autoconf.
The need to configure the sources before compilation is quite old and solutions have grown over time in almost all large C/C++ code bases, many of them ad hoc. Some code bases have been migrated to GBS; some kept their own because of some isolated advantages or just for lack of resources for a migration. Many build tools evolved to provide some form of source configuration, either as an add-on or as a part of the same tool. For example, Boost.Jam has its "feature normalization", SCons has its Configure, etc. I would like to mention here two standalone solutions used with Make: iffe and metaconfig.
If you are familiar with the AT&T labs open-source software, you have met iffe (http://www.research.att.com/~gsf/man/man1/iffe.html). This is the configuration tool of their source distribution packages. Like configure in GBS, iffe is a shell script and is distributed with the C source code. Unlike configure, iffe is not generated. Another important difference is that iffe doesn't deal with build descriptions. It generates only source files (specifically, it generates header files that your C files are supposed to include). It does that by processing input files named feature test files. The feature test files are written in a specific language that is interpreted by iffe to generate header files. Because in one configuration run several headers are generated, according to your decisions on how to group probes into feature test files, iffe's authors claim that their system is more flexible than GBS, and they are probably right. In the end, the fundamental mechanism to adapt the source code is the same: conditional compilation with the help of the C preprocessor.
If you are familiar with the Perl (http://www.perl.org) software source distribution, you have met metaconfig and the dist package (http://packages.debian.org/testing/devel/dist). Metaconfig is a shell script compiler, older than autoconf, and the configuration tool it generates is called Configure. Unlike the one generated by autoconf, Configure is mainly an interactive tool. The probes used by metaconfig are called units. These units are shell code snippets. Metaconfig was probably the very first tool to do a decent job of scanning the source code base to automatically detect points of customization. By comparison, autoconf still has to rely on a helper tool, autoscan, to do this with more or less success. Automatic scanning of source code is great, but it requires complying consistently with some coding guidelines decided by the configuration tool. That may not be the case for existing code.
There are several other alternatives to GBS available today, both commercial products and open-source programs. I would like to mention two of them: CMake (http://www.cmake.org) and Qmake (http://doc.trolltech.com/3.0/qmake.html). Do not be misled by their names: they are no replacement for the Make tool, they are Make-file generators. Also, do not confuse the Qmake tool from TrollTech with the qmake tool from the Sun GridEngine (http://gridengine.sunsource.net/unbranded-source/browse/checkout/gridengine/source/3rdparty/qmake/qmake.html), which is just a parallel GNU Make.
As with GBS, the main issue addressed by these tools is the portability of the build description. CMake and Qmake have a lot in common in their design and in their spirit. They are both open source and they are both implemented in C++ (Qmake had a predecessor, tmake, implemented in Perl). They both support more than one build tool. Most notably, they support the classic Make as well as the Microsoft IDE. They have to provide a dynamic linking library abstraction to cover both UNIX .so files and Microsoft .DLL files. They both introduce their own high-level format for the build description. They both take basically the same approach to hierarchical builds as the original Make. Not betraying their high-level nature, both tools also provide support for generating some source code (wrappers or mock objects). My preference goes to the CMake tool: it supports a wider set of build tools and it offers a choice between command-line and graphical user interfaces.
For both these tools, you have to distribute the binary executable of the tool with the source code package, and this raises a bootstrapping issue. Also, in my opinion, having the knowledge embedded in C++ classes, as in CMake, is a shortcoming. I certainly agree that C++ classes are much better than M4 macros, yet C++ will limit the number of contributions from the community. People who can write good C++ and want to support new toolchains are required to contribute their changes back in a consistent and systematic way. Otherwise, over time, we will get several incompatible versions of the tool's executable floating around. Storing the knowledge about the toolchains in some kind of configuration files, so that additions don't require C++ recompilation, may outweigh the disadvantage of a more complex distribution of the tool (a set of files to distribute instead of a monolithic executable). Finally, like GBS, these two build systems share the disadvantages of the build tools they use. As already mentioned, that is mainly poor checking, which results in high chances of getting inconsistent builds during the active development of the software.
My point from the previous chapter is that, despite what some people hoped, a good build system will not spare you the need for a good build tool. It is certainly helpful, but if the build tool is weak, the build system will have a hard time hiding it. Fortunately, the community of developers was not only busy making smarter build systems but also better build tools. In the following I describe another category of build tools. I would call this category "script based" build tools, but it is not easy to find a name that adequately describes them in two words. It is important to understand the new paradigm shift introduced by this category of tools. The approach taken by their authors says: "Let's not invent a new syntax. We are not in the business of writing text parsers. Others already did that and they did it better than we will do in a fraction of our time." They also say: "Let's not write and maintain the portability layer for our build tool. Let us reuse some virtual machine already available." So these people focus on build design and on build design related issues. This seems to me a sound choice when one aims to provide a better build tool. I will mention only two tools in this category: Ant and SCons. But you have to know that there are many others.
The Ant build tool (http://ant.apache.org/manual/index.html) is well on the way to becoming the next Make. It is an open-source project from the Apache developer community. Ant is a Java program and it uses XML for the build descriptions. It spread quite fast because it faced no competition (certainly not from the old Make). Too Java-centric at the beginning, Ant has grown today into a mature build tool that can put to shame many of its competitors. One sign that a piece of software is mature and fulfills a real need is the number of new projects that take that software as a base. And there are many software projects based on Ant. There are:
- commercial products like OpenMake (http://www.openmake.com/products.html),
- extensions to address new builds (set of new tasks in Ant parlance, like http://ant-contrib.sourceforge.net which includes support for C/C++ builds),
- extensions to allow higher-level descriptions (antworks, previously centipede, http://antworks.sourceforge.net),
- extensions to provide integration into graphical IDEs, etc.

As a matter of fact, no other build tool is gathering so much development effort today. This gives Ant an important head start in the race to be the next Make.
Like the original Make, Ant became the primary source of inspiration for a set of new build tools like NAnt (http://nant.sourceforge.net) and, more recently, MSBuild (http://channel9.msdn.com/wiki/default.aspx/MSBuild.HomePage). They don't aim at file compatibility for build descriptions, but their spirit is the same: describe the build as a set of tasks to carry out, each task coded as an XML element.
Of course, Ant is no silver bullet. Ant without extensions has a scalability problem. A lot of it comes from the choice of XML as a file format. XML is verbose. XML does not have a convenient way to include one file in another (the XML Fragments standard is not really supported by XML parsers, XML entity includes work but have limitations, etc.). You need some kind of inclusion because you want to factor out the common parts of several build descriptions. Of course, an application using XML files is free to add "include" semantics to one chosen XML element (and that is what Ant finally did with the "import" task in version 1.6).
Also, XML is not really a programming language. You may want to have the content of one XML element computed from sibling elements (computed strings occur very often in build descriptions, see Make macros). Or you may want one element to be looked up in the parent, or the parent's parent, when it is not found at some place (what would correspond to variable scoping in programming languages). I say "you may want", but for projects of real-life size you will need that. Otherwise the build description grows very big and redundant. This is not a hard limitation; the Ant tool does give you the power to express whatever you want. It is just the fact that you have to go back and forth between a purely descriptive part of the build specification (the build.xml) and a purely procedural part (the Ant task implementations) that I don't like very much. People have contributed Ant tasks to work around this, but that will not turn XML into a programming language overnight.
Another issue is that Ant has a noticeable overhead for startup and initial XML processing. This is an important psychological factor in the case of null builds (builds where everything is up to date and nothing has to be done). The Make tool was fast in such builds. Make was fast for a bad reason, but the fact remains that people have a hard time today accepting that the build is slow when there is nothing to be done. As for other build tools in this category, it would be nice to have the possibility to "compile" the build description, maybe even the entire dependency tree, into a speed-efficient format.
SCons (http://www.scons.org) is a build tool that uses Python (http://www.python.org) for the syntax of the build descriptions. The origin of SCons is Cons, a similar tool using Perl syntax for the build descriptions. That older tool is also available (http://www.dsmit.com/cons). SCons is my preferred tool in this category and also my preferred build tool overall. It comes preconfigured with a fair set of rules ("builders" in SCons parlance). Even more compelling, it can be extended in an easy and natural way. Unlike Ant, the build description and the extensions use the same syntax, in this case Python. SCons allows flexibility in the speed-versus-safety compromise (by using either content signatures or time stamps). It detects changes in the command line used to build something, and this makes it outstandingly reliable compared to its competitors. There are many other features I like in SCons: the transparent use of code repositories, the "wink-in" of cached binaries, the possibility to accurately document the build, the support for extracting snapshots of the code and, finally, the promise to grow into a better build system (autoconf-style probes, the ability to generate project files for IDEs, etc.).
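To give a feel for the level of the build description, here is a minimal SConstruct sketch (SCons build descriptions are plain Python). The file names and the "debug" option are made up, and this is only a sketch under those assumptions, not a complete real-world build.

```python
# SConstruct -- run with `scons` (or `scons debug=1`).
# File names and the debug option are hypothetical examples.

# Build parameters are ordinary Python values and can be documented for `scons -h`.
debug = int(ARGUMENTS.get("debug", 0))
Help("Type 'scons' for a release build, 'scons debug=1' for a debug build.\n")

env = Environment(CCFLAGS=["-g"] if debug else ["-O2"])

# Builders replace hand-written rules; implicit C dependencies are scanned
# automatically and changes are detected from signatures rather than guessed
# from timestamps alone.
env.StaticLibrary("util", ["util.c"])
env.Program("hello", ["hello.c"], LIBS=["util"], LIBPATH=["."])
```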
Because perfection doesn't exist in the real world, you probably expect to hear also about the shortcomings of SCons. To start with, SCons is a young piece of software, at least when compared with some other tools mentioned above. Another issue is that, despite not being recursive, SCons is slow. This seems to be a price to pay when focusing on high-level build design issues. A time will come when the authors of SCons will focus on speed optimizations. As with Perl and Java, Python allows moving the speed-critical parts to natively compiled code later on (when the software is stable enough).
From another perspective, despite the fact that Python has a clear syntax, some people perceive SCons as being too low level. It is perfectly possible for the release engineer to describe the build of a 50-developer project for several target platforms in only a few dozen lines of SCons code. The release engineer will find this a great time saver, but the average developer will find that code cryptic, like shell code in the early UNIX days or worse. The developers will soon ask for easier ways to add new components to the build.
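One common answer is for the release engineer to wrap the low-level calls in small project-specific helpers, so that developers add a component with one line. Here is a hypothetical sketch of such a helper; the function name, paths and conventions are all invented for illustration, not part of SCons itself.

```python
# Hypothetical project helper (e.g. kept in a shared site_scons module):
# the release engineer hides flags and deployment conventions behind one call.

def add_component(env, name, sources, libs=()):
    """Build a program `name` from `sources`, linking the given libraries."""
    prog = env.Program(name, list(sources), LIBS=list(libs), LIBPATH=["."])
    env.Install("#/dist/bin", prog)       # project-wide deployment convention
    return prog

# In a SConscript, a developer would then write something like:
#   Import("env")
#   add_component(env, "parser", ["parser.c", "lexer.c"], libs=["util"])
```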
An overview of build tools in one article has to be a limited one. We didn't even mention all the areas where build tools needed and got improvements.
One such area is better integration with SCM tools. This is important because it directly addresses a long-standing issue: the reliability of the build. There are quite a few SCM tools providing improved build tools, both commercial (Omake in IBM Rational ClearCase, http://www.agsrhichome.bnl.gov/Controls/doc/ClearCaseEnv/2003.06/ccase/doc/win/cc_omake/wwhelp/wwhimpl/js/html/wwhelp.htm) and open-source (Vesta, http://www.vestasys.org). But comparing the build tools integrated with SCM systems goes far beyond the scope of this discussion.
Another area we didn't touch is the automation and reporting of the build (CruiseControl (http://cruisecontrol.sourceforge.net) and Dart (http://public.kitware.com/Dart) as open-source tools, VisualBuilder (http://www.visualbuild.com) and FinalBuilder (http://www.finalbuilder.com) as MS Windows commercial software). Yet another area not discussed here is the acceleration of the build when parallel build machines are available (distcc and ccache (http://distcc.samba.org) as open-source tools, IncrediBuild (http://www.xoreax.com) and ElectricCloud (http://www.electric-cloud.com) as commercial software).
Finally, there is one last area that we don't want to discuss here: software distribution tools. Many tools today allow you to connect as a client to some software repository, download the software you want, automatically download other needed parts and install everything on your system, while locally building the parts that need building. Those systems include a build tool or act as a build tool when needed. They have to act as a configuration tool too. Also, they have to do some form of dependency tracking, much like build tools do. The best known is probably the Ports system of the BSD OSes (http://www.freebsd.org/ports) and its followers like Gentoo Portage (http://www.gentoo.org/index-about.html). Take a look at the A-A-P tool web site (http://www.a-a-p.org) for a list of such systems.
The final conclusion is up to you. The considerations in this paper reflect only my experience. But I hope that you have now some useful background to choose the build tool and the build system that best fit your own requirements.
-- AdrianNeagu, April 2005