diff --git a/r-admin/Configuration-on-a-Unix-alike.html b/r-admin/Configuration-on-a-Unix-alike.html
index a224ccb..7eabce6 100644
--- a/r-admin/Configuration-on-a-Unix-alike.html
+++ b/r-admin/Configuration-on-a-Unix-alike.html
@@ -528,7 +528,7 @@
There are several files that are part of the R sources but can be re-generated from their own sources by configuring with option --enable-maintainer-mode and then running make in the build directory. This requires other tools to be installed, as discussed in the rest of this section.
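The regeneration step described above can be sketched as follows (a minimal sketch; the source-tree path is illustrative):

```shell
# Configure a separate build tree in maintainer mode, then regenerate
# the derived files (configure, src/include/config.h, parser C sources, ...).
mkdir build && cd build
/path/to/R-sources/configure --enable-maintainer-mode
make
```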
File configure is created from configure.ac and the files under m4 by autoconf and aclocal (part of the automake package). There is a formal version requirement on autoconf of 2.71 or later, but it is unlikely that anything other than the most recent versions2 have been thoroughly tested.
2 At the time of revision of this para in late 2021, autoconf-2.71 and automake-1.16.5. Subsequently autoconf-2.72 has been tested.
File src/include/config.h is created by autoheader (part of autoconf).
Grammar files *.y are converted to C sources by an implementation of yacc, usually bison -y: these are found in src/main and src/library/tools/src. It is known that earlier versions of bison generate code which reads (and in some cases writes) outside array bounds: bison 2.6.1 was found to be satisfactory.
The ultimate sources for package compiler are in its noweb directory. To re-create the sources from src/library/compiler/noweb/compiler.nw, the command notangle is required. Some Linux distributions include this command in package noweb. It can also be installed from the sources at https://www.cs.tufts.edu/~nr/noweb/3. Even in maintainer mode, the package sources are only re-created if src/library/compiler/noweb/compiler.nw has been updated.
3 The links there have proved difficult to access, in which case grab the copy made available at https://developer.r-project.org/noweb-2.11b.tgz.
Some enhanced BLASes are compiler-system-specific (Accelerate on macOS, sunperf on Solaris20, libessl on IBM). The correct incantation for these is often found via --with-blas with no value on the appropriate platforms.
20 Using the Oracle Developer Studio cc and f95 compilers.
Note that under Unix (but not under Windows) if R is compiled against a non-default BLAS and --enable-BLAS-shlib is not used (it is the default on all platforms except AIX), then all BLAS-using packages must also be. So if R is re-built to use an enhanced BLAS, then packages such as quantreg will need to be re-installed.
Debian/Ubuntu systems provide a system-specific way to switch the BLAS in use: build R with --with-blas to select the OS version of the reference BLAS, and then use update-alternatives to switch between the available BLAS libraries. See https://wiki.debian.org/DebianScience/LinearAlgebraLibraries.
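On such a system the switch can be made without rebuilding R, along these lines (a sketch; the alternative's name varies by architecture and release, the one shown is typical of x86_64 Debian/Ubuntu):

```shell
# List the BLAS implementations registered with the alternatives system
update-alternatives --list libblas.so.3-x86_64-linux-gnu
# Interactively select one (needs root); R picks it up at its next start
sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
```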
Fedora 33 and later offer ‘FlexiBLAS’, a similar mechanism for switching the BLAS in use (https://www.mpi-magdeburg.mpg.de/projects/flexiblas). However, rather than overriding libblas, this requires configuring R with option --with-blas=flexiblas. ‘Backend’ wrappers are available for the reference BLAS, ATLAS and serial, threaded and OpenMP builds of OpenBLAS and BLIS, and perhaps others21. This can be controlled from a running R session by package flexiblas.
21 For example, Intel MKL, not packaged by Fedora.
BLAS implementations which use parallel computations can be non-deterministic: this is known for ATLAS.
On x86_64 Fedora, where a path needs to be specified, use one of
--with-blas="-L/usr/lib64/atlas -lsatlas"
--with-blas="-L/usr/lib64/atlas -ltatlas"
for the serial and threaded builds respectively.
Distributed ATLAS libraries cannot be tuned to your machine and so are a compromise: for example, Fedora tunes22 x86_64 RPMs for CPUs with SSE3 extensions, and separate RPMs may be available for specific CPU families.
22 The only way to see exactly which CPUs the distributed libraries have been tuned for is to read the atlas.spec file.
Note that building R on Linux against distributed shared libraries may need -devel or -dev packages installed.
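For example (a sketch; the package names shown are typical but distribution-dependent):

```shell
sudo apt install libopenblas-dev     # Debian/Ubuntu
sudo dnf install openblas-devel      # Fedora
```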
Linking against multiple static libraries requires one of
--with-blas="-lf77blas -latlas"
--with-blas="-lptf77blas -lpthread -latlas"
--with-blas="-L/path/to/ATLAS/libs -lf77blas -latlas"
--with-blas="-L/path/to/ATLAS/libs -lptf77blas -lpthread -latlas"
Consult its installation guide23 for how to build ATLAS as a shared library or as a static library with position-independent code (on platforms where that matters).
According to the ATLAS FAQ24 the maximum number of threads used by multi-threaded ATLAS is set at compile time. Also, the author advises against using multi-threaded ATLAS on hyperthreaded CPUs without restricting affinities at compile-time to one virtual core per physical CPU. (For the Fedora libraries the compile-time flag specifies 4 threads.)
--with-blas="openblas"
See Shared BLAS for an alternative (and in many ways preferable) way to use them.
Some platforms provide multiple builds of OpenBLAS: for example Fedora has RPMs25
25 (and more, e.g. for 64-bit ints and static versions).
openblas
openblas-threads
openblas-openmp
For Intel processors (and perhaps others) and some distributions of Linux, there is Intel’s Math Kernel Library26. You are encouraged to read the documentation which is installed with the library before attempting to link to MKL. This includes a ‘link line advisor’ which will suggest appropriate incantations: its use is recommended. Or see https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.vpt6qp (which at the time of writing selected the Intel library for linking with GCC).
26 Nowadays known as ‘Intel oneAPI Math Kernel Library’ or even ‘oneMKL’.
There are also versions of MKL for macOS27 and Windows, but when these have been tried they did not work with the default compilers used for R on those platforms.
27 The issue for macOS has been the use of double-complex routines.
The following examples have been used with MKL versions 10.3 to 2023.2.0, for GCC compilers on x86_64 CPUs. (See also Intel compilers.)
To use a sequential version of MKL we used
MKL_LIB_PATH=/path/to/intel_mkl/mkl/lib/intel64
@@ -560,7 +560,7 @@
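A typical sequential-MKL configure incantation of the kind the link line advisor produces looks like the following (a hedged sketch for GCC on x86_64; the library names are assumptions that should be checked against the advisor for your MKL version):

```shell
MKL_LIB_PATH=/path/to/intel_mkl/mkl/lib/intel64
./configure --with-blas="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_sequential -lmkl_core"
```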
A.3.6 LAPACK
If, when configuring R, a system LAPACK library of version 3.10.0 or later is found (and it does not contain BLAS routines), it will be used instead of compiling the LAPACK code in the package sources. This can be prevented by configuring R with --without-lapack. Using a static liblapack.a is not supported.
It is assumed that -llapack is the reference LAPACK library, but on Debian/Ubuntu it can be switched, including after R is installed. On such a platform it is better to use --without-lapack or --with-blas --with-lapack (see below) explicitly. The known examples28 of a non-reference LAPACK library found at installation all contain BLAS routines, so they are not used by a default configure run.
28 ATLAS, OpenBLAS and Accelerate.
Provision is made for specifying an external LAPACK library with option --with-lapack, principally to cope with BLAS libraries which contain a copy of LAPACK (such as Accelerate on macOS and some builds of ATLAS, FlexiBLAS, MKL and OpenBLAS on ix86/x86_64 Linux). At least LAPACK version 3.2 is required. This can only be done if --with-blas has been used.
However, the likely performance gains are thought to be small (and may be negative). The default is not to search for a suitable LAPACK library, and this is definitely not recommended. You can specify a specific LAPACK library or a search for a generic library by the configuration option --with-lapack without a value. The default for --with-lapack is to check the BLAS library (for function DPSTRF) and then look for an external library -llapack. Sites searching for the fastest possible linear algebra may want to build a LAPACK library using the ATLAS-optimized subset of LAPACK. Similarly, OpenBLAS can be built to contain an optimized subset of LAPACK or a full LAPACK (the latter seeming to be the default).
A value for --with-lapack can be set via the environment variable LAPACK_LIBS, but this will only be used if --with-lapack is specified and the BLAS library does not contain LAPACK.
@@ -569,6 +569,21 @@
with an ‘enhanced’ BLAS such as ATLAS, FlexiBLAS, MKL or OpenBLAS which contains a full LAPACK (to avoid possible conflicts), or
on Debian/Ubuntu systems to select the system liblapack which can be switched by the ‘alternatives’ mechanism.
If building LAPACK from its Netlib sources, be aware that make with its supplied Makefile will make a static library, and R requires a shared/dynamic one. To get one, use cmake as documented briefly in README.md. Something like the following builds only the double and double complex subroutines with 32-bit array indices:
    mkdir build
    cd build
    cmake -DCMAKE_INSTALL_PREFIX=/where/you/want/to/install \
      -DCMAKE_BUILD_TYPE:STRING=Release \
      -DBUILD_DEPRECATED=ON -DBUILD_SHARED_LIBS=ON \
      -DBUILD_INDEX64_EXT_API:BOOL=OFF \
      -DBUILD_SINGLE:BOOL=OFF -DBUILD_COMPLEX:BOOL=OFF \
      -DLAPACKE=OFF -DCBLAS=OFF \
      -S ..
    make -j10
This builds the reference BLAS and the reference LAPACK linked to it.
Note that cmake files do not provide an uninstall target, but build/install_manifest.txt is a list of the files installed, so you can remove them via shell commands or from R.
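One way to script that removal from the shell, assuming the build-tree layout above (a sketch; xargs feeds every manifest path to rm, so it assumes no whitespace in the installed paths):

```shell
# Delete every file recorded by 'make install' in the cmake build tree.
# install_manifest.txt holds one absolute path per line.
xargs rm -f < build/install_manifest.txt
```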
If using --with-lapack to get a generic LAPACK (or allowing the default to select one), consider also using --with-blas (with a path if an enhanced BLAS is installed).
A.3.7 Caveats
@@ -577,20 +592,20 @@
BLAS and LAPACK libraries built with recent versions of gfortran require calls from C/C++ to handle ‘hidden’ character lengths: R itself does so, but many packages used not to, and some have segfaulted. This was largely circumvented by using the Fortran flag -fno-optimize-sibling-calls (formerly set by configure if it detected gfortran 7 or later): however, use of the R headers which include those character-length arguments is no longer optional in packages.
LAPACK 3.9.0 (and probably earlier) had a bug in which the DCOMBSSQ subroutine may cause NA to be interpreted as zero. This is fixed in the R 3.6.3 and later sources, but if you use an external LAPACK, you may need to fix it there. (The bug was corrected in 3.9.1 and the routine removed in 3.10.1.)
The code (in dlapack.f
) should read
*     ..
*     .. Executable Statements ..
*
      IF( V1( 1 ).GE.V2( 1 ) ) THEN
         IF( V1( 1 ).NE.ZERO ) THEN
            V1( 2 ) = V1( 2 ) + ( V2( 1 ) / V1( 1 ) )**2 * V2( 2 )
         ELSE
            V1( 2 ) = V1( 2 ) + V2( 2 )
         END IF
      ELSE
         V1( 2 ) = V2( 2 ) + ( V1( 1 ) / V2( 1 ) )**2 * V1( 2 )
         V1( 1 ) = V2( 1 )
      END IF
      RETURN
(The inner ELSE clause was missing in LAPACK 3.9.0.)
If you do use an external LAPACK, be aware of potential problems with other bugs in the LAPACK sources (or in the posted corrections to those sources), seen several times in Linux distributions over the years. We have even seen distributions with missing LAPACK routines from their liblapack
.
We rely on limited support in LAPACK for matrices with 2^{31} or more elements: it is possible that an external LAPACK will not have that support.
diff --git a/r-admin/Installing-R-under-Unix-alikes.html b/r-admin/Installing-R-under-Unix-alikes.html
index ca39582..be05799 100644
--- a/r-admin/Installing-R-under-Unix-alikes.html
+++ b/r-admin/Installing-R-under-Unix-alikes.html
@@ -601,7 +601,7 @@
g++: c++11 gnu++11 c++14 gnu++14 c++17 gnu++17 c++2a gnu++2a (from 8)
     c++20 gnu++20 (from 10) c++23 gnu++23 c++2b gnu++2b (from 11)
Intel: c++11 gnu++11 c++14 gnu++14 c++17 gnu++17
     c++20 gnu++20 (from 2021.1) c++2b gnu++2b (from 2022.2)
(Those for LLVM clang++ are documented at https://clang.llvm.org/cxx_status.html, and follow g++: -std=c++20 is supported from Clang 10, -std=c++2b from Clang 13 and -std=c++23 from Clang 17. Apple Clang supports -std=c++2b from 13.1.6: version 15.0.0 does not support -std=c++23.)
‘Standards’ for g++ starting with gnu enable ‘GNU extensions’: what those are is hard to track down.
For the use of C++11 and later in R packages see the ‘Writing R Extensions’ manual. Prior to R 3.6.0 the default C++ standard was that of the compiler used: currently it is C++17 (if available); this can be overridden by setting CXXSTD when R is configured.
https://en.cppreference.com/w/cpp/compiler_support indicates which versions of common compilers support (parts of) which C++ standards. GCC 5 was the minimum version with sufficient C++14 support. GCC introduced C++17 support gradually, but version 7 should suffice.
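For example, the default standard can be overridden like this (a sketch; the value of CXXSTD must be a -std flag the chosen compiler actually accepts):

```shell
./configure CXXSTD="-std=gnu++14"
```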
diff --git a/r-admin/Platform-notes.html b/r-admin/Platform-notes.html
index 2166892..1f74028 100644
--- a/r-admin/Platform-notes.html
+++ b/r-admin/Platform-notes.html
@@ -437,7 +437,7 @@
-We tried the compilers in oneAPI 2024.0.0 and 2023.x.y using (the paths do differ by compiler version)
+We tried the compilers in oneAPI 2024.0.2 and 2023.x.y using (the paths do differ by compiler version)
IP=/path/to/compilers/bin/
CC=$IP/icx
CXX=$IP/icpx
diff --git a/r-admin/search.json b/r-admin/search.json
index 649f5cb..c4796cc 100644
--- a/r-admin/search.json
+++ b/r-admin/search.json
@@ -67,7 +67,7 @@
"href": "Installing-R-under-Unix-alikes.html#other-options",
"title": "2 Installing R under Unix-alikes",
"section": "2.7 Other Options",
- "text": "2.7 Other Options\nThere are many other installation options, most of which are listed by configure --help. Almost all of those not listed elsewhere in this manual are either standard autoconf options not relevant to R or intended for specialist uses by the R developers.\nOne that may be useful when working on R itself is the option --disable-byte-compiled-packages, which ensures that the base and recommended packages are not byte-compiled. (Alternatively the (make or environment) variable R_NO_BASE_COMPILE can be set to a non-empty value for the duration of the build.)\nOption --with-internal-tzcode makes use of R’s own code and copy of the IANA database for managing timezones. This will be preferred where there are issues with the system implementation, usually involving times after 2037 or before 1916. An alternative time-zone directory8 can be used, pointed to by environment variable TZDIR: this should contain files such as Europe/London. On all tested OSes the system timezone was deduced correctly, but if necessary it can be set as the value of environment variable TZ.8 How to prepare such a directory is described in file src/extra/tzone/Notes in the R sources.\nOptions --with-internal-iswxxxxx, --with-internal-towlower and --with-internal-wcwidth were introduced in R 4.1.0. These control the replacement of the system wide-character classification (such as iswprint), case-changing (wctrans) and width (wcwidth and wcswidth) functions by ones contained in the R sources. Replacement of the classification functions has been done for many years on macOS and AIX (and Windows): option --with-internal-iswxxxxx allows this to be suppressed on those platforms or used on others. Replacing the case-changing functions was new in R 4.1.0 and the default on macOS (and on Windows since R 4.2.0). Replacement of the width functions has also been done for many years and remains the default. 
These options will only matter to those working with non-ASCII character data, especially in languages written in a non-Western script9 (which includes ‘symbols’ such as emoji). Note that one of those iswxxxxx is iswprint which is used to decide whether to output a character as a glyph or as a \\U{xxxxxx} escape—for example, try \"\\U1f600\", an emoji. The width functions are of most importance in East Asian locale: their values differ between such locales. (Replacing the system functions provides a degree of platform-independence (including to OS updates) but replaces it with a dependence on the R version.)9 But on Windows problems have been seen with case-changing functions on accented Latin-1 characters.\n\n2.7.1 Debugging Symbols\nBy default, configure adds a flag (usually -g) to the compilation flags for C, Fortran and CXX sources. This will slow down compilation and increase object sizes of both R and packages, so it may be a good idea to change those flags (set CFLAGS etc in config.site before configuring, or edit files Makeconf and etc/Makeconf between running configure and make).\nHaving debugging symbols available is useful both when running R under a debugger (e.g., R -d gdb) and when using sanitizers and valgrind, all things intended for experts.\nDebugging symbols (and some others) can be ‘stripped’ on installation by using\nmake install-strip\nHow well this is supported depends on the platform: it works best on those using GNU binutils. On x86_64 Linux a typical reduction in overall size was from 92MB to 66MB. On macOS debugging symbols are not by default included in .dylib and .so files, so there is negligible difference.\n\n\n2.7.2 OpenMP Support\nBy default configure searches for suitable flags10 for OpenMP support for the C, C++ (default standard) and Fortran compilers.10 for example, -fopenmp, -fiopenmp, -xopenmp or -qopenmp. 
This includes for clang and the Intel and Oracle compilers.\nOnly the C result is currently used for R itself, and only if MAIN_LD/DYLIB_LD were not specified. This can be overridden by specifying\nR_OPENMP_CFLAGS\nUse for packages has similar restrictions (involving SHLIB_LD and similar: note that as Fortran code is by default linked by the C (or C++) compiler, both need to support OpenMP) and can be overridden by specifying some of\nSHLIB_OPENMP_CFLAGS\nSHLIB_OPENMP_CXXFLAGS\nSHLIB_OPENMP_FFLAGS\nSetting these to an empty value will disable OpenMP for that compiler (and configuring with --disable-openmp will disable all detection11 of OpenMP). The configure detection test is to compile and link a standalone OpenMP program, which is not the same as compiling a shared object and loading it into the C program of R’s executable. Note that overridden values are not tested.11 This does not necessarily disable use of OpenMP – the configure code allows for platforms where OpenMP is used without a flag. For the flang compiler in late 2017, the Fortran runtime always used OpenMP.\n\n\n2.7.3 C++ Support\nC++ is not used by R itself, but support is provided for installing packages with C++ code via make macros defined in file etc/Makeconf (and with explanations in file config.site):\nCXX\nCXXFLAGS\nCXXPICFLAGS\nCXXSTD\n\nCXX11\nCXX11STD\nCXX11FLAGS\nCXX11PICFLAGS\n\nCXX14\nCXX14STD\nCXX14FLAGS\nCXX14PICFLAGS\n\nCXX17\nCXX17STD\nCXX17FLAGS\nCXX17PICFLAGS\n\nCXX20\nCXX20STD\nCXX20FLAGS\nCXX20PICFLAGS\n\nCXX23\nCXX23STD\nCXX23FLAGS\nCXX23PICFLAGS\nThe macros CXX etc are those used by default for C++ code. configure will attempt to set the rest suitably, choosing for CXXSTD and CXX11STD a suitable flag such as -std=c++11 for C++11 support (which is required if C++ is to be supported at all). 
inferred values can be overridden in file config.site or on the configure command line: user-supplied values will be tested by compiling some C++11/14/17/20/23 code.\nIt may be that there is no suitable flag for C++14/17/20/23 support with the default compiler, in which case a different compiler could be selected for CXX14/CXX17/CXX20/CXX23 with its corresponding flags.\nThe -std flag is supported by the GCC, clang++ and Intel compilers. Currently accepted values are (plus some synonyms)\ng++: c++11 gnu+11 c++14 gnu++14 c++17 gnu++17 c++2a gnu++2a (from 8)\n c++20 gnu++20 (from 10) c++23 gnu++23 c++2b gnu++2b (from 11)\nIntel: c++11 gnu+11 c++14 gnu++14 c++17 gnu++17\n c++20 gnu++20 (from 2021.1) c++2b gnu++2b (from 2022.2)\n(Those for LLVM clang++ are documented at https://clang.llvm.org/cxx_status.html, and follow g++: -std=c++20 is supported from Clang 10, -std=c++2b from Clang 13. Apple Clang supports -std=c++2b from 13.1.6.)\n‘Standards’ for g++ starting with gnu enable ‘GNU extensions’: what those are is hard to track down.\nFor the use of C++11 and later in R packages see the ‘Writing R Extensions’ manual. Prior to R 3.6.0 the default C++ standard was that of the compiler used: currently it is C++17 (if available): this can be overridden by setting CXXSTD when R is configured.\nhttps://en.cppreference.com/w/cpp/compiler_support indicates which versions of common compilers support (parts of) which C++ standards. GCC 5 was the minimum version with sufficient C++14 support. GCC introduced C++17 support gradually, but version 7 should suffice.\n\n\n2.7.4 C standards\nCompiling R requires C99 or later: C11 and C17 are minor updates, but the substantial update planned for ‘C23’ (now expected ca April 2024) will also be supported.\nAs from R 4.3.0 there is support for packages to indicate their preferred C version. Macros CC17, C17FLAGS, CC23 and C23FLAGS can be set in config.site (there are examples there). 
Those for C17 should support C17 or earlier and not allow C23 additions so for example bool, true and false can be used as identifiers. Those for C23 should support new types such as bool.\nSome compilers warn enthusiastically about prototypes. For most, omitting -Wstrict-prototypes in C17FLAGS suffices. However, versions 15 and later of LLVM clang and 14.0.3 and later of Apple clang warn by default in all modes if -Wall or -pedantic is used, and may need -Wno-strict-prototypes.\n\n\n2.7.5 Link-Time Optimization\nThere is support for using link-time optimization (LTO) if the toolchain supports it: configure with flag --enable-lto. When LTO is enabled it is used for compiled code in add-on packages unless the flag --enable-lto=R is used12.12 Then recommended packages installed as part of the R installation do use LTO, but not packages installed later.\nThe main benefit seen to date from LTO has been detecting long-standing bugs in the ways packages pass arguments to compiled code and between compilation units. Benchmarking in 2020 with gcc/gfortran 10 showed gains of a few percent in increased performance and reduction in installed size for builds without debug symbols, but large size reductions for some packages13 with debug symbols. (Performance and size gains are said to be most often seen in complex C++ builds.)13 A complete CRAN installation reduced from 50 to 35GB.\nWhether toolchains support LTO is often unclear: all of the C compiler, the Fortran compiler14 and linker have to support it, and support it by the same mechanism (so mixing compiler families may not work and a non-default linker may be needed). 
It has been supported by the GCC and LLVM projects for some years with diverging implementations.14 although there is the possibility to exclude Fortran but that misses some of the benefits.\nLTO support was added in 2011 for GCC 4.5 on Linux but was little used before 2019: compiler support has steadily improved over those years and --enable-lto=R is nowadays used for some routine CRAN checking.\nUnfortunately --enable-lto may be accepted but silently do nothing useful if some of the toolchain does not support LTO: this is less common than it once was.\nVarious macros can be set in file config.site to customize how LTO is used. If the Fortran compiler is not of the same family as the C/C++ compilers, set macro LTO_FC (probably to empty). Macro LTO_LD can be used to select an alternative linker should that be needed.\n\n\n2.7.6 LTO with GCC\nThis has been tested on Linux with gcc/gfortran 8 and later: that needed setting (e.g. in config.site)\nAR=gcc-ar\nRANLIB=gcc-ranlib\nFor non-system compilers or if those wrappers have not been installed one may need something like\nAR=\"ar --plugin=/path/to/liblto_plugin.so\"\nRANLIB=\"ranlib --plugin=/path/to/liblto_plugin.so\"\namd NM may be needed to be set analogously. 
(If using an LTO-enabled build to check packages, set environment variable UserNM15 to gcc-nm.)15 not NM as we found make overriding that.\nWith GCC 5 and later it is possible to parallelize parts of the LTO linking process: set the make macro LTO to something like LTO=-flto=8 (to use 8 threads), for example in file config.site.\nUnder some circumstances and for a few packages, the PIC flags have needed overriding on Linux with GCC 9 and later: e.g use in config.site:\nCPICFLAGS=-fPIC\nCXXPICFLAGS=-fPIC\nCXX11PICFLAGS=-fPIC\nCXX14PICFLAGS=-fPIC\nCXX17PICFLAGS=-fPIC\nCXX20PICFLAGS=-fPIC\nFPICFLAGS=-fPIC\nWe suggest only using these if the problem is encountered (it was not seen on CRAN with GCC 10 at the time of writing).\nNote that R may need to be re-compiled after even a minor update to the compiler (e.g. from 10.1 to 10.2) but this may not be clear from confused compiler messages.\n\n\n2.7.7 LTO with LLVM\nLLVM supports another type of LTO called ‘Thin LTO’ as well as a similar implementation to GCC, sometimes called ‘Full LTO’. (See https://clang.llvm.org/docs/ThinLTO.html.) Currently the LLVM compilers relevant to R are clang and flang for which this can be selected by setting macro LTO=-flto=thin. LLVM has\nAR=llvm-ar\nRANLIB=llvm-ranlib\n(but macOS does not, and these are not needed there). Where the linker supports a parallel backend for Thin LTO this can be specified via the macro LTO_LD: see the URL above for per-linker settings and further linking optimizations.)\nFor example, on macOS one might use\nLTO=-flto=thin\nLTO_FC=\nLTO_LD=-Wl,-mllvm,-threads=4\nto use Thin LTO with 4 threads for C/C++ code, but skip LTO for Fortran code compiled with gfortran.\nIt is said to be particularly beneficial to use -O3 for clang in conjunction with LTO.\nIt seems that flang may support LTO, but with no documentation as yet.\nThe 2020s versions of Intel’s C/C++ compilers are based on LLVM and as such support LLVM-style LTO, both ‘full’ and ‘thin’. 
This might use something like\nLTO=-flto=thin -flto-jobs=8\n\n\n2.7.8 LTO for package checking\nLTO effectively compiles all the source code in a package as a single compilation unit and so allows the compiler (with sufficient diagnostic flags such as -Wall) to check consistency between what are normally separate compilation units.\nWith gcc/gfortran 9.x and later16 LTO will flag inconsistencies in calls to Fortran subroutines/functions, both between Fortran source files and between Fortran and C/C++. gfortran 8.4, 9.2 and later can help understanding these by extracting C prototypes from Fortran source files with option -fc-prototypes-external, e.g. that (at the time of writing) Fortran LOGICAL corresponds to int_least32_t * in C.16 probably also 8.4 and later."
+ "text": "2.7 Other Options\nThere are many other installation options, most of which are listed by configure --help. Almost all of those not listed elsewhere in this manual are either standard autoconf options not relevant to R or intended for specialist uses by the R developers.\nOne that may be useful when working on R itself is the option --disable-byte-compiled-packages, which ensures that the base and recommended packages are not byte-compiled. (Alternatively the (make or environment) variable R_NO_BASE_COMPILE can be set to a non-empty value for the duration of the build.)\nOption --with-internal-tzcode makes use of R’s own code and copy of the IANA database for managing timezones. This will be preferred where there are issues with the system implementation, usually involving times after 2037 or before 1916. An alternative time-zone directory8 can be used, pointed to by environment variable TZDIR: this should contain files such as Europe/London. On all tested OSes the system timezone was deduced correctly, but if necessary it can be set as the value of environment variable TZ.8 How to prepare such a directory is described in file src/extra/tzone/Notes in the R sources.\nOptions --with-internal-iswxxxxx, --with-internal-towlower and --with-internal-wcwidth were introduced in R 4.1.0. These control the replacement of the system wide-character classification (such as iswprint), case-changing (wctrans) and width (wcwidth and wcswidth) functions by ones contained in the R sources. Replacement of the classification functions has been done for many years on macOS and AIX (and Windows): option --with-internal-iswxxxxx allows this to be suppressed on those platforms or used on others. Replacing the case-changing functions was new in R 4.1.0 and the default on macOS (and on Windows since R 4.2.0). Replacement of the width functions has also been done for many years and remains the default. 
These options will only matter to those working with non-ASCII character data, especially in languages written in a non-Western script9 (which includes ‘symbols’ such as emoji). Note that one of those iswxxxxx is iswprint which is used to decide whether to output a character as a glyph or as a \\U{xxxxxx} escape—for example, try \"\\U1f600\", an emoji. The width functions are of most importance in East Asian locale: their values differ between such locales. (Replacing the system functions provides a degree of platform-independence (including to OS updates) but replaces it with a dependence on the R version.)9 But on Windows problems have been seen with case-changing functions on accented Latin-1 characters.\n\n2.7.1 Debugging Symbols\nBy default, configure adds a flag (usually -g) to the compilation flags for C, Fortran and CXX sources. This will slow down compilation and increase object sizes of both R and packages, so it may be a good idea to change those flags (set CFLAGS etc in config.site before configuring, or edit files Makeconf and etc/Makeconf between running configure and make).\nHaving debugging symbols available is useful both when running R under a debugger (e.g., R -d gdb) and when using sanitizers and valgrind, all things intended for experts.\nDebugging symbols (and some others) can be ‘stripped’ on installation by using\nmake install-strip\nHow well this is supported depends on the platform: it works best on those using GNU binutils. On x86_64 Linux a typical reduction in overall size was from 92MB to 66MB. On macOS debugging symbols are not by default included in .dylib and .so files, so there is negligible difference.\n\n\n2.7.2 OpenMP Support\nBy default configure searches for suitable flags10 for OpenMP support for the C, C++ (default standard) and Fortran compilers.10 for example, -fopenmp, -fiopenmp, -xopenmp or -qopenmp. 
This includes flags for clang and for the Intel and Oracle compilers.\nOnly the C result is currently used for R itself, and only if MAIN_LD/DYLIB_LD were not specified. This can be overridden by specifying\nR_OPENMP_CFLAGS\nUse for packages has similar restrictions (involving SHLIB_LD and similar: note that as Fortran code is by default linked by the C (or C++) compiler, both need to support OpenMP) and can be overridden by specifying some of\nSHLIB_OPENMP_CFLAGS\nSHLIB_OPENMP_CXXFLAGS\nSHLIB_OPENMP_FFLAGS\nSetting these to an empty value will disable OpenMP for that compiler (and configuring with --disable-openmp will disable all detection11 of OpenMP). The configure detection test is to compile and link a standalone OpenMP program, which is not the same as compiling a shared object and loading it into the C program of R’s executable. Note that overridden values are not tested.11 This does not necessarily disable use of OpenMP – the configure code allows for platforms where OpenMP is used without a flag. For the flang compiler in late 2017, the Fortran runtime always used OpenMP.\n\n\n2.7.3 C++ Support\nC++ is not used by R itself, but support is provided for installing packages with C++ code via make macros defined in file etc/Makeconf (and with explanations in file config.site):\nCXX\nCXXFLAGS\nCXXPICFLAGS\nCXXSTD\n\nCXX11\nCXX11STD\nCXX11FLAGS\nCXX11PICFLAGS\n\nCXX14\nCXX14STD\nCXX14FLAGS\nCXX14PICFLAGS\n\nCXX17\nCXX17STD\nCXX17FLAGS\nCXX17PICFLAGS\n\nCXX20\nCXX20STD\nCXX20FLAGS\nCXX20PICFLAGS\n\nCXX23\nCXX23STD\nCXX23FLAGS\nCXX23PICFLAGS\nThe macros CXX etc are those used by default for C++ code. configure will attempt to set the rest suitably, choosing for CXXSTD and CXX11STD a suitable flag such as -std=c++11 for C++11 support (which is required if C++ is to be supported at all). 
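As an illustration only, such macros can be given explicit values in config.site; the compiler name below is a hypothetical example, not a recommendation:

```shell
## Hypothetical config.site entries: select an assumed newer g++ for C++20 code.
## configure tests user-supplied values by compiling some C++20 code.
CXX20=g++-12
CXX20STD=-std=c++20
```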
Inferred values can be overridden in file config.site or on the configure command line: user-supplied values will be tested by compiling some C++11/14/17/20/23 code.\nIt may be that there is no suitable flag for C++14/17/20/23 support with the default compiler, in which case a different compiler could be selected for CXX14/CXX17/CXX20/CXX23 with its corresponding flags.\nThe -std flag is supported by the GCC, clang++ and Intel compilers. Currently accepted values are (plus some synonyms)\ng++: c++11 gnu++11 c++14 gnu++14 c++17 gnu++17 c++2a gnu++2a (from 8)\n c++20 gnu++20 (from 10) c++23 gnu++23 c++2b gnu++2b (from 11)\nIntel: c++11 gnu++11 c++14 gnu++14 c++17 gnu++17\n c++20 gnu++20 (from 2021.1) c++2b gnu++2b (from 2022.2)\n(Those for LLVM clang++ are documented at https://clang.llvm.org/cxx_status.html, and follow g++: -std=c++20 is supported from Clang 10, -std=c++2b from Clang 13 and -std=c++23 from Clang 17. Apple Clang supports -std=c++2b from 13.1.6: version 15.0.0 does not support -std=c++23.)\n‘Standards’ for g++ starting with gnu enable ‘GNU extensions’: what those are is hard to track down.\nFor the use of C++11 and later in R packages see the ‘Writing R Extensions’ manual. Prior to R 3.6.0 the default C++ standard was that of the compiler used: currently it is C++17 (if available): this can be overridden by setting CXXSTD when R is configured.\nhttps://en.cppreference.com/w/cpp/compiler_support indicates which versions of common compilers support (parts of) which C++ standards. GCC 5 was the minimum version with sufficient C++14 support. GCC introduced C++17 support gradually, but version 7 should suffice.\n\n\n2.7.4 C standards\nCompiling R requires C99 or later: C11 and C17 are minor updates, but the substantial update planned for ‘C23’ (now expected ca April 2024) will also be supported.\nAs from R 4.3.0 there is support for packages to indicate their preferred C version. 
Macros CC17, C17FLAGS, CC23 and C23FLAGS can be set in config.site (there are examples there). Those for C17 should support C17 or earlier and not allow C23 additions so for example bool, true and false can be used as identifiers. Those for C23 should support new types such as bool.\nSome compilers warn enthusiastically about prototypes. For most, omitting -Wstrict-prototypes in C17FLAGS suffices. However, versions 15 and later of LLVM clang and 14.0.3 and later of Apple clang warn by default in all modes if -Wall or -pedantic is used, and may need -Wno-strict-prototypes.\n\n\n2.7.5 Link-Time Optimization\nThere is support for using link-time optimization (LTO) if the toolchain supports it: configure with flag --enable-lto. When LTO is enabled it is used for compiled code in add-on packages unless the flag --enable-lto=R is used12.12 Then recommended packages installed as part of the R installation do use LTO, but not packages installed later.\nThe main benefit seen to date from LTO has been detecting long-standing bugs in the ways packages pass arguments to compiled code and between compilation units. Benchmarking in 2020 with gcc/gfortran 10 showed gains of a few percent in increased performance and reduction in installed size for builds without debug symbols, but large size reductions for some packages13 with debug symbols. (Performance and size gains are said to be most often seen in complex C++ builds.)13 A complete CRAN installation reduced from 50 to 35GB.\nWhether toolchains support LTO is often unclear: all of the C compiler, the Fortran compiler14 and linker have to support it, and support it by the same mechanism (so mixing compiler families may not work and a non-default linker may be needed). 
It has been supported by the GCC and LLVM projects for some years with diverging implementations.14 although there is the possibility to exclude Fortran but that misses some of the benefits.\nLTO support was added in 2011 for GCC 4.5 on Linux but was little used before 2019: compiler support has steadily improved over those years and --enable-lto=R is nowadays used for some routine CRAN checking.\nUnfortunately --enable-lto may be accepted but silently do nothing useful if some of the toolchain does not support LTO: this is less common than it once was.\nVarious macros can be set in file config.site to customize how LTO is used. If the Fortran compiler is not of the same family as the C/C++ compilers, set macro LTO_FC (probably to empty). Macro LTO_LD can be used to select an alternative linker should that be needed.\n\n\n2.7.6 LTO with GCC\nThis has been tested on Linux with gcc/gfortran 8 and later: that needed setting (e.g. in config.site)\nAR=gcc-ar\nRANLIB=gcc-ranlib\nFor non-system compilers or if those wrappers have not been installed one may need something like\nAR=\"ar --plugin=/path/to/liblto_plugin.so\"\nRANLIB=\"ranlib --plugin=/path/to/liblto_plugin.so\"\nand NM may need to be set analogously. 
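For concreteness, a minimal sketch of an LTO-enabled GCC build combining the settings above (whether the wrapper tools are needed depends on the installation):

```shell
## Sketch: configure an LTO build with GCC, using the LTO-aware ar/ranlib
## wrappers described above so static libraries retain LTO information.
AR=gcc-ar
RANLIB=gcc-ranlib
export AR RANLIB
./configure --enable-lto=R   # LTO for R and recommended packages, not add-ons
```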
(If using an LTO-enabled build to check packages, set environment variable UserNM15 to gcc-nm.)15 not NM as we found make overriding that.\nWith GCC 5 and later it is possible to parallelize parts of the LTO linking process: set the make macro LTO to something like LTO=-flto=8 (to use 8 threads), for example in file config.site.\nUnder some circumstances and for a few packages, the PIC flags have needed overriding on Linux with GCC 9 and later: e.g. use in config.site:\nCPICFLAGS=-fPIC\nCXXPICFLAGS=-fPIC\nCXX11PICFLAGS=-fPIC\nCXX14PICFLAGS=-fPIC\nCXX17PICFLAGS=-fPIC\nCXX20PICFLAGS=-fPIC\nFPICFLAGS=-fPIC\nWe suggest only using these if the problem is encountered (it was not seen on CRAN with GCC 10 at the time of writing).\nNote that R may need to be re-compiled after even a minor update to the compiler (e.g. from 10.1 to 10.2) but this may not be clear from confused compiler messages.\n\n\n2.7.7 LTO with LLVM\nLLVM supports another type of LTO called ‘Thin LTO’ as well as a similar implementation to GCC, sometimes called ‘Full LTO’. (See https://clang.llvm.org/docs/ThinLTO.html.) Currently the LLVM compilers relevant to R are clang and flang for which this can be selected by setting macro LTO=-flto=thin. LLVM has\nAR=llvm-ar\nRANLIB=llvm-ranlib\n(but macOS does not, and these are not needed there). Where the linker supports a parallel backend for Thin LTO this can be specified via the macro LTO_LD: see the URL above for per-linker settings and further linking optimizations.\nFor example, on macOS one might use\nLTO=-flto=thin\nLTO_FC=\nLTO_LD=-Wl,-mllvm,-threads=4\nto use Thin LTO with 4 threads for C/C++ code, but skip LTO for Fortran code compiled with gfortran.\nIt is said to be particularly beneficial to use -O3 for clang in conjunction with LTO.\nIt seems that flang may support LTO, but with no documentation as yet.\nThe 2020s versions of Intel’s C/C++ compilers are based on LLVM and as such support LLVM-style LTO, both ‘full’ and ‘thin’. 
This might use something like\nLTO=-flto=thin -flto-jobs=8\n\n\n2.7.8 LTO for package checking\nLTO effectively compiles all the source code in a package as a single compilation unit and so allows the compiler (with sufficient diagnostic flags such as -Wall) to check consistency between what are normally separate compilation units.\nWith gcc/gfortran 9.x and later16 LTO will flag inconsistencies in calls to Fortran subroutines/functions, both between Fortran source files and between Fortran and C/C++. gfortran 8.4, 9.2 and later can help understanding these by extracting C prototypes from Fortran source files with option -fc-prototypes-external, e.g. that (at the time of writing) Fortran LOGICAL corresponds to int_least32_t * in C.16 probably also 8.4 and later."
},
{
"objectID": "Installing-R-under-Unix-alikes.html#testing-an-installation",
@@ -249,7 +249,7 @@
"href": "Essential-and-useful-other-programs-under-a-Unix-alike.html#linear-algebra",
"title": "Appendix A — Essential and useful other programs under a Unix-alike",
"section": "A.3 Linear algebra",
- "text": "A.3 Linear algebra\nThe linear algebra routines in R make use of BLAS (Basic Linear Algebra Subprograms, https://netlib.org/blas/faq.html) routines, and most make use of routines from LAPACK (Linear Algebra PACKage, https://netlib.org/lapack/). The R sources contain reference (Fortran) implementations of these, but they can be replaced by external libraries, usually those tuned for speed on specific CPUs. These libraries normally contain all of the BLAS routines and some tuned LAPACK routines and perhaps the rest of LAPACK from the reference implementation. Because of the way linking works, using an external BLAS library may necessitate using the version of LAPACK it contains.\nNote that the alternative implementations will not give identical numeric results. Some differences may be benign (such the signs of SVDs and eigenvectors), but the optimized routines can be less accurate and (particularly for LAPACK) can be from older versions with fewer corrections. However, R relies on ISO/IEC 60559 compliance. This can be broken if for example the code assumes that terms with a zero factor are always zero and do not need to be computed—whereas x*0 can be NaN. The internal BLAS has been extensively patched to avoid this whereas MKL’s documentation has warned\n\nLAPACK routines assume that input matrices do not contain IEEE 754 special values such as INF or NaN values. Using these special values may cause LAPACK to return unexpected results or become unstable.\n\nSome of the external libraries are multi-threaded. One issue is that R profiling (which uses the SIGPROF signal) may cause problems, and you may want to disable profiling if you use a multi-threaded BLAS. Note that using a multi-threaded BLAS can result in taking more CPU time and even more elapsed time (occasionally dramatically so) than using a similar single-threaded BLAS. 
On a machine running other tasks, there can be contention for CPU caches that reduces the effectiveness of the optimization of cache use by a BLAS implementation: some people warn that this is especially problematic for hyperthreaded CPUs.\nBLAS and LAPACK routines may be used inside threaded code, for example in OpenMP sections in packages such as mgcv. The reference implementations are thread-safe but external ones may not be (even single-threaded ones): this can lead to hard-to-track-down incorrect results or segfaults.\nThere is a tendency for re-distributors of R to use ‘enhanced’ linear algebra libraries without explaining their downsides.\n\nA.3.1 BLAS\nAn external BLAS library has to be explicitly requested at configure time.\nYou can specify a particular BLAS library via a value for the configuration option --with-blas. If this is given with no =, its value is taken from the environment variable BLAS_LIBS, set for example in config.site. If neither the option nor the environment variable supply a value, a search is made for a suitable19 BLAS. If the value is not obviously a linker command (starting with a dash or giving the path to a library), it is prefixed by -l, so19 The search order is currently OpenBLAS, BLIS, ATLAS, platform-specific choices (see below) and finally a generic libblas.\n--with-blas=\"foo\"\nis an instruction to link against -lfoo to find an external BLAS (which needs to be found both at link time and run time).\nThe configure code checks that the external BLAS is complete (as of LAPACK 3.9.1: it must include all double precision and double complex routines, as well as LSAME), and appears to be usable. However, an external BLAS has to be usable from a shared object (so must contain position-independent code), and that is not checked. 
Also, the BLAS can be switched after configure is run, either as a symbolic link or by the mechanisms mentioned below, and this can defeat the completeness check.\nSome enhanced BLASes are compiler-system-specific (Accelerate on macOS, sunperf on Solaris20, libessl on IBM). The correct incantation for these is often found via --with-blas with no value on the appropriate platforms.20 Using the Oracle Developer Studio cc and f95 compilers\nNote that under Unix (but not under Windows) if R is compiled against a non-default BLAS and --enable-BLAS-shlib is not used (it is the default on all platforms except AIX), then all BLAS-using packages must also be. So if R is re-built to use an enhanced BLAS then packages such as quantreg will need to be re-installed.\nDebian/Ubuntu systems provide a system-specific way to switch the BLAS in use: Build R with --with-blas to select the OS version of the reference BLAS, and then use update-alternatives to switch between the available BLAS libraries. See https://wiki.debian.org/DebianScience/LinearAlgebraLibraries.\nFedora 33 and later offer ‘FlexiBLAS’, a similar mechanism for switching the BLAS in use (https://www.mpi-magdeburg.mpg.de/projects/flexiblas). However, rather than overriding libblas, this requires configuring R with option --with-blas=flexiblas. ‘Backend’ wrappers are available for the reference BLAS, ATLAS and serial, threaded and OpenMP builds of OpenBLAS and BLIS. This can be controlled from a running R session by package flexiblas.\nBLAS implementations which use parallel computations can be non-deterministic: this is known for ATLAS.\n\n\nA.3.2 ATLAS\nATLAS (https://math-atlas.sourceforge.net/) is a “tuned” BLAS that runs on a wide range of Unix-alike platforms. Unfortunately it is built by default as a static library that on some platforms may not be able to be used with shared objects such as are used in R packages. 
Be careful when using pre-built versions of ATLAS static libraries (they seem to work on ix86 platforms, but not always on x86_64 ones).\nATLAS contains replacements for a small number of LAPACK routines, but can be built to merge these with the reference LAPACK sources to include a full LAPACK library.\nRecent versions of ATLAS can be built as a single shared library, either libsatlas or libtatlas (serial or threaded respectively): these may even contain a full LAPACK. Such builds can be used by one of\n--with-blas=satlas\n--with-blas=tatlas\nor, as on x86_64 Fedora where a path needs to be specified,\n--with-blas=\"-L/usr/lib64/atlas -lsatlas\"\n--with-blas=\"-L/usr/lib64/atlas -ltatlas\"\nDistributed ATLAS libraries cannot be tuned to your machine and so are a compromise: for example Fedora tunes21 x86_64 RPMs for CPUs with SSE3 extensions, and separate RPMs may be available for specific CPU families.21 The only way to see exactly which CPUs the distributed libraries have been tuned for is to read the atlas.spec file.\nNote that building R on Linux against distributed shared libraries may need -devel or -dev packages installed.\nLinking against multiple static libraries requires one of\n--with-blas=\"-lf77blas -latlas\"\n--with-blas=\"-lptf77blas -lpthread -latlas\"\n--with-blas=\"-L/path/to/ATLAS/libs -lf77blas -latlas\"\n--with-blas=\"-L/path/to/ATLAS/libs -lptf77blas -lpthread -latlas\"\nConsult its installation guide22 for how to build ATLAS as a shared library or as a static library with position-independent code (on platforms where that matters).22 https://math-atlas.sourceforge.net/atlas_install/\nAccording to the ATLAS FAQ23 the maximum number of threads used by multi-threaded ATLAS is set at compile time. Also, the author advises against using multi-threaded ATLAS on hyperthreaded CPUs without restricting affinities at compile-time to one virtual core per physical CPU. 
(For the Fedora libraries the compile-time flag specifies 4 threads.)23 https://math-atlas.sourceforge.net/faq.html#tnum\n\n\nA.3.3 OpenBLAS and BLIS\nDr Kazushige Goto wrote a tuned BLAS for several processors and OSes, which was frozen in 2010. OpenBLAS (https://www.openblas.net/) is a descendant project with support for some later CPUs.\nThis can be used by configuring R with something like\n--with-blas=\"openblas\"\nSee see Shared BLAS for an alternative (and in many ways preferable) way to use them.\nSome platforms provide multiple builds of OpenBLAS: for example Fedora has RPMs2424 (and more, e.g. for 64-bit ints and static versions).\nopenblas\nopenblas-threads\nopenblas-openmp\nproviding shared libraries\nlibopenblas.so\nlibopenblasp.so\nlibopenblaso.so\nrespectively, each of which can be used as a shared BLAS. For the second and third the number of threads is controlled by OPENBLAS_NUM_THREADS and OMP_NUM_THREADS (as usual for OpenMP) respectively.\nThese and their Debian equivalents contain a complete LAPACK implementation.\nNote that building R on Linux against distributed libraries may need -devel or -dev packages installed.\nFor ix86 and x86_64 CPUs most distributed libraries contain several alternatives for different CPU microarchitectures with the choice being made at run time.\nAnother descendant project is BLIS (https://github.com/flame/blis). This has (in Fedora) shared libraries\nlibblis.so\nlibblisp.so\nlibbliso.so\n(p for ‘threads’, o for OpenMP as for OpenBLAS) which can also be used as a shared BLAS. The Fedora builds do not include LAPACK in the BLIS libraries.\n\n\nA.3.4 Intel MKL\nFor Intel processors (and perhaps others) and some distributions of Linux, there is Intel’s Math Kernel Library25. You are encouraged to read the documentation which is installed with the library before attempting to link to MKL. This includes a ‘link line advisor’ which will suggest appropriate incantations: its use is recommended. 
Or see https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.vpt6qp (which at the time of writing selected the Intel library for linking with GCC).25 Nowadays known as ‘Intel oneAPI Math Kernel Library’ or even ‘oneMKL’.\nThere are also versions of MKL for macOS26 and Windows, but when these have been tried they did not work with the default compilers used for R on those platforms.26 The issue for macOS has been the use of double-complex routines.\nThe following examples have been used with MKL versions 10.3 to 2023.2.0, for GCC compilers on x86_64 CPUs. (See also Intel compilers.)\nTo use a sequential version of MKL we used\nMKL_LIB_PATH=/path/to/intel_mkl/mkl/lib/intel64\nexport LD_LIBRARY_PATH=$MKL_LIB_PATH\nMKL=\"-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_core -lmkl_sequential\"\n./configure --with-blas=\"$MKL\" --with-lapack\nThe option --with-lapack is used since MKL contains a tuned copy of LAPACK (often older than the current version) as well as the BLAS (see LAPACK), although this can be omitted.\nThreaded MKL may be used by replacing the line defining the variable MKL by\nMKL=\"-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_core \\\n -lmkl_gnu_thread -dl -fopenmp\"\nR can also be linked against a single shared library, libmkl_rt.so, for both BLAS and LAPACK, but the correct OpenMP and MKL interface layer then has to be selected via environment variables. With 64-bit builds and the GCC compilers, we used\nexport MKL_INTERFACE_LAYER=GNU,LP64 \nexport MKL_THREADING_LAYER=GNU\nOn Debian/Ubuntu, MKL is provided by package intel-mkl-full and one can set libmkl_rt.so as the system-wide implementation of both BLAS and LAPACK during installation of the package, so that also R installed from Debian/Ubuntu package r-base would use it. It is, however, still essential to set MKL_INTERFACE_LAYER and MKL_THREADING_LAYER before running R, otherwise MKL computations will produce incorrect results. 
R does not have to be rebuilt to use MKL, but configure includes tests which may discover some errors such as a failure to set the correct OpenMP and MKL interface layer.\nNote that the Debian/Ubuntu distribution can be quite old (for example 2020.4 in mid-2023 when 2023.1 was current): this can be important for the LAPACK version included.\nThe default number of threads will be chosen by the OpenMP software, but can be controlled by setting OMP_NUM_THREADS or MKL_NUM_THREADS, and in recent versions seems to default to a sensible value for sole use of the machine. (Parallel MKL has not always passed make check-all, but did with MKL 2019.4 and later.)\nMKL includes a partial implementation of FFTW3, which causes trouble for applications that require some of the FFTW3 functionality unsupported in MKL. Please see the MKL manuals for description of these limitations and for instructions on how to create a custom version of MKL which excludes the FFTW3 wrappers.\nThere is Intel documentation for building R with MKL at https://www.intel.com/content/www/us/en/developer/articles/technical/using-onemkl-with-r.html: that includes\n-Wl,--no-as-needed\nwhich we have not found necessary.\n\n\nA.3.5 Shared BLAS\nThe BLAS library will be used for many of the add-on packages as well as for R itself. 
This means that it is better to use a shared/dynamic BLAS library, as most of a static library will be compiled into the R executable and each BLAS-using package.\nR offers the option of compiling the BLAS into a dynamic library libRblas stored in R_HOME/lib and linking both R itself and all the add-on packages against that library.\nThis is the default on all platforms except AIX unless an external BLAS is specified and found: for the latter it can be used by specifying the option --enable-BLAS-shlib, and it can always be disabled via --disable-BLAS-shlib.\nThis has both advantages and disadvantages.\n\nIt saves space by having only a single copy of the BLAS routines, which is helpful if there is an external static BLAS (as used to be standard for ATLAS).\nThere may be performance disadvantages in using a shared BLAS. Probably the most likely is when R’s internal BLAS is used and R is not built as a shared library, when it is possible to build the BLAS into R.bin (and libR.a) without using position-independent code. However, experiments showed that in many cases using a shared BLAS was as fast, provided high levels of compiler optimization are used.\nIt is easy to change the BLAS without needing to re-install R and all the add-on packages, since all references to the BLAS go through libRblas, and that can be replaced. Note though that any dynamic libraries the replacement links to will need to be found by the linker: this may need the library path to be changed in R_HOME/etc/ldpaths.\n\nAnother option to change the BLAS in use is to symlink a single dynamic BLAS library to R_HOME/lib/libRblas.so. For example, just\nmv R_HOME/lib/libRblas.so R_HOME/lib/libRblas.so.keep\nln -s /usr/lib64/libopenblasp.so.0 R_HOME/lib/libRblas.so\non x86_64 Fedora will change the BLAS used to multithreaded OpenBLAS. A similar link works for most versions of the OpenBLAS (provided the appropriate lib directory is in the run-time library path or ld.so cache). 
It can also be used for a single-library ATLAS, so on x86_64 Fedora either of\nln -s /usr/lib64/atlas/libsatlas.so.3 R_HOME/lib/libRblas.so\nln -s /usr/lib64/atlas/libtatlas.so.3 R_HOME/lib/libRblas.so\ncan be used with its distributed ATLAS libraries. (If you have the -devel RPMs installed you can omit the .0/.3.)\nNote that rebuilding or symlinking libRblas.so may not suffice if the intention is to use a modified LAPACK contained in an external BLAS: the latter could even cause conflicts. However, on Fedora where the OpenBLAS distribution contains a copy of LAPACK, it is the latter which is used.\n\n\nA.3.6 LAPACK\nIf when configuring R a system LAPACK library is found of version 3.10.0 or later (and does not contain BLAS routines) it will be used instead of compiling the LAPACK code in the package sources. This can be prevented by configuring R with --without-lapack. Using a static liblapack.a is not supported.\nIt is assumed that -llapack is the reference LAPACK library but on Debian/Ubuntu it can be switched, including after R is installed. On such a platform it is better to use --without-lapack or --with-blas --with-lapack (see below) explicitly. The known examples27 of a non-reference LAPACK library found at installation all contain BLAS routines so are not used by a default configure run.27 ATLAS, OpenBLAS and Accelerate.\nProvision is made for specifying an external LAPACK library with option --with-lapack, principally to cope with BLAS libraries which contain a copy of LAPACK (such as Accelerate on macOS and some builds of ATLAS, FlexiBLAS, MKL and OpenBLAS on ix86/x86_64 Linux). At least LAPACK version 3.2 is required. This can only be done if --with-blas has been used.\nHowever, the likely performance gains are thought to be small (and may be negative). The default is not to search for a suitable LAPACK library, and this is definitely not recommended. 
You can specify a specific LAPACK library or a search for a generic library by the configuration option --with-lapack without a value. The default for --with-lapack is to check the BLAS library (for function DPSTRF) and then look for an external library -llapack. Sites searching for the fastest possible linear algebra may want to build a LAPACK library using the ATLAS-optimized subset of LAPACK. Similarly, OpenBLAS can be built to contain an optimized subset of LAPACK or a full LAPACK (the latter seeming to be the default).\nA value for --with-lapack can be set via the environment variable LAPACK_LIBS, but this will only be used if --with-lapack is specified and the BLAS library does not contain LAPACK.\nPlease bear in mind that using --with-lapack is provided only because it is necessary on some platforms and because some users want to experiment with claimed performance improvements. In practice its main uses are without a value,\n\nwith an ‘enhanced’ BLAS such as ATLAS, FlexiBLAS, MKL or OpenBLAS which contains a full LAPACK (to avoid possible conflicts), or\non Debian/Ubuntu systems to select the system liblapack which can be switched by the ‘alternatives’ mechanism.\n\n\n\nA.3.7 Caveats\nAs with all libraries, you need to ensure that they and R were compiled with compatible compilers and flags. For example, this has meant that on Sun Sparc using the Oracle compilers the flag -dalign is needed if sunperf is to be used.\nOn some systems it has been necessary that an external BLAS/LAPACK was built with the same Fortran compiler used to build R.\nBLAS and LAPACK libraries built with recent versions of gfortran require calls from C/C++ to handle ‘hidden’ character lengths — R itself does so but many packages used not to and some have segfaulted. 
This was largely circumvented by using the Fortran flag -fno-optimize-sibling-calls (formerly set by configure if it detected gfortran 7 or later): however use of the R headers which include those character-length arguments is no longer optional in packages.\nLAPACK 3.9.0 (and probably earlier) had a bug in which the DCOMBSSQ subroutine may cause NA to be interpreted as zero. This is fixed in the R 3.6.3 and later sources, but if you use an external LAPACK, you may need to fix it there. (The bug was corrected in 3.9.1 and the routine removed in 3.10.1.)\nThe code (in dlapack.f) should read\n* ..\n* .. Executable Statements ..\n*\n IF( V1( 1 ).GE.V2( 1 ) ) THEN\n IF( V1( 1 ).NE.ZERO ) THEN\n V1( 2 ) = V1( 2 ) + ( V2( 1 ) / V1( 1 ) )**2 * V2( 2 )\n ELSE\n V1( 2 ) = V1( 2 ) + V2( 2 )\n END IF\n ELSE\n V1( 2 ) = V2( 2 ) + ( V1( 1 ) / V2( 1 ) )**2 * V1( 2 )\n V1( 1 ) = V2( 1 )\n END IF\n RETURN\n(The inner ELSE clause was missing in LAPACK 3.9.0.)\nIf you do use an external LAPACK, be aware of potential problems with other bugs in the LAPACK sources (or in the posted corrections to those sources), seen several times in Linux distributions over the years. We have even seen distributions with missing LAPACK routines from their liblapack.\nWe rely on limited support in LAPACK for matrices with 2^{31} or more elements: it is possible that an external LAPACK will not have that support.\nFootnotes"
+ "text": "A.3 Linear algebra\nThe linear algebra routines in R make use of BLAS (Basic Linear Algebra Subprograms, https://netlib.org/blas/faq.html) routines, and most make use of routines from LAPACK (Linear Algebra PACKage, https://netlib.org/lapack/). The R sources contain reference (Fortran) implementations of these, but they can be replaced by external libraries, usually those tuned for speed on specific CPUs. These libraries normally contain all of the BLAS routines and some tuned LAPACK routines and perhaps the rest of LAPACK from the reference implementation. Because of the way linking works, using an external BLAS library may necessitate using the version of LAPACK it contains.\nNote that the alternative implementations will not give identical numeric results. Some differences may be benign (such as the signs of SVDs and eigenvectors), but the optimized routines can be less accurate and (particularly for LAPACK) can be from older versions with fewer corrections. However, R relies on ISO/IEC 60559 compliance. This can be broken if for example the code assumes that terms with a zero factor are always zero and do not need to be computed—whereas x*0 can be NaN. The internal BLAS has been extensively patched to avoid this whereas MKL’s documentation has warned\n\nLAPACK routines assume that input matrices do not contain IEEE 754 special values such as INF or NaN values. Using these special values may cause LAPACK to return unexpected results or become unstable.\n\nSome of the external libraries are multi-threaded. One issue is that R profiling (which uses the SIGPROF signal) may cause problems, and you may want to disable profiling if you use a multi-threaded BLAS. Note that using a multi-threaded BLAS can result in taking more CPU time and even more elapsed time (occasionally dramatically so) than using a similar single-threaded BLAS. 
On a machine running other tasks, there can be contention for CPU caches that reduces the effectiveness of the optimization of cache use by a BLAS implementation: some people warn that this is especially problematic for hyperthreaded CPUs.\nBLAS and LAPACK routines may be used inside threaded code, for example in OpenMP sections in packages such as mgcv. The reference implementations are thread-safe but external ones may not be (even single-threaded ones): this can lead to hard-to-track-down incorrect results or segfaults.\nThere is a tendency for re-distributors of R to use ‘enhanced’ linear algebra libraries without explaining their downsides.\n\nA.3.1 BLAS\nAn external BLAS library has to be explicitly requested at configure time.\nYou can specify a particular BLAS library via a value for the configuration option --with-blas. If this is given with no =, its value is taken from the environment variable BLAS_LIBS, set for example in config.site. If neither the option nor the environment variable supply a value, a search is made for a suitable19 BLAS. If the value is not obviously a linker command (starting with a dash or giving the path to a library), it is prefixed by -l, so19 The search order is currently OpenBLAS, BLIS, ATLAS, platform-specific choices (see below) and finally a generic libblas.\n--with-blas=\"foo\"\nis an instruction to link against -lfoo to find an external BLAS (which needs to be found both at link time and run time).\nThe configure code checks that the external BLAS is complete (as of LAPACK 3.9.1: it must include all double precision and double complex routines, as well as LSAME), and appears to be usable. However, an external BLAS has to be usable from a shared object (so must contain position-independent code), and that is not checked. 
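As a concrete sketch (the choice of OpenBLAS here is an assumption for illustration), requesting an external BLAS at configure time might look like:

```shell
## "-lopenblas" must be resolvable both at link time and at run time
./configure --with-blas="-lopenblas"
```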
Also, the BLAS can be switched after configure is run, either as a symbolic link or by the mechanisms mentioned below, and this can defeat the completeness check.\nSome enhanced BLASes are compiler-system-specific (Accelerate on macOS, sunperf on Solaris20, libessl on IBM). The correct incantation for these is often found via --with-blas with no value on the appropriate platforms.20 Using the Oracle Developer Studio cc and f95 compilers\nNote that under Unix (but not under Windows) if R is compiled against a non-default BLAS and --enable-BLAS-shlib is not used (it is the default on all platforms except AIX), then all BLAS-using packages must also be. So if R is re-built to use an enhanced BLAS then packages such as quantreg will need to be re-installed.\nDebian/Ubuntu systems provide a system-specific way to switch the BLAS in use: Build R with --with-blas to select the OS version of the reference BLAS, and then use update-alternatives to switch between the available BLAS libraries. See https://wiki.debian.org/DebianScience/LinearAlgebraLibraries.\nFedora 33 and later offer ‘FlexiBLAS’, a similar mechanism for switching the BLAS in use (https://www.mpi-magdeburg.mpg.de/projects/flexiblas). However, rather than overriding libblas, this requires configuring R with option --with-blas=flexiblas. ‘Backend’ wrappers are available for the reference BLAS, ATLAS and serial, threaded and OpenMP builds of OpenBLAS and BLIS, and perhaps others21. This can be controlled from a running R session by package flexiblas.21 for example, Intel MKL not packaged by Fedora.\nBLAS implementations which use parallel computations can be non-deterministic: this is known for ATLAS.\n\n\nA.3.2 ATLAS\nATLAS (https://math-atlas.sourceforge.net/) is a “tuned” BLAS that runs on a wide range of Unix-alike platforms. Unfortunately it is built by default as a static library that on some platforms may not be able to be used with shared objects such as are used in R packages. 
Be careful when using pre-built versions of ATLAS static libraries (they seem to work on ix86 platforms, but not always on x86_64 ones).\nATLAS contains replacements for a small number of LAPACK routines, but can be built to merge these with the reference LAPACK sources to include a full LAPACK library.\nRecent versions of ATLAS can be built as a single shared library, either libsatlas or libtatlas (serial or threaded respectively): these may even contain a full LAPACK. Such builds can be used by one of\n--with-blas=satlas\n--with-blas=tatlas\nor, as on x86_64 Fedora where a path needs to be specified,\n--with-blas=\"-L/usr/lib64/atlas -lsatlas\"\n--with-blas=\"-L/usr/lib64/atlas -ltatlas\"\nDistributed ATLAS libraries cannot be tuned to your machine and so are a compromise: for example Fedora tunes22 x86_64 RPMs for CPUs with SSE3 extensions, and separate RPMs may be available for specific CPU families.22 The only way to see exactly which CPUs the distributed libraries have been tuned for is to read the atlas.spec file.\nNote that building R on Linux against distributed shared libraries may need -devel or -dev packages installed.\nLinking against multiple static libraries requires one of\n--with-blas=\"-lf77blas -latlas\"\n--with-blas=\"-lptf77blas -lpthread -latlas\"\n--with-blas=\"-L/path/to/ATLAS/libs -lf77blas -latlas\"\n--with-blas=\"-L/path/to/ATLAS/libs -lptf77blas -lpthread -latlas\"\nConsult its installation guide23 for how to build ATLAS as a shared library or as a static library with position-independent code (on platforms where that matters).23 https://math-atlas.sourceforge.net/atlas_install/\nAccording to the ATLAS FAQ24 the maximum number of threads used by multi-threaded ATLAS is set at compile time. Also, the author advises against using multi-threaded ATLAS on hyperthreaded CPUs without restricting affinities at compile-time to one virtual core per physical CPU. 
(For the Fedora libraries the compile-time flag specifies 4 threads.)24 https://math-atlas.sourceforge.net/faq.html#tnum\n\n\nA.3.3 OpenBLAS and BLIS\nDr Kazushige Goto wrote a tuned BLAS for several processors and OSes, which was frozen in 2010. OpenBLAS (https://www.openblas.net/) is a descendant project with support for some later CPUs.\nThis can be used by configuring R with something like\n--with-blas=\"openblas\"\nSee Shared BLAS for an alternative (and in many ways preferable) way to use them.\nSome platforms provide multiple builds of OpenBLAS: for example Fedora has RPMs2525 (and more, e.g. for 64-bit ints and static versions).\nopenblas\nopenblas-threads\nopenblas-openmp\nproviding shared libraries\nlibopenblas.so\nlibopenblasp.so\nlibopenblaso.so\nrespectively, each of which can be used as a shared BLAS. For the second and third the number of threads is controlled by OPENBLAS_NUM_THREADS and OMP_NUM_THREADS (as usual for OpenMP) respectively.\nThese and their Debian equivalents contain a complete LAPACK implementation.\nNote that building R on Linux against distributed libraries may need -devel or -dev packages installed.\nFor ix86 and x86_64 CPUs most distributed libraries contain several alternatives for different CPU microarchitectures with the choice being made at run time.\nAnother descendant project is BLIS (https://github.com/flame/blis). This has (in Fedora) shared libraries\nlibblis.so\nlibblisp.so\nlibbliso.so\n(p for ‘threads’, o for OpenMP as for OpenBLAS) which can also be used as a shared BLAS. The Fedora builds do not include LAPACK in the BLIS libraries.\n\n\nA.3.4 Intel MKL\nFor Intel processors (and perhaps others) and some distributions of Linux, there is Intel’s Math Kernel Library26. You are encouraged to read the documentation which is installed with the library before attempting to link to MKL. This includes a ‘link line advisor’ which will suggest appropriate incantations: its use is recommended. 
Or see https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.vpt6qp (which at the time of writing selected the Intel library for linking with GCC).26 Nowadays known as ‘Intel oneAPI Math Kernel Library’ or even ‘oneMKL’.\nThere are also versions of MKL for macOS27 and Windows, but when these have been tried they did not work with the default compilers used for R on those platforms.27 The issue for macOS has been the use of double-complex routines.\nThe following examples have been used with MKL versions 10.3 to 2023.2.0, for GCC compilers on x86_64 CPUs. (See also Intel compilers.)\nTo use a sequential version of MKL we used\nMKL_LIB_PATH=/path/to/intel_mkl/mkl/lib/intel64\nexport LD_LIBRARY_PATH=$MKL_LIB_PATH\nMKL=\"-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_core -lmkl_sequential\"\n./configure --with-blas=\"$MKL\" --with-lapack\nThe option --with-lapack is used since MKL contains a tuned copy of LAPACK (often older than the current version) as well as the BLAS (see LAPACK), although this can be omitted.\nThreaded MKL may be used by replacing the line defining the variable MKL by\nMKL=\"-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_core \\\n -lmkl_gnu_thread -ldl -fopenmp\"\nR can also be linked against a single shared library, libmkl_rt.so, for both BLAS and LAPACK, but the correct OpenMP and MKL interface layer then has to be selected via environment variables. With 64-bit builds and the GCC compilers, we used\nexport MKL_INTERFACE_LAYER=GNU,LP64 \nexport MKL_THREADING_LAYER=GNU\nOn Debian/Ubuntu, MKL is provided by package intel-mkl-full and one can set libmkl_rt.so as the system-wide implementation of both BLAS and LAPACK during installation of the package, so that also R installed from Debian/Ubuntu package r-base would use it. It is, however, still essential to set MKL_INTERFACE_LAYER and MKL_THREADING_LAYER before running R, otherwise MKL computations will produce incorrect results. 
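As a sketch (the echo line is purely illustrative; R itself is launched afterwards), the single-library setup above amounts to exporting the two layers before starting R:

```shell
# Sketch for a GNU-compiled R linked against libmkl_rt.so: select the
# MKL interface and threading layers, then start R (launch not shown).
export MKL_INTERFACE_LAYER=GNU,LP64
export MKL_THREADING_LAYER=GNU
echo MKL layers: $MKL_INTERFACE_LAYER / $MKL_THREADING_LAYER
```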
R does not have to be rebuilt to use MKL, but configure includes tests which may discover some errors such as a failure to set the correct OpenMP and MKL interface layer.\nNote that the Debian/Ubuntu distribution can be quite old (for example 2020.4 in mid-2023 when 2023.1 was current): this can be important for the LAPACK version included.\nThe default number of threads will be chosen by the OpenMP software, but can be controlled by setting OMP_NUM_THREADS or MKL_NUM_THREADS, and in recent versions seems to default to a sensible value for sole use of the machine. (Parallel MKL has not always passed make check-all, but did with MKL 2019.4 and later.)\nMKL includes a partial implementation of FFTW3, which causes trouble for applications that require some of the FFTW3 functionality unsupported in MKL. Please see the MKL manuals for description of these limitations and for instructions on how to create a custom version of MKL which excludes the FFTW3 wrappers.\nThere is Intel documentation for building R with MKL at https://www.intel.com/content/www/us/en/developer/articles/technical/using-onemkl-with-r.html: that includes\n-Wl,--no-as-needed\nwhich we have not found necessary.\n\n\nA.3.5 Shared BLAS\nThe BLAS library will be used for many of the add-on packages as well as for R itself. 
This means that it is better to use a shared/dynamic BLAS library, as most of a static library will be compiled into the R executable and each BLAS-using package.\nR offers the option of compiling the BLAS into a dynamic library libRblas stored in R_HOME/lib and linking both R itself and all the add-on packages against that library.\nThis is the default on all platforms except AIX unless an external BLAS is specified and found: for the latter it can be used by specifying the option --enable-BLAS-shlib, and it can always be disabled via --disable-BLAS-shlib.\nThis has both advantages and disadvantages.\n\nIt saves space by having only a single copy of the BLAS routines, which is helpful if there is an external static BLAS (as used to be standard for ATLAS).\nThere may be performance disadvantages in using a shared BLAS. Probably the most likely is when R’s internal BLAS is used and R is not built as a shared library, when it is possible to build the BLAS into R.bin (and libR.a) without using position-independent code. However, experiments showed that in many cases using a shared BLAS was as fast, provided high levels of compiler optimization are used.\nIt is easy to change the BLAS without needing to re-install R and all the add-on packages, since all references to the BLAS go through libRblas, and that can be replaced. Note though that any dynamic libraries the replacement links to will need to be found by the linker: this may need the library path to be changed in R_HOME/etc/ldpaths.\n\nAnother option to change the BLAS in use is to symlink a single dynamic BLAS library to R_HOME/lib/libRblas.so. For example, just\nmv R_HOME/lib/libRblas.so R_HOME/lib/libRblas.so.keep\nln -s /usr/lib64/libopenblasp.so.0 R_HOME/lib/libRblas.so\non x86_64 Fedora will change the BLAS used to multithreaded OpenBLAS. A similar link works for most versions of the OpenBLAS (provided the appropriate lib directory is in the run-time library path or ld.so cache). 
It can also be used for a single-library ATLAS, so on x86_64 Fedora either of\nln -s /usr/lib64/atlas/libsatlas.so.3 R_HOME/lib/libRblas.so\nln -s /usr/lib64/atlas/libtatlas.so.3 R_HOME/lib/libRblas.so\ncan be used with its distributed ATLAS libraries. (If you have the -devel RPMs installed you can omit the .0/.3.)\nNote that rebuilding or symlinking libRblas.so may not suffice if the intention is to use a modified LAPACK contained in an external BLAS: the latter could even cause conflicts. However, on Fedora where the OpenBLAS distribution contains a copy of LAPACK, it is the latter which is used.\n\n\nA.3.6 LAPACK\nIf when configuring R a system LAPACK library is found of version 3.10.0 or later (and does not contain BLAS routines) it will be used instead of compiling the LAPACK code in the package sources. This can be prevented by configuring R with --without-lapack. Using a static liblapack.a is not supported.\nIt is assumed that -llapack is the reference LAPACK library but on Debian/Ubuntu it can be switched, including after R is installed. On such a platform it is better to use --without-lapack or --with-blas --with-lapack (see below) explicitly. The known examples28 of a non-reference LAPACK library found at installation all contain BLAS routines so are not used by a default configure run.28 ATLAS, OpenBLAS and Accelerate.\nProvision is made for specifying an external LAPACK library with option --with-lapack, principally to cope with BLAS libraries which contain a copy of LAPACK (such as Accelerate on macOS and some builds of ATLAS, FlexiBLAS, MKL and OpenBLAS on ix86/x86_64 Linux). At least LAPACK version 3.2 is required. This can only be done if --with-blas has been used.\nHowever, the likely performance gains are thought to be small (and may be negative). The default is not to search for a suitable LAPACK library, and this is definitely not recommended. 
You can specify a specific LAPACK library or a search for a generic library by the configuration option --with-lapack without a value. The default for --with-lapack is to check the BLAS library (for function DPSTRF) and then look for an external library -llapack. Sites searching for the fastest possible linear algebra may want to build a LAPACK library using the ATLAS-optimized subset of LAPACK. Similarly, OpenBLAS can be built to contain an optimized subset of LAPACK or a full LAPACK (the latter seeming to be the default).\nA value for --with-lapack can be set via the environment variable LAPACK_LIBS, but this will only be used if --with-lapack is specified and the BLAS library does not contain LAPACK.\nPlease bear in mind that using --with-lapack is provided only because it is necessary on some platforms and because some users want to experiment with claimed performance improvements. In practice its main uses are without a value,\n\nwith an ‘enhanced’ BLAS such as ATLAS, FlexiBLAS, MKL or OpenBLAS which contains a full LAPACK (to avoid possible conflicts), or\non Debian/Ubuntu systems to select the system liblapack which can be switched by the ‘alternatives’ mechanism.\n\nIf building LAPACK from its Netlib sources, be aware that make with its supplied Makefile will make a static library and R requires a shared/dynamic one. To get one, use cmake as documented briefly in README.md. 
Something like (to build only the double and double complex subroutines with 32-bit array indices),\nmkdir build\ncd build\ncmake \\\n-DCMAKE_INSTALL_PREFIX=/where/you/want/to/install \\\n-DCMAKE_BUILD_TYPE:STRING=Release \\\n-DBUILD_DEPRECATED=ON -DBUILD_SHARED_LIBS=ON \\\n-DBUILD_INDEX64_EXT_API:BOOL=OFF \\\n-DBUILD_SINGLE:BOOL=OFF -DBUILD_COMPLEX:BOOL=OFF \\\n-DLAPACKE=OFF -DCBLAS=OFF \\\n-S ..\nmake -j10\nThis builds the reference BLAS and the reference LAPACK linked to it.\nNote that cmake files do not provide an uninstall target, but build/install_manifest.txt is a list of the files installed, so you can remove them via shell commands or from R.\nIf using --with-lapack to get a generic LAPACK (or allowing the default to select one), consider also using --with-blas (with a path if an enhanced BLAS is installed).\n\n\nA.3.7 Caveats\nAs with all libraries, you need to ensure that they and R were compiled with compatible compilers and flags. For example, this has meant that on Sun Sparc using the Oracle compilers the flag -dalign is needed if sunperf is to be used.\nOn some systems it has been necessary that an external BLAS/LAPACK was built with the same Fortran compiler used to build R.\nBLAS and LAPACK libraries built with recent versions of gfortran require calls from C/C++ to handle ‘hidden’ character lengths — R itself does so but many packages used not to and some have segfaulted. This was largely circumvented by using the Fortran flag -fno-optimize-sibling-calls (formerly set by configure if it detected gfortran 7 or later): however use of the R headers which include those character-length arguments is no longer optional in packages.\nLAPACK 3.9.0 (and probably earlier) had a bug in which the DCOMBSSQ subroutine may cause NA to be interpreted as zero. This is fixed in the R 3.6.3 and later sources, but if you use an external LAPACK, you may need to fix it there. 
(The bug was corrected in 3.9.1 and the routine removed in 3.10.1.)\nThe code (in dlapack.f) should read\n* ..\n* .. Executable Statements ..\n*\n IF( V1( 1 ).GE.V2( 1 ) ) THEN\n IF( V1( 1 ).NE.ZERO ) THEN\n V1( 2 ) = V1( 2 ) + ( V2( 1 ) / V1( 1 ) )**2 * V2( 2 )\n ELSE\n V1( 2 ) = V1( 2 ) + V2( 2 )\n END IF\n ELSE\n V1( 2 ) = V2( 2 ) + ( V1( 1 ) / V2( 1 ) )**2 * V1( 2 )\n V1( 1 ) = V2( 1 )\n END IF\n RETURN\n(The inner ELSE clause was missing in LAPACK 3.9.0.)\nIf you do use an external LAPACK, be aware of potential problems with other bugs in the LAPACK sources (or in the posted corrections to those sources), seen several times in Linux distributions over the years. We have even seen distributions with missing LAPACK routines from their liblapack.\nWe rely on limited support in LAPACK for matrices with 2^{31} or more elements: it is possible that an external LAPACK will not have that support.\nFootnotes"
},
{
"objectID": "Configuration-on-a-Unix-alike.html#configuration-options",
@@ -305,7 +305,7 @@
"href": "Configuration-on-a-Unix-alike.html#maintainer-mode",
"title": "Appendix B — Configuration on a Unix-alike",
"section": "B.8 Maintainer mode",
- "text": "B.8 Maintainer mode\nThere are several files that are part of the R sources but can be re-generated from their own sources by configuring with option --enable-maintainer-mode and then running make in the build directory. This requires other tools to be installed, discussed in the rest of this section.\nFile configure is created from configure.ac and the files under m4 by autoconf and aclocal (part of the automake package). There is a formal version requirement on autoconf of 2.69 or later, but it is unlikely that anything other than the most recent versions2 have been thoroughly tested.2 at the time of revision of this para in late 2021, autoconf-2.71 and automake-1.16.5.\nFile src/include/config.h is created by autoheader (part of autoconf).\nGrammar files *.y are converted to C sources by an implementation of yacc, usually bison -y: these are found in src/main and src/library/tools/src. It is known that earlier versions of bison generate code which reads (and in some cases writes) outside array bounds: bison 2.6.1 was found to be satisfactory.\nThe ultimate sources for package compiler are in its noweb directory. To re-create the sources from src/library/compiler/noweb/compiler.nw, the command notangle is required. Some Linux distributions include this command in package noweb. It can also be installed from the sources at https://www.cs.tufts.edu/~nr/noweb/3. The package sources are only re-created even in maintainer mode if src/library/compiler/noweb/compiler.nw has been updated.3 The links there have proved difficult to access, in which case grab the copy made available at https://developer.r-project.org/noweb-2.11b.tgz.\nFootnotes"
+ "text": "B.8 Maintainer mode\nThere are several files that are part of the R sources but can be re-generated from their own sources by configuring with option --enable-maintainer-mode and then running make in the build directory. This requires other tools to be installed, discussed in the rest of this section.\nFile configure is created from configure.ac and the files under m4 by autoconf and aclocal (part of the automake package). There is a formal version requirement on autoconf of 2.71 or later, but it is unlikely that anything other than the most recent versions2 have been thoroughly tested.2 at the time of revision of this para in late 2021, autoconf-2.71 and automake-1.16.5. Subsequently autoconf-2.72 has been tested.\nFile src/include/config.h is created by autoheader (part of autoconf).\nGrammar files *.y are converted to C sources by an implementation of yacc, usually bison -y: these are found in src/main and src/library/tools/src. It is known that earlier versions of bison generate code which reads (and in some cases writes) outside array bounds: bison 2.6.1 was found to be satisfactory.\nThe ultimate sources for package compiler are in its noweb directory. To re-create the sources from src/library/compiler/noweb/compiler.nw, the command notangle is required. Some Linux distributions include this command in package noweb. It can also be installed from the sources at https://www.cs.tufts.edu/~nr/noweb/3. The package sources are only re-created even in maintainer mode if src/library/compiler/noweb/compiler.nw has been updated.3 The links there have proved difficult to access, in which case grab the copy made available at https://developer.r-project.org/noweb-2.11b.tgz.\nFootnotes"
},
{
"objectID": "Platform-notes.html#x11-issues",
@@ -319,7 +319,7 @@
"href": "Platform-notes.html#linux",
"title": "Appendix C — Platform notes",
"section": "C.2 Linux",
- "text": "C.2 Linux\nLinux is the main development platform for R, so compilation from the sources is normally straightforward with the most common compilers and libraries.33 For example, glibc: other C libraries such as musl (as used by Alpine Linux) have been used but are not routinely tested.\nThis section is about the GCC compilers: gcc/gfortran/g++.\nRecall that some package management systems (such as RPM and deb) make a distinction between the user version of a package and the developer version. The latter usually has the same name but with the extension -devel or -dev: you need both versions installed. So please check the configure output to see if the expected features are detected: if for example readline is missing add the developer package. (On most systems you will also need ncurses and its developer package, although these should be dependencies of the readline package(s).) You should expect to see in the configure summary\n Interfaces supported: X11, tcltk\n External libraries: pcre2, readline, curl\n Additional capabilities: PNG, JPEG, TIFF, NLS, cairo, ICU\nWhen R has been installed from a binary distribution there are sometimes problems with missing components such as the Fortran compiler. Searching the R-help archives will normally reveal what is needed.\nIt seems that ix86 Linux accepts non-PIC code in shared libraries, but this is not necessarily so on other platforms, in particular on 64-bit CPUs such as x86_64. So care can be needed with BLAS libraries and when building R as a shared library to ensure that position-independent code is used in any static libraries (such as the Tcl/Tk libraries, libpng, libjpeg and zlib) which might be linked against. Fortunately these are normally built as shared libraries with the exception of the ATLAS BLAS libraries.\nThe default optimization settings chosen for CFLAGS etc are conservative. 
It is likely that using -mtune will result in significant performance improvements on recent CPUs: one possibility is to add -mtune=native for the best possible performance on the machine on which R is being installed. It is also possible to increase the optimization levels to -O3: however for many versions of the compilers this has caused problems in at least one CRAN package.\nDo not use -O3 with gcc 11.0 or 11.1: it mis-compiles code resulting in plausible but incorrect results. (This was seen in package MASS but has been worked around there as from version 3.1-57.)\nFor comments on ix86 builds (including 32-bit builds on x86_64) see the version of this manual for R 4.3.x.\nTo build a 64-bit version of R on ppc64 (also known as powerpc64) with gcc 4.1.1, Ei-ji Nakama used\nCC=\"gcc -m64\"\nCXX=\"g++ -m64\"\nFC=\"gfortran -m64\"\nCFLAGS=\"-mminimal-toc -fno-optimize-sibling-calls -g -O2\"\nFFLAGS=\"-mminimal-toc -fno-optimize-sibling-calls -g -O2\"\nthe additional flags being needed to resolve problems linking against libnmath.a and when linking R as a shared library.\nThe setting of the macro SAFE_FFLAGS may need some help. It should not need additional flags on platforms other than 68000 (not likely to be encountered) and ix86. For the latter, if the Fortran compiler is GNU (gfortran or possibly g77) the flags\n-msse2 -mfpmath=sse\nare added: earlier versions of R added -ffloat-store and this might still be needed if an ix86 CPU is encountered without SSE2 support. Note that it is a replacement for FFLAGS, so should include all the flags in that macro (except perhaps the optimization level).\nAdditional compilation flags can be specified for added safety/security checks. 
For example Fedora adds\n-Werror=format-security -Wp,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS\n-fexceptions -fstack-protector-strong -fasynchronous-unwind-tables\n-fstack-clash-protection -fcf-protection\nto all the C, C++ and Fortran compiler flags (even though _GLIBCXX_ASSERTIONS is only for C++ in current GCC and glibc and none of these are documented for gfortran). Use of _GLIBCXX_ASSERTIONS will link abort and printf into almost all C++ code, and R CMD check --as-cran will warn.\n\nC.2.1 Clang\nR has been built with Linux ix86 and x86_64 C and C++ compilers (https://clang.llvm.org) based on the Clang front-ends, invoked by CC=clang CXX=clang++, together with gfortran. These take very similar options to the corresponding GCC compilers.\nThis has to be used in conjunction with a Fortran compiler: the configure code will remove -lgcc from FLIBS, which is needed for some versions of gfortran.\nThe current out-of-the-box default for clang++ is to use the C++ runtime from the installed g++. Using the runtime from the libc++ project (Fedora RPM libcxx-devel) via -stdlib=libc++ has also been tested.\nRecent versions have (optional when built) OpenMP support.44 This also needs the OpenMP runtime which has sometimes been distributed separately.\nThere are problems mixing clang 15.0.0 and later built as default on Linux to produce PIE code and gfortran 11 or later, which does not. One symptom is that configure does not detect FC_LEN_T, which can be overcome by setting\nFPIEFLAGS=-fPIE\nin config.site. (As from R 4.2.2 configure tries that value if it is unset.)\n\n\nC.2.2 flang\nThe name flang has been used for two projects: this is about the sub-project of LLVM which builds a Fortran compiler and runtime libraries. 
The compiler is currently named flang-new but has been announced to be renamed to flang when more nearly complete (and at some earlier point in its development was known as f18).\nThe version in LLVM 16 was able to build R on x86_64 Linux with\nFC=/path/to/flang-new\nwith the matching clang used as the C compiler, and the build passed make check-all. There is also support for aarch64 and ppc64le Linux, but these have not been tested with R.\n\n\nC.2.3 Intel compilers\nIn late 2020 Intel revamped their C/C++ compilers (and later their Fortran compiler) to use an LLVM back-end (and for the C/C++ compilers, a modified version of clang as the front-end). Those compilers are only for x86_64: the earlier (now called ‘Classic’) C/C++ compilers were discontinued in late 2023 (and are covered in the version of this manual for R 4.3.x: the Fortran compiler ifort remains part of the Fortran distribution).\nThe compilers are now all under Intel’s ‘oneAPI’ brand. The revamped ones are icx, icpx and ifx; they are identified by the C/C++ macro __INTEL_LLVM_COMPILER (and do not define __INTEL_COMPILER: they also define __clang__ and __clang_major__).\nThe C++ compiler uses the system’s libstdc++ as its runtime library rather than LLVM’s libc++.\nStandalone installers (which are free-of-charge) are available from https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html: they are also part of the oneAPI Base and HPC (for Fortran) ToolKits.\nWe tried the compilers in oneAPI 2024.0.0 and 2023.x.y using (the paths do differ by compiler version)\nIP=/path/to/compilers/bin/\nCC=$IP/icx\nCXX=$IP/icpx\nFC=$IP/ifx\nCFLAGS=\"-O3 -fp-model precise -Wall -Wstrict-prototypes\"\nC17FLAGS=\"-O3 -fp-model precise -Wall -Wno-strict-prototypes\"\nFFLAGS=\"-O3 -fp-model precise -warn all,noexternals\"\nFCFLAGS=\"-free -O3 -fp-model precise -warn all,noexternals\"\nCXXFLAGS=\"-O3 -fp-model precise -Wall\"\nLDFLAGS=\"-L/path/to/compilers/compiler/lib 
-L/usr/local/lib64\"\nbut the build segfaulted in the checks (in complex arithmetic in tests/lapack.R).\nIntel document building R with MKL: for the Intel compilers this needed something like\nMKL_LIB_PATH=/path/to/intel_mkl/mkl/lib/intel64\nexport LD_LIBRARY_PATH=\"$MKL_LIB_PATH\"\nMKL=\"-L${MKL_LIB_PATH} -lmkl_intel_lp64 -lmkl_core -lmkl_sequential\"\n./configure --with-blas=\"$MKL\" --with-lapack\nand the build passed its checks with MKL 2023.2.0 (but not 2024.0 on the hardware tested). It may also be possible to use a compiler option like -qmkl=sequential.\nOne quirk is that the Intel Fortran compilers do not accept .f95 files, only .f90, for free-format Fortran. configure adds -Tf which tells the compiler this is indeed a Fortran file (and needs to immediately precede the file name), but -free is needed to say it is free-format. Hence setting the FCFLAGS macro.\nThe compilers have many options: as the C/C++ and Fortran compilers have different origins for their front-ends, there is little consistency in their options. (The C/C++ compilers support ‘all’ clang options even if undocumented for icx/icpc, such as -Wno-strict-prototypes above, However it is unclear for which version of clang: the Intel manual suggests checking icx -help.) The C/C++ compilers support clang-style LTO: it is not clear if the Fortran one does.\nFor some versions, including 2023.2.0, all CPU times in e.g. proc.time() are reported as zero. If you see this, uncomment the INTEL_ICX_FIX setting in config.site and re-build.\nThe preferred Fortran standard for ifx can be set by one of -std90, -std95, -std03, -std08 or -std18 (and variants). 
However, this is documented to only affect warnings on non-standard features: the default is no such warnings.\nWarning to package maintainers: the Intel Fortran compiler interprets comments intended for Visual Fortran5 like5 as the ‘Classic’ compiler has been known on Windows.\n!DEC$ ATTRIBUTES DLLEXPORT,C,REFERENCE,ALIAS:'kdenestmlcvb' :: kdenestmlcvb\nThe DLLEXPORT gives a warning but the remainder silently generates incorrectly named entry points. Such comment lines need to be removed from code for use with R (even if using Intel Fortran on Windows)."
+ "text": "C.2 Linux\nLinux is the main development platform for R, so compilation from the sources is normally straightforward with the most common compilers and libraries.33 For example, glibc: other C libraries such as musl (as used by Alpine Linux) have been used but are not routinely tested.\nThis section is about the GCC compilers: gcc/gfortran/g++.\nRecall that some package management systems (such as RPM and deb) make a distinction between the user version of a package and the developer version. The latter usually has the same name but with the extension -devel or -dev: you need both versions installed. So please check the configure output to see if the expected features are detected: if for example readline is missing add the developer package. (On most systems you will also need ncurses and its developer package, although these should be dependencies of the readline package(s).) You should expect to see in the configure summary\n Interfaces supported: X11, tcltk\n External libraries: pcre2, readline, curl\n Additional capabilities: PNG, JPEG, TIFF, NLS, cairo, ICU\nWhen R has been installed from a binary distribution there are sometimes problems with missing components such as the Fortran compiler. Searching the R-help archives will normally reveal what is needed.\nIt seems that ix86 Linux accepts non-PIC code in shared libraries, but this is not necessarily so on other platforms, in particular on 64-bit CPUs such as x86_64. So care can be needed with BLAS libraries and when building R as a shared library to ensure that position-independent code is used in any static libraries (such as the Tcl/Tk libraries, libpng, libjpeg and zlib) which might be linked against. Fortunately these are normally built as shared libraries with the exception of the ATLAS BLAS libraries.\nThe default optimization settings chosen for CFLAGS etc are conservative. 
It is likely that using -mtune will result in significant performance improvements on recent CPUs: one possibility is to add -mtune=native for the best possible performance on the machine on which R is being installed. It is also possible to increase the optimization levels to -O3: however for many versions of the compilers this has caused problems in at least one CRAN package.\nDo not use -O3 with gcc 11.0 or 11.1: it mis-compiles code resulting in plausible but incorrect results. (This was seen in package MASS but has been worked around there as from version 3.1-57.)\nFor comments on ix86 builds (including 32-bit builds on x86_64) see the version of this manual for R 4.3.x.\nTo build a 64-bit version of R on ppc64 (also known as powerpc64) with gcc 4.1.1, Ei-ji Nakama used\nCC=\"gcc -m64\"\nCXX=\"gxx -m64\"\nFC=\"gfortran -m64\"\nCFLAGS=\"-mminimal-toc -fno-optimize-sibling-calls -g -O2\"\nFFLAGS=\"-mminimal-toc -fno-optimize-sibling-calls -g -O2\"\nthe additional flags being needed to resolve problems linking against libnmath.a and when linking R as a shared library.\nThe setting of the macro SAFE_FFLAGS may need some help. It should not need additional flags on platforms other than 68000 (not likely to be encountered) and ix86. For the latter, if the Fortran compiler is GNU (gfortran or possibly g77) the flags\n-msse2 -mfpmath=sse\nare added: earlier versions of R added -ffloat-store and this might still be needed if an ix86 CPU is encountered without SSE2 support. Note that it is a replacement for FFLAGS, so should include all the flags in that macro (except perhaps the optimization level).\nAdditional compilation flags can be specified for added safety/security checks. 
For example Fedora adds\n-Werror=format-security -Wp,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS\n-fexceptions -fstack-protector-strong -fasynchronous-unwind-tables\n-fstack-clash-protection -fcf-protection\nto all the C, C++ and Fortran compiler flags (even though _GLIBCXX_ASSERTIONS is only for C++ in current GCC and glibc and none of these are documented for gfortran). Use of _GLIBCXX_ASSERTIONS will link abort and printf into almost all C++ code, and R CMD check --as-cran will warn.\n\nC.2.1 Clang\nR has been built with Linux ix86 and x86_64 C and C++ compilers (https://clang.llvm.org) based on the Clang front-ends, invoked by CC=clang CXX=clang++, together with gfortran. These take very similar options to the corresponding GCC compilers.\nThis has to be used in conjunction with a Fortran compiler: the configure code will remove -lgcc from FLIBS, which is needed for some versions of gfortran.\nThe current out-of-the-box default for clang++ is to use the C++ runtime from the installed g++. Using the runtime from the libc++ project (Fedora RPM libcxx-devel) via -stdlib=libc++ has also been tested.\nRecent versions have (optional when built) OpenMP support.44 This also needs the OpenMP runtime which has sometimes been distributed separately.\nThere are problems mixing clang 15.0.0 and later built as default on Linux to produce PIE code and gfortran 11 or later, which does not. One symptom is that configure does not detect FC_LEN_T, which can be overcome by setting\nFPIEFLAGS=-fPIE\nin config.site. (As from R 4.2.2 configure tries that value if it is unset.)\n\n\nC.2.2 flang\nThe name flang has been used for two projects: this is about the sub-project of LLVM which builds a Fortran compiler and runtime libraries. 
The compiler is currently named flang-new but has been announced to be renamed to flang when more nearly complete (and at some earlier point in its development was known as f18).\nThe version in LLVM 16 was able to build R on x86_64 Linux with\nFC=/path/to/flang-new\nwith the matching clang used as the C compiler, and the build passed make check-all. There is also support for aarch64 and ppc64le Linux, but these have not been tested with R.\n\n\nC.2.3 Intel compilers\nIn late 2020 Intel revamped their C/C++ compilers (and later their Fortran compiler) to use an LLVM back-end (and for the C/C++ compilers, a modified version of clang as the front-end). Those compilers are only for x86_64: the earlier (now called ‘Classic’) C/C++ compilers were discontinued in late 2023 (and are covered in the version of this manual for R 4.3.x: the Fortran compiler ifort remains part of the Fortran distribution).\nThe compilers are now all under Intel’s ‘oneAPI’ brand. The revamped ones are icx, icpx and ifx; they are identified by the C/C++ macro __INTEL_LLVM_COMPILER (and do not define __INTEL_COMPILER: they also define __clang__ and __clang_major__).\nThe C++ compiler uses the system’s libstdc++ as its runtime library rather than LLVM’s libc++.\nStandalone installers (which are free-of-charge) are available from https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html: they are also part of the oneAPI Base and HPC (for Fortran) ToolKits.\nWe tried the compilers in oneAPI 2024.0.2 and 2023.x.y using (the paths do differ by compiler version)\nIP=/path/to/compilers/bin/\nCC=$IP/icx\nCXX=$IP/icpx\nFC=$IP/ifx\nCFLAGS=\"-O3 -fp-model precise -Wall -Wstrict-prototypes\"\nC17FLAGS=\"-O3 -fp-model precise -Wall -Wno-strict-prototypes\"\nFFLAGS=\"-O3 -fp-model precise -warn all,noexternals\"\nFCFLAGS=\"-free -O3 -fp-model precise -warn all,noexternals\"\nCXXFLAGS=\"-O3 -fp-model precise -Wall\"\nLDFLAGS=\"-L/path/to/compilers/compiler/lib 
-L/usr/local/lib64\"\nbut the build segfaulted in the checks (in complex arithmetic in tests/lapack.R).\nIntel document building R with MKL: for the Intel compilers this needed something like\nMKL_LIB_PATH=/path/to/intel_mkl/mkl/lib/intel64\nexport LD_LIBRARY_PATH=\"$MKL_LIB_PATH\"\nMKL=\"-L${MKL_LIB_PATH} -lmkl_intel_lp64 -lmkl_core -lmkl_sequential\"\n./configure --with-blas=\"$MKL\" --with-lapack\nand the build passed its checks with MKL 2023.2.0 (but not 2024.0 on the hardware tested). It may also be possible to use a compiler option like -qmkl=sequential.\nOne quirk is that the Intel Fortran compilers do not accept .f95 files, only .f90, for free-format Fortran. configure adds -Tf which tells the compiler this is indeed a Fortran file (and needs to immediately precede the file name), but -free is needed to say it is free-format. Hence setting the FCFLAGS macro.\nThe compilers have many options: as the C/C++ and Fortran compilers have different origins for their front-ends, there is little consistency in their options. (The C/C++ compilers support ‘all’ clang options even if undocumented for icx/icpx, such as -Wno-strict-prototypes above. However, it is unclear for which version of clang: the Intel manual suggests checking icx -help.) The C/C++ compilers support clang-style LTO: it is not clear if the Fortran one does.\nFor some versions, including 2023.2.0, all CPU times in e.g. proc.time() are reported as zero. If you see this, uncomment the INTEL_ICX_FIX setting in config.site and re-build.\nThe preferred Fortran standard for ifx can be set by one of -std90, -std95, -std03, -std08 or -std18 (and variants). 
However, this is documented to only affect warnings on non-standard features: the default is no such warnings.\nWarning to package maintainers: the Intel Fortran compiler interprets comments intended for Visual Fortran5 like\n!DEC$ ATTRIBUTES DLLEXPORT,C,REFERENCE,ALIAS:'kdenestmlcvb' :: kdenestmlcvb\n5 as the ‘Classic’ compiler has been known on Windows.\nThe DLLEXPORT gives a warning but the remainder silently generates incorrectly named entry points. Such comment lines need to be removed from code for use with R (even if using Intel Fortran on Windows)."
},
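The flag choices discussed in C.2 can be collected in a config.site file read by configure. A minimal sketch, assuming gcc/gfortran on x86_64 Linux; the -mtune=native choice is the tuning option suggested above, not a default:

```
## config.site (sketch): conservative defaults plus CPU tuning.
## -mtune=native is the option discussed above; drop it for
## binaries that must also run on other machines.
CFLAGS="-g -O2 -mtune=native"
FFLAGS="-g -O2 -mtune=native"
CXXFLAGS="-g -O2 -mtune=native"
```

configure reads this file from the build directory (or the location named by the CONFIG_SITE environment variable), so the settings apply without cluttering the command line.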
{
"objectID": "Platform-notes.html#macos",
diff --git a/r-exts/Creating-R-packages.html b/r-exts/Creating-R-packages.html
index 609ac34..efaaea0 100644
--- a/r-exts/Creating-R-packages.html
+++ b/r-exts/Creating-R-packages.html
@@ -287,7 +287,7 @@ Table of contents
1.2.3 Using pthreads
1.2.4 Compiling in sub-directories
1.2.5 Configure example
- 1.2.6 Using F9x code
+ 1.2.6 Using modern Fortran code
1.2.7 Using C++ code
1.2.8 C standards
1.2.9 Using cmake
@@ -570,7 +570,7 @@ User-defined macros.) These use the Rd format, but may not contain anything but macro definitions, comments and whitespace.
The R
and man
subdirectories may contain OS-specific subdirectories named unix
or windows
.
-The sources and headers for the compiled code are in src
, plus optionally a file Makevars
or Makefile
(or for use on Windows, with extension .win
or .ucrt
). When a package is installed using R CMD INSTALL
, make
is used to control compilation and linking into a shared object for loading into R. There are default make
variables and rules for this (determined when R is configured and recorded in R_HOME/etcR_ARCH/Makeconf
), providing support for C, C++, fixed- or free-form Fortran, Objective C and Objective C++18 with associated extensions .c
, .cc
or .cpp
, .f
, .f90
or .f95
,19 .m
, and .mm
, respectively. We recommend using .h
for headers, also for C++20 or Fortran 9x include files. (Use of extension .C
for C++ is no longer supported.) Files in the src
directory should not be hidden (start with a dot), and hidden files will under some versions of R be ignored.
18 either or both of which may not be supported on particular platforms. Their main use is on macOS, but unfortunately recent versions of the macOS SDK have removed much of the support for Objective C v1.0 and Objective C++.
19 This is not accepted by the Intel Fortran compiler.
20 Using .hpp
is not guaranteed to be portable.
+The sources and headers for the compiled code are in src
, plus optionally a file Makevars
or Makefile
(or for use on Windows, with extension .win
or .ucrt
). When a package is installed using R CMD INSTALL
, make
is used to control compilation and linking into a shared object for loading into R. There are default make
variables and rules for this (determined when R is configured and recorded in R_HOME/etcR_ARCH/Makeconf
), providing support for C, C++, fixed- or free-form Fortran, Objective C and Objective C++18 with associated extensions .c
, .cc
or .cpp
, .f
, .f90
or .f95
,19 .m
, and .mm
, respectively. We recommend using .h
for headers, also for C++20 or Fortran include files. (Use of extension .C
for C++ is no longer supported.) Files in the src
directory should not be hidden (start with a dot), and hidden files will under some versions of R be ignored.
18 either or both of which may not be supported on particular platforms. Their main use is on macOS, but unfortunately recent versions of the macOS SDK have removed much of the support for Objective C v1.0 and Objective C++.
19 This is not accepted by the Intel Fortran compiler.
20 Using .hpp
is not guaranteed to be portable.
It is not portable (and may not be possible at all) to mix all these languages in a single package. Because R itself uses it, we know that C and fixed-form Fortran can be used together, and mixing C, C++ and Fortran usually works for the platform’s native compilers.
If your code needs to depend on the platform there are certain defines which can be used in C or C++. On all Windows builds (even 64-bit ones) _WIN32
will be defined: on 64-bit Windows builds also _WIN64
. On macOS __APPLE__
is defined21; for an ‘Apple Silicon’ platform, test for both __APPLE__
and __arm64__
.
21 There is also __APPLE_CC__
, but that indicates a compiler with Apple-specific features, not the OS, although for historical reasons it is defined by LLVM clang
. It is used in Rinlinedfuns.h
.
The default rules can be tweaked by setting macros22 in a file src/Makevars
(see Using Makevars
). Note that this mechanism should be general enough to eliminate the need for a package-specific src/Makefile
. If such a file is to be distributed, considerable care is needed to make it general enough to work on all R platforms. If it has any targets at all, it should have an appropriate first target named all
and a (possibly empty) target clean
which removes all files generated by running make
(to be used by R CMD INSTALL --clean
and R CMD INSTALL --preclean
). There are platform-specific file names on Windows: src/Makevars.win
takes precedence over src/Makevars
and src/Makefile.win
must be used. Since R 4.2.0, src/Makevars.ucrt
takes precedence over src/Makevars.win
and src/Makefile.ucrt
takes precedence over src/Makefile.win
. src/Makevars.ucrt
and src/Makefile.ucrt
will be ignored by earlier versions of R, and hence can be used to provide content specific to UCRT or Rtools42 and newer, but the support for .ucrt
files may be removed in the future when building packages from source on the older versions of R will no longer be needed, and hence the files may be renamed back to .win
. Some make
programs require makefiles to have a complete final line, including a newline.
22 the POSIX terminology, called ‘make variables’ by GNU make.
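For instance, a small src/Makevars tweaking the default rules might look like the following. This is only a sketch: the -DUSE_FEATURE_X define and the extra library are hypothetical, while PKG_CPPFLAGS and PKG_LIBS are the standard macros consulted by R's built-in rules:

```
# src/Makevars (sketch): package-specific flags without a Makefile.
# Hypothetical define and library, for illustration only.
PKG_CPPFLAGS = -I. -DUSE_FEATURE_X
PKG_LIBS = -lm
```

Because only macros are set, the default all and clean behaviour of R's rules is preserved, avoiding the portability pitfalls of a full src/Makefile described above.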
@@ -1085,32 +1085,33 @@
RODBC
or by setting the environment variables ODBC_INCLUDE
and ODBC_LIBS
.
R assumes that source files with extension .f
are fixed-form Fortran 90 (which includes Fortran 77), and passes them to the compiler specified by macro FC
. The Fortran compiler will also accept free-form Fortran 90/95 code with extension .f90
or .f95
.
R assumes that source files with extension .f
are fixed-form Fortran 90 (which includes Fortran 77), and passes them to the compiler specified by macro FC
. The Fortran compiler will also accept free-form Fortran 90/95 code with extension .f90
or (most46) .f95
.
46 Intel compilers do not by default but this is worked around when using packages without a src/Makefile
.
The same compiler is used for both fixed-form and free-form Fortran code (with different file extensions and possibly different flags). Macro PKG_FFLAGS
can be used for package-specific flags: for the un-encountered case that both are included in a single package and that different flags are needed for the two forms, macro PKG_FCFLAGS
is also available for free-form Fortran.
The code used to build R allows a ‘Fortran 90’ compiler to be selected as FC
, so platforms might be encountered which only support Fortran 90. However, Fortran 95 is supported on all known platforms.
Most compilers specified by FC
will accept Fortran 2003, 2008 or 2018 code: such code should still use file extension .f90
or .f95
. Almost all current platforms use gfortran
where you may need to include -std=f2003
, -std=f2008
or (from version 8) -std=f2018
in PKG_FFLAGS
or PKG_FCFLAGS
: the default is ‘GNU Fortran’, currently Fortran 2018 (but Fortran 95 prior to gfortran
8) with non-standard extensions. Intel Fortran had full Fortran 2008 support from version 17.0, and some 2018 support in version 16.0 and more in version 19.0. It is good practice to describe the requirement in DESCRIPTION
’s SystemRequirements
field.
Most compilers specified by FC
will accept Fortran 2003, 2008 or 2018 code: such code should still use file extension .f90
. Most current platforms use gfortran
where you might need to include -std=f2003
, -std=f2008
or (from version 8) -std=f2018
in PKG_FFLAGS
or PKG_FCFLAGS
: the default is ‘GNU Fortran’, currently Fortran 2018 (but Fortran 95 prior to gfortran
8) with non-standard extensions. The other compilers in current use (LLVM’s flang-new
and Intel’s ifx
) default to Fortran 2018.
It is good practice to describe a Fortran version requirement in DESCRIPTION
’s SystemRequirements
field.
Modern versions of Fortran support modules, whereby compiling one source file creates a module file which is then included in others. (Module files typically have a .mod
extension: they do depend on the compiler used and so should never be included in a package.) This creates a dependence which make
will not know about and often causes installation with a parallel make to fail. Thus it is necessary to add explicit dependencies to src/Makevars
to tell make
the constraints on the order of compilation. For example, if file iface.f90
creates a module iface
used by files cmi.f90
and dmi.f90
then src/Makevars
needs to contain something like
cmi.o dmi.o: iface.o
Note that it is not portable (although some platforms do accept it) to define a module of the same name in multiple source files.
R can be built without a C++ compiler although one is available (but not necessarily installed) on all known R platforms. As from R 4.0.0 a C++ compiler will be selected only if it conforms to the 2011 standard (‘C++11’). A minor update46 (‘C++14’) was published in December 2014 and was used by default as from R 4.1.0 if supported. Further revisions ‘C++17’ (in December 2017) and ‘C++20’ (with many new features in December 2020) have been published since. The next revision, ‘C++23’, is expected in 2023 and several compilers already have extensive partial support for the current drafts.
46 Some changes are linked from https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations: there were also additional deprecations.
R can be built without a C++ compiler although one is available (but not necessarily installed) on all known R platforms. As from R 4.0.0 a C++ compiler will be selected only if it conforms to the 2011 standard (‘C++11’). A minor update47 (‘C++14’) was published in December 2014 and was used by default as from R 4.1.0 if supported. Further revisions ‘C++17’ (in December 2017) and ‘C++20’ (with many new features in December 2020) have been published since. The next revision, ‘C++23’, is expected in 2023/4 and several compilers already have extensive partial support for the current drafts.
47 Some changes are linked from https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations: there were also additional deprecations.
The default standard for compiling R packages was changed to C++17 in R 4.3.0 if supported (and for rather old compilers, C++14 or even C++11 would be used as the default).
-What standard a C++ compiler aims to support can be hard to determine: the value47 of __cplusplus
may help but some compilers use it to denote a standard which is partially supported and some the latest standard which is (almost) fully supported. On a Unix-alike configure
will try to identify a compiler and flags for each of the standards: this relies heavily on the reported values of __cplusplus
.
47 Values 201103L
, 201402L
, 201703L
and 202002L
are most commonly used for C++11, C++14, C++17 and C++20 respectively, but some compilers set 1L
. For C++23 all that can currently be assumed is a value greater than that for C++20: for example g++
12 uses 202100L
and clang++
(LLVM 15, Apple 14) uses 202101L
.
What standard a C++ compiler aims to support can be hard to determine: the value48 of __cplusplus
may help but some compilers use it to denote a standard which is partially supported and some the latest standard which is (almost) fully supported. On a Unix-alike configure
will try to identify a compiler and flags for each of the standards: this relies heavily on the reported values of __cplusplus
.
48 Values 201103L
, 201402L
, 201703L
and 202002L
are most commonly used for C++11, C++14, C++17 and C++20 respectively, but some compilers set 1L
. For C++23 all that can currently be assumed is a value greater than that for C++20: for example g++
12 uses 202100L
and clang++
(LLVM 15, Apple 14) uses 202101L
.
The webpage https://en.cppreference.com/w/cpp/compiler_support gives some information on which compilers are known to support recent C++ features.
C++ standards have deprecated and later removed features. Be aware that some current compilers still accept removed features in C++17 mode, such as std::unary_function
(deprecated in C++11, removed in C++17).
Different versions of R have used different default C++ standards, so for maximal portability a package should specify the standard it requires. In order to specify C++14 code in a package with a Makevars
file (or Makevars.win
or Makevars.ucrt
on Windows) should include the line
CXX_STD = CXX14
Compilation and linking will then be done with the C++14 compiler (if any). Analogously for other standards (details below). On the other hand, specifying C++1148 when the code is valid under C++14 or C++17 reduces future portability.
48 Often historically used to mean ‘not C++98’
Compilation and linking will then be done with the C++14 compiler (if any). Analogously for other standards (details below). On the other hand, specifying C++1149 when the code is valid under C++14 or C++17 reduces future portability.
49 Often historically used to mean ‘not C++98’
Packages without a src/Makevars
or src/Makefile
file may specify a C++ standard for code in the src
directory by including something like C++14
in the SystemRequirements
field of the DESCRIPTION
file, e.g.
SystemRequirements: C++14
If a package does have a src/Makevars[.win]
file then also setting the make variable CXX_STD
there is recommended, as it allows R CMD SHLIB
to work correctly in the package’s src
directory.
A requirement of C++17 or later should always be declared in the SystemRequirements
field (as well as in src/Makevars
or src/Makefile
) so this is shown on the package’s summary pages on CRAN or similar. This is also good practice for a requirement of C++14. Note that support of C++14 or C++17 is only available from R 3.4.0, so if the package has an R version requirement it needs to take that into account.
Essentially complete C++14 support is available from GCC 5, LLVM clang
3.4 and currently-used versions of Apple clang
(10.0.0 for High Sierra).
Code needing C++14 features can check for their presence via ‘SD-6 feature tests’49. Such a check could be
49 See https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations or https://en.cppreference.com/w/cpp/experimental/feature_test. It seems a reasonable assumption that any compiler promising some C++14 conformance will provide these—e.g. g++
4.9.x did but 4.8.5 did not.
Essentially complete C++14 support is available from GCC 5, LLVM clang
3.4 and currently-used versions of Apple clang
.
Code needing C++14 features can check for their presence via ‘SD-6 feature tests’50. Such a check could be
50 See https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations or https://en.cppreference.com/w/cpp/experimental/feature_test. It seems a reasonable assumption that any compiler promising some C++14 conformance will provide these—e.g. g++
4.9.x did but 4.8.5 did not.
#include <memory> // header where this is defined
#if defined(__cpp_lib_make_unique) && (__cpp_lib_make_unique >= 201304)
using std::make_unique;
@@ -1160,7 +1161,7 @@ using std
1.2.8 C standards
@@ -1247,7 +1248,7 @@
-Note: R CMD check
and R CMD build
run R processes with --vanilla
in which none of the user’s startup files are read. If you need R_LIBS
set (to find packages in a non-standard library) you can set it in the environment: also you can use the check and build environment files (as specified by the environment variables R_CHECK_ENVIRON
and R_BUILD_ENVIRON
; if unset, files50 ~/.R/check.Renviron
and ~/.R/build.Renviron
are used) to set environment variables when using these utilities.
50 On systems which use sub-architectures, architecture-specific versions such as ~/.R/check.Renviron.x64
take precedence.
+Note: R CMD check
and R CMD build
run R processes with --vanilla
in which none of the user’s startup files are read. If you need R_LIBS
set (to find packages in a non-standard library) you can set it in the environment: also you can use the check and build environment files (as specified by the environment variables R_CHECK_ENVIRON
and R_BUILD_ENVIRON
; if unset, files51 ~/.R/check.Renviron
and ~/.R/build.Renviron
are used) to set environment variables when using these utilities.
51 On systems which use sub-architectures, architecture-specific versions such as ~/.R/check.Renviron.x64
take precedence.
Note to Windows users: R CMD build
may make use of the Windows toolset (see the “R Installation and Administration” manual) if present and in your path, and it is required for packages which need it to install (including those with configure.win
, cleanup.win
, configure.ucrt
or cleanup.ucrt
scripts or a src
directory) and e.g. need vignettes built.
@@ -1262,7 +1263,7 @@ 51. (There may be rare false positives.)
+
The files are checked for binary executables, using a suitable version of file
if available52. (There may be rare false positives.)
The DESCRIPTION
file is checked for completeness, and some of its entries for correctness. Unless installation tests are skipped, checking is aborted if the package dependencies cannot be resolved at run time. (You may need to set R_LIBS
in the environment if dependent packages are in a separate library tree.) One check is that the package name is not that of a standard package, nor one of the defunct standard packages (ctest
, eda
, lqs
, mle
, modreg
, mva
, nls
, stepfun
and ts
). Another check is that all packages mentioned in library
or require
s or from which the NAMESPACE
file imports or are called via ::
or :::
are listed (in Depends
, Imports
, Suggests
): this is not an exhaustive check of the actual imports.
Available index information (in particular, for demos and vignettes) is checked for completeness.
The package subdirectories are checked for suitable file names and for not being empty. The checks on file names are controlled by the option --check-subdirs=value
. This defaults to default
, which runs the checks only if checking a tarball: the default can be overridden by specifying the value as yes
or no
. Further, the check on the src
directory is only run if the package does not contain a configure
script (which corresponds to the value yes-maybe
) and there is no src/Makefile
or src/Makefile.in
.
@@ -1276,19 +1277,19 @@ 52 are tested for portable (LF-only) line endings. If there is a Makefile
or Makefile.in
or Makevars
or Makevars.in
file under the src
directory, it is checked for portable line endings and the correct use of $(BLAS_LIBS)
and $(LAPACK_LIBS)
-
Compiled code is checked for symbols corresponding to functions which might terminate R or write to stdout
/stderr
instead of the console. Note that the latter might give false positives in that the symbols might be pulled in with external libraries and could never be called. Windows53 users should note that the Fortran and C++ runtime libraries are examples of such external libraries.
+C, C++ and Fortran source and header files53 are tested for portable (LF-only) line endings. If there is a Makefile
or Makefile.in
or Makevars
or Makevars.in
file under the src
directory, it is checked for portable line endings and the correct use of $(BLAS_LIBS)
and $(LAPACK_LIBS)
+Compiled code is checked for symbols corresponding to functions which might terminate R or write to stdout
/stderr
instead of the console. Note that the latter might give false positives in that the symbols might be pulled in with external libraries and could never be called. Windows54 users should note that the Fortran and C++ runtime libraries are examples of such external libraries.
Some checks are made of the contents of the inst/doc
directory. These always include checking for files that look like leftovers, and if suitable tools (such as qpdf
) are available, checking that the PDF documentation is of minimal size.
The examples provided by the package’s documentation are run. (see Writing R documentation files, for information on using \examples
to create executable example code.) If there is a file tests/Examples/pkg-Ex.Rout.save
, the output of running the examples is compared to that file.
Of course, released packages should be able to run at least their own examples. Each example is run in a ‘clean’ environment (so earlier examples cannot be assumed to have been run), and with the variables T
and F
redefined to generate an error unless they are set in the example: See section ‘Logical vectors’ in the ‘An Introduction to R’ manual for more information.
If the package sources contain a tests
directory then the tests specified in that directory are run. (Typically they will consist of a set of .R
source files and target output files .Rout.save
.) Please note that the comparison will be done in the end user’s locale, so the target output files should be ASCII if at all possible. (The command line option --test-dir=foo
may be used to specify tests in a non-standard location. For example, unusually slow tests could be placed in inst/slowTests
and then R CMD check --test-dir=inst/slowTests
would be used to run them. Other names that have been suggested are, for example, inst/testWithOracle
for tests that require Oracle to be installed, inst/randomTests
for tests which use random values and may occasionally fail by chance, etc.)
The R code in package vignettes (see Writing package vignettes) is executed, and the vignettes re-made from their sources as a check of completeness of the sources (unless there is a BuildVignettes
field in the package’s DESCRIPTION
file with a false value). If there is a target output file .Rout.save
in the vignette source directory, the output from running the code in that vignette is compared with the target output file and any differences are reported (but not recorded in the log file). (If the vignette sources are in the deprecated location inst/doc
, do mark such target output files to not be installed in .Rinstignore
.)
-If there is an error54 in executing the R code in vignette foo.ext
, a log file foo.ext.log
is created in the check directory. The vignettes are re-made in a copy of the package sources in the vign_test
subdirectory of the check directory, so for further information on errors look in directory pkgname/vign_test/vignettes
. (It is only retained if there are errors or if environment variable _R_CHECK_CLEAN_VIGN_TEST_
is set to a false value.)
+If there is an error55 in executing the R code in vignette foo.ext
, a log file foo.ext.log
is created in the check directory. The vignettes are re-made in a copy of the package sources in the vign_test
subdirectory of the check directory, so for further information on errors look in directory pkgname/vign_test/vignettes
. (It is only retained if there are errors or if environment variable _R_CHECK_CLEAN_VIGN_TEST_
is set to a false value.)
The PDF version of the package’s manual is created (to check that the Rd
files can be converted successfully). This needs LaTeX and suitable fonts and LaTeX packages to be installed. See the section ‘Making the manuals’ in the ‘R Installation and Administration’ manual for further details.
-Optionally (including by R CMD check --as-cran
) the HTML version of the manual is created and checked for compliance with the HTML5 standard. This requires a recent version55 of ‘HTML Tidy’, either on the path or at a location specified by environment variable R_TIDYCMD
. Up-to-date versions can be installed from http://binaries.html-tidy.org/.
+Optionally (including by R CMD check --as-cran
) the HTML version of the manual is created and checked for compliance with the HTML5 standard. This requires a recent version56 of ‘HTML Tidy’, either on the path or at a location specified by environment variable R_TIDYCMD
. Up-to-date versions can be installed from http://binaries.html-tidy.org/.
-51 A suitable file.exe
is part of the Windows toolset: it checks for gfile
if a suitable file
is not found: the latter is available in the OpenCSW collection for Solaris at https://www.opencsw.org/. The source repository is http://ftp.astron.com/pub/file/.
52 An exception is made for subdirectories with names starting win
or Win
.
53 on most other platforms such runtime libraries are dynamic, but static libraries are currently used on Windows because the toolchain is not a standard part of the OS.
54 or if option --use-valgrind
is used or environment variable _R_CHECK_ALWAYS_LOG_VIGNETTE_OUTPUT_
is set to a true value or if there are differences from a target output file
55 for the most comprehensive checking this should be 5.8.0 or later: any for which tidy --version
does not report a version number will be too old – this includes the 2006 version shipped with macOS.
All these tests are run with collation set to the C
locale, and for the examples and tests with environment variable LANGUAGE=en
: this is to minimize differences between platforms.
-Use R CMD check --help to obtain more information about the usage of the R package checker. A subset of the checking steps can be selected by adding command-line options. It also allows customization by setting environment variables _R_CHECK_*_
as described in section ‘Tools’ in the ‘R Internals’ manual: a set of these customizations similar to those used by CRAN can be selected by the option --as-cran
(which works best if Internet access is available). Some Windows users may need to set environment variable R_WIN_NO_JUNCTIONS
to a non-empty value. The test of cyclic declarations56 in DESCRIPTION
files needs repositories (including CRAN) set: do this in ~/.Rprofile
, by e.g.
+52 A suitable file.exe
is part of the Windows toolset: it checks for gfile
if a suitable file
is not found: the latter is available in the OpenCSW collection for Solaris at https://www.opencsw.org/. The source repository is http://ftp.astron.com/pub/file/.
53 An exception is made for subdirectories with names starting win
or Win
.
54 on most other platforms such runtime libraries are dynamic, but static libraries are currently used on Windows because the toolchain is not a standard part of the OS.
55 or if option --use-valgrind
is used or environment variable _R_CHECK_ALWAYS_LOG_VIGNETTE_OUTPUT_
is set to a true value or if there are differences from a target output file
56 for the most comprehensive checking this should be 5.8.0 or later: any for which tidy --version
does not report a version number will be too old – this includes the 2006 version shipped with macOS.
All these tests are run with collation set to the C
locale, and for the examples and tests with environment variable LANGUAGE=en
: this is to minimize differences between platforms.
+Use R CMD check --help to obtain more information about the usage of the R package checker. A subset of the checking steps can be selected by adding command-line options. It also allows customization by setting environment variables _R_CHECK_*_
as described in section ‘Tools’ in the ‘R Internals’ manual: a set of these customizations similar to those used by CRAN can be selected by the option --as-cran
(which works best if Internet access is available). Some Windows users may need to set environment variable R_WIN_NO_JUNCTIONS
to a non-empty value. The test of cyclic declarations57 in DESCRIPTION
files needs repositories (including CRAN) set: do this in ~/.Rprofile
, by e.g.
options(repos = c(CRAN="https://cran.r-project.org"))
One check customization which can be revealing is
_R_CHECK_CODETOOLS_PROFILE_="suppressLocalUnused=FALSE"
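Such _R_CHECK_*_ settings can be given on the command line or kept in a file; a sketch, assuming (as for other check customizations) that R CMD check consults ~/.R/check.Renviron:

```
## ~/.R/check.Renviron -- read by R CMD check (assumption noted above)
_R_CHECK_CODETOOLS_PROFILE_=suppressLocalUnused=FALSE
_R_CHECK_CLEAN_VIGN_TEST_=false
```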
@@ -1303,7 +1304,7 @@ To exclude files from being put into the package, one can specify a list of exclude patterns in file .Rbuildignore
in the top-level source directory. These patterns should be Perl-like regular expressions (see the help for regexp
in R for the precise details), one per line, to be matched case-insensitively against the file and directory names relative to the top-level package source directory. In addition, directories from source control systems57 or from eclipse
58, directories with names check
, chm
, or ending .Rcheck
or Old
or old
and files GNUMakefile
59, Read-and-delete-me
or with base names starting with .#
, or starting and ending with #
, or ending in ~
, .bak
or .swp
, are excluded by default60. In addition, same-package tarballs (from previous builds) and their binary forms will be excluded from the top-level directory, as well as those files in the R
, demo
and man
directories which are flagged by R CMD check
as having invalid names.
57 called CVS
or .svn
or .arch-ids
or .bzr
or .git
(but not files called .git
) or .hg
.
58 called .metadata
.
59 which is an error: GNU make uses GNUmakefile
.
60 see tools:::.hidden_file_exclusions
and tools:::get_exclude_patterns()
for further excluded files and file patterns, respectively.
To exclude files from being put into the package, one can specify a list of exclude patterns in file .Rbuildignore
in the top-level source directory. These patterns should be Perl-like regular expressions (see the help for regexp
in R for the precise details), one per line, to be matched case-insensitively against the file and directory names relative to the top-level package source directory. In addition, directories from source control systems58 or from eclipse
59, directories with names check
, chm
, or ending .Rcheck
or Old
or old
and files GNUMakefile
60, Read-and-delete-me
or with base names starting with .#
, or starting and ending with #
, or ending in ~
, .bak
or .swp
, are excluded by default61. In addition, same-package tarballs (from previous builds) and their binary forms will be excluded from the top-level directory, as well as those files in the R
, demo
and man
directories which are flagged by R CMD check
as having invalid names.
58 called CVS
or .svn
or .arch-ids
or .bzr
or .git
(but not files called .git
) or .hg
.
59 called .metadata
.
60 which is an error: GNU make uses GNUmakefile
.
61 see tools:::.hidden_file_exclusions
and tools:::get_exclude_patterns()
for further excluded files and file patterns, respectively.
Use R CMD build --help to obtain more information about the usage of the R package builder.
Unless R CMD build is invoked with the --no-build-vignettes
option (or the package’s DESCRIPTION
contains BuildVignettes: no
or similar), it will attempt to (re)build the vignettes (see Writing package vignettes) in the package. To do so it installs the current package into a temporary library tree, but any dependent packages need to be installed in an available library tree (see the Note: at the top of this section).
Similarly, if the .Rd
documentation files contain any \Sexpr
macros (see Dynamic pages), the package will be temporarily installed to execute them. Post-execution binary copies of those pages containing build-time macros will be saved in build/partial.rdb
. If there are any install-time or render-time macros, a .pdf
version of the package manual will be built and installed in the build
subdirectory. (This allows CRAN or other repositories to display the manual even if they are unable to install the package.) This can be suppressed by the option --no-manual
or if package’s DESCRIPTION
contains BuildManual: no
or similar.
Package vignettes have their sources in subdirectory vignettes
of the package sources. Note that the location of the vignette sources only affects R CMD build
and R CMD check
: the tarball built by R CMD build
includes in inst/doc
the components intended to be installed.
Sweave vignette sources are normally given the file extension .Rnw
or .Rtex
, but for historical reasons extensions61 .Snw
and .Stex
are also recognized. Sweave allows the integration of LaTeX documents: see the Sweave
help page in R and the Sweave
vignette in package utils for details on the source document format.
61 and to avoid problems with case-insensitive file systems, lower-case versions of all these extensions.
Sweave vignette sources are normally given the file extension .Rnw
or .Rtex
, but for historical reasons extensions62 .Snw
and .Stex
are also recognized. Sweave allows the integration of LaTeX documents: see the Sweave
help page in R and the Sweave
vignette in package utils for details on the source document format.
62 and to avoid problems with case-insensitive file systems, lower-case versions of all these extensions.
Package vignettes are tested by R CMD check
by executing all R code chunks they contain (except those marked for non-evaluation, e.g., with option eval=FALSE
for Sweave). The R working directory for all vignette tests in R CMD check
is a copy of the vignette source directory. Make sure all files needed to run the R code in the vignette (data sets, …) are accessible by either placing them in the inst/doc
hierarchy of the source package or by using calls to system.file()
. All other files needed to re-make the vignettes (such as LaTeX style files, BibTeX input files and files for any figures not created by running the code in the vignette) must be in the vignette source directory. R CMD check
will check that vignette production has succeeded by comparing modification times of output files in inst/doc
with the source in vignettes
.
R CMD build
will automatically62 create the (PDF or HTML versions of the) vignettes in inst/doc
for distribution with the package sources. By including the vignette outputs in the package sources it is not necessary that these can be re-built at install time, i.e., the package author can use private R packages, screen snapshots and LaTeX extensions which are only available on their machine.63
62 unless inhibited by using BuildVignettes: no
in the DESCRIPTION
file.
63 provided the conditions of the package’s license are met: many, including CRAN, see the omission of source components as incompatible with an Open Source license.
R CMD build
will automatically63 create the (PDF or HTML versions of the) vignettes in inst/doc
for distribution with the package sources. By including the vignette outputs in the package sources it is not necessary that these can be re-built at install time, i.e., the package author can use private R packages, screen snapshots and LaTeX extensions which are only available on their machine.64
63 unless inhibited by using BuildVignettes: no
in the DESCRIPTION
file.
64 provided the conditions of the package’s license are met: many, including CRAN, see the omission of source components as incompatible with an Open Source license.
By default R CMD build
will run Sweave
on all Sweave vignette source files in vignettes
. If Makefile
is found in the vignette source directory, then R CMD build
will try to run make
after the Sweave
runs, otherwise texi2pdf
is run on each .tex
file produced.
The first target in the Makefile
should take care of both creation of PDF/HTML files and cleaning up afterwards (including after Sweave
), i.e., delete all files that shall not appear in the final package archive. Note that if the make
step runs R it needs to be careful to respect the environment values of R_LIBS
and R_HOME
64. Finally, if there is a Makefile
and it has a clean:
target, make clean
is run.
64 R_HOME/bin
is prepended to the PATH
so that references to R
or Rscript
in the Makefile
do make use of the currently running version of R.
The first target in the Makefile
should take care of both creation of PDF/HTML files and cleaning up afterwards (including after Sweave
), i.e., delete all files that shall not appear in the final package archive. Note that if the make
step runs R it needs to be careful to respect the environment values of R_LIBS
and R_HOME
65. Finally, if there is a Makefile
and it has a clean:
target, make clean
is run.
65 R_HOME/bin
is prepended to the PATH
so that references to R
or Rscript
in the Makefile
do make use of the currently running version of R.
All the usual caveats about including a Makefile
apply. It must be portable (no GNU extensions), use LF line endings and must work correctly with a parallel make
: too many authors have written things like
## BAD EXAMPLE
all: pdf clean
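One parallel-safe arrangement (a sketch, not the manual’s own wording) runs the steps in order from a single rule via recursive $(MAKE) invocations, so a parallel make cannot run the PDF build and the clean-up simultaneously:

```
all:
	$(MAKE) pdf
	$(MAKE) clean
```

(Recipe lines must begin with a tab character.)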
@@ -1396,7 +1397,7 @@
1.5 Package namespaces
R has a namespace management system for code in packages. This system allows the package writer to specify which variables in the package should be exported to make them available to package users, and which variables should be imported from other packages.
The namespace for a package is specified by the NAMESPACE
file in the top level package directory. This file contains namespace directives describing the imports and exports of the namespace. Additional directives register any shared objects to be loaded and any S3-style methods that are provided. Note that although the file looks like R code (and often has R-style comments) it is not processed as R code. Only very simple conditional processing of if
statements is implemented.
-Packages are loaded and attached to the search path by calling library
or require
. Only the exported variables are placed in the attached frame. Loading a package that imports variables from other packages will cause these other packages to be loaded as well (unless they have already been loaded), but they will not be placed on the search path by these implicit loads. Thus code in the package can only depend on objects in its own namespace and its imports (including the base namespace) being visible65.
65 Note that lazy-loaded datasets are not in the package’s namespace so need to be accessed via ::
, e.g. survival::survexp.us
.
+Packages are loaded and attached to the search path by calling library
or require
. Only the exported variables are placed in the attached frame. Loading a package that imports variables from other packages will cause these other packages to be loaded as well (unless they have already been loaded), but they will not be placed on the search path by these implicit loads. Thus code in the package can only depend on objects in its own namespace and its imports (including the base namespace) being visible66.
66 Note that lazy-loaded datasets are not in the package’s namespace so need to be accessed via ::
, e.g. survival::survexp.us
.
Namespaces are sealed once they are loaded. Sealing means that imports and exports cannot be changed and that internal variable bindings cannot be changed. Sealing allows a simpler implementation strategy for the namespace mechanism and allows code analysis and compilation tools to accurately identify the definition corresponding to a global variable reference in a function body.
The namespace controls the search strategy for variables used by functions in the package. If not found locally, R searches the package namespace first, then the imports, then the base namespace and then the normal search path (so the base namespace precedes the normal search rather than being at the end of it).
@@ -1438,7 +1439,7 @@
1.5.3 Load hooks
There are a number of hooks called as packages are loaded, attached, detached, and unloaded. See help(".onLoad")
for more details.
-Since loading and attaching are distinct operations, separate hooks are provided for each. These hook functions are called .onLoad
and .onAttach
. They both take arguments66 libname
and pkgname
; they should be defined in the namespace but not exported.
66 they will be called with two unnamed arguments, in that order.
+Since loading and attaching are distinct operations, separate hooks are provided for each. These hook functions are called .onLoad
and .onAttach
. They both take arguments67 libname
and pkgname
; they should be defined in the namespace but not exported.
67 they will be called with two unnamed arguments, in that order.
Packages can use a .onDetach
or .Last.lib
function (provided the latter is exported from the namespace) when detach
is called on the package. It is called with a single argument, the full path to the installed package. There is also a hook .onUnload
which is called when the namespace is unloaded (via a call to unloadNamespace
, perhaps called by detach(unload = TRUE)
) with argument the full path to the installed package’s directory. Functions .onUnload
and .onDetach
should be defined in the namespace and not exported, but .Last.lib
does need to be exported.
Packages are not likely to need .onAttach
(except perhaps for a start-up banner); code to set options and load shared objects should be placed in a .onLoad
function, or use made of the useDynLib
directive described next.
User-level hooks are also available: see the help on function setHook
.
@@ -1447,9 +1448,9 @@
1.5.4 useDynLib
-A NAMESPACE
file can contain one or more useDynLib
directives which allow shared objects that need to be loaded to be specified.67 The directive
67 NB: this will only be read in all versions of R if the package contains R code in an R
directory.
+A NAMESPACE
file can contain one or more useDynLib
directives which allow shared objects that need to be loaded to be specified.68 The directive
68 NB: this will only be read in all versions of R if the package contains R code in an R
directory.
useDynLib(foo)
-registers the shared object foo
68 for loading with library.dynam
. Loading of registered object(s) occurs after the package code has been loaded and before running the load hook function. Packages that would only need a load hook function to load a shared object can use the useDynLib
directive instead.
68 Note that this is the basename of the shared object, and the appropriate extension (.so
or .dll
) will be added.
+registers the shared object foo
69 for loading with library.dynam
. Loading of registered object(s) occurs after the package code has been loaded and before running the load hook function. Packages that would only need a load hook function to load a shared object can use the useDynLib
directive instead.
69 Note that this is the basename of the shared object, and the appropriate extension (.so
or .dll
) will be added.
The useDynLib
directive also accepts the names of the native routines that are to be used in R via the .C
, .Call
, .Fortran
and .External
interface functions. These are given as additional arguments to the directive, for example,
useDynLib(foo, myRoutine, myOtherRoutine)
By specifying these names in the useDynLib
directive, the native symbols are resolved when the package is loaded and R variables identifying these symbols are added to the package’s namespace with these names. These can be used in the .C
, .Call
, .Fortran
and .External
calls in place of the name of the routine and the PACKAGE
argument. For instance, we can call the routine myRoutine
from R with the code
@@ -1566,7 +1567,7 @@
1.5.6 Namespaces with S4 classes and methods
-Some additional steps are needed for packages which make use of formal (S4-style) classes and methods (unless these are purely used internally). The package should have Depends: methods
69 in its DESCRIPTION
and import(methods)
or importFrom(methods, ...)
plus any classes and methods which are to be exported need to be declared in the NAMESPACE
file. For example, the stats4 package has
69 Imports: methods
may suffice, but package code is little exercised without the methods package on the search path and may not be fully robust to this scenario.
+Some additional steps are needed for packages which make use of formal (S4-style) classes and methods (unless these are purely used internally). The package should have Depends: methods
70 in its DESCRIPTION
and import(methods)
or importFrom(methods, ...)
plus any classes and methods which are to be exported need to be declared in the NAMESPACE
file. For example, the stats4 package has
70 Imports: methods
may suffice, but package code is little exercised without the methods package on the search path and may not be fully robust to this scenario.
export(mle) # exporting methods implicitly exports the generic
importFrom("stats", approx, optim, pchisq, predict, qchisq, qnorm, spline)
## For these, we define methods or an implicit generic:
exportMethods(AIC, BIC, coef, confint, logLik, nobs, plot, profile,
              summary, show, update, vcov)
@@ -1578,10 +1579,10 @@
## implicit generics which do not have any methods here
export(AIC, BIC, nobs)
-All S4 classes to be used outside the package need to be listed in an exportClasses
directive. Alternatively, they can be specified using exportClassPattern
70 in the same style as for exportPattern
. To export methods for generics from other packages an exportMethods
directive can be used.
70 This defaults to the same pattern as exportPattern
: use something like exportClassPattern("^$")
to override this.
+All S4 classes to be used outside the package need to be listed in an exportClasses
directive. Alternatively, they can be specified using exportClassPattern
71 in the same style as for exportPattern
. To export methods for generics from other packages an exportMethods
directive can be used.
71 This defaults to the same pattern as exportPattern
: use something like exportClassPattern("^$")
to override this.
Note that exporting methods on a generic in the namespace will also export the generic, and exporting a generic in the namespace will also export its methods. If the generic function is not local to this package, either because it was imported as a generic function or because the non-generic version has been made generic solely to add S4 methods to it (as for functions such as coef
in the example above), it can be declared via either or both of export
or exportMethods
, but the latter is clearer (and is used in the stats4 example above). In particular, for primitive functions there is no generic function, so export
would export the primitive, which makes no sense. On the other hand, if the generic is local to this package, it is more natural to export the function itself using export()
, and this must be done if an implicit generic is created without setting any methods for it (as is the case for AIC
in stats4).
A non-local generic function is only exported to ensure that calls to the function will dispatch the methods from this package (and that is not done or required when the methods are for primitive functions). For this reason, you do not need to document such implicitly created generic functions, and undoc
in package tools will not report them.
-If a package uses S4 classes and methods exported from another package, but does not import the entire namespace of the other package71, it needs to import the classes and methods explicitly, with directives
71 if it does, there will be opaque warnings about replacing imports if the classes/methods are also imported.
+If a package uses S4 classes and methods exported from another package, but does not import the entire namespace of the other package72, it needs to import the classes and methods explicitly, with directives
72 if it does, there will be opaque warnings about replacing imports if the classes/methods are also imported.
importClassesFrom(package, ...)
importMethodsFrom(package, ...)
listing the classes and functions with methods respectively. Suppose we had two small packages A and B with B using A. Then they could have NAMESPACE
files
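The manual’s example files fall outside this hunk; a minimal sketch with hypothetical class, generic and function names is:

```
## A's NAMESPACE
export(f)
exportClasses("classA")
exportMethods("show")

## B's NAMESPACE (B uses A's class and its show method)
importClassesFrom(A, "classA")
importMethodsFrom(A, "show")
export(g)
```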
@@ -1626,16 +1627,16 @@ 1.6 Writing portable packages
This section contains advice on writing packages to be used on multiple platforms or for distribution (for example to be submitted to a package repository such as CRAN).
Portable packages should have simple file names: use only alphanumeric ASCII characters and period (.
), and avoid those names not allowed under Windows (see Package structure).
-Many of the graphics devices are platform-specific: even X11()
(aka x11()
) which although emulated on Windows may not be available on a Unix-alike (and is not the preferred screen device on OS X). It is rarely necessary for package code or examples to open a new device, but if essential,72 use dev.new()
.
72 People use dev.new()
to open a device at a particular size: that is not portable but using dev.new(noRStudioGD = TRUE)
helps.
+Many of the graphics devices are platform-specific: even X11()
(aka x11()
) which although emulated on Windows may not be available on a Unix-alike (and is not the preferred screen device on OS X). It is rarely necessary for package code or examples to open a new device, but if essential,73 use dev.new()
.
73 People use dev.new()
to open a device at a particular size: that is not portable but using dev.new(noRStudioGD = TRUE)
helps.
Use R CMD build
to make the release .tar.gz
file.
R CMD check
provides a basic set of checks, but often further problems emerge when people try to install and use packages submitted to CRAN – many of these involve compiled code. Here are some further checks that you can do to make your package more portable.
If your package has a configure
script, provide a configure.win
or configure.ucrt
script to be used on Windows (an empty configure.win
file if no actions are needed).
-If your package has a Makevars
or Makefile
file, make sure that you use only portable make features. Such files should be LF-terminated73 (including the final line of the file) and not make use of GNU extensions. (The POSIX specification is available at https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html; anything not documented there should be regarded as an extension to be avoided. Further advice can be found at https://www.gnu.org/software/autoconf/manual/autoconf.html#Portable-Make. ) Commonly misused GNU extensions are conditional inclusions (ifeq
and the like), ${shell ...}
, ${wildcard ...}
and similar, and the use of +=
74 and :=
. Also, the use of $<
other than in implicit rules is a GNU extension, as is the $^
macro. As is the use of .PHONY
(some other makes ignore it). Unfortunately makefiles which use GNU extensions often run on other platforms but do not have the intended results.
+If your package has a Makevars
or Makefile
file, make sure that you use only portable make features. Such files should be LF-terminated74 (including the final line of the file) and not make use of GNU extensions. (The POSIX specification is available at https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html; anything not documented there should be regarded as an extension to be avoided. Further advice can be found at https://www.gnu.org/software/autoconf/manual/autoconf.html#Portable-Make. ) Commonly misused GNU extensions are conditional inclusions (ifeq
and the like), ${shell ...}
, ${wildcard ...}
and similar, and the use of +=
75 and :=
. Also, the use of $<
other than in implicit rules is a GNU extension, as is the $^
macro. As is the use of .PHONY
(some other makes ignore it). Unfortunately makefiles which use GNU extensions often run on other platforms but do not have the intended results.
Note that the -C
flag for make
is not included in the POSIX specification and is not implemented by some of the make
s which have been used with R.
The use of ${shell ...}
can be avoided by using backticks, e.g.
PKG_CPPFLAGS = `gsl-config --cflags`
-which works in all versions of make
known75 to be used with R.
+which works in all versions of make
known76 to be used with R.
If you really must require GNU make, declare it in the DESCRIPTION
file by
SystemRequirements: GNU make
and ensure that you use the value of environment variable MAKE
(and not just make
) in your scripts. (On some platforms GNU make is available under a name such as gmake
, and there SystemRequirements
is used to set MAKE
.)
@@ -1650,17 +1651,17 @@
Names of source files including =
(such as src/complex_Sig=gen.c
) will confuse some make
programs and should be avoided.
-Bash extensions also need to be avoided in shell scripts, including expressions in Makefiles (which are passed to the shell for processing). Some R platforms use strict76 Bourne shells: an earlier R toolset on Windows77 and some Unix-alike OSes use ash
(https://en.wikipedia.org/wiki/Almquist_shell, a ’lightweight shell with few builtins) or derivatives such as dash
. Beware of assuming that all the POSIX command-line utilities are available, especially on Windows where only a subset (which has changed by version of Rtools
) is provided for use with R. One particular issue is the use of echo
, for which two behaviours are allowed (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html) and both have occurred as defaults on R platforms: portable applications should use neither -n
(as the first argument) nor escape sequences. The recommended replacement for echo -n
is the command printf
. Another common issue is the construction
+Bash extensions also need to be avoided in shell scripts, including expressions in Makefiles (which are passed to the shell for processing). Some R platforms use strict77 Bourne shells: an earlier R toolset on Windows78 and some Unix-alike OSes use ash
(https://en.wikipedia.org/wiki/Almquist_shell, a ‘lightweight’ shell with few builtins) or derivatives such as dash
. Beware of assuming that all the POSIX command-line utilities are available, especially on Windows where only a subset (which has changed by version of Rtools
) is provided for use with R. One particular issue is the use of echo
, for which two behaviours are allowed (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html) and both have occurred as defaults on R platforms: portable applications should use neither -n
(as the first argument) nor escape sequences. The recommended replacement for echo -n
is the command printf
. Another common issue is the construction
export FOO=value
which is bash
-specific (first set the variable then export it by name).
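The portable forms of the last two idioms can be sketched together:

```shell
## Portable Bourne-shell forms (sketch)
FOO=value
export FOO                 # not 'export FOO=value'
printf '%s\n' "$FOO"       # not 'echo -n' or echo escape sequences
```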
-Using test -e
(or [ -e ]
) in shell scripts is not fully portable78: -f
is normally what is intended. Flags -a
and -o
are nowadays declared obsolescent by POSIX and should not be used.
+Using test -e
(or [ -e ]
) in shell scripts is not fully portable79: -f
is normally what is intended. Flags -a
and -o
are nowadays declared obsolescent by POSIX and should not be used.
Use of ‘brace expansion’, e.g.,
rm -f src/*.{o,so,d}
is not portable.
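A sketch of portable replacements for both constructs (-f in place of -e, and globs spelled out in place of brace expansion):

```shell
tmp=$(mktemp)
[ -f "$tmp" ] && echo "regular file"   # test -f (regular file) is portable
rm -f "$tmp"
# Spell the patterns out rather than writing src/*.{o,so,d}:
rm -f src/*.o src/*.so src/*.d
```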
The -o
flag for set
in shell scripts is optional in POSIX and not supported on all the platforms R is used on.
The variable OSTYPE
is shell-specific and its values are rather unpredictable and may include a version such as darwin19.0
: uname
is often what is intended (with common values Darwin
, Linux
and SunOS
).
On macOS which shell /bin/sh
invokes is user- and platform-dependent: it might be bash
version 3.2, dash
or zsh
(for new accounts it is zsh
, for accounts ported from Mojave or earlier it is usually bash
).
-Make use of the abilities of your compilers to check the standards-conformance of your code. For example, gcc
, clang
and gfortran
79 can be used with options -Wall -pedantic
to alert you to potential problems. This is particularly important for C++, where g++ -Wall -pedantic
will alert you to the use of some of the GNU extensions which fail to compile on most other C++ compilers. If R was not configured accordingly, one can achieve this via personal Makevars
files. See section ‘Customizing package compilation’ in the ‘R Installation and Administration’ manual for more information.
+Make use of the abilities of your compilers to check the standards-conformance of your code. For example, gcc
, clang
and gfortran
80 can be used with options -Wall -pedantic
to alert you to potential problems. This is particularly important for C++, where g++ -Wall -pedantic
will alert you to the use of some of the GNU extensions which fail to compile on most other C++ compilers. If R was not configured accordingly, one can achieve this via personal Makevars
files. See section ‘Customizing package compilation’ in the ‘R Installation and Administration’ manual for more information.
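For instance, a personal ~/.R/Makevars enabling these warnings during development (the exact flags are illustrative; see the cited manual section for the full mechanism) might contain:

```
CFLAGS = -g -O2 -Wall -pedantic
CXXFLAGS = -g -O2 -Wall -pedantic
FFLAGS = -g -O2 -Wall -pedantic
```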
Portable C++ code needs to follow all of the 2011, 2014 and 2017 standards or to specify C++11/14/17/20 where available (which is not the case on all R platforms). Currently C++20 support is patchy across R platforms.
If using Fortran with the GNU compiler, use the flags -std=f95 -Wall -pedantic
which reject most GNU extensions and features from later standards. (Although R only requires Fortran 90, gfortran
does not have a way to specify that standard.) Also consider -std=f2008
as some recent compilers have Fortran 2008 or even 2018 as the minimum supported standard.
As from macOS 11 (late 2020), its C compiler sets the flag -Werror=implicit-function-declaration
by default which forces stricter conformance to C99. This can be used on other platforms with gcc
or clang
. If your package has a (autoconf
-generated) configure script
, try installing it whilst using this flag, and read through the config.log
file — compilation warnings and errors can lead to features which are present not being detected. (If possible do this on several platforms.)
@@ -1674,32 +1675,32 @@
-Under no circumstances should your compiled code ever call abort
or exit
80: these terminate the user’s R process, quite possibly losing all unsaved work. One usage that could call abort
is the assert
macro in C or C++ functions, which should never be active in production code. The normal way to ensure that is to define the macro NDEBUG
, and R CMD INSTALL
does so as part of the compilation flags. Beware of including headers (including from other packages) which could undefine it, now or in future versions. If you wish to use assert
during development. you can include -UNDEBUG
in PKG_CPPFLAGS
or #undef
it in your headers or code files. Note that your own src/Makefile
or makefiles in sub-directories may also need to define NDEBUG
.
+
Under no circumstances should your compiled code ever call abort
or exit
81: these terminate the user’s R process, quite possibly losing all unsaved work. One usage that could call abort
is the assert
macro in C or C++ functions, which should never be active in production code. The normal way to ensure that is to define the macro NDEBUG
, and R CMD INSTALL
does so as part of the compilation flags. Beware of including headers (including from other packages) which could undefine it, now or in future versions. If you wish to use assert
during development, you can include -UNDEBUG
in PKG_CPPFLAGS
or #undef
it in your headers or code files. Note that your own src/Makefile
or makefiles in sub-directories may also need to define NDEBUG
.
This applies not only to your own code but to any external software you compile in or link to.
Compiled code should not write to stdout
or stderr
and C++ and Fortran I/O should not be used. As with the previous item such calls may come from external software and may never be called, but package authors are often mistaken about that.
Compiled code should not call the system random number generators such as rand
, drand48
and random
82, but rather use the interfaces to R’s RNGs described in Random number generation. In particular, if more than one package initializes a system RNG (e.g. via srand
), they will interfere with each other. This applies also to Fortran 90’s random_number
and random_seed
, and Fortran 2018’s random_init
. And to GNU Fortran’s rand
, irand
and srand
. Except for drand48
, what PRNG these functions use is implementation-dependent.
Nor should the C++11 random number library be used nor any other third-party random number generators such as those in GSL.
Use of sprintf
and vsprintf
is regarded as a potential security risk and warned about on some platforms.83 R CMD check
reports if any calls are found.
Errors in memory allocation and reading/writing outside arrays are very common causes of crashes (e.g., segfaults) on some machines. See Checking memory access for tools which can be used to look for this.
Many platforms will allow unsatisfied entry points in compiled code, but will crash the application (here R) if they are ever used. Some (notably Windows) will not. Looking at the output of
nm -pg mypkg.so
and checking if any of the symbols marked U
is unexpected is a good way to avoid this.
Linkers have a lot of freedom in how to resolve entry points in dynamically-loaded code, so the results may differ by platform. One area that has caused grief is packages including copies of standard system software such as libz
(especially those already linked into R). In the case in point, entry point gzgets
was sometimes resolved against the old version compiled into the package, sometimes against the copy compiled into R and sometimes against the system dynamic library. The only safe solution is to rename the entry points in the copy in the package. We have even seen problems with entry point name myprintf
, which is a system entry point84 on some Linux systems.
A related issue is the naming of libraries built as part of the package installation. macOS and Windows have case-insensitive file systems, so using
-L. -lLZ4
in PKG_LIBS
will match liblz4
. And -L.
only appends to the list of searched locations, and liblz4
might be found in an earlier-searched location (and has been). The only safe way is to give an explicit path, for example
./libLZ4.a .
Conflicts between symbols in DLLs are handled in very platform-specific ways. Good ways to avoid trouble are to make as many symbols as possible static (check with nm -pg
), and to use names which are clearly tied to your package (which also helps users if anything does go wrong). Note that symbol names starting with R_
are regarded as part of R’s namespace and should not be used in packages.
It is good practice for DLLs to register their symbols (see Registering native routines), restrict visibility (see Controlling visibility) and not allow symbol search (see Registering native routines). It should be possible for a DLL to have only one visible symbol, R_init_pkgname
, on suitable platforms85, which would completely avoid symbol conflicts.
It is not portable to call compiled code in R or other packages via .Internal
, .C
, .Fortran
, .Call
or .External
, since such interfaces are subject to change without notice and will probably result in your code terminating the R process.
Do not use (hard or symbolic) file links in your package sources. Where possible R CMD build
will replace them by copies.
If you do not yourself have a Windows system, consider submitting your source package to WinBuilder (https://win-builder.r-project.org/) before distribution. If you need to check on an M1 Mac, there is a check service at https://mac.r-project.org/macbuilder/submit.html.
It is bad practice for package code to alter the search path using library
, require
or attach
and this often does not work as intended. For alternatives, see Suggested packages and with()
.
Examples can be run interactively via example
as well as in batch mode when checking. So they should behave appropriately in both scenarios, conditioning by interactive()
the parts which need an operator or observer. For instance, progress bars86 are only appropriate in interactive use, as is displaying help pages or calling View()
(see below).
Be careful with the order of entries in macros such as PKG_LIBS
. Some linkers will re-order the entries, and behaviour can differ between dynamic and static libraries. Generally -L
options should precede87 the libraries (typically specified by -l
options) to be found from those directories, and libraries are searched once in the order they are specified. Not all linkers allow a space after -L
.
Care is needed with the use of LinkingTo
. This puts one or more directories on the include search path ahead of system headers but (prior to R 3.4.0) after those specified in the CPPFLAGS
macro of the R build (which normally includes -I/usr/local/include
, but most platforms ignore that and include it with the system headers).
Any confusion would be avoided by having LinkingTo
headers in a directory named after the package. In any case, name conflicts of headers and directories under package include
directories should be avoided, both between packages and between a package and system and third-party software.
The ar
utility is often used in makefiles to make static libraries. Its modifier u
is defined by POSIX but is disabled in GNU ar
on some Linux distributions which use ‘deterministic mode’. The safest way to make a static library is to first remove any existing file of that name then use $(AR) -cr
and then $(RANLIB)
if needed (which is system-dependent: on most systems88 ar
always maintains a symbol table). The POSIX standard says options should be preceded by a hyphen (as in -cr
), although most OSes accept them without. Note that on some systems ar -cr
must have at least one file specified.
The s
modifier (to replace a separate call to ranlib
) is required by X/OPEN but not POSIX, so ar -crs
is not portable.
For portability the AR
and RANLIB
macros should always be used – some builds require wrappers such as gcc-ar
or extra arguments to specify plugins.
The strip
utility is platform-specific (and CRAN prohibits removing debug symbols). For example the options --strip-debug
and --strip-unneeded
of the GNU version are not supported on macOS: the POSIX standard for strip
does not mention any options, and what calling it without options does is platform-dependent. Stripping a .so
file could even prevent it being dynamically loaded into R on an untested platform.
knitr vignette that used spaces in plot names: this caused some older versions of pandoc
to fail with a baffling error message.
Non-ASCII filenames can also cause problems (particularly in non-UTF-8 locales).
Take care in naming LaTeX macros (also known as ‘commands’) in vignette sources: if these are also defined in a future version of one of the LaTeX packages used there will be a fatal error. One instance in 2021 was package hyperref
newly defining \C
, \F
, \G
, \U
and \textapprox
. If you are confident that your definitions will be the only ones relevant you can use \renewcommand
but it is better to use names clearly associated with your package.
Make sure that any version requirement for Java code is both declared in the SystemRequirements
field89 and tested at runtime (not least as the Java installation when the package is installed might not be the same as when the package is run and will not be for binary packages).
When specifying a minimum Java version please use the official version names, which are (confusingly)
1.1 1.2 1.3 1.4 5.0 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
and as from 2018 a year.month scheme such as 18.9
is also in use. Fortunately only the integer values are likely to be relevant. If at all possible, use one of the LTS versions (8, 11, 17, 21 …) as the minimum version. The preferred form of version specification is
A package with a hard-to-satisfy system requirement is by definition not portable, annoyingly so if this is not declared in the SystemRequirements
field. The most common example is the use of pandoc
, which is only available for a very limited range of platforms (and has onerous requirements to install from source) and has capabilities90 that vary by build but are not documented. Several recent versions of pandoc
for macOS did not work on R’s then target of High Sierra (and this too was undocumented). Another example is the Rust compilation system (cargo
and rustc
).
Usage of external commands should always be conditional on a test for presence (perhaps using Sys.which
), as well as declared in the SystemRequirements
field. A package should pass its checks with neither warnings nor errors when the external command is not present.
An external command can be a (possibly optional) requirement for an imported or suggested package but needed for examples, tests or vignettes in the package itself. Such usages should always be declared and conditional.
Interpreters for scripting languages such as Perl, Python and Ruby need to be declared as system requirements and used conditionally: for example macOS 10.16 was announced not to have them (but released as macOS 11 with them); later it was announced that macOS 12.3 does not have Python 2 and only a minimal install of Python 3 is included. Python 2 has passed end-of-life and been removed from many major distributions. Support for Rust or Go cannot be assumed.
and as these record the time in UTC, the time represented is independent of the time zone: but how it is printed may not be. Objects of class "POSIXlt"
should have a "tzone"
attribute. Dates (e.g., birthdays) are conventionally considered independently of time zone.
If at all possible avoid any Internet access during package installation. Installation and use may well be on different machines/accounts and those allowed to install software may have no Internet access, and being self-contained helps ensure long-term reproducibility.
-73 Solaris make
did not accept CRLF-terminated Makefiles; Solaris warned about and some other make
s ignore incomplete final lines.
74 This was apparently introduced in SunOS 4, and is available elsewhere provided it is surrounded by spaces.
75 GNU make, BSD make and other variants of pmake
in FreeBSD, NetBSD and formerly in macOS, and formerly AT&T make as implemented on Solaris and ‘Distributed Make’ (dmake
), part of Oracle Developer Studio and available in other versions including from Apache OpenOffice.
76 For example, test
options -a
and -e
are not portable, and not supported in the AT&T Bourne shell used on Solaris 10/11, even though they are in the POSIX standard. Nor did Solaris support $(cmd)
.
77 as from R 4.0.0 the default is bash
.
78 it was not in the Bourne shell, and was not supported by Solaris 10.
79 https://fortranwiki.org/fortran/show/Modernizing+Old+Fortran may help explain some of the warnings from gfortran -Wall -pedantic
.
80 or where supported the variants _Exit
and _exit
.
81 This and srandom
are in any case not portable. They are in POSIX but not in the C99 standard, and not available on Windows.
82 including macOS as from version 13.
83 in libselinux
.
84 At least Linux and Windows, but not macOS.
85 except perhaps the simplest kind as used by download.file()
in non-interactive use.
86 Whereas the GNU linker reorders so -L
options are processed first, the Solaris one did not.
87 some versions of macOS did not.
88 If a Java interpreter is required directly (not via rJava) this must be declared and its presence tested like any other external command.
89 For example, the ability to handle https://
URLs.
Do be careful in what your tests (and examples) actually test. Bad practice seen in distributed packages include:
74 Solaris make
did not accept CRLF-terminated Makefiles; Solaris warned about and some other make
s ignore incomplete final lines.
75 This was apparently introduced in SunOS 4, and is available elsewhere provided it is surrounded by spaces.
76 GNU make, BSD make and other variants of pmake
in FreeBSD, NetBSD and formerly in macOS, and formerly AT&T make as implemented on Solaris and ‘Distributed Make’ (dmake
), part of Oracle Developer Studio and available in other versions including from Apache OpenOffice.
77 For example, test
options -a
and -e
are not portable, and not supported in the AT&T Bourne shell used on Solaris 10/11, even though they are in the POSIX standard. Nor did Solaris support $(cmd)
.
78 as from R 4.0.0 the default is bash
.
79 it was not in the Bourne shell, and was not supported by Solaris 10.
80 https://fortranwiki.org/fortran/show/Modernizing+Old+Fortran may help explain some of the warnings from gfortran -Wall -pedantic
.
81 or where supported the variants _Exit
and _exit
.
82 This and srandom
are in any case not portable. They are in POSIX but not in the C99 standard, and not available on Windows.
83 including macOS as from version 13.
84 in libselinux
.
85 At least Linux and Windows, but not macOS.
86 except perhaps the simplest kind as used by download.file()
in non-interactive use.
87 Whereas the GNU linker reorders so -L
options are processed first, the Solaris one did not.
88 some versions of macOS did not.
89 If a Java interpreter is required directly (not via rJava) this must be declared and its presence tested like any other external command.
90 For example, the ability to handle https://
URLs.
Do be careful in what your tests (and examples) actually test. Bad practices seen in distributed packages include:
It is not reasonable to test the time taken by a command: you cannot know how fast or how heavily loaded an R platform might be. At best you can test a ratio of times, and even that is fraught with difficulties and not advisable: for example, the garbage collector may trigger at unpredictable times following heuristics that may change without notice.
Do not test the exact format of R messages (from R itself or from other packages): They change, and they can be translated.
if(interactive()) View(obj) else print(head(obj))
if(interactive()) View(obj) else str(obj)
Be careful when comparing file paths. There can be multiple paths to a single file, and some of these can be very long character strings. If possible canonicalize paths before comparisons, but study ?normalizePath
to be aware of the pitfalls.
Only test the accuracy of results if you have done a formal error analysis. Things such as checking that probabilities numerically sum to one are silly: numerical tests should always have a tolerance. That the tests on your platform achieve a particular tolerance says little about other platforms. R is configured by default to make use of long doubles where available, but they may not be available or be too slow for routine use. Most R platforms use ix86
or x86_64
CPUs: these may use extended precision registers on some but not all of their FPU instructions. Thus the achieved precision can depend on the compiler version and optimization flags—our experience is that 32-bit builds tend to be less precise than 64-bit ones. But not all platforms use those CPUs, and not all91 which use them configure them to allow the use of extended precision. In particular, current ARM CPUs do not have extended precision nor long doubles, and clang
currently has long double the same as double on all ARM CPUs. On the other hand some CPUs have higher-precision modes which may be used for long double
, notably 64-bit PowerPC and Sparc.
If you must try to establish a tolerance empirically, configure and build R with --disable-long-double
and use appropriate compiler flags (such as -ffloat-store
and -fexcess-precision=standard
for gcc
, depending on the CPU type92) to mitigate the effects of extended-precision calculations. The platform most often seen to give different numerical results is arm64
macOS, so be sure to include that in any empirical determination.
Tests which involve random inputs or non-deterministic algorithms should normally set a seed or be tested for many seeds.
Tests should use options(warn = 1)
as reporting
There were 22 warnings (use warnings() to see them)
is pointless, especially for automated checking systems.
If your package uses dates/times, ensure that it works in all timezones, especially those near boundaries (problems have most often been seen in Europe/London
(zero offset in Winter) and Pacific/Auckland
, near enough the International Date line) and with offsets not in whole hours (Adelaide, Chatham Islands, …). More extreme examples are Africa/Conakry
(permanent UTC), Asia/Calcutta
(no DST, permanent half-hour offset) and Pacific/Kiritimati
(no DST, more than 12 hours ahead of UTC).
91 Not doing so is the default on Windows, overridden for the R executables.
92 These are not needed for the default compiler settings on x86_64
but are likely to be needed on ix86
.
There are several tools available to reduce the size of PDF files: often the size can be reduced substantially with no or minimal loss in quality. Not only do large files take up space: they can stress the PDF viewer and take many minutes to print (if they can be printed at all).
qpdf
(https://qpdf.sourceforge.io/) can compress losslessly. It is fairly readily available (e.g. it has binaries for Windows and packages in Debian/Ubuntu/Fedora, and is installed as part of the CRAN macOS distribution of R). R CMD build
has an option to run qpdf
over PDF files under inst/doc
and replace them if at least 10Kb and 10% is saved. The full path to the qpdf
command can be supplied as environment variable R_QPDF
(and is on the CRAN binary of R for macOS). It seems MiKTeX does not use PDF object compression and so qpdf
can reduce considerably the sizes of files it outputs: MiKTeX’s defaults can be overridden by code in the preamble of an Sweave or LaTeX file — see how this is done for the R reference manual at https://svn.r-project.org/R/trunk/doc/manual/refman.top.
Other tools can reduce the size of PDFs containing bitmap images at excessively high resolution. These are often best re-generated (for example Sweave
defaults to 300 ppi, and 100–150 is more appropriate for a package manual). These tools include Adobe Acrobat (not Reader), Apple’s Preview93 and Ghostscript (which converts PDF to PDF by
93 Select ‘Save as’, and select ‘Reduce file size’ from the ‘Quartz filter’ menu’: this can be accessed in other ways, for example by Automator.
ps2pdf options -dAutoRotatePages=/None -dPrinted=false in.pdf out.pdf
and suitable options might be
-dPDFSETTINGS=/ebook
1.6.3 Encoding issues
The issues in this subsection have been much alleviated by the change in R 4.2.0 to running the Windows port of R in a UTF-8 locale where available. However, Windows users might be running an earlier version of R on an earlier version of Windows which does not support UTF-8 locales.
Care is needed if your package contains non-ASCII text, and in particular if it is intended to be used in more than one locale. It is possible to mark the encoding used in the DESCRIPTION
file and in .Rd
files, as discussed elsewhere in this manual.
First, consider carefully if you really need non-ASCII text. Some users of R will only be able to view correctly text in their native language group (e.g. Western European, Eastern European, Simplified Chinese) and ASCII94. Other characters may not be rendered at all, rendered incorrectly, or cause your R code to give an error. For .Rd
documentation, marking the encoding and including ASCII transliterations is likely to do a reasonable job. The set of characters which is commonly supported is wider than it used to be around 2000, but non-Latin alphabets (Greek, Russian, Georgian, …) are still often problematic and those with double-width characters (Chinese, Japanese, Korean, emoji) often need specialist fonts to render correctly.
94 except perhaps some special characters such as backslash and hash which may be taken over for currency symbols.
Several CRAN packages have messages in their R code in French (and a few in German). A better way to tackle this is to use the internationalization facilities discussed elsewhere in this manual.
Function showNonASCIIfile
in package tools can help in finding non-ASCII bytes in files.
There is a portable way to have arbitrary text in character strings (only) in your R code, which is to supply them in Unicode as \uxxxx
escapes (or, rarely needed except for emojis, \Uxxxxxxxx
escapes). If there are any characters not in the current encoding the parser will encode the character string as UTF-8 and mark it as such. This applies also to character strings in datasets: they can be prepared using \uxxxx
escapes or encoded in UTF-8 in a UTF-8 locale, or even converted to UTF-8 via iconv()
. If you do this, make sure you have R (>= 2.10)
(or later) in the Depends
field of the DESCRIPTION
file.
R sessions running in non-UTF-8 locales will if possible re-encode such strings for display (and this is done by RGui
on older versions of Windows, for example). Suitable fonts will need to be selected or made available95 both for the console/terminal and graphics devices such as X11()
and windows()
. Using postscript
or pdf
will choose a default 8-bit encoding depending on the language of the UTF-8 locale, and your users would need to be told how to select the encoding
argument.
95 Typically on a Unix-alike this is done by telling fontconfig
where to find suitable fonts to select glyphs from.
Note that the previous two paragraphs only apply to character strings in R code. Non-ASCII characters are particularly prevalent in comments (in the R code of the package, in examples, tests, vignettes and even in the NAMESPACE
file) but should be avoided there. Most commonly people use the Windows extensions to Latin-1 (often directional single and double quotes, ellipsis, bullet and en and em dashes) which are not supported in strict Latin-1 locales nor in CJK locales on Windows. A surprisingly common misuse is to use a right quote in don't
instead of the correct apostrophe.
Datasets can include marked UTF-8 or Latin-1 character strings. As R is nowadays unlikely to be run in a Latin-1 or Windows’ CP1252 locale, for performance reasons these should be converted to UTF-8.
If you want to run R CMD check
on a Unix-alike over a package that sets a package encoding in its DESCRIPTION
file and do not use a UTF-8 locale you may need to specify a suitable locale via environment variable R_ENCODING_LOCALES
. The default is equivalent to the value
1.6.4 Portable C and C++ code
Writing portable C and C++ code is mainly a matter of observing the standards (C99, C++14 or where declared C++11/17/20) and testing that extensions (such as POSIX functions) are supported. Do make maximal use of your compiler diagnostics — this typically means using flags -Wall
and -pedantic
for both C and C++, and additionally -Werror=implicit-function-declaration
and -Wstrict-prototypes
for C (on some platforms and compiler versions these are part of -Wall
or -pedantic
).
C++ standards: From version 3.6.0 (3.6.2 on Windows), R defaulted to C++11 where available96; from R 4.1.0 to C++14 and from R 4.3.0 to C++17 (where available). However, in earlier versions the default standard was that of the compiler used, often C++98 or C++14, and the default is likely to change in future. For maximal portability a package should either specify a standard (see Using C++ code) or be tested under all of C++11, C++98, C++14 and C++17. (Specifying C++14 or later will limit portability.)
96 which it is on all known platforms, and is required as from R 4.0.0
Note that the ‘TR1’ C++ extensions are not part of any of these standards and the <tr1/name>
headers are not supplied by some of the compilers used for R, including on macOS. (Use the C++11 versions instead.)
A common error is to assume recent versions of compilers or OSes. In production environments ‘long term support’ versions of OSes may be in use for many years,97 and their compilers may not be updated during that time. For example, GCC 4.8 was still in use in 2022 and could be (in RHEL 7) until 2028: that supports neither C++14 nor C++17.
97 Ubuntu provides 5 years of support (but people were running 14.04 after 7 years) and RHEL provides 10 years full support and up to 14 with extended support.
The POSIX standards only require recently-defined functions to be declared if certain macros are defined with large enough values, and on some compiler/OS combinations98 they are not declared otherwise. So you may need to include something like one of
98 This is seen on Linux, Solaris and FreeBSD, although each has other ways to turn on all extensions, e.g. defining _GNU_SOURCE
, __EXTENSIONS__
or _BSD_SOURCE
: the GCC compilers by default define _GNU_SOURCE
unless a strict standard such as -std=c99
is used. On macOS extensions are declared unless one of these macros is given too small a value.
#define _XOPEN_SOURCE 600
or
#ifdef __GLIBC__
# define _GNU_SOURCE 1
#endif
A surprisingly common misuse is things like pow(10, -3)
: this should be the constant 1e-3
. Note that there are constants such as M_SQRT2
defined via Rmath.h
99 for sqrt(2.0)
, frequently mis-coded as sqrt(2)
.
Function fabs
is defined only for floating-point types, except in C++11 and later which have overloads for std::fabs
in <cmath>
for integer types. Function abs
is defined in C99’s <stdlib.h>
for int
and in C++’s <cstdlib>
for integer types, overloaded in <cmath>
for floating-point types. C++11 has additional overloads for std::abs
in <cmath>
for integer types. The effect of calling abs
with a floating-point type is implementation-specific: it may truncate to an integer. For clarity and to avoid compiler warnings, use abs
for integer types and fabs
for double values, and when using C++ include <cmath>
and use the std::
prefix.
It is an error (and makes little sense, although has been seen) to call macros/functions isnan
, isinf
and isfinite
for integer arguments: a few compilers give a compilation error. Function finite
is obsolete, and some compilers will warn about its use100.
The GNU C/C++ compilers support a large number of non-portable extensions. For example, INFINITY
(which is a float value in C99 and C++11), for which R provides the portable double value R_PosInf
(and R_NegInf
for -INFINITY
). And NAN
101 is just one NaN float value: for use with R, NA_REAL
is often what is intended, but R_NaN
is also available.
Some (but not all) extensions are listed at https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html and https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Extensions.html.
Other GNU extensions which have bitten package writers are the use of non-portable characters such as $
in identifiers and use of C++ headers under ext
.
Including C-style headers in C++ code is not portable. Including the legacy header102 math.h
in C++ code may conflict with cmath
which may be included by other headers. In C++11, functions like sqrt
and isnan
are defined for double
arguments in math.h
and for a range of types including double
in cmath
. Similar issues have been seen for stdlib.h
and cstdlib
. Including the C++ header first used to be a sufficient workaround but for some 2016 compilers only one could be included.
Be careful to include the headers which define the functions you use. Some compilers/OSes include other system headers in their headers which are not required by the standards, and so code may compile on such systems and not on others. (A prominent example is the C++ header <random>
which is indirectly included by <algorithm>
by g++
. Another issue is the C header <time.h>
which is included by other headers on Linux and Windows but not macOS.) g++
11 often needs explicit inclusion of the C++ headers <limits>
(for numeric_limits
) or <exception>
(for set_terminate
and similar), whereas earlier versions included these in other headers. g++
13 requires the explicit inclusion of <cstdint>
for types such as uint32_t
which was previously included implicitly. (For more such, see https://gcc.gnu.org/gcc-13/porting_to.html.)
Note that malloc
, calloc
, realloc
and free
are defined by C99 in the header stdlib.h
and (in the std::
namespace) by C++ header cstdlib
. Some earlier implementations used a header malloc.h
, but that is not portable and does not exist on macOS.
This also applies to types such as ssize_t
. The POSIX standards say that is declared in headers unistd.h
and sys/types.h
, and the latter is often included indirectly by other headers on some but not all systems.
warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
uses of such pragmas should also be conditioned (or commented out if they are used in code in a package not enabling OpenMP on any platform).
Do not hardcode -lgomp
: not only is that specific to the GCC family of compilers, using the correct linker flag often sets up the run-time path to the library.
Package authors commonly assume things are part of C/C++ when they are not: the most common example is POSIX103 function strdup
. The most common C library on Linux, glibc
, will hide the declarations of such extensions unless a ‘feature-test macro’ is defined before (almost) any system header is included. So for strdup
you need
#define _POSIX_C_SOURCE 200809L
...
#include <string.h>
ISO C++11 does not allow conversion from string literal to 'char *'
(where conversion should be to const char *
). Keyword register
was not mentioned in C++98, deprecated in C++11 and removed in C++17.
There are quite a lot of other C++98 features deprecated in C++11 and removed in C++17, and LLVM clang
9 and later warn about them (and as from version 16 they have been removed). Examples include bind1st
/bind2nd
(use std::bind
or lambdas104), std::auto_ptr
(replaced by std::unique_ptr
), std::mem_fun_ref
and std::ptr_fun
.
Later versions of standards may add reserved words: for example bool
, false
and true
became keywords in C23 and are no longer available as variable names. As noted above, C++17 uses byte
, data
, sample
and size
.
So avoid common words and keywords from other programming languages.
Be careful about including C headers in C++ code. Issues include
- Use of the
register
storage class specifier (see the previous but one item).
- The C99 keyword
restrict
is not part of105 any C++ standard and is rejected by some C++ compilers.
- Inclusion by such headers of C-style headers such as
math.h
(see above).
The most portable way to interface to other software with a C API is to use C code (which can normally be mixed with C++ code in a package).
Several C entry points are warned against in their man
pages on most systems, often in very strong terms such as ‘Do not use these functions’. macOS has started to warn106 if these are used for sprintf
, vsprintf
, gets
, mktemp
, tempnam
and tmpnam
. It is highly recommended that you use safer alternatives (on any platform) but the warning can be avoided by defining _POSIX_C_SOURCE
to for example 200809L
before including the (C or C++) header which defines them. (However, this may hide other extensions.)
Compilers may interpret comments in source code, so it is necessary to remove any intended for a compiler to interpret. The main example has been comments for Visual Fortran (as the Intel Fortran compiler has been known on Windows107) like
!DEC$ ATTRIBUTES DLLEXPORT,C,REFERENCE,ALIAS:'kdenestmlcvb' :: kdenestmlcvb
which are interpreted by Intel Fortran on all platforms (and are inappropriate for use with R on Windows). gfortran
has similar forms starting with !GCC$
.
The C++ new
operator takes argument std::size_t size
, which is unsigned. Using a signed integer type such as int
may lead to compiler warnings such as
size 9223372036854775807 [-Walloc-size-larger-than=]
(especially if LTO is used). So don’t do that!
99 often taken from the toolchain’s headers.
100 at the time of writing arm64
macOS both warned and did not supply a prototype in math.h
which resulted in a compilation error.
101 also part of C++11 and later.
102 which often is the same as the header included by the C compiler, but some compilers have wrappers for some of the C headers.
103 Although this is expected to be part of C23, full support of that is years away.
104 https://stackoverflow.com/questions/32739018/a-replacement-for-stdbind2nd
105 it is allowed but ignored in system headers.
106 when using the macOS 13 SDK with a deployment target of macOS 13.
107 and at one time as DEC Fortran, hence the DEC
.
Some additional information for C++ is available at https://journal.r-project.org/archive/2011-2/RJournal_2011-2_Plummer.pdf by Martyn Plummer.
Most OSes (including all those commonly used for R) have the concept of ‘tentative definitions’ where global C variables are defined without an initializer. Traditionally the linker resolves all tentative definitions of the same variable in different object files to the same object, or to a non-tentative definition. However, gcc
10108 and LLVM clang
11109 changed their default so that tentative definitions cannot be merged and the linker will give an error if the same variable is defined in more than one object file. To avoid this, all but one of the C source files should declare the variable extern
— which means that any such variables included in header files need to be declared extern
. A commonly used idiom (including by R itself) is to define all global variables as extern
in a header, say globals.h
(and nowhere else), and then in one (and one only) source file use
#define extern
# include "globals.h"
#undef extern
A cleaner approach is not to have global variables at all, but to place in a single file common variables (declared static
) followed by all the functions which make use of them: this may result in more efficient code.
The ‘modern’ behaviour can be seen110 by using compiler flag -fno-common
as part of CFLAGS
in earlier versions of gcc
and clang
.
110 In principle this could depend on the OS, but has been checked on Linux and macOS.
-fno-common
is said to be particularly beneficial for ARM CPUs.
This is not pertinent to C++ which does not permit tentative definitions.
For many years almost all known R platforms used gfortran
as their Fortran compiler, but now there are LLVM and ‘classic’ flang
and the Intel compilers ifort
111 and ifx
are now free-of-charge.
111 discontinued in 2023.
There is still a lot of Fortran code in CRAN packages which predates Fortran 77. Modern Fortran compilers are being written to target a minimum standard of Fortran 2018, and it is desirable that Fortran code in packages complies with that standard. For gfortran
this can be checked by adding -std=f2018
to FFLAGS
. The most commonly seen issues are
The use of DFLOAT
, which was superseded by DBLE
in Fortran 77. Also, use of DCMPLX
, DCONJG
, DIMAG
and similar.
The use of GNU Fortran extensions. Some are listed at https://gcc.gnu.org/onlinedocs/gfortran/Extensions-implemented-in-GNU-Fortran.html. Others which have caused problems include etime
, getpid
, isnan
112 and sizeof
.
One that frequently catches package writers is that it allows out-of-order declarations: in standard-conformant Fortran variables must be declared (explicitly or implicitly) before use in other declarations such as dimensions.
112 There is a portable way to do this in Fortran 2003 (ieee_is_nan()
in module ieee_arithmetic
), but that was not supported in the versions 4.x of GNU Fortran. A pretty robust alternative is to test if(my_var /= my_var)
.
Unfortunately this flags extensions such as DOUBLE COMPLEX
and COMPLEX*16
. R has tested that DOUBLE COMPLEX
works and so is preferred to COMPLEX*16
. (One can also use something like COMPLEX(KIND=KIND(0.0D0))
.)
GNU Fortran 10 and later give a compilation error for the previously widespread practice of passing a Fortran array element where an array is expected, or a scalar instead of a length-one array. See https://gcc.gnu.org/gcc-10/porting_to.html. As do the Intel Fortran compilers, and they can be stricter.
The use of IMPLICIT NONE
is highly recommended – Intel compilers with -warn
will warn on variables without an explicit type.
Common non-portable constructions include
Special care is needed in handling character
vector arguments in C (or C++). On entry the contents of the elements are duplicated and assigned to the elements of a char **
array, and on exit the elements of the C array are copied to create new elements of a character vector. This means that the contents of the character strings of the char **
array can be changed, including to \0
to shorten the string, but the strings cannot be lengthened. It is possible3 to allocate a new string via R_alloc
and replace an entry in the char **
array by the new string. However, when character vectors are used other than in a read-only way, the .Call
interface is much to be preferred.
3 Note that this is then not checked for over-runs by option CBoundsCheck = TRUE
.
Passing character strings to Fortran code needs even more care, is deprecated and should be avoided where possible. Only the first element of the character vector is passed in, as a fixed-length (255) character array. Up to 255 characters are passed back to a length-one character vector. How well this works (or even if it works at all) depends on the C and Fortran compilers on each platform (including on their options). Often what is being passed to Fortran is one of a small set of possible values (a factor in R terms) which could alternatively be passed as an integer code: similarly Fortran code that wants to generate diagnostic messages could pass an integer code to a C or R wrapper which would convert it to a character string.
It is possible to pass some R objects other than atomic vectors via .C
, but this is only supported for historical compatibility: use the .Call
or .External
interfaces for such objects. Any C/C++ code that includes Rinternals.h
should be called via .Call
or .External
.
.Fortran
is primarily intended for Fortran 77 code, and long precedes any support for ‘modern’ Fortran. Now that implementations of Fortran support the Fortran 2003 module iso_c_binding
, a better way to interface modern Fortran code to R is to use .C
and write a C interface using use iso_c_binding
.
dyn.load
and dyn.unload
There are a few other numerical utility functions available as entry points.
Function: double R_pow (double x, double y) ¶
Function: double R_pow_di (double x, int i) ¶
R_pow(x, y) and R_pow_di(x, i) compute x^y and x^i, respectively using R_FINITE checks and returning the proper result (the same as R) for the cases where x, y or i are 0 or missing or infinite or NaN.
pow1p(x, y) computes (1 + x)^y, accurately even for small x, i.e., |x| << 1.
Function: double cospi (double x) ¶
Computes cos(pi * x)
(where pi
is 3.14159...), accurately, notably for half integer x
.
This might be provided by your platform5, in which case it is not included in Rmath.h
, but is in math.h
which Rmath.h
includes. (Ensure that neither math.h
nor cmath
is included before Rmath.h
or define
5 It is an optional C11 extension.
#define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1
before the first inclusion.)
The C code underlying optim
can be accessed directly. The user needs to supply a function to compute the function to be minimized, of the type
typedef double optimfn(int n, double *par, void *ex);
where the first argument is the number of parameters in the second argument. The third argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.
Some of the methods also require a gradient function
typedef void optimgr(int n, double *par, double *gr, void *ex);
which passes back the gradient in the gr
argument. No function is provided for finite-differencing, nor for approximating the Hessian at the result.
The interfaces (defined in header R_ext/Applic.h
) are
Nelder Mead:
void nmmin(int n, double *xin, double *x, double *Fmin, optimfn fn,
           int *fail, double abstol, double intol, void *ex,
           double alpha, double beta, double gamma, int trace,
           int *fncount, int maxit);
BFGS:
void vmmin(int n, double *x, double *Fmin,
           optimfn fn, optimgr gr, int maxit, int trace, int *mask,
           double abstol, double reltol, int nREPORT,
           void *ex, int *fncount, int *grcount, int *fail);
Conjugate gradients:
void cgmin(int n, double *xin, double *x, double *Fmin,
           optimfn fn, optimgr gr, int *fail, double abstol,
           double intol, void *ex, int type, int trace,
           int *fncount, int *grcount, int maxit);
Limited-memory BFGS with bounds:
void lbfgsb(int n, int lmm, double *x, double *lower,
            double *upper, int *nbd, double *Fmin, optimfn fn,
            optimgr gr, int *fail, void *ex, double factr,
            double pgtol, int *fncount, int *grcount,
            int maxit, char *msg, int trace, int nREPORT);
Simulated annealing:
void samin(int n, double *x, double *Fmin, optimfn fn, int maxit,
           int tmax, double temp, int trace, void *ex);
Many of the arguments are common to the various methods. n
is the number of parameters, x
or xin
is the starting parameters on entry and x
the final parameters on exit, with final value returned in Fmin
. Most of the other parameters can be found from the help page for optim
: see the source code src/appl/lbfgsb.c
for the values of nbd
, which specifies which bounds are to be used.
The C code underlying integrate
can be accessed directly. The user needs to supply a vectorizing C function to compute the function to be integrated, of the type
typedef void integr_fn(double *x, int n, void *ex);
where x[]
is both input and output and has length n
, i.e., a C function, say fn
, of type integr_fn
must basically do for(i in 1:n) x[i] := f(x[i], ex)
. The vectorization requirement can be used to speed up the integrand instead of calling it n
times. Note that in the current implementation built on QUADPACK, n
will be either 15 or 21. The ex
argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.
There are interfaces (defined in header R_ext/Applic.h
) for integrals over finite and infinite intervals (or “ranges” or “integration boundaries”).
Finite:
void Rdqags(integr_fn f, void *ex, double *a, double *b,
            double *epsabs, double *epsrel,
            double *result, double *abserr, int *neval, int *ier,
            int *limit, int *lenw, int *last,
            int *iwork, double *work);
Infinite:
void Rdqagi(integr_fn f, void *ex, double *bound, int *inf,
            double *epsabs, double *epsrel,
            double *result, double *abserr, int *neval, int *ier,
            int *limit, int *lenw, int *last,
            int *iwork, double *work);
Only the 3rd and 4th argument differ for the two integrators; for the finite range integral using Rdqags
, a
and b
are the integration interval bounds, whereas for an infinite range integral using Rdqagi
, bound
is the finite bound of the integration (if the integral is not doubly-infinite) and inf
is a code indicating the kind of integration range,
f
and ex
define the integrand function, see above; epsabs
and epsrel
specify the absolute and relative accuracy requested, result
, abserr
and last
are the output components value
, abs.err
and subdivisions
of the R function integrate, where neval
gives the number of integrand function evaluations, and the error code ier
is translated to R’s integrate()$message
; see that function’s definition for details. limit
corresponds to integrate(..., subdivisions = *)
. It seems you should always define the two work arrays and the length of the second one as
lenw = 4 * limit;
iwork = (int *) R_alloc(limit, sizeof(int));
work = (double *) R_alloc(lenw, sizeof(double));
The comments in the source code in src/appl/integrate.c
give more details, particularly about reasons for failure (ier >= 1
).
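Putting the pieces together, a call to Rdqags from package C code might be sketched as follows. This is a sketch only, intended to be compiled as part of a package and invoked from R (R_alloc needs a live R session); the integrand and the function name integrate_x2 are hypothetical:

```c
#include <R.h>             /* for R_alloc */
#include <R_ext/Applic.h>  /* for Rdqags and integr_fn */

/* Trivial integrand f(x) = x^2, vectorized as integr_fn requires. */
static void fn(double *x, int n, void *ex)
{
    for (int i = 0; i < n; i++)
        x[i] = x[i] * x[i];
}

/* Integrate x^2 over [0, 1]; the workspace sizes follow the pattern
   recommended above (lenw = 4 * limit). */
double integrate_x2(void)
{
    double a = 0.0, b = 1.0, epsabs = 1e-10, epsrel = 1e-10;
    double result, abserr;
    int neval, ier, last;
    int limit = 100;
    int lenw = 4 * limit;
    int *iwork = (int *) R_alloc(limit, sizeof(int));
    double *work = (double *) R_alloc(lenw, sizeof(double));

    Rdqags(fn, NULL, &a, &b, &epsabs, &epsrel,
           &result, &abserr, &neval, &ier,
           &limit, &lenw, &last, iwork, work);
    /* result should be close to 1/3 when ier == 0 */
    return result;
}
```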
6.12 Condition handling and cleanup code
Three functions are available for establishing condition handlers from within C code:
#include <Rinternals.h>

SEXP R_tryCatchError(SEXP (*fun)(void *data), void *data,
                     SEXP (*hndlr)(SEXP cond, void *hdata), void *hdata);
SEXP R_tryCatch(SEXP (*fun)(void *data), void *data,
                SEXP,
                SEXP (*hndlr)(SEXP cond, void *hdata), void *hdata,
                void (*clean)(void *cdata), void *cdata);
SEXP R_withCallingErrorHandler(SEXP (*fun)(void *data), void *data,
                               SEXP (*hndlr)(SEXP cond, void *hdata), void *hdata);
R_tryCatchError
establishes an exiting handler for conditions inheriting from class error
.
R_tryCatch
can be used to establish a handler for other conditions and to register a cleanup action. The conditions to be handled are specified as a character vector (STRSXP
). A NULL
pointer can be passed as fun
or clean
if condition handling or cleanup are not needed.
These are currently implemented using the R-level tryCatch
mechanism so are subject to some overhead.
R_withCallingErrorHandler
establishes a calling handler for conditions inheriting from class error
. It establishes the handler without calling back into R and will therefore be more efficient.
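As an illustration of the first of these, a computation that may signal an R error can be wrapped with an exiting handler roughly as follows (a sketch; all names other than the R API are hypothetical):

```c
#include <Rinternals.h>

/* The body: any R API call here may signal an error. */
static SEXP body_fun(void *data)
{
    SEXP arg = (SEXP) data;
    return Rf_coerceVector(arg, REALSXP);
}

/* The handler: called with the condition object if an error is
   signaled; here it swallows the error and returns a fallback. */
static SEXP err_handler(SEXP cond, void *hdata)
{
    return R_NilValue;
}

/* Entry point: returns the coerced vector, or R_NilValue on error. */
SEXP my_safe_coerce(SEXP arg)
{
    return R_tryCatchError(body_fun, (void *) arg,
                           err_handler, NULL);
}
```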
The function R_UnwindProtect
can be used to ensure that a cleanup action takes place on ordinary return as well as on a non-local transfer of control, which R implements as a longjmp
.
SEXP R_UnwindProtect(SEXP (*fun)(void *data), void *data,
                     void (*clean)(void *data, Rboolean jump), void *cdata,
                     SEXP cont);
R_UnwindProtect
can be used in two ways. The simpler usage, suitable for use in C code, passes NULL
for the cont
argument. R_UnwindProtect
will call fun(data)
. If fun
returns a value, then R_UnwindProtect
calls clean(cleandata, FALSE)
before returning the value returned by fun
. If fun
executes a non-local transfer of control, then clean(cleandata, TRUE)
is called, and the non-local transfer of control is resumed.
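A minimal sketch of this simpler usage, assuming the resource to be released is a malloc'd buffer (the names do_work, cleanup and my_entry are hypothetical):

```c
#include <Rinternals.h>
#include <stdlib.h>

/* The protected computation; it may signal an R error, which R
   implements as a longjmp past this frame. */
static SEXP do_work(void *data)
{
    /* ... use the buffer in 'data' ... */
    return R_NilValue;
}

/* Called on normal return (jump == FALSE) and on a non-local transfer
   of control (jump == TRUE); either way the buffer is released. */
static void cleanup(void *cdata, Rboolean jump)
{
    free(cdata);
}

SEXP my_entry(void)
{
    void *buf = malloc(1024);
    /* NULL for cont selects the simpler, C-only usage. */
    return R_UnwindProtect(do_work, buf, cleanup, buf, NULL);
}
```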
The second use pattern, suitable to support C++ stack unwinding, uses two additional functions:
SEXP R_MakeUnwindCont();
void NORET R_ContinueUnwind(SEXP cont);
R_MakeUnwindCont
allocates a continuation token cont
to pass to R_UnwindProtect
. This token should be protected with PROTECT
before calling R_UnwindProtect
. When the clean
function is called with jump == TRUE
, indicating that R is executing a non-local transfer of control, it can throw a C++ exception to a C++ catch
outside the C++ code to be unwound, and then use the continuation token in a call to R_ContinueUnwind(cont)
to resume the non-local transfer of control within R.
6.13 Allowing interrupts
No part of R can be interrupted whilst running long computations in compiled code, so programmers should make provision for the code to be interrupted at suitable points by calling from C
#include <R_ext/Utils.h>

void R_CheckUserInterrupt(void);
and from Fortran
subroutine rchkusr()
These check if the user has requested an interrupt, and if so branch to R’s error signaling functions.
Note that the code behind one of the entry points defined here, if called from your C or Fortran code, could itself be interruptible or generate an error, and so might not return to your code.
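A sketch of a long-running loop made interruptible in this way (the function slow_sum is hypothetical; checking only every few thousand iterations keeps the overhead negligible):

```c
#include <R_ext/Utils.h>

/* Sum a long vector, allowing the user to interrupt the computation.
   If an interrupt is pending, R_CheckUserInterrupt does not return. */
void slow_sum(double *x, int n, double *ans)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        if (i % 10000 == 0)        /* avoid checking on every iteration */
            R_CheckUserInterrupt();
        s += x[i];
    }
    *ans = s;
}
```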
Header file Rconfig.h
(included by R.h
) is used to define platform-specific macros that are mainly for use in other header files. The macro WORDS_BIGENDIAN
is defined on big-endian6 systems (e.g. most OSes on Sparc and PowerPC hardware) and not on little-endian systems (nowadays all the commoner R platforms). It can be useful when manipulating binary files. NB: these macros apply only to the C compiler used to build R, not necessarily to another C or C++ compiler.
Header file Rversion.h
(not included by R.h
) defines a macro R_VERSION
giving the version number encoded as an integer, plus a macro R_Version
to do the encoding. This can be used to test if the version of R is late enough, or to include back-compatibility features. For protection against very old versions of R which did not have this macro, use a construction such as
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0)
...
#endif
More detailed information is available in the macros R_MAJOR
, R_MINOR
, R_YEAR
, R_MONTH
and R_DAY
: see the header file Rversion.h
for their format. Note that the minor version includes the patchlevel (as in 2.2
).
Packages which use alloca
need to ensure it is defined: as it is part of neither C nor POSIX there is no standard way to do so. One can use
#include <Rconfig.h> // for HAVE_ALLOCA_H
#ifdef __GNUC__
// this covers gcc, clang, icc
# undef alloca
# define alloca(x) __builtin_alloca((x))
#elif defined(HAVE_ALLOCA_H)
// needed for native compilers on Solaris and AIX
# include <alloca.h>
#endif
(and this should be included before standard C headers such as stdlib.h
, since on some platforms these include malloc.h
which may have a conflicting definition), which suffices for known R platforms.
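For illustration, a hypothetical helper using alloca under the same kind of guards (upper_sum is not part of any API; note that the stack storage disappears when the function returns, so it must not be returned or stored):

```c
#include <string.h>

/* Map alloca to the compiler builtin on gcc/clang, as in the snippet
   above; fall back to <alloca.h> elsewhere. */
#ifdef __GNUC__
# undef alloca
# define alloca(x) __builtin_alloca((x))
#else
# include <alloca.h>
#endif

/* Upper-case a short string into a stack-allocated temporary and
   return the byte sum of the result.  The alloca'd buffer is reclaimed
   automatically when the function returns. */
static unsigned upper_sum(const char *s)
{
    size_t n = strlen(s) + 1;
    char *tmp = (char *) alloca(n);
    unsigned sum = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        tmp[i] = (s[i] >= 'a' && s[i] <= 'z') ? (char)(s[i] - 32) : s[i];
        sum += (unsigned char) tmp[i];
    }
    return sum;
}
```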
The C99 keyword inline
should be recognized by all compilers nowadays used to build R. Portable code which might be used with earlier versions of R can be written using the macro R_INLINE
(defined in file Rconfig.h
included by R.h
), as for example from package cluster
#include <R.h>

static R_INLINE int ind_2(int l, int j)
{
...
}
Be aware that using inlining with functions in more than one compilation unit is almost impossible to do portably, see https://www.greenend.org.uk/rjk/tech/inline.html, so this usage is for static
functions as in the example. All the R configure code has checked is that R_INLINE
can be used in a single C file with the compiler used to build R. We recommend that packages making extensive use of inlining include their own configure code.
Header R_ext/Visibility.h
has some definitions for controlling the visibility of entry points. These are only effective when HAVE_VISIBILITY_ATTRIBUTE
is defined – this is checked when R is configured and recorded in header Rconfig.h
(included by R_ext/Visibility.h
). It is often defined on modern Unix-alikes with a recent compiler7, but not supported on macOS nor Windows. Minimizing the visibility of symbols in a shared library will both speed up its loading (unlikely to be significant) and reduce the possibility of linking to other entry points of the same name.
7 It is defined by the Intel compilers, but also hides unsatisfied references and so cannot be used with R. It was not supported by the AIX nor Solaris compilers.
C/C++ entry points prefixed by attribute_hidden
will not be visible in the shared object. There is no comparable mechanism for Fortran entry points, but there is a more comprehensive scheme used by, for example package stats. Most compilers which allow control of visibility will allow control of visibility for all symbols via a flag, and where known the flag is encapsulated in the macros C_VISIBILITY
, CXX_VISIBILITY
8 and F_VISIBILITY
for C, C++ and Fortran compilers.9 These are defined in etc/Makeconf
and so available for normal compilation of package code. For example, src/Makevars
could include some of
8 This applies to the compiler for the default C++ dialect (currently C++11) and not necessarily to other dialects.
9 In some cases Fortran compilers accept the flag but do not actually hide their symbols.
PKG_CFLAGS=$(C_VISIBILITY)
PKG_CXXFLAGS=$(CXX_VISIBILITY)
PKG_FFLAGS=$(F_VISIBILITY)
This would end up with no visible entry points, which would be pointless. However, the effect of the flags can be overridden by using the attribute_visible
prefix. A shared object which registers its entry points needs to have only one visible entry point, its initializer, so for example package stats has
void attribute_visible R_init_stats(DllInfo *dll)
{
    R_registerRoutines(dll, CEntries, CallEntries, FortEntries, NULL);
    R_useDynamicSymbols(dll, FALSE);
...
}
Because the C_VISIBILITY
mechanism is only useful in conjunction with attribute_visible
, it is not enabled unless HAVE_VISIBILITY_ATTRIBUTE
is defined. The usual visibility flag is -fvisibility=hidden
: some compilers also support -fvisibility-inlines-hidden
which can be used by overriding C_VISIBILITY
and CXX_VISIBILITY
in config.site
when building R, or editing etc/Makeconf
in the R installation.
Note that configure
only checks that visibility attributes and flags are accepted, not that they actually hide symbols.
The visibility mechanism is not available on Windows, but there is an equally effective way to control which entry points are visible, by supplying a definitions file pkgname/src/pkgname-win.def
: only entry points listed in that file will be visible. Again using stats as an example, it has
LIBRARY stats.dll
EXPORTS
 R_init_stats
It is possible to build Mathlib
, the R set of mathematical functions documented in Rmath.h
, as a standalone library libRmath
under both Unix-alikes and Windows. (This includes the functions documented in Numerical analysis subroutines as from that header file.)
The library is not built automatically when R is installed, but can be built in the directory src/nmath/standalone
in the R sources: see the file README
there. To use the code in your own C program include
#define MATHLIB_STANDALONE
#include <Rmath.h>
and link against -lRmath
(and perhaps -lm
). There is an example file test.c
.
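A minimal standalone program along the lines of test.c might look like this (a sketch; the exact compiler invocation and include/library paths depend on where libRmath was installed, e.g. something like cc prog.c -lRmath -lm):

```c
/* Standalone use of R's math library: quantile function and the
   supplied Marsaglia-multicarry uniform generator. */
#define MATHLIB_STANDALONE
#include <Rmath.h>
#include <stdio.h>

int main(void)
{
    set_seed(123, 456);   /* seed the supplied generator */
    /* 97.5% quantile of N(0,1): about 1.96 */
    printf("qnorm(0.975) = %f\n", qnorm(0.975, 0.0, 1.0, 1, 0));
    printf("uniform draw = %f\n", unif_rand());
    return 0;
}
```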
A little care is needed to use the random-number routines. You will need to supply the uniform random number generator
double unif_rand(void)
or use the one supplied (and with a dynamic library or DLL you will have to use the one supplied, which is the Marsaglia-multicarry generator, with entry points
void set_seed(unsigned int, unsigned int)
to set its seeds and
void get_seed(unsigned int *, unsigned int *)
to read the seeds).
All of these alternatives apart from the first (an int
) are three pointers, so the union occupies three words.
The vector types are RAWSXP
, CHARSXP
, LGLSXP
, INTSXP
, REALSXP
, CPLXSXP
, STRSXP
, VECSXP
, EXPRSXP
and WEAKREFSXP
. Remember that such types are a VECTOR_SEXPREC
, which again consists of the header and the same three pointers, but followed by two integers giving the length and ‘true length’3 of the vector, and then followed by the data (aligned as required: on most 32-bit systems with a 28-byte VECTOR_SEXPREC
node the data can follow immediately after the node). The data are a block of memory of the appropriate length to store ‘true length’ elements (rounded up to a multiple of 8 bytes, with the 8-byte blocks being the ‘Vcells’ referred to in the documentation for gc()
).
3 The only current use is for hash tables of environments (VECSXP
s), where length
is the size of the table and truelength
is the number of primary slots in use, for the reference hash tables in serialization (VECSXP
s), and for ‘growable’ vectors (atomic vectors, VECSXP
s and EXPRSXP
s) which are created by slightly over-committing when enlarging a vector during subassignment, so that some number of the following enlargements during subassignment can be performed in place), where truelength
is the number of slots in use.
The ‘data’ for the various types are given in the table below. A lot of this is interpretation, i.e. the types are not checked.
NILSXP