The row tensor product in the %X%-operator (https://github.com/boost-R/mboost/blob/master/R/bl.R#L1111) may fail for large matrices due to a stack overflow or "long vectors not supported yet"-error. This is particularly relevant when utilizing mboost for functional regression model estimation with FDboost.
I do not know the exact number of rows and columns each matrix needs to trigger this problem, but the calculation definitely fails when the number of rows is in the range of 10^5 and at least one matrix has a number of columns in the range of 10^2. In my case, for example, the dimensions are c(4e5, 500) and c(4e5, 25).
An alternative approach, which does not crash for matrices of this size, could be to build the product row by row with sapply or lapply and to combine the results with do.call("rbind", ...).
I have run some benchmarks to compare these alternatives with the original implementation. Neither alternative is as fast as the original row tensor product when X1 and X2 are sparse matrices. For 1000 rows and 500 / 20 columns, for example, the results are
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# sapply 1301.6774 1336.169 1368.185 1347.906 1375.014 1799.021 100 c
# lapply 1260.4451 1273.533 1306.315 1293.116 1318.661 1706.904 100 b
# original 973.6239 1066.606 1117.302 1111.325 1143.363 1526.042 100 a
Both alternatives seem to be (often much) faster in all non-sparse cases, however. Additionally, the apply versions do not fail for very large matrices and may even be parallelized (yielding a speed-up roughly in the order of the number of cores used). Using the data.table package, one may further speed up the calculations by replacing do.call("rbind", ...) with rbindlist.
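The row-wise idea behind the apply versions can be sketched as follows. This is only a minimal illustration, not the code that was benchmarked: the function names `row_tensor_lapply` and `row_tensor_kronecker` are hypothetical, and the Kronecker-based reference is an assumption about what a one-shot row tensor product typically looks like, not a copy of the mboost source.

```r
# Hypothetical reference: row tensor product built in one shot from two
# Kronecker products. Row i of the result holds all products
# X1[i, a] * X2[i, b], with the X2 index b running fastest.
row_tensor_kronecker <- function(X1, X2) {
  stopifnot(nrow(X1) == nrow(X2))
  kronecker(X1, matrix(1, nrow = 1, ncol = ncol(X2))) *
    kronecker(matrix(1, nrow = 1, ncol = ncol(X1)), X2)
}

# Hypothetical row-wise alternative: compute each row separately and bind
# the rows at the end, so no large intermediate Kronecker product is formed.
row_tensor_lapply <- function(X1, X2) {
  stopifnot(nrow(X1) == nrow(X2))
  rows <- lapply(seq_len(nrow(X1)), function(i) {
    # outer() gives a ncol(X2) x ncol(X1) matrix; flattening it column by
    # column yields the same ordering as the Kronecker-based reference.
    as.vector(outer(X2[i, ], X1[i, ]))
  })
  do.call("rbind", rows)
}

# Small usage example with dense base matrices.
set.seed(1)
X1 <- matrix(rnorm(12), nrow = 4, ncol = 3)
X2 <- matrix(rnorm(8), nrow = 4, ncol = 2)
isTRUE(all.equal(row_tensor_lapply(X1, X2), row_tensor_kronecker(X1, X2)))
```

Because each row is computed independently, the `lapply` call is the natural place to swap in a parallel apply function if parallelization is wanted.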
Another, more complicated solution could be a new Matrix-like class specifically for representing row tensor products, which works and operates without explicitly calculating the product itself.
A further option builds the product with KhatriRao, the function in the Matrix package that calculates the eponymous matrix product. For smaller models, this function may on average be a bit slower than the original approach, but it shows better worst-case performance in simulations and, most importantly, does not fail to build the row-wise tensor product for large matrices. @sbrockhaus
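One way such a KhatriRao-based construction might look is sketched below. The wrapper name `row_tensor_khatrirao` is hypothetical and this is not necessarily the code proposed in the comment. It relies on the fact that `KhatriRao()` forms column-wise Kronecker products, so the row-wise product is obtained by transposing the inputs and transposing the result back.

```r
library(Matrix)

# Hypothetical wrapper: row tensor product via the column-wise
# Khatri-Rao product of the transposed design matrices.
row_tensor_khatrirao <- function(X1, X2) {
  stopifnot(nrow(X1) == nrow(X2))
  # Coerce to sparse column-compressed storage so that base matrices and
  # Matrix objects are handled alike.
  X1 <- as(X1, "CsparseMatrix")
  X2 <- as(X2, "CsparseMatrix")
  # Column i of KhatriRao(t(X1), t(X2)) is kronecker(X1[i, ], X2[i, ]);
  # transposing turns these columns into the rows of the result.
  t(KhatriRao(t(X1), t(X2)))
}

# Small usage example: the result is a sparse 3 x 4 matrix.
X1 <- matrix(c(1, 0, 2, 0, 3, 0), nrow = 3, ncol = 2)
X2 <- matrix(c(0, 1, 0, 4, 0, 5), nrow = 3, ncol = 2)
row_tensor_khatrirao(X1, X2)
```

The result stays sparse, which matters for designs of the size mentioned above, where a dense matrix with 4e5 rows and 500 * 25 columns would not fit in memory on most machines.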