Replies: 7 comments 6 replies
-
Have you thought about using sparse matrices instead of dense ones? Those sizes are already big.
-
I wanted to share some images that show how consistent the LSPG approach is across the different strategies. As the attached images show, the results agree well from one strategy to the next, which I hope illustrates the robustness of the LSPG approach with the new strategies.
-
In addition to the earlier comparisons, I ran further tests that integrate the Eigen library into both the Elemental Galerkin of master and an older B&S implementation. The results are presented in the following figure:
Incorporating the Eigen library significantly reduces the total computation time. Interestingly, the old ROM B&S, which was less complex and used a pragma instead of block_for_each, outperformed the newer one.
-
FYI @pooyan-dadvand
-
Comparison of Galerkin Ublas (Elemental) vs Galerkin Eigen (Global)
In the previous approach, the Galerkin reduced order model (ROM) system of equations was built element by element using the following expression:
However, when constructing the ROM system for a basis
This approach uses the Eigen library instead of Ublas for better performance. The comparison of the two approaches is shown in the figure below:
The results show that the Galerkin Eigen (Global) approach is significantly faster for larger basis sizes. It is therefore the recommended approach when building the ROM system for a basis with a large number of modes.
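The expressions for the two builds did not survive in this comment, so the following is only a sketch of what "elemental" versus "global" usually means for a Galerkin projection, not the code that was benchmarked. It assumes the elemental build sums $\mathbf{\Phi}^{eT}\mathbf{K}^e\mathbf{\Phi}^e$ over elements and the global build assembles the full matrix first and projects once. The functions `reduced_elemental` and `reduced_global` and the data `PHI`, `ELEMENTS`, `K_ELEM` are hypothetical, and plain Python lists stand in for Ublas and Eigen matrices.

```python
# Illustrative sketch only: a 3-node chain of 2-node elements with one DOF per
# node. PHI, ELEMENTS and K_ELEM are made-up example data, not Kratos objects.

def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def reduced_elemental(phi, elements, k_elem):
    """Sum Phi_e^T K_e Phi_e over the elements (element-by-element build)."""
    n_modes = len(phi[0])
    a_rom = [[0] * n_modes for _ in range(n_modes)]
    for nodes in elements:
        phi_e = [phi[n] for n in nodes]  # rows of the basis for this element
        a_e = matmul(transpose(phi_e), matmul(k_elem, phi_e))
        for i in range(n_modes):
            for j in range(n_modes):
                a_rom[i][j] += a_e[i][j]
    return a_rom


def reduced_global(phi, elements, k_elem):
    """Assemble the full matrix first, then project once: Phi^T K Phi."""
    n_dofs = len(phi)
    k = [[0] * n_dofs for _ in range(n_dofs)]
    for nodes in elements:
        for a, row in enumerate(nodes):
            for b, col in enumerate(nodes):
                k[row][col] += k_elem[a][b]
    return matmul(transpose(phi), matmul(k, phi))


PHI = [[1, 0], [2, 1], [0, 3]]      # 3 DOFs, 2 modes
ELEMENTS = [(0, 1), (1, 2)]         # node indices of each element
K_ELEM = [[1, -1], [-1, 1]]         # same element matrix for both elements

if __name__ == "__main__":
    print(reduced_elemental(PHI, ELEMENTS, K_ELEM))
    print(reduced_global(PHI, ELEMENTS, K_ELEM))
```

Both functions return the same reduced matrix, so under these assumptions any difference between the two builds is one of cost, not of result.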
-
Identifying the Performance Bottleneck in BuildROM
After profiling the BuildROM function, I found that the majority of the time was being spent on the second of the three matrix products that build the elemental ROM contribution. As we use more modes, the BuildROM function takes longer to execute due to this bottleneck. It is worth noting that the GetPhiElemental function, which I initially suspected to be the bottleneck, was not the issue.
These are the formulas:
First:
$\mathbf{Aux}^e = \mathbf{J}^e\mathbf{\Phi}^e$
Second:
$\mathbf{A}^e = \mathbf{\Phi}^{eT}\mathbf{Aux}^e \longleftarrow \text{Expensive}$
Third:
$\mathbf{b}^e = \mathbf{\Phi}^{eT}\mathbf{R}^e$
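To see why the second product dominates, it can help to count multiply-adds for each of the three products. The sketch below is written in plain Python for brevity, not in the C++ of the actual BuildROM, and it assumes that $\mathbf{J}^e$ is a dense ndofs × ndofs matrix and $\mathbf{\Phi}^e$ a dense ndofs × nromdofs matrix. The names `product_costs` and `build_element_rom` are hypothetical.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def product_costs(ndofs, nromdofs):
    """Multiply-add counts of the three dense products for one element."""
    return {
        "first": ndofs * ndofs * nromdofs,       # Aux^e = J^e Phi^e
        "second": nromdofs * ndofs * nromdofs,   # A^e = Phi^eT Aux^e
        "third": nromdofs * ndofs,               # b^e = Phi^eT R^e
    }


def build_element_rom(j_e, phi_e, r_e):
    """Apply the three products to one element and return (A^e, b^e)."""
    aux = matmul(j_e, phi_e)                     # ndofs x nromdofs
    phi_t = transpose(phi_e)                     # nromdofs x ndofs
    a_e = matmul(phi_t, aux)                     # nromdofs x nromdofs
    b_e = [sum(row[k] * r_e[k] for k in range(len(r_e))) for row in phi_t]
    return a_e, b_e


if __name__ == "__main__":
    print(product_costs(9, 384))
```

With ndofs=9 and nromdofs=384 the counts are 31,104, 1,327,104 and 3,456 multiply-adds, so under these assumptions the second product is roughly 43 times more work than the first and grows quadratically with the number of modes.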
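For example, a test was conducted with ndofs=9 and nromdofs=384, measuring the average time of each product per element. The total BuildROM time was 4.79021 sec for all 6144 elements. The second product takes up most of the time because of its matrix dimensions. For ndofs=9 and nromdofs=384, the dimensions of the matrices involved follow from the formulas above:
$\mathbf{J}^e$: $9 \times 9$
$\mathbf{\Phi}^e$: $9 \times 384$
$\mathbf{Aux}^e$: $9 \times 384$
$\mathbf{A}^e$: $384 \times 384$
$\mathbf{R}^e$: $9 \times 1$
$\mathbf{b}^e$: $384 \times 1$
Conclusion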
In conclusion, profiling the BuildROM function showed that the majority of the time is spent on the second matrix product. Measuring each of the three products individually confirmed that the second one takes significantly longer than the other two.
The time measurements were reported with KRATOS_INFO_IF(), and I made sure that this did not add significant time to the overall execution. The profiling was done with a single thread, so parallelism does not affect the measurements.
There seems to be no obvious way to speed up the second matrix product, since the product itself appears to be already optimized, and the BuildROM time will increase significantly as we increase the number of modes. This may become a bottleneck in larger simulations, and we should be aware of its impact on performance.
Further Steps
To continue improving the performance of the BuildROM function, we plan to take the following steps:
By taking these steps, we hope to further optimize the performance of the BuildROM function and reduce its impact on larger simulations.
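For anyone who wants to repeat the per-product measurement described in the conclusion, a small accumulator along the following lines is enough. This is a Python sketch of the idea only, not the KRATOS_INFO_IF() based code that was actually used, and `StageTimer` is a hypothetical helper.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time and call counts per labelled stage."""

    def __init__(self):
        self.total = defaultdict(float)  # seconds spent in each stage
        self.calls = defaultdict(int)    # number of times each stage ran

    @contextmanager
    def measure(self, label):
        """Time the body of a `with` block and add it to `label`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total[label] += time.perf_counter() - start
            self.calls[label] += 1

    def average(self, label):
        """Average time per call for `label`, in seconds."""
        return self.total[label] / self.calls[label]


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(1000):                # stands in for the loop over elements
        with timer.measure("second product"):
            sum(i * i for i in range(100))
    print(timer.average("second product"))
```

Wrapping each of the three products in its own `measure` block gives the per-element averages directly. For scale, the reported total of 4.79021 sec over 6144 elements corresponds to about 0.78 ms per element.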