
An issue regarding the use of LAMMPS with the deep potential trained using MACE #44

Open
wxwth opened this issue Dec 10, 2024 · 5 comments

Comments


wxwth commented Dec 10, 2024

Dear Dr. Zeng,

I am sorry to bother you with an issue regarding the use of LAMMPS (as bundled with DeePMD-kit 3.0.0) with a deep potential trained using the MACE model implemented in DeePMD-kit 3.0.0.

Following the detailed guidance provided by Bohrium and the WeChat post by DeepModeling, I added the following line to my job submission script for LAMMPS:

export DP_PLUGIN_PATH=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/libdeepmd_gnn.so

The file libdeepmd_gnn.so does exist in the directory /gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/. Additionally, I successfully trained a deep potential using MACE in DeePMD-kit 3.0.0, which suggests that there is no issue with the DeePMD-kit installation itself.
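In case it helps to reproduce the setup, the relevant part of my submission script is sketched below. The prefix is the one quoted above; exporting `LD_LIBRARY_PATH` in addition to `DP_PLUGIN_PATH` is only a guess prompted by the error message that follows.

```shell
# Install prefix of the DeePMD-kit 3.0.0 CUDA build (as quoted above).
DEEPMD_PREFIX=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu
GNN_LIB_DIR="$DEEPMD_PREFIX/lib/python3.12/site-packages/deepmd_gnn/lib"

# Point the DeePMD-kit plugin loader at the shared object itself.
export DP_PLUGIN_PATH="$GNN_LIB_DIR/libdeepmd_gnn.so"

# Also let the dynamic linker resolve the plugin and its dependencies.
export LD_LIBRARY_PATH="$GNN_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Fail early with a clear message if the plugin file is missing.
if [ ! -f "$DP_PLUGIN_PATH" ]; then
    echo "plugin not found: $DP_PLUGIN_PATH" >&2
fi
```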

However, an error occurs, stating that the libdeepmd_gnn.so file cannot be found:

ERROR on proc 28: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend error: DeePMD-kit Error: /gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/libdeepmd_gnn.so is not found! You can add the library directory to LD_LIBRARY_PATH (/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/work/source/lmp/pair_deepmd.cpp:539)

I asked someone else to submit the job on another cluster, and they encountered the same error. My job submission script for LAMMPS is attached for reference. Could you kindly provide any suggestions to resolve this issue?

Thank you for your time, and I am looking forward to your reply.

Sincerely,
Xu

myjob.sh.txt


njzjz commented Dec 10, 2024

Setting the environment variable LD_DEBUG=libs would print more helpful information.
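For example, a single-core run can be traced like this; `CMD` below is a placeholder for the actual LAMMPS launcher line (e.g. `srun lmp -in in.lammps`):

```shell
# The LD_DEBUG trace goes to stderr, so redirect it into a file.
# CMD is a placeholder; substitute your real LAMMPS invocation.
CMD=${CMD:-true}

LD_DEBUG=libs $CMD 2> ld_debug.log

# The libraries searched right before the failure are the best clue.
tail -n 20 ld_debug.log
```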


wxwth commented Dec 11, 2024

Dear Dr. Zeng,

Thank you for your reply. Enclosed is the output file generated with the environment variable LD_DEBUG=libs. I am sorry that I cannot identify the solution from it myself. Could you kindly take a look? Note that this simulation was run on a single core to avoid redundant output. Thank you for your time!

Sincerely,
Xu

slurm-1937867.out.txt


njzjz commented Dec 11, 2024

It seems that libcuda.so.1 is not found; it is the last file the loader tries to find before the error is thrown. The error message could be improved, though.

   3463222:	find library=libcuda.so.1 [0]; searching
   3463222:	 search path=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls/x86_64/x86_64:/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls/x86_64:/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls/x86_64:/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls:/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/x86_64/x86_64:/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/x86_64:/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/x86_64:/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib		(RPATH from file /gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/libdeepmd_gnn.so)
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls/x86_64/x86_64/libcuda.so.1
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls/x86_64/libcuda.so.1
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls/x86_64/libcuda.so.1
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/tls/libcuda.so.1
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/x86_64/x86_64/libcuda.so.1
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/x86_64/libcuda.so.1
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/x86_64/libcuda.so.1
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/lib/python3.12/site-packages/deepmd_gnn/lib/libcuda.so.1
   3463222:	 search path=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/bin/../lib		(RPATH from file lmp)
   3463222:	  trying file=/gpfs/software/deepmd-kit-3.0.0-cuda12.6_gpu/bin/../lib/libcuda.so.1
   3463222:	 search path=/gpfs/softs/intel/oneapi/mpi/2021.2.0/libfabric/lib:/gpfs/softs/intel/oneapi/mpi/2021.2.0/lib/release:/gpfs/softs/intel/oneapi/mpi/2021.2.0/lib:/gpfs/softs/intel/oneapi/mkl/2021.2.0/lib/intel64:/gpfs/softs/intel/oneapi/tbb/2021.2.0/lib/intel64/gcc4.8:/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/lib:/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/lib/x64:/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/lib/emu:/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64_lin:/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64:/gpfs/softs/intel/oneapi/debugger/10.1.1/dep/lib:/gpfs/softs/intel/oneapi/debugger/10.1.1/gdb/intel64/lib		(LD_LIBRARY_PATH)
   3463222:	  trying file=/gpfs/softs/intel/oneapi/mpi/2021.2.0/libfabric/lib/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/mpi/2021.2.0/lib/release/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/mpi/2021.2.0/lib/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/mkl/2021.2.0/lib/intel64/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/tbb/2021.2.0/lib/intel64/gcc4.8/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/lib/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/lib/x64/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/lib/emu/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64_lin/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/debugger/10.1.1/dep/lib/libcuda.so.1
   3463222:	  trying file=/gpfs/softs/intel/oneapi/debugger/10.1.1/gdb/intel64/lib/libcuda.so.1
   3463222:	 search cache=/etc/ld.so.cache
   3463222:	 search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64		(system search path)
   3463222:	  trying file=/lib64/tls/libcuda.so.1
   3463222:	  trying file=/lib64/libcuda.so.1
   3463222:	  trying file=/usr/lib64/tls/libcuda.so.1
   3463222:	  trying file=/usr/lib64/libcuda.so.1
   3463222:


wxwth commented Dec 11, 2024

Dear Dr. Zeng,

Thank you for your reply. I am confused about the requirement for the libcuda.so.1 file, since my job submission script indicates that I am running the LAMMPS simulations on CPUs only, not on a GPU.

Additionally, I can only find the libcuda.so file in the /gpfs/softs/cuda/12.6.2/targets/x86_64-linux/lib/stubs/ directory of my cluster; the libcuda.so.1 file is not present in the corresponding CUDA-12.6.2 installation.
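For reference, this is how I checked for the driver library on the node; my understanding (an assumption on my part) is that the toolkit's stubs/ directory only holds an unversioned link-time stub, while the versioned libcuda.so.1 normally ships with the NVIDIA driver:

```shell
# libcuda.so.1 is installed by the NVIDIA driver, not by the CUDA
# toolkit; the toolkit's stubs/ directory only holds an unversioned
# link-time stub.  Check whether the driver library is visible:
DRIVER_LIB=$(ldconfig -p 2>/dev/null | grep libcuda.so.1 || true)
echo "${DRIVER_LIB:-libcuda.so.1 not found in the linker cache}"
```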

Could you kindly clarify whether it is still possible to perform the LAMMPS simulations (with the deep potential trained using MACE as implemented in DeePMD-kit 3.0.0) under these conditions? Thank you for your time!

Sincerely,
Xu


njzjz commented Dec 11, 2024

Does your PyTorch build link to libcuda.so.1? libdeepmd_gnn.so itself does not explicitly link to CUDA.
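One way to check, sketched under the assumptions that `python3` resolves to the DeePMD-kit environment's interpreter and that the torch build ships its `libtorch*.so` files under a `lib/` subdirectory:

```shell
# Locate torch's shared-library directory (prints nothing if torch
# is not importable from this python3).
TORCH_LIB=$(python3 -c 'import os, torch; print(os.path.join(os.path.dirname(torch.__file__), "lib"))' 2>/dev/null || true)

MSG=""
if [ -n "$TORCH_LIB" ]; then
    # DT_NEEDED entries are hard dependencies the dynamic loader must
    # resolve at startup; scan each libtorch*.so for libcuda.so.1.
    for so in "$TORCH_LIB"/libtorch*.so; do
        if objdump -p "$so" 2>/dev/null | grep NEEDED | grep -q 'libcuda\.so\.1'; then
            MSG="$MSG$so needs libcuda.so.1 "
        fi
    done
    [ -n "$MSG" ] || MSG="no libtorch*.so declares a direct libcuda.so.1 dependency"
else
    MSG="torch is not importable in this environment"
fi
echo "$MSG"
```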

njzjz added a commit that referenced this issue Dec 11, 2024
njzjz added a commit to njzjz/deepmd-kit that referenced this issue Dec 22, 2024
github-merge-queue bot pushed a commit to deepmodeling/deepmd-kit that referenced this issue Dec 22, 2024
fix: print dlerror if dlopen fails (deepmodeling#4485)
xref: deepmodeling/deepmd-gnn#44

## Summary by CodeRabbit

- **New Features**
  - Enhanced error messages for library loading failures on non-Windows platforms.
  - Updated thread management environment variable checks for improved compatibility.
  - Added support for mixed types in tensor input handling, allowing for more flexible configurations.
- **Bug Fixes**
  - Improved error reporting for dynamic library loading issues.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
njzjz added a commit to njzjz/deepmd-kit that referenced this issue Dec 22, 2024
njzjz added a commit to deepmodeling/deepmd-kit that referenced this issue Dec 23, 2024
iProzd added a commit to iProzd/deepmd-kit that referenced this issue Dec 24, 2024
iProzd added a commit to iProzd/deepmd-kit that referenced this issue Jan 4, 2025