Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly use int64 in Numpy/cython code to avoid OS inconsistency #3992

Merged
merged 25 commits into from
Aug 24, 2024

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented Aug 10, 2024

Summary

Cython code need attention

  • src/pymatgen/optimization/linear_assignment.pyx
  • src/pymatgen/optimization/neighbors.pyx
  • src/pymatgen/util/coord_cython.pyx

Pin some threads from NumPy for reference:

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented Aug 10, 2024

@njzjz Perhaps you could give me some advice or comment here because I'm not a cython expert? Thanks a ton in advance.

Looks like quite some code is broken when running on Windows system with NumPy < 2, because with 862c454 integer is explicitly declared as int64 independent of the platform, however on Windows with NumPy < 2 arrays might still be int32 (long) by default.

It looks like we should update all cython-related code to internally cast type as int64. But would this long -> int64 replacement cause unexpected downstream breakage if downstream cython code don't explicitly cast to int64? Is this necessary or recommended by NumPy?

@@ -1688,6 +1688,7 @@ def test_calculate_ase(self):
assert not hasattr(calculator, "dynamics")
assert self.cu_structure == struct_copy, "original structure was modified"

@pytest.mark.skip(reason="chgnet is failing with Numpy 1, see #3992")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a closer look, chgnet tests are failing for NumPy 1 as well, I guess all ints are now int64 in pymatgen, which might not be the case for chgnet's cython code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AegisIK can you comment on this? is the fix on the chgnet side to ensure all ints in the atom/bond graph construction are np.int64?

@njzjz
Copy link
Contributor

njzjz commented Aug 10, 2024

Fused types may help: http://docs.cython.org/en/stable/src/userguide/fusedtypes.html
https://stackoverflow.com/a/28112966/9567349

@njzjz
Copy link
Contributor

njzjz commented Aug 10, 2024

Also, it may be useful to run tests against both numpy 1.x and 2.x.

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented Aug 11, 2024

Also, it may be useful to run tests against both numpy 1.x and 2.x.

Agreed! NumPy is recommending the same:

We recommend that you have at least one CI job which builds/installs via a wheel, and then runs tests against the oldest numpy version that the package supports.

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented Aug 23, 2024

@njzjz I decide perhaps still explicitly use int64 to ensure consistency (not sure if that's the best solution, free feel to correct me). Would appreciate it if you have any comment. Thanks a ton!

@janosh I'm not a cython expert (important 😅 ) so please review all changes very carefully :)

Added a NumPy 1 workflow for windows as recommended by NumPy doc, but still not sure if it's worth it because I'm not expecting no new issues to pop up once we fixes these.

We recommend that you have at least one CI job which builds/installs via a wheel, and then runs tests against the oldest numpy version that the package supports.

@DanielYang59 DanielYang59 marked this pull request as ready for review August 23, 2024 10:38
@janosh janosh added compatability Concerning pymatgen compatibility with different OS, Python versions, numpy versions, etc. cython Performance critical Cython code ecosystem Concerning the larger pymatgen ecosystem labels Aug 24, 2024
Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure we need to change np.array(dtype=int->np.int64) across the whole code base? reading the numpy docs you linked, it sounds like the potential for platform inconsistency is specific to cython code? so might be enough to only int->np.int64 in .pyx files?

@DanielYang59
Copy link
Contributor Author

it sounds like the potential for platform inconsistency is specific to cython code?

Sorry perhaps I should tweak the title, current one might be misleading. After NumPy 2.x, the default NumPy int type for 64-bit windows system changed from int32 to int64 (not just cython, sorry).

The default integer type on Windows is now int64 rather than int32, matching the behavior on other platforms

Meaning if we pass a numpy array as np.array(data, dtype=int) into a cython function, the latter now would explicitly expect int64, and the dtype could be int32 for the former in Numpy 1.x windows machines (therefore those CI failures). So I don't think just changing just the types in cython code suffice.

Meanwhile I think this is better than a fused type of int32 and int64, because if we explicitly use int64, we then could be confident our code would generate consistent results independent of OS? Because personally I think int64 might be the "mainstream" way to go (any NP2 system or Linux system)?

@DanielYang59 DanielYang59 changed the title Explicitly use int64 in cython code to avoid OS inconsistency Explicitly use int64 in Numpy/cython code to avoid OS inconsistency Aug 24, 2024
pyproject.toml Show resolved Hide resolved
Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for looking into this @DanielYang59! hugely appreciated! 👍

@janosh janosh merged commit 41e4c69 into materialsproject:master Aug 24, 2024
43 checks passed
@DanielYang59
Copy link
Contributor Author

No problem. Also appreciate the input from @njzjz

Perhaps we should add a note in next release informing downstream packages of such potentially breaking change (chgnet for example)?

@DanielYang59 DanielYang59 deleted the fix-np-cython-int branch August 24, 2024 08:56
@janosh
Copy link
Member

janosh commented Aug 24, 2024

sure thing! would you like to draft a note?

@DanielYang59
Copy link
Contributor Author

Sure, and feel free to polish it as you like.

[Breaking] Default NumPy/Cython integer type changed to int64 on Windows

Change: With NumPy 2.x, the default integer type changed to int64 on Windows 64-bit system. Consequently, pymatgen now explicitly uses np.int64 for all NumPy and Cython code instead of the platform-dependent int type.

Recommendation: Please explicitly declare dtype=np.int64 when initializing a NumPy array if possible. You might also need to test NumPy 1.x on Windows in CI pipelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compatability Concerning pymatgen compatibility with different OS, Python versions, numpy versions, etc. cython Performance critical Cython code ecosystem Concerning the larger pymatgen ecosystem fix Bug fix PRs windows Windows-specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incompatability with numpy <2 on Windows for Lattice.find_points_in_spheres
3 participants