Fedora 38 is EOL --> compilation crashes at rpmfusion returning 404 error #100

Closed
AlexKurek opened this issue Dec 10, 2024 · 36 comments · Fixed by #142

@AlexKurek
Contributor

AlexKurek commented Dec 10, 2024

Describe the bug
Fedora 38 is EOL.

Logs or error messages

Errors during downloading metadata for repository 'rpmfusion-nonfree':
  - Status code: 404 for https://repo.fedora.md/mirrors/rpmfusion/nonfree/fedora/releases/38/Everything/x86_64/os/repodata/repomd.xml (IP: 95.65.43.79)
  - Curl error (28): Timeout was reached for http://mirror.epn.edu.ec/rpmfusion/nonfree/fedora/releases/38/Everything/x86_64/os/repodata/repomd.xml [Failed to connect to mirror.epn.edu.ec port 80 after 30000 ms: Timeout was reached]
Error: Failed to download metadata for repo 'rpmfusion-nonfree': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
RPM Fusion for Fedora 38 - Nonfree - Updates    0.0  B/s |   0  B     01:00
Errors during downloading metadata for repository 'rpmfusion-nonfree-updates':
  - Curl error (28): Timeout was reached for http://mirror.epn.edu.ec/rpmfusion/nonfree/fedora/updates/38/x86_64/repodata/repomd.xml [Connection timeout after 30000 ms]
  - Curl error (28): Timeout was reached for http://mirror.epn.edu.ec/rpmfusion/nonfree/fedora/updates/38/x86_64/repodata/repomd.xml [Failed to connect to mirror.epn.edu.ec port 80 after 30000 ms: Timeout was reached]
  - Status code: 404 for https://repo.fedora.md/mirrors/rpmfusion/nonfree/fedora/updates/38/x86_64/repodata/repomd.xml (IP: 95.65.43.79)
Error: Failed to download metadata for repo 'rpmfusion-nonfree-updates': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Ignoring repositories: rpmfusion-nonfree, rpmfusion-nonfree-updates
No match for argument: pgplot
Error: Unable to find a match: pgplot
FATAL:   While performing build: while running engine: while running %post section: exit status 1

Additional information (if applicable or known):
https://discussion.fedoraproject.org/t/rpm-fusion-nonfree-updates-not-working-cant-download-nvidia-drivers/139016
Moving to Fedora 39 seems to fix it, but later:

+ pip install unittest2
Collecting unittest2
  Obtaining dependency information for unittest2 from https://files.pythonhosted.org/packages/72/20/7f0f433060a962200b7272b8c12ba90ef5b903e218174301d0abfd523813/unittest2-1.1.0-py2.py3-none-any.whl.metadata
  Downloading unittest2-1.1.0-py2.py3-none-any.whl.metadata (15 kB)
Collecting argparse (from unittest2)
  Obtaining dependency information for argparse from https://files.pythonhosted.org/packages/f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py3-none-any.whl.metadata
  Downloading argparse-1.4.0-py2.py3-none-any.whl.metadata (2.8 kB)
Requirement already satisfied: six>=1.4 in /usr/lib/python3.12/site-packages (from unittest2) (1.16.0)
Collecting traceback2 (from unittest2)
  Obtaining dependency information for traceback2 from https://files.pythonhosted.org/packages/17/0a/6ac05a3723017a967193456a2efa0aa9ac4b51456891af1e2353bb9de21e/traceback2-1.4.0-py2.py3-none-any.whl.metadata
  Downloading traceback2-1.4.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting linecache2 (from traceback2->unittest2)
  Obtaining dependency information for linecache2 from https://files.pythonhosted.org/packages/c7/a3/c5da2a44c85bfbb6eebcfc1dde24933f8704441b98fdde6528f4831757a6/linecache2-1.0.0-py2.py3-none-any.whl.metadata
  Downloading linecache2-1.0.0-py2.py3-none-any.whl.metadata (1000 bytes)
Downloading unittest2-1.1.0-py2.py3-none-any.whl (96 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.4/96.4 kB 2.0 MB/s eta 0:00:00
Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Downloading traceback2-1.4.0-py2.py3-none-any.whl (16 kB)
Downloading linecache2-1.0.0-py2.py3-none-any.whl (12 kB)
Installing collected packages: linecache2, argparse, traceback2, unittest2
Successfully installed argparse-1.4.0 linecache2-1.0.0 traceback2-1.4.0 unittest2-1.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
+ pip install xmlrunner
Collecting xmlrunner
  Downloading xmlrunner-1.7.7.tar.gz (5.6 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [31 lines of output]
      Traceback (most recent call last):
        File "/tmp/pip-install-bc8q2oax/xmlrunner_9fd7e24a417d4611aafdb6776196bf7f/xmlrunner/xmlrunner.py", line 13, in <module>
          from unittest2.runner import TextTestRunner
      ModuleNotFoundError: No module named 'unittest2'
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-cg18ehli/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-cg18ehli/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-cg18ehli/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 522, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-cg18ehli/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 6, in <module>
        File "/tmp/pip-install-bc8q2oax/xmlrunner_9fd7e24a417d4611aafdb6776196bf7f/xmlrunner/__init__.py", line 3, in <module>
          from .xmlrunner import XMLTestRunner
        File "/tmp/pip-install-bc8q2oax/xmlrunner_9fd7e24a417d4611aafdb6776196bf7f/xmlrunner/xmlrunner.py", line 17, in <module>
          from unittest import TestResult, _TextTestResult, TextTestRunner
      ImportError: cannot import name '_TextTestResult' from 'unittest' (/usr/lib64/python3.12/unittest/__init__.py). Did you mean: 'TextTestResult'?

A possible workaround is to manually install this commit:
pycontribs/xmlrunner#16
or install this instead: https://pypi.org/project/unittest-xml-reporting/ . It is imported the same way, so it might be a replacement. I verified that it solves the above error.
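
For context, the last ImportError above comes from `_TextTestResult`, an old alias of `unittest.TextTestResult` that Python 3.12 no longer provides. A minimal sketch of a version-tolerant import, using only the standard library; `has_legacy_alias` is a hypothetical helper name for illustration and is not part of xmlrunner:

```python
import sys
import unittest

# Prefer the public name; fall back to the old private alias only if needed.
try:
    from unittest import TextTestResult
except ImportError:  # not expected on any currently supported Python
    from unittest import _TextTestResult as TextTestResult


def has_legacy_alias() -> bool:
    """Return True if this interpreter still exposes unittest._TextTestResult."""
    return hasattr(unittest, "_TextTestResult")


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("legacy alias present:", has_legacy_alias())
```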

It then crashes at astropy-helpers (https://github.com/astropy/astropy-helpers?tab=readme-ov-file#deprecated):

Collecting astropy-healpix (from -r /opt/lofar/requirements3.txt (line 4))
  Downloading astropy_healpix-1.0.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting astropy-helpers (from -r /opt/lofar/requirements3.txt (line 5))
  Downloading astropy-helpers-4.0.1.tar.gz (52 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      <string>:14: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
      Traceback (most recent call last):
        File "/opt/lofar/pyenv-py3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/opt/lofar/pyenv-py3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/lofar/pyenv-py3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-6nxqrfv7/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-6nxqrfv7/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-6nxqrfv7/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 22, in <module>
        File "/tmp/pip-install-b_4c777c/astropy-helpers_9a569f64683b4ebc95eedb1f1c0d5d5b/astropy_helpers/version_helpers.py", line 34, in <module>
          from .distutils_helpers import is_distutils_display_option
        File "/tmp/pip-install-b_4c777c/astropy-helpers_9a569f64683b4ebc95eedb1f1c0d5d5b/astropy_helpers/distutils_helpers.py", line 18, in <module>
          from .utils import silence
        File "/tmp/pip-install-b_4c777c/astropy-helpers_9a569f64683b4ebc95eedb1f1c0d5d5b/astropy_helpers/utils.py", line 4, in <module>
          import imp
      ModuleNotFoundError: No module named 'imp'

Invoked here: https://github.com/tikk3r/flocs/blob/fedora-py3/requirements3.txt#L5 . Is it still needed?
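
The `imp` module was removed from the standard library in Python 3.12, which is why this import fails; `importlib` is its replacement. A small sketch for checking this on a given interpreter (`module_available` is a hypothetical helper, not something from astropy-helpers):

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("imp available:", module_available("imp"))
    print("importlib available:", module_available("importlib"))
```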

@tikk3r
Owner

tikk3r commented Dec 12, 2024

Yeah this has been on the list for a while, but at the time I was running into some availability issues for packages I think. I'm not sure if the second part is related to this issue.

@tikk3r tikk3r added this to the 5.4 milestone Dec 12, 2024
@AlexKurek
Contributor Author

AlexKurek commented Dec 12, 2024

I think it is. RPM Fusion is returning 404 for Fedora 38 since:

I archived the f37 and f38 rpmfusion repos on Monday to free up space on the server.

https://discussion.fedoraproject.org/t/rpm-fusion-nonfree-updates-not-working-cant-download-nvidia-drivers/139016/2

So it seems that it will not come back online. The only solution would be to upgrade Fedora.

@AlexKurek AlexKurek changed the title Fedora 38 is EOL Fedora 38 is EOL --> compilation crashes at rpmfusion returning 404 error Dec 12, 2024
@tikk3r
Owner

tikk3r commented Dec 13, 2024

I was referring to the failing pip install with "the second part". I don't think that is related to RPMFusion, but the base container indeed needs updating to a more recent Fedora.

@AlexKurek
Contributor Author

AlexKurek commented Dec 14, 2024

Later, datashape also crashes:

Collecting datashape (from datashader<0.16.0,>=0.15.0->shadems==0.5.3->-r /opt/lofar/requirements3.txt (line 38))
  Downloading datashape-0.5.2.tar.gz (76 kB)

[...]
AttributeError: module 'configparser' has no attribute 'SafeConfigParser'. Did you mean: 'RawConfigParser'?

Workaround:
blaze/datashape#245 (comment)

EDIT:
Fixed in 097405d and 76a1b20.
EDIT2: This also fixes the issue: git+https://github.com/ratt-ru/shadeMS.git@issue-124
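
For reference, `SafeConfigParser` was a deprecated alias that Python 3.12 removed, and `configparser.ConfigParser` is the usual drop-in replacement. A minimal sketch; the section and option names below are made up for illustration and are not taken from datashape:

```python
import configparser

# ConfigParser is the supported class; SafeConfigParser was a deprecated
# alias that Python 3.12 removed.
parser = configparser.ConfigParser()
parser.read_string("""
[example]
vcs = git
style = pep440
""")

vcs = parser.get("example", "vcs")
style = parser.get("example", "style")
print(vcs, style)
```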

Later it crashes at:

./src/shared_array.c -o build/temp.linux-x86_64-cpython-312/./src/shared_array.o
      ./src/shared_array.c:24:10: fatal error: numpy/arrayobject.h: No such file or directory
         24 | #include <numpy/arrayobject.h>

Is it possible to relax the version requirement here? https://github.com/tikk3r/flocs/blob/fedora-py3/requirements3.txt#L112 EDIT: I verified that the latest sharedarray does install in Fedora 39 (Python 3.12).
Later it crashes at:
ERROR: Package 'ddfacet' requires a different Python: 3.12.7 not in '<3.12,>=3.0'
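
The check pip performs here can be illustrated with a simplified comparison. This is only a sketch for plain numeric versions; it is not PEP 440 compliant and not how pip implements it (pip relies on the `packaging` library). `parse_version` and `satisfies` are hypothetical helper names:

```python
import operator

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def parse_version(text: str) -> tuple:
    """Turn '3.12.7' into (3, 12, 7). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, specifier: str) -> bool:
    """Check `version` against a comma-separated specifier such as '<3.12,>=3.0'."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators first so '<=' is not read as '<'.
        op = clause[:2] if clause[:2] in _OPS else clause[:1]
        bound = parse_version(clause[len(op):])
        # Pad to equal length so (3, 12) compares like (3, 12, 0).
        width = max(len(candidate), len(bound))
        lhs = candidate + (0,) * (width - len(candidate))
        rhs = bound + (0,) * (width - len(bound))
        if not _OPS[op](lhs, rhs):
            return False
    return True


if __name__ == "__main__":
    print(satisfies("3.12.7", "<3.12,>=3.0"))  # the combination from the error above
    print(satisfies("3.11.9", "<3.12,>=3.0"))
```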

@AlexKurek
Contributor Author

AlexKurek commented Dec 15, 2024

There are a lot of 3.11 occurrences. Maybe they should be replaced by $PYTHON_VERSION?
Do you think it is still necessary to install Dysco? It's part of Casacore now (https://github.com/casacore/casacore/releases/tag/v3.5.0).
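
On the hard-coded 3.11 occurrences: one way to avoid them might be to derive the value from the interpreter itself. A sketch, assuming the target Python is the one running the snippet; the variable names are illustrative:

```python
import sys
import sysconfig

# "major.minor" of the running interpreter, e.g. "3.12".
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"

# Directory where pure-Python packages are installed for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]

print(python_version)
print(site_packages)
```

The first printed value could then feed a variable such as $PYTHON_VERSION in the recipe.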

At Casacore there is:

CMake Warning:
  Manually-specified variables were not used by the project:

    USE_FFTW3

Explanation: https://github.com/casacore/casacore/releases/tag/v3.4.0. USE_FFTW3 removed in 9a23f17.

AlexKurek referenced this issue Dec 15, 2024
I verified that it finds the libraries without -DUSE_FFTW3.
@AlexKurek
Contributor Author

kittens is installed first from requirements3.txt then again by pip install Kittens.

@tikk3r
Owner

tikk3r commented Dec 16, 2024

Is it possible to relax the version requirement here? https://github.com/tikk3r/flocs/blob/fedora-py3/requirements3.txt#L112 EDIT: I verified that the latest sharedarray does install in Fedora 39 (Python 3.12).

Potentially. I think I put it in because there was something about it that didn't install or interfered when later versions were used, but it might be relaxable now.

Later it crashes at:
ERROR: Package 'ddfacet' requires a different Python: 3.12.7 not in '<3.12,>=3.0'

DDFacet is a beast of its own as you can probably see from the amount of seding I do to install it, so I'm not surprised there are crashes there.

@AlexKurek
Contributor Author

Do you have any LOFAR related benchmarking scripts or anything like that? I would like to test some optimizations.

@tikk3r
Owner

tikk3r commented Dec 19, 2024

I have a bunch of reference datasets that I image at every release. I can make those commands available in the repository at some point.

@AlexKurek
Contributor Author

In the Intel recipe, FFTW is not compiled manually; in the AMD recipe it is.
In both, export FFTW_VERSION=3.5.8 is not used. The latest FFTW release is 3.3.10, which is lower than that.

@AlexKurek
Contributor Author

It should be safe to upgrade OpenBLAS to this:
https://github.com/OpenMathLib/OpenBLAS/releases/tag/v0.3.23
Only fixes and one speedup.

@AlexKurek
Contributor Author

In the tests: https://github.com/tikk3r/flocs/actions/runs/12415699968/workflow?pr=114#L32
the requirements file name still has a 3 at the end.

@tikk3r
Owner

tikk3r commented Dec 20, 2024

In the Intel recipe, FFTW is not compiled manually; in the AMD recipe it is.

It would be interesting to see if it makes a difference for the Intel containers. For AMD it was a necessity as the one that shipped with AOCL wasn't built threaded (at the time at least).

@AlexKurek
Contributor Author

NOAVX512 is false for Intel and true for AMD.

@tikk3r
Owner

tikk3r commented Dec 21, 2024

NOAVX512 is false for Intel and true for AMD.

This is intentional. AVX512 is giving odd segfaults in certain situations on our cascadelake machines, so I have it off by default.

@AlexKurek
Contributor Author

At IDG there is:
if [ $HAS_CUDA = true ]; then [...] -DCMAKE_BUILD_TYPE=Debug. Should it stay like this?

@AlexKurek
Contributor Author

AlexKurek commented Dec 21, 2024

There is:

-- HAVE_CUDA ................. = OFF

in the Sagecal install logs even though

%arguments
HAS_CUDA=true

Also there is

-- CMAKE_CXX_FLAGS_RELEASE ... = -O3 -DNDEBUG -g -O3

even though

DEBUG=false

Also there is:

-- Found BLAS: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_gf_lp64.so;/opt/intel/oneapi/mkl/2025.0/lib/libmkl_gnu_thread.so;/opt/intel/oneapi/mkl/2025.0/lib/libmkl_core.so;/usr/lib/gcc/x86_64-redhat-linux/13/libgomp.so;-lm;-ldl
-- Using generic BLAS

Maybe -DBLA_VENDOR should be set to MKL: https://github.com/nlesc-dirac/sagecal/blob/master/INSTALL.md#ubuntu-2204-quick-install-also-works-mostly-for-2004

@AlexKurek
Contributor Author

I just got this again:

Collecting emcee (from -r /opt/lofar/requirements.txt (line 31))
  Downloading emcee-3.1.5-py2.py3-none-any.whl.metadata (3.0 kB)
  Downloading emcee-3.1.4-py2.py3-none-any.whl.metadata (3.0 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Downloading emcee-3.1.3-py2.py3-none-any.whl.metadata (3.0 kB)
  Downloading emcee-3.1.2-py2.py3-none-any.whl.metadata (3.0 kB)
  Downloading emcee-3.1.1-py2.py3-none-any.whl.metadata (3.0 kB)
  Downloading emcee-3.1.0-py2.py3-none-any.whl.metadata (3.0 kB)
  Downloading emcee-3.0.2-py2.py3-none-any.whl.metadata (2.9 kB)
  Downloading emcee-3.0.1-py2.py3-none-any.whl.metadata (2.9 kB)
  Downloading emcee-3.0.0-py2.py3-none-any.whl.metadata (2.8 kB)
  Downloading emcee-2.2.1.tar.gz (24 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
  Downloading emcee-2.2.0.tar.gz (24 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
  Downloading emcee-2.1.0.tar.gz (23 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
  Downloading emcee-2.0.0.tar.gz (18 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
  Downloading emcee-1.2.0.tar.gz (22 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
  Downloading emcee-1.1.2.tar.gz (17 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [24 lines of output]
      Traceback (most recent call last):
        File "/opt/lofar/pyenv-py3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/opt/lofar/pyenv-py3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/lofar/pyenv-py3/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-nlg0epvn/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-nlg0epvn/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-nlg0epvn/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 522, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-nlg0epvn/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 3, in <module>
        File "/tmp/pip-install-18m8u2y0/emcee_5fa4e5433af34b7990d51b10df885d80/emcee/__init__.py", line 2, in <module>
          from .sampler import *
        File "/tmp/pip-install-18m8u2y0/emcee_5fa4e5433af34b7990d51b10df885d80/emcee/sampler.py", line 11, in <module>
          import numpy as np
      ModuleNotFoundError: No module named 'numpy'

@AlexKurek
Contributor Author

This is intentional. AVX512 is giving odd segfaults in certain situations on our cascadelake machines, so I have it off by default.

Have you tried updating the CPU microcode and BIOS there?
Are you aware that there are now more AVX-512 subsets than the ones you are disabling in the recipe? And that disabling AVX-512F will disable all the other subsets?
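
To see which AVX-512 subsets a given machine actually advertises, the CPU flags can be inspected. A rough sketch, assuming a Linux host with /proc/cpuinfo; `avx512_flags` is a hypothetical helper and the flag spellings are whatever the kernel reports:

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> list:
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Only the first "flags" line is used; cores normally report the same set.
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    text = path.read_text() if path.exists() else ""
    print(avx512_flags(text) or "no AVX-512 flags reported")
```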

@tikk3r
Owner

tikk3r commented Dec 26, 2024

Have you tried updating the CPU microcode and BIOS there?

It was not easily reproducible, so we just left it at that.

Are you aware that there are now more AVX-512 subsets than the ones you are disabling in the recipe? And that disabling AVX-512F will disable all the other subsets?

Yeah, many of the toggles there are also becoming obsolete as the compiler has been warning me some will be removed. I didn't know -mno-avx512f disabled all other instructions. Those flags can be condensed into just that then as the point is to ensure it won't use any of them.

I just got this again:

This seems to be an incompatibility of the shadeMS update. I've reverted it so it should be fixed again.

At IDG there is: if [ $HAS_CUDA = true ]; then [...] -DCMAKE_BUILD_TYPE=Debug. Should it stay like this?

I think Release used to run some test or something that failed, but I don't think there's a particular reason to keep it at debug.

in the Sagecal install logs even though

For the sagecal stuff, there is probably a bunch of stuff that can be done better for that installation. I put it in there to test it with Rapthor at some point previously, but none of my processing really uses it so I haven't looked deeply into it.

@AlexKurek
Contributor Author

AlexKurek commented Dec 26, 2024

I didn't know -mno-avx512f disabled all other instructions. Those flags can be condensed into just that then as the point is to ensure it won't use any of them.

Yes, as far as I understand it can.

By the way, Fedora 39 is also EOL now :) But it might be too early for Fedora 40.

@tikk3r
Owner

tikk3r commented Dec 26, 2024

By the way, Fedora 39 is also EOL now :) But it might be too early for Fedora 40.

I am aware, but something about these recipes combined with Fedora >=40 makes it such that stuff doesn't build. I'm trying to figure out why.

@AlexKurek
Contributor Author

AlexKurek commented Dec 27, 2024

I am aware, but something about these recipes combined with Fedora >=40 makes it such that stuff doesn't build. I'm trying to figure out why.

GCC 14 has some nice performance improvements, eg. in vectorization:
https://gcc.gnu.org/gcc-14/changes.html
It might be a good idea to do benchmarks after a successful upgrade to Fedora 40.

BTW You could try this instead of disabling the entire AVX-512:

New compiler option -m[no-]evex512 was added. The compiler switch enables/disables 512-bit vector. It will be default on if AVX512F is enabled.

@AlexKurek
Contributor Author

AlexKurek commented Dec 30, 2024

How about

pip list
env -0 | sort -z | tr '\0' '\n'

after the debug condition at the very end of %post? pip list would show the versions that were actually installed, and env would show what really happened to the variables during the build.
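
If the same information is wanted from inside Python, something along these lines might work as well. It is a sketch using only the standard library, with hypothetical function names, and it is not a replacement for the two shell commands above:

```python
import os
from importlib import metadata


def installed_packages() -> list:
    """Return sorted (name, version) pairs, similar to what `pip list` reports."""
    pairs = {
        (dist.metadata.get("Name") or "unknown", dist.version)
        for dist in metadata.distributions()
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


def sorted_environment() -> list:
    """Return 'KEY=value' strings sorted by key, like sorted `env` output."""
    return [f"{key}={value}" for key, value in sorted(os.environ.items())]


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name} {version}")
    print("\n".join(sorted_environment()))
```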

@AlexKurek
Contributor Author

Is it still necessary to use virtualenv?

@AlexKurek
Contributor Author

Probably march implies mtune. There are some doubts on the internet, but AFAIK they are not valid for newer GCC versions.

@tikk3r
Owner

tikk3r commented Jan 3, 2025

Is it still necessary to use virtualenv?

Probably not on a technical basis, but I like it for keeping things separate from the OS and since I use venvs on other machines as well it serves as a sort of "test" at the same time. I think there have been some slight concerns about the overhead of mounting venvs on certain file systems in the past, but nothing I notice in my daily processing.

Probably march implies mtune. There are some doubts on the internet, but AFAIK they are not valid for newer GCC versions.

I think mtune defaults to whatever march is set to yeah. At least with GCC 14 on my laptop it does. I am not sure what you mean with them not being valid? Do you mean the sandybridge default?

@AlexKurek
Contributor Author

I am not sure what you mean with them not being valid? Do you mean the sandybridge default?

Eg. this:
https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/
There was a bug in some very old GCC version causing problems if only march is set to native. AFAIK it's fixed now, and mtune currently follows march.

@AlexKurek
Contributor Author

I am not sure if --progress=bar:force:noscroll is working as we wanted. I still see in the logs:

NVSS.fits            61%[===========>        ]  98.15M  1.45MB/s    eta 45s    
NVSS.fits            61%[===========>        ]  98.33M  1.39MB/s    eta 45s  

@AlexKurek
Contributor Author

Do you think this line is necessary?
export LD_LIBRARY_PATH="/opt/intel/oneapi/mkl/latest/lib/intel64/:/opt/intel/oneapi/compiler/latest/lib:/opt/intel/oneapi/tbb/latest/env/../lib/intel64/gcc4.8:$LD_LIBRARY_PATH"
later there is:
source /opt/intel/oneapi/mkl/latest/env/vars.sh
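
One way to judge whether the explicit export adds anything would be to inspect the resulting value after the source line. A sketch with a hypothetical helper, `audit_path_list`, that reports missing and duplicated entries in a colon-separated path list:

```python
import os


def audit_path_list(value: str) -> dict:
    """Classify entries of a colon-separated path list such as LD_LIBRARY_PATH."""
    seen, missing, duplicates = [], [], []
    for entry in value.split(":"):
        if not entry:
            continue  # skip empty segments from leading/trailing separators
        normalized = os.path.normpath(entry)
        if normalized in seen:
            duplicates.append(normalized)
            continue
        seen.append(normalized)
        if not os.path.isdir(normalized):
            missing.append(normalized)
    return {"unique": seen, "missing": missing, "duplicates": duplicates}


if __name__ == "__main__":
    report = audit_path_list(os.environ.get("LD_LIBRARY_PATH", ""))
    for kind, entries in report.items():
        print(kind, entries)
```

Running it once with and once without the export line would show whether sourcing vars.sh alone already provides the same directories.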

@tikk3r
Owner

tikk3r commented Jan 4, 2025

I am not sure what you mean with them not being valid? Do you mean the sandybridge default?

Eg. this: https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/ There was a bug in some very old GCC version causing problems if only march is set to native. AFAIK it's fixed now, and mtune currently follows march.

I read that as mtune being set to generic by default, if left unspecified. I don't think these container builds have been affected in that case as mtune was always specified alongside march.

Do you think this line is necessary? export LD_LIBRARY_PATH="/opt/intel/oneapi/mkl/latest/lib/intel64/:/opt/intel/oneapi/compiler/latest/lib:/opt/intel/oneapi/tbb/latest/env/../lib/intel64/gcc4.8:$LD_LIBRARY_PATH" later there is: source /opt/intel/oneapi/mkl/latest/env/vars.sh

If I remember right that source didn't fully load things for me, but I could be remembering that wrong.

@tikk3r
Owner

tikk3r commented Jan 4, 2025

I am not sure if --progress=bar:force:noscroll is working as we wanted. I still see in the logs:

NVSS.fits            61%[===========>        ]  98.15M  1.45MB/s    eta 45s    
NVSS.fits            61%[===========>        ]  98.33M  1.39MB/s    eta 45s  

Ah, hmm. Well it was worth a try.

@AlexKurek
Contributor Author

Ah, hmm. Well it was worth a try.

The log is larger without it (2385 vs 2927 kB). So no reason to revert this.
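
In relative terms, reading the numbers as 2385 kB with the option and 2927 kB without it (an assumption about which figure is which):

```python
size_with_flag_kb = 2385     # log size with --progress=bar:force:noscroll (assumed)
size_without_flag_kb = 2927  # log size without it (assumed)

saved_kb = size_without_flag_kb - size_with_flag_kb
saved_pct = 100 * saved_kb / size_without_flag_kb

print(f"{saved_kb} kB saved ({saved_pct:.1f}%)")  # prints "542 kB saved (18.5%)"
```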

@AlexKurek
Contributor Author

Should CASACore use MPI? Currently it is off:
-- USE_MPI ............... = OFF

@AlexKurek
Contributor Author

In Sagecal:

-- Found BLAS: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_gf_lp64.so;/opt/intel/oneapi/mkl/2025.0/lib/libmkl_gnu_thread.so;/opt/intel/oneapi/mkl/2025.0/lib/libmkl_core.so;/usr/lib/gcc/x86_64-redhat-linux/13/libgomp.so;-lm;-ldl
-- Using generic BLAS

@tikk3r
Owner

tikk3r commented Jan 6, 2025

Should CASACore use MPI? Currently it is off: -- USE_MPI ............... = OFF

We don't use MPI, so I'm not sure if it has any effect or benefit from turning it on.
