v2.1.2+cu118 and v2.1.1+cu118 run into torchdata ImportError: libssl.so.3: cannot open shared object file: No such file or directory, that v2.1.0+cu118 doesn't have an issue with #1220

Open
justinxzhao opened this issue Jan 11, 2024 · 1 comment

🐛 Describe the bug

We are seeing a strange error when using torch 2.1.1+cu118 and torch 2.1.2+cu118 that does not occur with torch 2.1.0+cu118.

The error looks like this:

Traceback (most recent call last):
    from ludwig.api import LudwigModel
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/api.py", line 41, in <module>
    from ludwig.backend import Backend, initialize_backend, provision_preprocessing_workers
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/backend/__init__.py", line 22, in <module>
    from ludwig.backend.base import Backend, LocalBackend
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/backend/base.py", line 34, in <module>
    from ludwig.data.cache.manager import CacheManager
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/data/cache/manager.py", line 8, in <module>
    from ludwig.data.dataset.base import DatasetManager
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/data/dataset/base.py", line 24, in <module>
    from ludwig.distributed import DistributedStrategy
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/distributed/__init__.py", line 3, in <module>
    from ludwig.distributed.base import DistributedStrategy, LocalStrategy
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/distributed/base.py", line 11, in <module>
    from ludwig.modules.optimization_modules import create_optimizer
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/modules/optimization_modules.py", line 21, in <module>
    from ludwig.utils.torch_utils import LudwigModule
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/utils/torch_utils.py", line 14, in <module>
    from ludwig.utils.strings_utils import SpecialSymbol
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/utils/strings_utils.py", line 33, in <module>
    from ludwig.utils.tokenizers import get_tokenizer_from_registry
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/utils/tokenizers.py", line 21, in <module>
    import torchtext
  File "/home/ray/anaconda3/lib/python3.8/site-packages/torchtext/__init__.py", line 12, in <module>
    from . import data, datasets, prototype, functional, models, nn, transforms, utils, vocab, experimental
  File "/home/ray/anaconda3/lib/python3.8/site-packages/torchtext/datasets/__init__.py", line 3, in <module>
    from .ag_news import AG_NEWS
  File "/home/ray/anaconda3/lib/python3.8/site-packages/torchtext/datasets/ag_news.py", line 5, in <module>
    from torchdata.datapipes.iter import FileOpener, IterableWrapper
  File "/home/ray/anaconda3/lib/python3.8/site-packages/torchdata/__init__.py", line 7, in <module>
    from torchdata import _extension  # noqa: F401
  File "/home/ray/anaconda3/lib/python3.8/site-packages/torchdata/_extension.py", line 34, in <module>
    _init_extension()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/torchdata/_extension.py", line 31, in _init_extension
    from torchdata import _torchdata as _torchdata
ImportError: libssl.so.3: cannot open shared object file: No such file or directory

The failure comes from torchdata, whose compiled extension (torchdata._torchdata) cannot load libssl.so.3. This torchdata version also seems to install with urllib3>2.0.
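A minimal sketch for checking whether the dynamic loader in this environment can open libssl.so.3 at all, independent of torchdata. The helper name `can_load_shared_library` is hypothetical and only for illustration; it relies on the standard-library `ctypes` and `ssl` modules and assumes a Linux-style shared-library name.

```python
import ctypes
import ctypes.util
import ssl


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False


if __name__ == "__main__":
    # libssl.so.3 ships with OpenSSL 3.x; libssl.so.1.1 ships with OpenSSL 1.1.x.
    for soname in ("libssl.so.3", "libssl.so.1.1"):
        print(soname, "loadable:", can_load_shared_library(soname))
    # The OpenSSL version that this Python interpreter's ssl module is linked against.
    print("Python ssl module:", ssl.OPENSSL_VERSION)
    # May print None if no libssl is found through the usual lookup.
    print("find_library('ssl'):", ctypes.util.find_library("ssl"))
```

If libssl.so.3 is reported as not loadable while libssl.so.1.1 is, that would be consistent with the torchdata binary having been built against OpenSSL 3 while the environment only provides OpenSSL 1.1.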

When we pin urllib3==1.26.16 to try to mitigate the libssl.so error, we get a different error:

Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1382, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/ray/anaconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/ray/anaconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 28, in <module>
    from ..integrations.deepspeed import is_deepspeed_zero3_enabled
  File "/home/ray/anaconda3/lib/python3.8/site-packages/transformers/integrations/deepspeed.py", line 49, in <module>
    from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 153, in <module>
    from .launch import (
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/utils/launch.py", line 24, in <module>
    from ..commands.config.config_args import SageMakerConfig
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/commands/config/__init__.py", line 19, in <module>
    from .config import config_command_parser
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/commands/config/config.py", line 25, in <module>
    from .sagemaker import get_sagemaker_input
  File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/commands/config/sagemaker.py", line 35, in <module>
    import boto3  # noqa: F401
  File "/home/ray/anaconda3/lib/python3.8/site-packages/boto3/__init__.py", line 17, in <module>
    from boto3.session import Session
  File "/home/ray/anaconda3/lib/python3.8/site-packages/boto3/session.py", line 17, in <module>
    import botocore.session
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/session.py", line 26, in <module>
    import botocore.client
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/client.py", line 15, in <module>
    from botocore import waiter, xform_name
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/waiter.py", line 18, in <module>
    from botocore.docs.docstring import WaiterDocstring
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/__init__.py", line 15, in <module>
    from botocore.docs.service import ServiceDocumenter
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/service.py", line 14, in <module>
    from botocore.docs.client import ClientDocumenter, ClientExceptionsDocumenter
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/client.py", line 14, in <module>
    from botocore.docs.example import ResponseExampleDocumenter
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/example.py", line 13, in <module>
    from botocore.docs.shape import ShapeDocumenter
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/shape.py", line 19, in <module>
    from botocore.utils import is_json_value_header
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/utils.py", line 34, in <module>
    import botocore.httpsession
  File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/httpsession.py", line 21, in <module>
    from urllib3.util.ssl_ import (
ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (/home/ray/anaconda3/lib/python3.8/site-packages/urllib3/util/ssl_.py)

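This suggests a different incompatibility. The failing import is in botocore/httpsession.py, which is reached through the transformers deepspeed integration, accelerate, and boto3 (so perhaps related to deepspeed?).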
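A small sketch for checking whether the urllib3 that actually gets imported exposes the name botocore is asking for. The helper `has_attribute` is hypothetical. As far as I know, `DEFAULT_CIPHERS` was removed from `urllib3.util.ssl_` in urllib3 2.0, so a `False` result here would suggest that a 2.x urllib3 is still the one being imported despite the pin; treat that as an assumption to verify rather than a confirmed diagnosis.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


if __name__ == "__main__":
    # botocore/httpsession.py in the traceback imports this name from urllib3.
    print(
        "urllib3.util.ssl_.DEFAULT_CIPHERS present:",
        has_attribute("urllib3.util.ssl_", "DEFAULT_CIPHERS"),
    )
```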

In any case, torch 2.1.0+cu118 does not seem to require the newest version of torchdata, and/or it works with urllib3==1.26.16, which appears to mitigate our issues.

However, the errors with 2.1.1+cu118 and 2.1.2+cu118 seemed odd to me, so I am raising it here in case anyone has any helpful tidbits!

Versions

2.1.0+cu118 (works)
2.1.1+cu118 (broken)
2.1.2+cu118 (broken)
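To compare the working and broken environments, it can help to record the versions of the packages that appear in the two tracebacks. The following is a sketch only: the helper `installed_versions` is hypothetical, the package list is taken from the tracebacks above, and it uses `importlib.metadata` (available in the standard library from Python 3.8).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distributions that show up in the tracebacks in this report.
    names = ["torch", "torchdata", "torchtext", "urllib3", "botocore", "boto3"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version}")
```

Running this once in the 2.1.0+cu118 environment and once in a 2.1.1+cu118 or 2.1.2+cu118 environment should show which of torchdata, urllib3, and botocore actually differ between them.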

malfet commented Jan 12, 2024

Transferring to the torchdata project, though please note that it's not really maintained by anyone right now

@malfet malfet transferred this issue from pytorch/pytorch Jan 12, 2024