[BUG] S3 import from public bucket fails on first try #818

Open
us8945 opened this issue Aug 8, 2024 · 0 comments
Labels
type/bug Bug in code

us8945 commented Aug 8, 2024

🐛 Bug

Importing a dataset from a public S3 bucket fails on the first try. It works after clicking Continue and ignoring the error message.
This is confirmed in version 1.8.0. The functionality worked in previous versions.
Also, in previous versions, when the specified bucket contained a single file, the file name was pre-populated automatically.

To Reproduce

  • Import dataset
  • Select "S3"
  • Enter the name of a public bucket

LLM Studio version

1.8.0

Error message

2024/08/01 20:20:55 # 
2024/08/01 20:20:55 # ┌────────────────┐ H2O Wave 
2024/08/01 20:20:55 # │  ┐┌┐┐┌─┐┌ ┌┌─┐ │ 1.3.3 20240614125607
2024/08/01 20:20:55 # │  └┘└┘└─└└─┘└── │ © 2021 H2O.ai, Inc.
2024/08/01 20:20:55 # └────────────────┘
2024/08/01 20:20:55 # ┌──────────────────────────────────────┐
2024/08/01 20:20:55 # │  Running at http://localhost:10101/  │
2024/08/01 20:20:55 # └──────────────────────────────────────┘
2024/08/01 20:20:55 # {"address":"download/","source":"/workspace/output/download","t":"private_dir"}
2024/08/01 20:20:55 # {"address":":10101","base-url":"/","t":"listen","web-dir":"/home/llmstudio/.local/share/virtualenvs/workspace-dqq3IVyd/www"}
2024/08/01 20:23:22 # {"addr":"172.17.0.1:45558","route":"/","t":"ui_add"}
2024/08/01 20:23:28 # {"client":"5f32bc84-355d-4982-ba57-8e689ee6f6f6","state":"DISCONNECT","t":"ws_disconnect"}
2024/08/01 20:23:33 # {"client":"5f32bc84-355d-4982-ba57-8e689ee6f6f6","t":"client_unsubscribe"}
2024/08/01 20:23:33 # {"client_id":"5f32bc84-355d-4982-ba57-8e689ee6f6f6","t":"ui_drop"}
2024/08/01 20:23:34 # {"addr":"172.17.0.1:48362","route":"/","t":"ui_add"}
2024/08/01 20:23:39 # {"client":"7c4fe074-642b-4da4-81fd-993747e0237a","state":"DISCONNECT","t":"ws_disconnect"}
2024/08/01 20:23:39 # {"addr":"172.17.0.1:48372","route":"/","t":"ui_add"}
2024/08/01 20:23:40 # {"client":"dc711a02-da29-40e9-b5a9-0b420fff7d1e","state":"DISCONNECT","t":"ws_disconnect"}
2024/08/01 20:23:40 # {"addr":"172.17.0.1:48398","route":"/","t":"ui_add"}
2024/08/01 20:23:40 # {"client":"9b793652-5c9f-4908-ad5a-21389513da96","state":"DISCONNECT","t":"ws_disconnect"}
2024/08/01 20:23:41 # {"addr":"172.17.0.1:48418","route":"/","t":"ui_add"}
2024/08/01 20:23:41 # {"client":"7c312e45-83d0-4dd4-b0cd-76faa968d97b","state":"DISCONNECT","t":"ws_disconnect"}
2024/08/01 20:23:42 # {"addr":"172.17.0.1:48430","route":"/","t":"ui_add"}
2024/08/01 20:23:44 # {"client":"7c4fe074-642b-4da4-81fd-993747e0237a","t":"client_unsubscribe"}
2024/08/01 20:23:44 # {"client_id":"7c4fe074-642b-4da4-81fd-993747e0237a","t":"ui_drop"}
2024/08/01 20:23:45 # {"client":"dc711a02-da29-40e9-b5a9-0b420fff7d1e","t":"client_unsubscribe"}
2024/08/01 20:23:45 # {"client_id":"dc711a02-da29-40e9-b5a9-0b420fff7d1e","t":"ui_drop"}
2024/08/01 20:23:45 # {"client":"9b793652-5c9f-4908-ad5a-21389513da96","t":"client_unsubscribe"}
2024/08/01 20:23:45 # {"client_id":"9b793652-5c9f-4908-ad5a-21389513da96","t":"ui_drop"}
2024/08/01 20:23:46 # {"client":"7c312e45-83d0-4dd4-b0cd-76faa968d97b","t":"client_unsubscribe"}
2024/08/01 20:23:46 # {"client_id":"7c312e45-83d0-4dd4-b0cd-76faa968d97b","t":"ui_drop"}
Cannot connect to Wave server, retrying...
[2024-08-01 20:23:46,898] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Error loading keyring: No recommended backend was available. Install a recommended 3rd party backend package; or, install the keyrings.alt package if you want to use the non-recommended backends. See https://pypi.org/project/keyring for details.. Disabling keyring save option.
INFO:     Started server process [7]
INFO:     Waiting for application startup.
2024/08/01 20:24:01 # {"host":"http://127.0.0.1:8756","route":"/","t":"app_add"}
2024-08-01 20:24:01,127 - INFO: STARTING APP
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8756 (Press CTRL+C to quit)
2024/08/01 20:24:01 # {"client":"ae3eb5bf-bb78-4210-b40c-c4718d22be09","state":"DISCONNECT","t":"ws_disconnect"}
2024/08/01 20:24:01 # {"addr":"172.17.0.1:44354","route":"/","t":"ui_add"}
2024/08/01 20:24:01 # {"addr":"172.17.0.1:44354","route":"/be6535db-c712-46ef-9d9a-2eb83628e1bd","t":"ui_add"}
INFO:     127.0.0.1:42804 - "POST / HTTP/1.1" 200 OK
2024-08-01 20:24:01,472 - INFO: Initializing app ...
2024-08-01 20:24:02,437 - INFO: Initializing app ... done
2024-08-01 20:24:02,437 - INFO: Initializing client None
2024-08-01 20:24:02,571 - INFO: User name: anon
2024-08-01 20:24:02,571 - INFO: No user settings found. Using default settings.
2024-08-01 20:24:02,571 - INFO: Heap off
2024-08-01 20:24:02,576 - INFO: Downloading default dataset...
2024/08/01 20:24:02 # {"path":"/_f/9000b9e8-458f-4611-9b43-fbec1862bd93/tmp2yj_z1bo.min.js","t":"file_download"}
2024/08/01 20:24:02 # {"path":"/_f/d4bfc8b9-c198-43fd-904a-2f87348afc40/icon.png","t":"file_download"}
Downloading readme:   0%|          | 0.00/10.6k [00:00<?, ?B/s]
Downloading readme: 100%|██████████| 10.6k/10.6k [00:00<00:00, 28.6MB/s]
2024/08/01 20:24:06 # {"client":"ae3eb5bf-bb78-4210-b40c-c4718d22be09","t":"client_unsubscribe"}
2024/08/01 20:24:06 # {"client_id":"ae3eb5bf-bb78-4210-b40c-c4718d22be09","t":"ui_drop"}
Downloading data:   0%|          | 0.00/63.5M [00:00<?, ?B/s]
Downloading data:   7%|▋         | 4.19M/63.5M [00:00<00:04, 12.0MB/s]
Downloading data:  20%|█▉        | 12.6M/63.5M [00:00<00:01, 27.8MB/s]
Downloading data:  33%|███▎      | 21.0M/63.5M [00:00<00:01, 37.0MB/s]
Downloading data:  46%|████▌     | 29.4M/63.5M [00:00<00:00, 43.3MB/s]
Downloading data:  59%|█████▉    | 37.7M/63.5M [00:01<00:00, 43.9MB/s]
Downloading data:  73%|███████▎  | 46.1M/63.5M [00:01<00:00, 46.5MB/s]
Downloading data:  86%|████████▌ | 54.5M/63.5M [00:01<00:00, 49.4MB/s]
Downloading data:  99%|█████████▉| 62.9M/63.5M [00:01<00:00, 50.6MB/s]
Downloading data: 100%|██████████| 63.5M/63.5M [00:03<00:00, 17.3MB/s]
Downloading data:   0%|          | 0.00/3.18M [00:00<?, ?B/s]
Downloading data: 100%|██████████| 3.18M/3.18M [00:00<00:00, 11.6MB/s]
Downloading data: 100%|██████████| 3.18M/3.18M [00:00<00:00, 8.18MB/s]
Generating train split:   0%|          | 0/128575 [00:00<?, ? examples/s]
Generating train split:   1%|          | 1000/128575 [00:00<01:09, 1826.66 examples/s]
Generating train split:  19%|█▊        | 24000/128575 [00:00<00:02, 48327.11 examples/s]
Generating train split:  38%|███▊      | 49000/128575 [00:00<00:00, 93056.89 examples/s]
Generating train split:  58%|█████▊    | 74000/128575 [00:00<00:00, 130389.49 examples/s]
Generating train split:  78%|███████▊  | 100000/128575 [00:00<00:00, 161964.83 examples/s]
Generating train split:  99%|█████████▉| 127000/128575 [00:01<00:00, 188091.99 examples/s]
Generating train split: 100%|██████████| 128575/128575 [00:01<00:00, 119332.23 examples/s]
Generating validation split:   0%|          | 0/6599 [00:00<?, ? examples/s]
Generating validation split: 100%|██████████| 6599/6599 [00:00<00:00, 235605.37 examples/s]
Downloading readme:   0%|          | 0.00/196 [00:00<?, ?B/s]
Downloading readme: 100%|██████████| 196/196 [00:00<00:00, 1.67MB/s]
Downloading data:   0%|          | 0.00/36.3M [00:00<?, ?B/s]
Downloading data:  12%|█▏        | 4.19M/36.3M [00:00<00:01, 20.0MB/s]
Downloading data:  35%|███▍      | 12.6M/36.3M [00:00<00:00, 36.4MB/s]
Downloading data:  58%|█████▊    | 21.0M/36.3M [00:00<00:00, 43.1MB/s]
Downloading data:  81%|████████  | 29.4M/36.3M [00:00<00:00, 39.0MB/s]
Downloading data: 100%|██████████| 36.3M/36.3M [00:00<00:00, 44.4MB/s]
Downloading data: 100%|██████████| 36.3M/36.3M [00:06<00:00, 5.67MB/s]
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 12859 examples [00:00, 157287.16 examples/s]
2024-08-01 20:24:38,597 - INFO: {'init_app'}
INFO:     127.0.0.1:44266 - "POST / HTTP/1.1" 200 OK
2024-08-01 20:32:21,646 - INFO: Initializing client True
2024-08-01 20:32:21,846 - INFO: {'dataset/list', 'home/disk_usage', 'init_app', 'home/experiments_stats', 'home/gpu_stats', 'home/compute_stats'}
INFO:     127.0.0.1:44266 - "POST / HTTP/1.1" 200 OK
2024-08-01 20:32:23,817 - INFO: Initializing client True
2024-08-01 20:32:24,015 - INFO: {'dataset/list', 'home/disk_usage', 'init_app', 'home/experiments_stats', 'dataset/display/footer', 'home/gpu_stats', 'home/compute_stats'}
INFO:     127.0.0.1:51710 - "POST / HTTP/1.1" 200 OK
2024-08-01 20:32:32,713 - INFO: Initializing client True
2024-08-01 20:32:32,916 - INFO: {'dataset/list', 'dataset/import', 'home/disk_usage', 'init_app', 'home/experiments_stats', 'dataset/import/footer', 'dataset/display/footer', 'home/gpu_stats', 'home/compute_stats'}
2024-08-01 20:32:35,559 - WARNING: Can't load S3 datasets list: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
INFO:     127.0.0.1:36966 - "POST / HTTP/1.1" 200 OK
2024-08-01 20:33:17,749 - INFO: Initializing client True
INFO:     127.0.0.1:52722 - "POST / HTTP/1.1" 200 OK
2024-08-01 20:33:28,142 - INFO: Initializing client True
2024-08-01 20:33:28,353 - INFO: {'dataset/list', 'dataset/import', 'home/disk_usage', 'init_app', 'home/experiments_stats', 'dataset/import/footer', 'dataset/display/footer', 'home/gpu_stats', 'home/compute_stats'}
2024-08-01 20:33:29,670 - ERROR: Dataset error:
Traceback (most recent call last):
  File "/workspace/./llm_studio/app_utils/sections/dataset.py", line 381, in dataset_import
    ) = await s3_download(
  File "/workspace/./llm_studio/app_utils/utils.py", line 398, in s3_download
    q, (s3.meta.client.head_object(Bucket=bucket, Key=filename))["ContentLength"]
  File "/home/llmstudio/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/llmstudio/.local/share/virtualenvs/workspace-dqq3IVyd/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
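
For triage: the traceback shows the import aborting inside `head_object`, which returned a 404 for the requested key. The sketch below shows one way such a call could be guarded so that a missing key is reported as "not found" instead of raising. It is not the LLM Studio implementation. `object_size_or_none`, `FakeS3Client`, `FakeClientError`, and the bucket and file names are hypothetical, and the sketch assumes the raised exception carries a botocore-style `response` dict with an `Error` / `Code` entry.

```python
from typing import Any, Optional

# Error codes treated as "object not found" (an assumption for this sketch).
NOT_FOUND_CODES = {"404", "NoSuchKey", "NotFound"}


def object_size_or_none(client: Any, bucket: str, key: str) -> Optional[int]:
    """Return the object's size in bytes, or None if the key does not exist."""
    try:
        return client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    except Exception as exc:
        # botocore's ClientError exposes the parsed error as exc.response.
        response = getattr(exc, "response", None) or {}
        code = str(response.get("Error", {}).get("Code", ""))
        if code in NOT_FOUND_CODES:
            return None
        raise  # anything else (permissions, network, ...) is still an error


class FakeClientError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


class FakeS3Client:
    """Hypothetical in-memory client so the sketch runs without AWS access."""

    def __init__(self, objects: dict) -> None:
        self._objects = objects

    def head_object(self, Bucket: str, Key: str) -> dict:
        try:
            return {"ContentLength": self._objects[(Bucket, Key)]}
        except KeyError:
            raise FakeClientError("404") from None


client = FakeS3Client({("my-public-bucket", "train.csv"): 1024})
print(object_size_or_none(client, "my-public-bucket", "train.csv"))
print(object_size_or_none(client, "my-public-bucket", ""))
```

With a guard like this, the caller could show a "file not found in bucket" message, or leave the file name field empty, rather than surfacing the raw `ClientError` on the first attempt.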
@us8945 us8945 added the type/bug Bug in code label Aug 8, 2024
@pascal-pfeiffer pascal-pfeiffer self-assigned this Aug 8, 2024