Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) at NamespaceFS.complete_object_upload #8620

Open
rkomandu opened this issue Dec 20, 2024 · 4 comments
Assignees
Labels

Comments

@rkomandu
Copy link
Collaborator

rkomandu commented Dec 20, 2024

Environment info

  • NooBaa Version: VERSION
  • Platform: Kubernetes 1.14.1 | minikube 1.1.1 | OpenShift 4.1 | other: specify

Noobaa d/s rpm noobaa-core-5.17.2-20241217.el9.ppc64le

Actual behavior

When running Warp IO workload and performing the HA functionality with assert, sigsegv to mmfs daemon, the noobaa log observed the following messages

Warp run
./warp mixed --noclear --insecure --duration 120m --host .com:6443 --access-key 5gzVKZdN9XerVoPtImPr --secret-key evxl0frzdnk5rTYkj0fg0WkDxAcwTaYdgEVAoH0A --tls --obj.size 256M --bucket newbucket-warp-ha-efix-downstream-19dec2024-defrecreate

noobaa.log

 Dec 19 08:24:37 node-gui1 [1968840]: [nsfs/1968840] [ERROR] core.sdk.namespace_fs::  SyntaxError: Unexpected end of JSON input    at JSON.parse (<anonymous>)    at NamespaceFS.complete_object_upload (/usr/local/noobaa-core/src/sdk/namespace_fs.js:1873:47)    at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)    at async Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)
Dec 19 08:24:37 node-gui1 [1968840]: [nsfs/1968840] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/hLVXsVmn/36.8aiUq4RMhm1N8nKc.rnd?uploadId=c8727afb-a153-4e72-8b63-be43914136eb</Resource><RequestId>m4vcsj8j-33lowe-ui2</RequestId></Error> POST /newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/hLVXsVmn/36.8aiUq4RMhm1N8nKc.rnd?uploadId=c8727afb-a153-4e72-8b63-be43914136eb {"host":"node-s3-ces.rtp.raleigh.ibm.com:6443","user-agent":"MinIO (linux; ppc64le) minio-go/v7.0.70 warp/(dev)","content-length":"1450","authorization":"AWS4-HMAC-SHA256 Credential=5gzVKZdN9XerVoPtImPr/20241219/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-checksum-crc32c;x-amz-content-sha256;x-amz-date, Signature=04c66799c342576868d
000e48b523c76735b993e38eb28c082f249c30b0f9987","content-type":"application/octet-stream","x-amz-checksum-crc32c":"ibWdPA==","x-amz-content-sha256":"e4ba0971bba257d08597b280a3a95bb8e963d92375c0ad46b17961c2d22cc3cc","x-amz-date":"20241219T132437Z"} SyntaxError: Unexpected end of JSON input    at JSON.parse (<anonymous>)    at NamespaceFS.complete_object_upload (/usr/local/noobaa-core/src/sdk/namespace_fs.js:1873:47)    at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)    at async Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)
Dec 19 08:46:11 node-gui1 [2004213]: [nsfs/2004213] [ERROR] core.sdk.namespace_fs::  SyntaxError: Unexpected end of JSON input    at JSON.parse (<anonymous>)    at NamespaceFS.complete_object_upload (/usr/local/noobaa-core/src/sdk/namespace_fs.js:1873:47)    at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)    at async Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)
Dec 19 08:46:11 node-gui1 [2004213]: [nsfs/2004213] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/PiKFysE%29/88.u%2917eBcxgwlaqNK8.rnd?uploadId=6f88d055-1d86-461a-9638-9b2aa047786f</Resource><RequestId>m4vdk9lg-ztnqv-1eg6</RequestId></Error> POST /newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/PiKFysE%29/88.u%2917eBcxgwlaqNK8.rnd?uploadId=6f88d055-1d86-461a-9638-9b2aa047786f {"host":"node-s3-ces.rtp.raleigh.ibm.com:6443","user-agent":"MinIO (linux; ppc64le) minio-go/v7.0.70 warp/(dev)","content-length":"1450","authorization":"AWS4-HMAC-SHA256 Credential=5gzVKZdN9XerVoPtImPr/20241219/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-checksum-crc32c;x-amz-content-sha256;x-amz-date, Signature=ad14ea8f5b058ff34837b6dd1845addeaecb100d78a0ee51d9cda8dfb183f1b1","content-type":"application/octet-stream","x-amz-checksum-crc32c":"YMQphA==","x-amz-content-sha256":"b1700fad23297971122af5d5f8d1286f183579608a6f0d2f0f029178e92134ef","x-amz-date":"20241219T134611Z"} SyntaxError: Unexpected end of JSON input    at JSON.parse (<anonymous>)    at NamespaceFS.complete_object_upload (/usr/local/noobaa-core/src/sdk/namespace_fs.js:1873:47)    at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)    at async Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)
Dec 19 09:31:53 node-gui1 [2074822]: [nsfs/2074822] [ERROR] core.sdk.namespace_fs::  SyntaxError: Unexpected end of JSON input    at JSON.parse (<anonymous>)    at NamespaceFS.complete_object_upload (/usr/local/noobaa-core/src/sdk/namespace_fs.js:1873:47)    at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)    at async Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)
...
Dec 19 09:31:53 node-gui1 [2074822]: [nsfs/2074822] [ERROR] core.sdk.namespace_fs::  SyntaxError: Unexpected end of JSON input    at JSON.parse (<anonymous>)    at NamespaceFS.complete_object_upload (/usr/local/noobaa-core/src/sdk/namespace_fs.js:1873:47)    at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)    at async Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)
Dec 19 09:31:53 node-gui1 [2074822]: [nsfs/2074822] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/xxOCIlk4/148.pSOIaVsLCxklDGO7.rnd?uploadId=df72e09e-0c8d-4fb9-bc68-bdcfe21209fe</Resource><RequestId>m4vf71js-5kfsyd-qyo</RequestId></Error> POST /newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/xxOCIlk4/148.pSOIaVsLCxklDGO7.rnd?uploadId=df72e09e-0c8d-4fb9-bc68-bdcfe21209fe {"host":"node-s3-ces.rtp.raleigh.ibm.com:6443","user-agent":"MinIO (linux; ppc64le) minio-go/v7.0.70 warp/(dev)","content-length":"1450","authorization":"AWS4-HMAC-SHA256 Credential=5gzVKZdN9XerVoPtImPr/20241219/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-checksum-crc32c;x-amz-content-sha256;x-amz-date, Signature=c0a91fc76f95fa568d0dd98fb73c8f8b1e209c94d1d9693435ac10153549e230","content-type":"application/octet-stream","x-amz-checksum-crc32c":"D1MWzQ==","x-amz-content-sha256":"ab6268c93d01882039cb89cbec4038fe05bc7f48bfe84eb23ee1e34600799a3b","x-amz-date":"20241219T143153Z"} SyntaxError: Unexpected end of JSON input    at JSON.parse (<anonymous>)    at NamespaceFS.complete_object_upload (/usr/local/noobaa-core/src/sdk/namespace_fs.js:1873:47)    at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)    at async Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)

Warp run started at 08:04 -> 10:04 for 2hrs run

ls -lrt warp-mixed-2024-12-19[080323]-1MuL.csv.zst
-rw-r--r--. 1 root root 1143577 Dec 19 10:04 'warp-mixed-2024-12-19[080323]-1MuL.csv.zst'

mmfs.log (at 08:24:37) --> nothing much the phase4 IP movement is happening 

2024-12-19_08:23:41.551-0500: [I] Connected to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[0]
2024-12-19_08:23:41.724-0500: [N] Connecting to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[1]
2024-12-19_08:23:41.745-0500: [I] Connected to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[1]
2024-12-19_08:24:15.287-0500: [I] mmcesop: moveIP moveIP phase1 9.42.93.99_4_1
2024-12-19_08:24:16.646-0500: [I] mmcesop: run: /usr/sbin/ip addr del 9.42.93.99/24 dev env2
rc=0
2024-12-19_08:24:17.057-0500: [I] mmcesop: moveIP moveIP phase2 9.42.93.99_4_1
2024-12-19_08:24:17.768-0500: [I] mmcesop: moveIP moveIP phase4 9.42.93.99_4_1
2024-12-19_08:33:37.755-0500: [I] Command: mmfsadm test assert

mmfs.log (08:46:11 as per noobaa.log)
2024-12-19_08:46:04.426-0500: [N] Connecting to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[0]
2024-12-19_08:46:04.435-0500: [I] Connected to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[0]
2024-12-19_08:46:04.654-0500: [N] Connecting to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[1]
2024-12-19_08:46:04.665-0500: [I] Connected to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[1]
2024-12-19_08:46:38.105-0500: [I] mmcesop: moveIP moveIP phase1 9.42.93.99_4_1
2024-12-19_08:46:38.402-0500: [I] mmcesop: run: /usr/sbin/ip addr del 9.42.93.99/24 dev env2
rc=0

mmfs.log (09:31:53 as per noobaa.log the JSON error), not anything in the mmfs.log 
2024-12-19_09:30:50.418-0500: [N] Connecting to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[1]
2024-12-19_09:30:50.435-0500: [I] Connected to 9.42.93.72 gpfs-p10-gui0 <c0n0>:[1]
2024-12-19_09:31:49.720-0500: [I] mmcesop: moveIP moveIP phase1 9.42.93.99_4_1
2024-12-19_09:31:54.844-0500: [W] mmcesop: sendEventOnOpenNfsConnection: NFS connection not closed for ip 9.42.93.99. Remaining data = <[::ffff:9.42.124.113]:885 [::ffff:9.42.93.99]:2049>
2024-12-19_09:31:54.858-0500: [I] mmcesop: run: /usr/sbin/ip addr del 9.42.93.99/24 dev env2
rc=0
2024-12-19_09:31:55.246-0500: [I] mmcesop: moveIP moveIP phase2 9.42.93.99_4_1
2024-12-19_09:31:55.951-0500: [I] mmcesop: moveIP moveIP phase4 9.42.93.99_4_1
2024-12-19_09:40:46.625-0500: [I] Command: mmfsadm test assert
2024-12-19_09:40:46.625-0500: [X] logAssertFailed: !"test assert"
 

Expected behavior

Error w/r/t JSON output is concerning, does it has a problem w/r/t metadata access of JSON output ?

Steps to reproduce

Warp run and perform HA of GPFS assert , sigsegv in a loop

More information - Screenshots / Logs / Other output

Output logs uploaded to box folder https://ibm.ent.box.com/folder/299558120602

@rkomandu rkomandu added the NS-FS label Dec 20, 2024
@shirady
Copy link
Contributor

shirady commented Dec 22, 2024

Hi,

Summary so far:

  1. As expected, I tried to search two keys (hLVXsVmn/36.8aiUq4RMhm1N8nKc.rnd and PiKFysE%29/88.u%2917eBcxgwlaqNK8.rnd encoded) that were under Resource from the S3 ERROR logs and couldn't find them in the bucket (as their complete-multi-part failed).
  2. Unexpected - When I tried to list the files under the bucket under the hidden directory that contains .noobaa-nsfs prefix under the directory multipart-uploads I would expect to see many directories (at least as the number of errors on the complete-multi-part failed), but I saw only one and its key did not match the one with the error.
  3. We also tried to see if the keys are mentioned as abort multipart upload - and didn't find it in both nodes.
  4. Wasn't planned as a part of the issue - but we saw errors related to unlink operation of the file with the suffix of final, and we might want to change the behavior to avoid this error (linkfileat on GPFS overrides the previous link).

Additional Information:

  1. We wanted to look at the file, but currently, I couldn't find the file - that was expected as the operation failed during NamespaceFS.complete_object_upload.

I will share something that I saw:
In the error logs we can see the bucket name and the key name

<Resource>/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/PiKFysE%29/88.u%2917eBcxgwlaqNK8.rnd

But in the bucket files there is no PiKFysE%29, only the same prefix with suffix ) (the encoded of ) is %29 in js):
ls -al /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate | grep 'PiKFysE'

drwxrwx---. 2 18002 18000 16384 Dec 19 10:04 PiKFysE)

But when looking at the file:
I don't see the prefix 88, for example:
ls -al /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/'PiKFysE)' | grep 88

-rw-rw----. 1 18002 18000 256000000 Dec 19 09:43 186.KbWqO88xwWwp8bm2.rnd
-rw-rw----. 1 18002 18000 256000000 Dec 19 09:43 188.GjaC(tncH2B7wL65.rnd

From what I saw the structure name of the files is <number>.<characters>.rnd or wrapped with single quotes '<number>.<characters>.rnd'

  1. Since we suspect that the failure at the point of parsing a JSON file:
    const { data: create_params_buffer } = await nb_native().fs.readFile(
    fs_context,
    path.join(params.mpu_path, 'create_object_upload')
    );
    const upload_params = { fs_context, upload_path, open_mode, file_path, params, target_file };
    const create_params_parsed = JSON.parse(create_params_buffer.toString());

    I also tried to search for a multipart upload that was not completed, so I tried to search for:
    ls -al /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate | grep .noobaa

drwxrwx---. 3 18002 18000 4096 Dec 19 07:51 .noobaa-nsfs_6764165e0bf4eb72dad51b45

I saw only 1 file (although I would expect to see a directory for each operation of a complete multipart upload that failed).
Anyway, I tried to see the content:
ls -al /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/

total 17
drwxrwx---. 3 18002 18000 4096 Dec 19 07:51 .
drwxrwx---. 43 18002 18000 8192 Dec 19 08:05 ..
drwxrwx---. 3 18002 18000 8192 Dec 19 10:04 multipart-uploads

ls -al /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads

total 9
drwxrwx---. 3 18002 18000 8192 Dec 19 10:04 .
drwxrwx---. 3 18002 18000 4096 Dec 19 07:51 ..
drwxrwx---. 2 18002 18000 4096 Dec 23 04:15 2a8e302c-fa5f-4a49-88dd-c70399590c9

ls -al /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/2a8e302c-fa5f-4a49-88dd-c70399590c95

total 258057
drwxrwx---. 2 18002 18000 4096 Dec 23 04:15 .
drwxrwx---. 3 18002 18000 8192 Dec 19 10:04 ..
-rw-rw----. 1 18002 18000 450 Dec 19 10:03 create_object_upload
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-1
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-10
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-11
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-12
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-13
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-14
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-15
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-16
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-2
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-3
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-4
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-5
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-6
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-7
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-8
-rw-rw----. 1 18002 18000 0 Dec 19 10:03 part-9
-rw-rw----. 2 18002 18000 256000000 Dec 19 10:03 parts-size-16777216
-rw-rw----. 1 18002 18000 69468160 Dec 19 10:03 parts-size-4341760

cat /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/2a8e302c-fa5f-4a49-88dd-c70399590c95/create_object_upload | jq .

{
  "bucket": "newbucket-warp-ha-efix-downstream-19dec2024-defrecreate",
  "key": "4fTpXE7s/208.ezMZmJlL1AQyxVNY.rnd",
  "content_type": "application/octet-stream",
  "xattr": {},
  "storage_class": "STANDARD",
  "obj_id": "2a8e302c-fa5f-4a49-88dd-c70399590c95",
  "mpu_path": "/ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/2a8e302c-fa5f-4a49-88dd-c70399590c95",
  "source_stream": null
}

In this example, we can see a valid JSON that can be parsed.
This file was created during the create_object_upload

async create_object_upload(params, object_sdk) {
try {
const fs_context = this.prepare_fs_context(object_sdk);
await this._load_bucket(params, fs_context);
await this._throw_if_low_space(fs_context);
params.obj_id = uuidv4();
params.mpu_path = this._mpu_path(params);
await native_fs_utils._create_path(params.mpu_path, fs_context);
const create_params = JSON.stringify({ ...params, source_stream: null });
await this._throw_if_storage_class_not_supported(params.storage_class);
await nb_native().fs.writeFile(
fs_context,
path.join(params.mpu_path, 'create_object_upload'),
Buffer.from(create_params), {
mode: native_fs_utils.get_umasked_mode(config.BASE_MODE_FILE),
},
);
return { obj_id: params.obj_id };
} catch (err) {
throw native_fs_utils.translate_error_codes(err, native_fs_utils.entity_enum.OBJECT);
}
}

To make sure that I found all the existing files that have create_object_upload in their name I run:
cd /ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate
newbucket-warp-ha-efix-downstream-19dec2024-defrecreate]# find . -iname '*create_object_upload*'

./.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/2a8e302c-fa5f-4a49-88dd-c70399590c95/create_object_upload

  1. I also tried to search for: grep "NamespaceFS: abort_object_upload" noobaa.log-20241220 and saw one appearance that is not related to the errors above:

Dec 19 04:44:40 gpfs-p10-gui1 [1355200]: [nsfs/1355200] [L0] core.sdk.namespace_fs:: NamespaceFS: abort_object_upload /ibm/fvt_fs/s3user-18001-dir/newbucket-18001-ha-efix-ds-ioupload-sus-res-19dec2024/.noobaa-nsfs_6763bd68a08abdac7c099214/multipart-uploads/d3d87434-2f4b-41e7-a6d1-9b1114e52b13

looking at this upload ID we didn't see it is related to the mentioned error:
grep "d3d87434-2f4b-41e7-a6d1-9b1114e52b13" noobaa.log-20241220 | grep "SyntaxError" (no output)

We also tried to look at the second node with zgrep "NamespaceFS: abort_object_upload" /var/log/noobaa.log-20241219.gz still no output (also for a day before zgrep "NamespaceFS: abort_object_upload" /var/log/noobaa.log-20241218.gz and a day after zgrep "NamespaceFS: abort_object_upload" /var/log/noobaa.log-20241220.gz just in case there is a mismatch in the times).

The rationale for this search is to understand if the directory of multipart upload of the upload ID was deleted as part of this operation:

async abort_object_upload(params, object_sdk) {
const fs_context = this.prepare_fs_context(object_sdk);
await this._load_multipart(params, fs_context);
dbg.log0('NamespaceFS: abort_object_upload', params.mpu_path);
await native_fs_utils.folder_delete(params.mpu_path, fs_context);
}

The multipart upload directory is deleted either from the abort (as you can see above) or from the complete, but we didn't find those keys in either of these operations.

if (config.NSFS_REMOVE_PARTS_ON_COMPLETE) await native_fs_utils.folder_delete(params.mpu_path, fs_context);

  1. Attached the EEXIST on the file ends with final
Dec 19 08:24:37 gpfs-p10-gui1 [1968839]: [nsfs/1968839] [ERROR] core.sdk.namespace_fs::  [Error: File exists] { code: 'EEXIST', context: 'Link _oldpath=/ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/819f611b-6b57-4348-8bcb-df3750754e20/parts-size-16777216 _newpath=/ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/819f611b-6b57-4348-8bcb-df3750754e20/final ' }
Dec 19 08:24:37 gpfs-p10-gui1 [1968839]: [nsfs/1968839] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/4fTpXE7s/40.SC0I7WymYEKFkN%28r.rnd?uploadId=819f611b-6b57-4348-8bcb-df3750754e20</Resource><RequestId>m4vcsjhz-8pyuo6-11kk</RequestId></Error> POST /newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/4fTpXE7s/40.SC0I7WymYEKFkN%28r.rnd?uploadId=819f611b-6b57-4348-8bcb-df3750754e20 {"host":"gpfs-p10-s3-ces.rtp.raleigh.ibm.com:6443","user-agent":"MinIO (linux; ppc64le) minio-go/v7.0.70 warp/(dev)","content-length":"1450","authorization":"AWS4-HMAC-SHA256 Credential=5gzVKZdN9XerVoPtImPr/20241219/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-checksum-crc32c;x-amz-content-sha256;x-amz-date, Signature=94806f26504bc6f5c226287dbb7fe7d85230fb5b008632b9045788bde6a0dc5e","content-type":"application/octet-stream","x-amz-checksum-crc32c":"UwvwDQ==","x-amz-content-sha256":"a9b1c640dd0b24f958029e33a859acdba2c646faaa0d59a4a2efedb33aec2995","x-amz-date":"20241219T132437Z"} Error: File exists - context: Link _oldpath=/ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/819f611b-6b57-4348-8bcb-df3750754e20/parts-size-16777216 _newpath=/ibm/fvt_fs/s3user-18002-dir/newbucket-warp-ha-efix-downstream-19dec2024-defrecreate/.noobaa-nsfs_6764165e0bf4eb72dad51b45/multipart-uploads/819f611b-6b57-4348-8bcb-df3750754e20/final

@shirady
Copy link
Contributor

shirady commented Jan 2, 2025

Hi @rkomandu,
I looked at the cluster again and as mentioned in the comment above I could not find the directory with the content of the during the process of multipart upload file of the keys that were failed on complete multipart upload.

What I can suggest is when trying to reproduce it to run a shell script that would display the json files that we have once in a while with a timestamp (and save this output in logs).

To check the idea on my local machine I ran something like that on a script that was created in the directory of the bucket
touch my_script.sh; chmod +x my-my_script.sh (I run the example on a script that was created inside the bucket, but you can change it and add the timestamp in the printings so we will understand what we read).

#!/bin/bash

while true; do
  echo "printing all create_object_upload files..."
  find . -type f -name 'create_object_upload' -exec cat {} +; echo
  sleep 10
done

Partial output for example:

printing all create_object_upload files...

{"bucket":"mpu-bucket","key":"zuzu","content_type":"application/octet-stream","xattr":{},"storage_class":"STANDARD","obj_id":"c29cf8f6-c690-4a58-b604-b27ae6ae6f94","mpu_path":"/Users/buckets/mpu-bucket/.noobaa-nsfs_67693c71d815d39029f03b4b/multipart-uploads/c29cf8f6-c690-4a58-b604-b27ae6ae6f94","source_stream":null}

We can see in the output that this is a valid JSON.
With something like that, we can search the bucket and key that had the error in those logs and try to see their file.
This should help uo with the open question of WHAT happens to the JSON file (and then we still need to understand WHY it happens...)

@shirady
Copy link
Contributor

shirady commented Jan 9, 2025

Hi @rkomandu,
I edited the script so you can pass a full path and run it:

echo "$(date) Printing all create_object_upload files in" $1
SEARCH_PATH=$1 # must be full path
index=1

while true; do
  echo; echo "Start Print number ${index}"
  find ${SEARCH_PATH} -type f -name 'create_object_upload' -exec sh -c 'echo "File {}"; cat {}' _ {} \;
  echo; echo "End Print number ${index}"
  ((index++))
  sleep 10
done

I also added log printing lines in the functions so they will be printed (as a string without parsing it to JSON).
Note: It is not efficient, as I read the file again, but it is only for reproduction.
diff.txt
Please let me know if you want a new RPM for the reproduction.

For example:
On the created:

Jan-9 15:54:32.750 [nsfs/84027]    [L2] core.sdk.namespace_fs:: SDSD create_object_upload {"bucket":"mpu-bucket","key":"riri","content_type":"application/octet-stream","xattr":{},"storage_class":"STANDARD","obj_id":"622a5ea1-e142-4dcf-8b1c-e69c07d63ad7","mpu_path":"/Users/buckets/mpu-bucket/.noobaa-nsfs_67693c71d815d39029f03b4b/multipart-uploads/622a5ea1-e142-4dcf-8b1c-e69c07d63ad7","source_stream":null}

I removed the first curly bracket ({) in the JSON so you can see:

an-9 16:08:43.535 [nsfs/84027]    [L2] core.sdk.namespace_fs:: SDSD complete_object_upload "bucket":"mpu-bucket","key":"rara","content_type":"application/octet-stream","xattr":{},"storage_class":"STANDARD","obj_id":"3c597ad3-ab54-462d-a662-563237bdd056","mpu_path":"/Users/buckets/mpu-bucket/.noobaa-nsfs_67693c71d815d39029f03b4b/multipart-uploads/3c597ad3-ab54-462d-a662-563237bdd056","source_stream":null}

And it is printed before the action fails:

Jan-9 16:08:43.537 [nsfs/84027] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/mpu-bucket/rara?uploadId=3c597ad3-ab54-462d-a662-563237bdd056</Resource><RequestId>m5pem4ug-30cocm-ikk</RequestId></Error> POST /mpu-bucket/rara?uploadId=3c597ad3-ab54-462d-a662-563237bdd056 {"host":"localhost:6443","accept-encoding":"identity","user-agent":"aws-cli/2.17.11 md/awscrt#0.20.11 ua/2.0 os/macos#24.2.0 md/arch#arm64 lang/python#3.11.10 md/pyimpl#CPython cfg/retry-mode#standard md/installer#source md/prompt#off md/command#s3api.complete-multipart-upload","x-amz-date":"20250109T140843Z","x-amz-content-sha256":"cd078f6fc04d3c6f839cf1c37aeed7176de350c60f540bdbfeaab2e8b598c868","authorization":"AWS4-HMAC-SHA256 Credential=Dwertyuiopasdfg11001/20250109/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=e7bd20d3f4076a39cb5721a6034d82d9eac12652a20b51efbad613b88e3dcc38","content-length":"180"} SyntaxError: Unexpected non-whitespace character after JSON at position 8 (line 1 column 9)
    at JSON.parse (<anonymous>)
    at NamespaceFS.complete_object_upload (/Users/shiradymnik/SourceCode/noobaa-core/src/sdk/namespace_fs.js:1897:47)
    at async NsfsObjectSDK._call_op_and_update_stats (/Users/shiradymnik/SourceCode/noobaa-core/src/sdk/object_sdk.js:561:27)
    at async Object.post_object_uploadId [as handler] (/Users/shiradymnik/SourceCode/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:29:19)
    at async handle_request (/Users/shiradymnik/SourceCode/noobaa-core/src/endpoint/s3/s3_rest.js:173:19)
    at async Object.s3_rest [as handler] (/Users/shiradymnik/SourceCode/noobaa-core/src/endpoint/s3/s3_rest.js:67:9)
2025-01-09 16:08:43.537378 [PID-84027/TID-259] [L1] FS::FSWorker::Begin: Stat _path=/etc/noobaa.conf.d/buckets/mpu-bucket.json

@shirady
Copy link
Contributor

shirady commented Jan 13, 2025

Hi @rkomandu,
I uploaded the mentioned printings (see comment above) in an upstream image with key name noobaa-core-5.17.3-20250113-add-printings-json-parse-issue.el9.ppc64le.rpm
You can see it in by running: aws s3 ls s3://noobaa-core-rpms | sort | grep 20250113, the output is:

2025-01-13 02:42:35  226388080 noobaa-core-5.18.0-20250113-master.el9.x86_64.rpm
2025-01-13 03:38:45  229707142 noobaa-core-5.18.0-20250113-master.el9.ppc64le.rpm
2025-01-13 14:42:08   83524352 noobaa-core-5.17.3-20250113-add-printings-json-parse-issue.el9.ppc64le.rpm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants