Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cmake s3 support.wif #2741

Merged
merged 23 commits into from
Oct 2, 2023
Merged

Fix cmake s3 support.wif #2741

merged 23 commits into from
Oct 2, 2023

Conversation

WardF
Copy link
Member

@WardF WardF commented Aug 23, 2023

A small mistake in logic was resulting in Amazon S3 SDK libraries not being properly linked against, even if found on the system. There are additional steps required for Windows (Visual Studio) builds, so this PR is not yet ready. See

for reference.

Todo

  • Add changes required for Windows builds.
  • Update documentation as need be.
  • Update CI to include testing for AWS S3 to catch this sort of thing.

@WardF WardF added this to the 4.9.3 milestone Aug 23, 2023
@WardF WardF self-assigned this Aug 23, 2023
Copy link
Collaborator

@DennisHeimbigner DennisHeimbigner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first change to liblib/CMakeLists.txt is incorrect.
It should stay as

IF(ENABLE_S3_AWS)

…iscovered at configure time is functional. Still need to wire it into autotools.
@WardF
Copy link
Member Author

WardF commented Aug 29, 2023

@DennisHeimbigner When I run the S3 tests on MacOS using the Unidata credentials and turning on -DWITH_S3_TESTING=TRUE, I'm seeing two different failure states in MacOS depending on whether I'm using the internal S3 code or the aws-sdk. Can you provide any insight? I wonder if the files in the Unidata bucket need to be updated to reflect changes in the tests (as sometimes happens with other string-comparison tests in our test suite).

Internal S3

223/233 Test #224: nczarr_test_run_strings ...............***Failed   13.79 sec
findplugin.sh loaded
final HDF5_PLUGIN_DIR=/Users/wfisher/Desktop/gitprojects/netcdf-c/build/plugins
*** Test: nczarr string write then read; format=file
*** create pure zarr file
*** create nczarr file
*** read purezarr
Default action: objdump
*** read nczarr
Default action: objdump
*** verify zarr output
*** verify nczarr output
*** Test: nczarr string write then read; format=s3
*** create pure zarr file
*** create nczarr file
*** read purezarr
/Users/wfisher/Desktop/gitprojects/netcdf-c/build/ncdump/ncdump: https://s3.us-east-1.amazonaws.com/unidata-zarr-test-data/netcdf-c/testdir_strings_1804289383/tmp_string_zarr#mode=zarr,s3: NetCDF: Attempt to read empty NCZarr map key

224/233 Test #225: nczarr_test_run_scalar ................***Failed   13.43 sec
findplugin.sh loaded
final HDF5_PLUGIN_DIR=/Users/wfisher/Desktop/gitprojects/netcdf-c/build/plugins
*** Test: scalar write/read
*** create pure zarr file
*** create nczarr file
*** read purezarr
Default type: ubyte
Default action: objdump
*** read nczarr
Default type: ubyte
Default action: objdump
*** verify
*** Test: scalar write/read
*** create pure zarr file
*** create nczarr file
*** read purezarr
Default type: ubyte
Default action: objdump
*** read nczarr
Default type: ubyte
Default action: objdump
*** verify
1a2,3
> dimensions:
> 	_scalar_ = 1 ;
3c5
< 	int v ;
---
> 	int v(_scalar_) ;

AWS-SDK

The tests just hang until ctest times out.

@DennisHeimbigner
Copy link
Collaborator

Please try running again.
I was running the tests myself and realized that it might interfere with you running tests.
I am fixing this, but just in case, you should rerun the tests just to make sure that it was
not interference from me.

@WardF
Copy link
Member Author

WardF commented Aug 29, 2023

@DennisHeimbigner Ok, I'm retesting. Out of curiosity, roughly speaking, how long should I expect nczarr_test/run_chunkcases to take? Can the --with-s3-testing tests be run in parallel or do they require running in sequence? I run tests in parallel ( -j 4 ) and if that's a problem, I'll stop.

@DennisHeimbigner
Copy link
Collaborator

Do not execute the nczarr tests in parallel, there is a significant problem with interference.
Some of the tests can timeout if you run in parallel and others will just fail.

@WardF
Copy link
Member Author

WardF commented Aug 29, 2023 via email

@WardF
Copy link
Member Author

WardF commented Aug 29, 2023

@DennisHeimbigner I re-ran the tests in serial fashion, the only one failing now is nczarr_test/run_interop. This is failing with -DENABLE_S3=TRUE -DENABLE_S3_INTERNAL=TRUE -DWITH_S3_TESTING=TRUE. How long should this test take? How long does it take when it passes on your system?

@DennisHeimbigner
Copy link
Collaborator

Is it failing or timing out?

@WardF
Copy link
Member Author

WardF commented Aug 29, 2023

@DennisHeimbigner The test times out after 1500 seconds, resulting in a ctest failure.

@DennisHeimbigner
Copy link
Collaborator

Ok. Just disable that test. I discovered that error, but it is tied up with unlimited, so forget about it for now.

@DennisHeimbigner
Copy link
Collaborator

BTW have you been able to tell if the pure awssdk test hangs?

@WardF
Copy link
Member Author

WardF commented Aug 30, 2023

@DennisHeimbigner the pure awssdk test does not hang for the AWSSDK I compile myself on Windows. I'll test the version of the awssdk used in the original python/conda-forge issue next.

@DennisHeimbigner
Copy link
Collaborator

Possibly relevant. I tried building under windows again, but using aws-sdk-cpp.
The test ncdump/ref_ctest.c hung.
I was able to run it in visual studio and it indicated that there was exception
thrown in Aws::Monitoring::InitMonitoring inside Aws:InitAPI.

@DennisHeimbigner
Copy link
Collaborator

DennisHeimbigner commented Aug 31, 2023

ok, try this next:

  1. edit run_s3sdk.sh
  2. At about line 30, surround this line:
${CMD} ${execdir}/test_s3sdk -u "${URL}" -k "${S3ISOPATH}/test_s3sdk.txt" write

with the lines

export NCTRACING=15
${CMD} ${execdir}/test_s3sdk -u "${URL}" -k "${S3ISOPATH}/test_s3sdk.txt" write
unset NCTRACING
  1. Edit the file libnczarr/zdebug.h
  2. At about line 8, change
#undef ZCATCH
#undef ZTRACING

to

#define ZCATCH
#define ZTRACING
  1. rebuild netcdf-c library

Note that this may produce voluminous output.

@WardF
Copy link
Member Author

WardF commented Sep 5, 2023

@DennisHeimbigner the additional output is below:

wfisher@shetland:~/Desktop/gh2741-s3/netcdf-c/build$ ctest -R s3sdk -V
UpdateCTestConfiguration  from :/Users/wfisher/Desktop/gh2741-s3/netcdf-c/build/DartConfiguration.tcl
Parse Config file:/Users/wfisher/Desktop/gh2741-s3/netcdf-c/build/DartConfiguration.tcl
 Add coverage exclude regular expressions.
UpdateCTestConfiguration  from :/Users/wfisher/Desktop/gh2741-s3/netcdf-c/build/DartConfiguration.tcl
Parse Config file:/Users/wfisher/Desktop/gh2741-s3/netcdf-c/build/DartConfiguration.tcl
Test project /Users/wfisher/Desktop/gh2741-s3/netcdf-c/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 209
    Start 209: unit_test_run_s3sdk

209: Test command: /opt/homebrew/bin/bash "-c" "export srcdir=/Users/wfisher/Desktop/gh2741-s3/netcdf-c/unit_test;export TOPSRCDIR=/Users/wfisher/Desktop/gh2741-s3/netcdf-c;bash /Users/wfisher/Desktop/gh2741-s3/netcdf-c/build/unit_test/run_s3sdk.sh "
209: Working Directory: /Users/wfisher/Desktop/gh2741-s3/netcdf-c/build/unit_test
209: Test timeout computed to be: 1500
209: Running S3 AWSSDK Unit Tests.
209: 	o Checking https://s3.us-east-1.amazonaws.com/unidata-zarr-test-data exists
209: testbucketexists: exists=1
209: ***PASS
209: 	o Checking write to https://s3.us-east-1.amazonaws.com/unidata-zarr-test-data
209: ***FAIL: (-138) NetCDF: S3 error @ testwrite:224
209: Enter: (6): NCZ_filter_finalize:
209: Exit: (6): NCZ_filter_finalize: 
209: Enter: (6): NCZ_filter_finalize:
209: Exit: (6): NCZ_filter_finalize: 
1/1 Test #209: unit_test_run_s3sdk ..............***Failed  259.27 sec
Running S3 AWSSDK Unit Tests.
	o Checking https://s3.us-east-1.amazonaws.com/unidata-zarr-test-data exists
testbucketexists: exists=1
***PASS
	o Checking write to https://s3.us-east-1.amazonaws.com/unidata-zarr-test-data
***FAIL: (-138) NetCDF: S3 error @ testwrite:224
Enter: (6): NCZ_filter_finalize:
Exit: (6): NCZ_filter_finalize: 
Enter: (6): NCZ_filter_finalize:
Exit: (6): NCZ_filter_finalize: 


0% tests passed, 1 tests failed out of 1

Total Test time (real) = 259.28 sec

The following tests FAILED:
	209 - unit_test_run_s3sdk (Failed)
Errors while running CTest

@DennisHeimbigner
Copy link
Collaborator

Well that did not help any. Oh well.

@WardF
Copy link
Member Author

WardF commented Sep 8, 2023

Reverted previous changes because I realized that I was using an axe, incorrectly, when I should be using a scalpel. The code initially changed is working for other tests, just not the test_s3sdk.sh test. Indicators are currently that there is an issue that manifests when the number of bytes sent does not align with the number of bytes we say we're sending. This is not obviously the case, and also it works on linux if not MacOS. Before we can actually sort out Windows, I would like to know that this is all working on MacOS and Linux, to ensure there aren't any weird errors with a common root, manifesting differently depending on platform and compiler.

@DennisHeimbigner
Copy link
Collaborator

Refresh my memory; what happened when you did not specify a length?

@DennisHeimbigner
Copy link
Collaborator

BTW, if you set the cmake option -DENABLE_CMAKE_LOGGING=on
then I believe you can see what curl is generating for the length.
I believe it streams the output through the aws logging system.

@WardF
Copy link
Member Author

WardF commented Sep 8, 2023

Refresh my memory; what happened when you did not specify a length?

test_s3sdk passed, but a lot of other tests started failing, because the length defaulted to 0.

@DennisHeimbigner
Copy link
Collaborator

What happens if you set the length to -1 ?

@WardF
Copy link
Member Author

WardF commented Sep 8, 2023

@dmh

What happens if you set the length to -1 ?

All the tests fail

@WardF
Copy link
Member Author

WardF commented Sep 13, 2023

@DennisHeimbigner I'd like to get this working with the s3 internal api across platforms, for now, since the issue seems to be specific to the awssdk-cpp library. As of this morning, I'm seeing a new issue with -DENABLE_S3_INTERNAL=TRUE, Error: Unexpected Element: <ChecksumAlgorithm>. This breaks the tests, and s3util. Do you have any idea what happened there?

@DennisHeimbigner
Copy link
Collaborator

The context is this. ncs3sdk_h5.c makes a ListObject request that returns a complex XML
document. I parse the xml and then walk it to produce internal data structures corresponding
to that XML document. It appears then that the XML document has a ChecksumAlgorithm
element in unexpected place in the XML. What I need is to see the complete XML document
so I can see where the ChecksumAlgorithm element is occurring and then modify my walker
to handle it.
I will review the code and send some instructions so we can see the whole XML document.

@DennisHeimbigner
Copy link
Collaborator

Ok, I think if you change the following lines in ncs3sdk_h5.c, then it should output the XML document(s).
Lines to change:

#undef DEBUG
to
#define DEBUG

@WardF
Copy link
Member Author

WardF commented Sep 13, 2023

@DennisHeimbigner for whatever reason, after spending most of the day on this and related issues, it has resolved itself. I was not running into an issue where I was running the tests in multiple places at the same time, but I did observe failures under Linux and MacOS. Now, however, things are back where they started, e.g. -WITH_S3_TESTING=TRUE -DENABLE_S3_INTERNAL=FALSE hangs on MacOS.

@DennisHeimbigner
Copy link
Collaborator

Did the ChecksumAlgorithm issue resolve also?

@WardF
Copy link
Member Author

WardF commented Sep 14, 2023

@DennisHeimbigner The ChecksumAlgorithm issue is resolved; the hang has not. I think for now, for the sake of expediency, we will need to tell non-linux platforms to use -DENABLE_S3_INTERNAL=TRUE until we figure out what, exactly, is going on with the awssdk-cpp on other platforms. I'm still seeing test failures in my MSYS2-bash environment, I'll send you a log to see if you have any feedback (I know the current main branch works for you under cygwin).

@WardF
Copy link
Member Author

WardF commented Sep 28, 2023

@DennisHeimbigner it did.

I think our best bet right now is to use S3_INTERNAL on Windows, but I'm seeing other failures that preclude a release. Does anything leap out at you with these?

@DennisHeimbigner
Copy link
Collaborator

Your msys failures are a mystery to me. The first failing test is ncdap_test_tst_ncdap3.
It works for under windows+cygwin and it works on the msys run of github actions.
I can think of a couple of possibilities.

  1. the file /home/wardf/netcdf-c/ncdap_test/testdata3/synth1.dds and/or the file /home/wardf/netcdf-c/ncdap_test/testdata3/synth1.das are actually missing. can you check to see if they are there?
  2. there is a problem with path conversion, although that path looks ok to me.

@WardF WardF merged commit 3c013ce into Unidata:main Oct 2, 2023
97 checks passed
@WardF
Copy link
Member Author

WardF commented Oct 2, 2023

Merged to date as part of #2755, opening another PR to continue tracking this issue.

@WardF WardF deleted the fix-cmake-s3-support.wif branch October 2, 2023 16:08
@WardF WardF mentioned this pull request Oct 2, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants