Add audbackend.backend.Minio #231

Open
wants to merge 43 commits into base: main

@hagenw (Member) commented Sep 3, 2024

Implements a backend for MinIO.

As MinIO is compatible with S3 storage, this might already work with other S3 storage as well. @ChristianGeng has partly tested this against an S3 instance at Hetzner, which worked except that creating and deleting buckets and handling empty files were not supported.

I decided to handle authentication the same way as we do for audbackend.backend.Artifactory, by allowing the username / password to be specified in a config file or in environment variables. In addition, we have the class method audbackend.backend.Minio.get_authentication() to retrieve them, so that their values can be checked.
For local MinIO servers we also need to set the secure argument to False. I added this to the config file as well, in order to make it part of the backend configuration, and added the class method audbackend.backend.Minio.get_config(), which returns all entries as a dictionary.
In certain edge cases you might want to pass other arguments to the underlying minio.Minio class. For that reason I decided to add support for **kwargs, but I do not check whether they conflict with the authentication settings, as I think this is only for power users anyway.
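
To make the lookup described above concrete, here is a minimal sketch of how credentials and the secure flag could be resolved for a host. It is not the code from this pull request: the helper name `read_minio_config`, the option names `access_key`, `secret_key` and `secure`, the environment variable names, and the rule that environment variables win over the config file are all assumptions made for illustration.

```python
import configparser
import os
from typing import Mapping, Optional


def read_minio_config(
    host: str,
    config_text: str,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Resolve access key, secret key and secure flag for ``host``.

    Hypothetical helper: option and variable names are assumptions.
    """
    env = os.environ if env is None else env
    # Disable interpolation so that '%' inside a secret is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = dict(parser.items(host)) if parser.has_section(host) else {}
    return {
        # Assumption: environment variables take precedence over the file
        "access_key": env.get("MINIO_ACCESS_KEY", section.get("access_key")),
        "secret_key": env.get("MINIO_SECRET_KEY", section.get("secret_key")),
        # Default to a secure connection when nothing is configured
        "secure": parser.getboolean(host, "secure", fallback=True),
    }


config_text = """
[localhost:9000]
access_key = user
secret_key = pass
secure = False
"""
print(read_minio_config("localhost:9000", config_text, env={}))
```

Keeping `secure` in the same section as the credentials means a local server can be fully described by one config entry, which matches the motivation given above.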

Summary by Sourcery

Add support for MinIO as a new backend in the audbackend library, enabling storage operations with MinIO. Refactor tests to accommodate the new backend and ensure robust testing of its functionality.

New Features:

  • Introduce a new backend for MinIO, allowing integration with MinIO storage systems.

Enhancements:

  • Refactor test parameterization to use a centralized list of backend-interface combinations, improving test maintainability.

Tests:

  • Add comprehensive tests for the new MinIO backend, including authentication, file operations, and configuration parsing.

@hagenw hagenw marked this pull request as draft September 3, 2024 15:00
@hagenw hagenw changed the title from "Add audbackend.backend.MinIO" to "Add audbackend.backend.Minio" Sep 4, 2024

codecov bot commented Sep 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.0%. Comparing base (937da89) to head (af109fc).

Additional details and impacted files
Files with missing lines Coverage Δ
audbackend/backend/__init__.py 100.0% <100.0%> (ø)
audbackend/core/api.py 100.0% <100.0%> (ø)
audbackend/core/backend/minio.py 100.0% <100.0%> (ø)

@hagenw hagenw mentioned this pull request Oct 1, 2024

"""
path = path.replace(self.sep, "/")
if path.startswith("/"):
Member

In Python versions greater than 3.8 it would be safe to remove the conditional and replace it with path = path.removeprefix("/"). I believe Python 3.8 is getting close to being deprecated?
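
A minimal sketch of the equivalence discussed here, assuming the body of the conditional strips exactly one leading slash (the body is not shown in the diff context). The helper name `strip_leading_slash` is hypothetical; `str.removeprefix()` exists from Python 3.9 onwards, so the fallback branch keeps Python 3.8 working.

```python
import sys


def strip_leading_slash(path: str) -> str:
    """Remove a single leading "/" from ``path``, if present."""
    if sys.version_info >= (3, 9):
        # Available since Python 3.9
        return path.removeprefix("/")
    # Python 3.8 fallback with the explicit conditional
    if path.startswith("/"):
        return path[1:]
    return path


print(strip_leading_slash("/sub/file.txt"))
```

Both branches remove at most one slash, so a path such as "//a" becomes "/a" in either case.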

Member Author

When removing Python 3.8, there are other things we should update, e.g. if I remember correctly the typing module might no longer be needed. I would prefer to first create a general issue to gather all the things we would like to update when removing Python 3.8, and then do this in dedicated pull requests.

This means that, for now, this pull request still needs to support Python 3.8.

Member Author

This is now tracked in [tools/meta#12] internally.

cannot be closed.

"""
# At the moment, this is automatically handled.
Member

@ChristianGeng Oct 18, 2024

"Automatically handled" means handled by the minio client library. Would it make sense to make this explicit?

The docstring is identical to the one in the base class Base. Possibly the information that this is automatically handled should go into the subclass docstring?

Member Author

I removed _close() and added close() with an updated docstring instead:

[image: screenshot of the updated close() docstring]

Member Author

In principle, we could also take care of opening and closing ourselves by providing an http_client object to MinIO, but as it would be very complicated to find the best settings in that case, I would stick with the default for now and just state that closing is handled automatically by MinIO.

r"""Copy file on backend."""
src_path = self.path(src_path)
dst_path = self.path(dst_path)
# `copy_object()` has a maximum size limit of 5GB.
Member

Is the 5GB limit a limit of S3 native (s3n)? This post at least suggests so. So does the minio client use s3n?

Member Author

Don't know, I just found the information in the MinIO documentation. As we don't have a test for this, maybe we should also lower the value slightly to be on the safe side?

Member

Lowering it would probably be OK to get it working right now.
The background is a different one: if MinIO has a limit of 5GB, this suggests that the minio package is using the old s3n interface. As far as I understand, s3a succeeds s3n. The difference between them is that s3n supports objects up to 5GB, whereas s3a should support 5TB and has improved performance. I cannot find the link right now, but I think I have seen that fsspec has an implementation of s3a.

Member Author

I changed the size limit to 4.9 GB to be on the safe side.
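
The size check discussed in this thread can be sketched as a small decision function. This is only an illustration, not the code from the pull request: the names `MAX_SERVER_SIDE_COPY_BYTES` and `copy_strategy` are hypothetical, the interpretation of 4.9 GB as 4.9 * 1024**3 bytes is an assumption, and so is the idea that the fallback for larger files is a download followed by an upload.

```python
# Assumed threshold: 4.9 GB, interpreted here as 4.9 * 1024**3 bytes
MAX_SERVER_SIDE_COPY_BYTES = int(4.9 * 1024**3)


def copy_strategy(
    size_in_bytes: int,
    limit: int = MAX_SERVER_SIDE_COPY_BYTES,
) -> str:
    """Choose how to copy an object of the given size.

    Returns "server-side" when a single copy request is expected to work,
    and "download-upload" when the object is too large and a fallback
    (assumed here: download the file, then upload it again) is needed.
    """
    if size_in_bytes < 0:
        raise ValueError("size_in_bytes must be non-negative")
    return "server-side" if size_in_bytes <= limit else "download-upload"


print(copy_strategy(10 * 1024**2))
print(copy_strategy(5 * 1024**3))
```

Staying slightly below the documented limit trades a little efficiency for files just under 5 GB against not having to test the exact boundary.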

* Add support for owner()

* Be more conservative regarding owner
def _size(
self,
path: str,
) -> str:
Member

This is used by _copy_file where it is used to determine whether file size exceeds 5GB. However I can see no cast in _copy_file. Should the return type hint be integer?

Member Author

Good catch, it does indeed return an integer. I corrected the type hint.

backend.open()


def test_get_config(tmpdir, hosts, hide_credentials):
Member

This tests parsing of the configuration, but does not have a one-line docstring.

Member Author

I added a docstring.

)
def test_maven_file_structure(
tmpdir, interface, file, version, extensions, regex, expected
):
Member

Although the test name is already telling, a single line docstring would be nice.

Member Author

I added a docstring.

@@ -91,6 +91,7 @@ def tree(tmpdir, request):
[
(audbackend.backend.Artifactory, audbackend.interface.Unversioned),
(audbackend.backend.FileSystem, audbackend.interface.Unversioned),
(audbackend.backend.Minio, audbackend.interface.Unversioned),
Member

@ChristianGeng Oct 21, 2024

The interface parametrization seems to vary little between tests.

Would it make sense to recast this into a fixture? Possibly like this:

@pytest.fixture(params=[
    (audbackend.backend.Artifactory, audbackend.interface.Unversioned),
    (audbackend.backend.FileSystem, audbackend.interface.Unversioned),
    (audbackend.backend.Minio, audbackend.interface.Unversioned),
    (SingleFolder, audbackend.interface.Unversioned),
])
def interface(request):
    return request.param

Probably local to test_interface_unversioned.py would be sufficient.

Member Author

interface was already a fixture, but I could solve the problem with a variable:

# Backend-interface combinations to use in all tests
backend_interface_combinations = [ 
    (audbackend.backend.Artifactory, audbackend.interface.Unversioned),
    (audbackend.backend.FileSystem, audbackend.interface.Unversioned),
    (audbackend.backend.Minio, audbackend.interface.Unversioned),
    (SingleFolder, audbackend.interface.Unversioned),
]

# ...

@pytest.mark.parametrize(
    "interface",
    backend_interface_combinations,
    indirect=True,
)
def test_archive(tmpdir, tree, archive, files, tmp_root, interface, expected):
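
For readers unfamiliar with `indirect=True`, the following self-contained sketch shows how a shared list of combinations is routed through a fixture via `request.param`. The classes `FakeBackend` and `FakeInterface`, the list name, and the test itself are hypothetical stand-ins, not the fixtures used in the audbackend test suite.

```python
import pytest


class FakeBackend:
    """Hypothetical stand-in for a backend class."""


class FakeInterface:
    """Hypothetical stand-in for an interface wrapping a backend."""

    def __init__(self, backend):
        self.backend = backend


# Shared list, defined once and reused by every parametrized test
BACKEND_INTERFACE_COMBINATIONS = [
    (FakeBackend, FakeInterface),
]


@pytest.fixture
def interface(request):
    # With indirect=True, each parameter arrives here as request.param
    backend_cls, interface_cls = request.param
    return interface_cls(backend_cls())


@pytest.mark.parametrize(
    "interface",
    BACKEND_INTERFACE_COMBINATIONS,
    indirect=True,
)
def test_interface_wraps_backend(interface):
    assert isinstance(interface.backend, FakeBackend)
```

Because the list is a plain module-level variable, adding a new backend only requires one new entry instead of touching every `parametrize` call.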

@hagenw hagenw marked this pull request as ready for review October 22, 2024 12:41

sourcery-ai bot commented Oct 22, 2024

Reviewer's Guide by Sourcery

This pull request implements a new backend for MinIO in the audbackend library. It adds support for MinIO storage, which is compatible with S3, and includes necessary changes to the existing codebase to integrate the new backend. The implementation follows a similar pattern to the existing Artifactory backend, with additional configuration options specific to MinIO.

Class diagram for the new Minio backend

classDiagram
    class Minio {
        - MinioClient _client
        + Minio(host: str, repository: str, authentication: Tuple[str, str] = None, secure: bool = None, **kwargs)
        + get_authentication(host: str) Tuple[str, str]
        + get_config(host: str) Dict
        + close()
        + _checksum(path: str) str
        + _collapse(path)
        + _copy_file(src_path: str, dst_path: str, verbose: bool)
        + _create()
        + _date(path: str) str
        + _delete()
        + _exists(path: str) bool
        + _get_file(src_path: str, dst_path: str, verbose: bool)
        + _ls(path: str) List[str]
        + _move_file(src_path: str, dst_path: str, verbose: bool)
        + _open()
        + _owner(path: str) str
        + path(path: str) str
        + _put_file(src_path: str, dst_path: str, checksum: str, verbose: bool)
        + _remove_file(path: str)
        + _size(path: str) int
    }
    class Base {
        <<abstract>>
    }
    Minio --|> Base
    note for Minio "This class implements a backend for MinIO storage, compatible with S3."

File-Level Changes

Change: Implement MinIO backend class
Details:
  • Create new Minio class inheriting from Base
  • Implement authentication and configuration methods
  • Add methods for file operations (put, get, copy, move, delete)
  • Handle large file transfers (>5GB) with a fallback mechanism
Files:
  • audbackend/core/backend/minio.py

Change: Integrate MinIO backend into existing codebase
Details:
  • Register MinIO backend in core API
  • Add MinIO import in backend __init__.py
  • Update test configurations to include MinIO
  • Modify existing tests to accommodate MinIO backend
Files:
  • audbackend/core/api.py
  • audbackend/backend/__init__.py
  • tests/conftest.py
  • tests/test_interface_unversioned.py
  • tests/test_interface_versioned.py
  • tests/test_interface_maven.py

Change: Add new tests for MinIO backend
Details:
  • Create test_backend_minio.py with comprehensive tests
  • Test authentication, configuration, and file operations
  • Add tests for content type handling and large file copying
Files:
  • tests/test_backend_minio.py

Change: Update existing tests to include MinIO
Details:
  • Modify test_api.py to include MinIO in backend tests
  • Update test_interface_unversioned.py, test_interface_versioned.py, and test_interface_maven.py to include MinIO in interface tests
Files:
  • tests/test_api.py
  • tests/test_interface_unversioned.py
  • tests/test_interface_versioned.py
  • tests/test_interface_maven.py

@sourcery-ai bot left a comment

Hey @hagenw - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Hardcoded MinIO access key found. (link)
  • Hardcoded MinIO secret key found. (link)
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🔴 Security: 2 blocking issues
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

hagenw and others added 8 commits October 22, 2024 14:54
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@hagenw (Member, Author) commented Oct 22, 2024

I have now addressed all comments and updated the description to mention our Hetzner S3 test.

3 participants