Allow reuse of FileIO object in GlueCatalog for manifest caching to work #11776

Open
2 of 3 tasks
mothukur opened this issue Dec 13, 2024 · 0 comments
Labels
bug Something isn't working

Apache Iceberg version

1.7.1 (latest release)

Query engine

None

Please describe the bug 🐞

The current GlueCatalog implementation does not allow a FileIO object to be reused, which leads to inefficient use of the manifest cache implemented in the ManifestFiles class.

Problematic Code
Each GlueTableOperations instance lazily creates its own FileIO object:
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java#L113

public FileIO io() {
  if (fileIO == null) {
    fileIO = initializeFileIO(this.tableCatalogProperties, this.hadoopConf);
  }
  return fileIO;
}

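ManifestFiles looks up its content cache by FileIO instance, so a GlueTableOperations instance with a freshly created FileIO cannot reuse a cache that was populated through a different FileIO: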
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/ManifestFiles.java#L75

static ContentCache contentCache(FileIO io) {
    return CONTENT_CACHES.get(
        io,
        fileIO ->
            new ContentCache(
                cacheDurationMs(fileIO), cacheTotalBytes(fileIO), cacheMaxContentLength(fileIO)));
}
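
To illustrate the effect, here is a minimal, self-contained sketch of a cache that is looked up per FileIO instance. The names PerIoCacheSketch, FakeFileIO and FakeContentCache are hypothetical stand-ins, not Iceberg classes, and a plain ConcurrentHashMap is assumed in place of the real CONTENT_CACHES structure. It only shows the lookup pattern, not the real cache behaviour.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for illustration only; these are NOT the Iceberg classes.
final class PerIoCacheSketch {
  /** Hypothetical stand-in for FileIO; relies on default identity equals/hashCode. */
  static final class FakeFileIO {}

  /** Hypothetical stand-in for ContentCache. */
  static final class FakeContentCache {}

  // One cache entry per FileIO instance, mirroring the shape of CONTENT_CACHES.get(io, ...).
  private static final Map<FakeFileIO, FakeContentCache> CACHES = new ConcurrentHashMap<>();

  static FakeContentCache contentCache(FakeFileIO io) {
    return CACHES.computeIfAbsent(io, key -> new FakeContentCache());
  }

  public static void main(String[] args) {
    FakeFileIO shared = new FakeFileIO();
    // Reusing one FileIO instance resolves to the same cache.
    System.out.println(contentCache(shared) == contentCache(shared)); // true
    // A freshly created FileIO instance gets its own, initially empty cache.
    System.out.println(contentCache(shared) == contentCache(new FakeFileIO())); // false
  }
}
```

In this sketch, anything cached through one FakeFileIO is invisible to a lookup made with another, which is the situation described above when every GlueTableOperations instance builds its own FileIO.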

Proposed Solution
Add a constructor or method to the GlueCatalog class that accepts a FileIO object or a function that builds a FileIO object, similar to JdbcCatalog:
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java#L99

  public JdbcCatalog(
      Function<Map<String, String>, FileIO> ioBuilder,
      Function<Map<String, String>, JdbcClientPool> clientPoolBuilder,
      boolean initializeCatalogTables) {
    this.ioBuilder = ioBuilder;
    this.clientPoolBuilder = clientPoolBuilder;
    this.initializeCatalogTables = initializeCatalogTables;
  }
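
As a rough sketch of what this could look like, the example below wires an ioBuilder function into a catalog and hands the resulting FileIO to every table operations object. All names here (IoBuilderCatalogSketch, FakeFileIO, FakeCatalog, FakeTableOperations, newTableOps) are hypothetical and chosen for illustration. This is not the GlueCatalog API, and an actual change would need to fit the existing GlueCatalog initialization path.

```java
import java.util.Map;
import java.util.function.Function;

// Illustrative stand-ins only; none of these types exist in Iceberg.
final class IoBuilderCatalogSketch {
  /** Hypothetical stand-in for FileIO. */
  interface FakeFileIO {}

  /** Hypothetical table operations object that receives its FileIO instead of creating one. */
  static final class FakeTableOperations {
    private final FakeFileIO io;

    FakeTableOperations(FakeFileIO io) {
      this.io = io;
    }

    FakeFileIO io() {
      return io;
    }
  }

  /** Hypothetical catalog that accepts an ioBuilder, in the style of the JdbcCatalog constructor. */
  static final class FakeCatalog {
    private final Function<Map<String, String>, FakeFileIO> ioBuilder;
    private FakeFileIO io;

    FakeCatalog(Function<Map<String, String>, FakeFileIO> ioBuilder) {
      this.ioBuilder = ioBuilder;
    }

    // Build the FileIO once per catalog from the catalog properties.
    void initialize(Map<String, String> properties) {
      this.io = ioBuilder.apply(properties);
    }

    // Every table operations object shares the catalog-level FileIO.
    FakeTableOperations newTableOps(String tableName) {
      return new FakeTableOperations(io);
    }
  }

  public static void main(String[] args) {
    FakeFileIO shared = new FakeFileIO() {};
    FakeCatalog catalog = new FakeCatalog(props -> shared);
    catalog.initialize(Map.of());
    // Both table operations objects see the same FileIO instance.
    System.out.println(catalog.newTableOps("a").io() == catalog.newTableOps("b").io()); // true
  }
}
```

With a shared instance, a cache looked up by FileIO would resolve to the same entry for every table loaded through the catalog. Whether the caller passes a ready-made FileIO or a builder function is a design choice; the builder form keeps the catalog properties available when the FileIO is constructed.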

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@mothukur mothukur added the bug Something isn't working label Dec 13, 2024