Update solr-ocrhighlighting and make SOLR_HOCR_PLUGIN_PATH available to php-fpm #345

Merged
nigelgbanks merged 3 commits into main from solr-hocr
Jul 30, 2024

@joecorall joecorall commented Jul 30, 2024

Addendum to #338

First, bump solr-ocrhighlighting to the latest version.

Then, to add the Solr configuration required for hOCR highlighting, the SOLR_HOCR_PLUGIN_PATH environment variable needs to be set and available in php-fpm. Unless the variable is passed through in php-fpm's www.conf, the php-fpm process cannot read it, and the generated solrconfig_extra.xml ends up with an empty lib directory:

<lib dir="" regex=".*\.jar" />

With the environment variable passed into the php-fpm process, downloading the Solr config XML through Drupal's UI results in:

<lib dir="/opt/solr/server/solr/contrib/ocrhighlighting/lib" regex=".*\.jar" />
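
The effect can be reproduced outside php-fpm. The sketch below is an illustration only and not part of this PR: `env -i` stands in for the pool's `clear_env = yes` setting, and the explicit assignment stands in for the `env[...]` directive. The `printf` lines merely mimic the `<lib>` element; they are not how Drupal generates it.

```shell
#!/bin/sh
# Value taken from the PR description.
export SOLR_HOCR_PLUGIN_PATH=/opt/solr/server/solr/contrib/ocrhighlighting/lib

# Stand-in for clear_env = yes: the child process starts with an empty environment.
cleared=$(env -i /bin/sh -c 'printf "%s" "$SOLR_HOCR_PLUGIN_PATH"')

# Stand-in for an env[...] directive: the variable is handed to the child explicitly.
passed=$(env -i SOLR_HOCR_PLUGIN_PATH="$SOLR_HOCR_PLUGIN_PATH" \
  /bin/sh -c 'printf "%s" "$SOLR_HOCR_PLUGIN_PATH"')

# Mimic the generated <lib> element for both cases.
printf '<lib dir="%s" regex=".*\\.jar" />\n' "$cleared"
printf '<lib dir="%s" regex=".*\\.jar" />\n' "$passed"
```

The first line printed has an empty `dir` attribute and the second carries the full plugin path, matching the two XML fragments quoted in the description.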

Related links

@joecorall joecorall changed the title Make SOLR_HOCR_PLUGIN_PATH available to php-fpm Update solr-ocrhighlighting and make SOLR_HOCR_PLUGIN_PATH available to php-fpm Jul 30, 2024
@joecorall joecorall marked this pull request as ready for review July 30, 2024 13:22
@@ -436,3 +436,5 @@ clear_env = yes
;php_admin_value[error_log] = /var/log/php83/$pool.error.log
;php_admin_flag[log_errors] = on
;php_admin_value[memory_limit] = 32M

+env['SOLR_HOCR_PLUGIN_PATH'] = "{{ getenv "SOLR_HOCR_PLUGIN_PATH" }}"

Contributor

This should be removed, see the comment on nginx/Dockerfile.

@@ -89,6 +89,7 @@ ENV \
PHP_POST_MAX_SIZE=128M \
PHP_PROCESS_CONTROL_TIMEOUT=60 \
PHP_REQUEST_TERMINATE_TIMEOUT=60 \
-PHP_UPLOAD_MAX_FILESIZE=128M
+PHP_UPLOAD_MAX_FILESIZE=128M \
+SOLR_HOCR_PLUGIN_PATH=/opt/solr/server/solr/contrib/ocrhighlighting/lib

Contributor

@nigelgbanks nigelgbanks Jul 30, 2024

Since the nginx image is used by quite a few downstream images, I think this should go in the Drupal image, since it is the only one to make use of this environment variable.

It's a shame that this isn't part of the site configuration, as is typical. This makes it work differently than every other environment variable used to configure Drupal modules.

Normally, it should go here, and follow the existing conventions for supporting multi-sites, i.e. DRUPAL_DEFAULT_SOLR_HOCR_PLUGIN_PATH. Then it would be used to configure each site.

ENV \
[email protected] \
DRUPAL_DEFAULT_ACCOUNT_NAME=admin \
DRUPAL_DEFAULT_ACCOUNT_PASSWORD=password \
DRUPAL_DEFAULT_BROKER_HOST=activemq \
DRUPAL_DEFAULT_BROKER_PORT=61613 \
DRUPAL_DEFAULT_BROKER_URL=tcp://activemq:61613 \
DRUPAL_DEFAULT_BROKER_WEB_ADMIN_PASSWORD=password \
DRUPAL_DEFAULT_BROKER_WEB_ADMIN_USER=admin \
DRUPAL_DEFAULT_BROKER_WEB_PORT=8161 \
DRUPAL_DEFAULT_CANTALOUPE_URL=https://islandora.traefik.me/cantaloupe/iiif/2 \
DRUPAL_DEFAULT_CONFIGDIR=/var/www/drupal/config/sync \
DRUPAL_DEFAULT_DB_NAME=drupal_default \
DRUPAL_DEFAULT_DB_PASSWORD=password \
DRUPAL_DEFAULT_DB_USER=drupal_default \
[email protected] \
DRUPAL_DEFAULT_FCREPO_HOST=islandora.traefik.me \
DRUPAL_DEFAULT_FCREPO_PORT=8081 \
DRUPAL_DEFAULT_FITS_HOST=fits \
DRUPAL_DEFAULT_FITS_PORT=8080 \
DRUPAL_DEFAULT_INSTALL_EXISTING_CONFIG=false \
DRUPAL_DEFAULT_INSTALL=true \
DRUPAL_DEFAULT_LOCALE=en \
DRUPAL_DEFAULT_MATOMO_URL_HTTP=http://islandora.traefik.me/matomo/ \
DRUPAL_DEFAULT_MATOMO_URL_HTTPS=https://islandora.traefik.me/matomo/ \
DRUPAL_DEFAULT_NAME=Default \
DRUPAL_DEFAULT_PROFILE=standard \
DRUPAL_DEFAULT_SALT=9PPaL0CxZAIcq0l9wxgDGlCZrp7JdT_x7v9gVzpdbUjMt1PqDz3uD0Zy-i16DuJ1-Htuq5hqeg \
DRUPAL_DEFAULT_SITE_URL=https://islandora.traefik.me \
DRUPAL_DEFAULT_SOLR_CORE=ISLANDORA \
DRUPAL_DEFAULT_SOLR_HOST=solr \
DRUPAL_DEFAULT_SOLR_PORT=8983 \
DRUPAL_DEFAULT_SUBDIR=default \
DRUPAL_DEFAULT_TRIPLESTORE_HOST=blazegraph \
DRUPAL_DEFAULT_TRIPLESTORE_NAMESPACE=islandora \
DRUPAL_DEFAULT_TRIPLESTORE_PORT=8080 \
DRUPAL_ENABLE_HTTPS=true \
DRUPAL_REVERSE_PROXY_IPS= \
DRUPAL_SITES=DEFAULT

But that won't work in this case, as it needs to be exposed to php-fpm as an environment variable rather than as a Drupal configuration override.

So what we should do is create a file drupal/rootfs/etc/php83/php-fpm.d/solr.conf that contains:

# Configuration for https://github.com/discoverygarden/islandora_hocr
env['SOLR_HOCR_PLUGIN_PATH'] = "/opt/solr/server/solr/contrib/ocrhighlighting/lib"

Using the full path rather than a Docker image environment variable is fine: the path is not configurable in the Solr image either, so it does not need to vary.

Contributor Author

The env var needs to be available in php-fpm, not nginx. While it's not ideal, putting it in the base nginx/php container is the easiest way to make it available to Drupal. Otherwise we need to override the entire www.conf for Drupal.

Contributor

@joecorall sorry, I misunderstood initially. I've updated the comment with a solution for php-fpm.

Contributor

Actually, if we could just have the Solr image always load the plugin, regardless of the solrconfig_extra.xml, that would probably be ideal, since this is the end goal anyway.

Contributor Author

Will that add that directive to the www php-fpm pool? I'm trying to read the docs but they're a little sparse around this topic. I'll just try it locally and will make the change if it works.

Contributor

Should do, this is what I see inside the container.

1d8b54aff6fb:/etc/php83# rg php-fpm.d php-fpm.conf 
143:include=/etc/php83/php-fpm.d/*.conf

Contributor Author

@joecorall joecorall Jul 30, 2024

> Actually, if we could just have the Solr image always load the plugin, regardless of the solrconfig_extra.xml, that would probably be ideal, since this is the end goal anyway.

Those Solr config files are created dynamically by Drupal, based on the search_api_solr module version. That makes managing this pretty difficult, and it's why we're stuck with this somewhat hacky way of getting the proper configuration set in Solr.

Contributor Author

> Should do, this is what I see inside the container.

But php-fpm confs define pools, and the env vars need to go into the pool. Our pool is set in www.conf. Adding the additional file with the env directive gives:

2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: [/etc/php83/php-fpm.d/solr.conf:1] Array are not allowed in the global section
2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: Unable to include /etc/php83/php-fpm.d/solr.conf from /etc/php83/php-fpm.conf at line 1
2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: failed to load configuration file '/etc/php83/php-fpm.conf'
2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: FPM initialization failed
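
The error points at the missing pool section: an `env[...]` directive outside any `[pool]` header is read as part of the global section. The sketch below shows a pool-scoped variant of the proposed file. It was not tried in this thread and is offered only as an illustration, assuming php-fpm merges a repeated `[www]` section with the one defined in www.conf. The scratch directory and the `php-fpm83` binary name are assumptions, and the key is written in the unquoted `env[NAME]` form used in the php-fpm documentation.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for /etc/php83/php-fpm.d.
conf_dir=$(mktemp -d)

# The [www] header places the env directive inside the pool section
# rather than the global section, which is what the error complained about.
cat > "$conf_dir/solr.conf" <<'EOF'
; Configuration for https://github.com/discoverygarden/islandora_hocr
[www]
env[SOLR_HOCR_PLUGIN_PATH] = /opt/solr/server/solr/contrib/ocrhighlighting/lib
EOF

cat "$conf_dir/solr.conf"

# Inside the container, a syntax check such as `php-fpm83 -t`
# (binary name assumed) would show whether the combined configuration loads.
```

Whether this is preferable to adding the directive to www.conf directly is a judgment call; the merged change takes the www.conf route.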

Contributor

Well, I was wrong. Sorry for wasting your time. I'm going to merge this even though it's not passing because we're still experiencing terrible rate-limiting problems. I'll keep an eye on the release and re-run the build in 6+ hours when our rate limit resets. After mid-August, we should no longer have to deal with the rate limits.

@nigelgbanks nigelgbanks merged commit adcbc1d into main Jul 30, 2024
75 of 76 checks passed
@nigelgbanks nigelgbanks deleted the solr-hocr branch July 30, 2024 17:19