Update solr-ocrhighlighting and make SOLR_HOCR_PLUGIN_PATH available to php-fpm #345

Merged 3 commits on Jul 30, 2024
3 changes: 2 additions & 1 deletion nginx/Dockerfile
@@ -89,6 +89,7 @@ ENV \
PHP_POST_MAX_SIZE=128M \
PHP_PROCESS_CONTROL_TIMEOUT=60 \
PHP_REQUEST_TERMINATE_TIMEOUT=60 \
-PHP_UPLOAD_MAX_FILESIZE=128M
+PHP_UPLOAD_MAX_FILESIZE=128M \
+SOLR_HOCR_PLUGIN_PATH=/opt/solr/server/solr/contrib/ocrhighlighting/lib
nigelgbanks (Contributor), Jul 30, 2024:

Since the nginx image is used by quite a few downstream images, I think this should go in the Drupal image, which is the only one that uses this environment variable.

It's a shame that this isn't part of the site configuration, as is typical. This makes it work differently from every other environment variable used to configure Drupal modules.

Normally, it would go here and follow the existing conventions for supporting multi-sites, i.e. DRUPAL_DEFAULT_SOLR_HOCR_PLUGIN_PATH. Then it would be used to configure each site.

ENV \
[email protected] \
DRUPAL_DEFAULT_ACCOUNT_NAME=admin \
DRUPAL_DEFAULT_ACCOUNT_PASSWORD=password \
DRUPAL_DEFAULT_BROKER_HOST=activemq \
DRUPAL_DEFAULT_BROKER_PORT=61613 \
DRUPAL_DEFAULT_BROKER_URL=tcp://activemq:61613 \
DRUPAL_DEFAULT_BROKER_WEB_ADMIN_PASSWORD=password \
DRUPAL_DEFAULT_BROKER_WEB_ADMIN_USER=admin \
DRUPAL_DEFAULT_BROKER_WEB_PORT=8161 \
DRUPAL_DEFAULT_CANTALOUPE_URL=https://islandora.traefik.me/cantaloupe/iiif/2 \
DRUPAL_DEFAULT_CONFIGDIR=/var/www/drupal/config/sync \
DRUPAL_DEFAULT_DB_NAME=drupal_default \
DRUPAL_DEFAULT_DB_PASSWORD=password \
DRUPAL_DEFAULT_DB_USER=drupal_default \
[email protected] \
DRUPAL_DEFAULT_FCREPO_HOST=islandora.traefik.me \
DRUPAL_DEFAULT_FCREPO_PORT=8081 \
DRUPAL_DEFAULT_FITS_HOST=fits \
DRUPAL_DEFAULT_FITS_PORT=8080 \
DRUPAL_DEFAULT_INSTALL_EXISTING_CONFIG=false \
DRUPAL_DEFAULT_INSTALL=true \
DRUPAL_DEFAULT_LOCALE=en \
DRUPAL_DEFAULT_MATOMO_URL_HTTP=http://islandora.traefik.me/matomo/ \
DRUPAL_DEFAULT_MATOMO_URL_HTTPS=https://islandora.traefik.me/matomo/ \
DRUPAL_DEFAULT_NAME=Default \
DRUPAL_DEFAULT_PROFILE=standard \
DRUPAL_DEFAULT_SALT=9PPaL0CxZAIcq0l9wxgDGlCZrp7JdT_x7v9gVzpdbUjMt1PqDz3uD0Zy-i16DuJ1-Htuq5hqeg \
DRUPAL_DEFAULT_SITE_URL=https://islandora.traefik.me \
DRUPAL_DEFAULT_SOLR_CORE=ISLANDORA \
DRUPAL_DEFAULT_SOLR_HOST=solr \
DRUPAL_DEFAULT_SOLR_PORT=8983 \
DRUPAL_DEFAULT_SUBDIR=default \
DRUPAL_DEFAULT_TRIPLESTORE_HOST=blazegraph \
DRUPAL_DEFAULT_TRIPLESTORE_NAMESPACE=islandora \
DRUPAL_DEFAULT_TRIPLESTORE_PORT=8080 \
DRUPAL_ENABLE_HTTPS=true \
DRUPAL_REVERSE_PROXY_IPS= \
DRUPAL_SITES=DEFAULT
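Under that convention, each site's value would be looked up as DRUPAL_<SITE>_<KEY> for every site listed in DRUPAL_SITES. A minimal POSIX shell sketch of such a lookup, assuming DRUPAL_SITES is a space-separated list (only DEFAULT is shown above); site_setting and each_site_setting are hypothetical helpers, not part of the image's startup scripts:

```shell
# Hypothetical helpers illustrating the DRUPAL_<SITE>_<KEY> naming
# convention; assumes DRUPAL_SITES is a space-separated list of sites.

# Print the value of DRUPAL_<site>_<key>, or an empty string if unset.
# Both arguments are assumed to be trusted identifiers, since eval is used.
site_setting() {
  var="DRUPAL_${1}_${2}"
  # POSIX sh has no ${!var}, so eval expands the variable named in $var.
  eval "printf '%s' \"\${${var}:-}\""
}

# Print "<site>=<value>" for every site listed in DRUPAL_SITES.
each_site_setting() {
  for site in ${DRUPAL_SITES:-}; do
    printf '%s=%s\n' "$site" "$(site_setting "$site" "$1")"
  done
}
```

For example, with DRUPAL_SITES=DEFAULT and DRUPAL_DEFAULT_SOLR_HOCR_PLUGIN_PATH set, each_site_setting SOLR_HOCR_PLUGIN_PATH prints a single DEFAULT=<path> line.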

But that won't work in this case, as the value needs to be exposed to php-fpm as an environment variable rather than as a Drupal configuration override.

So instead, we should create a file drupal/rootfs/etc/php83/php-fpm.d/solr.conf containing:

; Configuration for https://github.com/discoverygarden/islandora_hocr
env['SOLR_HOCR_PLUGIN_PATH'] = "/opt/solr/server/solr/contrib/ocrhighlighting/lib"

Hard-coding the full path rather than using a Docker image environment variable is fine, since the path isn't configurable in the Solr image either, so it doesn't need to vary.

Contributor Author:

The env var needs to be available in php-fpm, not nginx. While it's not ideal, putting it in the base nginx/php container is the easiest way to make it available to Drupal. Otherwise we would need to override the entire www.conf for Drupal.

Contributor:

@joecorall Sorry, I misunderstood initially. I've updated the comment with a solution for php-fpm.

Contributor:

Actually, if we could just have the Solr image always load the plugin, regardless of the solrconfig_extra.xml, that would probably be ideal, since this is the end goal anyway.

Contributor Author:

Will that add that directive to the www php-fpm pool? I'm trying to read the docs but they're a little sparse around this topic. I'll just try it locally and will make the change if it works.

Contributor:

It should. This is what I see inside the container:

1d8b54aff6fb:/etc/php83# rg php-fpm.d php-fpm.conf 
143:include=/etc/php83/php-fpm.d/*.conf

joecorall (Contributor Author), Jul 30, 2024:

> Actually, if we could just have the Solr image always load the plugin, regardless of the solrconfig_extra.xml, that would probably be ideal, since this is the end goal anyway.

Those Solr config files are generated dynamically by Drupal, based on the search_api_solr module version. That makes managing this pretty difficult, and it's why we're stuck with this somewhat hacky way of getting the proper configuration into Solr.

Contributor Author:

> It should. This is what I see inside the container:

But php-fpm config files define pools, and the env vars need to go into a pool. Our pool is defined in www.conf. Adding the extra file with the env directive gives:

2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: [/etc/php83/php-fpm.d/solr.conf:1] Array are not allowed in the global section
2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: Unable to include /etc/php83/php-fpm.d/solr.conf from /etc/php83/php-fpm.conf at line 1
2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: failed to load configuration file '/etc/php83/php-fpm.conf'
2024-07-30 11:27:55 [30-Jul-2024 15:27:55] ERROR: FPM initialization failed
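The failure matches the point above: the proposed solr.conf has no pool header, so its env[...] line lands in the global section, where php-fpm rejects array directives. A rough awk-based sketch of a check for that mistake before starting php-fpm; check_env_in_pool is a hypothetical helper, and it assumes each file starts in the global section (as the error for solr.conf:1 indicates) rather than modelling php-fpm's parser exactly:

```shell
# Hypothetical pre-flight check: report env[...] directives that appear
# before any pool header. Assumes each file starts in the global section;
# this is not a full reimplementation of php-fpm's config parser.
check_env_in_pool() {
  awk '
    # Each file is assumed to start in the global section.
    FNR == 1 { section = "" }
    # A header such as "[www]" or "[global]" switches the section.
    /^[[:space:]]*\[[^]]+\][[:space:]]*$/ {
      section = $0
      sub(/^[[:space:]]*\[/, "", section)
      sub(/\][[:space:]]*$/, "", section)
      next
    }
    # env[...] is only valid inside a pool, never in the global section.
    /^[[:space:]]*env\[/ && (section == "" || section == "global") {
      printf "%s:%d: env[] outside a pool section\n", FILENAME, FNR
      bad = 1
    }
    END { exit bad }
  ' "$@"
}
```

For example, check_env_in_pool /etc/php83/php-fpm.d/*.conf would flag the proposed solr.conf while accepting a www.conf whose env lines follow its [www] header.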

Contributor:

Well, I was wrong. Sorry for wasting your time. I'm going to merge this even though the build isn't passing, because we're still experiencing terrible rate-limiting problems. I'll keep an eye on the release and re-run the build in 6+ hours, when our rate limit resets. After mid-August, we should no longer have to deal with the rate limits.


COPY --link rootfs /
2 changes: 2 additions & 0 deletions nginx/rootfs/etc/confd/templates/www.conf.tmpl
@@ -436,3 +436,5 @@ clear_env = yes
;php_admin_value[error_log] = /var/log/php83/$pool.error.log
;php_admin_flag[log_errors] = on
;php_admin_value[memory_limit] = 32M
+
+env['SOLR_HOCR_PLUGIN_PATH'] = "{{ getenv "SOLR_HOCR_PLUGIN_PATH" }}"
Contributor:

This should be removed; see the comment on nginx/Dockerfile.
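With the ENV default added to nginx/Dockerfile, the template line above should render to a fixed path inside the pool defined in www.conf. A small shell sketch of that substitution, for checking the expected output; render_solr_env_line is hypothetical and only mimics confd's getenv function, which returns an empty string for an unset variable when no default is given:

```shell
# Hypothetical helper reproducing the substitution done by the confd
# template line: getenv yields the variable's value, or "" if unset.
render_solr_env_line() {
  printf "env['SOLR_HOCR_PLUGIN_PATH'] = \"%s\"\n" "${SOLR_HOCR_PLUGIN_PATH:-}"
}
```

With SOLR_HOCR_PLUGIN_PATH=/opt/solr/server/solr/contrib/ocrhighlighting/lib, this prints env['SOLR_HOCR_PLUGIN_PATH'] = "/opt/solr/server/solr/contrib/ocrhighlighting/lib"; if a downstream image unset the variable, the template would still render, just with an empty value.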

4 changes: 2 additions & 2 deletions solr/Dockerfile
@@ -6,10 +6,10 @@ ARG SOLR_VERSION=9.5.0
ARG SOLR_FILE=solr-${SOLR_VERSION}.tgz
ARG SOLR_URL=https://archive.apache.org/dist/solr/solr/${SOLR_VERSION}/solr-${SOLR_VERSION}.tgz
ARG SOLR_FILE_SHA256=d8538502019af1945e0b124a4613b46ca43aedcf3f20e9912c482c080407ea21
-ARG OCRHIGHLIGHT_VERSION=0.8.6
+ARG OCRHIGHLIGHT_VERSION=0.9.0
ARG OCRHIGHLIGHT_FILE=solr-ocrhighlighting-${OCRHIGHLIGHT_VERSION}.jar
ARG OCRHIGHLIGHT_URL=https://github.com/dbmdz/solr-ocrhighlighting/releases/download/${OCRHIGHLIGHT_VERSION}/solr-ocrhighlighting-${OCRHIGHLIGHT_VERSION}.jar
-ARG OCRHIGHLIGHT_FILE_SHA256=3cf22d554003347de5486a1e2b6b624759495122a5b35fef9d8306eeb5e14f61
+ARG OCRHIGHLIGHT_FILE_SHA256=79eb7374989359c74903daefbe61f7feb9aeb7367ee6f7e1361fe8b911d2fa82
ARG OCRHIGHLIGHT_DEST=/opt/solr/server/solr/contrib/ocrhighlighting/lib

EXPOSE 8983
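Bumping OCRHIGHLIGHT_VERSION also changes OCRHIGHLIGHT_FILE_SHA256, presumably because the build checks the downloaded jar against it. The steps that consume these args are outside this hunk, so the following is only a sketch of such a check; verify_sha256 is a hypothetical helper, and sha256sum (GNU coreutils or BusyBox) is assumed to be available:

```shell
# Hypothetical check of a downloaded file against a pinned SHA256, in the
# spirit of OCRHIGHLIGHT_FILE_SHA256; assumes sha256sum is installed.
verify_sha256() {
  # sha256sum prints "<hash>  <path>"; keep only the hash field.
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  # Succeeds only when the computed hash matches the expected one.
  [ "$actual" = "$2" ]
}
```

A build step could, for instance, run verify_sha256 "${OCRHIGHLIGHT_FILE}" "${OCRHIGHLIGHT_FILE_SHA256}" after the download (assuming the jar was saved under that name) and abort on a non-zero status; how solr/Dockerfile actually performs the check is not visible here.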