Allow pqinsert to read from STDIN #124

Open
mdklatt opened this issue Jun 3, 2023 · 1 comment

Comments

mdklatt commented Jun 3, 2023

Feature Proposal

The pqinsert command should accept products via STDIN in addition to disk files.

Motivation

With distributed and containerized application architectures, sharing a disk across multiple components can be more difficult. I am using the LDM Docker container as a microservice. I added httpexec to the image to allow remote execution of the LDM commands.

However, to execute pqinsert there needs to be a shared volume that client containers can write to and the LDM container can read from. Another approach is a wrapper script in the LDM container that receives a product from httpexec via STDIN, writes it to a local temporary file, and then calls pqinsert for that file. If pqinsert itself could read from STDIN, it could communicate directly with httpexec without the need for a shared Docker volume or intermediate local file.
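As a rough illustration of the wrapper-script workaround described above, here is a minimal sketch in Python (chosen only for brevity; it is not the code from the fork). The function name `insert_from_stream`, the `run` parameter, and the temp-file prefix are hypothetical, and the sketch assumes `pqinsert` is on the PATH and that the default product queue is the intended target.

```python
import shutil
import subprocess
import tempfile


def insert_from_stream(stream, product_id, run=subprocess.run):
    """Copy a product from a binary stream to a temp file, then call pqinsert.

    `run` is injectable so the function can be exercised without LDM installed.
    """
    with tempfile.NamedTemporaryFile(prefix="pqinsert-") as tmp:
        shutil.copyfileobj(stream, tmp)
        tmp.flush()  # make sure every byte is visible to the child process
        # Assumption: pqinsert is on PATH and the default queue is wanted.
        run(["pqinsert", "-p", product_id, tmp.name], check=True)
    # The temporary file is deleted when the "with" block exits.
```

Inside the container this could be called as `insert_from_stream(sys.stdin.buffer, "my.product.id")`. The extra copy to disk is exactly the step the proposal aims to eliminate.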

For some applications, this feature would allow a product to be generated in memory and streamed to LDM without touching disk before it is inserted into the queue, which could improve overall performance significantly.

Implementation

I forked this repo and created a proof of concept of this feature.

API

A filename argument of "-" is interpreted as STDIN instead of a disk path, and is read accordingly. This product will have a key value of "STDIN" unless the -p option is provided (which is recommended in this case). There are no other changes to the current API.
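The argument handling described above can be sketched as follows. This is an illustrative Python model, not the C code from the proof of concept; the names `product_key` and `open_product` are hypothetical, and the sketch assumes that a disk file without -p is keyed by its file name.

```python
import sys

STDIN_ARG = "-"      # filename argument that selects STDIN
STDIN_KEY = "STDIN"  # default key for a product read from STDIN


def product_key(filename, product_id=None):
    """Return the key for a product: -p wins, then "STDIN" or the file name."""
    if product_id is not None:
        return product_id
    # Assumption: a disk file without -p is keyed by its file name.
    return STDIN_KEY if filename == STDIN_ARG else filename


def open_product(filename):
    """Return a binary stream for the product, using STDIN for "-"."""
    if filename == STDIN_ARG:
        return sys.stdin.buffer
    return open(filename, "rb")
```

Because every STDIN product would otherwise share the key "STDIN", passing an explicit product ID is the safer choice, as recommended above.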

The current implementation allows only one product to be submitted via STDIN. If support for multiple products is essential, one option is a flag that enables STDIN-only input, with the filename arguments reinterpreted as content lengths that delimit the products in the stream. Knowing product lengths beforehand would also simplify the implementation, and the caller should already have this information.
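To make the length-delimited idea concrete, here is a small Python sketch of splitting one input stream into several products using caller-supplied lengths. Nothing like this exists in the proof of concept; `read_exact` and `split_products` are hypothetical names used only for illustration.

```python
import io


def read_exact(stream, length):
    """Read exactly `length` bytes, looping because pipes may return short reads."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            got = length - remaining
            raise EOFError(f"expected {length} bytes, got {got}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def split_products(stream, lengths):
    """Yield one product per content length, in order."""
    for length in lengths:
        yield read_exact(stream, length)


# Three products of 4, 2, and 6 bytes sent back to back on one stream.
stream = io.BytesIO(b"AAAABBCCCCCC")
products = list(split_products(stream, [4, 2, 6]))
# products == [b"AAAA", b"BB", b"CCCCCC"]
```

With the lengths known up front, each product's buffer can be sized once rather than grown incrementally, which is the simplification mentioned above.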

Internals

STDIN is not guaranteed to act like a disk file (it may be a pipe or socket), and thus mmap() cannot be used. Instead, the product is read into allocated memory. One limitation is that, unlike mmap(), this approach cannot handle products larger than available memory. An alternative solution would be to stream STDIN to a temporary disk file, but this has a performance penalty.
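The two strategies discussed above can be compared in a short Python sketch. It only models the trade-off and is not the C implementation from the fork; `read_into_memory`, `spool_to_mmap`, and the 64 KiB chunk size are illustrative assumptions.

```python
import mmap
import tempfile

CHUNK = 64 * 1024  # arbitrary read size chosen for this sketch


def read_into_memory(stream, chunk_size=CHUNK):
    """Read a non-seekable stream into a growing in-memory buffer."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of input
            break
        buf += chunk
    return bytes(buf)


def spool_to_mmap(stream, chunk_size=CHUNK):
    """Copy the stream to a temporary file and memory-map that file.

    Note: mmap raises ValueError for an empty file, so empty input
    would need separate handling.
    """
    with tempfile.TemporaryFile() as tmp:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            tmp.write(chunk)
        tmp.flush()
        # The mapping keeps its own duplicate of the file descriptor,
        # so it remains valid after the temporary file object is closed.
        return mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)
```

The first function holds the whole product in process memory; the second pays for an extra write to disk but lets the operating system page the data in and out as needed.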

I have only implemented this for the USE_MMAP code branch, as I could not get the other branch to compile.

mdklatt changed the title from "Allow pqsinert to read from STDIN" to "Allow pqinsert to read from STDIN" on Jun 3, 2023

semmerson (Collaborator) commented

Got it. We'll get back to you.
