The pqinsert command should accept products via STDIN in addition to disk files.
Motivation
With distributed and containerized application architectures, sharing a disk across multiple components can be more difficult. I am using the LDM Docker container as a microservice. I added httpexec to the image to allow remote execution of the LDM commands.
However, to execute pqinsert there needs to be a shared volume that client containers can write to and the LDM container can read from. Another approach is a wrapper script in the LDM container that receives a product from httpexec via STDIN, writes it to a local temporary file, and then calls pqinsert for that file. If pqinsert itself could read from STDIN, it could communicate directly with httpexec without the need for a shared Docker volume or intermediate local file.
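The wrapper approach described above could look roughly like the following. This is a minimal sketch, not the script actually used with httpexec: the function name `insert_via_tempfile` and the injectable `run` parameter are hypothetical, and only the `-p` option mentioned in this proposal is passed (queue and feed type options are omitted).

```python
import os
import shutil
import subprocess
import tempfile


def insert_via_tempfile(stream, product_id, run=subprocess.run):
    """Copy a product from a binary stream to a temp file, then call pqinsert.

    `run` is injectable (hypothetical parameter) so the function can be
    exercised without an LDM installation.
    """
    # Spool the incoming product to a local temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        path = tmp.name
    try:
        # Hand the temporary file to pqinsert under the given product ID.
        result = run(["pqinsert", "-p", product_id, path])
        return result.returncode
    finally:
        # The temporary file is only needed for the duration of the call.
        os.unlink(path)
```

Inside the LDM container this would be called with `sys.stdin.buffer` as the stream. The extra copy to disk and the cleanup step are exactly the overhead that reading STDIN directly in pqinsert would avoid.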
For some applications, this feature would allow a product to be generated in memory and streamed to LDM without ever being written to disk until being inserted into the queue, which could increase overall performance dramatically.
Implementation
I forked this repo and created a proof of concept of this feature.
API
A filename argument of "-" is interpreted as STDIN instead of a disk path, and is read accordingly. This product will have a key value of "STDIN" unless the -p option is provided (which is recommended in this case). There are no other changes to the current API.
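The naming rule above can be sketched as a small function. This is an illustration in Python rather than the C code of the proof of concept; `product_key` is a hypothetical name, and the fallback of using the filename as the key for disk files is an assumption about pqinsert's existing default.

```python
from typing import Optional

STDIN_ARG = "-"             # filename argument that selects STDIN
DEFAULT_STDIN_KEY = "STDIN"  # key used when no -p value is given


def product_key(filename: str, product_id: Optional[str] = None) -> str:
    """Return the key a product would be inserted under."""
    # An explicit -p value always wins.
    if product_id is not None:
        return product_id
    # Without -p, STDIN input gets the fixed key "STDIN".
    if filename == STDIN_ARG:
        return DEFAULT_STDIN_KEY
    # Assumption: disk files default to their filename as the key.
    return filename
```

Because every product read from STDIN without `-p` would share the key "STDIN", supplying `-p` is what keeps such products distinguishable.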
The current implementation allows only one product to be submitted via STDIN. If support for multiple products is essential, one suggestion is to add an option that enables input only from STDIN, with the filename arguments replaced by content lengths that delimit the individual products. Knowing product lengths beforehand would also simplify the implementation, and this information should already be known by the caller.
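The length-delimited idea might work along these lines. This is only a sketch of the suggestion, which is not implemented in the proof of concept; `read_exact` and `split_products` are hypothetical names, and the lengths are assumed to be byte counts given in the order the products appear on the stream.

```python
from typing import BinaryIO, Iterable, Iterator


def read_exact(stream: BinaryIO, length: int) -> bytes:
    """Read exactly `length` bytes, or raise EOFError if the stream ends early."""
    buf = bytearray()
    while len(buf) < length:
        # A pipe may return fewer bytes than requested, so keep reading.
        chunk = stream.read(length - len(buf))
        if not chunk:
            raise EOFError(f"expected {length} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)


def split_products(stream: BinaryIO, lengths: Iterable[int]) -> Iterator[bytes]:
    """Yield one product per declared content length, in order."""
    for length in lengths:
        yield read_exact(stream, length)
```

Since each length is known up front, a buffer of the right size can be allocated once per product, which is the simplification mentioned above.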
Internals
STDIN is not guaranteed to act like a disk file, so mmap() cannot be used. Instead, the product is read into allocated memory. One limitation is that, unlike mmap(), this approach cannot handle products too large to fit in memory. An alternative would be to stream STDIN to a temporary disk file, but this has a performance penalty.
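The distinction between the two input paths can be illustrated as follows. This is a Python sketch of the idea, not the C code in the fork; `load_product` and `CHUNK_SIZE` are hypothetical names, and the regular-file check via `fstat` is one possible way to decide whether mapping is applicable.

```python
import mmap
import os
import stat

CHUNK_SIZE = 64 * 1024  # arbitrary read size for non-mappable input


def load_product(fd: int) -> bytes:
    """Return the full product behind file descriptor `fd`."""
    info = os.fstat(fd)
    if stat.S_ISREG(info.st_mode):
        # Regular disk file: map it (an empty file cannot be mapped).
        if info.st_size == 0:
            return b""
        with mmap.mmap(fd, info.st_size, access=mmap.ACCESS_READ) as mapped:
            # Copied here only so the function returns the same type for
            # both paths; a real consumer could use the mapping directly.
            return bytes(mapped)
    # Pipe, socket, or terminal: read until EOF into memory.
    chunks = []
    while True:
        chunk = os.read(fd, CHUNK_SIZE)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

The second path has to hold the whole product in memory at once, which is the limitation noted above.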
I have only implemented this for the USE_MMAP code branch, as I could not get the other branch to compile.
mdklatt changed the title from "Allow pqsinert to read from STDIN" to "Allow pqinsert to read from STDIN" on Jun 3, 2023