Allow Parsing of RPM Metadata without loading cpio archive into memory #23
Comments
hugely in favour of this. The only extension I would add, is introducing an Those two could then be combined (if desired) into Not sure how signing fits into this though, at what point is the signature checked? And if the two signatures are going to be checked separately? A great advantage would be that this approach could easily be modeled as But this also leaves the question when to check the signature. Not sure if we need random access to individual files, or how doable that would be.
I am not so much in favour of this: it hides quite a bit, and it becomes non-obvious when signatures are verified, which is key to the trust and processing chain and should be very explicit. To make this workable, one would also require a
To make sure we have the same discussion basis (and also for general documentation purposes of RPM), it might make sense to have a common a ( |
I am also in favour of the first solution since it is the simplest. Creating an extra type for the archive itself could be beneficial. The signature can only be checked once all bytes have been processed, since the signature always spans both the metadata and the archive.
let metadata = RPMPackageMetadata::parse(input)?;
// input has now advanced to the archive
metadata.write(out)?;
// the tee ensures that processed bytes are written immediately
let tee = tee(out, input);
let verifier = Verifier::new(key)?;
if let Err(e) = verifier.verify_reader(&metadata, tee) {
    // undo out here
}
This way, all bytes processed during the verification process are directly written to the final destination.
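The tee idea can be sketched with the standard library alone. This is a minimal illustration, not rpm-rs code: `TeeReader`, its `new(source, sink)` argument order, and `into_sink` are hypothetical names chosen for this example.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: a reader that copies every byte it yields into `sink`.
pub struct TeeReader<R: Read, W: Write> {
    source: R,
    sink: W,
}

impl<R: Read, W: Write> TeeReader<R, W> {
    pub fn new(source: R, sink: W) -> Self {
        TeeReader { source, sink }
    }

    /// Hand back the sink so the caller can keep or discard what was written.
    pub fn into_sink(self) -> W {
        self.sink
    }
}

impl<R: Read, W: Write> Read for TeeReader<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.source.read(buf)?;
        // Forward exactly the bytes that were just read before returning them.
        self.sink.write_all(&buf[..n])?;
        Ok(n)
    }
}
```

Whatever consumes the `TeeReader` (for example a verifier) drives the copy, so the destination only ever holds bytes that have already been read. If verification fails, the caller still has to undo what was written.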
Maybe we could provide a function to remove some of the boilerplate involved:
let metadata = rpm::RPMPackageMetadata::parse(input)?;
let out = create_out_based_on_metadata(&metadata)?; // not part of rpm-rs
let verifier = rpm::signature::PGPVerifier::new(key)?;
if let Err(e) = rpm::process_and_verify(metadata, input, out, verifier) {
    // undo out here
}
This leaves only the problem that the metadata itself might be large enough to cause OOM problems. But IMO this is more of a theoretical problem. Also, it is good practice to check/limit the size of untrusted or unknown streams.
Not sure there is really a point in resetting the
I would add a limiter configuration struct to prevent oversized RPMs.
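One possible shape for such a limiter, as a rough sketch: the `Limits` struct, its `max_bytes` field, and `read_limited` are assumptions made up for this example and are not part of rpm-rs. Only `Read::take` and `Read::read_to_end` come from the standard library.

```rust
use std::io::{self, Read};

/// Hypothetical limiter configuration; the field name is an assumption.
#[derive(Debug, Clone, Copy)]
pub struct Limits {
    /// Maximum number of bytes accepted from an untrusted stream.
    pub max_bytes: u64,
}

/// Read the whole stream into memory, failing if it exceeds `limits.max_bytes`.
pub fn read_limited<R: Read>(input: R, limits: Limits) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte so an oversized stream can be told apart from
    // one that is exactly at the limit.
    let mut capped = input.take(limits.max_bytes.saturating_add(1));
    capped.read_to_end(&mut buf)?;
    if buf.len() as u64 > limits.max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "stream exceeds configured size limit",
        ));
    }
    Ok(buf)
}
```

The same cap could be applied to the metadata section only, which is the part that has to live in memory under the proposals in this thread.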
Creating and cleaning I am sure that library users will often need some information found in the
let header = metadata.header;
let rpm_path = format!("{}-{}-{}.rpm", header.get_name()?, header.get_version()?, header.get_revision()?);
let out = fs::File::create(rpm_path)?;
Sadly, I do not see another solution than using a
rpm::process_and_verify(metadata, input, out, verifier)?;
// or
rpm::write_and_verify(metadata, input, out, verifier)?;
// or whatever name may fit better for this.
is our best bet to make this happen without unnecessary overhead. I have some problems with the proposed API:
and
give the impression that the archive gets verified individually and to do so, it needs a
let (metadata, input) = rpm::RPMPackageMetadata::parse(input)?;
let archive = rpm::RPMArchive::parse(&metadata, input)?; // at this point, the content is read into memory anyway
let verifier = rpm::signature::PGPVerifier::new(key)?;
let pkg = rpm::RPMPackage::new(metadata, archive);
pkg.verify(verifier)?;
which is basically what we have at the moment, but with more checks regarding the archive itself and more control over the individual steps of processing.
Okay, I just thought about how to make this more bearable.
pub trait ProcessVerifier: io::Write {
    // gets called when input is completely consumed
    fn verify(&self, metadata: &RPMPackageMetadata) -> Result<(), RPMError>;
}
impl ProcessVerifier for PGPVerifier {
    //...
}
let pgp_verifier = rpm::signature::PGPVerifier::new(key)?;
let custom_verifier = my_custom_verifier();
let processor = RPMProcessor::new(metadata, input)
    .add_verifier(pgp_verifier)
    .add_verifier(custom_verifier)
    .add_destination(out)
    .add_destination(another_location);
if let Err(e) = processor.process() {
    // handle error accordingly
}
creating a custom
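To make the streaming behaviour of this proposal concrete, here is a self-contained sketch using only the standard library. All names are stand-ins: `Metadata`, `LenVerifier`, and `Processor` are hypothetical, errors are plain `String`s instead of `RPMError`, and the toy verifier checks a byte count rather than a signature.

```rust
use std::io::{self, Read, Write};

/// Stand-in for the metadata type discussed above (hypothetical field).
pub struct Metadata {
    pub expected_len: u64,
}

/// A verifier sees every payload byte through `Write` and is asked for a
/// verdict once the input is fully consumed.
pub trait ProcessVerifier: Write {
    fn verify(&self, metadata: &Metadata) -> Result<(), String>;
}

/// Toy verifier that only checks the payload length; a real one would feed
/// the bytes into a signature check instead.
#[derive(Default)]
pub struct LenVerifier {
    seen: u64,
}

impl Write for LenVerifier {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.seen += buf.len() as u64;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl ProcessVerifier for LenVerifier {
    fn verify(&self, metadata: &Metadata) -> Result<(), String> {
        if self.seen == metadata.expected_len {
            Ok(())
        } else {
            Err(format!("expected {} bytes, saw {}", metadata.expected_len, self.seen))
        }
    }
}

pub struct Processor<'a, R: Read> {
    metadata: Metadata,
    input: R,
    verifiers: Vec<Box<dyn ProcessVerifier + 'a>>,
    destinations: Vec<Box<dyn Write + 'a>>,
}

impl<'a, R: Read> Processor<'a, R> {
    pub fn new(metadata: Metadata, input: R) -> Self {
        Processor { metadata, input, verifiers: Vec::new(), destinations: Vec::new() }
    }
    pub fn add_verifier(mut self, v: impl ProcessVerifier + 'a) -> Self {
        self.verifiers.push(Box::new(v));
        self
    }
    pub fn add_destination(mut self, d: impl Write + 'a) -> Self {
        self.destinations.push(Box::new(d));
        self
    }
    /// Stream the input once, in fixed-size chunks, to every verifier and
    /// destination, then ask each verifier for its verdict.
    pub fn process(mut self) -> Result<(), String> {
        let mut chunk = [0u8; 8192];
        loop {
            let n = self.input.read(&mut chunk).map_err(|e| e.to_string())?;
            if n == 0 {
                break;
            }
            for v in self.verifiers.iter_mut() {
                v.write_all(&chunk[..n]).map_err(|e| e.to_string())?;
            }
            for d in self.destinations.iter_mut() {
                d.write_all(&chunk[..n]).map_err(|e| e.to_string())?;
            }
        }
        for v in &self.verifiers {
            v.verify(&self.metadata)?;
        }
        Ok(())
    }
}
```

Memory use stays bounded by the chunk size regardless of the archive size. Note that destinations receive bytes before any verifier has given its verdict, so on error the caller still has to clean up whatever was written.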
I think that's the most ergonomic API so far, let's go for it! The only thing I do not quite like is that a lot of details are hidden and now we require another
Okay, I'll implement this.
Yeah, this stinks, but you are right. Do you have time to create a PR for rpgp? I am not quite sure why this is not higher on their priority list, since this is IMHO a core feature to make rpgp viable for server deployments.
rpgp/rpgp#106 first attempt to introduce the
Currently, you have to read the complete RPM into memory in order to access its metadata. I've encountered some workflows where the package content is completely irrelevant for further processing.
Something like this:
In this case, reading the complete body into a byte buffer is a waste of memory.
Not only that, since RPMs can potentially be multiple gigabytes in size, this also limits the usage of this library in memory-sensitive deployments.
I currently see two solutions for this problem:
1. Make RPMPackageMetadata::parse and RPMPackageMetadata::write public methods and be done with it. This leaves the opportunity to read the Reader until the metadata object is created and decide for yourself what you want to do with the actual CPIO archive. The downside to this is that we are widening the API surface, which might or might not bite us later. Coming back to the original example, the code would look like this:
2. Change RPMPackage::content from Vec<u8> to a generic reader. Something like this:
This could be a simple cursor when creating the RPM. The issue I see with this is the confusing semantics in combination with RPMPackage::sign and RPMPackage::verify, since both of them would need to consume the reader without any obvious indication. Since most network requests are not Seekable, it would not be good to narrow T to Read + Seek in general. One way around this could be to create a special impl for Read + Seek that features both sign and verify methods and leave them out otherwise.
@drahnr, do you have any thoughts on this?
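The "special impl for Read + Seek" idea from the second solution could look roughly like the sketch below. `Package` and `checksum` are hypothetical stand-ins: the toy `checksum` plays the role that sign or verify would have, and nothing here reflects the actual rpm-rs types.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical package type whose content is any reader instead of `Vec<u8>`.
pub struct Package<T: Read> {
    pub content: T,
}

impl<T: Read> Package<T> {
    pub fn new(content: T) -> Self {
        Package { content }
    }
}

/// Only available when the content can be rewound, so reading it for a
/// check does not silently consume it.
impl<T: Read + Seek> Package<T> {
    /// Toy stand-in for `verify`: sums all content bytes, then rewinds.
    pub fn checksum(&mut self) -> io::Result<u64> {
        let start = self.content.stream_position()?;
        let mut sum = 0u64;
        let mut chunk = [0u8; 4096];
        loop {
            let n = self.content.read(&mut chunk)?;
            if n == 0 {
                break;
            }
            sum += chunk[..n].iter().map(|&b| u64::from(b)).sum::<u64>();
        }
        // Restore the original position so the content can be read again.
        self.content.seek(SeekFrom::Start(start))?;
        Ok(sum)
    }
}
```

With this split, a `Package` built on a non-seekable network stream simply has no `checksum` method, so the compiler rejects the call instead of the reader being consumed unexpectedly at runtime.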