-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
error: "numeric field was not a number: t Regard when getting cksum for ...." #313
Comments
Found the problem, I'll submit MR tomorrow. |
Fix the error: "numeric field was not a number: t 'Regard' when getting cksum for 'a text ...'. The error is caused by wrongly setting the `self.next` position due to accounting harlinks size. According to man 5 tar, the size of the hard-links should be set to 0. size Size of file, as octal number in ASCII. For regular files only, this indicates the amount of data that follows the header. In particular, this field was ignored by early tar implementations when extracting hardlinks. Modern writers should always store a zero length for hardlink entries. But since the writer wasn't *modern*, the entry_size is 64700 which causes miscalculation of the `self.next`. [tar-rs/src/archive.rs:372] &entry.header() = UstarHeader { entry_size: 64700, size: 64700, path: "some/path", link_name: Some( "some/link", ), mode: 0o640, uid: 1058, gid: 1061, mtime: 1673424346, username: Some( "example", ), groupname: Some( "example", ), device_major: Some( 9, ), device_minor: Some( 2, ), cksum: 24700, cksum_valid: true, } [tar-rs/src/archive.rs:373] entry.header().entry_type() = Link Closes: alexcrichton#313 Signed-off-by: Nikola Pajkovsky <[email protected]>
I am running into a similar error while trying to read tar.gz archives generated by Cargo (.crate file), however, it seems to occur randomly, and I haven't been able to pin down what conditions trigger it. here is a simplified code example to explain: #[test]
fn check_extract_cargo_dot_toml_from_krate_file() {
const DATA: &[u8] = include_bytes!("path/to/raw-cargo-publish-http-request-body");
let krate_file: &[u8] = // code to split .crate file from json metadata in request body
let cargo_dot_toml = extract_cargo_dot_toml(krate_file).unwrap();
// ..
}
// simplified version to demonstrate usage of tar crate api
fn extract_cargo_dot_toml(krate_file: &[u8]) -> Result<_, _> {
let decoder = flate2::read::GzDecoder::new(krate_file);
let mut archive = tar::Archive::new(decoder);
for entry in archive.entries()? {
let mut entry = entry?;
let path = entry.path()?;
if path.ends_with("Cargo.toml.orig") {
// store .read_to_string(..) output in `manifest`
} else if path.ends_with(".cargo_vcs_info.json") {
// store .read_to_string(..) output in `vcs_info`
}
if manifest.is_some() && vcs_info.is_some() {
break
}
}
// ..
} The vast majority of the time, this code works fine. Once in a while, I get
Strangely, it seems to correlate with running the tests with a slightly different set of command line flags, i.e. I have tried using a fuzzer to produce a self-contained example, but all that produced so far is an example that always fails, including using the tar binary and the code in the PR discussed here (it does fail with a similar error). I would love to be able to identify the cause of this. Can you think of any theory for why the error would show up sporadically this way? Update: following the code in
relevant code in impl Header {
pub fn cksum(&self) -> io::Result<u32> {
octal_from(&self.as_old().cksum)
.map(|u| u as u32)
.map_err(|err| {
io::Error::new(
err.kind(),
format!("{} when getting cksum for {}", err, self.path_lossy()),
)
})
}
pub fn as_old(&self) -> &OldHeader {
unsafe { cast(self) }
}
// ..
}
fn octal_from(slice: &[u8]) -> io::Result<u64> {
let trun = truncate(slice);
let num = match str::from_utf8(trun) {
Ok(n) => n,
Err(_) => {
return Err(other(&format!(
"numeric field did not have utf-8 text: {}",
String::from_utf8_lossy(trun)
)));
}
};
match u64::from_str_radix(num.trim(), 8) {
Ok(n) => Ok(n),
Err(_) => Err(other(&format!("numeric field was not a number: {}, (input slice = {:?})", num, slice))),
}
} I know nothing about the tar format, so not sure what to make of this... $ evcxr
>> let input: &[u8] = &[0, 0, 49, 51, 49, 51, 51, 0][..];
>> std::str::from_utf8(input)
Ok("\0\013133\0")
>> std::str::from_utf8(input).unwrap().trim()
"\0\013133\0"
>> u64::from_str_radix(std::str::from_utf8(input).unwrap().trim(), 8)
Err(ParseIntError { kind: InvalidDigit })
>> u64::from_str_radix("13133", 8)
Ok(5723) Update 2: I only get the error when |
A user was running into issues with `grafbase deploy` yesterday, where deployments failed with a bizzare error very similar to [this one][1]. We eventually tracked this down to files in the users `.direnv` or `.devenv` folders - based on the tar-rs issue _probably_ hard links of some description. While it would be good to not choke on hardlinks, I'd argue we shouldn't have been archiving these files in the first place: they're clearly meant to stay local to the users machine. So this PR switches `walkdir` for `ignore`, which will respect `.gitignore` files etc, providing users with a sensible (and in most cases zero effort) way to exclude files from the archive we build on `grafbase deploy`. I think this is a sensible default. This _could_ have ramifications for anyone doing a `grafbase deploy` after some kind of local build step, though I don't know how likely that is really. I suggest we try it and see - if anyone complains we can revisit, maybe adding a CLI flag to change the behaviour. [1]: alexcrichton/tar-rs#313
A user was running into issues with `grafbase deploy` yesterday, where deployments failed with a bizzare error very similar to [this one][1]. We eventually tracked this down to files in the users `.direnv` or `.devenv` folders - based on the tar-rs issue _probably_ hard links of some description. While it would be good to not choke on hardlinks, I'd argue we shouldn't have been archiving these files in the first place: they're clearly meant to stay local to the users machine. So this PR switches `walkdir` for `ignore`, which will respect `.gitignore` files etc, providing users with a sensible (and in most cases zero effort) way to exclude files from the archive we build on `grafbase deploy`. I think this is a sensible default. This _could_ have ramifications for anyone doing a `grafbase deploy` after some kind of local build step, though I don't know how likely that is really. I suggest we try it and see - if anyone complains we can revisit, maybe adding a CLI flag to change the behaviour. Fixes GB-6107 [1]: alexcrichton/tar-rs#313
Hello,
I have a tarball that I want to read
But I've got an error
It looks like that it starts reading a header from a wrong position.
My code looks like this
And with a couple of dbg!() I've got this.
Any ideas what could go wrong?
The text was updated successfully, but these errors were encountered: