Discussion of Par3 Specification #1
I found a typo in the spec, line 578;
This may be "Table: Contents of a chunk description". I have a question about how to map input files into input blocks. Is there any rule in the order of mapping input files ? PAR2 spec has definition of files' order by FileIDs on Main packet. In PAR3 spec, Root packet has a list of File packets and their order is defined. But, it seems that there may be no relation to mapping of input files into input blocks. |
No, there is no constraint on how input files are mapped to input blocks. I wanted that freedom, in case multiple threads were processing files separately. It is best to use low numbers for the input blocks. E.g., if you use 50 input blocks, they should be numbered 0 to 49. (Or something like that.) Using low numbers leaves room for incremental backups. |
I saw an interesting point in the spec, line 347 under Start Packet Body Contents;
The Galois field size is 2 when using 16-bit Reed-Solomon codes like PAR2, and 1 when using 8-bit Reed-Solomon codes like PAR1. Currently, most programming languages support 64-bit integers (8 bytes). Recent CPU SIMD extensions support 128, 256, or 512 bits (16, 32, or 64 bytes). But wouldn't a 2040-bit Galois field size (255 bytes) be more than enough for PAR3? I felt that using 1 byte for the item might be enough. Also, when only XOR is used, as with simple parity or LDPC, could the Galois field size be stored as 0? In that case, would there be no generator item in the Start packet? |
Hmmm.... for some reason, I thought SIMD 512 was 512 bytes, not bits. Yes, I think 1 byte is enough. Change made. The attached MarkDown file has the fix. For the moment, I left out support for XOR. First, I feel like there should be a generator polynomial that acts exactly like XOR, but I haven't taken time to look into it. Second, I wondered how much it would be used. Par3 operates on bytes, and even using random 1-byte Galois field factors will significantly improve an LDPC's recovery rate. Lastly, it isn't hard to add support for it later, if we want. |
I remembered why we don't need XOR: all Galois Field addition is XOR. We can create a recovery block using only XOR by making all the factors in the Code Matrix equal to 1. And that works for any Galois Field of any size. If a client wants fast LDPC, they can just do the optimization that multiplying any input block by a factor of 1 is just the input block itself. |
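To make that concrete, here is a minimal sketch in C (function and variable names are illustrative, not from any Par3 client): in GF(2^n) addition is plain XOR, so a recovery block whose row in the code matrix is all 1s is simply the XOR of all input blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: in any GF(2^n), addition is XOR, and a factor of 1 leaves the
 * input block unchanged.  So an all-ones row of the code matrix reduces to
 * XORing every input block into the recovery block. */
static void xor_recovery_block(uint8_t *recovery, const uint8_t *const *inputs,
                               size_t block_count, size_t block_size)
{
    for (size_t i = 0; i < block_size; i++)
        recovery[i] = 0;
    for (size_t b = 0; b < block_count; b++)
        for (size_t i = 0; i < block_size; i++)
            recovery[i] ^= inputs[b][i];   /* GF addition == XOR */
}
```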
While I was thinking about how to put input files into input blocks, I found some interesting points. Basically, PAR3 is more free-form than PAR2, and the construction may depend on each PAR3 client's developer. It's complex enough to be a headache.

Compared to PAR2, in PAR3 it is difficult (or almost impossible) to predict the number of input blocks. In PAR2, there is a strong relation between block size and number of input blocks. When a user sets a block size in a PAR2 client, the client can calculate the number of resulting input blocks for the given input files. Also, when a user sets a preferred block count, a PAR2 client can suggest a reasonable block size for the given input files. In PAR3, the new features (packing of tail chunks and deduplication) may decrease the number of actual input blocks below the pre-calculated number. So the difference between a user's specified block count and the resulting block count will become larger.

There is also a sensible range for the block size. Because it's stored as a 64-bit integer (8 bytes), the format's maximum block size is 2^64 - 1. The maximum is limited by the total size of the input files, too: when all input files fit in one input block, the block size equals the total file size, and setting a larger block size than the total file size is worthless. If the packing feature isn't used, the maximum useful block size is the largest file size. As for the minimum block size, anything less than 8 bytes is worthless: when the block size is 1 to 8 bytes, the checksums (CRC-64) alone can restore each input block without any recovery blocks. Thus, it may be good to mention that 8 is the minimum value of the block size.

I came up with another case where recovery blocks are not required. When all input files are 39 bytes or smaller, they can be put in tail chunks by setting the block size to 40 bytes. In this case, there are no input blocks at all, because each input file is stored as raw data in File Packets. When some files are 40 bytes or larger and other files are 39 bytes or smaller, setting the block size to 40 bytes or larger means only the small files can be restored without recovery blocks. Developers should be careful in treating those small files (39 bytes or smaller), because PAR2 didn't have such a feature.

I have a question about the External Data Packet. This packet contains checksums of input blocks. Does it require checksums of all input blocks? When an input block consists of one or more tail chunks, their checksums are stored in each File Packet already. If it's possible to omit the checksums of such tail-chunk input blocks, the number of External Data Packets may be the same as the total number of chunks in File Packets. Though it's possible to store checksums of all input blocks, duplicated checksums may not be useful. Furthermore, when the block size is less than 24 bytes, the checksums in External Data Packets become larger than the original data of the input blocks. In this case, it may be better to store the raw data of the input blocks, the same as for small tail chunks. Or it may be good to suggest a block size larger than 24 bytes for efficiency. Anyway, users mostly won't set such a small block size. |
Yes, it is harder to determine the number of input blocks. But I'm not sure that users care about that. I think they care more about how small a piece of damage can be repaired and how much recovery data there is. Par3 is not good for very small blocks. There is too much overhead. I would be surprised if anyone wants to use block sizes below 1024 bytes, because of the overhead. Personally, I would want blocks of at least 2400 bytes, because the overhead is at least 24 bytes per block and I'd want the overhead to be less than 1%. I don't know that every 8-byte value has a unique CRC-64-ISO. It's possible, but I don't know that it is guaranteed. Yes, large collections of tiny files would result in everything being stored in the tail data of the File packets. But I don't see a way around that ... other than zipping or tarring the files before using Par3. The External Data packet is for a sequence of input blocks. The specification does NOT require that every input block has to be covered by a checksum in an External Data packet. It is up to the client to decide if they have enough information to do the recovery. I suppose a client author could try to deduplicate checksums, but I don't think there is much data to be saved by doing that. As I said before, the minimum overhead is 24 bytes per block and around 131 bytes per file. So, it is not suited to small blocks or many very tiny files. If users want the overhead to be less than 1%, we're talking block sizes of at least 4kb. |
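A rough illustration of the arithmetic above (the 24-byte figure is the per-block overhead quoted in this thread, not a normative constant): the smallest block size that keeps overhead below a target fraction is just the overhead divided by that fraction.

```c
#include <stdint.h>

/* Sketch: minimum block size so that per-block overhead stays under a
 * target fraction.  min_block_size(24, 0.01) == 2400, matching the
 * "at least 2400 bytes for <1% overhead" estimate above. */
static uint64_t min_block_size(uint64_t per_block_overhead, double max_fraction)
{
    return (uint64_t)((double)per_block_overhead / max_fraction);
}
```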
While I tried to get file information through the C runtime library, I may have found a possible problem in the File System Specific Packets (UNIX Permissions Packet and FAT Permissions Packet). There is some documentation under the "File System Specific Packets" section in the PAR3 spec. A PAR3 client may write its preferred or safe items into a packet. Another PAR3 client may read those items from the packet and apply the attributes or permissions to a file. For example, storing and restoring the timestamp is safe and may be useful.

Now, each item has 2 states: a written value, or zero. When a PAR3 client didn't write an item to eliminate a risk (or could not get the information), the item becomes zero. (The PAR3 spec requires filling unused bytes with zero.) When another PAR3 client reads the item as zero bytes, it cannot distinguish whether the item's value was originally zero or the item was simply not written. This may be a fault, because zero has a meaning as an attribute value.

For example, there is an "i_mode" item in the UNIX Permissions Packet. The item has bits which are non-zero for read or write permission. When the value is zero, it means that all users have no read/write permission. Another PAR3 client cannot determine whether the file should have permissions or not. Likewise, there is a "FileAttributes" item in the FAT Permissions Packet. The item has a bit for the "Archive" attribute. This bit is normally non-zero after creation or modification. When the value is zero, it means the attribute, which was set by default, needs to be erased. (Because this attribute is rarely used or referred to, there may be no problem.)

Thus, there should be a bit flag which indicates whether the item was written or not. For example, setting either the Directory or the File bit flag would work. Both the "i_mode" and "FileAttributes" items have these two bits. Because one of those bit flags is always non-zero, a written item can never be all zero. When a PAR3 client reads an item as zero, it indicates that the other PAR3 client didn't write the item.

By the way, I felt that 1 byte might be enough for the "FileAttributes" item in the FAT Permissions Packet. There are many unused bits in the field. It's strange to allocate a larger field than the required size. Anyway, it's impossible to use the raw bit data from the Win32 API GetFileAttributes, so I think you don't need to keep the original bit alignment. A programmer will be able to edit bit flags with ease. |
I'm not sure what you mean by "favorite or safe". I think you mean that a client might write some piece of data in the UNIX or FAT Permissions packet, but not all of them. For example, might write timestamps, but not the UID or GID. The GNU C library supports all of the UNIX Permission packet fields, except for "xattr". https://www.gnu.org/software/libc/manual/html_node/File-Attributes.html I assumed that xattr could be left empty. I assumed that any Windows client would support all the FAT Permissions packet fields. The permission packets were not required, so any Java or Javascript client can just ignore them. If a client wants to fill in some but not all parts of a Permissions packet, I figured that they could find reasonable default values:
Yes, if the encoding client writes those default values into the Permissions packet, the decoding client needs to respect them, if possible. But, then again, if they were not written into the file, the decoding client would need to make something up anyway. Perhaps I should add that to the specification, particularly the UID/GID set to MAX_INT and usernames set to the empty string. As for the size of the FAT Attributes field, I think it's okay to use 1 more byte to make it hard to screw up. (The per-file overhead is about 130 bytes, so adding 1 more isn't too expensive.) Those 2 bytes are returned by GetFileAttributesA and accepted by SetFileAttributesA. Actually, those functions return/expect 4 bytes, but I thought only 2 bytes were necessary, so I saved 2 bytes. |
I added the "unset values" to the specification. |
Yes, I do. Some users requested a feature to keep the modification time of source files a while ago. The C runtime library is able to restore that timestamp. Because modification time is the only information common to the UNIX and FAT Permissions packets, storing the item may be useful for both kinds of users.
I think that the default value for a timestamp should be 0, because it's not a value that is used in practice. The definition of
From the PAR3 spec, the same packet may be applied to multiple files, directories, or links. But that becomes impossible when timestamps differ, even when the permissions are the same. I think that timestamps normally differ, unless a user sets a specific time manually. Or do most files have the same timestamp on Linux/UNIX? If timestamps differ between files, each file needs its own Permissions Packet. I feel it may be good to split the timestamp from the permissions, for example into two independent packet types: a Permissions Packet and a Timestamp Packet.
From the PAR3 spec, only 12 bits are used in the 16-bit field. Then, is it ok to set 0x8180 (regular file, owner read+write) as the default value?
When I read how to calculate the InputSetID, I understood that the 8-byte value's randomness depends on another random 16-byte hash value. Then, I felt that a CRC-64 might be enough to represent the packet body, instead of using a fingerprint hash.
I'm not sure that you intended this big difference between PAR2 and PAR3. In PAR2, the SetID is specific to the input files and block size, even though the value looks random. If a user makes recovery data for the same files and block size, the resulting recovery data has the same SetID, and the sets are compatible. In PAR3, the InputSetID will differ every time a user creates recovery data for the same files. Even though the recovery data could be compatible, a PAR3 client won't use data with a different InputSetID.

For example, par2cmdline has an option for this. Now, if par3cmdline has a similar feature, it will require a parent PAR3 file instead of specifying input files. It's almost the same as the parent PAR3 file for incremental backup, but it will inherit the parent's InputSetID and Start Packet when the input files are the same. So, it needs to verify the input files before creating additional recovery blocks. If the input files were changed after the first creation, it will fail to create extra blocks in the same InputSetID. At that time, a user may choose to repair to the original files or to process an incremental backup for the changed files.

Anyway, PAR3 developers and users should notice this different behavior from PAR2. Even when a user creates PAR3 files for the same source files with the same settings, simply creating them multiple times will result in incompatible recovery files. (Though the recovery blocks themselves are compatible, they have different InputSetIDs.) It requires a special feature to create extra recovery blocks with the same InputSetID. |
0 is a perfectly valid timestamp. In both UNIX and Windows formats! 2^64-1 is a better value. I should add a note to the specification about times from the future.
Good catch! I've updated the spec to use 0x0180. BTW, setting the UNIX i_mode to 0x0180 and FAT Attributes to all 0s are going to act like default values. That is, the decoding client cannot tell the default value from an "unset" value. I am okay with that right now. In most cases where that is a problem, the permission packets shouldn't be in the file at all. If you can present an expected usage where that is a problem, we can talk about changing it.
First, I agree with the users. If a file exists and has the correct data, I don't know why a Par2 client should modify it. The client should only have to read the file, not write it. But that's a decoding client behavior and has nothing to do with the file specification. I'd expect a Par3 client to behave the same if no Permissions Packets are present.
Yes, that is correct.
The overhead for packet header is 48 bytes. The timestamps are 24 bytes. The other permissions are usually going to be about 28 bytes. So, in the current specification, there is 1 packet for each file and its length is 48+24+28 bytes. You propose to replace that with one packet that holds the permissions at length 48+28 bytes plus one packet per file holding the timestamps of length 48+24 bytes. Since the File packet overhead is already around 140 bytes, with a very large number of files you'd cut the per-file-overhead from 240 bytes to 212 bytes. I don't think it's a big enough win to be worth a different packet.
The Blake3 hash has uniqueness guarantees that the CRC-64 does not have. We need the InputSetID to be unique.
First, it doesn't have to. Client authors can use a technique similar to Par2's calculation of the RecoverySetID. Line 368 of the Markdown version of the spec says: "Another method for generating a globally unique number is, if the input set's data is known ahead of time, is to use a fingerprint hash of the parent's InputSetID, block size, Galois field parameters, and all the files' contents." But Par3 does allow a client to use a globally-unique random number. The reason for the change from Par2 to Par3 was that some client authors reported that Par2's value was too restrictive. In Par2, an encoding client had to hash all the files and file names first, to generate the RecoverySetID. Only then, could it write out the packets to the Par2 file. Moreover, if anything changed, even just a file name, the RecoverySetID changed. For Par3, I wanted to add a little more freedom. The InputSetID could be the same value as with Par2, or it could be a globally-unique random number. And, yes, that design impacts the usage. In Par2, for the same set of input files, every client would create the same packets. But that's not the case with Par3. But, given the other features (like deduplication), that wasn't going to be guaranteed anyway. (In fact, I think I need to rewrite the specification to make sure that there aren't clashes with deduplication.) If a Par3 client has an existing Par3 file, it can create an incremental backup or just create new Recovery Packets (and possibly matrix packets) that reuse the existing InputSetID. Yes, you are correct when you say "it needs to verify input files before creating additional recovery blocks." But, to create Cauchy-matrix recovery blocks, the client needs to read all the files anyway. So that isn't a big deal. But, if we assume two clients create two different Par3 files using the same block size and Galois field, it is possible for a client to identify overlapping input blocks of common files and reuse the recovery information. The client would definitely be more complicated than the Par2 client, but it is possible. And, if a client supports incremental backup, it may not be too different. |
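A hedged sketch of the deterministic option quoted from the spec follows. Which fields are hashed, how they are serialized, and the 8-byte truncation are my assumptions here (using the official BLAKE3 C API); the specification text itself is the authority.

```c
#include <stdint.h>
#include <string.h>
#include "blake3.h"

/* Sketch: fingerprint a few set parameters plus the file contents and keep
 * 8 bytes as the InputSetID.  Field choice and serialization are assumptions. */
static void derive_inputsetid(const uint8_t parent_id[8], uint64_t block_size,
                              const uint8_t *file_data, size_t file_len,
                              uint8_t out_id[8])
{
    blake3_hasher h;
    uint8_t digest[BLAKE3_OUT_LEN];

    blake3_hasher_init(&h);
    blake3_hasher_update(&h, parent_id, 8);
    blake3_hasher_update(&h, &block_size, sizeof block_size);
    blake3_hasher_update(&h, file_data, file_len);   /* repeat per input file */
    blake3_hasher_finalize(&h, digest, BLAKE3_OUT_LEN);
    memcpy(out_id, digest, 8);   /* truncate the fingerprint to 8 bytes */
}
```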
While I was reading the GNU C Library manual to see the differences from the Microsoft C runtime library, I found an interesting point.
Oh, I see. You are right. That was a bad idea. Then, I came up with a perfect solution for the FAT Permissions Packet and a maybe-good solution for the UNIX Permissions Packet. The FAT Permissions Packet has 4 items, and the packet body size is 26 bytes. If the priority order of the items is fixed, it's possible to determine which items exist from the body size alone. The list of body sizes and contents is like below;
Because the UNIX Permissions Packet has more items, it's impossible to select each one individually. So, it splits only the timestamps from the permissions. The list of body sizes and contents is like below;
What do you think of my new idea? A single packet can store various items, and the packet's body size will indicate which items are present. |
By the way, could storing it help with incremental backups here? |
Yes, mtime is the time the file's contents changed and ctime is when the file's metadata changed. The ctime can only be set by (1) mounting the drive without mounting the file system or (2) changing the system time. (I think #2 is called a "time storm".) And we don't want clients doing either of those. The "tar" program stores ctime. If either the mtime or the ctime on a file changes, tar will back up the file and its metadata. I believe @malaire is right --- the ctime might be useful for incremental backups. If the ctime changes, we update the UNIX Permissions packet. (And, obviously, if the mtime changes, we update the file's contents.) The other way to check if permissions changed would be to call stat() and getxattrs() for every file. That will probably be expensive with xattrs. Also, "tar" stores ctime and, when in doubt, I'll imitate "tar". I will add a note to the specification about ctime. @Yutaka-Sawada Your idea of various-sized permissions packets is interesting. But it requires picking an ordering of the fields by importance. That's easy for the FAT Permissions packet. I don't see an obvious ordering for the UNIX Permissions fields. |
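A minimal POSIX sketch of that incremental check (the recorded_* values are assumed to come from the parent Par3 file; the names are illustrative):

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Sketch: if st_mtime moved, the contents changed; if st_ctime moved, the
 * metadata (permissions, owner, xattrs, ...) may have changed and the UNIX
 * Permissions packet should be refreshed. */
static void check_for_changes(const char *path,
                              time_t recorded_mtime, time_t recorded_ctime,
                              bool *contents_changed, bool *metadata_changed)
{
    struct stat st;

    *contents_changed = *metadata_changed = true;  /* assume changed on error */
    if (stat(path, &st) != 0)
        return;
    *contents_changed = (st.st_mtime != recorded_mtime);
    *metadata_changed = (st.st_ctime != recorded_ctime);
}
```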
Latest Markdown version of the specification. |
When I tried to construct chunk information for files, I thought that the 16-byte hash of a chunk might be troublesome. When the chunk size is equal to or larger than the block size, and the whole file's data can be kept in memory, there is no problem in calculating the hashes of chunks. But it may be annoying to read and hash a chunk's data from the first byte to the last byte in order, when the file size is very large. File access order is the problem, because the recovery calculation wants to read the same offset of every input block at once, instead of reading each block sequentially. Technically, it's possible to read the whole file first and then read every block again, but reading the same file data two times isn't fast. Also, I'm not sure of the worth of calculating a hash value which has no use later. Thus, I suggest using a 1-byte flag instead of the 16-byte hash. It indicates the chunk usage. If the flag is 0, the chunk isn't protected (such as with PAR inside). If the flag is 1, the chunk's hash may be stored in |
This sounds like the same argument again. That there is a hash of each block, so we don't need a per-file hash. I believe we need per-file hashes. I've relaxed my position a little to have per-chunk hashes instead of per-file hashes, but am still having second thoughts about it. The very first thing an archiving program should do when encoding is compute a checksum of the whole dataset. And the very last thing it should do when decoding data is verify that checksum. Any extra calculation or extra complication to calculating that checksum is an opportunity for a bug to sneak in and for the client to report "the data is just fine!" when it isn't. That checksum is the most important guarantee to our users. If we ever say "the data is just like it was at the beginning" and it isn't, we've lost the trust of all our users. I know that per-file hashes (or per-chunk hashes) are extra work. And they may require a second pass over the data, or a more complicated client to avoid two passes. But I think it is important and worth it. As for the details, I'm not sure what you mean by "File access order is the problem, because it wants to read every input block's same offset at once, instead of reading each block incrementally." I'm not sure what "it" is. Is it the per-chunk hash algorithm? Is it the per-input-block hash algorithm? Is it the calculation of the recovery data? Because I'm pretty sure all of those can be coded to pass over the file incrementally from start to finish. I don't know of any algorithm that has to process the first byte (or Galois field) of every input block at once, before going to the second byte (or Galois field) of every input block. I don't know how every Blake3 library is coded, but I can believe that some require the data to arrive sequentially. That is, the file has to be fed in order. That may be a problem for clients that want to go very fast and have multiple CPUs available to calculate the hash. In that case, the encoding client might want to break the input file into N equal-sized chunks, so that each of the N CPUs can calculate the hash for one chunk of the file. |
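For what it's worth, here is a sketch of a single sequential pass that yields one BLAKE3 hash per chunk, using the official BLAKE3 C API (the chunk size and buffer size are illustrative, not values from the spec):

```c
#include <stdint.h>
#include <stdio.h>
#include "blake3.h"

/* Sketch: read the file once, front to back, finalizing a hash at every
 * chunk boundary.  No second pass over the data is needed. */
static void hash_chunks(FILE *fp, uint64_t chunk_size,
                        uint8_t (*chunk_hashes)[BLAKE3_OUT_LEN])
{
    uint8_t buf[65536];
    uint64_t in_chunk = 0;
    size_t chunk_index = 0, got;
    blake3_hasher h;

    blake3_hasher_init(&h);
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        size_t off = 0;
        while (off < got) {
            size_t take = got - off;
            if (take > chunk_size - in_chunk)
                take = (size_t)(chunk_size - in_chunk);
            blake3_hasher_update(&h, buf + off, take);
            off += take;
            in_chunk += take;
            if (in_chunk == chunk_size) {          /* chunk boundary */
                blake3_hasher_finalize(&h, chunk_hashes[chunk_index++],
                                       BLAKE3_OUT_LEN);
                blake3_hasher_init(&h);
                in_chunk = 0;
            }
        }
    }
    if (in_chunk > 0)                              /* final partial chunk */
        blake3_hasher_finalize(&h, chunk_hashes[chunk_index], BLAKE3_OUT_LEN);
}
```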
I understand what you say. Verifying a whole file with a checksum is important. But how to calculate that checksum at creation time is the problem. I felt that calculating BLAKE3 of chunks was worthless while I was already calculating BLAKE3 of each input block in every chunk. How about using the common CRC-32 as each file's checksum? Most archivers and checksum tools support CRC-32. (On the other hand, most tools don't support BLAKE3 yet.) Because CRC-32 is a different algorithm from the checksum of input blocks, the file's CRC-32 will give additional trust. (However, my aim is that CRCs can be joined at calculation time.) The client would check the CRC-64 and BLAKE3 of input blocks, and the CRC-32 of the input file as the final confirmation. These are worth checking, instead of checking BLAKE3 hashes of the same file data two times. Also, users will be able to see raw CRC-32 values in the recovery set, and other tools can confirm that the file reading was correct or that it was verified correctly. One weak point of PAR3 is that the hash algorithms it uses are not widely supported. By using the popular CRC-32 as the file's checksum, PAR3's result becomes comparable with other tools. Though MD5 was the common checksum in PAR2, we use BLAKE3 instead of MD5 in PAR3. While we know BLAKE3 is more reliable than PAR2's MD5 or CRC-32, it's difficult to check with 3rd-party tools. So, storing CRC-32 in PAR3 may be good for compatibility.
If PAR3 uses a 1-byte flag in Chunk Descriptions and uses CRC-32 as the per-file hash, we don't need per-chunk hashes. In this case, the file's checksum simply skips the bytes of non-protected chunks. Storing 16 bytes of zeros as a flag for a non-protected chunk is strange. From the nature of CRC, CRC-32 is suitable for per-file hashes. (It's easy to calculate on very large files at creation.) |
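On the "easy to concatenate" point: zlib already exposes this as crc32_combine(), which folds the CRC-32s of two consecutive byte ranges into the CRC-32 of their concatenation. A small sketch (the data here is just an example):

```c
#include <stdio.h>
#include <zlib.h>

int main(void)
{
    const unsigned char part1[] = "hello ";
    const unsigned char part2[] = "world";

    /* CRC of each piece, computed independently (e.g. per chunk or thread) */
    uLong crc1 = crc32(0L, part1, sizeof part1 - 1);
    uLong crc2 = crc32(0L, part2, sizeof part2 - 1);

    /* Fold the two piece CRCs into the CRC of the concatenated data */
    uLong joined = crc32_combine(crc1, crc2, (long)(sizeof part2 - 1));
    uLong whole  = crc32(0L, (const unsigned char *)"hello world", 11);

    printf("%s\n", joined == whole ? "CRCs match" : "mismatch");
    return 0;
}
```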
Any CRC is unsuitable as a file checksum, as they are too short and have too many collisions (and are also easy to forge, but I'm not sure if that is an issue here). Personally, I think that every file must have a single cryptographic hash (at least 128 bits, but I'd prefer 256) which covers the whole file, so that it is trivial to check that the file was created correctly. |
I suggested CRC-32 for catching bugs or failures in the implementation. It doesn't need to be strong against forgery or collisions. BLAKE3 is already there for checking errors or damage in the file data.
If it's a single hash to check the integrity of files, I think so, too. Because PAR3 already has a BLAKE3 hash for every input block in a file, this case won't require cryptographic strength to see the correctness of the read file data or the repaired result. But a client may fail to arrange the blocks correctly or miss something somewhere; that is why a checksum over the whole set of input blocks is calculated as the chunk's hash. In this limited usage, I thought that CRC-32 was suitable, because it's easy to concatenate.
There is a reason why I wanted to shorten Chunk Descriptions. From the PAR3 spec, a File Packet may contain multiple Chunk Descriptions, and all PAR3 clients must handle them. While I was thinking about how to implement deduplication, I found that in rare cases it's difficult (or impossible) to load that many Chunk Descriptions into memory. For example, take a 1 MB file with random data. Setting the block size to 1 byte will produce 256 distinct input blocks and 1,048,320 duplicate blocks, so there may be up to 1,048,576 Chunk Descriptions in the File Packet. Because one Chunk Description is 32 bytes in total now, 1,048,576 Chunk Descriptions consume around 32 MB. Even if you think 32 MB is small, a 1 GB file would make a File Packet of 32 GB. I predict that most PAR3 clients won't be able to decode such a large packet. A small block size may result in many duplicated blocks. That is why I suggested earlier setting the minimum block size to 8 bytes or 40 bytes. |
We use the Blake3 hash for uniqueness. We use CRCs for rolling hashes. We want the data, after decoding, to match what it was at the beginning. Uniquely. That is the entire purpose of having 1 hash for all of the data. Thus, we use a tree of Blake3 hashes for it. @malaire I agree that I would like every file to have a cryptographic hash. But Par3 doesn't protect all the data, in order to allow Par-in-Par. So, hashing the entire file was problematic. I settled on cryptographic hashes for each chunk. I feel that in most cases, each file will be one chunk. |
I came up with what may be the perfect solution for a problem: the lack of a per-file checksum. I explain the problem and our opinions here again.

Design of PAR3

PAR3 treats an input file as an array of input blocks. Each input block is protected by a BLAKE3 hash and a CRC-64. For deduplication, incremental backup, or PAR inside, an input file may have one or more chunks, which consist of multiple input blocks or unknown data. Though each chunk is protected by a BLAKE3 hash, an input file has no checksum for the entire file data.
My (Yutaka Sawada's) opinion

I like simple construction and speed. Because I believe that the tree of BLAKE3 hashes can protect the entire file data indirectly, I don't complain about the lack of a file checksum. Even when there is no checksum of the file itself, the internal blocks' BLAKE3 hashes would be enough to establish completeness. I therefore oppose the checksum of the chunk: I want to remove the BLAKE3 hash of the chunk as a troublesome factor. While the blocks' BLAKE3 hashes can prove the integrity of the file, an additional BLAKE3 hash per chunk should not be required.

Markus Laire's opinion

He wants every file to have a single cryptographic hash which covers the whole file. Because it's impossible to check PAR3's array of BLAKE3 hashes manually, a single hash is preferable for compatibility with other checksum tools. Some users requested a while ago that PAR3 should include multiple kinds of hashes to check the integrity of source files. I understand their hope. There are some meticulous or paranoid users who calculate hashes of their files. Parchive is the last resort for them when their files are broken.

Michael Nahas's opinion

While PAR3 protects file data by a tree structure of BLAKE3 hashes, PAR3 clients may happen to fail in the calculation. So it checks the BLAKE3 hash of each chunk. He refused to remove the checksum of the chunk, in opposition to my suggestion. Though he agreed with the requirement of a per-file cryptographic hash, he could not put a checksum of the whole file in the File Packet. For the "PAR inside" feature, a chunk may not have a checksum, since its data is unknown. When a chunk isn't protected, the whole file cannot be protected either. Because of that rare case, PAR3 doesn't contain a checksum of the file.

Idea of Checksum Packet

Now, I suggest a new packet type: the Checksum Packet. A Checksum Packet contains multiple hashes of an input file. It's linked to an input file as an optional packet, the same as the UNIX Permissions Packet. When a file has non-protected chunks, it cannot have a Checksum Packet, because the file data is unknown. The Checksum Packet behaves as an extra packet for additional insurance. If a user wants to protect his files strongly, he can add his preferred cryptographic hashes in this packet. PAR3 clients will be able to support many hash functions. Markus Laire would be satisfied with this solution.
For example, an MD5 hash is stored as

Change of Chunk Description

As mentioned above, this solves the fault of PAR3 that a whole file cannot be checked with common hash functions in a compatible way. When a Checksum Packet protects the file data with cryptographic hash functions, the checksum of a chunk doesn't need to be such a strong hash. So, I suggest using CRC-32 for the checksum of chunks instead of the current BLAKE3 hash. Though I wanted to remove the checksum of chunks entirely, I compromise with Michael Nahas's concern about internal calculation failures. CRC-32 can be the minimum error checker for both chunks and the entire file. When a user wants speed and doesn't make a Checksum Packet, CRC-32 is shown as the checksum of the whole file. (A PAR3 client can concatenate the CRC-32s of chunks to show the CRC-32 of the entire file data.) As a checksum of a file, 4 bytes is very small, but it's better than nothing. (Currently, PAR3 doesn't have a checksum of the file at all.) Because a 32-bit value is too small to indicate a non-protected chunk, it may require a 1-byte flag. Even with a 1-byte flag and a 4-byte CRC-32 for each chunk, the total of 5 bytes is less than the BLAKE3 hash's 16 bytes. A small chunk description is good for a small packet size. |
Par3 is about safety not speed. Saying "I can make this much faster by giving up safety" is going against the total point of Par3. Saying "safety is optional" is going against Par3. I care about speed, but it is a far secondary consideration far below safety. We should be discussing how to put per-file hash back in. Can we get rid of the per-chunk checksums and say that unprotected chunks are assumed to be 0-bytes when calculating the per-file hash? |
Yes, we can, though "Par Inside Another File" feature needs tweak. When a user cannot check "per-file hash" with another tool, the hash value is impossible to confirm, as same as "pre-block hash" or "per-chunk hash". If a PAR3 client shows "per-file hash" to a user, it should have a feature to reconstruct original "outside file". For example, there is a ZIP file and its hash value. A user can check the integrity of ZIP file by re-calculating the hash. When a PAR3 client creates "PAR inside ZIP", the hash value becomes different. When a PAR3 client removes "inside PAR" from the ZIP file, the hash value returns to be same. So, PAR3 clients will have two repair mode for "PAR inside ZIP"; reconstruct original ZIP file (by removing PAR3 packets), or repair modified ZIP file (by keeping PAR3 packets). To perform such "Remove PAR from Outside File" feature, "PAR inside" feature has some restrictions. At first, sum of protected chunks must become the original outside file. When a PAR3 client puts PAR3 packets or extra data between or after the original file, such additional bytes must belong to unprotected chunk. So that, another PAR3 client will be able to join the protected chunks to construct an original file, when a user wants to check the file with "per-file hash" manually. |
While I was making packets, I found a problem in the order of packets. "Order of Packets" in the PAR3 spec is written as follows;
To do single-pass recovery, the File, Directory, and Root packets must exist before Data packets. Also, External Data packets should exist before Data packets. Data packets contain the raw data of input blocks. A PAR3 client needs to know which input file an input block belongs to at recovery time, and which input files are damaged, before putting the input block back. So, a PAR3 client wants to read the file and directory tree information first. By the way, the order of the File, Directory, and Root packets seems to be decided at creation time. Because a Directory Packet includes checksums of its child files and directories, a PAR3 client needs to make the packets of the children first. So, the creation order of packets must be: File packets -> Directory packets -> Root packet. Normally, the creation order and the writing order are the same. Then, the recommended order of packets would be:
I don't expect another program, like "b3sum", to be able to confirm the hashes for files with Par-inside. In most cases, we're modifying those files. It's a rare enough case that I'm not going to worry about it. I'll try to spend some time working on putting the per-file hash back into the specification. |
I found a limitation of mine regarding the PAR3 spec's "Link Packet". It seems that the Microsoft C runtime library cannot distinguish a real file from a hard or symbolic link on Windows OS. I don't know how they are searched or resolved. Because adding/removing links seems to require administrator privilege on Windows OS, it's difficult to use from normal applications like PAR3 clients. So, I won't support the Link Packet in my PAR3 client for Windows OS. Implementing the Link Packet for Linux/UNIX OS will then be another programmer's task. |
This will be possible only when very few blocks are damaged. Otherwise, it depends on randomness. Also, there is the problem of the memory requirement for the generator matrix. The matrix size seems to be a well-known problem of LDPC. There are many papers which try to solve the problem. To decrease the memory size of the matrix, a structured matrix is used instead of a randomly created matrix. One of these is called Quasi-Cyclic LDPC. Though I tried to read some papers, I could not understand the construction. There seems to be a fast encoding method, too. Construction of quasi-cyclic LDPC codes from quadratic congruences
Though I referred to the original Sian-Jheng Lin's sample code and Markus Laire's Rust implementation, it was too difficult for me to implement the Low Rate Encoder/Decoder. It's not so easy, hehe. My mathematical knowledge is at junior high school level. But, I found that the leopard-RS library worked with over 100% redundancy. While it's a High Rate Encoder/Decoder currently, it works by zero padding internally. There seem to be speed and memory usage problems. If someone implements a Low Rate Encoder/Decoder in the leopard-RS library, it will solve the problem. By the way, I found a paper about this problem. There may be another solution in the future. |
That sample code has two bugs in the encodeL path. To change that code to use encodeL:
I also had to rename ps. In the other thread I said that generally you can't use the same decoder for encodeH and encodeL since they are not compatible. That sample code does use the same decoder for both, and in this specific example it works, but not generally. |
I have a question about the UNIX Permissions Packet. Should I repeat this optional packet in each PAR3 file, the same as File Packets? At this time, I categorize packets as below. One time in each PAR3 file: Multiple times in each PAR3 file, if there are multiple Data Packets or Recovery Data Packets: One time in the whole set of PAR3 files: |
I implemented mtime and i_mode (permissions) in the UNIX Permissions Packet. Other fields are not supported (or are incompatible) on Windows OS. The feature to store/recover mtime (modification time) will be useful, as some ZIP archivers restore the modification time. By the way, changing the READ or WRITE permissions of a file may cause verification and/or repair to fail. If you don't have read permission, you may not be able to verify. If you don't have write permission, you may not be able to repair. Though I tested the behavior on Windows OS, I don't know what happens on Linux OS. On Windows OS, timestamps and permissions of folders (directories) may not work. Even when the original timestamp (modification time) of a directory is restored, it will change again after repairing child files. It also seems that permissions of folders are ignored. So, I added the optional packet for files only. |
Directory timestamp needs to be restored after its child items (files & directories) have been handled. |
Oh, I see. Thank you for the idea. Though I implemented the function, it doesn't work on Windows OS. I found that the Microsoft C runtime library does not support changing the properties of a directory. I cannot change either the timestamp or the permissions of a directory on Windows OS. It seems that the Win32 API can modify the timestamp, but I cannot do it in standard C. I don't know whether it works on Linux/UNIX. |
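For reference, a hedged Win32 sketch of how a directory's modification time can be restored outside the C runtime (CreateFileW with FILE_FLAG_BACKUP_SEMANTICS plus SetFileTime; error handling is reduced to a single result):

```c
#ifdef _WIN32
#include <windows.h>

/* Sketch: open the directory handle with FILE_FLAG_BACKUP_SEMANTICS, which
 * is required for directories, then restore only the last-write time. */
static BOOL set_directory_mtime(const wchar_t *dir_path, const FILETIME *mtime)
{
    HANDLE h = CreateFileW(dir_path, FILE_WRITE_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    BOOL ok = SetFileTime(h, NULL, NULL, mtime);  /* keep creation/access times */
    CloseHandle(h);
    return ok;
}
#endif
```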
I found a problem with the File System Specific packets (options in the File Packet). When a File Packet includes an option (such as a UNIX Permissions Packet), the checksum of the File Packet becomes different. Because the Root Packet includes checksums of File Packets, the Root Packet becomes different, too. Then, the recovery data happens to be incompatible, even though the file data itself is the same. This isn't so serious when a user creates PAR3 files with the same settings. It may cause a compatibility issue when another user tries to create additional PAR3 files. Because it's difficult to retrieve all the settings of the original PAR3 files, par3cmdline would do well to have a special feature to create additional PAR3 files, for example: |
A while ago, I posted an idea of cohorts to support many blocks. The technique seems to be known in general as interleaving. Mostly, interleaving is used to correct burst errors (in our case, the loss of a whole file) with a small code word. Because the advantage is proven, it would be good to try. As one kind of interleaving, it maps blocks to cohorts using a very simple modulo-based interleaver. For example, suppose there are 10 blocks (index = 0 1 2 3 4 5 6 7 8 9). The relation between blocks and cohorts is easy to calculate. While this method is simple, printing the verification result may become complex. I will implement and test it (see the sketch after this comment). |
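Here is a tiny sketch of such a modulo interleaver (the cohort count of 3 and the exact mapping direction are illustrative assumptions, not necessarily what par3cmdline does):

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: block i goes into cohort (i % cohort_count) at local position
 * (i / cohort_count).  With 10 blocks and 3 cohorts this gives
 * cohort 0 = {0,3,6,9}, cohort 1 = {1,4,7}, cohort 2 = {2,5,8}. */
int main(void)
{
    const uint64_t block_count = 10, cohort_count = 3;

    for (uint64_t i = 0; i < block_count; i++) {
        uint64_t cohort = i % cohort_count;
        uint64_t local  = i / cohort_count;
        printf("block %2llu -> cohort %llu, position %llu\n",
               (unsigned long long)i, (unsigned long long)cohort,
               (unsigned long long)local);
    }
    return 0;
}
```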
I implemented cohorts (interleaving blocks) to support many blocks for FFT-based Reed-Solomon codes. It seems to work well for the loss (burst error) of input files. Even though the Reed-Solomon codes are calculated over a 16-bit finite field, par3cmdline currently supports a maximum of 2^47 blocks (2^32 cohorts * 2^15 blocks = 2^47 total blocks). That is more blocks than 32-bit Reed-Solomon codes offer, and the speed is faster. It will be possible to support even more blocks in the future, when PCs contain huge amounts of RAM. But the recovery capability is lower than a single whole code. Recovery blocks in a cohort can recover input blocks in that cohort only. If many blocks in one cohort are damaged, that cohort may not be restorable. Even when a total repair is impossible, other cohorts may be recoverable independently. It works somewhat like locally repairable codes. I wrote up how to interleave blocks and the construction of |
I implemented "extend" command in par3cmdline. It reads a PAR3 set and creates compatible recovery blocks. A user may add more recovery files or re-create a damaged PAR3 file newly. Internally, the mechanism is combination of "verify" and "create". It verifies input files with the PAR3 set, and creates recovery blocks for them in same settings. So, a user cannot change arrangement of input files nor alignment of input blocks. Because a user is difficult to set unknown options by another user, this "extend" command will be useful. QuickPar has this feature as "Extra" button. In par2cmdline, same input files and block size will make same recovery blocks. There was no compatibility issue in the age of PAR2. On the other hand, PAR3 has more factors and may be incompatible by creating PAR3 clients. So, I made this function to follow settings of original PAR3 set. |
I consider "PAR inside ZIP" feature (putting recovery data inside a ZIP). From Parity_Volume_Set_Specification_v3.0:
The simple construction of a ZIP file is like below; The "list of contents" must exist at the end of the ZIP file. So, par3cmdline may put PAR3 packets between "file data" entries. The protected ZIP construction will be like below; Now, there is a restriction on the positions where PAR3 packets can be put. They cannot split real "file data" in a ZIP file. When a ZIP file contains only 1 file, there are only 2 spaces to put PAR3 packets. The construction of a single file in a ZIP is like below; The protected ZIP construction will be like below; For example, there is a ZIP file of 200 MB. Setting 3% redundancy creates 6 MB of recovery data. When the ZIP file contains only 1 file, it puts 3 MB of recovery data in front of and behind the "file data". When the ZIP file contains multiple files, it may put 2 MB of recovery data at the front, middle, and back of the "file data". Then, I'm not sure how many spaces are good. Putting data in 2 spaces (front and back) is simple, but it may be easy to lose all the PAR3 packets: deleting the top of the file and the end of the file can strip off the protection with ease. Putting data in 3 spaces (front, middle, and back) is more complex and harder to remove. But is it worth trying? As I'm lazy, the simple construction (2 spaces) would be enough. |
I tested a method to insert 2 spaces in a ZIP file, by inserting an unprotected chunk before
But, I found a little problem. As it updates offsets in the ZIP file, the hash value becomes different. Construction of the original zip file; Construction of the spaced zip file; Construction of the protected zip file; Also, it needs to make a temporary file whose size is larger than the original ZIP file. Though a small ZIP file may be kept in RAM, a large ZIP file will require more file access. Inserting bytes into the middle of a file is a heavy task. I feel that a "PAR2 style recovery record" may be good. In that method, the recovery data is just appended after the ZIP file. It's simple, fast, and easy to implement. But it's weaker against sequential damage (burst errors), because all the recovery data is put at the end of the file. Construction of a protected zip file with a PAR2 recovery record; |
While I tried to implement "PAR inside ZIP" feature, I found a fault in current PAR3 specifications. In File Packet, there are 2 hash values; Because protected outside file consists in unprotected chunks and protected chunks, these hash values are useless to detect damage of the file anyway. It's worthless to store such useless values in the packet. These values will be; When the outside file is a ZIP file, File Packet should contain hash values of original ZIP file, instead of protected ZIP file. To insert PAR3 packets inside a ZIP file, it may modify the original data in the ZIP file. To remove PAR3 packets from a protected ZIP file, it should know how was the file before. By checking hash value of original file, it can confirm that the removing was done correctly. par3cmdline will have two features to "insert PAR3 packets in ZIP file" and "remove PAR3 packets from protected ZIP file". Then, even if PAR3 packets inside ZIP are damaged, it's possible to restore them anytime. |
Though I posted the idea of "storing a checksum of the original file for PAR inside" a while ago, I take it back. It's hard to determine the original form of a file; it's specific to the file format and the PAR3 client. So, it's impossible to define a common usage of that checksum. I implemented the "PAR inside ZIP" feature. The File Packet contains two checksums of an outside file: the CRC-64 of the first 16 KB, and the BLAKE3 hash of the protected data. If the first 16 KB includes any unprotected chunks, the CRC-64 value becomes useless (set to zero). All unprotected chunks are ignored when calculating the BLAKE3 hash. So, a PAR3 client's verification process must consider the existence of unprotected chunks. I tested two file formats, ZIP (.zip) and 7-Zip (.7z). The manner is the same as that of the PAR2 recovery record. It doesn't change any bytes of the original file; it just adds a recovery record at the end of the original file. How to add a PAR3 recovery record to a ZIP file (.zip):
How to add a PAR3 recovery record to a 7-Zip file (.7z):
Because it doesn't modify the original file, it's easy to append or remove the PAR3 packets. I made commands for both. But, there is one problem: verification cannot state whether a protected ZIP file is complete or not. Because it cannot determine the status of unprotected chunks, it checks protected chunks only. Also, it's difficult to repair the protected ZIP file in its protected state. Though it's possible to repair a damaged protected ZIP file, doing so will erase the appended PAR3 packets in the unprotected chunks. As the original data of the unprotected chunks is unknown, it's difficult to copy the PAR3 packets back into the repaired protected ZIP file. While it can protect a ZIP file (the outside file), it cannot protect itself (the PAR3 packets inside the ZIP file). This would be a fault of the "PAR inside" feature. |
There is a defect in the current PAR3 specification: it cannot determine whether a PAR3 file is complete or not. While a PAR3 file consists of many packets, there is no checksum of the entire file. Even though every packet has its own checksum to detect damage, there is no way to find the loss/insertion/replacement/change of whole packets. It's impossible to reconstruct the original formation of a PAR3 file. PAR3 inherits this problem from PAR2, which uses the same packet construction. Thus, PAR2 clients cannot repair PAR2 files themselves, even when they can repair input files. While PAR3 introduces the "PAR inside PAR" feature to protect another (outside) PAR3 file, it cannot protect itself. "PAR inside" allocates unprotected chunks to hold PAR3 packets. As I wrote in the last post, the state of PAR3 packets in the unprotected chunks is unknown. We should solve this problem in PAR3.

I came up with a new idea, which keeps information about each PAR3 file. An input file consists of many input blocks, and PAR3 files store information about all input blocks in the External Data Packet. I think the same manner is available for the PAR3 file itself. Because a PAR3 file consists of many PAR3 packets, keeping the order of packets is enough to reconstruct them in the future. I named this the Order Packet. It stores a PAR3 filename and the checksums of all PAR3 packets in that PAR3 file. If some PAR3 packets are lost or damaged, a PAR3 client will be able to copy the packet from another position. It can arrange packets to reconstruct a PAR3 file in the order specified in the Order Packet. Though it cannot recover lost or damaged packets, it just copies complete packets. Because most PAR3 packets are duplicated many times in a set of PAR3 files, it's possible to find the same packet in another PAR3 file. I write the current details below. If someone finds a problem or has a better idea, please advise me.

Order Packet

This packet specifies the filename and the order of PAR3 packets in a PAR3 file. The Order packet has a type value of "PAR ORD\0" (ASCII). The packet's body contains the following: Table: Order Packet Body Contents
The first and second fields are used to store the filename of each PAR3 file. When there is only one PAR3 file, it may not store the filename. It's possible to correct a misnamed PAR3 file by reading the filename in the Order Packet. When a PAR3 client doesn't store the PAR3 filename in the Order Packet,
The number of included checksums can be calculated from this packet's length. Table: Identifier for Order Packet
Because it cannot store its own packet checksum, it uses an identifier. Normally, a PAR3 file should include only its own Order Packet. Example case of four PAR3 files; But, it's possible to put the Order Packets of other PAR3 files in, as a rare case.

Unimportant packets, which may not be duplicated in a set of PAR3 files.

When a PAR3 file is damaged (some PAR3 packets are missing or damaged), Table: Identifier for Creator Packet
Because the Creator Packet doesn't need to be protected, it uses an identifier. Table: Identifier for Comment Packet
Because the Comment Packet doesn't need to be protected, it uses an identifier.

Unique packets, which aren't duplicated in a set of PAR3 files.

Normally, PAR3 files don't contain identical Data Packets or Recovery Data Packets. Table: Identifier for Data Packet
Table: Identifier for Recovery Data Packet
Outside data, which doesn't belong to any PAR3 packet.

When a PAR3 file includes some data outside its PAR3 packets, it is called "outside data". The Order Packet stores the size of that area, where PAR3 packets may not exist. Table: Identifier for outside data
Sorry, I've been inactive for a while. I didn't fall off the boat, if that's what you were wondering! Yutaka Sawada, you say that "There is a defect in current PAR3 specification": that PAR3 files cannot repair themselves and there isn't a checksum for the entire file. I'm willing to discuss the details of implementing this, but first I want to know the use case. Where do you expect users to need this? Right now, my use case is that someone either (1) stores data on a faulty disk or (2) transmits data over a faulty network. When data is lost, they use the PAR3 files to repair the data. I'm not sure why someone wants to repair the PAR3 file. If a user wants to store the data again or transmit the data again, any damaged PAR3 data can be deleted and new PAR3 recovery data can be generated. Can you say why a user needs the exact same PAR3 as the original? |
I'm not clear what you mean by you implemented "cohorts". Are certain packets written out of the usual order? Is something done to the matrix? What is the goal of cohorts? Interleaving is used to distribute errors evenly among the input blocks. Randomizing the placement of errors is useful when the recovery matrix has a harder time recovering errors near each other. The only recovery mechanisms we've talked about are either random or Cauchy. Neither of those is affected by errors being nearer to each other. |
Regarding Par-in-ZIP, I find it is amazing you're able to do that! It was certainly not something I expected to do with the initial release of Par3. I figured the initial versions of Par-in-ZIP would work the way you implemented: appending to the end of the file and duplicating the "list of contents" section. I'm sorry that I wasn't active and able to explain it to you at the time. |
Regarding file/directory permissions, yes, the checksum in the Root Packet was designed to change if the file permissions change. I considered it best to treat a directory snapshot with permissions as completely different from one without permissions. It's more of a "superset" relationship, but I decided it wasn't worth modifying the specification to capture that aspect of the relationship. I don't think it will come up much. |
I don't know when I'll get time to look at your code, but I'll try to make some time soon. |
Welcome back, Michael Nahas.
Users want to know which PAR files are damaged. Only when a PAR file is damaged does the user delete the damaged one and re-create it. Below is the user's real complaint.
Because the original post disappeared at the end of the old web forum, I put the MIME HTML and a web capture (in case you cannot open the old html file) on my web space. They are "Error handling does not detect damaged par files.mht" and "Error handling does not detect damaged par files.jpeg" in the "MultiPar_sample" folder on OneDrive.
Because I'm not good with English, it may be hard for me to say what I think, but I try as well as I can. You seem to have missed the original starting point: "How do we find a damaged PAR3 file?". While a user doesn't know whether a PAR3 file is damaged or not, he won't delete it nor re-create it. That's why the user posted the complaint referenced above, "Error handling does not detect damaged par files". On the web forum, another user suggested making a checksum file for the created PAR files, so that damage to the PAR files can be detected with the checksum file. Because he seems to be a little paranoid, he doesn't rely on a single solution but uses multiple checkers.
But most users will use PAR3 without another checksum tool. In your example use case, you would not store nor send a checksum file of the PAR files. Are you saying "Detect a PAR3 file's damage by using the attached checksum file"? I think that it's useful to offer a way to detect damage to a PAR file by using the PAR file itself.
It's not so important to restore the original formation. But it's important to know the difference. If a PAR file is different from the original data, the user can know something is wrong in the PAR file. It's up to him whether to recreate it or keep it. At least, he needs a chance to act based on the status of the PAR files.

After I read the post, I modified my PAR2 client to return the status of PAR2 files. While it's impossible to determine that a PAR2 file is the "complete original formation", it can find damaged PAR2 packets or unknown bytes between packets. Each PAR2 file is labeled as "good (all PAR2 packets are complete)", "damaged (there are damaged packets or unknown bytes)", or "useless (it doesn't include any packets)". Though it's simple and may rarely miss damage, it seems to be enough for most usage. The rare case is an interrupted transmission (or save) after some PAR2 packets. Because a PAR2 file consists of many packets, it can still look like a good PAR2 file with a few complete packets. For example, there is a PAR2 file of 100 MB. If only the first 1 MB of the file was sent or saved, it can still show "good" status with complete packets. This may happen when someone sends (saves) PAR2 packets one by one. Normally, it won't happen, as multiple PAR2 packets would be sent (saved) at once. If the transmission (save) is stopped in the middle of a packet's data, the packet is truncated, and it will be detected as damaged.

Now, it's more difficult to determine whether a PAR3 file is damaged or not than it was for a PAR2 file. PAR3 allows padding between packets. The "PAR inside" feature inserts unknown bytes (outside file data) between packets. It will be hard to distinguish the "remains of a damaged packet" from "intentionally inserted bytes". Though it's possible to label a PAR3 file's status at the current time, it's not so accurate. |
The "cohort" implementation is based on John M Długosz 's idea. It distributes input blocks into some groups. It creates recovery blocks in each group independently. He named the group of input blocks and recovery blocks as "cohort". Each cohort contains a set of input blocks and recovery blocks. A recovery block in a cohort can recover an input block in the cohort only. The idea of "cohort" can be adapted to all Error Correction Codes, because the behavior is just a regular interleaver.
At this time, I implemented "cohort" system (interleaving) only for FFT based Reed-Solomon Codes, because Cauchy Reed-Solomon Codes is too slow to treat many blocks. I added a new packet type "FFT Matrix Packet" to support FFT based Reed-Solomon Codes. I wrote details on "Appendix.txt" in par3cmdline's directory.
For convenience, I put one stripe's Recovery Data Packets together. A PAR3 file's volume number represents the number of recovery blocks per cohort instead of the number of total recovery blocks. For example, there are 200,000 input blocks. Someone creates PAR3 files with 10% redundancy (20,000 recovery blocks). Because 16-bit Reed-Solomon codes cannot treat so many blocks, my PAR3 client distributes them into 4 cohorts. Then, there are 50,000 input blocks per cohort. Thus, 16-bit Reed-Solomon codes can create 5,000 recovery blocks from the 50,000 input blocks of each cohort.
This style gives smaller volume numbers with fewer PAR3 files. Even when there are many recovery blocks, the number of PAR3 files won't grow so much. This numbering indicates the real recovery capability: while there are 20,000 recovery blocks, it can recover only 5,000 lost input blocks per cohort.
As in the example above, "cohorts" support an unlimited number of input blocks, beyond what 16-bit Reed-Solomon codes can handle alone. Because 32-bit Reed-Solomon codes are too slow, I wanted a faster method to support many blocks. If there were another way (faster recovery codes such as LDPC) to handle many blocks, I would not use cohorts for that error correction code. At this time, I didn't find a reliably good implementation of LDPC. Using cohorts with FFT-based Reed-Solomon codes seems to be the best method for many blocks currently.
The English Wikipedia page for interleaving is hard for me to understand. But it seems to say that "burst errors" may happen more often than "random errors". This would be true in PAR3 usage. When a file is missing, sequential input blocks in that file are lost. That is the same as a burst error. Interleaving (as cohorts in PAR3) is good at treating such burst errors. Though it has a weak point (it cannot recover too many losses within one cohort), the risk may be small with very many blocks. This is why I didn't adapt cohorts to Cauchy Reed-Solomon codes for a few blocks. If 32-bit or 64-bit Reed-Solomon codes were faster than 16-bit codes, I would not use interleaving. To support many blocks like 2^32 or 2^64, we will need to use the cohort system. |
Hey. Sorry for the absence. I created a linux/ directory. It has the code from the windows/ directory, with some changes. The big change is to use the portable %" PRIu64 " to print 64-bit numbers, instead of %I64u (a small example follows after this comment). I wrote Automake files (Makefile.am and configure.ac) for it. It also has #ifdefs to include Linux header files and to exclude any functions that call Windows system calls. The code compiles in Linux, but doesn't create a binary because those functions are missing. The code should compile on Windows. I was able to get it to compile with MinGW, which allows Windows source code to compile on Linux, and run it in WINE, the Windows emulator. I was able to create a par3 file and verify it! Yes! WINE doesn't support AVX instructions, so I had to modify linux/configure.ac to disable AVX in Blake3. I don't know if I'll run into a similar problem with the Leopard library calling AVX instructions. The functions that need to be implemented natively for Linux are:
Some things volunteers can do:
I'll see what I can do. I also need to learn about the Leopard library, reply to Yutaka's messages, and update the specification. |
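Here is the tiny PRIu64 example referenced above, showing the portable format macro from <inttypes.h> instead of the Microsoft-only %I64u:

```c
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t block_count = 1234567890123ULL;

    /* PRIu64 expands to the right conversion specifier on every platform */
    printf("block count = %" PRIu64 "\n", block_count);
    return 0;
}
```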
@Yutaka-Sawada Almost every C file starts with "_CRT_SECURE_NO_WARNINGS". I googled it and those warnings are associated with buffer overflows in scanf. Do we want to suppress those warnings? Especially in every file? |
The definition is there to disable a warning from Microsoft Visual Studio. So you may move the line into a common header file instead.
Microsoft C suggests using the security-enhanced versions of those functions.
Okay, but as one person put it: "using _CRT_SECURE_NO_WARNINGS is the equivalent of turning off the alarm for both "you left your bedroom door unlocked" and "you left a nuclear bomb in the kitchen"." Can we limit usage of the #define? Or can we call fopen_s and write our own fopen_s for Linux? I did that with _filelengthi64(). |
The
There are pages about Security Features in the CRT and security-enhanced versions of CRT functions. There are many warned-about functions. There is no need to make Linux versions of all those secure functions. At this time, I use |
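Regarding the fopen_s question above, a hedged sketch of one option: provide a small wrapper with the Microsoft signature on non-Windows builds, so the same fopen_s() calls compile everywhere (an illustration, not code from par3cmdline):

```c
#ifndef _WIN32
#include <errno.h>
#include <stdio.h>

typedef int errno_t;

/* Sketch: mimic the Microsoft fopen_s signature on POSIX systems. */
static errno_t fopen_s(FILE **stream, const char *filename, const char *mode)
{
    if (stream == NULL || filename == NULL || mode == NULL)
        return EINVAL;
    *stream = fopen(filename, mode);
    return (*stream != NULL) ? 0 : errno;
}
#endif
```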
I created a pull request with my latest changes. |
I found a possible problem of |
Par3's specification is in near-final form. I created this issue to discuss any changes that need to be made, as we work on a reference implementation of Par3.
The earlier discussion about the Par3 specification is in this issue on the par2cmdline repo on GitHub:
Parchive/par2cmdline#130
The current draft of the specification is available at:
https://parchive.github.io/doc/Parity_Volume_Set_Specification_v3.0.html