-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor huff lz #147
Refactor huff lz #147
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Using the constructor initializer lists opens up the possibility of embedding objects, rather than pointers to them. That could simplify memory management a little.
src/Archives/HuffLZ.cpp
Outdated
HuffLZ::HuffLZ(BitStream *bitStream) : | ||
m_BitStream(bitStream), | ||
m_ConstructedBitStream(0), // Don't need to delete stream in destructor | ||
m_HuffTree(new AdaptHuffTree(314)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably get rid of the new
here, and update the datatype to not be a pointer. I likely used new
because at the time, I didn't know how to use constructor initializer lists, and so couldn't figure out how to initialize the AdaptHuffTree in place.
Hmm, maybe should rename the type name to avoid abbreviations. Suggestion: AdaptiveHuffmanTree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both accounts sound good. I'll update before merging. I didn't think about the fact that we could remove the pointers.
src/Archives/HuffLZ.cpp
Outdated
// Initialize the decompress buffer to spaces | ||
memset(m_DecompressBuffer, ' ', 4096); | ||
} | ||
|
||
// Creates an internal bit stream for the buffer | ||
HuffLZ::HuffLZ(std::size_t bufferSize, void *buffer) | ||
HuffLZ::HuffLZ(std::size_t bufferSize, void *buffer) : | ||
m_BitStream(new BitStream(bufferSize, buffer)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concept here. Could have constructed the BitStream
object in place using the initializer list.
AdaptHuffTree rename is complete. Still need to fix class variable pointers in HuffLZ. Should have time tomorrow for this. |
- Remove destructor
What do you think of the following three lines:
I think this is the proper way to handle creating the HuffLZ class around an existing bitstream, but wanted to make sure it looked good to you. I have compiled the code, but have not used it to actually read a HuffLZ compressed volume yet. I will do that before merging. Thanks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you just unlocked more potential for simplification.
src/Archives/HuffLZ.cpp
Outdated
m_BitStreamReader(bitStream), | ||
m_ConstructedBitStreamReader(0), // Don't need to delete stream in destructor | ||
m_AdaptiveHuffmanTree(new AdaptiveHuffmanTree(314)), | ||
m_ConstructedBitStreamReader(), // Don't need to delete stream in destructor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the empty parenthesis a bit odd here. I'm wondering if we should be more explicit about this. Passing a nullptr
would be appropriate for the initialization value.
src/Archives/HuffLZ.h
Outdated
BitStreamReader *m_ConstructedBitStreamReader; | ||
AdaptiveHuffmanTree *m_AdaptiveHuffmanTree; | ||
BitStreamReader m_BitStreamReader; | ||
BitStreamReader m_ConstructedBitStreamReader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see you've changed the type of this field as well. Hmm, this field actually makes much less sense now. We should probably eliminate it. Previously the combination of m_BitStreamReader
and m_ConstructedBitStreamReader
together functioned somewhat like a shared_ptr
. It would either reference an existing object which it did not own (and hence would not free), or it would allocated the object itself and thus be responsible for the lifetime and cleanup of that object.
By using containment, we are implying ownership. There is no question about it. If we are passed an object, it must be copied before it can be used. That is, for the constructor that takes the BitStreamReader
object, the object must be copied rather than simply storing a reference to it.
In hindsight, I probably wrote the code as it was thinking there was a bigger performance gain by avoiding the copy. I had probably erroneously assumed the entire data buffer would be copied. In actuality though, it's a shallow copy, in that only the 4 fields defined in BitStreamReader
are copied, such as the buffer pointer. It doesn't need to copy the actual data buffer itself. The copy operation is actually fairly cheap, and not frequently occurring. Performance implications should be fairly minimal. The copy operation is slightly more expensive this way, though there is less indirection when accessing the object, so some slight efficiency gains there. This might even be a net gain in terms of performance, and a big gain in terms of code readability.
My advice is to drop the m_ConstructedBitStreamReader
field now that the referenced objects are contained in the host class.
As a side note, it would be reasonable to eliminate the alternate constructor that takes a pointer and size, and instead only take a constructed BitStreamReader
. Such an object is cheap to construct and copy. The parameter could be const
, since a (non-const) copy is made. That could allow a temporarily constructed object to be passed to the constructor. Eliminating the alternate constructor that takes individual pointer and size values cuts down on code, and cuts down on the number of ways the library could be used to accomplish the same task. That might actually be a usability gain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side side note: The Stream code uses the parameter order (void* buffer, std::size_t size)
, but the HuffLZ object reverses it to (std::size_t size, void* buffer)
. We should probably try to be consistent about the ordering.
Recent changes remove the need for this field. Previous design was of questionable value as well.
Ok, I removed the field in question. I did not remove any of the constructors, nor address the order of constructor parameters. After taking a quick look, I think those are separate issues that should be addressed in their own PR. In particular, any changes there would relate to the compressed streams design, which we haven't fully fleshed out yet. Let's close this off meanwhile, and I'll add a separate issue with my notes on the other areas. |
It looks like something we did has damaged the Huffman LZ decompression algorithm. The program executes without an exception, but the results appear partially mangled. It looks worse then just the file endings that we discussed earlier. I'll post the results in the forum. -Brett |
I did some more troubleshooting. Both x64 and x86 produce the same garbled results. Unpacking a version of sheets.vol that is not compressed produces proper results. I switched back to the master branch of OP2Utility, but still had the same garbled results. So it looks like we broke this somewhere before this branch. I reverted OP2Utility back to 7/28/2018 "Fix warnings produced when writing strings to map files and unknown tileGroup variable" After modifying OP2Archive to work with this version of OP2Utility, I was able to extract a proper version of sheets.vol. It looks like the error was added sometime between 7/28/2018 and before the current branch. It may bear fruit to check for changes in both how VolFile calls the decompress code and any changes to the decompress code between then and now. Brett |
Nice work discovering that bug, and determining it was pre-existing. Sounds a bit like you were doing a manual Given the error is pre-existing, do we go ahead and merge the current change set, or hold until the error is fixed. It seems like the proper thing to do in a large widely used repository, would be to fix the error first, before committing new code. However, for a small library, it may make sense to just keep moving forward, and fix things along the way. At the very least, there would be fewer merge conflicts that way. I'll look into adding some additional unit tests. Edit: Forgot to mention, the repeated runs of previously existing text seems to imply a problem with the LZ part of the HuffLZ class. There is a buffer of previously output bytes which is consulted to output repeated runs of text. |
We can make do with a single constructor. The BitStreamReader object can be constructed as an inline temporary while constructing the HuffLZ object.
This makes the parameter order consistent with the Stream API.
This makes the parameter order consistent with the Stream API.
This makes the parameter order consistent with the Stream API.
If an empty bit stream is desired, it should be constructed explicitly with `BitStreamReader(nullptr, 0)`.
After looking at this with fresh eyes, I realized the additional changes I had proposed in Issue #150 are actually quite simple, so I went ahead and did them. |
Ok, I went and wrote a quick and dirty decompressor unit test, and then ran git biset with the updated makefile with unit test support. It came back with the first bad commit being fb8e8cb. Somehow I expected it to be that commit. It stood out to me when I ran a git blame earlier. Most of the changes in there look trivial, except for the extraction of the data/code into the I'll investigate further. |
All the changes look good. Thanks for taking the time to make them. After merging the master into this branch, I tested by decompressing sheets.vol again. Everything checked out fine. -Brett |
I was looking at issue #134. My knowledge of what the code is accomplishing is insufficient to correct the issue. However, I did fix a couple of typos and the construction process in the HuffLZ which I thought were worth committing.
I forgot to update the master branch with merged changes which created a merge conflict with the new name for BitStream. I was happy that GitHub chose to ignore the changes in the merge commit so they are not confusing the actual changes to the code base.