Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor huff lz #147

Merged
merged 17 commits into from
Aug 22, 2018
Merged

Refactor huff lz #147

merged 17 commits into from
Aug 22, 2018

Conversation

Brett208
Copy link
Member

I was looking at issue #134. My knowledge of what the code is accomplishing is insufficient to correct the issue. However, I did fix a couple of typos and the construction process in the HuffLZ which I thought were worth committing.

I forgot to update the master branch with merged changes which created a merge conflict with the new name for BitStream. I was happy that GitHub chose to ignore the changes in the merge commit so they are not confusing the actual changes to the code base.

Copy link
Member

@DanRStevens DanRStevens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Using the constructor initializer lists opens up the possibility of embedding objects, rather than pointers to them. That could simplify memory management a little.

HuffLZ::HuffLZ(BitStream *bitStream) :
m_BitStream(bitStream),
m_ConstructedBitStream(0), // Don't need to delete stream in destructor
m_HuffTree(new AdaptHuffTree(314)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably get rid of the new here, and update the datatype to not be a pointer. I likely used new because at the time, I didn't know how to use constructor initializer lists, and so couldn't figure out how to initialize the AdaptHuffTree in place.

Hmm, maybe should rename the type name to avoid abbreviations. Suggestion: AdaptiveHuffmanTree

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both accounts sound good. I'll update before merging. I didn't think about the fact that we could remove the pointers.

// Initialize the decompress buffer to spaces
memset(m_DecompressBuffer, ' ', 4096);
}

// Creates an internal bit stream for the buffer
HuffLZ::HuffLZ(std::size_t bufferSize, void *buffer)
HuffLZ::HuffLZ(std::size_t bufferSize, void *buffer) :
m_BitStream(new BitStream(bufferSize, buffer)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concept here. Could have constructed the BitStream object in place using the initializer list.

@Brett208
Copy link
Member Author

AdaptHuffTree rename is complete. Still need to fix class variable pointers in HuffLZ. Should have time tomorrow for this.

@Brett208
Copy link
Member Author

@DanRStevens,

What do you think of the following three lines:

HuffLZ::HuffLZ(BitStreamReader& bitStream) :
	m_BitStreamReader(bitStream),
	m_ConstructedBitStreamReader(), // Don't need to delete stream in destructor 

I think this is the proper way to handle creating the HuffLZ class around an existing bitstream, but wanted to make sure it looked good to you.

I have compiled the code, but have not used it to actually read a HuffLZ compressed volume yet. I will do that before merging.

Thanks,
Brett

Copy link
Member

@DanRStevens DanRStevens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you just unlocked more potential for simplification.

m_BitStreamReader(bitStream),
m_ConstructedBitStreamReader(0), // Don't need to delete stream in destructor
m_AdaptiveHuffmanTree(new AdaptiveHuffmanTree(314)),
m_ConstructedBitStreamReader(), // Don't need to delete stream in destructor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the empty parenthesis a bit odd here. I'm wondering if we should be more explicit about this. Passing a nullptr would be appropriate for the initialization value.

BitStreamReader *m_ConstructedBitStreamReader;
AdaptiveHuffmanTree *m_AdaptiveHuffmanTree;
BitStreamReader m_BitStreamReader;
BitStreamReader m_ConstructedBitStreamReader;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see you've changed the type of this field as well. Hmm, this field actually makes much less sense now. We should probably eliminate it. Previously the combination of m_BitStreamReader and m_ConstructedBitStreamReader together functioned somewhat like a shared_ptr. It would either reference an existing object which it did not own (and hence would not free), or it would allocated the object itself and thus be responsible for the lifetime and cleanup of that object.

By using containment, we are implying ownership. There is no question about it. If we are passed an object, it must be copied before it can be used. That is, for the constructor that takes the BitStreamReader object, the object must be copied rather than simply storing a reference to it.

In hindsight, I probably wrote the code as it was thinking there was a bigger performance gain by avoiding the copy. I had probably erroneously assumed the entire data buffer would be copied. In actuality though, it's a shallow copy, in that only the 4 fields defined in BitStreamReader are copied, such as the buffer pointer. It doesn't need to copy the actual data buffer itself. The copy operation is actually fairly cheap, and not frequently occurring. Performance implications should be fairly minimal. The copy operation is slightly more expensive this way, though there is less indirection when accessing the object, so some slight efficiency gains there. This might even be a net gain in terms of performance, and a big gain in terms of code readability.

My advice is to drop the m_ConstructedBitStreamReader field now that the referenced objects are contained in the host class.

As a side note, it would be reasonable to eliminate the alternate constructor that takes a pointer and size, and instead only take a constructed BitStreamReader. Such an object is cheap to construct and copy. The parameter could be const, since a (non-const) copy is made. That could allow a temporarily constructed object to be passed to the constructor. Eliminating the alternate constructor that takes individual pointer and size values cuts down on code, and cuts down on the number of ways the library could be used to accomplish the same task. That might actually be a usability gain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side side note: The Stream code uses the parameter order (void* buffer, std::size_t size), but the HuffLZ object reverses it to (std::size_t size, void* buffer). We should probably try to be consistent about the ordering.

Recent changes remove the need for this field. Previous design was of
questionable value as well.
@DanRStevens
Copy link
Member

Ok, I removed the field in question.

I did not remove any of the constructors, nor address the order of constructor parameters. After taking a quick look, I think those are separate issues that should be addressed in their own PR. In particular, any changes there would relate to the compressed streams design, which we haven't fully fleshed out yet. Let's close this off meanwhile, and I'll add a separate issue with my notes on the other areas.

@Brett208
Copy link
Member Author

It looks like something we did has damaged the Huffman LZ decompression algorithm. The program executes without an exception, but the results appear partially mangled. It looks worse then just the file endings that we discussed earlier. I'll post the results in the forum.

-Brett

@Brett208
Copy link
Member Author

I did some more troubleshooting. Both x64 and x86 produce the same garbled results. Unpacking a version of sheets.vol that is not compressed produces proper results.

I switched back to the master branch of OP2Utility, but still had the same garbled results. So it looks like we broke this somewhere before this branch.

I reverted OP2Utility back to 7/28/2018 "Fix warnings produced when writing strings to map files and unknown tileGroup variable" After modifying OP2Archive to work with this version of OP2Utility, I was able to extract a proper version of sheets.vol. It looks like the error was added sometime between 7/28/2018 and before the current branch.

It may bear fruit to check for changes in both how VolFile calls the decompress code and any changes to the decompress code between then and now.

Brett

@DanRStevens
Copy link
Member

DanRStevens commented Aug 22, 2018

Nice work discovering that bug, and determining it was pre-existing. Sounds a bit like you were doing a manual git biset to narrow in on a range of commits where the error was introduced. This seems like additional justification to have more unit tests.

Given the error is pre-existing, do we go ahead and merge the current change set, or hold until the error is fixed. It seems like the proper thing to do in a large widely used repository, would be to fix the error first, before committing new code. However, for a small library, it may make sense to just keep moving forward, and fix things along the way. At the very least, there would be fewer merge conflicts that way.

I'll look into adding some additional unit tests.

Edit: Forgot to mention, the repeated runs of previously existing text seems to imply a problem with the LZ part of the HuffLZ class. There is a buffer of previously output bytes which is consulted to output repeated runs of text.

We can make do with a single constructor. The BitStreamReader object can
be constructed as an inline temporary while constructing the HuffLZ
object.
This makes the parameter order consistent with the Stream API.
This makes the parameter order consistent with the Stream API.
This makes the parameter order consistent with the Stream API.
If an empty bit stream is desired, it should be constructed explicitly
with `BitStreamReader(nullptr, 0)`.
@DanRStevens
Copy link
Member

After looking at this with fresh eyes, I realized the additional changes I had proposed in Issue #150 are actually quite simple, so I went ahead and did them.

@DanRStevens
Copy link
Member

DanRStevens commented Aug 22, 2018

Ok, I went and wrote a quick and dirty decompressor unit test, and then ran git biset with the updated makefile with unit test support. It came back with the first bad commit being fb8e8cb. Somehow I expected it to be that commit. It stood out to me when I ran a git blame earlier. Most of the changes in there look trivial, except for the extraction of the data/code into the GetNumExtraBits and GetOffsetBitMod methods.

I'll investigate further.
Edit: Found the cause. Fix pushed as PR #151.

@Brett208
Copy link
Member Author

All the changes look good. Thanks for taking the time to make them.

After merging the master into this branch, I tested by decompressing sheets.vol again. Everything checked out fine.

-Brett

@Brett208 Brett208 merged commit 7da12e6 into master Aug 22, 2018
@Brett208 Brett208 deleted the Refactor-HuffLZ branch August 22, 2018 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants