Refactor huff lz #147

Brett208 · 2018-08-19T02:33:47Z

I was looking at issue #134. My knowledge of what the code is accomplishing is insufficient to correct the issue. However, I did fix a couple of typos and the construction process in the HuffLZ which I thought were worth committing.

I forgot to update the master branch with the merged changes, which created a merge conflict with the new name for BitStream. I was happy that GitHub chose to ignore the changes in the merge commit, so they do not obscure the actual changes to the code base.

DanRStevens

Looks good.

Using the constructor initializer lists opens up the possibility of embedding objects, rather than pointers to them. That could simplify memory management a little.

DanRStevens · 2018-08-19T15:57:59Z

src/Archives/HuffLZ.cpp

+	HuffLZ::HuffLZ(BitStream *bitStream) :
+		m_BitStream(bitStream),
+		m_ConstructedBitStream(0), // Don't need to delete stream in destructor
+		m_HuffTree(new AdaptHuffTree(314)), 


We can probably get rid of the new here, and update the datatype to not be a pointer. I likely used new because at the time, I didn't know how to use constructor initializer lists, and so couldn't figure out how to initialize the AdaptHuffTree in place.
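A minimal sketch of the idea, not code from the repository: `Tree`, `DecoderWithPointer`, and `DecoderWithMember` are hypothetical stand-ins for `AdaptHuffTree` and `HuffLZ`, whose real definitions are not shown in this thread. It contrasts a heap-allocated member with one constructed in place by the initializer list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tree type (the real class is not shown here).
class Tree {
public:
    explicit Tree(std::size_t symbolCount) : m_Nodes(symbolCount) {}
    std::size_t SymbolCount() const { return m_Nodes.size(); }
private:
    std::vector<int> m_Nodes;
};

// Pointer member: needs a destructor, and copying must be handled or disabled.
class DecoderWithPointer {
public:
    DecoderWithPointer() : m_Tree(new Tree(314)) {}
    ~DecoderWithPointer() { delete m_Tree; }
    DecoderWithPointer(const DecoderWithPointer&) = delete;
    DecoderWithPointer& operator=(const DecoderWithPointer&) = delete;
    std::size_t SymbolCount() const { return m_Tree->SymbolCount(); }
private:
    Tree* m_Tree;
};

// Embedded member: constructed in place by the initializer list,
// so no new/delete and no user-written destructor.
class DecoderWithMember {
public:
    DecoderWithMember() : m_Tree(314) {}
    std::size_t SymbolCount() const { return m_Tree.SymbolCount(); }
private:
    Tree m_Tree;
};

// Both variants end up with the same tree; only the ownership plumbing differs.
void CheckBothDecoders() {
    DecoderWithPointer viaPointer;
    DecoderWithMember viaMember;
    assert(viaPointer.SymbolCount() == viaMember.SymbolCount());
}
```

With the embedded member, the compiler-generated destructor and copy operations are sufficient, which is the memory management simplification mentioned above.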

Hmm, maybe should rename the type name to avoid abbreviations. Suggestion: AdaptiveHuffmanTree

Both suggestions sound good. I'll update before merging. I hadn't considered that we could remove the pointers.

DanRStevens · 2018-08-19T15:58:41Z

src/Archives/HuffLZ.cpp

 		// Initialize the decompress buffer to spaces
 		memset(m_DecompressBuffer, ' ', 4096);
 	}

 	// Creates an internal bit stream for the buffer
-	HuffLZ::HuffLZ(std::size_t bufferSize, void *buffer)
+	HuffLZ::HuffLZ(std::size_t bufferSize, void *buffer) :
+		m_BitStream(new BitStream(bufferSize, buffer)),


Same concept here. Could have constructed the BitStream object in place using the initializer list.

Brett208 · 2018-08-21T03:07:33Z

AdaptHuffTree rename is complete. Still need to fix class variable pointers in HuffLZ. Should have time tomorrow for this.

Brett208 · 2018-08-21T14:08:26Z

@DanRStevens,

What do you think of the following three lines:

HuffLZ::HuffLZ(BitStreamReader& bitStream) :
	m_BitStreamReader(bitStream),
	m_ConstructedBitStreamReader(), // Don't need to delete stream in destructor

I think this is the proper way to handle creating the HuffLZ class around an existing bitstream, but wanted to make sure it looked good to you.

I have compiled the code, but have not used it to actually read a HuffLZ compressed volume yet. I will do that before merging.

Thanks,
Brett

DanRStevens

I think you just unlocked more potential for simplification.

DanRStevens · 2018-08-21T15:02:03Z

src/Archives/HuffLZ.cpp

 		m_BitStreamReader(bitStream),
-		m_ConstructedBitStreamReader(0), // Don't need to delete stream in destructor
-		m_AdaptiveHuffmanTree(new AdaptiveHuffmanTree(314)), 
+		m_ConstructedBitStreamReader(), // Don't need to delete stream in destructor


I find the empty parentheses a bit odd here. I'm wondering if we should be more explicit about this. Passing a nullptr would be appropriate for the initialization value.
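For reference, a small sketch of the two spellings, assuming the member is still a raw pointer as this comment does. `Reader`, `ImplicitNull`, and `ExplicitNull` are hypothetical names, not types from the repository. For a pointer member, empty parentheses in the initializer list perform value-initialization, which yields a null pointer, so both forms behave the same and differ only in readability.

```cpp
#include <cassert>

struct Reader {};  // hypothetical placeholder type

class ImplicitNull {
public:
    ImplicitNull() : m_Reader() {}  // value-initialization: the pointer becomes null
    bool HasReader() const { return m_Reader != nullptr; }
private:
    Reader* m_Reader;
};

class ExplicitNull {
public:
    ExplicitNull() : m_Reader(nullptr) {}  // same result, intent stated outright
    bool HasReader() const { return m_Reader != nullptr; }
private:
    Reader* m_Reader;
};

// Neither object holds a reader after construction.
void CheckBothStartNull() {
    assert(!ImplicitNull().HasReader());
    assert(!ExplicitNull().HasReader());
}
```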

DanRStevens · 2018-08-21T15:26:56Z

src/Archives/HuffLZ.h

-		BitStreamReader *m_ConstructedBitStreamReader;
-		AdaptiveHuffmanTree *m_AdaptiveHuffmanTree;
+		BitStreamReader m_BitStreamReader;
+		BitStreamReader m_ConstructedBitStreamReader;


Ahh, I see you've changed the type of this field as well. Hmm, this field actually makes much less sense now. We should probably eliminate it. Previously the combination of m_BitStreamReader and m_ConstructedBitStreamReader together functioned somewhat like a shared_ptr. It would either reference an existing object which it did not own (and hence would not free), or it would allocate the object itself and thus be responsible for the lifetime and cleanup of that object.

By using containment, we are implying ownership. There is no question about it. If we are passed an object, it must be copied before it can be used. That is, for the constructor that takes the BitStreamReader object, the object must be copied rather than simply storing a reference to it.

In hindsight, I probably wrote the code as it was thinking there was a bigger performance gain by avoiding the copy. I had probably erroneously assumed the entire data buffer would be copied. In actuality though, it's a shallow copy, in that only the 4 fields defined in BitStreamReader are copied, such as the buffer pointer. It doesn't need to copy the actual data buffer itself. The copy operation is actually fairly cheap, and not frequently occurring. Performance implications should be fairly minimal. The copy operation is slightly more expensive this way, though there is less indirection when accessing the object, so some slight efficiency gains there. This might even be a net gain in terms of performance, and a big gain in terms of code readability.
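To illustrate the shallow copy, here is a sketch using a hypothetical `ByteReader`. Its three fields are assumptions for illustration and do not mirror the actual `BitStreamReader` layout. Copying the reader copies only the pointer and bookkeeping fields, so both copies view the same underlying bytes while tracking their read positions independently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reader: a view over a buffer it does not own.
class ByteReader {
public:
    ByteReader(const void* buffer, std::size_t size)
        : m_Data(static_cast<const unsigned char*>(buffer)), m_Size(size), m_Position(0) {}

    unsigned char ReadByte() {
        assert(m_Position < m_Size);  // caller must not read past the end
        return m_Data[m_Position++];
    }

    const unsigned char* Data() const { return m_Data; }

private:
    const unsigned char* m_Data;  // copied as a pointer; the bytes are not duplicated
    std::size_t m_Size;
    std::size_t m_Position;       // each copy advances its own position
};
```

A copy costs a few word-sized assignments regardless of buffer size. The trade-off is that the buffer must outlive every copy of the reader.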

My advice is to drop the m_ConstructedBitStreamReader field now that the referenced objects are contained in the host class.

As a side note, it would be reasonable to eliminate the alternate constructor that takes a pointer and size, and instead only take a constructed BitStreamReader. Such an object is cheap to construct and copy. The parameter could be const, since a (non-const) copy is made. That could allow a temporarily constructed object to be passed to the constructor. Eliminating the alternate constructor that takes individual pointer and size values cuts down on code, and cuts down on the number of ways the library could be used to accomplish the same task. That might actually be a usability gain.
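A sketch of what the single-constructor form could look like. `BufferView`, `Decoder`, and `FirstByte` are hypothetical names used only for illustration, not the library's API. Because the parameter is a const reference and the constructor stores its own copy, a temporary built at the call site is accepted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cheap-to-copy view type, standing in for a constructed reader.
struct BufferView {
    const unsigned char* data;
    std::size_t size;
};

class Decoder {
public:
    // const& binds to temporaries; the member keeps a non-const copy.
    explicit Decoder(const BufferView& view) : m_View(view), m_Position(0) {}

    bool AtEnd() const { return m_Position >= m_View.size; }
    unsigned char Next() { return m_View.data[m_Position++]; }  // precondition: !AtEnd()

private:
    BufferView m_View;
    std::size_t m_Position;
};

// Callers holding a raw pointer and size wrap them inline,
// so no second constructor overload is required.
unsigned char FirstByte(const unsigned char* bytes, std::size_t size) {
    assert(size > 0);
    Decoder decoder(BufferView{bytes, size});
    return decoder.Next();
}
```

A non-const reference parameter would reject the temporary at compile time, which is why const matters for this usage.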

Side side note: The Stream code uses the parameter order (void* buffer, std::size_t size), but the HuffLZ object reverses it to (std::size_t size, void* buffer). We should probably try to be consistent about the ordering.

DanRStevens · 2018-08-21T16:19:43Z

Ok, I removed the field in question.

I did not remove any of the constructors, nor address the order of constructor parameters. After taking a quick look, I think those are separate issues that should be addressed in their own PR. In particular, any changes there would relate to the compressed streams design, which we haven't fully fleshed out yet. Let's close this off meanwhile, and I'll add a separate issue with my notes on the other areas.

Brett208 · 2018-08-21T21:35:50Z

It looks like something we did has damaged the Huffman LZ decompression algorithm. The program executes without an exception, but the results appear partially mangled. It looks worse than just the file endings that we discussed earlier. I'll post the results in the forum.

-Brett

Brett208 · 2018-08-22T02:46:10Z

I did some more troubleshooting. Both x64 and x86 produce the same garbled results. Unpacking a version of sheets.vol that is not compressed produces proper results.

I switched back to the master branch of OP2Utility, but still had the same garbled results. So it looks like we broke this somewhere before this branch.

I reverted OP2Utility back to the 7/28/2018 commit "Fix warnings produced when writing strings to map files and unknown tileGroup variable". After modifying OP2Archive to work with this version of OP2Utility, I was able to extract a proper version of sheets.vol. It looks like the error was introduced sometime after 7/28/2018 and before the current branch.

It may bear fruit to check for changes in both how VolFile calls the decompress code and any changes to the decompress code between then and now.

Brett

DanRStevens · 2018-08-22T04:41:37Z

Nice work discovering that bug, and determining it was pre-existing. Sounds a bit like you were doing a manual git bisect to narrow in on a range of commits where the error was introduced. This seems like additional justification to have more unit tests.

Given the error is pre-existing, do we go ahead and merge the current change set, or hold until the error is fixed? It seems like the proper thing to do in a large, widely used repository would be to fix the error first, before committing new code. However, for a small library, it may make sense to just keep moving forward, and fix things along the way. At the very least, there would be fewer merge conflicts that way.

I'll look into adding some additional unit tests.

Edit: Forgot to mention, the repeated runs of previously existing text seem to imply a problem with the LZ part of the HuffLZ class. There is a buffer of previously output bytes which is consulted to output repeated runs of text.
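For anyone unfamiliar with the LZ side, here is an illustrative sketch of a history buffer, not the HuffLZ implementation. `HistoryWindow` and its distance-based addressing are assumptions. The only details taken from the code under review are the 4096-byte buffer size and the initial fill with spaces. A bug in how the offset or run length is computed would re-emit the wrong stretch of earlier output, which matches the symptom of repeated text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Circular buffer of recently output bytes, pre-filled with spaces.
class HistoryWindow {
public:
    explicit HistoryWindow(std::size_t size) : m_Buffer(size, ' '), m_WritePos(0) {}

    // Emit a literal byte and record it in the history.
    void PutLiteral(char c, std::string& out) {
        out.push_back(c);
        m_Buffer[m_WritePos] = c;
        m_WritePos = (m_WritePos + 1) % m_Buffer.size();
    }

    // Re-emit `length` bytes starting `distance` bytes back in the history.
    // Copying byte by byte lets a run overlap the bytes it is producing.
    void CopyRun(std::size_t distance, std::size_t length, std::string& out) {
        assert(distance >= 1 && distance <= m_Buffer.size());
        std::size_t readPos = (m_WritePos + m_Buffer.size() - distance) % m_Buffer.size();
        for (std::size_t i = 0; i < length; ++i) {
            const char c = m_Buffer[readPos];
            readPos = (readPos + 1) % m_Buffer.size();
            PutLiteral(c, out);
        }
    }

private:
    std::vector<char> m_Buffer;
    std::size_t m_WritePos;
};
```

For example, after the literals `a`, `b`, `c`, a call to `CopyRun(3, 6, out)` appends `abcabc`, so `out` holds `abcabcabc`.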

DanRStevens · 2018-08-22T10:18:04Z

After looking at this with fresh eyes, I realized the additional changes I had proposed in Issue #150 are actually quite simple, so I went ahead and did them.

DanRStevens · 2018-08-22T11:23:31Z

Ok, I went and wrote a quick and dirty decompressor unit test, and then ran git bisect with the updated makefile with unit test support. It came back with the first bad commit being fb8e8cb. Somehow I expected it to be that commit. It stood out to me when I ran a git blame earlier. Most of the changes in there look trivial, except for the extraction of the data/code into the GetNumExtraBits and GetOffsetBitMod methods.

I'll investigate further.
Edit: Found the cause. Fix pushed as PR #151.

Brett208 · 2018-08-22T21:30:03Z

All the changes look good. Thanks for taking the time to make them.

After merging the master into this branch, I tested by decompressing sheets.vol again. Everything checked out fine.

-Brett

Brett208 added 5 commits August 18, 2018 20:26

Fix typo in HuffLZ member variable name

e37647c

Use constructor initialize list in HuffLZ

2f302b0

Reduce code duplication in HuffLZ constructors

d733292

Fix typo in HuffLZ constructor comment

db5c02b

Merge branch 'master' into Refactor-HuffLZ

77530cf

Brett208 requested a review from DanRStevens August 19, 2018 02:33

DanRStevens approved these changes Aug 19, 2018

Brett208 added 2 commits August 20, 2018 22:59

Remove Abbreviations from AdaptiveHuffmanTree class name

1fa37c9

Rename ApativeHuffmanTree sourcecode filenames

6391446

Remove pointers as class variables within HuffLZ

9560e80

- Remove destructor

DanRStevens reviewed Aug 21, 2018

DanRStevens added 3 commits August 21, 2018 22:43

Fix whitespace

3bef8be

Remove m_ConstructedBitStreamReader field

c7e73e8

Recent changes remove the need for this field. Previous design was of questionable value as well.

Merge branch 'master' into Refactor-HuffLZ

a3070b2

DanRStevens added 5 commits August 22, 2018 16:38

Remove extra constructor for HuffLZ class

be3f60e

We can make do with a single constructor. The BitStreamReader object can be constructed as an inline temporary while constructing the HuffLZ object.

Re-order BitStreamReader constructor parameters

f1e5982

This makes the parameter order consistent with the Stream API.

Re-order HuffLZ::GetData parameters

719366a

This makes the parameter order consistent with the Stream API.

Re-order HuffLZ::CopyAvailableData parameters

18331e4

This makes the parameter order consistent with the Stream API.

Remove BitStreamReader empty constructor

269de3b

If an empty bit stream is desired, it should be constructed explicitly with `BitStreamReader(nullptr, 0)`.

DanRStevens mentioned this pull request Aug 22, 2018

Fix decompression bug #151

Merged

Merge branch 'master' into Refactor-HuffLZ

45fee53

Brett208 merged commit 7da12e6 into master Aug 22, 2018

Brett208 deleted the Refactor-HuffLZ branch August 22, 2018 21:30
