-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Review HuffLZ methods GetNumExtraBits
and GetRepeatOffset
#134
Comments
I'm spending some time refactoring HuffLZ. It may not lead to a solution to this problem, but wanted to flag my work to reduce duplicated efforts in HuffLZ. |
I was thinking about this today. I was tempted to close it as not a priority, but it managed to grab my curiosity. If we go with a data structure approach, we need a way to map a point to a range, or rather, values associated with a range. Important details here are the intervals are non-overlapping, and adjacent, and there is a default case that covers all possible values. That is, no point belongs to more than one range, and no point falls between or around ranges. That is, every point belongs to exactly one range. There are various algorithms that map points to ranges, such as Interval Trees. Specifically, it mentions in the naive case, where intervals do not overlap, a binary search tree will do. As our data is static, we could implement a binary search of a sorted table. However, I think a linear search would actually be better here. In particular, we want lookup to be fast. Normally, a binary search would have better worst case lookup time. Here however, we are dealing with a decompression algorithm, where input data is not evenly distributed. The codes themselves are variable length to account for this, in that shorter codes are assigned to more likely patterns. As such, we want lookup of shorter codes to be fastest, as they will be most common. Hence, a linear search is most suited here, since it will have a lower amortized lookup cost over the set of compression codes. The current cascaded In terms of the data needed, lookup is done by the range end value, though the start value is also used in calculations. There is also the Seems like a static array of structs, along with a method to do a linear search over the data table with a |
I went down a bit of a rabbit hole about returning multiple values from functions in C++. It was educational though. I think I like the In terms of this issue, the code to range lookup needs to find and return multiple values. That might either be the raw parameters, stored in a static const table, or it might be the result of the
As a side note, after looking at the code a bit more, I realized the "mod" value was actually the upper bits of the offset value. The code translates some variable portion of the high bits, depending on the ranges, into an expanded set of high bits. Extra low order bits need to be read in to accommodate the increased range of the longer bit strings. In terms of implementation, a transition implementation for the new method might be: HuffLZ::OffsetModifiers HuffLZ::GetOffsetModifiers(unsigned int offset) {
auto extraBitCount = GetNumExtraBits(offset);
auto offsetUpperBits = GetOffsetBitMod(offset);
return OffsetModifiers { extraBitCount, offsetUpperBits };
} |
I might be okay with condensing to:
-Brett |
True. At any rate, it's likely to be a transitional point in the code. I think we should get rid of the separate Get functions, so the value lookup can be done once. I figure having a transitional point using the old code through a new interface allows for better tracking in case a bug is introduced. Changes will be needed in both the calling function, and in the called code. The transition point allows the two to be done separately without breaking the code in the process. That can potentially aid testing and debugging. |
I was watching a video about how testability relates to good design. I think I agreed with it. One point was about how you should try to test your components through their public interface. In particular, avoid testing private methods. If you feel the need to test private methods, your class is probably too big, and doing too many things, and you should perhaps extract a component that will then have a testable public interface. The Google Test documentation makes a similar claim in their documentation. Black box testing is about testing components through their public interface. If you feel the need to test private methods, perhaps there is a better design. In regards to this issue, the |
The
GetNumExtraBits
andGetRepeatOffset
methods contain cascadedif
/else
blocks. This is slightly undesirable as it embeds data into the code in a hard to read way. Perhaps it can be rewritten as a data table lookup.Some attention should be paid to performance, as this is an inner part of the decompression algorithm. It is however a branch of the inner part, which might not execute too frequently, so performance might not be a critical concern in practice.
Closing this issue without changes is also a valid option. The code as written is correct, and does not suffer from known performance issues. The issue is merely one of code style.
The text was updated successfully, but these errors were encountered: