Skip to content

Commit

Permalink
Improve performance of padding removal when parsing
Browse files Browse the repository at this point in the history
The current algorithm to remove right padding of left justified strings
first reverses the String, removes leading pad characters using
dropWhile, and then reverses the result. The two reverses are linear in
the length of the String, and requires allocating multiple String
instances and copying characters from one to the other. And this is done
regardless of how many, if any, pad chars exist in the String. This
logic is very clear, but is fairly inefficient, enough to show up while
profiling.

To improve performance, this rewrites the algorithm to scan through the
String in reverse to find the index of the last pad character and then
uses the substring() function to create a new String with those pad
characters removed. This is now linear in the number of pad characters
in a String instead of the full length of the string. Additionally, the
use of substring() avoids character copies, since it just allocates a
new String using the same underlying String value but with different
indices.

I have not looked into detail how scala implements dropWhile() for
Strings (skimming the code, it looks like it will allocate a new String
and copy characters), but for consistency and maximum performance, this
also updates the algorithm that removes left padding of right justified
strings to use similar logic as the new right padding algorithm. By
using substring() we should avoid possible copies.

In one test with lots of left justified strings, many of which are
padded, this saw about a 15% improvement in parse times (excluding
infoset creating using the null infoset outputter), and padding removal
no longer shows up while profiling.

DAFFODIL-2868
  • Loading branch information
stevedlawrence committed Jan 8, 2024
1 parent 20d6ea3 commit 10cdb9e
Showing 1 changed file with 26 additions and 5 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,32 @@ trait PaddingRuntimeMixin {
protected def justificationTrim: TextJustificationType.Type
protected def parsingPadChar: MaybeChar

private def removeRightPadding(str: String): String =
if (parsingPadChar.isEmpty) str
else str.reverse.dropWhile(c => c == parsingPadChar.get).reverse
private def removeLeftPadding(str: String): String =
if (parsingPadChar.isEmpty) str else str.dropWhile(c => c == parsingPadChar.get)
private def removeRightPadding(str: String): String = {
if (parsingPadChar.isEmpty) {
str

Check warning on line 29 in daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala

View check run for this annotation

Codecov / codecov/patch

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala#L29

Added line #L29 was not covered by tests
} else {
val padChar = parsingPadChar.get
var index = str.length - 1
while (index >= 0 && str(index) == padChar) {
index -= 1
}
str.substring(0, index + 1)
}
}

private def removeLeftPadding(str: String): String = {
if (parsingPadChar.isEmpty) {
str

Check warning on line 42 in daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala

View check run for this annotation

Codecov / codecov/patch

daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala#L42

Added line #L42 was not covered by tests
} else {
val padChar = parsingPadChar.get
var index = 0
while (index < str.length && str(index) == padChar) {
index += 1
}
str.substring(index)
}
}

private def removePadding(str: String): String = removeRightPadding(removeLeftPadding(str))

def trimByJustification(str: String): String = {
Expand Down

0 comments on commit 10cdb9e

Please sign in to comment.