Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String method emulation for Java 11, align Character.isWhitespace with Java 11 #9975

Merged
merged 4 commits into from
Jul 24, 2024

Conversation

zbynek
Copy link
Contributor

@zbynek zbynek commented Jul 6, 2024

Adds emulation for java.lang.String methods added in Java 11, partially completing #9831:

  • isBlank
  • lines
  • repeat
  • strip
  • stripLeading
  • stripTrailing

Fixes #9973
Partial #9831

user/super/com/google/gwt/emul/java/lang/String.java Outdated Show resolved Hide resolved
assertFalse("\u00a0".isBlank());
}

public void testStrip() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Esp given #9973, would you mind also checking that the trim() tests correctly leave these non-ascii whitespace chars intact?

Also, from the same issue, should we have tests for \ufeff and \u180e to ensure they are preserved? Granted they would fail at this time, but might be good to have them while touching relevant tests (and comment them out, link them to the new issue).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more testcases, then I stumbled upon


  /**
   * Helper method for testTrim to avoid compiler optimizations.
   *
   * TODO: insufficient, compiler now inlines.
   */
  public void trimRightAssertEquals(String left, String right) {
    assertEquals(left, right.trim());
  }

Is there a way to check the compiler output to see if the test is actually testing the emulation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, here's what I tend to do:

  • add -setProperty compiler.useSourceMaps=true to test.args, this disables the stack trace emulation that makes the output hard to read. Note that this may slightly break your stack traces, the line numbers won't quite be accurate (but close)
  • add -style PRETTY also to test.args, this makes the output more or less readable
  • add some apparent nondeterminism, or go through JS to your code to make the compiler unable to optimize - for example, an always-true condition like Math.random() < 5 in an if or ternary, or a JSNI unchecked cast wrapping a value

From the first two, when I look at the JS file in build/out/user/test/web-htmlunit/www/com.google.gwt.emultest.EmulSuite.JUnit/ I see

_.testStrip = function testStrip(){
  assertEquals_1('', '');
  assertEquals_1('', '');
  assertEquals_1('x', 'x');
  assertEquals_1('x', 'x');
  assertEquals_1('\xA0x\xA0', '\xA0x\xA0');
  assertEquals_1('\uFFEFx\u180E', '\uFFEFx\u180E');
}

Clearly that's no good - testRepeat at least is the same:

_.testRepeat = function testRepeat(){
  var ex;
  assertEquals_1('', '');
  assertEquals_1('foo', 'foo');
  assertEquals_1('foofoofoo', 'foofoofoo');
  try {
    checkCriticalArgument(false, 'count is negative: -1');
    'foo'.repeat(-1);
    throw toJs(new Error_0('Should fail with negative arg'));
  }
  // ...

Note however that the draft tests do better, since the compiler doesnt optimize as heavily:

_.stripRightAsssertEquals = function stripRightAsssertEquals(expected, arg){
  jf.assertEquals_1(expected, jl.strip__Ljava_lang_String___devirtual$(arg));
}
;
//...
_.testStrip = function testStrip(){
  this.stripRightAsssertEquals('', '');
  this.stripRightAsssertEquals('', '  ');
  this.stripRightAsssertEquals('x', ' x ');
  this.stripRightAsssertEquals('x', '\x1Cx\x1C');
  this.stripRightAsssertEquals('\xA0x\xA0', '\xA0x\xA0 ');
  this.stripRightAsssertEquals('\uFFEFx\u180E', '\uFFEFx\u180E ');
}
;
_.testRepeat = function testRepeat(){
  var ex, noFoo;
  jf.assertEquals_1('', jl.repeat_I_Ljava_lang_String___devirtual$('foo', 0));
  jf.assertEquals_1('foo', jl.repeat_I_Ljava_lang_String___devirtual$('foo', 1));
  jf.assertEquals_1('foofoofoo', jl.repeat_I_Ljava_lang_String___devirtual$('foo', 3));

This is caused by DeadCodeElimination.tryOptimizeStringCall, where it finds a string literal with a plain method call on it, and calls it reflectively in the JVM at compile time, saving the compiled JS from needing to do it.

So we need to "hide" the string constants from the compiler so it doesnt just call trim() or repeat() on it, for example, StringTest.java does this (but doesn't use it for trim tests?):

private <T> T hideFromCompiler(T value) {
if (Math.random() < -1) {
// Can never happen, but fools the compiler enough not to optimize this call.
fail();
}
return value;
}

Used like this:

  private void stripRightAsssertEquals(String expected, String arg) {
    assertEquals(expected, hideFromCompiler(arg).strip());
  }

This makes for some awful output:

_.testStrip = function testStrip(){
  assertEquals_1('', $strip(($wnd.Math.random() < -1 && fail() , '')));
  assertEquals_1('', $strip(($wnd.Math.random() < -1 && fail() , '  ')));
  assertEquals_1('x', $strip(($wnd.Math.random() < -1 && fail() , ' x ')));
  assertEquals_1('x', $strip(($wnd.Math.random() < -1 && fail() , '\x1Cx\x1C')));
  assertEquals_1('\xA0x\xA0', $strip(($wnd.Math.random() < -1 && fail() , '\xA0x\xA0 ')));
  assertEquals_1('\uFFEFx\u180E', $strip(($wnd.Math.random() < -1 && fail() , '\uFFEFx\u180E ')));
}
;
_.testStripLeading = function testStripLeading(){
  assertEquals_1('', $stripLeading(($wnd.Math.random() < -1 && fail() , '')));
  assertEquals_1('', $stripLeading(($wnd.Math.random() < -1 && fail() , '  ')));
  assertEquals_1('x ', $stripLeading(($wnd.Math.random() < -1 && fail() , ' x ')));
  assertEquals_1('x\x1C', $stripLeading(($wnd.Math.random() < -1 && fail() , '\x1Cx\x1C')));
  assertEquals_1('\xA0x\xA0', $stripLeading(($wnd.Math.random() < -1 && fail() , '\xA0x\xA0')));
}
;
_.testStripTrailing = function testStripTrailing(){
  assertEquals_1('', $stripTrailing(($wnd.Math.random() < -1 && fail() , '')));
  assertEquals_1('', $stripTrailing(($wnd.Math.random() < -1 && fail() , '  ')));
  assertEquals_1(' x', $stripTrailing(($wnd.Math.random() < -1 && fail() , ' x ')));
  assertEquals_1('\x1Cx', $stripTrailing(($wnd.Math.random() < -1 && fail() , '\x1Cx\x1C')));
  assertEquals_1('\xA0x\xA0', $stripTrailing(($wnd.Math.random() < -1 && fail() , '\xA0x\xA0 ')));
}
;

but we can clearly see that it is no longer able to optimize away the call. I'll put up the change for the plain StringTest class, no idea how that was ignored for so long...

Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience while I found time to review. Added a lot of comments, none are critical to merging, but mostly making sure I understand and that there isn't a reason to make changes now while we're thinking about it.

@@ -0,0 +1,5 @@
<component name="ProjectCodeStyleConfiguration">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this from the diff?

user/super/com/google/gwt/emul/java/lang/String.java Outdated Show resolved Hide resolved
user/super/com/google/gwt/emul/java/lang/String.java Outdated Show resolved Hide resolved
user/super/com/google/gwt/emul/java/lang/String.java Outdated Show resolved Hide resolved
user/super/com/google/gwt/emul/java/lang/String.java Outdated Show resolved Hide resolved
assertFalse("\u00a0".isBlank());
}

public void testStrip() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, here's what I tend to do:

  • add -setProperty compiler.useSourceMaps=true to test.args, this disables the stack trace emulation that makes the output hard to read. Note that this may slightly break your stack traces, the line numbers won't quite be accurate (but close)
  • add -style PRETTY also to test.args, this makes the output more or less readable
  • add some apparent nondeterminism, or go through JS to your code to make the compiler unable to optimize - for example, an always-true condition like Math.random() < 5 in an if or ternary, or a JSNI unchecked cast wrapping a value

From the first two, when I look at the JS file in build/out/user/test/web-htmlunit/www/com.google.gwt.emultest.EmulSuite.JUnit/ I see

_.testStrip = function testStrip(){
  assertEquals_1('', '');
  assertEquals_1('', '');
  assertEquals_1('x', 'x');
  assertEquals_1('x', 'x');
  assertEquals_1('\xA0x\xA0', '\xA0x\xA0');
  assertEquals_1('\uFFEFx\u180E', '\uFFEFx\u180E');
}

Clearly that's no good - testRepeat at least is the same:

_.testRepeat = function testRepeat(){
  var ex;
  assertEquals_1('', '');
  assertEquals_1('foo', 'foo');
  assertEquals_1('foofoofoo', 'foofoofoo');
  try {
    checkCriticalArgument(false, 'count is negative: -1');
    'foo'.repeat(-1);
    throw toJs(new Error_0('Should fail with negative arg'));
  }
  // ...

Note however that the draft tests do better, since the compiler doesnt optimize as heavily:

_.stripRightAsssertEquals = function stripRightAsssertEquals(expected, arg){
  jf.assertEquals_1(expected, jl.strip__Ljava_lang_String___devirtual$(arg));
}
;
//...
_.testStrip = function testStrip(){
  this.stripRightAsssertEquals('', '');
  this.stripRightAsssertEquals('', '  ');
  this.stripRightAsssertEquals('x', ' x ');
  this.stripRightAsssertEquals('x', '\x1Cx\x1C');
  this.stripRightAsssertEquals('\xA0x\xA0', '\xA0x\xA0 ');
  this.stripRightAsssertEquals('\uFFEFx\u180E', '\uFFEFx\u180E ');
}
;
_.testRepeat = function testRepeat(){
  var ex, noFoo;
  jf.assertEquals_1('', jl.repeat_I_Ljava_lang_String___devirtual$('foo', 0));
  jf.assertEquals_1('foo', jl.repeat_I_Ljava_lang_String___devirtual$('foo', 1));
  jf.assertEquals_1('foofoofoo', jl.repeat_I_Ljava_lang_String___devirtual$('foo', 3));

This is caused by DeadCodeElimination.tryOptimizeStringCall, where it finds a string literal with a plain method call on it, and calls it reflectively in the JVM at compile time, saving the compiled JS from needing to do it.

So we need to "hide" the string constants from the compiler so it doesnt just call trim() or repeat() on it, for example, StringTest.java does this (but doesn't use it for trim tests?):

private <T> T hideFromCompiler(T value) {
if (Math.random() < -1) {
// Can never happen, but fools the compiler enough not to optimize this call.
fail();
}
return value;
}

Used like this:

  private void stripRightAsssertEquals(String expected, String arg) {
    assertEquals(expected, hideFromCompiler(arg).strip());
  }

This makes for some awful output:

_.testStrip = function testStrip(){
  assertEquals_1('', $strip(($wnd.Math.random() < -1 && fail() , '')));
  assertEquals_1('', $strip(($wnd.Math.random() < -1 && fail() , '  ')));
  assertEquals_1('x', $strip(($wnd.Math.random() < -1 && fail() , ' x ')));
  assertEquals_1('x', $strip(($wnd.Math.random() < -1 && fail() , '\x1Cx\x1C')));
  assertEquals_1('\xA0x\xA0', $strip(($wnd.Math.random() < -1 && fail() , '\xA0x\xA0 ')));
  assertEquals_1('\uFFEFx\u180E', $strip(($wnd.Math.random() < -1 && fail() , '\uFFEFx\u180E ')));
}
;
_.testStripLeading = function testStripLeading(){
  assertEquals_1('', $stripLeading(($wnd.Math.random() < -1 && fail() , '')));
  assertEquals_1('', $stripLeading(($wnd.Math.random() < -1 && fail() , '  ')));
  assertEquals_1('x ', $stripLeading(($wnd.Math.random() < -1 && fail() , ' x ')));
  assertEquals_1('x\x1C', $stripLeading(($wnd.Math.random() < -1 && fail() , '\x1Cx\x1C')));
  assertEquals_1('\xA0x\xA0', $stripLeading(($wnd.Math.random() < -1 && fail() , '\xA0x\xA0')));
}
;
_.testStripTrailing = function testStripTrailing(){
  assertEquals_1('', $stripTrailing(($wnd.Math.random() < -1 && fail() , '')));
  assertEquals_1('', $stripTrailing(($wnd.Math.random() < -1 && fail() , '  ')));
  assertEquals_1(' x', $stripTrailing(($wnd.Math.random() < -1 && fail() , ' x ')));
  assertEquals_1('\x1Cx', $stripTrailing(($wnd.Math.random() < -1 && fail() , '\x1Cx\x1C')));
  assertEquals_1('\xA0x\xA0', $stripTrailing(($wnd.Math.random() < -1 && fail() , '\xA0x\xA0 ')));
}
;

but we can clearly see that it is no longer able to optimize away the call. I'll put up the change for the plain StringTest class, no idea how that was ignored for so long...

Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes from the build, testing locally.

niloc132 added a commit to niloc132/gwt that referenced this pull request Jul 21, 2024
Resolves TODOs in the old tests, so that the DeadCodeElimination pass
doesn't rewrite the test to do nothing, but actually confirm that the
String class does what is expected.

Related gwtproject#9975
@niloc132
Copy link
Member

Please give #9986 a look to resolve those other TODOs.

Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests pass at this time.

@zbynek zbynek changed the title String method emulation for Java 11 String method emulation for Java 11, align Character.isWhitespace with Java 11 Jul 22, 2024
niloc132 added a commit that referenced this pull request Jul 22, 2024
Resolves TODOs in the old tests, so that the DeadCodeElimination pass
doesn't rewrite the test to do nothing, but actually confirm that the
String class does what is expected.

Related #9975
Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, will give it a day or so for other reviewers, then merge if there are no other comments.

@niloc132 niloc132 mentioned this pull request Jul 23, 2024
@niloc132 niloc132 added this to the 2.12 milestone Jul 23, 2024
@niloc132 niloc132 merged commit e9fecc5 into gwtproject:main Jul 24, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Whitespace check inconsistency
2 participants