-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AArch64: Implement arraytranslateTRTO255 #7499
base: master
Are you sure you want to change the base?
Conversation
I will add support for other arraytranslateTRTO and arraytranslateTROT operations in separate PRs later. |
a0b9585
to
1ad8c13
Compare
Fixed a variable name. |
TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0)); | ||
TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1)); | ||
TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4)); | ||
TR::Register *outputLenReg = cg->allocateRegister(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Children 2,3,5 still need to be evaluated even if they are "dummy".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arraytranslate evaluators of p and x platforms leave those dummy children unevaluated.
- p:
omr/compiler/p/codegen/OMRTreeEvaluator.cpp
Lines 5087 to 5090 in be39f05
TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0)); TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1)); TR::Register *stopCharReg = arraytranslateTRTO255 ? NULL : cg->gprClobberEvaluate(node->getChild(3)); TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4)); - x:
omr/compiler/x/codegen/OMRTreeEvaluator.cpp
Lines 3462 to 3465 in be39f05
bool stopUsingCopyReg1 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(0), srcPtrReg, cg); bool stopUsingCopyReg2 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(1), dstPtrReg, cg); bool stopUsingCopyReg4 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(3), termCharReg, cg); bool stopUsingCopyReg5 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(4), lengthReg, cg);
Reference counts of all the child nodes are decremented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I think you can only get away with that if they are constants. Are they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, those dummy nodes are constants.
See an example arraytranslate node at #7499 (comment).
if (!disableTRTO255) | ||
{ | ||
cg->setSupportsArrayTranslateTRTO255(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think support for this opcode should be enabled in OMR when there isn't an implementation for it in OMR (the helper implementation is missing). In the absence of that, this logic should be enabled in the downstream projects that do provide the helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both p and x platforms do the same: Implement the helper functions in the downstream project, and enable the feature in OMR.
On the other hand, z inlines the vectorized code instead of calling helper functions.
I can move the implementation of the AArch64 helper functions to OMR. In that case, maybe we want to move the helper functions for p and x to OMR for consistency.
[p]
- https://github.com/eclipse-openj9/openj9/blob/master/runtime/compiler/p/runtime/J9PPCArrayTranslate.spp
- https://github.com/eclipse/omr/blob/205c540402c01986f433a34412f9d7fcfe61d90f/compiler/p/codegen/OMRCodeGenerator.cpp#L250-L262
[x]
// (2) translation table (dummy) | ||
// (3) stop character (terminal character, either 0xff00ff00 (ISO8859) or 0xff80ff80 (ASCII) | ||
// (4) input length (in elements) | ||
// (5) stopping char (dummy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the dummy nodes used by other architectures? Just wondering why they're dummy on AArch64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is different between p/x/aarch64 and z. See the code below:
omr/compiler/optimizer/ValuePropagationCommon.cpp
Lines 2535 to 2565 in 205c540
if (cg()->getSupportsArrayTranslateTRTO255()|| cg()->getSupportsArrayTranslateTRTO()) | |
{ | |
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0); | |
if (isISO88591Encoder) | |
termchar = 0xff00ff00; | |
else | |
termchar = 0xff80ff80; | |
} | |
else //z | |
{ | |
bool genSIMD = comp()->cg()->getSupportsVectorRegisters() && !comp()->getOption(TR_DisableSIMDArrayTranslate); | |
stopIndex = isISO88591Encoder ? 255: 127; | |
termchar = isISO88591Encoder ? 0x0B: 0xff; | |
if (genSIMD) | |
{ | |
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0); //dummy table node, it's not gonna be used | |
} | |
else | |
{ | |
uint8_t *table = (uint8_t*)comp()->trMemory()->allocateMemory(65536, stackAlloc); | |
int i; | |
for (i = 0; i <= stopIndex; i++) | |
table[i] = (uint8_t)i; | |
for (i = stopIndex+1; i < 65536; i++) | |
table[i] = (uint8_t)termchar; | |
tableNode = createTableLoad(comp(), callNode, 16, 8, table, false); | |
stopIndex=-1; | |
} | |
} |
The code for z from line 2543 generates a non-dummy node for child(2) at line 2562, while the non-z path generates a dummy aconst 0
at line 2537.
AArch64 takes the non-z path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of the arraytranslate
node for TRTO255
:
n819n ( 0) treetop [ 0x12313efb0] bci=[0,13,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1
n818n ( 1) arraytranslate <arraytranslate>[#275 helper Method] [flags 0x400 0x0 ] (in GPR_0115) (char2byteXlate byteArrayXlate tableBackedByRawStorage ) [ 0x12313ef60] bci=[0,13,180] rc=1 vc=2947 vn=- li=57 udi=13472 nc=6 flg=0xa020
n795n ( 0) aladd (in GPR_0112) (X>=0 internalPtr ) [ 0x12313e830] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=12416 nc=2 flg=0x8100
n460n ( 0) ==>aRegLoad (in &GPR_0016)
n797n ( 0) lsub (in GPR_0017) (X>=0 cannotOverflow ) [ 0x12313e8d0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=29968 nc=2 flg=0x1100
n798n ( 0) lshl (in GPR_0017) (X>=0 ) [ 0x12313e920] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=29968 nc=2 flg=0x100
n799n ( 0) i2l (highWordZero X>=0 ) [ 0x12313e970] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1 flg=0x4100
n12n ( 0) ==>iRegLoad (in GPR_0017) (cannotOverflow )
n803n ( 0) iconst 1 (Unsigned X!=0 X>=0 ) [ 0x12313eab0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x4104
n804n ( 0) lconst -16 (X!=0 X<=0 ) [ 0x12313eb00] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
n805n ( 0) aladd (in GPR_0113) (X>=0 internalPtr ) [ 0x12313eb50] bci=[0,34,185] rc=0 vc=2947 vn=- li=57 udi=12960 nc=2 flg=0x8100
n7n ( 0) ==>newarray (in &GPR_0038) (X!=0 )
n807n ( 0) lconst 16 (highWordZero X!=0 X>=0 cannotOverflow ) [ 0x12313ebf0] bci=[0,34,185] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x5104
n1286n ( 0) iconst 0 (X==0 X>=0 X<=0 ) [ 0x1234d81b0] bci=[-1,0,161] rc=0 vc=2947 vn=- li=- udi=- nc=0 flg=0x302
n816n ( 0) iconst 0xff00ff00 (X!=0 X<=0 ) [ 0x12313eec0] bci=[0,10,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
n815n ( 0) i2l (in GPR_0114) (highWordZero X>=0 ) [ 0x12313ee70] bci=[0,5,179] rc=0 vc=2947 vn=- li=57 udi=13216 nc=1 flg=0x4100
n5n ( 1) ==>iRegLoad (in GPR_0018) (cannotOverflow )
n1287n ( 0) iconst -1 (X!=0 X<=0 ) [ 0x1234d8200] bci=[-1,0,161] rc=0 vc=2947 vn=- li=- udi=- nc=0 flg=0x204
This commit implements arraytranslateTRTO255 for AArch64. Signed-off-by: KONNO Kazuhiro <[email protected]>
1ad8c13
to
ea32e0b
Compare
Jenkins build aarch64,amac |
I responded to all the review comments so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assembly code looks reasonable to me.
TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0)); | ||
TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1)); | ||
TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4)); | ||
TR::Register *outputLenReg = cg->allocateRegister(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I think you can only get away with that if they are constants. Are they?
Jenkins build alinux,amac |
This commit implements arraytranslateTRTO255 for AArch64.