Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applies various fixes to writing and reading of Ion 1.1 system symbols in binary. #994

Merged
merged 1 commit into from
Nov 15, 2024

Conversation

tgregg
Copy link
Contributor

@tgregg tgregg commented Nov 15, 2024

Description of changes:

  • Write system symbol tokens using their SymbolSymbols_1_1 value, allowing binary to use the 0xEE opcode and FlexSym system opcodes.
  • Clean up / correct IVM writing.
  • Clean up / correct context resetting when an IVM is encountered in both the application and system binary readers.
  • Allow the core reader to resolve symbol text when it is available, which is only the case in Ion 1.1. This allows the system reader wrapper to correctly roundtrip Ion 1.1 data, though this is not the lowest-level transcode that can be done because it consumes encoding directives. A future PR will need to add an "encoding-level" transcode that can preserve encoding directives and macro invocations.
  • Add tests to exercise the fixes.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

}
if (valueTid == null || IonType.STRING != valueTid.type) {
throwDueToInvalidType(IonType.STRING);
String value;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from IonReaderContinuableApplicationBinary. Behavior for Ion 1.0 remains the same because getSymbol(sid) always returns null in that case. For Ion 1.1, symbols with known text (due to being system symbols or being read from encoding directives processed by the core reader) are now resolved. This enables a system-level (not encoding-level) transcode.

@@ -1031,34 +1009,6 @@ public SymbolTable getSymbolTable() {
return cachedReadOnlySymbolTable;
}

@Override
public String stringValue() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to IonReaderContinuableCoreBinary.

@@ -1116,7 +1118,7 @@ String getSymbol(int sid) {
*/
boolean matchesSystemSymbol_1_1(Marker marker, SystemSymbols_1_1 systemSymbol) {
if (marker.typeId == IonTypeID.SYSTEM_SYMBOL_VALUE) {
return marker.endIndex == systemSymbol.getId();
return systemSymbol.getText().equals(getSystemSymbolToken(marker).getText());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System symbol values are marked using the start and end index of the FixedUInt byte containing the symbol ID. getSystemSymbolToken properly handles that case.

int minorVersion = getIonMinorVersion();
resetImports(getIonMajorVersion(), minorVersion);
if (minorVersion > 0) {
// TODO reset macro table
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A future PR will work on adding encoding-level transcoding of Ion 1.1. This will be resolved in that PR.

@@ -2464,6 +2505,7 @@ protected final SymbolToken getSystemSymbolToken(Marker marker) {
if (marker.startIndex == -1) {
id = marker.endIndex;
} else {
prepareScalar();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, failures occurred when the system symbol token's FixedUInt byte occurred beyond the data currently buffered.

@@ -292,7 +306,7 @@ public <T> T asFacet(Class<T> facetType) {

@Override
public SymbolTable getSymbolTable() {
// TODO generalize for Ion 1.1
// TODO generalize for Ion 1.1, whose system symbol table is not necessarily active.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't affect the functionality of the reader because this method is no longer used internally and is unlikely to be used often externally. Nevertheless, this should be fixed once we have a SymbolTable-abstraction view of the Ion 1.1 system symbol table.

// TODO change the following once the Ion 1.1 symbol table is finalized. Probably:
// sid10 -> sid10 - SystemSymbols.ION_1_0_MAX_ID;
IntTransformer ION_1_0_SID_TO_ION_1_1_SID = IDENTITY_INT_TRANSFORMER;
IntTransformer ION_1_0_SID_TO_ION_1_1_SID = sid -> sid - SystemSymbols.ION_1_0_MAX_ID;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: ion-java-benchmark-cli uses this when transcoding data from Ion 1.0 to Ion 1.1.

@@ -761,7 +765,6 @@ internal class IonManagedWriter_1_1(

// Just in case—call finish to flush the current system values, then user values, and then write the IVM.
finish()
userData.writeIVM()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually harmful, as it would result in an IVM being written before the user data but after the system data. This caused any systemData encoding context to be dropped. It can be removed entirely because finish() sets needsIvm to true.

@@ -34,7 +34,6 @@ class Ion_1_1_RoundTripTest {
override val newWriterForAppendable: (Appendable) -> IonWriter = builder::build
}

@Disabled("Disabled because the text reader does not support Ion 1.1 encoding directives.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to delete this when text support for encoding directives was added. The tests pass.

0xE7, 0x01, 0x6A, // One FlexSym annotation, with opcode, opcode 6A = system symbol A = $ion_encoding
0xC6, // (
0xC5, 0xEE, 0x0F, // S-exp, system symbol 0xF = symbol_table
0xB2, 0x91, 'a', // ["a"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'a'

Nice! I had forgotten you can do this.

@tgregg tgregg merged commit 10b489d into ion-11-encoding Nov 15, 2024
16 of 17 checks passed
@tgregg tgregg deleted the ion-11-encoding-fix-system-symbols branch November 15, 2024 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants