Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict mode for CBOR #2662

Open
timmc opened this issue May 4, 2024 · 16 comments
Open

Strict mode for CBOR #2662

timmc opened this issue May 4, 2024 · 16 comments
Labels

Comments

@timmc
Copy link
Contributor

timmc commented May 4, 2024

What is your use-case and why do you need this feature?

My app needs to be able to sign and verify serialized data, which requires that the parse be unambiguous. The main problem is duplicate map keys. If there are repeated keys, then two recipients may disagree on what the signed data says, which is obviously a problem. :-) I've written on this kind of parser mismatch vulnerability previously: https://www.brainonfire.net/blog/2022/04/29/preventing-parser-mismatch/

The CBOR implementation currently accepts duplicate keys by using the last one encountered, but the spec says in §2.1:

A map that has duplicate keys may be well-formed, but it is not valid

§3.10 "Strict Mode" then waffles a bit and puts the onus on the sender to "avoid ambiguously decodable data" and defines an optional strict mode that decoders are not required to implement, particularly calling out duplicate key detection as a possibly heavy-weight requirement.

However, kotlinx.serialization already has the ability to detect duplicate keys because it builds up a HashMap or similar which can simply be consulted before insertion. (There's no streaming interface that I'm aware of.) The language in §3.10 is likely aimed at embedded and other highly constrained environments, which isn't particularly relevant to Kotlin.

Describe the solution you'd like

  1. Generally support rejection of repeated map keys, as an optional feature.
  2. Expose this as part of a new strict-mode feature in the Cbor implementation, enabled by default to comply with §2.1.

(I'd also love for the duplicate key rejection to be enabled by default for all formats, as this is a known vulnerability in many, many protocols and people should be protected by default—but still able to explicitly opt out of this protection if they know what they're doing. Major version bump, I know.)

Note: #1990 covers rejection of duplicate keys, but it wasn't clear if that was meant to be specific to JSON.

@timmc timmc added the feature label May 4, 2024
@timmc
Copy link
Contributor Author

timmc commented May 4, 2024

(Updated with more info from the spec.)

@sandwwraith
Copy link
Member

Yes, it sounds very reasonable. I do not mind adding it at some point in the future or via contribution.

@timmc
Copy link
Contributor Author

timmc commented May 6, 2024

Great, I may try to work up a PR. Where would you recommend getting started? I found insertKeyValuePair in CollectionSerializers—but it didn't seem that the tests actually exercise that code path, and I can't find anything that calls insertKeyValuePair.

@pdvrieze
Copy link
Contributor

pdvrieze commented May 6, 2024

@timmc What happens from a format perspective is that it recognizes that this is a collection serializer. Then it will call the update decoder (at some point it was part of DeserializerStrategy, but it is no longer) with the previous list value.That update will create/update the collection with the new value (or not). It would be possible to create a new version of this function that takes a strict argument (with the default implementation of the previous one passing false for strictness).

@timmc
Copy link
Contributor Author

timmc commented May 7, 2024

Would you be able to point me to a code location? I'm not finding useful references to update, CollectionSerializer, etc. I just discovered that LinkedHashMapSerializer extends LinkedHashMap, but I'm having trouble finding the code that actually puts key-value pairs into a map when deserializing map data.

timmc added a commit to timmc/kotlinx.serialization that referenced this issue May 10, 2024
@timmc
Copy link
Contributor Author

timmc commented May 10, 2024

OK, found it -- builder[key] = value in MapLikeSerializer. Starting a branch here: dev...timmc:timmc/cbor-strict

@timmc
Copy link
Contributor Author

timmc commented May 11, 2024

I've found it straightforward to reject duplicate keys. Making it configurable would be a larger undertaking—I can either plumb it through via the serializers or via the descriptors, but there's not really a clean way to do it since there's no config object being passed along. Instead, I would have to add allowDuplicateKeys: Boolean to a large number of method and constructor signatures, and I still haven't found the end of that chain.

Would you accept a PR that just forbids duplicate keys in maps and objects, with an option to later add a configuration option that permits duplicate keys?

@sandwwraith
Copy link
Member

@timmc I don't think it is a good idea to change existing behavior so that people would start receiving exceptions on a code that worked before, especially without an option to bring old behavior back. So it's better to have it. Flags are already stored in Cbor object, which is available in every CborReader. So you need to pass this object further down to CborDecoder, and that's likely it.

@pdvrieze
Copy link
Contributor

@timmc The code involved is:

final override fun readElement(decoder: CompositeDecoder, index: Int, builder: Builder, checkIndex: Boolean) {
val key: Key = decoder.decodeSerializableElement(descriptor, index, keySerializer)
val vIndex = if (checkIndex) {
decoder.decodeElementIndex(descriptor).also {
require(it == index + 1) { "Value must follow key in a map, index for key: $index, returned index for value: $it" }
}
} else {
index + 1
}
val value: Value = if (builder.containsKey(key) && valueSerializer.descriptor.kind !is PrimitiveKind) {
decoder.decodeSerializableElement(descriptor, vIndex, valueSerializer, builder.getValue(key))
} else {
decoder.decodeSerializableElement(descriptor, vIndex, valueSerializer)
}
builder[key] = value

and

public fun merge(decoder: Decoder, previous: Collection?): Collection {
val builder = previous?.toBuilder() ?: builder()
val startIndex = builder.builderSize()
val compositeDecoder = decoder.beginStructure(descriptor)
if (compositeDecoder.decodeSequentially()) {
readAll(compositeDecoder, builder, startIndex, readSize(compositeDecoder, builder))
} else {
while (true) {
val index = compositeDecoder.decodeElementIndex(descriptor)
if (index == CompositeDecoder.DECODE_DONE) break
readElement(compositeDecoder, startIndex + index, builder)
}
}
compositeDecoder.endStructure(descriptor)
return builder.toResult()
}
override fun deserialize(decoder: Decoder): Collection = merge(decoder, null)
private fun readSize(decoder: CompositeDecoder, builder: Builder): Int {
val size = decoder.decodeCollectionSize(descriptor)
builder.checkCapacity(size)
return size
}
protected abstract fun readElement(decoder: CompositeDecoder, index: Int, builder: Builder, checkIndex: Boolean = true)

It is important to note that for some formats it is valid to read list/map items one by one (for example in Protobuf):

private fun <T> deserializeMap(deserializer: DeserializationStrategy<T>, previousValue: T?): T {
val serializer = (deserializer as MapLikeSerializer<Any?, Any?, T, *>)
// Yeah thanks different resolution algorithms
val mapEntrySerial =
kotlinx.serialization.builtins.MapEntrySerializer(serializer.keySerializer, serializer.valueSerializer)
val oldSet = (previousValue as? Map<Any?, Any?>)?.entries
val setOfEntries = (SetSerializer(mapEntrySerial) as AbstractCollectionSerializer<Map.Entry<Any?, Any?>, Set<Map.Entry<Any?, Any?>>, *>).merge(this, oldSet)
return setOfEntries.associateBy({ it.key }, { it.value }) as T
}

This later case requires special handling of the MapSerializer. So you have two separate cases to support this generically (reading multiple items at a time, directly in MapSerializer; and repeatedly reading sublists).

@sandwwraith
Copy link
Member

I'm not sure it is a good idea to add changes to CollectionSerializer since it is used by all formats. You also need to check whether regular objects have duplicate keys. It seems you have to track keys manually, with some kind of Set.

@pdvrieze
Copy link
Contributor

@sandwwraith Indeed modifying CollectionSerializer would be a major undertaking with various compatibility concerns/challenges.

@timmc
Copy link
Contributor Author

timmc commented May 19, 2024

I don't think CborMapReader has enough information to build and check a list of keys. By the time decodeElementIndex has been called (in the parent class CborListReader), MapLikeSerializer.readElement has already called decodeSerializableElement to read the key.

Maybe it would make sense for MapLikeSerializer to call a new CompositeDecoder.visitedKey method that accepts a key and then has the option of throwing an exception?

@pdvrieze
Copy link
Contributor

pdvrieze commented May 19, 2024

Maybe it would make sense for MapLikeSerializer to call a new CompositeDecoder.visitedKey method that accepts a key and then has the option of throwing an exception?

I like this solution (even though some bits of the name/semantics could be "improved"). The function can have a default implementation that results in the current behaviour. Note that you should probably also think about the set serializer (sets also don't allow duplicate values/only keep the last one).

I had a look at the library source code. You probably want to do something that checks the result of put/add which normally return the previous value. Doing this will maintain the efficiency of the implementation (contains checks are not cheap)

timmc added a commit to timmc/kotlinx.serialization that referenced this issue May 19, 2024
Fixes Kotlin#2662 by adding
a `visitKey` method to `CompositeDecoder`; map and set serializers should
call this so that decoders have an opportunity to throw an error when a
duplicate key is detected.

Also fixes a typo in an unrelated method docstring.
timmc added a commit to timmc/kotlinx.serialization that referenced this issue May 19, 2024
Fixes Kotlin#2662 by adding
a `visitKey` method to `CompositeDecoder`; map and set serializers should
call this so that decoders have an opportunity to throw an error when a
duplicate key is detected.

Also fixes a typo in an unrelated method docstring.
timmc added a commit to timmc/kotlinx.serialization that referenced this issue May 19, 2024
Fixes Kotlin#2662 by adding
a `visitKey` method to `CompositeDecoder`; map and set serializers should
call this so that decoders have an opportunity to throw an error when a
duplicate key is detected.

Also fixes a typo in an unrelated method docstring.
timmc added a commit to timmc/kotlinx.serialization that referenced this issue May 20, 2024
Fixes Kotlin#2662 by adding
a `visitKey` method to `CompositeDecoder`; map and set serializers should
call this so that decoders have an opportunity to throw an error when a
duplicate key is detected.

A new config option `Cbor.allowDuplicateKeys` can be set to false to
enable this new behavior. This can form the basis of a Strict Mode in the
future.

Also fixes a typo in an unrelated method docstring.
@timmc
Copy link
Contributor Author

timmc commented May 20, 2024

Yeah, it seems to work out OK -- I have #2681 which still needs some work but should be on the right path.

@timmc
Copy link
Contributor Author

timmc commented Nov 30, 2024

I'm coming back to this work and before I try to get my PR updated against recent changes, I'm wondering if there's still any interest in this issue? There are some significant merge conflicts and I don't want to work through them if the changes would be unwelcome.

I also noticed that there is now a COSE compliance mode, but the COSE spec indicates that parsing duplicate keys is illegal:

Applications MUST NOT parse and process messages with the same label used twice as a key in a single map.

(from https://www.rfc-editor.org/rfc/rfc8152.txt section 14 "CBOR Encoder Restrictions")

Did this restriction get implemented somewhere after all, or is it still missing?

timmc added a commit to timmc/kotlinx.serialization that referenced this issue Nov 30, 2024
Fixes Kotlin#2662 by adding
a `visitKey` method to `CompositeDecoder`; map and set serializers should
call this so that decoders have an opportunity to throw an error when a
duplicate key is detected.

A new config option `forbidDuplicateKeys` can be set to true to enable this
new behavior. This can form the basis of a Strict Mode in the future.
@timmc
Copy link
Contributor Author

timmc commented Nov 30, 2024

(I ended up just replaying my changes on top of dev, since that was more expedient. But I'd still like to know if the changes are wanted, and the answer to my COSE question.)

timmc added a commit to timmc/kotlinx.serialization that referenced this issue Nov 30, 2024
Fixes Kotlin#2662 by adding
a `visitKey` method to `CompositeDecoder`; map and set serializers should
call this so that decoders have an opportunity to throw an error when a
duplicate key is detected. Calls to this method are added to:

- `MapLikeSerializer.readElement` to handle maps
- `CborReader.decodeElementIndex` to handle data classes

A new config option `forbidDuplicateKeys` can be set to true to enable this
new behavior. This can form the basis of a Strict Mode in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants