Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Added transaction to kafka bindings as a first draft #257

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pkonopacki1
Copy link

Description

Information about transacional topic is important from the consumer perspective as it requires appropriate properties being setup on its side (isolation.level='read_committed'). Moreover, property like commit.interval.ms might affect latency on consumer side. This is why I propose adding transaction configuration object to kafka bindings. I'm curious about your opinion on that topic.

@dalelane
Copy link
Collaborator

In principle, I agree with the objective here - knowing if a producer is using transactions is important information, and would be good to include somewhere in the spec.

As for the specifics of where to put this, I feel that - as a description of a producer behaviour - it would be better to put this information in the operation binding (rather than the channel binding where you have proposed it). This is just my personal opinion though - I'm interested to see what others think.

@chrispatmore
Copy link

Throwing in my 2 cents:

I agree that knowing whether or not a particular channel is transactional, or used in a transactional way is important. From the POV
of an application implementing the AsyncAPI I think two things are important to know:

  1. If they should filter out aborted transactions from a particular channel (use read_committed).
  2. If a transaction should be performed between two channels in the document. e.g. the application should receive from channel A, and send to channel B via a transaction.

It looks like the proposal addresses case 1, which is simpler and a good start in this area.

In my personal opinion, this belongs in the receiver operation binding. i.e. the AsyncAPI document says this application should read only committed transactions from the referenced channel. Rather than saying this channel may contain aborted transactions.

Additionally, I wonder if this fits better with how problem 2 may be solved, which I would propose as a new action alongside send / receive. To describe a specific single operation the application should perform using two channels. Interested to hear what others think about this area

@pkonopacki1
Copy link
Author

Throwing in my 2 cents:

I agree that knowing whether or not a particular channel is transactional, or used in a transactional way is important. From the POV of an application implementing the AsyncAPI I think two things are important to know:

  1. If they should filter out aborted transactions from a particular channel (use read_committed).
  2. If a transaction should be performed between two channels in the document. e.g. the application should receive from channel A, and send to channel B via a transaction.

It looks like the proposal addresses case 1, which is simpler and a good start in this area.

In my personal opinion, this belongs in the receiver operation binding. i.e. the AsyncAPI document says this application should read only committed transactions from the referenced channel. Rather than saying this channel may contain aborted transactions.

Additionally, I wonder if this fits better with how problem 2 may be solved, which I would propose as a new action alongside send / receive. To describe a specific single operation the application should perform using two channels. Interested to hear what others think about this area

So, are you proposing that the transacional options should contain field like "read_committed: true"? I'm not sure if this is necessary, because this is the result of using transaction and can be concluded from the fact of using transaction itself.

Regarding mapping more channels within transactions options, I was thinking about that but could not think of a good solution. This would require mapping on API root level, otherwise all channels would have to be mapped several times on each channel itself. And this also does not solve the cross-api transactions, which are possible.

@chrispatmore
Copy link

chrispatmore commented Aug 9, 2024

The way I see it, the AysncAPI document exists as a set of instructions for how a client application should behave. and in this case we are dealing with the issue of a channel containing non commited messages (due to a transaction) and that because of this the consumer should be told whether it should read only commited messages or whether it should accept uncommitted ones.

This is a choice for the client that we should enable document writers to be specific about, it may be that they want the client to read non commited messages from this topic for some reason for example.

Additionally I am not sure how we can include transaction.id in the spec, as, as I understand it this must be unique to a client to avoid zombie instances breaking exactly once semantics for the transaction

I guess the way I am thinking about it is this (written as much for my own clarity as anything else):

There are three relevant client operations for this problem:

  1. Client receiving messages from a channel which contains uncommited messages
    • Relevant point IMO is should said client receive the uncommited messages or not
  2. Client sending messages to a channel which contains uncommited messages
    • I don't think this case matters / is particularly important, could be wrong
  3. Client performs a transaction, receiving from 1..n channels and sending to 1..n channels as part of it
    • There is no capacity to describe this at all at the moment in the document
    • I consider a transaction to be a single operation (akin to receive / send) as it's intended to be an atomic action
    • things relevant to the client here would be:
      • what channels (from what servers (different protocols?)) should the client receive from, and what channels should the client send to
      • how often should it try to commit

@pkonopacki1
Copy link
Author

The way I see it, the AysncAPI document exists as a set of instructions for how a client application should behave. and in this case we are dealing with the issue of a channel containing non commited messages (due to a transaction) and that because of this the consumer should be told whether it should read only commited messages or whether it should accept uncommitted ones.

This is a choice for the client that we should enable document writers to be specific about, it may be that they want the client to read non commited messages from this topic for some reason for example.

Additionally I am not sure how we can include transaction.id in the spec, as, as I understand it this must be unique to a client to avoid zombie instances breaking exactly once semantics for the transaction

I guess the way I am thinking about it is this (written as much for my own clarity as anything else):

There are three relevant client operations for this problem:

  1. Client receiving messages from a channel which contains uncommited messages

    • Relevant point IMO is should said client receive the uncommited messages or not
  2. Client sending messages to a channel which contains uncommited messages

    • I don't think this case matters / is particularly important, could be wrong
  3. Client performs a transaction, receiving from 1..n channels and sending to 1..n channels as part of it

    • There is no capacity to describe this at all at the moment in the document

    • I consider a transaction to be a single operation (akin to receive / send) as it's intended to be an atomic action

    • things relevant to the client here would be:

      • what channels (from what servers (different protocols?)) should the client receive from, and what channels should the client send to
      • how often should it try to commit

You are right, I though initially that consumer should always read only committed messages but it could be other case also. But than in the end, the binding could be as simple as:

opeartions
  operationName:
    bindings:
      kafka:
        transaction:
          transactional: true
          readCommitted: true

@chrispatmore
Copy link

Is the extra field transactional: true implicit in having transaction.readCommitted = true / false or some other field in there being present (if there were other fields)?

@pkonopacki1
Copy link
Author

Hmm, it would be implicitly true therefore maybe this one is not needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants