Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extraneous bytes in RotateEvent file name #189

Open
andrenth opened this issue Apr 22, 2022 · 7 comments
Open

Extraneous bytes in RotateEvent file name #189

andrenth opened this issue Apr 22, 2022 · 7 comments

Comments

@andrenth
Copy link

Hello

I'm using mysql_async to write a program that tails the MariaDB binlog. I'm using MariaDB 10.5.15. The server is configured with the following binlog-related settings:

[mysqld]
log-bin = mariadb-bin
binlog-format = row
binlog-row-metadata = full

The following binlogs are available:

> show binary logs;
+--------------------+-----------+
| Log_name           | File_size |
+--------------------+-----------+
| mariadb-bin.000001 |      1074 |
| mariadb-bin.000002 |      5664 |
+--------------------+-----------+
2 rows in set (0.000 sec)

When running the program below, I'm seeing extraneous characters in the result of the name() and name_raw() methods in the RotateEvent struct:

use futures_util::StreamExt;
use mysql_async::{binlog::events::EventData, BinlogRequest, Conn, OptsBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let opts = OptsBuilder::default()
        .ip_or_hostname("localhost")
        .tcp_port(3306)
        .user(Some("user"))
        .pass(Some("pass"));

    let conn = Conn::new(opts).await?;
    let req = BinlogRequest::new(1)
        .with_filename("mariadb-bin.000001".as_bytes())
        .with_pos::<u64>(4);

    let mut stream = conn.get_binlog_stream(req).await?;
    while let Some(Ok(event)) = stream.next().await {
        let data = event.read_data()?.unwrap();
        match data {
            EventData::RotateEvent(e) => {
                println!("[-] rotate event");
                println!("    - header pos={}", event.header().log_pos());
                println!("    - name={}", e.name().to_string());
                println!("    - raw={:?}", e.name_raw());
                println!("    - position={}", e.position());
            }
            _ => {}
        }
    }

    Ok(())
}

The program output is as follows:

[-] rotate event
    - header pos=0
    - name=mariadb-bin.000001R<sT
    - raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 49, 82, 60, 115, 84]
    - position=4
[-] rotate event
    - header pos=0
    - name=mariadb-bin.000002
    - raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50]
    - position=4

Notice that in the first event, there are 4 extra bytes after the ones corresponding to mariadb-bin.000001, 82, 60, 115, 84, which I've found to be unexpected.

If I change the BinlogRequest as follows (pointing it to the end of the last binlog file):

    let req = BinlogRequest::new(1)
        .with_filename("mariadb-bin.000002".as_bytes())
        .with_pos::<u64>(5664);

then I get extra bytes in the name of that file:

[-] rotate event
    - header pos=0
    - name=mariadb-bin.000002?���
    - raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50, 63, 188, 203, 26]
    - position=5446

Do you know what could be causing this?

@blackbeam
Copy link
Owner

Hi. Most likely a bug in the parser. Could you, please, attach the files?

@andrenth
Copy link
Author

Here they are

mariadb-bin.000001.zip
mariadb-bin.000002.zip

@blackbeam
Copy link
Owner

Well, it seems that this rotate event is FAKE_ROTATE_EVENT, i.e. it is not written into the binlog file. I believe the issue is the following:

Client must read FORMAT_DESCRIPTION_EVENT (FDE) to be able to correctly handle event footer (i.e. to know that there should be (or should not be) a checksum), but FAKE_ROTATE_EVENT goes first and there is no way for a client to guess server configuration and to distinguish whether the footer is empty or it is not empty and contains a checksum. It seems that your server configured in a way that it adds checksums to binlog events, so this 4 additional bytes is most likely a checksum.

I'm not sure how to correctly handle this. For now I suggest you to delay this event until FDE is received. Then you should be able to chop off a checksum if it's necessary (you can decide by looking at (fde.footer().get_checksum_alg())

@andrenth
Copy link
Author

Yeah, I’ve just checked and the server is configured with binlog_checksum = CRC32.

The Go canal package seems to identify these fake events by checking if the header position is zero. But in my first example above, where I read from the first binlog file, the header position is also zero in the second rotate event, which presumably is not a fake one…

I’ll try to use the FDE and report back, but probably only on Monday.

@andrenth
Copy link
Author

This seems to have worked, though it seems weird to me because the stripped-off bytes don't match the result from event.calc_checksum():

use futures_util::StreamExt;
use mysql_async::{
    binlog::events::{BinlogEventFooter, EventData},
    BinlogRequest, Conn, OptsBuilder, Result,
};

#[tokio::main]
async fn main() -> Result<()> {
    let opts = OptsBuilder::default()
        .ip_or_hostname("localhost")
        .tcp_port(3306)
        .user(Some("user"))
        .pass(Some("pass"));

    let conn = Conn::new(opts).await?;
    let req = BinlogRequest::new(1).with_pos::<u64>(4);

    let mut stream = conn.get_binlog_stream(req).await?;
    let mut name = "".to_owned();

    while let Some(Ok(event)) = stream.next().await {
        let data = event.read_data()?.unwrap();
        match data {
            EventData::RotateEvent(e) => {
                println!("[-] rotate event");
                if name.len() == 0 && event.header().timestamp() == 0 {
                    // first fake rotate event
                    name = if let Ok(Some(alg)) = event.fde().footer().get_checksum_alg() {
                        let cksum = event.calc_checksum(alg);
                        println!("    - checksum={:?}", u32::to_be_bytes(cksum));
                        let raw = e.name_raw();
                        let raw = &raw[..raw.len() - BinlogEventFooter::BINLOG_CHECKSUM_LEN];
                        String::from_utf8_lossy(raw).to_string()
                    } else {
                        e.name().to_string()
                    }
                } else {
                    name = e.name().to_string();
                }
                println!("    - name={}", name);
                println!("    - orig name={:?}", e.name_raw());
            }
            _ => {}
        }
    }

    Ok(())
}

Output:

[-] rotate event
    - checksum=[33, 68, 223, 28]
    - name=mariadb-bin.000001
    - orig name=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 49, 82, 60, 115, 84]
[-] rotate event
    - name=mariadb-bin.000002
    - orig name=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50]

@andrenth
Copy link
Author

Do you think this workaround makes sense? I'm not sure if the calculated checksums are really supposed to match with extra bytes after the file name.

@andrenth
Copy link
Author

I've tried replacing MariaDB with MySQL, and in that setup the extra bytes do not appear, even with binlog_checksum = CRC32. In MySQL, event.fde().footer().get_checksum_alg() returns BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_OFF, while in MariaDB it returns BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_CRC32.

I've also noticed that the extra bytes only seem to appear in the first event yielded by the BinlogStream, and this seems to happen regardless of the file and position parameters given to the BinlogRequest. So currently I'm using this code to extract the binlog file name from a RotateEvent, where the first parameter is only true for the first event of the stream.

fn binlog_file_name(event: &RotateEvent, footer: &BinlogEventFooter, first: bool) -> String {
    if !first {
        return event.name().to_string();
    }   
    if let Ok(Some(BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_CRC32)) = footer.get_checksum_alg() {
        let name = event.name_raw();
        let name = &name[..name.len() - BinlogEventFooter::BINLOG_CHECKSUM_LEN];
        return String::from_utf8_lossy(name).to_string();
    }   
    event.name().to_string()
}

blackbeam added a commit to blackbeam/rust_mysql_common that referenced this issue May 31, 2022
xmas7 pushed a commit to RubyOnWorld/rust_mysql_common that referenced this issue Sep 6, 2022
abcpro1 added a commit to abcpro1/dozer that referenced this issue Aug 27, 2023
MariaDB generates some artificial binlog events that should be skipped; check out https://mariadb.com/kb/en/fake-rotate_event/.
This event, in particular, sometimes includes an invalid binlog filename. See blackbeam/mysql_async#189.
I found that it's safe to just skip artificial events.

MariaDB reports `longtext` as a `column_type` for JSON columns. I've added a special check to distinguish MariaDB JSON columns from text columns.
abcpro1 added a commit to abcpro1/dozer that referenced this issue Aug 28, 2023
MariaDB generates some artificial binlog events that should be skipped; check out https://mariadb.com/kb/en/fake-rotate_event/.
This event, in particular, sometimes includes an invalid binlog filename. See blackbeam/mysql_async#189.
I found that it's safe to just skip artificial events.

MariaDB reports `longtext` as a `column_type` for JSON columns. I've added a special check to distinguish MariaDB JSON columns from text columns.
abcpro1 added a commit to abcpro1/dozer that referenced this issue Aug 28, 2023
MariaDB generates some artificial binlog events that should be skipped; check out https://mariadb.com/kb/en/fake-rotate_event/.
This event, in particular, sometimes includes an invalid binlog filename. See blackbeam/mysql_async#189.
I found that it's safe to just skip artificial events.

MariaDB reports `longtext` as a `column_type` for JSON columns. I've added a special check to distinguish MariaDB JSON columns from text columns.
github-merge-queue bot pushed a commit to getdozer/dozer that referenced this issue Aug 28, 2023
* test: unittests for the MySQL connector

Increase the coverage of the MySQL connector implementation above 90% with a combination of unittests and integration tests.

* fix: mariadb

MariaDB generates some artificial binlog events that should be skipped; check out https://mariadb.com/kb/en/fake-rotate_event/.
This event, in particular, sometimes includes an invalid binlog filename. See blackbeam/mysql_async#189.
I found that it's safe to just skip artificial events.

MariaDB reports `longtext` as a `column_type` for JSON columns. I've added a special check to distinguish MariaDB JSON columns from text columns.

* test: mariadb

Perform integration tests on both MySQL and MariaDB.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants