Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TC358743: Corrupted image on capturing 1080p@50Hz at RGB24 with 4 lanes #6322

Open
mdevaev opened this issue Aug 23, 2024 · 32 comments
Open

Comments

@mdevaev
Copy link
Contributor

mdevaev commented Aug 23, 2024

Describe the bug

When I'm trying to capture image with RGB24 format at 1080p@50Hz, I'm getting corrupted image like this:

image

At the same time, 60Hz is working fine and 30Hz is fine too. There seems to be something wrong with either stride or synchronization. UYVY is working fine at all frequencies.

Steps to reproduce the behaviour

  • Typical setup with TC358743 board with CM4 and 4 lanes.
  • EDID: https://github.com/pikvm/kvmd/blob/master/configs/kvmd/edid/v4plus.hex
  • Run capture software and choose input format RGB24 instead of the default UYVY:
    • gst-launch-1.0 -v -e v4l2src io-mode=4 ! video/x-raw,format=RGB ! capssetter caps="video/x-raw,format=BGR" ! v4l2convert output-io-mode=5 capture-io-mode=4 ! video/x-raw,width=1920,height=1080,format=BGRA ! kmssink (see on HDMI display)
    • ustreamer --host:: --port=8080 --format=rgb24 --format-swap-rgb --dv-timings --persistent (see on http://pi:8080).

Device (s)

Raspberry Pi CM4

System

  • Arch Linux ARM
  • Jul 25 2024 15:00:11
    Copyright (c) 2012 Broadcom
    version 8a2829a2ff4d4808cc14e10de7068e6afb45c96d (tainted) (release) (start)
    
  • Linux pikvm 6.6.45-2-rpi #1 SMP Wed Aug 21 00:26:23 UTC 2024 armv7l GNU/Linux

Logs

[root@pikvm ~]# v4l2-ctl --log

Status Log:

   [176760.132681] unicam fe801000.csi: =================  START STATUS  =================
   [176760.144093] tc358743 10-000f: -----Chip status-----
   [176760.150469] tc358743 10-000f: Chip ID: 0x00
   [176760.155914] tc358743 10-000f: Chip revision: 0x00
   [176760.161176] tc358743 10-000f: Reset: IR: 1, CEC: 1, CSI TX: 0, HDMI: 0
   [176760.168314] tc358743 10-000f: Sleep mode: off
   [176760.173386] tc358743 10-000f: Cable detected (+5V power): yes
   [176760.180317] tc358743 10-000f: DDC lines enabled: yes
   [176760.186446] tc358743 10-000f: Hotplug enabled: yes
   [176760.192419] tc358743 10-000f: CEC enabled: no
   [176760.197315] tc358743 10-000f: -----Signal status-----
   [176760.202902] tc358743 10-000f: TMDS signal detected: yes
   [176760.208703] tc358743 10-000f: Stable sync signal: yes
   [176760.214266] tc358743 10-000f: PHY PLL locked: yes
   [176760.219467] tc358743 10-000f: PHY DE detected: yes
   [176760.231390] tc358743 10-000f: Detected format: 1920x1080p50.00 (2640x1125)
   [176760.238793] tc358743 10-000f: horizontal: fp = 0, -sync = 720, bp = 0
   [176760.245752] tc358743 10-000f: vertical: fp = 0, -sync = 45, bp = 0
   [176760.252406] tc358743 10-000f: pixelclock: 148500000
   [176760.257739] tc358743 10-000f: flags (0x0):
   [176760.262322] tc358743 10-000f: standards (0x0):
   [176760.267215] tc358743 10-000f: Configured format: 1920x1080p50.00 (2640x1125)
   [176760.274765] tc358743 10-000f: horizontal: fp = 0, -sync = 720, bp = 0
   [176760.281632] tc358743 10-000f: vertical: fp = 0, -sync = 45, bp = 0
   [176760.288236] tc358743 10-000f: pixelclock: 148500000
   [176760.293564] tc358743 10-000f: flags (0x0):
   [176760.298055] tc358743 10-000f: standards (0x0):
   [176760.302886] tc358743 10-000f: -----CSI-TX status-----
   [176760.308336] tc358743 10-000f: Lanes needed: 3
   [176760.313071] tc358743 10-000f: Lanes in use: 3
   [176760.318503] tc358743 10-000f: Waiting for particular sync signal: no
   [176760.325898] tc358743 10-000f: Transmit mode: no
   [176760.331384] tc358743 10-000f: Receive mode: no
   [176760.336802] tc358743 10-000f: Stopped: no
   [176760.341163] tc358743 10-000f: Color space: RGB 888 24-bit
   [176760.347434] tc358743 10-000f: -----HDMI status-----
   [176760.352667] tc358743 10-000f: HDCP encrypted content: no
   [176760.358343] tc358743 10-000f: Input color space: xvYCC 709 limited range
   [176760.365948] tc358743 10-000f: AV Mute: off
   [176760.370934] tc358743 10-000f: Deep color mode: 8-bits per channel
   [176760.379888] tc358743 10-000f: HDMI infoframe: Auxiliary Video Information (AVI), version 2, length 13
   [176760.389810] tc358743 10-000f:     colorspace: YCbCr 4:4:4
   [176760.395638] tc358743 10-000f:     scan mode: No Data
   [176760.401001] tc358743 10-000f:     colorimetry: Extended
   [176760.406624] tc358743 10-000f:     picture aspect: 16:9
   [176760.412193] tc358743 10-000f:     active aspect: Same as Picture
   [176760.418605] tc358743 10-000f:     itc: No Data
   [176760.423473] tc358743 10-000f:     extended colorimetry: xvYCC 709
   [176760.429959] tc358743 10-000f:     quantization range: Default
   [176760.436099] tc358743 10-000f:     nups: Unknown Non-uniform Scaling
   [176760.442802] tc358743 10-000f:     video code: 31
   [176760.447806] tc358743 10-000f:     ycc quantization range: Limited
   [176760.454315] tc358743 10-000f:     hdmi content type: Graphics
   [176760.460449] tc358743 10-000f:     pixel repeat: 0
   [176760.465537] tc358743 10-000f:     bar top 0, bottom 0, left 0, right 0
   [176760.472471] unicam fe801000.csi: -----Receiver status-----
   [176760.478367] unicam fe801000.csi: V4L2 width/height:   1920x1080
   [176760.484745] unicam fe801000.csi: Mediabus format:     0000100a
   [176760.490984] unicam fe801000.csi: V4L2 format:         33424752
   [176760.497237] unicam fe801000.csi: Unpacking/packing:   0 / 0
   [176760.503254] unicam fe801000.csi: ----Live data----
   [176760.508467] unicam fe801000.csi: Programmed stride:   5760
   [176760.514375] unicam fe801000.csi: Detected resolution: 2880x1080
   [176760.520751] unicam fe801000.csi: Write pointer:       f015d9f0
   [176760.527057] unicam fe801000.csi: ==================  END STATUS  ==================

Additional context

No response

@6by9
Copy link
Contributor

6by9 commented Aug 23, 2024

I suspect it's the FIFO trigger point in the TC358743 for matching HDMI rate to CSI2 rate.
Toshiba provide a massive spreadsheet for calculating the value, but it's not replicated in the driver.
https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/media/i2c/tc358743.c#L1942-L1943

The mainline driver only cared about 720p60 and 1080p60 over 4 lanes, and only using a CSI2 link data rate of 594Mbit/s/lane.
Downstream we added 972Mbit/s/lane to extend the modes supported over 2 lanes.

Revert to the 594Mbit/s mode by using dtoverlay=tc358743,link-frequency=297000000 and you may find it works better in those modes, however there may be other modes that don't work with that setting. I'm not sure that 24bpp RGB 1080p60 fits to the bandwidth available there (Ignoring blanking, 1920108060*24 = 2,985,984,000, whereas 594Mbit/s/lane * 4 lanes = 2,376,000,000). I vaguely remember mainline saying they only cared about the UYVY mode.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

Sorry, ignore my stupid deleted comment :) I'll try link-frequency now.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

Added link-frequency=297000000 doesn't work for 1080p@50Hz at RGB at all (tried with and without4lane=1 with same results):

[   58.956154] unicam fe801000.csi: Device has requested 5 data lanes, which is >4 configured in DT

Toshiba provide a massive spreadsheet for calculating the value, but it's not replicated in the driver.

I have this spreadsheet, but I don't understand exactly what I should do here, my expertise in this area is not enough :/

I vaguely remember mainline saying they only cared about the UYVY mode.

I also read something similar. But without RGB, the display of the screen in DRM will not work (for the same function for which you recently fixed H.264 for us).

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

I could probably add a fifo level for 1080p@50hz/RGB, but I don't quite understand what the more general condition for switching to this level is. Apparently, some other video modes may have problems too.

@6by9
Copy link
Contributor

6by9 commented Aug 23, 2024

Added link-frequency=297000000 doesn't work for 1080p@50Hz at RGB at all (tried with and without4lane=1 with same results):

[   58.956154] unicam fe801000.csi: Device has requested 5 data lanes, which is >4 configured in DT

My hunch was right then that the default configuration had insufficient bandwidth for 1080p60 RGB24.

Toshiba provide a massive spreadsheet for calculating the value, but it's not replicated in the driver.

I have this spreadsheet, but I don't understand exactly what I should do here, my expertise in this area is not enough :/

Put your timing (1080p50 = 148.500 1920/528/44/148/+ 1080/4/5/36/+) in in row 13, select "original" in B14.
Clocking registers - C29 = 27, C30 = 4, C31 = 500M-1G, C32 = 144. Should give C34 = 972.

"FIFO Delay" field is the one you're looking for at H24. I get min 120, max 511, which implies that the driver setting of 374 isn't being set.

i2ctransfer -f -y 10 w2@0x0f 0x00 0x06 r2@0x0f should read back the value, and I'd hope it would be 0x01 0x76 (374 decimal).
i2ctransfer -f -y 10 w4@0x0f 0x00 0x06 0x01 0x76 to write that value, or alter it to see if you can find a setting that works.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

Hm, I've tried 511 and it broke both 60 and 50 Hz modes.

Something wrong:

[root@pikvm ~]# i2ctransfer -f -y 10 r2@0x0f 0x00 0x06 r2@0x0f
Error: Invalid direction
Error: faulty argument is '0x00'
[root@pikvm ~]# i2ctransfer -f -y 10 r4@0x0f 0x00 0x06 0x01 0x76
Error: Invalid direction
Error: faulty argument is '0x00'

@6by9
Copy link
Contributor

6by9 commented Aug 23, 2024

Typo by me
i2ctransfer -f -y 10 w2@0x0f 0x00 0x06 r2@0x0f
wN = write N bytes.
rN = read N bytes.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

Unchanged 374 value:

[root@pikvm ~]# i2ctransfer -f -y 10 w2@0x0f 0x00 0x06 r2@0x0f
0x76 0x01

But is it a right order?..

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

Ok, the order is right. I found that minimal value you calculated makes 50Hz working, and also fixing 720p@30Hz.

i2ctransfer -f -y 10 w4@0x0f 0x00 0x06 0x78 0x00

@6by9
Copy link
Contributor

6by9 commented Aug 23, 2024

But is it a right order?..

4.13.3 I2C Read Access Translation
Note that data transferred on the I2C bus is sent LSB first

I found that minimal value you calculated makes 50Hz working, and also fixing 720p@30Hz too.

0x78 being 120.
The spreadsheet implies that any setting up to the max of 511 is permissible. I have no other additional information to say what numbers to use. At one point I had removed the protection from the spreadsheet so that I could see the formulae, but I appear to have misplaced that copy :-(

Looking at it, I suspect 1080p50 RGB is only using 3 lanes (spreadsheet says it needs 898Mbit/s on 3 lanes), so it's likely to be the modes that are at the top end of the bandwidth requirements for the number of lanes that have issues. If tc358743_num_csi_lanes_needed used the full line time rather than the active width, I wonder if it would solve the issue by giving it a little more time to play with.

Actually I think that is totally it - it's using bt->height in the calculation, therefore the time per line (which is the critical thing for FIFO usage) is going to be wrong and quite likely to result in FIFO over/under flows. Use a mode with a long blanking period and the numbers will be totally garbage.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

The spreadsheet implies that any setting up to the max of 511 is permissible.

Well, I tried, it's not working. Very weird.

If tc358743_num_csi_lanes_needed used the full line time rather than the active width, I wonder if it would solve the issue by giving it a little more time to play with.

Sorry, I don't get it, about "little more time". Could you explain a bit?

Meanwhile, I made a shitty patch with fifo_level tuning. I can keep it for my work, or I can submit it as PR here. What you say? In my opinion, this is too specific a thing and resembles a dirty hack, on the other hand, it fully provides the capabilities of the chip (or almost completely, I can test other resolutions from my EDID).

diff --git a/drivers/media/i2c/tc358743.c b/drivers/media/i2c/tc358743.c
index 5b6f74e56..d99d1c84b 100644
--- a/drivers/media/i2c/tc358743.c
+++ b/drivers/media/i2c/tc358743.c
@@ -54,6 +54,8 @@ MODULE_LICENSE("GPL");
 #define POLL_INTERVAL_CEC_MS	10
 #define POLL_INTERVAL_MS	1000
 
+#define FIFO_LEVEL_DEFAULT 374
+
 static const struct v4l2_dv_timings_cap tc358743_timings_cap = {
 	.type = V4L2_DV_BT_656_1120,
 	/* keep this initialization for compatibility with GCC < 4.4.6 */
@@ -546,6 +548,24 @@ static inline void enable_stream(struct v4l2_subdev *sd, bool enable)
 		i2c_wr32(sd, TXOPTIONCNTRL, MASK_CONTCLKMODE);
 		/* Unmute video */
 		i2c_wr8(sd, VI_MUTE, MASK_AUTO_MUTE);
+
+		if (state->mbus_fmt_code == MEDIA_BUS_FMT_RGB888_1X24) {
+			/* https://github.com/raspberrypi/linux/issues/6322 */
+			struct v4l2_bt_timings *bt = &state->timings.bt;
+			struct tc358743_platform_data *pdata = &state->pdata;
+			unsigned f = fps(bt);
+			unsigned fifo_level = FIFO_LEVEL_DEFAULT;
+			if (
+				(bt->width == 1920 && bt->height == 1080 && f == 50)
+				|| (bt->width == 1280 && bt->height == 720 && f == 30)
+			) {
+				fifo_level = 120;
+			}
+			if (i2c_rd16(sd, FIFOCTL) != fifo_level) {
+				pdata->fifo_level = fifo_level;
+				i2c_wr16(sd, FIFOCTL, pdata->fifo_level);
+			}
+		}
 	} else {
 		/* Mute video so that all data lanes go to LSP11 state.
 		 * No data is output to CSI Tx block. */
@@ -1289,6 +1309,8 @@ static int tc358743_log_status(struct v4l2_subdev *sd)
 			"yes" : "no");
 	v4l2_info(sd, "CEC enabled: %s\n",
 			(i2c_rd16(sd, CECEN) & MASK_CECEN) ?  "yes" : "no");
+	v4l2_info(sd, "FIFO level: %d\n",
+			(i2c_rd16(sd, FIFOCTL)));
 	v4l2_info(sd, "-----Signal status-----\n");
 	v4l2_info(sd, "TMDS signal detected: %s\n",
 			hdmi_sys_status & MASK_S_TMDS ? "yes" : "no");
@@ -1940,7 +1962,7 @@ static int tc358743_probe_of(struct tc358743_state *state)
 	state->pdata.ddc5v_delay = DDC5V_DELAY_100_MS;
 	state->pdata.enable_hdcp = false;
 	/* A FIFO level of 16 should be enough for 2-lane 720p60 at 594 MHz. */
-	state->pdata.fifo_level = 374;
+	state->pdata.fifo_level = FIFO_LEVEL_DEFAULT;
 	/*
 	 * The PLL input clock is obtained by dividing refclk by pll_prd.
 	 * It must be between 6 MHz and 40 MHz, lower frequency is better.

@6by9
Copy link
Contributor

6by9 commented Aug 23, 2024

If tc358743_num_csi_lanes_needed used the full line time rather than the active width, I wonder if it would solve the issue by giving it a little more time to play with.

Sorry, I don't get it, about "little more time". Could you explain a bit?

https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/media/i2c/tc358743.c#L690
replace

	u32 bps = bt->width * bt->height * fps(bt) * bits_pr_pixel;

with (typos excepted)

	u32 bps = (bt->width + bt->hfrontporch + bt->hsync + bt->hbackporch) *
      (bt->height + bt->vfrontporch + bt->vsync + bt->vbackparch) * fps(bt) * bits_pr_pixel;

If using only height from 1080p50, then you'd get 20ms / 1080 as the line time, when actually you've only got 20ms / 1125. It therefore thinks it has 4% longer to send each line than it actually has.

Meanwhile, I made a shitty patch with fifo_level tuning. I can keep it for my work, or I can submit it as PR here. What you say? In my opinion, this is too specific a thing and resembles a dirty hack, on the other hand, it fully provides the capabilities of the chip (or almost completely, I can test other resolutions from my EDID).

Sorry I won't accept that hack. Any fix needs to be justifiable, and ideally upstreamed.
The above is justifiable as the line time is being computed incorrectly. (Whether it solves the whole problem is a different question).
Ideally the fifo trigger level should be computed based on the calcs in the spreadsheet for the min FIFO level, but that requires reverse engineering or unprotecting it first.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

with (typos excepted)

I see. I'll test it and come back, thank you.

Sorry I won't accept that hack. Any fix needs to be justifiable, and ideally upstreamed.

Oh, don't be sorry. As I said, this is a ugly hack and I don't like it too. If your formula works well, then I will prefer it ofc.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

 	u32 bps = (bt->width + bt->hfrontporch + bt->hsync + bt->hbackporch) *
      (bt->height + bt->vfrontporch + bt->vsync + bt->vbackporch) * fps(bt) * bits_pr_pixel;

It's not working at all for any RGB24, I'm just getting device timeout on select().

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 23, 2024

Here is a list of modes for which my hack helps:

            if (
                (bt->width == 1920 && bt->height == 1080 && f == 50)
                || (bt->width == 1280 && bt->height == 720 && (f == 30 || f == 25 || f == 24))
                || (bt->width == 1280 && bt->height == 960 && f == 60)
            ) {
                fifo_level = 120;
            }

I also found a mode where glitches look different and fifo_level tuning does not help (something changes a little, but only slightly):
image

@6by9
Copy link
Contributor

6by9 commented Aug 27, 2024

I also found a mode where glitches look different and fifo_level tuning does not help (something changes a little, but only slightly):

You don't say what that mode is.

I'm afraid that as this isn't a Raspberry Pi product the support level provided for it is relatively limited (same as third party cameras). The vendor of the board really should be providing the support, and I'm afraid I can't dedicate any significant amount of time to investigating issues.

I believe TC358743 is EOL from Toshiba now, and I don't believe we (Raspberry Pi) have any support contacts there anymore, so it rather limits options for getting support from Toshiba.
Almost any further work therefore needs to be based on reading between the lines of the datasheet/spreadsheet and effectively reverse engineering how the chip may be working. Your best bet would be using something like kmstest -r ... to generate modes at varying HDMI pixel clocks to see if there is a pattern between HDMI clock rate and CSI data rate (per lane), and therefore be able to make a hypothesis as to what is going wrong. Testing each with a low, mid, and high FIFO trigger level, and RGB24 and UYVY would expand out the data set.

You may get some further support from mainline (ie linux-media mailing list) as AIUI Cisco are using the chip in some of their video conferencing products. I think they were using a TI based platform (certainly not configured via device tree as they passed in platform data) .... https://www.spinics.net/lists/linux-media/msg116360.html they use a fifo_level of 300.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 27, 2024

You don't say what that mode is.

Sorry, 1280x1024@60Hz with RGB24, for example.

I'm afraid that as this isn't a Raspberry Pi product the support level provided for it is relatively limited

I understand and appreciate the time you are taking to do this. If you find time to throw up some hypotheses or suggest changes to the code, I'm ready to continue testing anything to help with this, again and again.

I believe TC358743 is EOL from Toshiba now, and I don't believe we (Raspberry Pi) have any support contacts there anymore, so it rather limits options for getting support from Toshiba.

In one of the topics about TC358743 and Pi5, you wrote that Raspberry will do some kind of project on this chip, so maybe fixing the magic fifo_level computing will be a win-win for us.

If I still bother you with these endless problems with Toshiba, please let me know and we will try to figure it out on our side, although it will be hard and long.

@6by9
Copy link
Contributor

6by9 commented Aug 27, 2024

You don't say what that mode is.

Sorry, 1280x1024@60Hz with RGB24, for example.

OK, I may have a quick look when time allows.

I'm afraid that as this isn't a Raspberry Pi product the support level provided for it is relatively limited

I understand and appreciate the time you are taking to do this. If you find time to throw up some hypotheses or suggest changes to the code, I'm ready to continue testing anything to help with this, again and again.

As above, I expect that the issue will be modes where:

  • the CSI2 data rate is almost the same as the HDMI data rate. Blanking will then be critical, and the FIFO trigger may need to be reduced.
  • the CSI2 data rate is significantly higher than the HDMI rate and therefore needs the FIFO trigger point to be high enough to not underflow.

Identifying which category it is would be the first step. Second category would be fixed if the FIFO trigger can be set dynamically. The first may need to increase the number of lanes (which may then trigger the second).

I believe TC358743 is EOL from Toshiba now, and I don't believe we (Raspberry Pi) have any support contacts there anymore, so it rather limits options for getting support from Toshiba.

In one of the topics about TC358743 and Pi5, you wrote that Raspberry will do some kind of project on this chip, so maybe fixing the magic fifo_level computing will be a win-win for us.

Our project is largely for CI purposes, so as long as the main resolutions work then it's probably good enough. Issues in the TC358743 are more likely to be worked around rather than investigated and fixed.

If I still bother you with these endless problems with Toshiba, please let me know and we will try to figure it out on our side, although it will be hard and long.

Feel free to ask, but you may well find the response is fairly limited.

@mdevaev
Copy link
Contributor Author

mdevaev commented Aug 27, 2024

Thanks for the tips. My HW engineer and I will carefully read the datasheet and maybe find the right way to use blanking and other things.

Our project is largely for CI purposes, so as long as the main resolutions work then it's probably good enough.

Ah, I see. This is what we are already doing with PiKVM hardware. I have some experience with this and have written code to test video capture. Let me know if you interested.

Issues in the TC358743 are more likely to be worked around rather than investigated and fixed.

Same. Solving problems in general can take too long, so a patch for certain conditions is often an acceptable solution for me.

Feel free to ask, but you may well find the response is fairly limited.

Thank you, I appreciate it.

@mdevaev
Copy link
Contributor Author

mdevaev commented Sep 2, 2024

I found a bug: in some cases we got incorrect lanes number. Adding hsync/vsync to calculations (like you did before) brokes calculations at all because of u32 overflow due to DIV_ROUND_UP() formula. So the fix looks like this:

diff --git a/drivers/media/i2c/tc358743.c b/drivers/media/i2c/tc358743.c
index 5b6f74e56..715ca429c 100644
--- a/drivers/media/i2c/tc358743.c
+++ b/drivers/media/i2c/tc358743.c
@@ -687,10 +687,15 @@ static unsigned tc358743_num_csi_lanes_needed(struct v4l2_subdev *sd)
 	struct tc358743_platform_data *pdata = &state->pdata;
 	u32 bits_pr_pixel =
 		(state->mbus_fmt_code == MEDIA_BUS_FMT_UYVY8_1X16) ?  16 : 24;
-	u32 bps = bt->width * bt->height * fps(bt) * bits_pr_pixel;
+	u32 bps = (bt->width + bt->hsync) * (bt->height + bt->vsync)
+		* fps(bt) * bits_pr_pixel;
 	u32 bps_pr_lane = (pdata->refclk_hz / pdata->pll_prd) * pdata->pll_fbd;
 
-	return DIV_ROUND_UP(bps, bps_pr_lane);
+	/*
+	 * We're don't using DIV_ROUND_UP() here because its formula
+	 * causes u32 overflow when values are too big.
+	 */
+	return (bps / bps_pr_lane) + (((bps % bps_pr_lane) != 0) ? 1 : 0);
 }
 
 static void tc358743_set_csi(struct v4l2_subdev *sd)

Also the fifo_level tuning is not required anymore. I have tested this patch for RGB24 and UYVY mode on all resolutions in my EDID above (including all 1080p variants and 1920x1200). Described bugs are not reproducible anymore.

UPD: But this is for 4lane=1 and CM4. On Pi4 and UYVY, this overestimates the required number of lanes to 3 for 1080p@50Hz (2 without a patch). It is unclear how it is supposed to be calculated in the right way.

BTW I'm not sure if porch offvalues are needed because it doesn't seem to be used in the case of TC358743. They are not used in current frame geometry calculations, so they are not needed for accurate calculation of lanes either. Correct me if I'm wrong: https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/media/i2c/tc358743.c#L342

PS: Thanks for the research direction, now I understand a little more.

@6by9
Copy link
Contributor

6by9 commented Sep 2, 2024

Great news that the theory is looking fairly plausible.

Per line period on CSI2 you need the pixel data plus 48 bits of header/footer, a line start packet , and a line end packet (each 32bits). Each packet will also need an LP to HS transition beforehand, and an HS to LP transition afterwards. The time period of those can be computed, but I forget the numbers off the top of my head.
The horizontal time period in the calculation should therefore be bt->width plus a small overhead (likely a fixed time period rather than related to resolution)

AIUI Vertically we do want to factor in all the blanking lines to avoid the CSI2 line readout period exceeding the HDMI line period.

...

Feeding the relevant values for 1080p50 UYVY into the spreadsheet. It's claiming a minimum lane speed of 898.12Mbit/s/lane on 2 lanes. (Ah, it computes the LP<>HS transition time in B61 as 675ns, based on all the magic clock times in C42-C60).
The formula in J35 for min CSI2 lane speed is only using N14 (HDMI line period), B61 (transition time), F14 (horizontal number of active pixels), and L22 (bits/pixel). So it is total line period (including blanking) on the HDMI side, but width + a smidge on the CSI2 side.

So the formula in the driver should replicate that.
The relevant frame width (including blanking) is read back from the chip in tc358743_get_detected_timings (registers H_SIZE_[HI|LO] and V_SIZE_[HI|LO]), and looking at that it is all stuck in the [hv]sync field of the timings (as you observed).
Technically userspace can write different timings to that detected by the chip, so they could redistribute the sync timing to the porch fields, and we do want to consider all of them.
I'll see what I can do to come up with something clean.

DIV_ROUND_UP is going to be preferred, even if we need to expand to u64 instead of u32 (that then can mean issues with needing do_div to make it work on 32bit systems). The "modulo == 0" case is going to be so unlikely in this situation that it can probably be a simple +1.

@mdevaev
Copy link
Contributor Author

mdevaev commented Sep 2, 2024

That's interesting... I'll check the documentation again, maybe I missed some recommendations about calculating periods.

By the way, do we really need to calculate the number of lanes dynamically? Why can't we always use the maximum number of lanes available in hardware for any resolutions, even small? Perhaps the limitations of the chip? I'm going to check it.

DIV_ROUND_UP is going to be preferred, even if we need to expand to u64 instead of u32

But why? I thought modulo might be banned in the kernel, but a cursory search showed that it is used, so it shouldn't be a big problem. And u64 is not available on 32-bit.

@6by9
Copy link
Contributor

6by9 commented Sep 2, 2024

That's interesting... I'll check the documentation again, maybe I missed some recommendations about calculating periods.

By the way, do we really need to calculate the number of lanes dynamically? Why can't we always use the maximum number of lanes available in hardware for any resolutions, even small? Perhaps the limitations of the chip? I'm going to check it.

If you increase the CSI2 total data rate, then you really come down to FIFO trigger points.

1Gbit/s/lane and 4 lanes will shift 250MPix/s. An HDMI pixel clock of say 65MHz for 1024x768@60 is 65MPix/s.
[email protected] 65.000 1024/24/136/160/- 768/3/6/29/- 60 (60.00) D
A line will take 15.75usec for the active pixels on the HDMI interface, but 4usec to forward on CSI2.
Trigger at 511pixels means that you're 7.875usec through receiving the 1024x768@60 frame. The TC358743 will complete sending the line on CSI2 in 4usec, at which point you haven't finished receiving the frame on HDMI, and the FIFO has underflowed :-(

(Plumbing the numbers into the spreadsheet claims that a FIFO delay of 376 is acceptable, but it agrees with my 4usec active period on CSI2, so I think it just doesn't take that possibility into account).

DIV_ROUND_UP is going to be preferred, even if we need to expand to u64 instead of u32

But why? I thought modulo might be banned in the kernel, but a cursory search showed that it is used, so it shouldn't be a big problem. And u64 is not available on 32-bit.

Just preference upstream - use the macros wherever possible, or rearrange the computation.
Generally u8/u16/u32/u64 in the kernel should be reserved for accessing registers, which we aren't doing here. The computations can be done with whatever sized variables as makes sense.

@6by9
Copy link
Contributor

6by9 commented Sep 2, 2024

Revised (but not tested on hardware) version of the function.

static unsigned tc358743_num_csi_lanes_needed(struct v4l2_subdev *sd)
{
	struct tc358743_state *state = to_state(sd);
	struct v4l2_bt_timings *bt = &state->timings.bt;
	struct tc358743_platform_data *pdata = &state->pdata;
	unsigned int bits_pr_pixel =
		(state->mbus_fmt_code == MEDIA_BUS_FMT_UYVY8_1X16) ?  16 : 24;
	unsigned int hdmi_hfreq = bt->pixelclock / V4L2_DV_BT_FRAME_WIDTH(bt);
	unsigned int csi2_data_rate = bt->width * bits_pr_pixel * hdmi_hfreq;
	unsigned int bps_pr_lane = (pdata->refclk_hz / pdata->pll_prd) * pdata->pll_fbd;

	if (!V4L2_DV_BT_FRAME_HEIGHT(bt) || !V4L2_DV_BT_FRAME_WIDTH(bt))
		return 0;

	return (csi2_data_rate / bps_pr_lane) + 1;
}

I believe it gives the right numbers for me with the 3 resolutions I've just tried it with (1024x768@60, 1080p50, and 1080p60).
The old code didn't check that we didn't ask for more than 4 lanes, so I'm not worrying about it here either.

The fact that tc358743_set_csi handles a return value from tc358743_num_csi_lanes_needed of 0 lanes seems a little funky. It disables lane 0 and the clock lane, which leaves it in a dead state, but why bother?

The one bit I haven't addressed is the overhead of LP<>HS transitions. If I can quantify it approximately with regard the pixel rate, then adding a bit to bt->width when computing csi2_data_rate is the easiest thing.

The commit message will need to mention that the maths is close to overflow and you're right that it can't use DIV_ROUND_UP. Max csi2_data_rate will be around 3.66Gbit/s (1920x1200@60 RGB24), so on a 972Mbit/s link you will get 4.62Gbit/s which overflows a 32bit value.

@mdevaev
Copy link
Contributor Author

mdevaev commented Sep 3, 2024

Thank you, I'll test it and let you know. The algorithm sounds very plausible.

The fact that tc358743_set_csi handles a return value from tc358743_num_csi_lanes_needed of 0 lanes seems a little funky. It disables lane 0 and the clock lane, which leaves it in a dead state, but why bother?

The strange thing is, it didn't occur to me why this might be necessary.

The commit message will need to mention that the maths is close to overflow and you're right that it can't use DIV_ROUND_UP.

I would still add rounding for paranoid reasons. We can divide both csi2_data_rate and bps_pr_lane by 1000 to get kbps, and then use DIV_ROUND_UP. Or modulo as I suggested earlier. At least in the code, this demonstrates the intention that the largest rounded value is needed.

@mdevaev
Copy link
Contributor Author

mdevaev commented Sep 9, 2024

Sorry for the delay. I encountered that division to V4L2_DV_BT_BLANKING_WIDTH() causes the error: ERROR: modpost: "__aeabi_uldivmod" [drivers/media/i2c/tc358743.ko] undefined!

@6by9
Copy link
Contributor

6by9 commented Sep 9, 2024

The "joys" of 64 bit arithmetic in 32bit kernels.

Seeing as the maximum pixel clock the TC358743 supports is around 165MHz, and that fits into a 32bit word, we can do a simple cast down from the u64 of pixelclock to an unsigned int.

Over and above the last diff

@@ -687,7 +687,7 @@ static unsigned tc358743_num_csi_lanes_needed(struct v4l2_subdev *sd)
        struct tc358743_platform_data *pdata = &state->pdata;
        u32 bits_pr_pixel =
                (state->mbus_fmt_code == MEDIA_BUS_FMT_UYVY8_1X16) ?  16 : 24;
-       unsigned int hdmi_line_freq = bt->pixelclock / V4L2_DV_BT_FRAME_WIDTH(bt);
+       unsigned int hdmi_line_freq = (unsigned int)bt->pixelclock / V4L2_DV_BT_FRAME_WIDTH(bt);
        unsigned int csi2_data_rate = bt->width * bits_pr_pixel * hdmi_line_freq;
        unsigned int bps_pr_lane = (pdata->refclk_hz / pdata->pll_prd) * pdata->pll_fbd;

@mdevaev
Copy link
Contributor Author

mdevaev commented Sep 9, 2024

Okay, it's compiling. But for 1080p@50Hz it gaves 3 lanes instead of 2.

@6by9
Copy link
Contributor

6by9 commented Sep 9, 2024

I haven't got it set up, but my numbers say 2 lanes.

hdmi_line_freq = 148500000 / (1920+528+44+148) = 56250 (56.250kHz matches HDMI spec for VIC 31).
csi2_data_rate = 1920 * 16 * 56250 = 1728000000
bps_pr_lane = (27M / 4) * 144 = 972000000

returned value = (1728000000 / 972000000) + 1 = 2.

@mdevaev
Copy link
Contributor Author

mdevaev commented Sep 9, 2024

Ah, you're using 16 bit per pixel, I'm testing on 24 bit, so csi2_data_rate=2592000000 in my case.

My previous patch gave 4 lines for 1080p@50Hz at RGB24, while the picture was correct. With your patch and 3 lines, I see artifacts like on the picture in the issue, divided into three color sections, and most likely we need to adjust the fifo_level for this.

@6by9
Copy link
Contributor

6by9 commented Sep 10, 2024

I'm confused

But for 1080p@50Hz it gaves 3 lanes instead of 2.

1080p50 has only ever worked as UYVY on 2 lanes.
1080p50 RGB has always required 3 lanes (or more), so was limited to practically limited 1080p30 when running on 2 lanes.

Was your comment meant to be "it gaves 3 lanes instead of 4"?

4 lanes are not required according to the spreadsheet. I'll accept that reality may differ from the spreadsheet, but need to be able to justify that difference.
Supposedly for 1080p50 RGB, 3 lanes will work with a FIFO delay of 0, whereas 4 lanes will require a minimum of 120.

@mdevaev
Copy link
Contributor Author

mdevaev commented Sep 10, 2024

Was your comment meant to be "it gaves 3 lanes instead of 4"?

I'm already confused myself.

Talking about 1080p50: your formula gaves 2 lanes for UYVY and 3 lanes for RGB24. And it requires fifo_level correction, as you indicated, I'll try.

My previous patch gaves 4 lanes for RGB24, and it was usable with default fifo_level=374.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants