Have you experienced odd instability issues with Gowin FPGAs / Sipeed Tang Nano 4K, 9K and 20K? #169
Comments
Very interesting. We have been having a particularly persistent issue with bigger designs breaking down, but I doubt it's related. An obvious test would be to run the flipflop drainer on Apicula and see what happens. It's also possible to synthesize a design with yosys and pnr with vendor tools. In particular I'd be curious to see what happens when synthesizing with yosys with the In our particular issue, these options improve reliability a lot, and we don't know why. We suspect some timing problem or very insidious PnR bug, so it'd be interesting to see if these options have any effect on vendor tools. For the very adventurous, there are also these Nextpnr alpha and beta options to tweak that adjust the density of PnR, which might help to prove the interference theory. I have seen some weak evidence that messing with those sometimes makes the failure go away. But note that in our case they are pretty hard failures, not occasional glitches. Still, who knows if there is a connection. |
Hi @pepijndevos , so great to read your insights.
This description actually matches what I am seeing in our real project quite closely. We have a project that works seemingly well when the FPGA utilization is around 30%-40%, but when including more sub-components that increase utilization, things begin to fall apart, even when timing closure should be achieved. When all of our features are active, we have about 90% utilization of the FPGA, and even though we should have timing closure, the code is still wholly unstable, and we have been struggling to find fault in our timing constraints, electrical design, or other considerations. Removing most of the nonessential features, and running with any one of the subfeatures active, we find that subfeature to be stable and pass testing, but we are unable to activate all of them at the same time.
Video sync stability is the main/most sensitive issue that we are seeing, but other aspects of FPGA computation are also failing, just not as often/immediately as video sync deteriorates. Typically, making random trivial one-liner changes more or less anywhere in the project Verilog files gives a random chance of the build starting to work.
The gowin_flipflop_drainer repository has been my best attempt so far to capture the issue in a reproducible repository. I used an HDMI output as a test since that is where we most easily see the issues happen.
There is this one interesting behavior I have repeatedly observed, which I find to be counterintuitive and cannot currently explain. The hypothesis we got from Gowin was that the flip flops in the FPGA would be generating noise in the clock route of the chip, which then in turn would manifest in clock jitter. And the solution would be to utilize However, in our tests, we found that neither a) playing with In fact, when eyeballing, the achieved timing Fmax does not seem to correlate at all with the issues. 
If we have +100MHz of timing slack, the design might still fail stress tests, and conversely, a git commit with a negative slack of -20MHz or so might pass them. I.e. timing analysis does not seem to correlate with the issue. But the one thing we have repeatedly found to correlate with improved stability is to minimize the amount of FPGA resources used, and even more specifically, to minimize the number of flip flop registers that are used, at the expense of achieving worse timing.
As an example story, in https://github.com/juj/gowin_flipflop_drainer/blob/main/src/hdmi.v#L1-L221 I implement this massive 19 clock cycle long pipeline that performs TMDS encoding. This encoding runs at the fastest video pixel clock speeds. It is unit tested with cocotb against a reference TMDS encoding function, written in C++ from page 29 of the DVI-D spec PDF, to verify that it provides the correct TMDS output on all possible inputs. It is excessively pipelined to provide as much timing slack as possible on the Sipeed Tang Nano 9K with Gowin speed grade C6/I5. This implementation has not turned out particularly stable. It is included in gowin_flipflop_drainer, where the instability shows. But it performs well in timing closure in Gowin's timing analyzer reports.
Then we got a hold of Gowin's faster speed grade C7/I6 chips, and repeated the same tests with that variant. Stability of the design was not improved. However, with the faster speed grade, I was now able to tear down a lot of that long pipelined TMDS encoding, and in the end I shortened it down to just 5 clock cycles from the original 19. Timing analysis reported a much worse Fmax, although still just fast enough to close our timing constraints. The same cocotb unit tests were run to ensure that the new implementation computed the exact same function. 
Surprisingly, in test runs, I found that this 5 clock cycle implementation was much more stable than the original 19 clock cycles one. With this implementation, we actually got our first "completely stripped down from nonessential features"-build to pass all stress tests. This kind of observation is odd, because it looks like worsening timing behavior by introducing more combinational logic is helping stability, instead of the opposite. Which makes me believe that clock jitter would not be the root problem. I'd love to try out Apicula, although I am currently on Windows and it seems Apicula would work best on a Linux system? I may have to set one up to give it a go. Would you be able to say, glancing at the code in https://github.com/juj/gowin_flipflop_drainer/tree/main/src , does it look like the primitives that it currently uses would be expected to work with Apicula e.g. on Sipeed Tang Nano 4K or 9K board? |
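For readers who want to repeat the exhaustive TMDS check described above without the original C++ reference, here is a minimal Python sketch of a reference encoder following the video-data encode algorithm in the DVI 1.0 specification. It is not the author's reference implementation or cocotb testbench; the names `tmds_encode`, `tmds_decode` and `check_all_inputs` are hypothetical, and control-period symbols are left out. The check covers every 8-bit input for a handful of running-disparity values, confirms that each symbol decodes back to the input, and confirms that the disparity counter tracks the real ones-minus-zeros balance of the emitted symbol.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tmds_encode(d: int, cnt: int) -> tuple[int, int]:
    """Encode one 8-bit video byte. Returns (10-bit symbol, new disparity)."""
    d &= 0xFF
    n1 = popcount(d)
    use_xnor = n1 > 4 or (n1 == 4 and (d & 1) == 0)

    # Stage 1: transition minimization (XOR or XNOR chain over the data bits).
    q_m = d & 1
    for i in range(1, 8):
        bit = ((q_m >> (i - 1)) & 1) ^ ((d >> i) & 1)
        if use_xnor:
            bit ^= 1
        q_m |= bit << i
    bit8 = 0 if use_xnor else 1  # bit 8 records which chain was used

    # Stage 2: DC balancing, driven by the running disparity cnt.
    low = q_m & 0xFF
    n1q = popcount(low)
    n0q = 8 - n1q
    if cnt == 0 or n1q == n0q:
        if bit8:
            q_out = (1 << 8) | low
            cnt += n1q - n0q
        else:
            q_out = (1 << 9) | (low ^ 0xFF)
            cnt += n0q - n1q
    elif (cnt > 0 and n1q > n0q) or (cnt < 0 and n0q > n1q):
        q_out = (1 << 9) | (bit8 << 8) | (low ^ 0xFF)
        cnt += 2 * bit8 + (n0q - n1q)
    else:
        q_out = (bit8 << 8) | low
        cnt += -2 * (1 - bit8) + (n1q - n0q)
    return q_out, cnt


def tmds_decode(q: int) -> int:
    """Inverse of tmds_encode for video data symbols."""
    low = q & 0xFF
    if (q >> 9) & 1:          # bit 9 set: data bits were inverted
        low ^= 0xFF
    xor_chain = (q >> 8) & 1  # bit 8 set: XOR chain, clear: XNOR chain
    d = low & 1
    for i in range(1, 8):
        bit = ((low >> i) & 1) ^ ((low >> (i - 1)) & 1)
        if not xor_chain:
            bit ^= 1
        d |= bit << i
    return d


def check_all_inputs() -> int:
    """Exhaustively check round trip and disparity bookkeeping."""
    checked = 0
    for cnt in range(-10, 11, 2):
        for d in range(256):
            q, new_cnt = tmds_encode(d, cnt)
            assert 0 <= q < 1024
            assert tmds_decode(q) == d
            # Disparity delta must equal ones minus zeros of the symbol.
            assert new_cnt - cnt == 2 * popcount(q) - 10
            checked += 1
    return checked
```

In a cocotb test, the same idea would apply: drive each input byte into the Verilog encoder, wait out the pipeline latency (19 or 5 cycles in the two variants discussed), and compare the output symbol against a model like this one.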
It's not as heavily tested on Windows, but... should hopefully work. You might give Yowasp or OSS CAD Suite a try for easy installation. So PLL support is fairly experimental on our end, and some of the IO primitives like DDR and SERDES are also kinda new or not supported. At a glance I did not see immediately what you are using. But the Yosys+vendor combination should support all the primitives. Not a lot of people use that combination, but it should support all the PLL and IO primitives the vendor knows about. This is the path I'm most interested in seeing if you see any difference with the |
Gowin_flipflop_drainer is using:
All other code is generic Verilog. |
Unfortunately, OSER10 and the entire OSERx family still only operates on the dying GW1N-1 chip. Support for 4k and 9k is planned to appear not earlier than the middle of March. CLKDIV and ELVDS - no information about the timing of support, although work in the latter direction was underway. |
Understood. Thanks for working on it, this has amazing potential! |
I noticed that you place registers in IO cells. Have you tried turning that off? |
The workload is tuned to maximize utilization achieved on Gowin 1.9.8.10 on Windows. Gowin attempts to be deterministic with its PnR in a specific version, not sure what are the conditions that would cause it to pass or fail. You can adjust the workload size smoothly by modifying one line of code at Try e.g. replacing that with See the paragraph in README:
I actually haven't, that will be an interesting test to try. I'll give this a go to see how it behaves. |
Tried now adjusting the Also gave another go at the However, something that I now find that does have an effect is that a few weeks ago I got a new Tang Nano 4K from Sipeed by mail. It does perform slightly better than the old one. Old one is a C6/I5 speed grade: New one is a C7/I6 speed grade: Old Nano 4K starts glitching at 1024x768@70Hz @ 65.88 MHz pixel clock, whereas I see the new Nano 4K to start glitching only at 1280x1024@60Hz @ 108.00 MHz pixel clock. In both cases timing closure is good, e.g. with a ~+27MHz margin for the C6/I5 speed grade. And in both cases removing the flip_flop_drainer module from the build with line fixes both boards up so they output a stable video at 1600x1200@57Hz @ 118.80 MHz pixel clock. (btw there is a separate branch |
was able to build, "-noalu -nowidelut -nodffe" do not fix the situation. Picture with artifacts. GW1N-9k c6/i5 out <= ^a700; |
Thanks for testing! Do you also see the same effect that if you set |
Do the images I sent you also show artifacts? Maybe I have a bad cable to the TV :) I'll try ^a0, it will take some time - there are a certain number of manual edits. |
Trying these .fs files out on Sipeed Tang Nano 9K, I see that |
I see.
|
I should maybe clarify that the failure I described above (individual vertical stripes of glitching pixels that periodically repeat on different x coordinate columns) is quite similar to a failure mode I have observed before on this test case, so on my end these builds show a similar-looking problem to what I see when using Gowin's toolchain. The nodffe-noalu-nowidelut option does seem to do something good for the signal, since it does then at least produce output video, rather than keeping the video completely blank. |
Official Gowinsemi representative said: "We have people reviewed this post a while ago. The conclusion is the Tang Nano boards did not properly bring the True LVDS Ios to the HDMI/DVI port. The user case is using an emulated LVDS which is not the best performance IO for such application. When the video resolution increase, they are just not up to the tasks." |
Thanks for the reply here. This is something that I am well aware of, and it is disappointing to hear that they ignored my follow-up email. When they wrote their original report to me, they did state "the issue is with TLVDS vs ELVDS", so I diligently tested the effect of True LVDS vs Emulated LVDS, and then reported back to them that the behavior of True LVDS in the given test case was actually found to be even worse than with Emulated LVDS.
I found it surprising that they even brought this up as an "issue". The test case I had provided to them did use TLVDS by default on Tang Nano 4K: Their report had stated that they had only Tang Nano 4K to test, and the above code I had given to them did use TLVDS and not ELVDS in those tests. The above code utilizes ELVDS only on Tang Nano 9K, because Sipeed has wired the 9K in a way that makes using TLVDS for HDMI output impossible (it is not available on the HDMI pins).
From Reddit I have read that the reason Sipeed did this is that Gowin's TLVDS implementation was found to provide unsuitable voltage levels for HDMI output use cases, so that some displays might not be compatible, and ELVDS allowed changing the voltage levels to be more appropriate. We have observed the same voltage level difference in our own tests, but I do not know enough to say how much that would actually affect compatibility.
In any case, I did reply to Gowin's report as a follow-up that the test case does use TLVDS, and that utilizing ELVDS instead of TLVDS was observed in practice to provide better stability on Tang Nano 4K, the complete opposite of what their report was claiming, but they never replied back again on any of this. In summary:
I have gotten the silent treatment from Gowin after this, unfortunately. One of their sales representatives did reply briefly afterwards, and suggested that I would try using As an anecdotal data point: In our own tests since, I have found some remedy in our own actual design by "blacklisting" certain PLL frequencies. We use a 27 MHz oscillator, and based on input video, generate a varying video output pixel clock frequency between 25 MHz - 118.8 MHz. I.e. the max rated PLL pixel clock for video by Gowin is 118.8 MHz. (I have tried overclocking up to 148.5 MHz.)
I find that banning PLL output frequencies between 100.8 - 111.6 MHz helps video signal stability in our case. That is, we only allow >= 113.4 MHz and <= 99.9 MHz video pixel clocks. With that blacklist, we have seen drastically fewer issues in practice. What is peculiar is that I can overclock the board to 145.8 MHz pixel clock and have it be stable, but then lower the video pixel clock to e.g. 102.6 MHz, and all the signal stability issues come back, depending on random luck.
However, I don't know if this "there are suspect PLL frequencies" issue is the same as the https://github.com/juj/gowin_flipflop_drainer/ test case in particular, so I have tried to not conflate this issue with the general conversation/test case repro in juj/gowin_flipflop_drainer. |
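The PLL frequency blacklist described above could be expressed as a small filter along these lines. This is only a sketch: the function and constant names are hypothetical, the band limits are the 100.8 MHz and 111.6 MHz figures from the comment (treated as inclusive), and frequencies are held as integer kHz so that the boundary comparisons do not depend on floating-point rounding. How the real design enumerates or selects its PLL settings is not shown in the thread and is not modeled here.

```python
# Banned band taken from the comment above, in kHz, both ends inclusive.
BANNED_LOW_KHZ = 100_800
BANNED_HIGH_KHZ = 111_600


def is_allowed_pixel_clock(khz: int) -> bool:
    """True if the pixel clock lies outside the banned PLL output band."""
    return not (BANNED_LOW_KHZ <= khz <= BANNED_HIGH_KHZ)


def filter_pixel_clocks(candidates_khz: list[int]) -> list[int]:
    """Keep only the candidate pixel clocks that the blacklist permits."""
    return [f for f in candidates_khz if is_allowed_pixel_clock(f)]


# Frequencies mentioned in the comment, in kHz.
EXAMPLE_CLOCKS_KHZ = [99_900, 100_800, 102_600, 111_600, 113_400, 118_800, 145_800]
```

Calling `filter_pixel_clocks(EXAMPLE_CLOCKS_KHZ)` keeps 99.9, 113.4, 118.8 and 145.8 MHz and drops 100.8, 102.6 and 111.6 MHz, which matches the allowed and banned examples given in the comment.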
Another communication problem we had was that Gowin said their DVI TX IP Core (IPUG938.pdf) has a timing limitation: it is only specced to work up to 80 MHz, and when they saw the issue occur on Tang Nano 4K at 83.7 MHz, they stated that this is faster than what the DVI TX IP block would support. I tried to explain to them that but unfortunately we were not able to get a reply from them afterwards. |
juj do you see this behaviour also on official gowinsemi development boards? Gowin representative is pretty confident that the board design is faulty. The only way to get Gowinsemi involved again is to demo the fault on their board. |
I unfortunately do not have one of Gowin's devboards to test. :( |
Can you get one? |
I do not at the moment operate under a registered company that would be able to order from electronics wholesalers (I am however looking into changing that in the future). So Mouser is the only company that serves retail customers, but unfortunately they are giving "restricted availability" to the boards that I can see have an HDMI port, and the prices that they do list for some devkits (without an HDMI port) would be too high (177 EUR and 452 EUR). I could try asking Gowin if they would be able to send me one directly. About a year and a half ago a friend of mine did, to which they politely replied that their policy was not to send devkits to individuals, but maybe this situation would be different. |
Send me your postal address to [email protected] with the board you want to have. I will figure out something. Which one would you prefer: http://www.gowinsemi.com.cn/clients_view.aspx?TypeId=21&Id=747 |
Thanks for the very kind offer. I sent you an email now for a follow-up. |
Hi Apicula authors,
I'd like to cross-reference your experience regarding a board stability issue that I am seeing affect Gowin's FPGAs when using Gowin's own tools.
Check out https://github.com/juj/gowin_flipflop_drainer/ and https://www.reddit.com/r/FPGA/comments/101pagf/sipeed_tang_nano_4k_9k_gowin_fpgas_become/ for details.
I am wondering whether anything similar has happened in your experience?