SEGFAULT when using more than 1 queue #30

Open
cyanide-burnout opened this issue Sep 15, 2020 · 6 comments

@cyanide-burnout commented Sep 15, 2020

We have an issue when using more than 1 queue on ixgbe.

Stack trace:
1 pkt_buf_free
2 ixgbe_tx_batch

Both queues are processed in a single thread; we tried using a single mempool as well as a mempool per queue. The result is the same: using any queue other than #0 triggers the segfault.

@marcofaltelli

Same problem for me. I fixed it in ixgbe.c; these changes must be applied to both the RX and TX structures:

  • declare MAX_RX_QUEUE_ENTRIES through a #define rather than a const int
  • in struct ixgbe_rx_queue, the void* virtual_addresses[] flexible array member becomes void* virtual_addresses[MAX_RX_QUEUE_ENTRIES]
  • in the corresponding calloc(), take away the + sizeof(void*) * ... part

There are also other ways to solve this, but this is probably the most immediate one (see the sketch below).
BTW, shout-out to the authors for the nice work!
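
For reference, a compilable sketch of the change described above. It uses a simplified stand-in for ixy's struct ixgbe_rx_queue: only fields relevant to the bug are shown, the real struct in ixgbe.c has more members, and 4096 is just a placeholder value here.

```c
#include <stdint.h>
#include <stdlib.h>

// was: const int MAX_RX_QUEUE_ENTRIES = ...;
#define MAX_RX_QUEUE_ENTRIES 4096  // placeholder value

// Simplified stand-in for ixy's struct ixgbe_rx_queue
struct ixgbe_rx_queue {
	uint16_t num_entries;
	uint16_t rx_index;
	// was: void* virtual_addresses[];  (flexible array member)
	void* virtual_addresses[MAX_RX_QUEUE_ENTRIES];
};

int main(void) {
	int num_rx_queues = 2;
	// was: calloc(num_rx_queues, sizeof(struct ixgbe_rx_queue)
	//                            + sizeof(void*) * MAX_RX_QUEUE_ENTRIES);
	struct ixgbe_rx_queue* rx_queues =
	    calloc(num_rx_queues, sizeof(struct ixgbe_rx_queue));
	// sizeof(struct ixgbe_rx_queue) now includes the pointer array,
	// so rx_queues[1] starts after rx_queues[0].virtual_addresses.
	free(rx_queues);
	return 0;
}
```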

@cyanide-burnout
Author

Yes, this patch works very well. By using it we reached twice the performance of DPDK's ixgbe with fewer CPU resources.

@emmericp
Owner

> By using it we reached twice the performance of DPDK's ixgbe with fewer CPU resources.

That seems unlikely, can you elaborate on the exact comparison? The patch from the comment above basically turns a flexible array member into a fixed-size array which, since we don't have any bounds checks anyway, should not make a difference for performance...

The only thing that I can think of is false sharing, but

  • the size of the calloc doesn't change (with the patch it comes from the now-bigger struct instead of an explicit calculation, which should be the exact same, as the check below illustrates)
  • this is not DMA-able memory, so it would only be relevant to multi-threaded code, which this bug report doesn't mention so far
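
To illustrate the size argument, here is a small check with a hypothetical, simplified layout (the real struct has more fields); the per-queue allocation size is the same before and after the patch:

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_RX_QUEUE_ENTRIES 4096

struct rx_queue_flex {   // original layout: flexible array member
	uint16_t num_entries;
	uint16_t rx_index;
	void* virtual_addresses[];
};

struct rx_queue_fixed {  // patched layout: fixed-size array
	uint16_t num_entries;
	uint16_t rx_index;
	void* virtual_addresses[MAX_RX_QUEUE_ENTRIES];
};

int main(void) {
	// Per-queue size before the patch: struct plus explicit extra room
	size_t before = sizeof(struct rx_queue_flex) + sizeof(void*) * MAX_RX_QUEUE_ENTRIES;
	// Per-queue size after the patch: the array is part of the struct
	size_t after = sizeof(struct rx_queue_fixed);
	printf("before: %zu bytes, after: %zu bytes\n", before, after);  // identical on typical ABIs
	return 0;
}
```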

@cyanide-burnout
Author

cyanide-burnout commented Feb 18, 2022

After applying the fix I rewrote my code. It now transmits in multiple threads, with 2-4 queues per thread. It's a production system which can use different methods to accelerate transmission/reception of UDP. Since I wrote several backends that provide transmission via optimised sockets, PACKET_MMAP, XDP, DPDK and ixy, we did some benchmark tests. @stefansaraev did the testing, so I would like to ask him to publish the results. In any case, the performance of ixy is quite surprising, probably due to Intel's buggy ixgbe implementation and unnecessarily complicated code. At least I have found some bugs in the code of their XDP implementation (xdp-project/xdp-tutorial#273).

@cyanide-burnout
Author

cyanide-burnout commented Feb 19, 2022

Your bug in two words: wrong interpretation of array access. Just imagine which address, for example, rx_queue[1] gets: sizeof(struct ixgbe_rx_queue) doesn't include the actual length of virtual_addresses[], and virtual_addresses is not a pointer to an array stored somewhere else. It's a typical overlap, where rx_queue[1] overlaps rx_queue[0].virtual_addresses (the snippet below demonstrates it).
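
A minimal, self-contained illustration of the overlap, using a simplified stand-in rather than ixy's actual struct:

```c
#include <stdio.h>
#include <stdlib.h>

#define ENTRIES 4

struct q {
	int num_entries;
	void* virtual_addresses[];  // flexible array member, not counted by sizeof()
};

int main(void) {
	// Allocate two "queues", each followed by space for its pointers,
	// the way the original calloc() reserves extra room per queue.
	struct q* queues = calloc(2, sizeof(struct q) + sizeof(void*) * ENTRIES);

	struct q* q0 = &queues[0];
	struct q* q1 = &queues[1];  // pointer arithmetic advances by sizeof(struct q) only

	// q1 starts inside q0's virtual_addresses area:
	printf("q0                     = %p\n", (void*) q0);
	printf("q0->virtual_addresses  = %p\n", (void*) q0->virtual_addresses);
	printf("q1                     = %p\n", (void*) q1);

	// Writing q0->virtual_addresses[i] therefore clobbers q1's fields,
	// which is why everything works with a single queue but crashes
	// as soon as queue #1 is used.
	free(queues);
	return 0;
}
```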

@cyanide-burnout
Author

cyanide-burnout commented Feb 19, 2022

I've fixed my comment above; the correct explanation is there now. It's 01:30 AM here, I have to be in bed ;)
