Segmentation fault when running DPVS on an mlx5 NIC #991

Open
lsq647 opened this issue Aug 23, 2024 · 5 comments

lsq647 commented Aug 23, 2024

Environment:

  • Linux: Ubuntu 20.04
  • DPDK: dpdk-stable-20.11.1
  • NIC: mlx5 ConnectX-6 Dx

Configuration file used: ./conf/dpvs.conf.single-nic.sample

Launch command (binding only one NIC):
./bin/dpvs -c ./conf/dpvs.conf.single-nic.sample -- -a 0000:b1:00.1

Log:

current thread affinity is set to FFFFFFFFFFFFFFFF
EAL: Detected 112 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:b1:00.1 (socket 1)
common_mlx5: RTE_MEM is selected.
mlx5_pci: Size 0xFFFF is not power of 2, will be aligned to 0x10000.
EAL: No legacy callbacks, legacy socket not created
DPVS: dpvs version: 1.9-6, build on 2024.08.23.17:26:48
DPVS: dpvs-conf-file: ./conf/dpvs.conf.single-nic.sample
DPVS: dpvs-pid-file: /var/run/dpvs.pid
DPVS: dpvs-ipc-file: /var/run/dpvs.ipc
CFG_FILE: Opening configuration file './conf/dpvs.conf.single-nic.sample'.
CFG_FILE: log_level = WARNING
NETIF: dpdk0:rx_queue_number = 8
NETIF: worker cpu1:dpdk0 rx_queue_id += 0
NETIF: worker cpu1:dpdk0 tx_queue_id += 0
NETIF: worker cpu2:dpdk0 rx_queue_id += 1
NETIF: worker cpu2:dpdk0 tx_queue_id += 1
NETIF: worker cpu3:dpdk0 rx_queue_id += 2
NETIF: worker cpu3:dpdk0 tx_queue_id += 2
NETIF: worker cpu4:dpdk0 rx_queue_id += 3
NETIF: worker cpu4:dpdk0 tx_queue_id += 3
NETIF: worker cpu5:dpdk0 rx_queue_id += 4
NETIF: worker cpu5:dpdk0 tx_queue_id += 4
NETIF: worker cpu6:dpdk0 rx_queue_id += 5
NETIF: worker cpu6:dpdk0 tx_queue_id += 5
NETIF: worker cpu7:dpdk0 rx_queue_id += 6
NETIF: worker cpu7:dpdk0 tx_queue_id += 6
NETIF: worker cpu8:dpdk0 rx_queue_id += 7
NETIF: worker cpu8:dpdk0 tx_queue_id += 7
SAPOOL: sapool_filter_enable = on
IPVS: dp_vs_conn_init: lcore 9: nothing to do.
IPVS: dp_vs_conn_init: lcore 10: nothing to do.
IPVS: dp_vs_conn_init: lcore 11: nothing to do.
IPVS: dp_vs_conn_init: lcore 12: nothing to do.
IPVS: dp_vs_conn_init: lcore 13: nothing to do.
IPVS: dp_vs_conn_init: lcore 14: nothing to do.
IPVS: dp_vs_conn_init: lcore 15: nothing to do.
IPVS: dp_vs_conn_init: lcore 16: nothing to do.
IPVS: dp_vs_conn_init: lcore 17: nothing to do.
IPVS: dp_vs_conn_init: lcore 18: nothing to do.
IPVS: dp_vs_conn_init: lcore 19: nothing to do.
IPVS: dp_vs_conn_init: lcore 20: nothing to do.
IPVS: dp_vs_conn_init: lcore 21: nothing to do.
IPVS: dp_vs_conn_init: lcore 22: nothing to do.
IPVS: dp_vs_conn_init: lcore 23: nothing to do.
IPVS: dp_vs_conn_init: lcore 24: nothing to do.
IPVS: dp_vs_conn_init: lcore 25: nothing to do.
IPVS: dp_vs_conn_init: lcore 26: nothing to do.
IPVS: dp_vs_conn_init: lcore 27: nothing to do.
IPVS: dp_vs_conn_init: lcore 28: nothing to do.
IPVS: dp_vs_conn_init: lcore 29: nothing to do.
IPVS: dp_vs_conn_init: lcore 30: nothing to do.
IPVS: dp_vs_conn_init: lcore 31: nothing to do.
IPVS: dp_vs_conn_init: lcore 32: nothing to do.
IPVS: dp_vs_conn_init: lcore 33: nothing to do.
IPVS: dp_vs_conn_init: lcore 34: nothing to do.
IPVS: dp_vs_conn_init: lcore 35: nothing to do.
IPVS: dp_vs_conn_init: lcore 36: nothing to do.
IPVS: dp_vs_conn_init: lcore 37: nothing to do.
IPVS: dp_vs_conn_init: lcore 38: nothing to do.
IPVS: dp_vs_conn_init: lcore 39: nothing to do.
IPVS: dp_vs_conn_init: lcore 40: nothing to do.
IPVS: dp_vs_conn_init: lcore 41: nothing to do.
IPVS: dp_vs_conn_init: lcore 42: nothing to do.
IPVS: dp_vs_conn_init: lcore 43: nothing to do.
IPVS: dp_vs_conn_init: lcore 44: nothing to do.
IPVS: dp_vs_conn_init: lcore 45: nothing to do.
IPVS: dp_vs_conn_init: lcore 46: nothing to do.
IPVS: dp_vs_conn_init: lcore 47: nothing to do.
IPVS: dp_vs_conn_init: lcore 48: nothing to do.
IPVS: dp_vs_conn_init: lcore 49: nothing to do.
IPVS: dp_vs_conn_init: lcore 50: nothing to do.
IPVS: dp_vs_conn_init: lcore 51: nothing to do.
IPVS: dp_vs_conn_init: lcore 52: nothing to do.
IPVS: dp_vs_conn_init: lcore 53: nothing to do.
IPVS: dp_vs_conn_init: lcore 54: nothing to do.
IPVS: dp_vs_conn_init: lcore 55: nothing to do.
IPVS: dp_vs_conn_init: lcore 56: nothing to do.
IPVS: dp_vs_conn_init: lcore 57: nothing to do.
IPVS: dp_vs_conn_init: lcore 58: nothing to do.
IPVS: dp_vs_conn_init: lcore 59: nothing to do.
IPVS: dp_vs_conn_init: lcore 60: nothing to do.
IPVS: dp_vs_conn_init: lcore 61: nothing to do.
IPVS: dp_vs_conn_init: lcore 62: nothing to do.
IPVS: dp_vs_conn_init: lcore 63: nothing to do.
IPVS: dp_vs_conn_init: lcore 64: nothing to do.
IPVS: dp_vs_conn_init: lcore 65: nothing to do.
IPVS: dp_vs_conn_init: lcore 66: nothing to do.
IPVS: dp_vs_conn_init: lcore 67: nothing to do.
IPVS: dp_vs_conn_init: lcore 68: nothing to do.
IPVS: dp_vs_conn_init: lcore 69: nothing to do.
IPVS: dp_vs_conn_init: lcore 70: nothing to do.
IPVS: dp_vs_conn_init: lcore 71: nothing to do.
IPVS: dp_vs_conn_init: lcore 72: nothing to do.
IPVS: dp_vs_conn_init: lcore 73: nothing to do.
IPVS: dp_vs_conn_init: lcore 74: nothing to do.
IPVS: dp_vs_conn_init: lcore 75: nothing to do.
IPVS: dp_vs_conn_init: lcore 76: nothing to do.
IPVS: dp_vs_conn_init: lcore 77: nothing to do.
IPVS: dp_vs_conn_init: lcore 78: nothing to do.
IPVS: dp_vs_conn_init: lcore 79: nothing to do.
IPVS: dp_vs_conn_init: lcore 80: nothing to do.
IPVS: dp_vs_conn_init: lcore 81: nothing to do.
IPVS: dp_vs_conn_init: lcore 82: nothing to do.
IPVS: dp_vs_conn_init: lcore 83: nothing to do.
IPVS: dp_vs_conn_init: lcore 84: nothing to do.
IPVS: dp_vs_conn_init: lcore 85: nothing to do.
IPVS: dp_vs_conn_init: lcore 86: nothing to do.
IPVS: dp_vs_conn_init: lcore 87: nothing to do.
IPVS: dp_vs_conn_init: lcore 88: nothing to do.
IPVS: dp_vs_conn_init: lcore 89: nothing to do.
IPVS: dp_vs_conn_init: lcore 90: nothing to do.
IPVS: dp_vs_conn_init: lcore 91: nothing to do.
IPVS: dp_vs_conn_init: lcore 92: nothing to do.
IPVS: dp_vs_conn_init: lcore 93: nothing to do.
IPVS: dp_vs_conn_init: lcore 94: nothing to do.
IPVS: dp_vs_conn_init: lcore 95: nothing to do.
IPVS: dp_vs_conn_init: lcore 96: nothing to do.
IPVS: dp_vs_conn_init: lcore 97: nothing to do.
IPVS: dp_vs_conn_init: lcore 98: nothing to do.
IPVS: dp_vs_conn_init: lcore 99: nothing to do.
IPVS: dp_vs_conn_init: lcore 100: nothing to do.
IPVS: dp_vs_conn_init: lcore 101: nothing to do.
IPVS: dp_vs_conn_init: lcore 102: nothing to do.
IPVS: dp_vs_conn_init: lcore 103: nothing to do.
IPVS: dp_vs_conn_init: lcore 104: nothing to do.
IPVS: dp_vs_conn_init: lcore 105: nothing to do.
IPVS: dp_vs_conn_init: lcore 106: nothing to do.
IPVS: dp_vs_conn_init: lcore 107: nothing to do.
IPVS: dp_vs_conn_init: lcore 108: nothing to do.
IPVS: dp_vs_conn_init: lcore 109: nothing to do.
IPVS: dp_vs_conn_init: lcore 110: nothing to do.
IPVS: dp_vs_conn_init: lcore 111: nothing to do.
NETIF: Ethdev port_id=0 invalid tx_offload: 0x1000e, valid value: 0xc96af
mlx5_pci: Failed to init cache list FDB_ingress_0_matcher_cache entry (nil).
mlx5_pci: Failed to init cache list FDB_ingress_0_matcher_cache entry (nil).
mlx5_pci: Failed to init cache list FDB_ingress_0_matcher_cache entry (nil).
mlx5_pci: Failed to init cache list FDB_ingress_0_matcher_cache entry (nil).
Segmentation fault (core dumped)

Is this error caused by a problem with the mlx5 driver?

lsq647 (Author) commented Aug 27, 2024

Adding the backtrace from the core dump:

(gdb) bt
#0  __memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:268
#1  0x0000559408d48e58 in inet_addr_equal (af=10, a1=0x1c, a2=0x7fcb02fc9290) at /home/infra/dpvs-1.9.6/src/inet.c:137
#2  0x0000559408d4bfed in imc_lookup (af=10, idev=0x2200201740, maddr=0x7fcb02fc9290) at /home/infra/dpvs-1.9.6/src/inetaddr.c:111
#3  0x0000559408d4c160 in idev_mc_add (af=10, idev=0x2200201740, maddr=0x7fcb02fc9290) at /home/infra/dpvs-1.9.6/src/inetaddr.c:139
#4  0x0000559408d4c5c3 in idev_add_mcast_init (args=0x22006002c0) at /home/infra/dpvs-1.9.6/src/inetaddr.c:246
#5  0x0000559408ad710f in eal_thread_loop.cold ()
#6  0x00007fcb2cec4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007fcb2c8be133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

ywc689 (Collaborator) commented Aug 28, 2024

Try explicitly specifying which CPUs DPVS should use:

./bin/dpvs -c ./conf/dpvs.conf.single-nic.sample -- -a 0000:b1:00.1 -l 1-9

lsq647 (Author) commented Aug 28, 2024

Thank you very much! With that, it appears to run normally. 👍🏻

However, I still don't quite understand the cause. Why does this error occur?

ywc689 (Collaborator) commented Aug 30, 2024

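The segmentation fault happens because the number of CPU cores on the system exceeds the DPVS_MAX_LCORE macro defined in DPVS, which causes some modules to access invalid memory addresses during initialization.
We will fix this issue later. That said, we still recommend starting DPVS with DPDK command-line arguments that explicitly specify the CPUs and NICs to use.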

lsq647 (Author) commented Aug 30, 2024

Got it, thanks a lot! 👍

ywc689 added a commit to ywc689/dpvs that referenced this issue Sep 2, 2024
…ber is over DPVS_MAX_LCORE.

Fixed issue iqiyi#991.

Signed-off-by: ywc689 <[email protected]>
ywc689 self-assigned this Sep 6, 2024
ywc689 mentioned this issue Sep 9, 2024
ywc689 added a commit to ywc689/dpvs that referenced this issue Sep 13, 2024
…ber is over DPVS_MAX_LCORE.

Fixed issue iqiyi#991.

Signed-off-by: ywc689 <[email protected]>