ovn_northd.dl: Use new OVSDB bindings. #3
Open: ryzhyk wants to merge 5 commits into blp:ddlog from ryzhyk:ddlog_new_ovsdb_bindings
With the new ovsdb2ddlog adapter generator, DDlog output tables have exactly the same schema as OVSDB tables. The notions of uuid_names and the uuid_or_str type are gone. Instead, every output record must contain a `_uuid` field. UUIDs are also used uniformly as cross-table references.

Porting the northd logic to the new schema revealed two bugs that were somehow masked previously:

- Handling of disabled routers: we did not generate a datapath_binding for a disabled router, but we would still generate Port_Binding records for it, causing referential integrity violations.
- A similar bug for HA_Chassis_Group records: we would not create a group in the SB database if the group had no chassis assigned to it, but we would still reference such a group from other tables.

Most changes in this commit are straightforward, but there are a couple of issues that require work:

- We sometimes use `hash128` from the std lib to manufacture UUIDs by hashing the subset of unique columns of a record. In the case of the `Logical_Flow` table this includes all columns, and is going to be expensive.
- My changes to the HA_Chassis logic are a bit hacky and need to be cleaned up.
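The idea of manufacturing a UUID from the unique columns of a record can be sketched in C. This is only an illustration under stated assumptions: the DDlog `hash128` function is replaced here by two seeded FNV-1a passes, and `struct uuid128`, `fnv1a64`, and `uuid_from_columns` are hypothetical names that exist neither in OVN nor in DDlog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit identifier; not the OVS 'struct uuid'. */
struct uuid128 {
    uint64_t hi, lo;
};

/* 64-bit FNV-1a over a NUL-terminated string, continuing from 'basis'. */
static uint64_t
fnv1a64(const char *s, uint64_t basis)
{
    uint64_t h = basis;
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Derives a deterministic identifier from the unique columns of a record.
 * The same column values always yield the same identifier. */
static struct uuid128
uuid_from_columns(const char *const *cols, size_t n)
{
    assert(cols != NULL || n == 0);

    /* Two different seeds give two independent 64-bit halves. */
    struct uuid128 u = { 0xcbf29ce484222325ULL, 0x84222325cbf29ce4ULL };
    for (size_t i = 0; i < n; i++) {
        u.hi = fnv1a64(cols[i], u.hi);
        u.lo = fnv1a64(cols[i], u.lo);
        /* Mix in a separator so ("ab", "c") and ("a", "bc") differ. */
        u.hi = fnv1a64("\x1f", u.hi);
        u.lo = fnv1a64("\x1f", u.lo);
    }
    return u;
}
```

The work is proportional to the total size of the hashed columns, which is why hashing every column of a large table is the costly case.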
libovn_northd_ddlog must precede libovn and other libraries it depends on.
ryzhyk force-pushed the ddlog_new_ovsdb_bindings branch from c867bb2 to 90f0e07 (May 1, 2020 19:05)
These affect performance during scale testing.
The rule computed a complete Cartesian product of LSP and LRPs. I re-wrote it to perform an inner join instead.
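The difference between the two strategies can be sketched in C. The original rule is written in DDlog; the types and functions below (`struct lsp`, `struct lrp`, `match_product`, `match_join`) are hypothetical and only illustrate why a join keyed on a shared attribute does far less work than a filtered Cartesian product.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEY 64   /* Assumed upper bound on join keys, for this sketch. */

struct lsp { int peer_key; };   /* Hypothetical logical switch port. */
struct lrp { int key; };        /* Hypothetical logical router port. */

/* Cartesian product followed by a filter: always n_lsp * n_lrp steps. */
static size_t
match_product(const struct lsp *lsps, size_t n_lsp,
              const struct lrp *lrps, size_t n_lrp, size_t *steps)
{
    size_t matches = 0;
    for (size_t i = 0; i < n_lsp; i++) {
        for (size_t j = 0; j < n_lrp; j++) {
            (*steps)++;
            matches += lsps[i].peer_key == lrps[j].key;
        }
    }
    return matches;
}

/* Inner join on the key: index one side, probe with the other, for
 * n_lsp + n_lrp steps.  Assumes LRP keys are unique and in [0, MAX_KEY). */
static size_t
match_join(const struct lsp *lsps, size_t n_lsp,
           const struct lrp *lrps, size_t n_lrp, size_t *steps)
{
    int present[MAX_KEY] = { 0 };
    size_t matches = 0;

    for (size_t j = 0; j < n_lrp; j++) {
        assert(lrps[j].key >= 0 && lrps[j].key < MAX_KEY);
        present[lrps[j].key] = 1;
        (*steps)++;
    }
    for (size_t i = 0; i < n_lsp; i++) {
        int k = lsps[i].peer_key;
        (*steps)++;
        if (k >= 0 && k < MAX_KEY && present[k]) {
            matches++;
        }
    }
    return matches;
}
```

Both functions return the same match count when the LRP keys are unique; only the number of steps differs.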
blp
pushed a commit
that referenced
this pull request
Jun 3, 2020
When a new conntrack zone is entered, the ct_state field is zeroed in order to avoid using state information from different zones.

One such scenario is when a packet is double NATed. Assume two zones and 3 flows performing the following actions in order on the packet:

1. ct(zone=5,nat), recirc
2. ct(zone=1), recirc
3. ct(zone=1,nat)

If at step #1 the packet matches an existing NAT entry, it will get translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT.

At step #2 the new tuple might match an existing connection and pkt->md.ct_zone is set to 1.

If at step #3 the packet matches an existing NAT entry in zone 1, handle_nat() will be called to perform the translation but it will return early because the packet's zone matches the conntrack zone and the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the translations in zone 5.

In order to reliably detect when a packet enters a new conntrack zone we also need to make sure that pkt->md.ct_zone is properly initialized if pkt->md.ct_state is non-zero. This already happens for most cases. The only exception is when the matched conntrack connection is of type CT_CONN_TYPE_UN_NAT and the master connection is missing. To cover this path we now call write_ct_md() in that case too. Remove setting the CS_TRACKED flag in this case, as it will be done by the new call to write_ct_md().

CC: Darrell Ball <[email protected]>
Fixes: 286de27 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Acked-by: Ilya Maximets <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
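The zone check described in this commit message can be sketched as follows. This is a simplified model, not the conntrack code itself: `struct pkt_md_sketch`, `enter_zone`, and `nat_already_done` are hypothetical, and the flag values are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions for this sketch. */
#define CS_SRC_NAT 0x40
#define CS_DST_NAT 0x80

/* Hypothetical subset of the per-packet conntrack metadata. */
struct pkt_md_sketch {
    uint32_t ct_state;
    uint16_t ct_zone;
};

/* Called when the packet is sent through ct(zone=...).  State collected
 * in a different zone must not leak into the new one. */
static void
enter_zone(struct pkt_md_sketch *md, uint16_t zone)
{
    assert(md != NULL);
    if (md->ct_state && md->ct_zone != zone) {
        md->ct_state = 0;
    }
    md->ct_zone = zone;
}

/* True if the packet was already translated in 'zone', in which case the
 * NAT step returns early. */
static bool
nat_already_done(const struct pkt_md_sketch *md, uint16_t zone)
{
    return md->ct_zone == zone
           && (md->ct_state & (CS_SRC_NAT | CS_DST_NAT)) != 0;
}
```

In the double-NAT scenario, the state set in zone 5 is cleared at step 2, so the NAT step in zone 1 at step 3 is no longer skipped.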
blp
pushed a commit
that referenced
this pull request
Nov 4, 2020
The 'nexthop' that is passed to ic_route_hash() is not fully initialized in get_nexthop_from_lport_addresses(). 'nexthop' has type 'struct v46_ip', which contains a union to share space for ipv4 and ipv6 addresses. If only the ipv4 member is initialized, there is plenty of uninitialized space that goes to hash_bytes(nexthop, sizeof *nexthop, basis).

Impact: there are two places where this function is called.

1. In add_to_routes_ad(), the nexthop is initialized in parse_route() before calling get_nexthop_from_lport_addresses(), luckily.

2. In add_network_to_routes_ad(), we are unlucky. When a directly connected network of a router is found to be advertised, if the route already existed in the global IC-SB, it may not be found due to the hash difference, which results in the existing route being deleted and the same one recreated, unnecessarily.

This patch fixes the problem by initializing the struct to zero before setting the fields.

From Ilya's report:

> Report from MemorySanitizer:
>
> ==3074629==WARNING: MemorySanitizer: use-of-uninitialized-value
>     #0 0x67177e in mhash_add__ ovs/./lib/hash.h:66:9
>     #1 0x671668 in mhash_add ovs/./lib/hash.h:78:12
>     #2 0x6701e9 in hash_bytes ovs/lib/hash.c:38:16
>     #3 0x524b4a in add_network_to_routes_ad ic/ovn-ic.c:1095:5
>     #4 0x51eea3 in route_run ic/ovn-ic.c:1424:21
>     #5 0x51887b in main ic/ovn-ic.c:1674:17
>     #6 0x7fd4ce7871a2 in __libc_start_main
>     #7 0x49c90d in _start (ic/ovn-ic+0x49c90d)
>
> Uninitialized value was created by an allocation of 'nexthop' in the
> stack frame of function 'add_network_to_routes_ad'
>     #0 0x5245f0 in add_network_to_routes_ad ic/ovn-ic.c:1069

Reported-by: Ilya Maximets <[email protected]>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-October/376160.html
Fixes: 57b347c ("ovn-ic: Route advertisement.")
Acked-by: Numan Siddique <[email protected]>
Signed-off-by: Han Zhou <[email protected]>
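A minimal sketch of the fix pattern, zeroing the whole struct before setting fields so that hashing its bytes is deterministic. `struct ip46` is a hypothetical stand-in for 'struct v46_ip', and `hash_bytes_sketch` is a plain FNV-1a loop used in place of the real hash_bytes().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for 'struct v46_ip'. */
struct ip46 {
    int family;                 /* 4 or 6. */
    union {
        uint32_t ipv4;
        uint8_t ipv6[16];
    } u;
};

/* Simple byte hash (FNV-1a), standing in for hash_bytes(). */
static uint32_t
hash_bytes_sketch(const void *p, size_t n, uint32_t basis)
{
    const unsigned char *b = p;
    uint32_t h = 2166136261u ^ basis;
    for (size_t i = 0; i < n; i++) {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

/* Sets an IPv4 address.  The memset() clears padding and the unused part
 * of the union, so every byte that is later hashed is well defined. */
static void
ip46_set_v4(struct ip46 *ip, uint32_t addr)
{
    assert(ip != NULL);
    memset(ip, 0, sizeof *ip);
    ip->family = 4;
    ip->u.ipv4 = addr;
}
```

Two structs holding the same address then hash identically regardless of what the stack contained before.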
blp
pushed a commit
that referenced
this pull request
Nov 21, 2020
OVS uses memcmp to compare actions of existing and new flows, but 'struct ofp_ed_prop_nsh_md_type' and the corresponding ofpact structure have 3 bytes of padding that are never initialized and are passed around within OF data structures and messages.

  Uninitialized bytes in MemcmpInterceptorCommon at offset 21 inside [0x7090000003f8, 136)
  WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4a184e in bcmp (vswitchd/ovs-vswitchd+0x4a184e)
    #1 0x896c8a in ofpacts_equal lib/ofp-actions.c:9121:31
    #2 0x564403 in replace_rule_finish ofproto/ofproto.c:5650:37
    #3 0x563462 in add_flow_finish ofproto/ofproto.c:5218:13
    #4 0x54a1ff in ofproto_flow_mod_finish ofproto/ofproto.c:8091:17
    #5 0x5433b2 in handle_flow_mod__ ofproto/ofproto.c:6216:17
    #6 0x56a2fc in handle_flow_mod ofproto/ofproto.c:6190:17
    #7 0x565bda in handle_single_part_openflow ofproto/ofproto.c:8504:16
    #8 0x540b25 in handle_openflow ofproto/ofproto.c:8685:21
    #9 0x6697fd in ofconn_run ofproto/connmgr.c:1329:13
    #10 0x668e6e in connmgr_run ofproto/connmgr.c:356:9
    #11 0x53f1bc in ofproto_run ofproto/ofproto.c:1890:5
    #12 0x4ead0c in bridge_run__ vswitchd/bridge.c:3250:9
    #13 0x4e9bc8 in bridge_run vswitchd/bridge.c:3309:5
    #14 0x51c072 in main vswitchd/ovs-vswitchd.c:127:9
    #15 0x7f23a99011a2 in __libc_start_main (/lib64/libc.so.6)
    #16 0x46b92d in _start (vswitchd/ovs-vswitchd+0x46b92d)

  Uninitialized value was stored to memory at
    #0 0x4745aa in __msan_memcpy.part.0 (vswitchd/ovs-vswitchd)
    #1 0x54529f in rule_actions_create ofproto/ofproto.c:3134:5
    #2 0x54915e in ofproto_rule_create ofproto/ofproto.c:5284:11
    #3 0x55d419 in add_flow_init ofproto/ofproto.c:5123:17
    #4 0x54841f in ofproto_flow_mod_init ofproto/ofproto.c:7987:17
    #5 0x543250 in handle_flow_mod__ ofproto/ofproto.c:6206:13
    #6 0x56a2fc in handle_flow_mod ofproto/ofproto.c:6190:17
    #7 0x565bda in handle_single_part_openflow ofproto/ofproto.c:8504:16
    #8 0x540b25 in handle_openflow ofproto/ofproto.c:8685:21
    #9 0x6697fd in ofconn_run ofproto/connmgr.c:1329:13
    #10 0x668e6e in connmgr_run ofproto/connmgr.c:356:9
    #11 0x53f1bc in ofproto_run ofproto/ofproto.c:1890:5
    #12 0x4ead0c in bridge_run__ vswitchd/bridge.c:3250:9
    #13 0x4e9bc8 in bridge_run vswitchd/bridge.c:3309:5
    #14 0x51c072 in main vswitchd/ovs-vswitchd.c:127:9
    #15 0x7f23a99011a2 in __libc_start_main (/lib64/libc.so.6)

  Uninitialized value was created by an allocation of 'ofpacts_stub' in the stack frame of function 'handle_flow_mod'
    #0 0x569e80 in handle_flow_mod ofproto/ofproto.c:6170

This could cause issues with flow modifications or other operations.

To reproduce, some NSH tests could be run under valgrind or clang MemorySanitizer. Ex. "nsh - md1 encap over a veth link" test.

Fix that by clearing padding bytes while encoding and decoding. OVS will still accept OF messages with non-zero padding from controllers.

New tests added to tests/ofp-actions.at.

Fixes: 1fc11c5 ("Generic encap and decap support for NSH")
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Jan Scheurich <[email protected]>
blp
pushed a commit
that referenced
this pull request
Nov 21, 2020
If a datapath flow doesn't have one of the fields of gtpu metadata, e.g. 'tunnel(gtpu())', uninitialized stack memory will be used instead.

  ==3485429==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x853a1b in format_u8x lib/odp-util.c:3474:13
    #1 0x86ee9c in format_odp_tun_gtpu_opt lib/odp-util.c:3713:5
    #2 0x86a099 in format_odp_tun_attr lib/odp-util.c:3973:13
    #3 0x83afe6 in format_odp_key_attr__ lib/odp-util.c:4179:9
    #4 0x838afb in odp_flow_format lib/odp-util.c:4563:17
    #5 0x738422 in log_flow_message lib/dpif.c:1750:5
    #6 0x738e2f in log_flow_put_message lib/dpif.c:1784:9
    #7 0x7371a4 in dpif_operate lib/dpif.c:1377:21
    #8 0x7363ef in dpif_flow_put lib/dpif.c:1035:5
    #9 0xc7aab7 in dpctl_put_flow lib/dpctl.c:1171:13
    #10 0xc65a4f in dpctl_unixctl_handler lib/dpctl.c:2701:17
    #11 0xaaad04 in process_command lib/unixctl.c:308:13
    #12 0xaa87f7 in run_connection lib/unixctl.c:342:17
    #13 0xaa842e in unixctl_server_run lib/unixctl.c:393:21
    #14 0x51c09c in main vswitchd/ovs-vswitchd.c:128:9
    #15 0x7f88344391a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
    #16 0x46b92d in _start (vswitchd/ovs-vswitchd+0x46b92d)

  Uninitialized value was stored to memory at
    #0 0x87da17 in scan_gtpu_metadata lib/odp-util.c:5221:27
    #1 0x874588 in parse_odp_key_mask_attr__ lib/odp-util.c:5862:9
    #2 0x83ee14 in parse_odp_key_mask_attr lib/odp-util.c:5808:18
    #3 0x83e8b5 in odp_flow_from_string lib/odp-util.c:6065:18
    #4 0xc7a4f3 in dpctl_put_flow lib/dpctl.c:1145:13
    #5 0xc65a4f in dpctl_unixctl_handler lib/dpctl.c:2701:17
    #6 0xaaad04 in process_command lib/unixctl.c:308:13
    #7 0xaa87f7 in run_connection lib/unixctl.c:342:17
    #8 0xaa842e in unixctl_server_run lib/unixctl.c:393:21
    #9 0x51c09c in main vswitchd/ovs-vswitchd.c:128:9
    #10 0x7f88344391a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

  Uninitialized value was created by an allocation of 'msgtype_ma' in the stack frame of function 'scan_gtpu_metadata'
    #0 0x87d440 in scan_gtpu_metadata lib/odp-util.c:5187

Fix that by initializing fields to all zeroes by default.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21426
Fixes: 3c6d05a ("userspace: Add GTP-U support.")
Acked-by: Yi Yang <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
The length of nested attributes must be checked before it is stored in the header. If the current length exceeds the maximum value, parsing should fail; otherwise the length value will be truncated, leading to a corrupted netlink message and out-of-bound memory accesses:

  ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6310002cc838 at pc 0x000000575470 bp 0x7ffc6c322d60 sp 0x7ffc6c322d58
  READ of size 1 at 0x6310002cc838 thread T0
  SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
    #0 0x57546f in format_generic_odp_key lib/odp-util.c:2738:39
    #1 0x559e70 in check_attr_len lib/odp-util.c:3572:13
    #2 0x56581a in format_odp_key_attr lib/odp-util.c:4392:9
    #3 0x5563b9 in format_odp_action lib/odp-util.c:1192:9
    #4 0x555d75 in format_odp_actions lib/odp-util.c:1279:13
    ...

Fix that by checking the length of nested netlink attributes before updating 'nla_len' inside the header. Additionally, introduce an assertion inside nl_msg_end_nested() to catch this kind of issue before an actual overflow happens.

Credit to OSS-Fuzz.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20003
Fixes: 65da723 ("odp-util: Format tunnel attributes directly from netlink.")
Acked-by: Flavio Leitner <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
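The length check can be sketched as follows. `struct attr_hdr` is a minimal stand-in for the netlink attribute header with its 16-bit 'nla_len', and `end_nested_sketch` is a hypothetical helper, not the OVS function or its signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a netlink attribute header. */
struct attr_hdr {
    uint16_t nla_len;
    uint16_t nla_type;
};

/* Closes a nested attribute whose total size, header included, is
 * 'total_len'.  Returns false and leaves the header untouched if the
 * size does not fit in the 16-bit length field. */
static bool
end_nested_sketch(struct attr_hdr *hdr, size_t total_len)
{
    assert(hdr != NULL);
    if (total_len > UINT16_MAX) {
        /* Storing it anyway would truncate, e.g. 70000 becomes 4464. */
        return false;
    }
    hdr->nla_len = (uint16_t) total_len;
    return true;
}
```

A caller that gets `false` should fail the parse rather than emit a message whose header disagrees with its payload.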
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
'child_port_list' is an array of pointers that should be freed too.

  Direct leak of 30 byte(s) in 6 object(s) allocated from:
    #0 0x501fff in malloc (/tests/ovstest+0x501fff)
    #1 0x6227e6 in xmalloc /lib/util.c:138:15
    #2 0x6228b8 in xmemdup0 /lib/util.c:168:15
    #3 0x8183d6 in parse_fwd_group_action /lib/actions.c:3374:30
    #4 0x814b6e in parse_action /lib/actions.c:3610:9
    #5 0x8139ef in parse_actions /lib/actions.c:3637:14
    #6 0x8136a3 in ovnacts_parse /lib/actions.c:3672:9
    #7 0x813c80 in ovnacts_parse_string /lib/actions.c:3699:5
    #8 0x53a979 in test_parse_actions /tests/test-ovn.c:1372:21
    #9 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    #10 0x537c75 in test_ovn_main /tests/test-ovn.c:1630:5
    #11 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    #12 0x537359 in main /tests/ovstest.c:133:9
    #13 0x7f06978f21a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: Manoj Sharma <[email protected]>
Fixes: edb2400 ("Forwarding group to load balance l2 traffic with liveness detection")
Acked-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
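A minimal sketch of the ownership rule behind this leak: an array of heap-allocated strings needs each element freed before the array itself. `struct port_list` and its helpers are hypothetical and do not reproduce the OVN action structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: an array of heap-allocated strings. */
struct port_list {
    char **names;
    size_t n;
};

/* Appends a private copy of 'name'; aborts on allocation failure. */
static void
port_list_add(struct port_list *pl, const char *name)
{
    assert(pl != NULL && name != NULL);

    char **grown = realloc(pl->names, (pl->n + 1) * sizeof *grown);
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (!grown || !copy) {
        abort();
    }
    memcpy(copy, name, len);
    pl->names = grown;
    pl->names[pl->n++] = copy;
}

/* Frees every element and then the array that holds them. */
static void
port_list_destroy(struct port_list *pl)
{
    assert(pl != NULL);
    for (size_t i = 0; i < pl->n; i++) {
        free(pl->names[i]);
    }
    free(pl->names);
    pl->names = NULL;
    pl->n = 0;
}
```

Freeing only `pl->names` would leave every string allocated, which is the kind of direct leak the report above shows.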
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
'dsts' should be freed in case of any error.

  Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x502378 in realloc (/tests/ovstest+0x502378)
    #1 0x622826 in xrealloc /lib/util.c:149:9
    #2 0x8194f4 in parse_select_action /lib/actions.c:1185:20
    #3 0x814f49 in parse_set_action /lib/actions.c:3499:13
    #4 0x814341 in parse_action /lib/actions.c:3554:9
    #5 0x8139ef in parse_actions /lib/actions.c:3643:14
    #6 0x8136a3 in ovnacts_parse /lib/actions.c:3678:9
    #7 0x813c80 in ovnacts_parse_string /lib/actions.c:3705:5
    #8 0x53a4e8 in test_parse_actions /tests/test-ovn.c:1321:17
    #9 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    #10 0x537c75 in test_ovn_main /tests/test-ovn.c:1630:5
    #11 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    #12 0x537359 in main /tests/ovstest.c:133:9
    #13 0x7f9ce05ba1a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: Han Zhou <[email protected]>
Fixes: 85b3544 ("ovn-controller: A new action "select".")
Acked-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
'parse_ofp_meter_mod_str' allocates space for meter.bands that should be freed.

  Direct leak of 448 byte(s) in 7 object(s) allocated from:
    #0 0x52100f in malloc (/controller/ovn-controller+0x52100f)
    #1 0x7523a6 in xmalloc /lib/util.c:138:15
    #2 0x6fd079 in ofpbuf_init /lib/ofpbuf.c:123:26
    #3 0x6cba27 in parse_ofp_meter_mod_str /lib/ofp-meter.c:779:5
    #4 0x5705b8 in add_meter_string /controller/ofctrl.c:1674:19
    #5 0x56f736 in ofctrl_put /controller/ofctrl.c:2105:13
    #6 0x59aebb in main /controller/ovn-controller.c:2627:25
    #7 0x7f07873251a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: Guoshuai Li <[email protected]>
Fixes: c25094b ("ovn: OVN Support QoS meter")
Acked-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
'smap_clear()' doesn't free allocated memory, but 'smap_clone()' re-initializes the hash map, clearing internal pointers and leaking this memory. 'smap_destroy()' should be used instead.

Also, all the records and the array of datapaths should be freed on destruction of a cache entry.

  Direct leak of 16 byte(s) in 2 object(s) allocated from:
    #0 0x5211c7 in calloc (/controller/ovn-controller+0x5211c7)
    #1 0x752364 in xcalloc /lib/util.c:121:31
    #2 0x576e76 in sync_dns_cache /controller/pinctrl.c:2517:25
    #3 0x5758fb in pinctrl_run /controller/pinctrl.c:3158:5
    #4 0x59b06c in main /controller/ovn-controller.c:2642:25
    #5 0x7fb570fc11a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

  Indirect leak of 26 byte(s) in 2 object(s) allocated from:
    #0 0x52100f in malloc (/controller/ovn-controller+0x52100f)
    #1 0x7523d6 in xmalloc /lib/util.c:138:15
    #2 0x7524a8 in xmemdup0 /lib/util.c:168:15
    #3 0x73d8fc in smap_clone /lib/smap.c:314:45
    #4 0x576e2f in sync_dns_cache /controller/pinctrl.c:2513:13
    #5 0x5758fb in pinctrl_run /controller/pinctrl.c:3158:5
    #6 0x59b06c in main /controller/ovn-controller.c:2642:25
    #7 0x7fb570fc11a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

Fixes: 6b72068 ("ovn-controller: Add a new thread in pinctrl module to handle packet-ins.")
Acked-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
shash contains pointers to the data that should be freed.

  Direct leak of 32 byte(s) in 2 object(s) allocated from:
    #0 0x52100f in malloc (/controller/ovn-controller+0x52100f)
    #1 0x752436 in xmalloc /lib/util.c:138:15
    #2 0x5a2f0b in add_pending_ct_zone_entry /controller/ovn-controller.c:548:45
    #3 0x5a2d1d in update_ct_zones /controller/ovn-controller.c:668:9
    #4 0x59d8c6 in en_ct_zones_run /controller/ovn-controller.c:1495:5
    #5 0x5dade4 in engine_run /lib/inc-proc-eng.c:377:9
    #6 0x59adf4 in main /controller/ovn-controller.c
    #7 0x7f0799ef41a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: xu rong <[email protected]>
Fixes: 252e164 ("ovn-controller: pending_ct_zones should be destroyed")
Acked-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
When a port binding of type "l3gateway" is claimed, its remote peer port_binding is also stored in local_datapath.peer_ports[].remote.

If the remote peer port_binding is deleted first (i.e., before the local "l3gateway" one) then we need to remove the complete local_datapath.peer_ports[] entry in order to avoid ending up using dangling pointers to already freed port bindings.

Also, properly reset local_datapath->has_local_l3gateway in remove_pb_from_local_datapath().

Ilya reported this issue found by AddressSanitizer during his testing:

  ==1816017==ERROR: AddressSanitizer: heap-use-after-free on address 0x6140000cb170 at pc 0x0000005ab574 bp 0x7fff68925a30 sp 0x7fff68925a28
  READ of size 8 at 0x6140000cb170 thread T0
    #0 0x5ab573 in put_replace_chassis_mac_flows git/ovn/controller/physical.c:550:9
    #1 0x5a65eb in consider_port_binding git/ovn/controller/physical.c:1168:13
    #2 0x5a8764 in physical_run git/ovn/controller/physical.c:1607:9
    #3 0x5a0064 in flow_output_physical_flow_changes_handler git/ovn/controller/ovn-controller.c:2127:9
    #4 0x5db423 in engine_compute git/ovn/lib/inc-proc-eng.c:306:18
    #5 0x5dae1f in engine_run_node git/ovn/lib/inc-proc-eng.c:352:14
    #6 0x5dac74 in engine_run git/ovn/lib/inc-proc-eng.c:377:9
    #7 0x59ad64 in main git/ovn/controller/ovn-controller.c
    #8 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
    #9 0x480b2d in _start (git/ovn/controller/ovn-controller+0x480b2d)

  0x6140000cb170 is located 304 bytes inside of 408-byte region [0x6140000cb040,0x6140000cb1d8)
  freed by thread T0 here:
    #0 0x520d07 in free (git/ovn/controller/ovn-controller+0x520d07)
    #1 0x712de7 in ovsdb_idl_db_track_clear git/ovs/lib/ovsdb-idl.c:1984:21
    #2 0x59b5cd in main git/ovn/controller/ovn-controller.c:2762:9
    #3 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

Reported-by: Ilya Maximets <[email protected]>
Fixes: 354bdba ("ovn-controller: I-P for SB port binding and OVS interface in runtime_data.")
Tested-by: Ilya Maximets <[email protected]>
Signed-off-by: Dumitru Ceara <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
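The rule that a peer entry must go away when either side of it is deleted can be sketched in C. `struct peer_pair`, `struct datapath_sketch`, and `remove_pb` are hypothetical simplifications, not the ovn-controller data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of local and remote port bindings. */
struct peer_pair {
    const void *local;
    const void *remote;
};

/* Hypothetical per-datapath state holding the pairs. */
struct datapath_sketch {
    struct peer_pair *peers;
    size_t n_peers;
};

/* Drops every pair that references 'pb' as its local OR remote side, so
 * that no pointer to the deleted binding is left behind.  Removal is done
 * by overwriting the entry with the last one; order is not preserved. */
static void
remove_pb(struct datapath_sketch *dp, const void *pb)
{
    assert(dp != NULL);

    size_t i = 0;
    while (i < dp->n_peers) {
        if (dp->peers[i].local == pb || dp->peers[i].remote == pb) {
            dp->peers[i] = dp->peers[--dp->n_peers];
        } else {
            i++;
        }
    }
}
```

Checking only the local side would keep an entry whose `remote` pointer dangles once the remote binding is freed.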
blp
pushed a commit
that referenced
this pull request
Dec 2, 2020
A segfault is seen with the trace below. This patch fixes the issue by checking that 'ovnsb_idl_txn' is not NULL before continuing in the function send_garp_locally().

  #0 ovsdb_idl_txn_insert (txn=0x0, class=0x64b170 <sbrec_table_classes+816>, uuid=0x0) at ../lib/ovsdb-idl.c:3504
  #1 0x000000000041b068 in mac_binding_add (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0, sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, logical_port=0xfd0d70 "lr0-pub", dp=0xfd0f20, ea=..., ip=0xf67be0 "172.24.4.221") at ../controller/pinctrl.c:3877
  #2 0x000000000041b18b in send_garp_locally (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0, sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, local_datapaths=local_datapaths@entry=0xf766f0, in_pb=in_pb@entry=0xfd3370, ea=..., ip=3708033196) at ../controller/pinctrl.c:3913
  #3 0x000000000041d1be in send_garp_rarp_update (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0, sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, local_datapaths=local_datapaths@entry=0xf766f0, binding_rec=0xfd3370, nat_addresses=nat_addresses@entry=0x7ffdb9295b80) at ../controller/pinctrl.c:4118
  #4 0x0000000000425c88 in send_garp_rarp_prepare (active_tunnels=0xf76770, local_datapaths=0xf766f0, chassis=0xfdb4e0, br_int=<optimized out>, sbrec_mac_binding_by_lport_ip=0xf6a7e0, sbrec_port_binding_by_name=0xf6a090, sbrec_port_binding_by_datapath=<optimized out>, ovnsb_idl_txn=0x0) at ../controller/pinctrl.c:5491
  #5 pinctrl_run (ovnsb_idl_txn=0x0, sbrec_datapath_binding_by_key=<optimized out>, sbrec_port_binding_by_datapath=<optimized out>, sbrec_port_binding_by_key=<optimized out>, sbrec_port_binding_by_name=0xf6a090, sbrec_mac_binding_by_lport_ip=0xf6a7e0, sbrec_igmp_groups=0xf6ab90, sbrec_ip_multicast_opts=0xf6a9c0, dns_table=0xf33960, ce_table=0xf33960, svc_mon_table=0xf33960, br_int=0xf67d50, chassis=0xfdb4e0, local_datapaths=0xf766f0, active_tunnels=0xf76770) at ../controller/pinctrl.c:3169
  #6 0x0000000000408e91 in main (argc=<optimized out>, argv=<optimized out>) at ../controller/ovn-controller.c:2789

Fixes: a2b88dc ("pinctrl: Directly update MAC_Bindings created by self originated GARPs.")
Signed-off-by: Numan Siddique <[email protected]>
Acked-by: Dumitru Ceara <[email protected]>
blp
pushed a commit
that referenced
this pull request
Dec 19, 2020
When idl removes orphan rows, those rows are inserted into the 'track_list'. This allows iterators such as *_FOR_EACH_TRACKED () to return orphan rows that never had any data to the IDL user. In this case, it is difficult for the user to understand whether it is a row with no data (there was no "insert" / "modify" for this row) or it is a row with zero data (columns were cleared by DB transaction).

The main problem with this condition is that rows without data will have NULL pointers instead of references that should be there according to the database schema. For example, ovn-controller might crash:

  ERROR: AddressSanitizer: SEGV on unknown address 0x000000000100 (pc 0x00000055e9b2 bp 0x7ffef6180880 sp 0x7ffef6180860 T0)
  The signal is caused by a READ memory access.
  Hint: address points to the zero page.
    #0 0x55e9b1 in handle_deleted_lport /controller/binding.c
    #1 0x55e903 in handle_deleted_vif_lport /controller/binding.c:2072:5
    #2 0x55e059 in binding_handle_port_binding_changes /controller/binding.c:2155:23
    #3 0x5a6395 in runtime_data_sb_port_binding_handler /controller/ovn-controller.c:1454:10
    #4 0x5e15b3 in engine_compute /lib/inc-proc-eng.c:306:18
    #5 0x5e0faf in engine_run_node /lib/inc-proc-eng.c:352:14
    #6 0x5e0e04 in engine_run /lib/inc-proc-eng.c:377:9
    #7 0x5a03de in main /controller/ovn-controller.c
    #8 0x7f4fd9c991a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
    #9 0x483f0d in _start (/controller/ovn-controller+0x483f0d)

It doesn't make much sense to return non-real rows to the user, so it's best to exclude them from iteration.

Test included. Without the fix, the provided test will print empty orphan rows that were never received by idl as tracked changes.

Fixes: 932104f ("ovsdb-idl: Add support for change tracking.")
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Dumitru Ceara <[email protected]>
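Excluding orphan rows from iteration can be sketched like this. The list layout and the names `struct row_sketch`, `skip_orphans`, and `count_tracked` are assumptions made for the example; the real IDL tracks rows differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracked row; 'orphan' marks a row that never got data. */
struct row_sketch {
    const char *name;
    bool orphan;
    struct row_sketch *next_tracked;
};

/* Returns the first non-orphan row at or after 'r', or NULL. */
static struct row_sketch *
skip_orphans(struct row_sketch *r)
{
    while (r && r->orphan) {
        r = r->next_tracked;
    }
    return r;
}

/* Counts the rows that a tracked-changes loop would hand to the user. */
static size_t
count_tracked(struct row_sketch *head)
{
    size_t n = 0;
    for (struct row_sketch *r = skip_orphans(head); r;
         r = skip_orphans(r->next_tracked)) {
        assert(!r->orphan);     /* Callers only ever see real rows. */
        n++;
    }
    return n;
}
```

With the filter inside the iterator, callers never need to guess whether a row with NULL references is empty or was never populated.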
blp
pushed a commit
that referenced
this pull request
Jul 9, 2021
This is benign but AddressSanitizer was complaining about it:

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7fa93cd6d667 in __interceptor_malloc (/lib64/libasan.so.6+0xb0667)
    #1 0x1be8d3e in xmalloc__ lib/util.c:137
    #2 0x1be8e1a in xmalloc lib/util.c:172
    #3 0x1b799d3 in json_create lib/json.c:1451
    #4 0x1b74314 in json_integer_create lib/json.c:263
    #5 0x1b7d38a in jsonrpc_create_id lib/jsonrpc.c:563
    #6 0x1b7d3a5 in jsonrpc_create_request lib/jsonrpc.c:570
    #7 0x1b8d851 in ovsdb_cs_send_transaction lib/ovsdb-cs.c:1376
    #8 0x40b017 in northd_send_output_only_data_request northd/ovn-northd-ddlog.c:290
    #9 0x40c802 in northd_run northd/ovn-northd-ddlog.c:568
    #10 0x410225 in main northd/ovn-northd-ddlog.c:1289
    #11 0x7fa93c4a9081 in __libc_start_main ../csu/libc-start.c:308

Signed-off-by: Dumitru Ceara <[email protected]>
Acked-by: Ben Pfaff <[email protected]>
blp
pushed a commit
that referenced
this pull request
Aug 2, 2021
Whenever a Load_Balancer is updated, e.g., a VIP is added, the following sequence of events happens:

1. The Southbound Load_Balancer record is updated.
2. The Southbound Datapath_Binding records on which the Load_Balancer is applied are updated.
3. Southbound ovsdb-server sends updates about the Load_Balancer and Datapath_Binding records to ovn-controller.
4. The IDL layer in ovn-controller processes the updates at #3, but because of the SB schema references between tables [0] all logical flows referencing the updated Datapath_Binding are marked as "updated". The same is true for Logical_DP_Group records referencing the Datapath_Binding, and also for all logical flows pointing to the new "updated" datapath groups.
5. ovn-controller ends up recomputing (removing/readding) all flows for all these tracked updates.

From the SB Schema:

    "Datapath_Binding": {
        "columns": {
            [...]
            "load_balancers": {"type": {"key": {"type": "uuid", "refTable": "Load_Balancer", "refType": "weak"}, "min": 0, "max": "unlimited"}},
    [...]
    "Load_Balancer": {
        "columns": {
            "datapaths": {
                [...]
                "type": {"key": {"type": "uuid", "refTable": "Datapath_Binding"}, "min": 0, "max": "unlimited"}},
    [...]
    "Logical_DP_Group": {
        "columns": {
            "datapaths": {"type": {"key": {"type": "uuid", "refTable": "Datapath_Binding", "refType": "weak"}, "min": 0, "max": "unlimited"}}},
    [...]
    "Logical_Flow": {
        "columns": {
            "logical_datapath": {"type": {"key": {"type": "uuid", "refTable": "Datapath_Binding"}, "min": 0, "max": 1}},
            "logical_dp_group": {"type": {"key": {"type": "uuid", "refTable": "Logical_DP_Group"},

In order to avoid this unnecessary Logical_Flow notification storm we now remove the explicit reference from Datapath_Binding to Load_Balancer and instead store raw UUIDs. This means that on the ovn-controller side we need to perform a Load_Balancer table lookup by UUID whenever a new datapath is added, but that doesn't happen too often and the cost of the lookup is negligible compared to the huge cost of processing the unnecessary logical flow updates.

This change is backwards compatible because the contents stored in the database are not changed, just that the schema constraints are relaxed a bit.

Some performance measurements: on a scale test deployment simulating an ovn-kubernetes deployment with 120 nodes and a large load balancer with 16K VIPs associated to each node's logical switch, the event processing loop time in ovn-controller, when adding a new VIP, is reduced from ~39 seconds to ~8 seconds.

There's no need to change the northd DDlog implementation.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
Acked-by: Mark Michelson <[email protected]>
Signed-off-by: Dumitru Ceara <[email protected]>
blp
pushed a commit
that referenced
this pull request
Aug 2, 2021
While decoding the RAW_ENCAP action, decode_ed_prop() might re-allocate the ofpbuf if there is not enough space left. However, the function 'decode_NXAST_RAW_ENCAP' continues to use the old pointer to the 'encap' structure, leading to write-after-free and incorrect decoding.

  ==3549105==ERROR: AddressSanitizer: heap-use-after-free on address 0x60600000011a at pc 0x0000005f6cc6 bp 0x7ffc3a2d4410 sp 0x7ffc3a2d4408
  WRITE of size 2 at 0x60600000011a thread T0
    #0 0x5f6cc5 in decode_NXAST_RAW_ENCAP lib/ofp-actions.c:4461:20
    #1 0x5f0551 in ofpact_decode ./lib/ofp-actions.inc2:4777:16
    #2 0x5ed17c in ofpacts_decode lib/ofp-actions.c:7752:21
    #3 0x5eba9a in ofpacts_pull_openflow_actions__ lib/ofp-actions.c:7791:13
    #4 0x5eb9fc in ofpacts_pull_openflow_actions lib/ofp-actions.c:7835:12
    #5 0x64bb8b in ofputil_decode_packet_out lib/ofp-packet.c:1113:17
    #6 0x65b6f4 in ofp_print_packet_out lib/ofp-print.c:148:13
    #7 0x659e3f in ofp_to_string__ lib/ofp-print.c:1029:16
    #8 0x659b24 in ofp_to_string lib/ofp-print.c:1244:21
    #9 0x65a28c in ofp_print lib/ofp-print.c:1288:28
    #10 0x540d11 in ofctl_ofp_parse utilities/ovs-ofctl.c:2814:9
    #11 0x564228 in ovs_cmdl_run_command__ lib/command-line.c:247:17
    #12 0x56408a in ovs_cmdl_run_command lib/command-line.c:278:5
    #13 0x5391ae in main utilities/ovs-ofctl.c:179:9
    #14 0x7f6911ce9081 in __libc_start_main (/lib64/libc.so.6+0x27081)
    #15 0x461fed in _start (utilities/ovs-ofctl+0x461fed)

Fix that by getting a new pointer before using it.

Credit to OSS-Fuzz.

The fuzzer regression test will fail only with AddressSanitizer enabled.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27851
Fixes: f839892 ("OF support and translation of generic encap and decap")
Acked-by: William Tu <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
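The general pattern behind this fix, remembering an offset rather than a pointer into a buffer that may be reallocated, can be sketched as follows. `struct buf`, `buf_put_zeros`, `struct encap_hdr`, and `decode_encap_sketch` are hypothetical; they model a growable buffer, not the ofpbuf API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. */
struct buf {
    uint8_t *data;
    size_t size, allocated;
};

/* Appends 'n' zero bytes, possibly moving 'data'.  Returns the offset of
 * the new bytes, which stays valid even if the buffer is reallocated. */
static size_t
buf_put_zeros(struct buf *b, size_t n)
{
    assert(b != NULL);
    if (b->size + n > b->allocated) {
        size_t cap = (b->size + n) * 2;
        uint8_t *p = realloc(b->data, cap);
        if (!p) {
            abort();
        }
        b->data = p;
        b->allocated = cap;
    }
    size_t ofs = b->size;
    memset(b->data + ofs, 0, n);
    b->size += n;
    return ofs;
}

/* Hypothetical action header that is filled in after its properties. */
struct encap_hdr {
    uint16_t n_props;
};

static void
decode_encap_sketch(struct buf *b, unsigned int n_props)
{
    size_t encap_ofs = buf_put_zeros(b, sizeof(struct encap_hdr));

    for (unsigned int i = 0; i < n_props; i++) {
        buf_put_zeros(b, 64);   /* May reallocate b->data. */
    }

    /* Locate the header again from its offset instead of reusing a
     * pointer taken before the appends. */
    struct encap_hdr hdr = { .n_props = (uint16_t) n_props };
    memcpy(b->data + encap_ofs, &hdr, sizeof hdr);
}
```

A pointer obtained right after the first `buf_put_zeros()` call could be stale by the time the loop finishes; the offset cannot.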
blp
pushed a commit
that referenced
this pull request
Aug 2, 2021
ovsdb_cs_send_transaction() returns the pointer to the same 'request_id' object that is used internally. This leads to a situation where the transaction in idl and the CS module share the same 'request_id' object. However, the CS module is able to destroy this transaction id at any time, e.g. if the connection state changed, while the idl transaction might still be around at that moment and the application might still use it.

Found by running 'make check-ovsdb-cluster' with AddressSanitizer:

    ==79922==ERROR: AddressSanitizer: heap-use-after-free on address
    0x604000167a98 at pc 0x000000626acf bp 0x7ffcdb38a4c0 sp 0x7ffcdb38a4b8
    READ of size 8 at 0x604000167a98 thread T0
      #0 0x626ace in json_destroy lib/json.c:354:18
      #1 0x56d1ab in ovsdb_idl_txn_destroy lib/ovsdb-idl.c:2528:5
      #2 0x53a908 in do_vsctl utilities/ovs-vsctl.c:3008:5
      #3 0x539251 in main utilities/ovs-vsctl.c:203:17
      #4 0x7f7f7e376081 in __libc_start_main (/lib64/libc.so.6+0x27081)
      #5 0x461fed in _start (utilities/ovs-vsctl+0x461fed)

    0x604000167a98 is located 8 bytes inside of 40-byte region
    [0x604000167a90,0x604000167ab8)
    freed by thread T0 here:
      #0 0x503ac7 in free (utilities/ovs-vsctl+0x503ac7)
      #1 0x626aae in json_destroy lib/json.c:378:9
      #2 0x6adfa2 in ovsdb_cs_run lib/ovsdb-cs.c:625:13
      #3 0x567731 in ovsdb_idl_run lib/ovsdb-idl.c:394:5
      #4 0x56fed1 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3187:9
      #5 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
      #6 0x539251 in main utilities/ovs-vsctl.c:203:17
      #7 0x7f7f7e376081 in __libc_start_main

    previously allocated by thread T0 here:
      #0 0x503dcf in malloc (utilities/ovs-vsctl+0x503dcf)
      #1 0x594656 in xmalloc lib/util.c:138:15
      #2 0x626431 in json_create lib/json.c:1451:25
      #3 0x626972 in json_integer_create lib/json.c:263:25
      #4 0x62da0f in jsonrpc_create_id lib/jsonrpc.c:563:12
      #5 0x62d9a8 in jsonrpc_create_request lib/jsonrpc.c:570:23
      #6 0x6af3a6 in ovsdb_cs_send_transaction lib/ovsdb-cs.c:1357:35
      #7 0x56e3d5 in ovsdb_idl_txn_commit lib/ovsdb-idl.c:3147:27
      #8 0x56fea9 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3186:22
      #9 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
      #10 0x539251 in main utilities/ovs-vsctl.c:203:17
      #11 0x7f7f7e376081 in __libc_start_main

Fixes: 1c337c4 ("ovsdb-idl: Break into two layers.")
Acked-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
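The ownership problem described in this commit message can be illustrated with a small, self-contained C sketch. It does not use the real OVS API: `toy_id`, `toy_cs`, and every function below are hypothetical stand-ins, and the reference-counting scheme is an assumption chosen only to show one way in which two layers can each hold the same id without one of them freeing it under the other.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted id; a toy stand-in, not OVS 'struct json'. */
struct toy_id {
    int refcount;
    long long value;
};

static struct toy_id *toy_id_create(long long value)
{
    struct toy_id *id = malloc(sizeof *id);
    if (!id) {
        abort();
    }
    id->refcount = 1;
    id->value = value;
    return id;
}

/* Takes one more reference; the caller owns the reference it receives. */
static struct toy_id *toy_id_clone(struct toy_id *id)
{
    id->refcount++;
    return id;
}

/* Drops one reference and frees the object when the last one is gone.
 * Returns 1 if the object was freed, 0 otherwise. */
static int toy_id_destroy(struct toy_id *id)
{
    if (--id->refcount == 0) {
        free(id);
        return 1;
    }
    return 0;
}

/* Hypothetical lower layer that tracks the id of its pending request. */
struct toy_cs {
    struct toy_id *pending;
};

/* Sends a "transaction".  The lower layer keeps its own reference and
 * hands the caller a separate one instead of the bare internal pointer. */
static struct toy_id *toy_cs_send(struct toy_cs *cs, long long value)
{
    cs->pending = toy_id_create(value);
    return toy_id_clone(cs->pending);
}

/* Simulates a connection state change: the lower layer drops its
 * reference whenever it likes, without affecting the caller's one. */
static void toy_cs_reset(struct toy_cs *cs)
{
    if (cs->pending) {
        toy_id_destroy(cs->pending);
        cs->pending = NULL;
    }
}
```

A caller would keep the pointer returned by `toy_cs_send()`, remain free to read it after `toy_cs_reset()`, and release it with `toy_id_destroy()` once its own transaction is destroyed. If `toy_cs_send()` returned `cs->pending` directly, the same sequence would read freed memory, which is the pattern AddressSanitizer reports above.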
blp pushed a commit that referenced this pull request Aug 2, 2021
…nected.

The symptom of this issue is that an OVS bridge loses its IP address on restart.

Simple reproducer:
 0. start ovsdb-server and ovs-vswitchd
 1. ovs-vsctl add-br br0
 2. ifconfig br0 10.0.0.1 up
 3. ovs-appctl -t ovs-vswitchd exit
 4. start ovs-vswitchd back.

After step #3 ovs-vswitchd is down, but the br0 interface exists and has a configured IP address. After step #4 there is no IP address on the port br0.

What happened:

 1. ovsdb-cs connects to the database via ovsdb-idl and requests the database lock.
    --> get_schema for _Server database
    --> lock request

 2. ovsdb-cs receives the schema for the _Server database and sends a monitor request.
    <-- schema for _Server
    --> monitor_cond for _Server

 3. ovsdb-cs receives the lock reply.
    <-- locked
    At this point ovsdb-cs generates an OVSDB_CS_EVENT_TYPE_LOCKED event and passes it to ovsdb-idl. ovsdb-idl increases change_seqno.

 4. ovsdb_idl_has_ever_connected() is 'true' now, because change_seqno is not zero.

 5. ovs-vswitchd decides that it has a connection with the database and all the initial data, and therefore initiates configuration of bridges.
    bridge_run():ovsdb_idl_has_ever_connected() --> true

 6. Since the monitor request for the Open_vSwitch database has not even been sent yet, the database is empty. This leads to removal of all the ports and all other resources.

 7. When the data is finally received, ovs-vswitchd re-creates bridges and ports, but IP addresses cannot be restored.

While splitting out ovsdb-cs from ovsdb-idl, one part of the logic was lost. In particular, before the split, ovsdb-idl updated change_seqno only in the MONITORING state. Restore that logic by updating change_seqno only if a transaction may be sent, i.e. the lock is ours and ovsdb-cs is in the MONITORING state. This matches the main purpose of increasing change_seqno at this point, i.e. to force the client to re-try the transaction.

With this change ovsdb_idl_has_ever_connected() remains 'false' until the first monitor reply with the actual data is received.

This issue was reported several times during the last couple of weeks.

Reported-at: https://bugzilla.redhat.com/1968445
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383512.html
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-June/051222.html
Fixes: 1c337c4 ("ovsdb-idl: Break into two layers.")
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Dumitru Ceara <[email protected]>
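The ordering problem in the steps above can be modelled with a toy state machine. This is only a sketch under stated assumptions: `toy_idl`, the `TOY_*` states, and all functions below are hypothetical and far simpler than the real ovsdb-cs and ovsdb-idl code. It shows the guarded variant, in which a lock reply bumps the sequence number only when a transaction could actually be sent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection states (not the real ovsdb-cs enum). */
enum toy_cs_state {
    TOY_FETCHING_SCHEMA,
    TOY_MONITOR_REQUESTED,
    TOY_MONITORING
};

/* Hypothetical client object holding only what the sketch needs. */
struct toy_idl {
    enum toy_cs_state state;
    bool has_lock;
    unsigned int change_seqno;
};

/* A transaction may be sent only if the lock is ours and the initial
 * monitor reply has already been processed. */
static bool toy_may_send_transaction(const struct toy_idl *idl)
{
    return idl->has_lock && idl->state == TOY_MONITORING;
}

/* Lock reply: record the lock, but bump the sequence number only when a
 * retry of a transaction would make sense. */
static void toy_handle_locked(struct toy_idl *idl)
{
    idl->has_lock = true;
    if (toy_may_send_transaction(idl)) {
        idl->change_seqno++;
    }
}

/* First monitor reply: the database contents are now available. */
static void toy_handle_monitor_reply(struct toy_idl *idl)
{
    idl->state = TOY_MONITORING;
    idl->change_seqno++;
}

/* Mirrors the idea that a non-zero sequence number means "connected". */
static bool toy_has_ever_connected(const struct toy_idl *idl)
{
    return idl->change_seqno != 0;
}
```

With this guard, a lock reply that arrives while the client is still in `TOY_FETCHING_SCHEMA` leaves `toy_has_ever_connected()` false, so a consumer that waits on it would not act on an empty database. Removing the `toy_may_send_transaction()` check in `toy_handle_locked()` reproduces the premature "connected" signal from step 4.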