Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WiP]: Initial DDlog scaffolding in controller #4

Draft
wants to merge 4 commits into
base: ddlog-controller
Choose a base branch
from

Conversation

amytai
Copy link
Collaborator

@amytai amytai commented Jun 30, 2021

Opening a permanent PR to keep track of our ovn-controller changes.

This initial commit creates a dummy output relation and populates it programmatically.
Still todo:

  • Create correct output relations
  • Create adapters that populate DDlog whenever diffs are made to OVN_Southbound database (perhaps port from northd)

@amytai amytai marked this pull request as draft June 30, 2021 19:50
- Create dummy output relation in ovn_controller.dl
- Populate this output relation programmatically inside ovn-controller main
@amytai
Copy link
Collaborator Author

amytai commented Jun 30, 2021

For now, the way to test whether things are working correctly is to run make check TESTSUITEFLAGS='-j8 -d 924' in the top-level directory. Then, inspect tests/testsuite.dir/924/hv1/ovn-controller.log and grep for DDlog record.

@amytai amytai requested review from blp and mihaibudiu June 30, 2021 20:15

input_table = ddlog_get_table_id(prog, input_relation);
if (input_table == -1)
VLOG_INFO(" ddlog_get_table_id returned -1 for %s, %d\n", input_relation, input_table);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main problem is that on all these errors you seemingly continue through the normal path as well.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just to get stuff building. We definitely would need better error handling for production but it's a start!

@@ -2720,6 +2734,62 @@ main(int argc, char *argv[])
OVS_NODES
#undef OVS_NODE

/*** Begin DDlog test code ***/
// DDlog vars
ddlog_prog prog;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what this "main" is for. I wonder how control reaches this point.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main code of ovn-controller. This new code will run as part of the daemon startup.

controller/ovn-controller.c Outdated Show resolved Hide resolved
controller/ovn-controller.c Outdated Show resolved Hide resolved
controller/ovn-controller.c Outdated Show resolved Hide resolved
controller/ovn-controller.c Outdated Show resolved Hide resolved
Copy link
Owner

@blp blp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting a lot of compiler warnings and errors here. All minor but should be fixed. You can use OVS_UNUSED to silence warnings about unused parameters.

../controller/ovn-controller.c: In function ‘print_records_callback’:
../controller/ovn-controller.c:2567:46: error: unused parameter ‘arg’ [-Werror=unused-parameter]
../controller/ovn-controller.c:2567:84: error: unused parameter ‘weight’ [-Werror=unused-parameter]
In file included from ../controller/ovn-controller.c:47:
../controller/ovn-controller.c: In function ‘main’:
../controller/ovn-controller.c:2752:19: error: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘table_id’ {aka ‘long unsigned int’} [-Werror=format=]
/home/bpfaff/nicira/ovs/include/openvswitch/vlog.h:280:41: note: in definition of macro ‘VLOG’
../controller/ovn-controller.c:2752:9: note: in expansion of macro ‘VLOG_INFO’
../controller/ovn-controller.c:2752:61: note: format string is defined here
In file included from ../controller/ovn-controller.c:47:
../controller/ovn-controller.c:2756:19: error: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘table_id’ {aka ‘long unsigned int’} [-Werror=format=]
/home/bpfaff/nicira/ovs/include/openvswitch/vlog.h:280:41: note: in definition of macro ‘VLOG’
../controller/ovn-controller.c:2756:9: note: in expansion of macro ‘VLOG_INFO’
../controller/ovn-controller.c:2756:60: note: format string is defined here

blp pushed a commit that referenced this pull request Jul 9, 2021
This is benign but AddressSanitizer was complaining about it:

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7fa93cd6d667 in __interceptor_malloc (/lib64/libasan.so.6+0xb0667)
    #1 0x1be8d3e in xmalloc__ lib/util.c:137
    #2 0x1be8e1a in xmalloc lib/util.c:172
    #3 0x1b799d3 in json_create lib/json.c:1451
    #4 0x1b74314 in json_integer_create lib/json.c:263
    #5 0x1b7d38a in jsonrpc_create_id lib/jsonrpc.c:563
    #6 0x1b7d3a5 in jsonrpc_create_request lib/jsonrpc.c:570
    #7 0x1b8d851 in ovsdb_cs_send_transaction lib/ovsdb-cs.c:1376
    #8 0x40b017 in northd_send_output_only_data_request northd/ovn-northd-ddlog.c:290
    #9 0x40c802 in northd_run northd/ovn-northd-ddlog.c:568
    #10 0x410225 in main northd/ovn-northd-ddlog.c:1289
    #11 0x7fa93c4a9081 in __libc_start_main ../csu/libc-start.c:308

Signed-off-by: Dumitru Ceara <[email protected]>
Acked-by: Ben Pfaff <[email protected]>
@amytai amytai force-pushed the ddlog-controller branch from 3f52b32 to 3159654 Compare July 27, 2021 18:26
amytai added 2 commits July 27, 2021 11:34
This commit contains:
- new controller/ddlog.h header file to encapsulate ddlog boilerplate
- move insertion of DDlog input relation records to controller/lflow.c This might not be the permanent place: we need to identify the location to place the hook that writes OVSDB updates to the ovn-controller DDlog's Logical_Flow input relation.

Still TODO:
- consider_logical_flow logic should eventually be encapsulated in DDlog code
- find permanent location to insert OVSDB updates to ovn-controller DDlog instance
… Logical Flows

Limitations:
- For now, only track address_set and port_group dependencies
- Don't have the logic for creating the actual OpenFlow flows
blp pushed a commit that referenced this pull request Aug 2, 2021
While decoding RAW_ENCAP action, decode_ed_prop() might re-allocate
ofpbuf if there is no enough space left.  However, function
'decode_NXAST_RAW_ENCAP' continues to use old pointer to 'encap'
structure leading to write-after-free and incorrect decoding.

  ==3549105==ERROR: AddressSanitizer: heap-use-after-free on address
  0x60600000011a at pc 0x0000005f6cc6 bp 0x7ffc3a2d4410 sp 0x7ffc3a2d4408
  WRITE of size 2 at 0x60600000011a thread T0
    #0 0x5f6cc5 in decode_NXAST_RAW_ENCAP lib/ofp-actions.c:4461:20
    #1 0x5f0551 in ofpact_decode ./lib/ofp-actions.inc2:4777:16
    #2 0x5ed17c in ofpacts_decode lib/ofp-actions.c:7752:21
    #3 0x5eba9a in ofpacts_pull_openflow_actions__ lib/ofp-actions.c:7791:13
    #4 0x5eb9fc in ofpacts_pull_openflow_actions lib/ofp-actions.c:7835:12
    #5 0x64bb8b in ofputil_decode_packet_out lib/ofp-packet.c:1113:17
    #6 0x65b6f4 in ofp_print_packet_out lib/ofp-print.c:148:13
    #7 0x659e3f in ofp_to_string__ lib/ofp-print.c:1029:16
    #8 0x659b24 in ofp_to_string lib/ofp-print.c:1244:21
    #9 0x65a28c in ofp_print lib/ofp-print.c:1288:28
    #10 0x540d11 in ofctl_ofp_parse utilities/ovs-ofctl.c:2814:9
    #11 0x564228 in ovs_cmdl_run_command__ lib/command-line.c:247:17
    #12 0x56408a in ovs_cmdl_run_command lib/command-line.c:278:5
    #13 0x5391ae in main utilities/ovs-ofctl.c:179:9
    #14 0x7f6911ce9081 in __libc_start_main (/lib64/libc.so.6+0x27081)
    #15 0x461fed in _start (utilities/ovs-ofctl+0x461fed)

Fix that by getting a new pointer before using.

Credit to OSS-Fuzz.

Fuzzer regression test will fail only with AddressSanitizer enabled.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27851
Fixes: f839892 ("OF support and translation of generic encap and decap")
Acked-by: William Tu <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
blp pushed a commit that referenced this pull request Aug 2, 2021
ovsdb_cs_send_transaction() returns the pointer to the same
'request_id' object that is used internally.  This leads to
situation where transaction in idl and CS module has the
same 'request_id' object.  However, CS module is able to
destroy this transaction id at any time, e.g. if connection
state chnaged, but idl transaction might be still around at
this moment and application might still use it.

Found by running 'make check-ovsdb-cluster' with AddressSanitizer:

  ==79922==ERROR: AddressSanitizer: heap-use-after-free on address
  0x604000167a98 at pc 0x000000626acf bp 0x7ffcdb38a4c0 sp 0x7ffcdb38a4b8
  READ of size 8 at 0x604000167a98 thread T0
    #0 0x626ace in json_destroy lib/json.c:354:18
    #1 0x56d1ab in ovsdb_idl_txn_destroy lib/ovsdb-idl.c:2528:5
    #2 0x53a908 in do_vsctl utilities/ovs-vsctl.c:3008:5
    #3 0x539251 in main utilities/ovs-vsctl.c:203:17
    #4 0x7f7f7e376081 in __libc_start_main (/lib64/libc.so.6+0x27081)
    #5 0x461fed in _start (utilities/ovs-vsctl+0x461fed)

  0x604000167a98 is located 8 bytes inside of 40-byte
                    region [0x604000167a90,0x604000167ab8)
  freed by thread T0 here:
    #0 0x503ac7 in free (utilities/ovs-vsctl+0x503ac7)
    #1 0x626aae in json_destroy lib/json.c:378:9
    #2 0x6adfa2 in ovsdb_cs_run lib/ovsdb-cs.c:625:13
    #3 0x567731 in ovsdb_idl_run lib/ovsdb-idl.c:394:5
    #4 0x56fed1 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3187:9
    #5 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
    #6 0x539251 in main utilities/ovs-vsctl.c:203:17
    #7 0x7f7f7e376081 in __libc_start_main

  previously allocated by thread T0 here:
    #0 0x503dcf in malloc (utilities/ovs-vsctl+0x503dcf)
    #1 0x594656 in xmalloc lib/util.c:138:15
    #2 0x626431 in json_create lib/json.c:1451:25
    #3 0x626972 in json_integer_create lib/json.c:263:25
    #4 0x62da0f in jsonrpc_create_id lib/jsonrpc.c:563:12
    #5 0x62d9a8 in jsonrpc_create_request lib/jsonrpc.c:570:23
    #6 0x6af3a6 in ovsdb_cs_send_transaction lib/ovsdb-cs.c:1357:35
    #7 0x56e3d5 in ovsdb_idl_txn_commit lib/ovsdb-idl.c:3147:27
    #8 0x56fea9 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3186:22
    #9 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
    #10 0x539251 in main utilities/ovs-vsctl.c:203:17
    #11 0x7f7f7e376081 in __libc_start_main

Fixes: 1c337c4 ("ovsdb-idl: Break into two layers.")
Acked-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
blp pushed a commit that referenced this pull request Aug 2, 2021
…nected.

The symptom of this issue is that OVS bridge looses its IP address on
restart.

Simple reproducer:
 0. start ovsdb-server and ovs-vswitchd
 1. ovs-vsctl add-br br0
 2. ifconfig br0 10.0.0.1 up
 3. ovs-appctl -t ovs-vswitchd exit
 4. start ovs-vswitchd back.

After step #3 ovs-vswitchd is down, but br0 interface exists and
has configured IP address.  After step #4 there is no IP address
on the port br0.

What happened:
1. ovsdb-cs connects to the database via ovsdb-idl and requests
   database lock.
   --> get_schema for _Server database
   --> lock request

2. ovsdb-cs receives schema for the _Server database.  And sends
   monitor request.
   <-- schema for _Server
   --> monitor_cond for _Server

3. ovsdb-cs receives lock reply.
   <-- locked
   At this point ovsdb-cs generates OVSDB_CS_EVENT_TYPE_LOCKED
   event and passes it to ovsdb-idl.  ovsdb-idl increases change_seqno.

4. ovsdb_idl_has_ever_connected() is 'true' now, because change_seqno
   is not zero.

5. ovs-vswitchd decides that it has connection with database and
   all the initial data, therefore initiates configuration of bridges.
   bridge_run():ovsdb_idl_has_ever_connected() --> true

6. Since monitor request for the Open_vSwitch database is not even
   sent yet, the database is empty.  This leads to removal of all the
   ports and all other resources.

7. When data finally received, ovs-vswitchd re-creates bridges and
   ports, but IP addresses can not be restored.

While splitting out ovsdb-cs from ovsdb-idl one part of the logic
was lost.  Particularly, before the split, ovsdb-idl updated
change_seqno only in MONITORING state.

Restoring the logic by updating the change_seqno only if may send
transaction, i.e. lock is ours and ovsdb-cs is in the MONITORING
state.  This matches with the main purpose of increasing change_seqno
at this point, i.e. to force the client to re-try the transaction.
With this change ovsdb_idl_has_ever_connected() remains 'false'
until the first monitor reply with the actual data received.

This issue was reported several times during the last couple of weeks.

Reported-at: https://bugzilla.redhat.com/1968445
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383512.html
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-June/051222.html
Fixes: 1c337c4 ("ovsdb-idl: Break into two layers.")
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Dumitru Ceara <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants