Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: New "networkmanager.passthrough" structure (LP: #2080301) #522

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

daniloegea
Copy link
Collaborator

@daniloegea daniloegea commented Sep 30, 2024

Description

This PR proposes a new format for the networkmanager.passthrough section. The new format was a suggestion from the LP#2080301 bug reporter. It addresses the problem reported in the bug and is future proof against similar problems.

The main problem with the currently used format is that it's ambiguous. The keyfile [group].key=value format is encoded as a string separated by dots. As dots can be used in the key name, it becomes tricky to separate what is the group and what is the key name. A different character could be used but we could eventually hit the same problem.

Arguably we could do something like [group-name-between-brackets]key-name: value. Brackets are forbidden in group names (I think?!). But it still would be a change in the format so I opted for turning each entry into mappings.

Here is how it looks like:

  nm-devices:
    NM-4bf69677-e326-49b4-8e37-2acf3074f6c7:
      renderer: NetworkManager
      networkmanager:
        uuid: "4bf69677-e326-49b4-8e37-2acf3074f6c7"
        name: "tuntap7"
        passthrough:
          connection:
            type: "tun"
            autoconnect: "false"
            interface-name: "tuntap7"
          proxy: {}
          tun: {}
          ipv4:
            method: "manual"
            address1: "4.3.2.1/24,1.2.3.254"
            dns: "1.2.3.254;4.3.2.254;"
          ipv6:
            addr-gen-mode: "default"
            method: "manual"
            address1: "1:2:3::4/64,1:2:3::abcd"
            dns: "1:2:3::abcd;abcd:3:2::1;"

With the integration of netplan and Network Manager on Ubuntu, tweaking the passthrough configuration might be necessary every now and then due to things missing in netplan. This format makes it easier to do that.

The old format is still accepted for backwards compatibility. The parser still accepts it but will emit the new format. So YAMLs generated by Network Manager will be updated if the user modifies the connection and it will still work with existing YAMLs if they are never updated.

IMPORTANT: the datalist data structure was replaced with hash tables. This will make the order in which elements are added to keyfiles unpredictable. I'm not 100% sure that ordering would cause problems with configuration from the passthrough section. My main motivation to drop datalists is that GQuarks might leak memory. We hit this issue in our config_fuzzer test due to the number of GQuarks needed when a big number of entries are added to the datalist.

A couple of Network Manager autopkgtests are failing due to the change in the structure and will need to be fixed as well.

I understand it's a big change and we might decide to not merge it so I made it an RFC.

Checklist

  • Runs make check successfully.
  • Retains code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad.

@daniloegea daniloegea force-pushed the nm_passthrough branch 5 times, most recently from c7e5515 to f70c447 Compare October 1, 2024 15:00
@daniloegea daniloegea changed the title WIP: New "networkmanager.passthrough" structure RFC: New "networkmanager.passthrough" structure Oct 1, 2024
@daniloegea daniloegea added the RFC Request for comment (don't merge yet) label Oct 1, 2024
@daniloegea daniloegea marked this pull request as ready for review October 1, 2024 15:15
@daniloegea daniloegea requested a review from slyon October 1, 2024 15:49
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for working on this. I think the new and improved mapping format makes sense and is more sustainable than the existing flat list of group+key-values, going forward.

As can be seen from the CI NetworkManager tests, we need to adopt some cases in NetworkManager, and we should do so in a compatible way, e.g. using "netplan get", which can handle the old and new format, so we won't run into issues when backporting this change to older series.

I like the way you kept it reading the old format! But in general, we should double-check that we don't break backwards compatibility. Especially around netplan get/set and corresponding DBus calls, as used by the snapd integration.

Overall, getting rid of the GData list data-structure is a win, to avoid memory leaks in upstream GLib. And having a mapping structure is much more readable than the current flat list. I don't think there are any ordering issues introduced by using hashmaps, as we have one hashmap per keyfile group. And also the GKeyFIle functions put things into the correct place (group) inside the resulting file automatically.

I left a bunch of comments inline, mostly minor stuff, but also some questions, which we need to think through.

Comment on lines -573 to -576
/* Group name may contain dots, but key name may not.
* The "tc" group is a special case, where it is the other way around, e.g.:
* tc->qdisc.root
* tc->tfilter.ffff: */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: This assumption is broken by the network-manager-openconnect plugin, as described in LP#2080301. We might be able to special case the vpn-secrets group similarly as we did with tc, to keep things simple... Although, that might make us run into similar issues in the future for other "edge cases".

src/nm.c Outdated Show resolved Hide resolved
src/parse-nm.c Outdated Show resolved Hide resolved
keys = g_key_file_get_keys(kf, groups[i], &klen, NULL);
if (klen == 0) {
/* empty group */
g_datalist_set_data_full(list, g_strconcat(groups[i], ".", NETPLAN_NM_EMPTY_GROUP, NULL), g_strdup(""), g_free);
g_hash_table_insert(*list, g_strdup(groups[i]), group);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I think this might leak memory, iff the key or value already exists in *list, as we did not supply a key_destroy_func/value_destroy_func above when creating the hash-table (e.g. using g_hash_table_new_full)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular case, groups is created by g_key_file_get_groups which will not add repetitions to the result.

Although, there is a different memory leak there: thanks to the continue in the next line it will not free the keys array.

src/parse.c Outdated
@@ -548,22 +549,78 @@ handle_generic_datalist(NetplanParser *npp, yaml_node_t* node, const char* key_p
key = yaml_document_get_node(&npp->doc, entry->key);
value = yaml_document_get_node(&npp->doc, entry->value);

assert_type(npp, key, YAML_SCALAR_NODE);
assert_type(npp, value, YAML_SCALAR_NODE);

escaped_key = g_strescape(scalar(key), STRESCAPE_EXCEPTIONS);
escaped_value = g_strescape(scalar(value), STRESCAPE_EXCEPTIONS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: If we walk through a mapping, what's the escaped_value why do we need it here? Is it defined at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that's a good point. The escaped_value is only relevant when the value type is scalar. I'll move it to inside the if statement.

@@ -58,6 +58,83 @@ def test_passthrough_basic(self):
method=ignore
'''}, '''[device-netplan.ethernets.NM-87749f1d-334f-40b2-98d4-55db58965f5f]
match-device=type:ethernet
managed=1\n\n''')

def test_passthrough_basic_new_format(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: _new_format is very generic. Maybe better call it test_passthrough_basic_mapping.

(Also in other places where "new_format" is used.)

Comment on lines +114 to +115
key: a
key: b''', skip_generated_yaml_validation=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: should the result rather be key=b here. So that the value defined later wins?

also followup question: what happens if we mix and match the old format?

e.g.

passthrough:
  connection:
    name: "another name"
  connection.name: "some name"

I'd argue the value defined in the new (mapping) format should take precendence. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeaaah it probably should be key=b in the end. But that's actually a problem in the handle_generic_map function. It will error out if it finds the same key with different values (see below). Arguably this logic should be used everywhere else then...

$ cat /tmp/fakeroot/etc/netplan/a.yaml
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    renderer: NetworkManager
    enp5s0:
      dhcp4: true
      openvswitch:
        other-config:
          one: 1
          one: 2
$ netplan  get --root-dir /tmp/fakeroot
Command failed: /tmp/fakeroot/etc/netplan/a.yaml:10:11: Error in network definition: duplicate map entry 'one'
          one: 1
          ^

For the second question, mixed format is supported. Although, if the key is set using the old style first, the parser will ignore it if it's found again using the new format. The problem is the same described above, but for these cases I'm ignoring the error.

I could create a new handle_generic_map only for the passthrough settings... but in this case I suppose the same logic should be used across all the parser and not only here...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I just realized that netplan set will not work if you try to update these fields, again, due to the way handle_generic_map works.

Adding and nullifying fields work fine.

It also doesn't work for openvswitch:

# netplan get
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp5s0:
      dhcp4: true
      openvswitch:
        other-config:
          one: "1"
root@oracular-deleteme:~# netplan set ethernets.enp5s0.openvswitch.other-config.one=2
Command failed: Error in network definition: duplicate map entry 'one'

match-device=type:wifi
managed=1\n\n''')

def test_passthrough_wifi_new_format(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Maybe call it test_passthrough_wifi_mapping to be more precise.

src/validation.c Outdated
key = g_hash_table_lookup(group, "type");
}
}
if (nd->type == NETPLAN_DEF_TYPE_NM && (!nd->backend_settings.passthrough || !key))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the !nd->backend_settings.passthrough here any more. key is only defined if nd->backend_settings.passthrough was accessible in the previous if-statement.

Comment on lines -485 to +487
self._set([ap_key+'.networkmanager.passthrough.connection\\.permissions=null'])
self._set([ap_key+'.networkmanager.passthrough.connection.permissions=null'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Do we need to change the escaping in existing netplan get calls, out in the wild? That might have the potential to break the snapd integration used in Ubuntu Core.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that (the escaping with \) will not work anymore if you are using get and set with settings inside the passthrough section. We probably shouldn't backport this change to LTS releases (like, never), even if we don't give any guarantees about stability for passthrough block.

@slyon slyon changed the title RFC: New "networkmanager.passthrough" structure RFC: New "networkmanager.passthrough" structure (LP: #2080301) Oct 17, 2024
@daniloegea daniloegea force-pushed the nm_passthrough branch 5 times, most recently from 5606e32 to fcc7cd5 Compare November 8, 2024 10:37
The networkmanager.passthrough section stores raw keyfile information
using a format that can be tricky to interpret. It will concatenate the
group and key names separated by dots. Unfortunately, Network Manager
accepts multiple dots in the middle of both the group and key names.
Because of that it's hard, if even possible, to determine where the
strings must be split.

Ex:
networkmanager:
  passthrough:
    group_name.key_name: value1
    group_name.key_name1: value2

The new format stores the same information in YAML mappings. By using
this format, parsing the group and key names is not needed anymore.

Ex:
networkmanager:
  passthrough:
    group_name:
      key_name1: value1
      key_name2: value2

The datalist data structure was replaced by hash tables. The entire
passthrough section is represented by a hash table indexed by the group
name. Each entry is also a hash table representing all the key/value
entries and it's indexed by the key name.
Now that the datalist was replaced with hash tables, run the generator
with --ignore-errors against the entire dataset. The datalist was
leaking memory and causing ASAN to crash the test.

Also limit the value of path-cost to avoid triggering a g_assert(v <
G_MACUINT).
Now it will generate a mix of old and new formats in the passthrough
section.
When --ignore-errors is used, some netdefs might arrive at the NM config
writers in a bad state. In such cases we just skip them.

Found with config_fuzzer.
Fixes the build on Fedora 41:

In file included from /usr/include/glib-2.0/glib.h:117,
                 from ../src/parse.c:25:
In function ‘g_autoptr_cleanup_generic_gfree’,
    inlined from ‘get_ip_family’ at ../src/parse.c:1918:22:
/usr/include/glib-2.0/glib/glib-autocleanups.h:32:3: error: ‘ip_str’ may be used uninitialized [-Werror=maybe-uninitialized]
   32 |   g_free (*pp);
      |   ^~~~~~~~~~~~
../src/parse.c: In function ‘get_ip_family’:
../src/parse.c:1918:22: note: ‘ip_str’ was declared here
 1918 |     g_autofree char *ip_str;
      |                      ^~~~~~
cc1: all warnings being treated as errors
@daniloegea
Copy link
Collaborator Author

Thank you, Lukas.

I addressed a bunch of your comments and left answers for many others.

I think the main highlights are:

  • netplan set will not work for updating fields inside the passthrough block. The reason is the way handle_generic_map works when it finds the same fields with different values. It also doesn't work for openvswitch.other-config for example due to the same problem. Adding and nullifying fields works fine.
  • I updated the documentation too

I changed a lot of stuff, it's a good idea to go over all the code again (sorry for that).

@daniloegea daniloegea requested a review from slyon November 8, 2024 12:01
The RPM build started to fail on Fedora 41:

find-debuginfo: starting
Extracting debug info from 4 files
gdb-add-index: Failed to find a useable GDB binary
@slyon slyon added the schema change This PR includes a change to netplan's YAML schema, which needs sign-off label Nov 12, 2024
@slyon
Copy link
Collaborator

slyon commented Nov 12, 2024

FTR: This is an enhancement of #193

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comment (don't merge yet) schema change This PR includes a change to netplan's YAML schema, which needs sign-off
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants