-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong PCIe devices get passed through to wrong qubes after adding new PCIe cards #7792
Comments
Ouch. This is actually a security concern: passing e.g. your primary SSD to sys-net would be bad. The only solution I can think of is to identify devices by PCI bus path, which should be deterministic and enforced by hardware. |
This is a known issue with PCI ( at least it was ~4 yrs ago - I could be
out of date.)
Numbering can be affected by introducing new devices, BIOS/UEFI changes,
etc. When I cared about these things we never rolled out hardware
changes without testing for such changes.
The bus path was not deterministic.
Some cards will provide their own buses and bridges, which can really
complicate numbering.
|
@unman is there any sort of identifier associated with a physical slot on the motherboard somewhere, and which cannot change? |
I'm a little out of touch - I'll check back.
|
I've experienced this issue whilst trying to add a 3rd GPU to this motherboard https://www.msi.com/Motherboard/X399-SLI-PLUS/Specification - in particular when 2 cards were nvidia and 1 ati all the pci IDs changed compared to a known good configuration with 1 less GPU. However going from 1 -> 2 GPUs with them in the white slots, they kept their IDs |
@unman friendly ping |
I dont think there is a way to circumvent this.
|
It can be reduced (not eliminated but reduced) by validating the pci vendor id and pci device id match what they were during qube assignments. If they change block qube auto-start. Special case perhaps to have sys-gui also auto-assign based on these ids if the current device doesn't match (woe to multi-card systems). B |
This can have unfortunate results if you add a second device matching
the original.
|
For people who are interested: the issue #7892 has some possible security-breaching cases and workarounds for users about this issue. |
I think Qubes OS would need a more wholesome approach for PCI devices: This would also resolve problems with PCI devices suddenly attached to dom0 (e.g. #7886). |
This would be rather user-unfriendly on thunderbolt systems. B |
On 11/20/22 11:58, Brendan Hoar wrote:
> ...and inform users that they'll have to edit that list from some rescue boot mode whenever they modify devices.
This would be rather user-unfriendly on thunderbolt systems.
Well, thunderbolt systems are just another example why a new architecural solution might be required here.
Btw I still think that my proposal of having some boot menu decision on which PCI devices to attach where (on changes only) would be an improvement for thunderbolt users over the current situation (= no thunderbolt support). Later on, one could maybe extend it to runtime decisions whenever a PCI change occurs on a running system. Some popup asking for an immediate decision and blocking the device until then comes to mind.
I'd be happy to hear about better wholesome ideas though.
Plus I think Qubes OS made the `security` > `user friendly` - if necessary - choice long ago (and correctly so from my point of view).
|
This is IMO a good idea. To validate VID+PID in addition to device bus-device-function (BDF). If it changes, refuse to start, and request the user to detach the device and attach it again. It wouldn't automatically try to find where the device moved to (which, as suggested later, could result in attaching wrong device, if there are multiple matches). |
Thanks. This gets a bit more annoying for sys-gui (w/o dom0 fallback) and sys-usb (w/o ps2 mouse/keyboard), but...well we already have those problems today. :) B |
There was a complementary feature considered for portable cases (Qubes installed on external disk, plugged into different machines) - assignment based not on specific devices, but all devices of a given class: #214 |
@brendanhoar: I don’t know if sys-gui w/o dom0 fallback is going to happen. For one, it would likely block a port to Apple Silicon, as the GPU is not behind an SMMU (ARM IOMMU) on that platform. |
@marmarek: do you know what the PCIe Location Path that Microsoft refers to is? @brendanhoar On a Thunderbolt system, plugging or unplugging a Thunderbolt device should not cause the numbers of other devices to change. If it does, that is a bug in the hardware or firmware, IMO. |
That's irrelevant now, since ARM isn't supported by Qubes yet. I can totally see, if/when we'll be doing ARM port, deciding to require IOMMU properly put in front of all relevant devices (including GPU). Anyway, that's some years in the future, hardware may look different at that time.
Seems to be BDF, based on a quick google: https://superuser.com/a/1202958 lists "PCI bus 3, device 0, function 0" as "LocationInfo" |
That is fair. If we were to support ARM now, then Apple Silicon would be an obvious target, which would make such a requirement difficult to meet. But this might change in the future.
Interesting. Is BDF what is used for switching PCIe packets “on-the-wire”? |
This comment was marked as off-topic.
This comment was marked as off-topic.
Yes, I think so. It is at least used to access config space. |
Is it also used when programming the IOMMU? |
Short answer: yes. At least for VT-d, DMAR ACPI table describes mapping BDF to IOMMU engines. |
When BDF changes due to adding or removing a card, does the DMAR table also change in such a way that the IOMMU engine number of existing devices does not change? If this is always the case, then the IOMMU engine number could be used as an additional key. |
I don't think that's useful in practice. In common desktop systems you have 1 or 2 IOMMU engines, so this "additional key" would be the same for all devices in most cases anyway. |
Ah, good point. |
This feature can be used by advanced users to assign devices to pciback in a policy-like manner based on various PCI device attributes. References QubesOS/qubes-issues#7886 QubesOS/qubes-issues#7792
This feature can be used by advanced users to assign devices to pciback in a policy-like manner based on various PCI device attributes. References QubesOS/qubes-issues#7886 QubesOS/qubes-issues#7792
Depends at what stage of the boot process it happened at. For example, if the "The PCI-ID of your USB controller changed, what do you want to do?" happened during the grub bootloader stage before xen loaded, then it wouldn't be a problem. (Also, note: I believe there is supposedly some information that can be gathered before xen loads that cannot be gathered afterwards. (I read something about software tools that can't run under xen (as dom0), which can find out which PCI devices need to be passed as part of the same "group" (can't seem to find the article right now))) Also, dom0 would either need to "publish" the VM to PCI device map for that to be accessible by grub, or grub would need to be able to read the luks volumes. Anyway, I'm not pushing for this, but though I'd mention it as a possibility. |
After installing qubes and setting up your passthrough PCI devices, if you add a new PCIe card, it can change the device numbers of the old cards. This will cause qubes to pass through the wrong devices to the wrong HVM qubes.
The device renumbering is obviously a hardware thing, and preferably there would be a way to make persistent names similar to how linux gives ethernet cards persistent names. If you remove eth0 from a linux system and put another ethernet card in, linux starts calling that new card eth1, not eth0. Linux probably determines uniqueness by MAC address, so there might not be any way to uniquely identify PCIe cards like that.
I expect this to become a bigger problem in the future when people are using sys-gui-gpu and video cards getting a new PCIe number means they will no longer pass through to sys-gui-gpu qube.
(this is related to, but different then #6587 )
The text was updated successfully, but these errors were encountered: