-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drm: Some refactoring and bug fixes to address VM panics #2215
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
bsdjhb
reviewed
Sep 4, 2024
markjdb
force-pushed
the
dev-panfrost
branch
from
September 4, 2024 21:30
a50a81b
to
88f3f0d
Compare
bsdjhb
reviewed
Sep 8, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks ok to me.
bsdjhb
approved these changes
Sep 9, 2024
No functional change intended.
Make sure that a failure from pmap_gpu_enter() triggers an assertion failure. This should only happen under memory pressure.
mmap_sem refers to a Linux synchronization mechanism. Linux is different enough here that the references are not very helpful. No functional change intended.
vm_pfn_pcount is only set once, we might as well set it during initialization.
vmap->vm_len is always 0 here. I believe this code is effectively dead, but let's fix it anyway.
See vm_fault_populate(): a return value of VM_PAGER_BAD tells the fault layer to continue handling the fault "normally", i.e., it will allocate a page, insert it into the VM object, and map it. That's not what we want in general.
There are currently no DRM objects of type OBJT_DEVICE, all are of type OBJT_PHYS or OBJT_MGTDEVICE. drm_cdev_pager_fault() is thus dead code, so remove it in advance of further refactoring. In particular, drm_vmap_find() will be removed.
As in dev_pager_dealloc(), drop the object lock around calls to the destructor. This might be necessary if the destructor wants to sleep.
Each GEM object is backed by a VM object which stores a handle for each page belonging to the object. A GEM object can be mapped multiple times. Panfrost's buffer objects are backed by physical RAM from the host system. The old code tried to maintain a global list of VMAs which map a GEM object, and this used the object pointer as a unique key, which breaks when the same object is mapped more than once. The mapping list is also somewhat awkward to maintain: FreeBSD's vm_fault layer is supposed to hide details relating to mappings of a given object (for instance, it doesn't pass the fault address to DRM), but Linux passes such details, so we end up faking them. Refactor things to handle the possibility of multiply-mapped GEM objects: - Minimize uses of VMAs. In the pager populate handler, we don't know which mapping the fault came from, so it doesn't make sense to try and look up a VMA. In practice, there is no need to know the origin of the fault, we just need a pindex to select the page to map. - Introduce a struct drm_object_glue, which glues together a GEM object, a VM object, and a list of VMAs which map the object. The list of VMAs is mostly useless but I've avoided removing it for now. Most uses of the VMA are replaced by extending the thread-private vm_fault structure. - Convert the global linked list lock to an sx lock so that it's possible to hold it while allocating memory. - Change the DRM object fault interface to avoid referencing VMAs. This deviates from upstream, but the modified code is all quite FreeBSD-specific anyway. While here, fix a related problem: when mmap()ing a device file, drm_fstub_do_mmap() tries to infer the VM object type to use. In particular, it creates a "managed device" object for panfrost buffer objects, which isn't correct since the object is backed by host RAM. The problem is further illustrated by the need to clear VPO_UNMANAGED and set PG_FICTITIOUS: setting PG_FICTITIOUS in host RAM pages is incorrect and can trigger assertion failures in the contig reclaim code. My view is that it's not possible to correctly map DRM objects without modifying upstream sources a little bit. That is, we cannot hide everything in drmkpi/linuxkpi. Keeping this in mind, the change fixes the problem described above by adding a new VM object type to struct vm_operations_struct, used in drm_fstub_do_mmap() to allocate the correct type of VM object.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The last commit is the main change. I had tried a couple of times to split it up but it was hard to do correctly, so I gave up in the interest of making the patches available a bit more quickly.
I took some small liberties in modifying drmkpi and DRM to better match the design of FreeBSD's fault handler. Since this code (drm device file mmap, fault handling) is highly non-portable anyway, I think this should be acceptable.
This should fix the panics reported in #2097 . I might be too optimistic, but so far it also seems to address some issues I was seeing with tearing and clipped text.