-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Unify names of generic relations to media classes #4599
Conversation
This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach is quite novel and interesting, and the final code undoubtedly looks much cleaner.
However, I feel like metaprogramming is a bit of a huge sledgehammer to wield against what was a relatively small nut of being able to connect media models with each other.
Using common related names like sensitive_media
and report
(instead of media-specific ones like sensitive_image
and audio_report
) already simplifies the code massively by removing almost all cases where we needed to handle media types separately.
What I'm trying to say is that I don't feel like the metaprogramming part adds much to this PR and the simplifications would remain even if that particular aspect of this PR was dropped.
I am making these statements based on a quick pass through the PR changes and so it is possible that I am not realising the full importance of the metaprogramming aspect. So I will definitely look into the PR more closely soon. But I wanted to leave my first impressions so that you can also have time to explore other directions if you are in an exploratory mood in the coming days.
Yeah, totally agree metaprogramming is a pretty intense approach for this. I suppose the motivation of using metaprogramming to solve this, as opposed to explicitly configuring the related names on each real model (and additional explicit The alternatives I considered were:
What I like about the metaclass approach is that it fully centralises the configuration of the related name and the backreferences of the actual classes from the media models. so that subclasses of the related models don't have to do anything (other than define So, the metaclass approach looks pretty intense at face value, but it does, I think, ultimately simplify the configuration and make it less fragile or prone to random accidental variations (for example, when new related models are added). (Here's an aside about our data model, inspired by thinking about the different approaches and where this complexit comes from). To be honest though, I think it's worth us considering whether the per-media type segmentation of the models (and tables) is entirely useful, and whether there is a generic approach that would actually eliminate the need to repeat structures for each media type, especially when considering aspects of the data model like the complexities I touched on in #1290, where our current model is so focused on individual works, it doesn't allow for the actually complex relationships that sometimes exist. Audio having Audioset as a distinct audio-specific implementation is interesting, for example. We already have images in our catalogue that are parts of real, meaningful sets (which could, for example, be extremely useful for related media queries). Mostly these are scans of pages from the same books (Biodiversity Heritage Library being the source that immediately comes to mind), but there are other forms of images-in-series or collections... not to mention the idea of being able to create Openverse-user curated collections, which we've touched on frequently. So where metaprogramming in this PR seems like it could be overly complex or abstract, it might also (or instead) point to a problem with our data model, which pre-encodes this level of abstraction and complexity that might neither be necessary or even real in a fundamental sense, and for which meaningful differences between media types could be accounted in less complex ways. Notably, other systems for modelling the metadata of individual works all work under the assumption of linked data built off core metadata shared by most works regardless of the kind. For example DublinCore and the more complex RDF (which both came up in #4430) work from the same set of basic terms shared by all works, and enabling much more flexible modeling and metadata storage than we currently have, demonstrated by Fedora Repository1 as an example. Our per-media-type approach is not necessarily the best way of doing things generally, and other systems that pre-date ours and which have been manually tested and applied to many millions of digital and analogue works do things a little differently. Crucially broken in our system, for example, is how a multimedia work or collection of works would be described. We don't have any way of modelling audio that are related to images, for example, that wouldn't require creating a separate container for those works from containers of audio (which we have as audio set) and containers of images (which we do not have). Anyway, that's not really important to the question of what the best way of solving this problem of generic relations between these Python classes... I just thought it was interesting to consider why these relations exist, whether they're necessary and whether they're even real (in the sense of, does this division actually exist when works are reduced down to an identifier pointing to a binary blob, with a few to many of potentially infinite number of data points associated). If we step back from the model we've implemented and especially if we look at how others have tried to solve this problem, our approach is rather distinct. Whether that's good, purposeful, or serves our use case in a meaningful way that is worth the trade-offs... good for us to think about! Footnotes
|
related_name="deleted_audio", | ||
related_name="deleted_media", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, these actually do not need to be defined anymore! I accidentally left these in when I was trying one of the other approaches I mentioned that still required manually configuring the related names. The metaclass approach intentionally removes the need for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the approach that I was actually preferring over the metaclass because it is much more explicit and clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't solve the back-references to the class though. You can go through the related set but if it's a one-to-one, there's no good way to do it, as far as I can tell, that won't cause an implicit query.
The complexity comes from trying to solve both problems:
- Make it possible to use generic references to instance relations
- Make it possible to use generic references to _model_s of those relations
class OpenverseMediaModel(Protocol): | ||
media_type: MediaType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused, left over from another experimental approach I tried (one that didn't leverage metaclass kwargs and instead generated a new metaclass per-abstract related media model, or used a decorator, both of which were a little ugly or didn't work).
Anyway, it's meant to be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you leave a comment in the code saying it is meant to be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to be removed from this PR. I'll do so when I get back around to making updates to this based on feedback for the overall approach.
@property | ||
def index(self) -> str: | ||
return settings.MEDIA_INDEX_MAPPING[self.media_type] | ||
|
||
@property | ||
def filtered_index(self) -> str: | ||
return f"{self.index}-filtered" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be cached classmethod properties.
@@ -275,17 +289,29 @@ def save(self, *args, **kwargs): | |||
super().save(*args, **kwargs) | |||
|
|||
|
|||
class AbstractMediaDecision(OpenLedgerModel): | |||
# Do not subclass or it will cause an unnecessary migration due to the change in class name | |||
def MediaModeratorForeignKey(related_name: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a static method on the AbstractMediaDecision
instead.
This is looking quite interesting. Similar to Dhruv's opinion, I like how much code this saves, and the resulting one looks clean. I also worry about the metaprogramming part, it diverges quite a lot from our current approach. If we can reach an intermediate stage that would be wonderful. Perhaps with your second alternative:
Regarding data modeling, I find our current approach still simple, as we currently have a few central tables. Actually, most of the complexities I usually find come more from maintaining a sort of mirror with the catalog DB. Of course, I'd be interested in exploring new ways to store/represent the data, but I don't see this as a problem for our current use case but with expanding the use cases that we could potentially include.
The key is that the current models weren't exactly planned for collections, so it's understandable that they cannot be fit currently. It's important to note this because relational models are designed with very specific requirements so they can work better. It's not set in stone, of course, but trying to change it one way or another has several implications.
I'd like to know more about these cases you mentioned so I can understand them better! And see how they relate to what we are trying to resolve. I agree we can reevaluate all of these, but we also need to define the requirements/constraints beforehand. |
Fixes
Fixes #4553 by @sarayourfriend
Description
This is still WIP because I need to test a bit more thoroughly. So far things appear to work with manual testing, but I haven't run unit tests, and I suspect there are edge cases in the model admins that I have missed.
Heads up through, @dhruvkb, if you want to take a look at the approach I've taken. It's a bit abstract, and I could definitely see wanting to use a more concrete, straightforward, declarative approach that just manually creates the back references onto the media classes.
The approach overrides the metaclass for Django models with one specifically for models related to a media type. The abstract models reference that metaclass, along with a related name, and then subclasses are processed to get assigned to the media class as
{related_name}_class
, and the related name from the abstract class is passed down to theMeta
of the final models (which Django doesn't do on its own, only a small set of fields are inherited fromMeta
by default).I initially tried this with
__init_subclasses__
but you can kind of see what that results in with the admin models and forms... but a bit more unwieldy because of the related name. It ended up being a lot of specific, fiddly implementation details on each of the related abstract classes, which the metaclass was able to tidy up into a single implementation pretty nicely. Of course, it is a bit magical, so if we want to just, say, hand-maintain a nested dictionary of these relations, kind of like themedia_type_config
fixture in the tests (which could be replaced or reduced by some of this work), and prefer that for its straightforwardness (if being slightly more clunky and requiring more manual configuration of relation names) then I can probably be convinced.My goal was to prevent any SQL migrations, and indeed, sqlmigrate shows the current migration as a complete noop.
Anyway, let me know what you think, Dhruv, if you get a chance to look at this while it's still a draft. Very curious to see what you'll think of this introspective/meta-programming based approach, and whether you can think of an easier way to do it or think a more concrete and less meta approach would be better.
Testing Instructions
TBD.
Checklist
Update index.md
).main
) or a parent feature branch../ov just catalog/generate-docs
for catalogPRs) or the media properties generator (
./ov just catalog/generate-docs media-props
for the catalog or
./ov just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin