Add collections #1824

alejandromumo · 2024-09-25T15:12:56Z

closes #1820

invenio_rdm_records/collections/models.py

alejandromumo · 2024-09-27T15:12:42Z

invenio_rdm_records/collections/api.py

+        ret = db.session.execute(stmt).scalars().all()
+        return [type(self)(r) for r in ret]
+
+    def get_subcollections(self, max_depth=3):


This makes sense to have in the CollectionTree API.

invenio_rdm_records/collections/api.py

alejandromumo · 2024-09-27T15:32:24Z

invenio_rdm_records/collections/service.py

+
+    def create(self, identity, community_id, tree_slug, slug, title, query, **kwargs):
+        """Create a new collection."""
+        current_communities.service.require_permission(


If it has a REST API, permissions could be added directly to the service so they can be overridden downstream, e.g. set to System by Zenodo.

alejandromumo · 2024-09-27T15:36:07Z

invenio_rdm_records/collections/service.py

+    def read(self, identity, id_):
+        """Get a collection by ID or slug."""
+        collection = self.collection_cls.resolve(id_)
+        if not collection:
+            raise ValueError(f"Collection {id_} not found.")
+        if collection.community:
+            current_communities.service.require_permission(
+                identity, "read", community_id=collection.community.id
+            )
+
+        return CollectionItem(collection)
+
+    def read_slug(self, identity, community_id, tree_slug, slug):
+        """Get a collection by slug."""
+        current_communities.service.require_permission(
+            identity, "read", community_id=community_id
+        )
+
+        ctree = CollectionTree.get_by_slug(tree_slug, community_id)
+        if not ctree:
+            raise ValueError(f"Collection tree {tree_slug} not found.")
+
+        collection = self.collection_cls.resolve(slug, ctree.id, use_slug=True)
+        if not collection:
+            raise ValueError(f"Collection {slug} not found.")
+
+        return CollectionItem(collection)


Collapse these into a single read that is smart enough to resolve either an ID or a slug, the same as the API get.

Consider depth as well.
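
For illustration, a minimal sketch of what a single read could look like, using an in-memory stand-in instead of the real models. `CollectionStore`, its `resolve` signature, and the `depth` default are hypothetical names and values chosen for this example; they are not the code in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Collection:
    """Hypothetical stand-in for the collection API object."""

    id: int
    tree_id: int
    slug: str


class CollectionStore:
    """In-memory stand-in for the database lookups (assumption)."""

    def __init__(self, collections):
        self._by_id = {c.id: c for c in collections}
        self._by_slug = {(c.tree_id, c.slug): c for c in collections}

    def resolve(self, *, id_=None, slug=None, tree_id=None) -> Optional[Collection]:
        """Resolve by ID, or by (tree_id, slug) when no ID is given."""
        if id_ is not None:
            return self._by_id.get(id_)
        if slug is not None and tree_id is not None:
            return self._by_slug.get((tree_id, slug))
        raise ValueError("Pass either id_ or both slug and tree_id.")


def read(store, *, id_=None, slug=None, tree_id=None, depth=2):
    """Single entry point standing in for read() and read_slug()."""
    collection = store.resolve(id_=id_, slug=slug, tree_id=tree_id)
    if collection is None:
        raise LookupError("Collection not found.")
    # `depth` would bound how many levels of subcollections get loaded.
    return {"collection": collection, "max_depth": depth}
```

Callers would then use `read(store, id_=1)` or `read(store, slug="math", tree_id=10)` through the same method, and the permission check would live in one place.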

* collections: fix create api

invenio_rdm_records/collections/models.py

anikachurilova · 2024-09-30T14:56:36Z

invenio_rdm_records/collections/models.py

+    @classmethod
+    def get_by_slug(cls, slug, tree_id):
+        """Get a collection by slug."""
+        return cls.query.filter(cls.slug == slug, cls.tree_id == tree_id).one_or_none()


Do we return None, or should it fail when it's not found?

I see the point of failing fast; however, I prefer to return None for separation of concerns. Whether a missing collection counts as an exception should, I think, be up to the business layer to decide. From a data point of view, it tried to fetch the collection and nothing was found.

I agree that we need to throw an exception at some point to have the presentation layer(s) acting accordingly (and that's currently missing in this PR). I will add that part in the next iteration since I refactored part of the public API / service layers from Alex's comments.

Maybe None works in this case, but you will soon see its limitations. With a None return value there are 2 main problems (and maybe more):

Every time you call the function, you need to check whether the return value is None, and the subsequent code has to handle this case. So, as in your PR, you end up with if checks everywhere. Not very elegant.

Even if less applicable to this function, in general you can't differentiate between different types of unhappy path.

An exception is much more elegant because the caller has many more options for handling the unhappy path.
I don't really understand which concerns are better separated by returning None.

No problem with solving this later on, in subsequent PRs.
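
As a rough sketch of how the two positions above could fit together: the data layer reports absence as None, and the business layer turns it into a specific exception so callers can tell the unhappy paths apart. The exception classes, the dictionaries, and the function names are hypothetical and only stand in for the real models and service.

```python
class CollectionTreeNotFound(LookupError):
    """Hypothetical error for a missing collection tree."""


class CollectionNotFound(LookupError):
    """Hypothetical error for a missing collection."""


# In-memory stand-ins for the two tables (assumption for this sketch).
TREES = {("comm-1", "subjects"): 1}  # (community_id, tree_slug) -> tree_id
COLLECTIONS = {(1, "physics"): "Physics"}  # (tree_id, slug) -> title


def get_tree_id(community_id, tree_slug):
    """Data layer: report absence as None, with no policy decision."""
    return TREES.get((community_id, tree_slug))


def get_collection_title(tree_id, slug):
    """Data layer: report absence as None, with no policy decision."""
    return COLLECTIONS.get((tree_id, slug))


def read_slug(community_id, tree_slug, slug):
    """Business layer: decide that absence is an error, and which one."""
    tree_id = get_tree_id(community_id, tree_slug)
    if tree_id is None:
        raise CollectionTreeNotFound(tree_slug)
    title = get_collection_title(tree_id, slug)
    if title is None:
        raise CollectionNotFound(slug)
    return title
```

The None checks are still there, but they are confined to one layer, and a presentation layer could catch each exception type separately.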

invenio_rdm_records/collections/models.py

anikachurilova · 2024-09-30T15:03:42Z

invenio_rdm_records/collections/api.py

+from .models import CollectionTree as CollectionTreeModel
+
+
+class Collection:


Are we sure that this api layer is really needed, when the data model is not a classic record with a json schema? Can't the models be directly used in the service layer (as done in invenio-jobs for example)?
This class looks like a wrapper of the model, and the various methods can be moved to the model class.

On one hand, I agree this class feels like a wrapper around the model, and it could be richer (e.g. Alex and I discussed storing the relations between collections in memory so that they could be "navigated" in memory).

However, some ideas that made me implement a public API for collections:

Abstract some concepts about how we store collections in DB (materialized path). In my opinion, for the developer, things like path should be abstracted and encapsulated under a "nice looking" API.

Related to the one above, adding some public-facing utilities to improve the usability of collections, e.g. collection.ancestors, collection.children, create a collection based on a parent, resolving a collection by id or slug, community_id pair.

dump / to_dict methods for serialization.

My 2 cents
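
To make the materialized path point concrete, here is a small in-memory sketch of the kind of helpers being described. The comma-delimited path format, the `Node` class, and the helper names are assumptions for illustration; the actual column format in this PR may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical row: `path` lists ancestor IDs, root first."""

    id: int
    slug: str
    path: str = ","  # "," for a root, ",1,2," for a child of 2 under 1

    @property
    def ancestor_ids(self):
        """IDs of all ancestors, root first."""
        return [int(part) for part in self.path.split(",") if part]

    @property
    def depth(self):
        """Number of ancestors above this node."""
        return len(self.ancestor_ids)

    @property
    def children_path(self):
        """The path value stored on this node's direct children."""
        return f"{self.path}{self.id},"


def children(nodes, parent):
    """Direct children: their path equals the parent's children_path."""
    return [n for n in nodes if n.path == parent.children_path]


def descendants(nodes, parent):
    """All levels below: their path starts with the parent's children_path."""
    return [n for n in nodes if n.path.startswith(parent.children_path)]
```

The trailing delimiter matters in this format: without it, a prefix match for ID 1 would also match paths that start with ID 10.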

I understand your reasoning. At the same time, with this approach, we will end up with wrapper classes that have most of their attributes and methods duplicated from the encapsulated model. And then we add a couple of extra methods.
I would not know what the separation of concerns or abstraction is, and if I have to add a new feature (a new query), I would not know whether to put it in the API or in the Model. What are the criteria that tell me where to put it?

For the path, maybe, I don't see it for the moment in the class.
For your second point, it can be done in the model, I don't see the difference.

A model is already an API class. What has been done for records is actually very different and in that case very much needed: the model is loaded into a dict object, with many other extra methods.

If we start creating wrappers everywhere, we will end up having more complex code, and less DRY. As a comparison, it looks a little bit like the props drilling problem in React.

As an alternative, if you really feel that we need to separate things, then in this case I would go with inheritance instead of composition.

To recap: I think that our final objective is to have the simplest possible code. If the majority of the APIs exposed by a model are intended to be used, I would use the model directly or create inheritance.
Composition and wrapping make sense to me when you want to hide the majority of the features provided by the wrapped object.
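
A toy comparison of the two options, to show where the duplication comes from. `CollectionModel`, `WrappedCollection`, and `RichCollection` are hypothetical classes written for this example, not the ones in the PR.

```python
class CollectionModel:
    """Hypothetical stand-in for the database model."""

    def __init__(self, slug, title):
        self.slug = slug
        self.title = title


class WrappedCollection:
    """Composition: every exposed attribute is forwarded by hand."""

    def __init__(self, model):
        self._model = model

    @property
    def slug(self):
        return self._model.slug

    @property
    def title(self):
        return self._model.title

    def to_dict(self):
        return {"slug": self.slug, "title": self.title}


class RichCollection(CollectionModel):
    """Inheritance: model attributes are available without forwarding."""

    def to_dict(self):
        return {"slug": self.slug, "title": self.title}
```

Both variants produce the same `to_dict()` output; the wrapper pays for it with one forwarding property per model attribute, which is only worth it when most of the model should stay hidden.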

Those are excellent points, I will keep them in mind in the next PR.

Thanks for the detailed explanation 🚀

invenio_rdm_records/collections/service.py

anikachurilova · 2024-09-30T15:09:59Z

invenio_rdm_records/collections/service.py

+
+    def create(self, identity, community_id, tree_slug, slug, title, query, **kwargs):
+        """Create a new collection."""
+        current_communities.service.require_permission(


Maybe we should introduce collection-specific permissions instead of using the communities' ones?

can_update default value is CommunityOwners, is this correct?

You are right, conceptually, communities are not required to have a collection (think about a collection on the repository level). However, the current state of this service is to require a community (this is our use case right now).

I agree that the collections' permission policy at some point has to be specific for collections, but it still needs more thought and the service to be adapted to it.

For a first launch, the permission can_update from communities is good since we don't want any user of a community creating collections for a community

anikachurilova · 2024-09-30T15:13:01Z

invenio_rdm_records/collections/service.py

+        return CollectionItem(collection)
+
+    def read(self, identity, id_):
+        """Get a collection by ID or slug."""


Suggested change

-        """Get a collection by ID or slug."""
+        """Get a collection by ID."""

This is going to be refactored in a second iteration when we add collections to the browse page. The idea is that service.read() is flexible enough that it can be done by ID or slug

invenio_rdm_records/collections/service.py

anikachurilova · 2024-09-30T15:25:06Z

invenio_rdm_records/collections/service.py

+        collection = self.collection_cls.create(
+            slug=slug, title=title, query=query, ctree=ctree, **kwargs
+        )
+        return CollectionItem(collection)


create and add methods are missing the commit operations (for example uow.register(ModelCommitOp(...)))

Will be added now :)
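
For readers unfamiliar with the pattern, a toy version of a unit of work with a registered commit operation is sketched below. It only mimics the idea behind `uow.register(ModelCommitOp(...))`; `CommitOp`, `FakeSession`, and `UnitOfWork` are simplified stand-ins written for this example and are not the Invenio classes.

```python
class FakeSession:
    """Stand-in for a DB session: pending objects become committed."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


class CommitOp:
    """Toy operation that adds an object to the session when registered."""

    def __init__(self, session, obj):
        self.session = session
        self.obj = obj

    def on_register(self):
        self.session.pending.append(self.obj)


class UnitOfWork:
    """Commits once on success, rolls back if the block raises."""

    def __init__(self, session):
        self.session = session

    def register(self, op):
        op.on_register()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.session.commit()
        else:
            self.session.rollback()
        return False  # never swallow the exception


def create_collection(session, uow, slug):
    """Service-style create: registers the op instead of committing itself."""
    collection = {"slug": slug}
    uow.register(CommitOp(session, collection))
    return collection
```

Without the `register` call, nothing ever reaches `session.pending`, so the final commit has nothing to persist; that is the gap the comment above points at.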

ntarocco · 2024-09-30T16:23:54Z

tests/collections/test_collections_api.py

+
+def test_create(running_app, db, community, community_owner):
+    """Test collection creation via API."""
+    tree = CollectionTree.create(


I suggest changing the test to read from the DB when asserting the creation of the collection [tree]. Given that the DB commit is missing, the assertions should then fail.
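
The general principle can be shown with the standard library alone: a row written without a commit is not visible to a second connection. This sketch uses `sqlite3` and a temporary file purely for illustration; the table name and helper functions are made up, and the real tests would go through the project's own `db` fixture instead.

```python
import os
import sqlite3
import tempfile


def count_collections(db_path):
    """Count rows through a fresh connection, bypassing the writer's state."""
    reader = sqlite3.connect(db_path)
    try:
        return reader.execute("SELECT COUNT(*) FROM collections").fetchone()[0]
    finally:
        reader.close()


def demo():
    """Return the row count seen by a reader before and after the commit."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    writer = sqlite3.connect(db_path)
    try:
        writer.execute("CREATE TABLE collections (slug TEXT)")
        writer.commit()
        # The INSERT opens a transaction that stays open until commit().
        writer.execute("INSERT INTO collections VALUES ('physics')")
        before = count_collections(db_path)
        writer.commit()
        after = count_collections(db_path)
        return before, after
    finally:
        writer.close()
        os.remove(db_path)
```

An assertion that only inspects the object returned by `create` corresponds to looking at the writer's state; asserting on what a fresh read returns is what exposes a missing commit.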

collections: added core backend

8a7b18d


alejandromumo force-pushed the add_collections branch from 01e34fe to 9233f72 Compare September 26, 2024 09:03


alejandromumo added 2 commits September 30, 2024 11:00

tests: added collection tests

d849ef7

alembic: added collections tables

7be1ad0

* collections: fix create api

alejandromumo force-pushed the add_collections branch from b61fa0d to 7be1ad0 Compare September 30, 2024 09:01


alejandromumo added 2 commits October 1, 2024 14:16

collections: added uow and refactor depth calculation

22d4971

collections: updated depth column

fe65cf4

alejandromumo merged commit ac37cf8 into inveniosoftware:master Oct 3, 2024
4 checks passed
