Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement closure-tree to improve performance. #34

Open
TimurKady opened this issue Jun 24, 2021 · 26 comments
Open

Implement closure-tree to improve performance. #34

TimurKady opened this issue Jun 24, 2021 · 26 comments
Assignees
Labels
enhancement New feature or request

Comments

@TimurKady
Copy link

TimurKady commented Jun 24, 2021

Hello!
I want to share the experience of using the module. Today I see many of its advantages and two main disadvantages. One of them I want to discuss. This is control over the order of elements.

Description of the problem.
The module does not have a transparent and understandable mechanism for ordering elements in the tree. In practice, the most common are two modes: alphabetical sorting and strict order set by the user. The logic dictates that the tn_priority field provided by the default form should do this. If it is set to 0 for all elements, then they must be ordered alphabetically. If it is specified, then the field value determines the order.

But alas, this is not the case.

Another method of establishing a coercive order of elements, which is intuitively prompted by experience, is also not suitable. This is an attempt to import data with the tn_order field set.

It would be nice if you made it easier to manage the order of items in tree.

PS. The main trouble is that with all my attachment to this module, without a mechanism for intelligible order management, I have to refuse to use it. And it just tears my soul to shreds :-(

Funding

  • You can sponsor this specific effort via a Polar.sh pledge below
  • We receive the pledge once the issue is completed & verified
Fund with Polar
@TimurKady TimurKady added the enhancement New feature or request label Jun 24, 2021
@fabiocaccamo
Copy link
Owner

fabiocaccamo commented Jun 26, 2021

@Timurkhan thank you for reporting your experience with this library.

The ordering problem was really hard to solve because the objective was to sort an N levels data structure which is stored in a flat data-structure (just 1 db table) and at the same time each object must respect the order of its ancestors.

I understand that the solution is not very transparent but it solves the problem well, if you have any idea on how to obtain the same result in a better way feel free to share it.

The tn_order field is used internally for the final ordering, its value is calculated based on the tn_priority value.

Please try to import your data using the tn_priority instead of tn_order and I think that it will work correctly (let me know).

@TimurKady
Copy link
Author

TimurKady commented Jun 26, 2021

I will think about how to solve the problem.
What is obvious now (as a statement of the task) is that the tn_priority parameter must be abandoned. At least its purpose should be to set the position of the element in the list of children of the node.
At one time (until I found your library) I had some developments. I will try to present something.

I've tried all the options. First, I assigned the tn_priority a order number in the list of node . Secondly, I tried to duplicate the id in the same field. Nothing helped. All the time, sorting occurs in the reverse order of id.

@TimurKady
Copy link
Author

I have a quick idea. It doesn't require a lot of code change.

  1. If the user needs an alphabetical ordering, then we leave the decision to him. Let's enter a CharField field, will create an index (db_index=True) in it, will indicate this field as a sorting field in the Meta class.

  2. We must change the type of the tn_order field to BinaryField. We store the Materialized Path to the tree node in it. We allocate two bytes (word or 65536 child nodes) for the binary representation of tn_priority of each node along the path. We must create an index. We don't change anything in the Meta class.

In this case, sorting will occur correctly when the ordinal number of the node in the children of this node is specified in the tn_priority field.

This will slow down the already not very fast insertion of the node, but I think it will be also useful for issuing the Materialized Path without traversing the tree.

@fabiocaccamo
Copy link
Owner

fabiocaccamo commented Jun 27, 2021

This doesn't seem to be more a more transparent solution :)

Anyway, before the current implementation, I stored the ordering string in a db field, but it could become very long in case of deep structures, especially if the pk is a UUIDField, and there is not real need to store it, so I decided to compute the ordering string just in memory and use it to calculate the final order using integer field.

I agree with the fact that tn_order and tn_priority field could be merged into a single field, that would surely reduce confusion.

@TimurKady
Copy link
Author

Yes: of course! The combined field must in any case unambiguously specify the order of the element in the list of the node's children.

@fabiocaccamo
Copy link
Owner

@Timurkhan have you any update on this?

@TimurKady
Copy link
Author

Befor NY I will sent u vertion. Now in testing

@TimurKady
Copy link
Author

TimurKady commented Dec 26, 2021

Hi Fabio!
After testing various ideas, we settled on this variation, with which we use your module:

    def __get_node_order_str(self):

        return '.'.join(
            '{0:04d}'.format(item)
            for item in self.get_breadcrumbs(attr='tn_priority')
        )

    def save(self, *args, **kwargs):
        def sort_key(obj):
            return obj.tn_priority

        super().save(*args, **kwargs)
        siblings = sorted(self.get_siblings(), key=sort_key)
        index = 1
        for leaf in siblings:
            if leaf.tn_priority >= self.tn_priority:
                leaf.tn_priority = self.tn_priority + index
                leaf.save()
                index += 1

It doesn't seem to work badly. The main thing is that it allows you to explicitly specify the order of the entries in the tree.
Perhaps you will not like this solution, or seem too straightforward. But... :-)

@fabiocaccamo
Copy link
Owner

@Timurkhan thank you for the feedback, I will test it and let you know!

@TimurKady
Copy link
Author

TimurKady commented Jan 14, 2022

Hi @fabiocaccamo!
Below is the more correct code. Also, it doesn't slow down the code working and use less no more 3 requests

    def __get_node_order_str(self):

        return '.'.join(
            '{0:04d}'.format(item)
            for item in self.get_breadcrumbs(attr='tn_priority')
        )

    def save(self, force_insert=False, force_update=False, *args, **kwargs):

        def sort_key(obj):
            return obj.tn_priority

        def order_list(obj_list):
            index = 0
            for obj in obj_list:
                obj.tn_priority = index
                index += 1
            self._meta.model.objects.bulk_update(obj_list, ['tn_priority'])

        try:
            parent = self.tn_parent
        except:
            parent = None

        if parent:
            new_siblings = sorted(self.tn_parent.get_children(), key=sort_key)
        else:
            roots = self._meta.model.get_roots()
            new_siblings = sorted(roots, key=sort_key)

        if self.pk and not force_insert:
            try:
                old = self._meta.model.objects.get(pk=self.pk)
                old_siblings = sorted(old.get_siblings(), key=sort_key)
            except:
                old = None

            if self in new_siblings:
                new_siblings.remove(self)
            new_siblings.insert(self.tn_priority, self)

            if old and (old.tn_parent != self.tn_parent):
                order_list(old_siblings)
                order_list(new_siblings)

            if old and (old.tn_priority != self.tn_priority):
                order_list(new_siblings)

        else:
            new_siblings.insert(self.tn_priority, self)
            super().save(*args, **kwargs)
            if len(new_siblings):
                order_list(new_siblings)

        self._meta.model.update_tree()

@fabiocaccamo
Copy link
Owner

Hello @TimurKady, thank you for the update, the code seems pretty clear, I will test it as soon as possible.

@TimurKady
Copy link
Author

TimurKady commented Jan 14, 2022

Tnx!
If interested, I can also offer update for TreeNodeForm. It is a widget for the "tn_parent" field (signal2 tree based).

@fabiocaccamo
Copy link
Owner

@TimurKady there already an issue for that, check #37

@TimurKady
Copy link
Author

TimurKady commented Jan 15, 2022

I've adjusted the code a bit. Colleagues made a small bug. Sorry.

@fabiocaccamo
Copy link
Owner

@TimurKady I tried to test your code, but trying to run the test suite (python runtests.py) it gets stuck indefinitely (more than a minute, then I killed the process since the same it run in ~12 seconds on my machine).

@TimurKady
Copy link
Author

To be honest, in any case, the main drawback of the library is the high cost of creating one record. We looked at the reason. You completely rebuild the tree when you insert each element with post_save. I already thought about making changes to the code. To do this, will have to change two methods (.save() anl __get_node_data()) and abandon the signals. But the costs are very high. Meybe leter...

@fabiocaccamo
Copy link
Owner

The main drawback of the library is the high cost of creating one record

This is what I needed to know, thanks!

@TimurKady
Copy link
Author

TimurKady commented Jan 18, 2022

But I really like your idea, to store the answers to the most likely requests in the database. Maybe later, when I have time, I will try to offer you an adjusted faster solution based on your idea.
I made one mistake. I entrusted the revision of your library to a not very experienced young man. Unfortunately, I will have to go back to finalizing anyway, and apparently I will have to do this myself outside of working on our project.

@fabiocaccamo
Copy link
Owner

fabiocaccamo commented Jan 18, 2022

@TimurKady When I initially developed this library my need was to manage quickly a 4 levels menu (less than 200 entries) with a nice admin integration, so I come up to this solution and I used it in different projects always for solving the same problem.

Now looking at feedbacks, many devs are using it with very big trees with thousands of records, so performances are a top-priority issue to solve.

Updating the whole tree is not a scalable solution, now something more smart that continue to satisfy the tests is needed, not an easy task at all.

Just for curiosity, have you tried to do the same with django-treebeard?

@TimurKady
Copy link
Author

I have an idea, or rather a suggestion. There is one neat way to drastically improve the performance of your library. The essence of the method is the simultaneous use of the main Adjacency List and the Closure Table. Then you will not have to incur large costs when creating a new element, and the bulk of the requests will still be performed in one database request.

If you don't mind, I'll have a free weekend so I can write most of the code. You will only need to add some methods that will ensure compatibility with earlier versions of the library and run tests. If you are in favor of this approach, then find a way to tell me your email address so that we can communicate not here, but directly.

@TimurKady
Copy link
Author

Просто из любопытства, вы пытались сделать то же самое с django-treebeard?

Yes, of course! I have tried all tree packages. But yours appealed to me the most. All other libraries have less advantages than disadvantages.

@fabiocaccamo
Copy link
Owner

fabiocaccamo commented Jan 19, 2022

@TimurKady It's an ambitious change, I didn't know about closure tables, it seems a very good approach, thank you for suggesting it.

You can find my email on my GH profile.

@TimurKady
Copy link
Author

@fabiocaccamo Unfortunately, there is nothing ambitious in this. The code will not be too big. The most ambitious is the support of the previous version.

I am well acquainted with graphs and algorithms for working with it. I'll close this part of the code. You know Django and cache working very well. Close this part!

I'll send you my code and we will discuss everything.

@fabiocaccamo
Copy link
Owner

fabiocaccamo commented Jan 20, 2022

@TimurKady
Copy link
Author

I'll look it later

@TimurKady
Copy link
Author

Combination of Adjacency List and Closure Table (django-treenode based)
Done!

@fabiocaccamo fabiocaccamo moved this to Todo in Open Source Dec 1, 2022
@polar-sh polar-sh bot added polar and removed polar labels Jun 29, 2023
@fabiocaccamo fabiocaccamo changed the title Order control Implement closure-tree to improve performance. Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Todo
Development

No branches or pull requests

2 participants