Implementation of new Annotator API #21

hoxbro · 2023-09-13T14:02:56Z

resolves #16

This PR adds the new API, which makes it possible to work with multiple regions across multiple elements using a single Annotator. A small example in MultiPlot.ipynb shows off the new API.

In general, I have tried to limit the changes to annotator.py, but naturally, a redesign like this also affects other parts of the codebase. The other significant change is the format of region_df in AnnotationTable. See the Dimension specification section for more information.

Changed methods

Some methods have been moved from the main Annotator class to AnnotatorElement where it makes sense. Some logic has also been moved from Annotator into AnnotatorInterface. In my mind, AnnotatorInterface should be able to work by itself, and Annotator should expand the API to work with HoloViews elements.

Annotator.set_range and Annotator.set_point are superseded by Annotator.set_regions. Both of the former have a limited interface that can only handle up to two dimensions, with logic tailored to whether the number of dimensions is one or two. I have updated both to call Annotator.set_regions, mainly to avoid changing the tests too much in this PR. I plan to remove them entirely in a follow-up PR.
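As a rough illustration of the delegation described above, here is a minimal sketch of a legacy two-dimension setter forwarding to a keyword-based set_regions. The class RegionSetterSketch and the set_range signature are assumptions made for this example; only the set_regions(**items) signature appears in this PR.

```python
class RegionSetterSketch:
    """Toy stand-in for an annotator; not holonote's Annotator class."""

    def __init__(self, kdims):
        self.kdims = list(kdims)
        self.region = {}

    def set_regions(self, **items):
        # One keyword per dimension, so any number of dimensions is supported.
        unknown = set(items) - set(self.kdims)
        if unknown:
            raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
        self.region.update(items)

    def set_range(self, startx, endx, starty=None, endy=None):
        # Legacy wrapper (signature assumed): handles at most two dimensions
        # and simply forwards to set_regions.
        regions = {self.kdims[0]: (startx, endx)}
        if starty is not None and endy is not None:
            regions[self.kdims[1]] = (starty, endy)
        self.set_regions(**regions)


sketch = RegionSetterSketch(["TIME", "height"])
sketch.set_range(1.0, 2.0, 3.0, 4.0)
print(sketch.region)
```

The wrapper keeps the old call sites working while all state changes go through one code path.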

Annotator.define_ranges, Annotator.define_points, and Annotator.define_fields have all been merged into one API in Annotator.define_annotations. The old methods have been left untouched (apart from a small print statement), as their tests mainly check bad inputs, e.g., that you need to run define_fields before define_{ranges, points}. Those tests do not work with the new API, and I plan to remove them in a follow-up PR.

Dimension specification

A major redesign was needed to support the new API. The two main changes are:

The Annotator class is no longer limited to two dimensions.
The type and region are now associated with the dimension itself and not with the Annotator.

In the previous implementation, the Annotator had two parameters for this: kdim_dtypes and region_types.

holonote/holonote/annotate/annotator.py

Lines 100 to 107 in a502bdb

    
    kdim_dtypes = param.Dict(default=None, allow_None=True, doc="""
        Dictionary of one or two key dimension names to dtypes (e.g. int, float, datetime).""")

    connector = param.ClassSelector(class_=Connector, allow_None=False)

    annotation_table = param.ClassSelector(class_=AnnotationTable, allow_None=False)

    region_types = param.ListSelector(default=['Range'], objects=['Range', 'Point'], doc="""

This logic has been moved to the specification parameter spec. Each dimension now carries its own type and region. Right now, this is what a spec can look like and what the clean_spec class method converts it to:

{
    # Range (two values)
    "A1": (np.float64, "range"),
    "A2": {"type": np.float64, "region": "range"},
    "A3": np.float64,  # Special case
    # Single
    "B1": (np.float64, "single"),
    "B2": {"type": np.float64, "region": "single"},
    # Multi
    ("C1", "D1"): {"type": np.float64, "region": "multi"},
    ("C2", "D2"): (np.float64, "multi"),
}
# Converted to:
{
    "A1": {"type": np.float64, "region": "range"},
    "A2": {"type": np.float64, "region": "range"},
    "A3": {"type": np.float64, "region": "range"},
    "B1": {"type": np.float64, "region": "single"},
    "B2": {"type": np.float64, "region": "single"},
    ("C1", "D1"): {"type": np.float64, "region": "multi"},
    ("C2", "D2"): {"type": np.float64, "region": "multi"},
}
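The conversion above can be sketched in a few lines. This is an illustration under stated assumptions, not the clean_spec implementation from this PR: the function name normalize_spec_sketch and the validation step are hypothetical, and the built-in float stands in for np.float64 so the example has no dependencies.

```python
# Hypothetical helper; region names follow the spec shown above.
VALID_REGIONS = {"range", "single", "multi"}


def normalize_spec_sketch(input_spec: dict) -> dict:
    """Expand every accepted shorthand into {"type": ..., "region": ...}."""
    normalized = {}
    for dim, value in input_spec.items():
        if isinstance(value, dict):
            entry = {"type": value["type"], "region": value["region"]}
        elif isinstance(value, tuple):
            dim_type, region = value
            entry = {"type": dim_type, "region": region}
        else:
            # Special case: a bare type is treated as a range.
            entry = {"type": value, "region": "range"}
        if entry["region"] not in VALID_REGIONS:
            raise ValueError(f"Unknown region {entry['region']!r} for {dim!r}")
        normalized[dim] = entry
    return normalized


spec = normalize_spec_sketch({
    "A1": (float, "range"),
    "A3": float,
    "B2": {"type": float, "region": "single"},
    ("C1", "D1"): (float, "multi"),
})
print(spec["A3"])
```

Every entry ends up in the same dictionary form, so downstream code only needs to handle one shape.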

Based on the number of dimensions passed and their respective regions, AnnotatorElement should determine* whether the annotation is drawn as a span, a line, a point, a rectangle, or a polygon:

Two dimensions	x = Single	x = Range	x = Multi
y = Single	Point	Finite Hline	-
y = Range	Finite Vline	Rectangle	-
y = Multi	-	-	Polygon

Or only one dimension:

	Single	Range	Multi
Only x-dim	Vline	Vspan	-

The regions are named single, range, and multi. The change from point to single was made because a finite line contains a single value in one of its dimensions, so single felt like the more accurate name.

Right now, I am considering having the multi-region be a paired dimension, which is why it is defined as a tuple.

Annotator Tables region dataframe

The intermediate values of regions in AnnotationTable._region_df have changed from containing two dimensions per row to containing only one. For example, this is how it looked before:

	region_type	dim1	dim2	value	_id
0	Range	A	B	(1.0, 2.0, 3.0, 4.0)	b92372deb83045c7929c8d2092365b74

And this is how it looks now:

	region	dim	value	_id
0	range	A	(1.0, 2.0)	b92372deb83045c7929c8d2092365b74
1	range	B	(3.0, 4.0)	b92372deb83045c7929c8d2092365b74

The representation in the database schema and the output of Annotator.df remain unchanged.
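To make the format change concrete, here is a small sketch that splits the old two-dimension row from the example into the new one-row-per-dimension form. It uses plain dictionaries instead of a DataFrame to stay dependency-free; the function split_region_row is hypothetical, and the layout of the old value tuple (first pair for dim1, second pair for dim2) is inferred from the example above.

```python
def split_region_row(row: dict) -> list:
    """Split one old-style Range row into one new-style row per dimension."""
    if row["region_type"] != "Range":
        raise NotImplementedError("This sketch only covers the Range example.")
    # Assumed layout: (dim1_start, dim1_end, dim2_start, dim2_end).
    start1, end1, start2, end2 = row["value"]
    return [
        {"region": "range", "dim": row["dim1"], "value": (start1, end1), "_id": row["_id"]},
        {"region": "range", "dim": row["dim2"], "value": (start2, end2), "_id": row["_id"]},
    ]


old_row = {
    "region_type": "Range",
    "dim1": "A",
    "dim2": "B",
    "value": (1.0, 2.0, 3.0, 4.0),
    "_id": "b92372deb83045c7929c8d2092365b74",
}
new_rows = split_region_row(old_row)
for new_row in new_rows:
    print(new_row)
```

Both new rows share the same _id, which is what ties the per-dimension regions back to a single annotation.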

Not in this PR:

All of these points will be looked into in future PRs.

The styling of annotators has not been a focus of this PR.
Some code could be optimized by vectorizing and by reducing the number of commits to the database.
Improving the vectorized elements has not been looked into, as it will need a rewrite when "Vectorized VLines, HLines, VSpans and HSpans elements" (holoviews#5845) is merged.
Some cyclic references could still be in the code.
I have updated the examples with the new API, but I still need to update the corresponding text. Some sections are marked with 🚧, meaning the code has not been updated and will fail.
I want AnnotatorElement and AnnotationTable to only know about themselves and the Annotator that initialized them. This has not been a big focus of this PR, but it will be in the future.

*) This logic has not been implemented in this PR.

holonote/tests/conftest.py

jlstevens · 2023-09-18T13:49:01Z

This is something we have discussed already @hoxbro but I dislike this 'single', 'range' and 'multi' terminology replacing the region types of 'point', 'range' and 'geometry'.

I understand that the word 'point', as the concept of a point in an n-dimensional space, may be confused with a HoloViews Points element (though the former is singular and the latter is plural), so I can see the argument for a different name here. How about 'coordinate', 'range' and 'geometry' to represent a specific location in an n-dimensional space, an interval in an n-dimensional space and a polygon selection?

jlstevens · 2023-09-18T13:57:56Z

holonote/annotate/table.py

-            annotator.refresh(clear=clear)
+    # def refresh_annotators(self, clear=False):
+    #     for annotator in self._annotators.values():
+    #         annotator.refresh(clear=clear)


These commented lines can now be deleted, now that there will be only a single annotator instance.

hoxbro

Agree. I have just left this in to reduce the number of changes. I will remove it in a follow-up to this PR.

jlstevens · 2023-09-18T21:33:38Z

Also, instead of 'coordinate' maybe simply 'position' conveys the right concept?

jlstevens · 2023-09-18T21:41:05Z

holonote/tests/test_annotation_table.py

 import pandas as pd

 from holonote.annotate import AnnotationTable


-def test_table_region_df():
+def test_table_single_kdim() -> None:


What is the point of typing test methods? They will all return None!

hoxbro · 2023-09-19T08:03:15Z

Mostly as preparation for when type checking is run as part of the CI.

jlstevens · 2023-09-18T21:47:52Z

holonote/annotate/annotator.py

        self.annotation_table.register_annotator(self)
        self.annotation_table.add_schema_to_conn(self.connector)

        if init:
            self.load()

+    @classmethod
+    def clean_spec(self, input_spec: dict[str, Any]) -> SpecDict:
+        """ Convert spec to a DataFrame with columns: type, region


Docstring says it returns a dataframe when it is returning a dict.

Also, I don't think it is cleaning anything. Maybe it should be called normalize_spec?

hoxbro · 2023-09-19T08:07:59Z

I had it as a DataFrame to start with, but that became too troublesome to work with.

Fine with the name change.

jlstevens · 2023-09-18T21:49:57Z

holonote/annotate/annotator.py

+    def set_regions(self, **items):
+        self._set_regions(**items)
+
+    def _set_regions(self, **items):


Isn't this redundant with set_regions?

hoxbro · 2023-09-19T08:12:16Z

This loosely follows the structure of the existing code.

In the Annotator class, set_regions will also call self.refresh. That causes problems when it is run from some of the HoloViews code, where it would keep on refreshing.

I have made a small comment about it here:
https://github.com/holoviz/holonote/blob/decouple_plot_annotator/holonote/annotate/annotator.py/#L548
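The split between the public and private method can be sketched as follows. The class names InterfaceSketch and AnnotatorSketch, the refresh counter, and the on_plot_event callback are hypothetical and exist only to illustrate the pattern described in the reply; they are not holonote's classes.

```python
class InterfaceSketch:
    """Toy stand-in for the interface layer: stores regions, never refreshes."""

    def __init__(self):
        self.region = {}

    def _set_regions(self, **items):
        self.region = dict(items)

    def set_regions(self, **items):
        self._set_regions(**items)


class AnnotatorSketch(InterfaceSketch):
    """Toy stand-in for the plotting layer: the public setter also refreshes."""

    def __init__(self):
        super().__init__()
        self.refresh_count = 0

    def refresh(self):
        self.refresh_count += 1

    def set_regions(self, **items):
        self._set_regions(**items)
        self.refresh()

    def on_plot_event(self, **items):
        # Internal callback: update state without triggering another refresh,
        # which avoids a refresh loop.
        self._set_regions(**items)


annotator = AnnotatorSketch()
annotator.set_regions(x=(1, 2))    # user call: refreshes once
annotator.on_plot_event(x=(3, 4))  # internal call: no refresh
print(annotator.region, annotator.refresh_count)
```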

jlstevens · 2023-09-18T21:54:06Z

holonote/annotate/annotator.py

        if self.connector.primary_key.field_name not in fields:
            index_val = self.connector.primary_key(self.connector,
                                                   list(self.annotation_table._field_df.index))
            fields[self.connector.primary_key.field_name] = index_val

-        if self.region != self._last_region:
+        # Don't do anything if self.region is an empty dict
+        if self.region and self.region != self._last_region:


Shouldn't the line below be removed?

if len(self.annotation_table._annotators)>1:

There can only be one annotator now...

hoxbro · 2023-09-19T08:14:23Z

Agree. I will clean it up in another PR.

This logic is still there to keep changes to a minimum.

jlstevens · 2023-09-18T21:56:20Z

holonote/annotate/annotator.py

+    def define_annotations(self, data: pd.DataFrame, **kwargs) -> None:
+        # Will both set regions and add annotations. Can accept multiple inputs
+        # if index is none a new index will be set.
+        # if nothing is given it infer the regions and fields from the column header.


What do you mean by 'infer the regions and fields from the column header'?

jlstevens · 2023-09-18T22:02:38Z

holonote/annotate/annotator.py

@@ -317,15 +408,28 @@ def commit(self, return_commits=False):
            return commits


-class AnnotatorPlot(AnnotatorInterface):
+class AnnotatorElement(param.Parameterized):


I don't understand this concept of an AnnotatorElement and how it is distinct from an Annotator. Whatever it is, it is not a holoviews element so I think this is confusing.

I've just noticed the kdims... once rect_min and rect_max are gone, isn't that the key feature of this class?

hoxbro · 2023-09-19T08:21:53Z

No, the key feature of the class is that the Annotator can have multiple AnnotatorElement instances associated with it. A small sketch of how I see the internals (note that some changes are still needed in follow-up PRs). The dashed line should be seen as something a normal user should not interact with.

As for the naming itself, I think it is fine, as I see it as a provider of an element to the annotator. In the same way, AnnotationTable is a provider of the tables even though it is not directly a table.

jlstevens · 2023-09-18T22:12:59Z

Here are some notes from my review:

define_fields is defined as legacy...why? It seems useful to me to be able to build up an annotator from different dataframes, and often real data will be like this (it is unreasonable to always expect all the input data to be from the same dataframe). At any rate, if it is 'legacy' it should not be mentioned in the notebooks (e.g. in the Basics notebook). I'm not saying define_annotations should not also exist...
There needs to be some explanatory text in MultiPlot though this doesn't have to be added in this PR.
I would like to see single, range and multi renamed to position, range and geometry.

Other than those comments and the other questions/suggestions above, looks good!

hoxbro

define_fields is defined as legacy...why? It seems useful to me to be able to build up an annotator from different dataframes, and often real data will be like this (it is unreasonable to always expect all the input data to be from the same dataframe). At any rate, if it is 'legacy' it should not be mentioned in the notebooks (e.g. in the Basics notebook). I'm not saying define_annotations should not also exist...

You can still run define_annotations multiple times. The database and annotator.df both follow this structure of having one table, whereas AnnotationTable internally uses two different data frames. In my personal opinion, it is easy to construct a dataframe.

I tried to keep away from changing the text and only change the code itself. I did try to clean up the define_ references, but clearly wasn't thorough enough. I will push a change for this.

There needs to be some explanatory text in MultiPlot though this doesn't have to be added in this PR.

Agree. The example notebooks (mostly the text) must be updated for the new API, but I didn't do that in this PR. I will open a new issue for it so I don't forget.

I would like to see single, range and multi renamed to position, range and geometry.

I will make these changes.

jlstevens · 2023-09-19T08:58:53Z

Thanks for the explanation: in that case I would suggest AnnotationDisplay instead of AnnotatorElement.

jlstevens · 2023-09-19T09:33:47Z

Thanks for applying the suggestions! PR now looks good to me.

hoxbro added 30 commits August 29, 2023 20:44

First iteration of decoupling Annotator and Element

6b238f6

Make region_editor and selection_editor singleton

2be2ae2

Add some typehints

b5bf2c5

Added _make_empty_element

8d87ffc

Refactor and type hints

ef864c8

Small refactor and update code

16f5764

Handle multiple dims

e827965

Add spec as input

8f31a1e

Comment out require import for now

2ce9ff9

Implement spec parameter

de607df

Add_annotation now support new regions and spec format

3bce132

Don't use same object for self._last_region

96cfcaf

Handle combining region_df and field_df in annotations_table

881fda0

Update Annotator to work with new regions and spec format

d27c32f

Add some docstring

e58f26a

Change 'set_{range, point}' to use 'set_regions' and remove '_set_region'

098cbb1

Update tests to new framework

8f8669c

Update code to fix test

6034f24

Skip test which does not work with new framework

b65598f

Handle empty 2d range

9f9a292

Set regions function call now update the element

84eeabb

Test to verify set_regions gives a value in plot

8673b96

Add annotation_type to elements for debugging purposes

b276285

Remove annotation type and clean up test

08d9c73

Add more table tests

090d963

Better handling of datetime_types

ae50442

Remove kdims_dtypes to selective AnnotatorElement

3f8cb16

Update multiple_annotator

40f08f7

Add test for multiple element annotation

b7841a8

Add test_commit_update_set_region

147197c

hoxbro added 6 commits September 6, 2023 15:27

Handle no region

8cb32a9

Add construction sign to sections in examples

c146bfe

Improve deleting selected index

f8adb91

small changes

f532950

Pin Pandas<2.1

acacda7

Ignore wrong warning in older version of Numpy

fd4e903

jlstevens reviewed Sep 18, 2023

holonote/tests/conftest.py (outdated)

Applied suggested edit

a251d77

Co-authored-by: Simon Høxbro Hansen <[email protected]>

jlstevens reviewed Sep 18, 2023

hoxbro added 2 commits September 19, 2023 10:09

Rename to normalize_spec

59f63a4

Remove mention of define_fields

9336f06

hoxbro commented Sep 19, 2023

hoxbro added 2 commits September 19, 2023 10:48

Change single back to point

7e45ce1

Rename multi to geometry

06d41f4

Rename AnnotatorElement to AnnotationDisplay

658b40f

jlstevens approved these changes Sep 19, 2023

hoxbro merged commit a8fafa6 into main Sep 19, 2023
14 checks passed

hoxbro deleted the decouple_plot_annotator branch September 19, 2023 09:34

hoxbro mentioned this pull request Sep 19, 2023

Update examples code and text with the new API #25

Closed
