
Implementation of new Annotator API #21

Merged
61 commits merged on Sep 19, 2023


@hoxbro (Member) commented Sep 13, 2023

resolves #16

This PR adds a new API that makes it possible to work with multiple regions across multiple elements using a single Annotator. A small example showcasing the new API has been added to MultiPlot.ipynb.

In general, I have tried to limit my changes to annotator.py, but naturally, a redesign like this also affects other parts of the codebase. The other significant change is the format of region_df in AnnotationTable; see the Dimension specification section for more information.

Changed methods

Some methods have been moved from the main Annotator class to AnnotatorElement where it makes sense, and some logic has been moved from Annotator into AnnotatorInterface. In my mind, AnnotatorInterface should be able to work on its own, and Annotator should extend its API to work with HoloViews elements.

Annotator.set_range and Annotator.set_point are superseded by Annotator.set_regions. Both of the former have a limited interface that can handle at most two dimensions, with logic tailored to whether there are one or two dimensions. I have updated both to use Annotator.set_regions, mainly to avoid changing the tests too much in this PR, and I plan to remove them entirely in a follow-up PR.
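As a rough illustration of how the legacy setters can become thin wrappers around the generic one, here is a minimal sketch using a hypothetical ToyAnnotator class (not the actual holonote implementation). The set_range and set_point signatures are assumptions modelled on the one- and two-dimension limitation described above.

```python
from typing import Any, Optional


class ToyAnnotator:
    """Hypothetical stand-in for Annotator (illustration only)."""

    def __init__(self, kdims: list[str]) -> None:
        self.kdims = kdims
        self.region: dict[str, Any] = {}

    def set_regions(self, **items: Any) -> None:
        # Generic API: any number of dimensions, each mapped to a
        # single value or a (start, end) pair.
        unknown = set(items) - set(self.kdims)
        if unknown:
            raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
        self.region = dict(items)

    def set_range(self, startx: Any, endx: Any,
                  starty: Optional[Any] = None, endy: Optional[Any] = None) -> None:
        # Legacy API (assumed signature): at most two dimensions,
        # now only translated into a set_regions call.
        items = {self.kdims[0]: (startx, endx)}
        if starty is not None and endy is not None:
            items[self.kdims[1]] = (starty, endy)
        self.set_regions(**items)

    def set_point(self, posx: Any, posy: Optional[Any] = None) -> None:
        # Legacy API (assumed signature): one value per dimension.
        items = {self.kdims[0]: posx}
        if posy is not None:
            items[self.kdims[1]] = posy
        self.set_regions(**items)


annotator = ToyAnnotator(["A", "B"])
annotator.set_range(1.0, 2.0, 3.0, 4.0)
print(annotator.region)  # {'A': (1.0, 2.0), 'B': (3.0, 4.0)}
```

Keeping the legacy methods as wrappers means existing callers (and tests) keep working while all region logic lives in one place.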

Annotator.define_ranges, Annotator.define_points, and Annotator.define_fields have all been merged into a single API, Annotator.define_annotations. The old methods have been left untouched (except for a small print statement), as their tests mainly check bad inputs, e.g., that define_fields must be run before define_{ranges, points}. These tests do not apply to the new API, and I plan to remove them in a follow-up PR.

Dimension specification

A major redesign was needed to support the new API. The two main changes are:

  1. Removing the two-dimension limit in the Annotator class.
  2. Associating the type and region with each dimension rather than with the Annotator.

In the previous implementation, the Annotator had two parameters for this: kdim_dtypes and region_types.

```python
# Previous Annotator parameters (excerpt from the class body)
kdim_dtypes = param.Dict(default=None, allow_None=True, doc="""
    Dictionary of one or two key dimension names to dtypes (e.g. int, float, datetime).""")
connector = param.ClassSelector(class_=Connector, allow_None=False)
annotation_table = param.ClassSelector(class_=AnnotationTable, allow_None=False)
region_types = param.ListSelector(default=['Range'], objects=['Range', 'Point'])  # docstring truncated in this excerpt
```

This logic has been moved to a new specification parameter, spec, in which each dimension carries its own type and region. Below is what a spec currently looks like and how the clean_spec class method normalizes it:

```python
import numpy as np

spec = {
    # Range (two values)
    "A1": (np.float64, "range"),
    "A2": {"type": np.float64, "region": "range"},
    "A3": np.float64,  # Special case: a bare type defaults to "range"
    # Single
    "B1": (np.float64, "single"),
    "B2": {"type": np.float64, "region": "single"},
    # Multi
    ("C1", "D1"): {"type": np.float64, "region": "multi"},
    ("C2", "D2"): (np.float64, "multi"),
}

# Converted to:
cleaned_spec = {
    "A1": {"type": np.float64, "region": "range"},
    "A2": {"type": np.float64, "region": "range"},
    "A3": {"type": np.float64, "region": "range"},
    "B1": {"type": np.float64, "region": "single"},
    "B2": {"type": np.float64, "region": "single"},
    ("C1", "D1"): {"type": np.float64, "region": "multi"},
    ("C2", "D2"): {"type": np.float64, "region": "multi"},
}
```
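The normalization can be sketched as a small function. This is a hypothetical normalize_spec written for illustration, not the actual clean_spec from the PR; it only covers the three input forms shown (a dict, a (type, region) tuple, and a bare type).

```python
from typing import Any, Union

import numpy as np

SpecKey = Union[str, tuple[str, ...]]
SpecDict = dict[SpecKey, dict[str, Any]]


def normalize_spec(input_spec: dict[SpecKey, Any]) -> SpecDict:
    """Hypothetical sketch: map every accepted spec form to
    {"type": ..., "region": ...}."""
    new_spec: SpecDict = {}
    for key, value in input_spec.items():
        if isinstance(value, dict):
            # Long form: copy the two fields so the input is not mutated.
            new_spec[key] = {"type": value["type"], "region": value["region"]}
        elif isinstance(value, tuple):
            # Short form: (type, region).
            dtype, region = value
            new_spec[key] = {"type": dtype, "region": region}
        else:
            # Bare type: the special case that defaults to a range region.
            new_spec[key] = {"type": value, "region": "range"}
    return new_spec


spec = {
    "A1": (np.float64, "range"),
    "A3": np.float64,
    "B2": {"type": np.float64, "region": "single"},
    ("C2", "D2"): (np.float64, "multi"),
}
for key, value in normalize_spec(spec).items():
    print(key, value)
```

Returning a plain dict keyed by dimension name (or by a tuple for paired dimensions) keeps lookups simple, which is in line with the author's remark that a DataFrame was too troublesome here.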

Based on the number of dimensions passed and their respective regions, AnnotatorElement should determine* whether to display a span, a line, a point, or a polygon:

| Two dimensions | x = Single | x = Range | x = Multi |
| --- | --- | --- | --- |
| y = Single | Point | Finite Hline | - |
| y = Range | Finite Vline | Rectangle | - |
| y = Multi | - | - | Polygon |

Or only one dimension:

| | Single | Range | Multi |
| --- | --- | --- | --- |
| Only x-dim | Vline | Vspan | - |
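The two tables can be read as a lookup from region types to display elements. The sketch below encodes them in a hypothetical element_for helper; since this dispatch is not implemented in this PR, the function and the returned names are purely illustrative.

```python
from typing import Optional

# Keys are (x_region, y_region), mirroring the two-dimension table above.
TWO_DIM_ELEMENTS = {
    ("single", "single"): "Point",
    ("range", "single"): "Finite Hline",
    ("single", "range"): "Finite Vline",
    ("range", "range"): "Rectangle",
    ("multi", "multi"): "Polygon",
}
# Only an x dimension, mirroring the one-dimension table above.
ONE_DIM_ELEMENTS = {"single": "Vline", "range": "Vspan"}


def element_for(x_region: str, y_region: Optional[str] = None) -> Optional[str]:
    """Return the element name for the given regions, or None for the
    unsupported combinations (the '-' cells in the tables)."""
    if y_region is None:
        return ONE_DIM_ELEMENTS.get(x_region)
    return TWO_DIM_ELEMENTS.get((x_region, y_region))


print(element_for("range", "range"))  # Rectangle
print(element_for("single"))          # Vline
```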

The regions are named single, range, and multi. I renamed point to single because a finite line contains a single value in one of its dimensions, so single felt more accurate.

Right now, I am considering making the multi region a paired dimension, which is why its key is defined as a tuple.

AnnotationTable region dataframe

The intermediate values of regions in AnnotationTable._region_df have changed from two dimensions per row to one. For example, this is how it looked before:

| | region_type | dim1 | dim2 | value | _id |
| --- | --- | --- | --- | --- | --- |
| 0 | Range | A | B | (1.0, 2.0, 3.0, 4.0) | b92372deb83045c7929c8d2092365b74 |

And this is how it looks now:

| | region | dim | value | _id |
| --- | --- | --- | --- | --- |
| 0 | range | A | (1.0, 2.0) | b92372deb83045c7929c8d2092365b74 |
| 1 | range | B | (3.0, 4.0) | b92372deb83045c7929c8d2092365b74 |
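A minimal pandas sketch of this change, assuming the old packed value is ordered as (start, end) for dim1 followed by (start, end) for dim2, as the example rows suggest. The to_per_dimension helper is hypothetical and only illustrates the reshaping; it is not part of holonote.

```python
import pandas as pd

# Old layout: one row per annotation, both dimensions packed together.
old_region_df = pd.DataFrame({
    "region_type": ["Range"],
    "dim1": ["A"],
    "dim2": ["B"],
    "value": [(1.0, 2.0, 3.0, 4.0)],
    "_id": ["b92372deb83045c7929c8d2092365b74"],
})


def to_per_dimension(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: split each packed two-dimensional row into
    one row per dimension, as in the new _region_df layout."""
    rows = []
    # to_dict("records") keeps the "_id" column name intact
    # (itertuples would rename fields that start with an underscore).
    for rec in df.to_dict("records"):
        start1, end1, start2, end2 = rec["value"]  # assumed ordering
        region = rec["region_type"].lower()
        rows.append({"region": region, "dim": rec["dim1"],
                     "value": (start1, end1), "_id": rec["_id"]})
        rows.append({"region": region, "dim": rec["dim2"],
                     "value": (start2, end2), "_id": rec["_id"]})
    return pd.DataFrame(rows, columns=["region", "dim", "value", "_id"])


print(to_per_dimension(old_region_df))
```

Both rows share the same _id, which is what lets one annotation span several independently typed dimensions.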

The representation in the database schema and the output of Annotator.df remains unchanged.

Not in this PR:

All of these points will be looked into in future PRs.

  1. The styling of annotators has not been a focus of this PR.
  2. Some code could be optimized by vectorizing and by reducing the number of commits to the database.
  3. Improving the vectorized elements has not been looked into, as it will need a rewrite once holoviews#5845 (Vectorized VLines, HLines, VSpans and HSpans elements) is merged.
  4. Some cyclic references may still remain in the code.
  5. I have updated the examples to the new API, but I still need to update the corresponding text. Some sections are marked with 🚧, meaning the code has not been updated and will fail.
  6. I want AnnotatorElement and AnnotationTable to know only about themselves and the Annotator that initialized them. This has not been a big focus of this PR, but it will be in the future.

*) This logic has not been implemented in this PR.

@jlstevens (Contributor)

This is something we have already discussed, @hoxbro, but I dislike the 'single', 'range' and 'multi' terminology replacing the region types 'point', 'range' and 'geometry'.

I understand that the word 'point' as a concept of a point in a n-dimensional space may be confused with a HoloViews Points element (though the former is singular and the latter is plural) so I can see the argument for a different name here. How about 'coordinate', 'range' and 'geometry' to represent a specific location in an n-dimensional space, an interval in an n-dimensional space and a polygon selection?

Co-authored-by: Simon Høxbro Hansen <[email protected]>
```python
annotator.refresh(clear=clear)
# def refresh_annotators(self, clear=False):
#     for annotator in self._annotators.values():
#         annotator.refresh(clear=clear)
```
Contributor

These commented-out lines can now be deleted, now that there will be only a single annotator instance.

Member Author

Agreed. I only left this in to reduce the number of changes, and I will remove it in a follow-up PR.

@jlstevens (Contributor)

Also, instead of 'coordinate' maybe simply 'position' conveys the right concept?

```diff
 import pandas as pd

 from holonote.annotate import AnnotationTable


-def test_table_region_df():
+def test_table_single_kdim() -> None:
```
Contributor

What is the point of typing test methods? They will all return None!

Member Author

Mostly as preparation for when type checking is run as part of the CI.

```python
        self.annotation_table.register_annotator(self)
        self.annotation_table.add_schema_to_conn(self.connector)

        if init:
            self.load()

    @classmethod
    def clean_spec(self, input_spec: dict[str, Any]) -> SpecDict:
        """ Convert spec to a DataFrame with columns: type, region
```
Contributor

Docstring says it returns a dataframe when it is returning a dict.

Contributor

Also, I don't think it is cleaning anything. Maybe it should be called normalize_spec?

Member Author

I had it as a DataFrame to start with, but that became too troublesome to work with.

Fine with the name change.

```python
    def set_regions(self, **items):
        self._set_regions(**items)

    def _set_regions(self, **items):
```
Contributor

Isn't this redundant with set_regions?

Member Author

This is loosely following the structure of the existing code.

In the Annotator class, set_regions also calls self.refresh. This causes problems in some of the HoloViews code, where it would keep refreshing.

I have made a small comment about it here:
https://github.com/holoviz/holonote/blob/decouple_plot_annotator/holonote/annotate/annotator.py/#L548
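The split between the public and private setter can be illustrated with a pair of hypothetical classes (not holonote's actual ones): the user-facing set_regions refreshes once, while a plot callback calls _set_regions directly, so updating the region from the plot does not trigger yet another refresh.

```python
from typing import Any


class InterfaceSketch:
    """Hypothetical base: stores regions without touching any plots."""

    def __init__(self) -> None:
        self.region: dict[str, Any] = {}

    def set_regions(self, **items: Any) -> None:
        self._set_regions(**items)

    def _set_regions(self, **items: Any) -> None:
        # Pure state update, safe to call from plot callbacks.
        self.region = dict(items)


class AnnotatorSketch(InterfaceSketch):
    """Hypothetical subclass whose public setter also refreshes the display."""

    def __init__(self) -> None:
        super().__init__()
        self.refresh_count = 0

    def set_regions(self, **items: Any) -> None:
        # User-facing: update state, then redraw once.
        self._set_regions(**items)
        self.refresh()

    def refresh(self) -> None:
        # Stand-in for redrawing the plots; only counts calls here.
        self.refresh_count += 1

    def on_plot_selection(self, **items: Any) -> None:
        # Called by the plot itself: use the private setter so the
        # update does not cause another refresh.
        self._set_regions(**items)


annotator = AnnotatorSketch()
annotator.set_regions(A=(1, 2))
annotator.on_plot_selection(A=(3, 4))
print(annotator.region, annotator.refresh_count)
```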

```diff
         if self.connector.primary_key.field_name not in fields:
             index_val = self.connector.primary_key(self.connector,
                                                    list(self.annotation_table._field_df.index))
             fields[self.connector.primary_key.field_name] = index_val

-        if self.region != self._last_region:
+        # Don't do anything if self.region is an empty dict
+        if self.region and self.region != self._last_region:
```
Contributor

Shouldn't the line below be removed?

```python
if len(self.annotation_table._annotators)>1:
```

There can only be one annotator now...

Member Author

Agreed. I will clean it up in another PR.

This logic is still there to keep changes to a minimum.

```python
    def define_annotations(self, data: pd.DataFrame, **kwargs) -> None:
        # Will both set regions and add annotations. Can accept multiple inputs.
        # If index is None, a new index will be set.
        # If nothing is given, it will infer the regions and fields from the column header.
```
Contributor

What do you mean by 'infer the regions and fields from the column header'?

```diff
@@ -317,15 +408,28 @@ def commit(self, return_commits=False):
         return commits


-class AnnotatorPlot(AnnotatorInterface):
+class AnnotatorElement(param.Parameterized):
```
Contributor

I don't understand this concept of an AnnotatorElement and how it is distinct from an Annotator. Whatever it is, it is not a holoviews element so I think this is confusing.

Contributor

I've just noticed the kdims... once rect_min and rect_max are gone, isn't that the key feature of this class?

Member Author

No, the key feature of the class is that the Annotator can have multiple AnnotatorElements associated with it. Here is a small sketch of how I see the internals (note that some changes are still needed in follow-up PRs). The dashed lines indicate parts a normal user should not interact with.

[Sketch of the Annotator internals (image not included)]

As for the naming itself, I think it is fine, as I see it as a provider of an element to the annotator, in the same way that AnnotationTable provides the tables even though it is not itself a table.

@jlstevens (Contributor)

Here are some notes from my review:

  • define_fields is defined as legacy...why? It seems useful to be able to build up an annotator from different dataframes, and real data will often be like this (it is unreasonable to always expect all the input data to come from the same dataframe). At any rate, if it is 'legacy' it should not be mentioned in the notebooks (e.g. in the Basics notebook). I'm not saying define_annotations should not also exist...
  • There needs to be some explanatory text in MultiPlot, though this doesn't have to be added in this PR.
  • I would like to see single, range and multi renamed to position, range and geometry.

Other than those comments and the other questions/suggestions above, looks good!

@hoxbro (Member Author) left a comment

> define_fields is defined as legacy...why? It seems useful to be able to build up an annotator from different dataframes, and real data will often be like this (it is unreasonable to always expect all the input data to come from the same dataframe). At any rate, if it is 'legacy' it should not be mentioned in the notebooks (e.g. in the Basics notebook). I'm not saying define_annotations should not also exist...

You can still run define_annotations multiple times. The database and annotator.df both follow this structure of having a single table, whereas AnnotationTable internally uses two different dataframes. In my opinion, it is easy to construct a dataframe.

I tried to avoid changing the text and only change the code itself. I did try to clean up the define_ references in the notebooks, but clearly wasn't thorough enough. I will push a change for this.

> There needs to be some explanatory text in MultiPlot, though this doesn't have to be added in this PR.

Agreed. The example notebooks (mostly the text) must be updated for the new API, but I didn't do it in this PR. I will open a new issue for it so I don't forget.

> I would like to see single, range and multi renamed to position, range and geometry.

I will make these changes.


@jlstevens (Contributor)

Thanks for the explanation: in that case I would suggest AnnotationDisplay instead of AnnotationElement.

@jlstevens (Contributor)

Thanks for applying the suggestions! PR now looks good to me.

@hoxbro merged commit a8fafa6 into main on Sep 19, 2023 (14 checks passed).
@hoxbro deleted the decouple_plot_annotator branch on September 19, 2023 09:34.
Linked issue: Proposal for new Annotator API
Participants: 2