-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of new Annotator API #21
Conversation
This is something we have discussed already @hoxbro but I dislike this 'single', 'range' and 'multi' terminology replacing the region types of 'point', 'range' and 'geometry'. I understand that the word 'point' as a concept of a point in a n-dimensional space may be confused with a HoloViews |
Co-authored-by: Simon Høxbro Hansen <[email protected]>
annotator.refresh(clear=clear) | ||
# def refresh_annotators(self, clear=False): | ||
# for annotator in self._annotators.values(): | ||
# annotator.refresh(clear=clear) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These commented lines can now be deleted not there will be only a single annotator instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I have just left this in to reduce the number of changes. I will follow up on this PR with the removal of this.
Also, instead of 'coordinate' maybe simply 'position' conveys the right concept? |
import pandas as pd | ||
|
||
from holonote.annotate import AnnotationTable | ||
|
||
|
||
def test_table_region_df(): | ||
def test_table_single_kdim() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of typing test methods? They will all return None
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly, preparation for when a type checking is run as part of the CI.
holonote/annotate/annotator.py
Outdated
self.annotation_table.register_annotator(self) | ||
self.annotation_table.add_schema_to_conn(self.connector) | ||
|
||
if init: | ||
self.load() | ||
|
||
@classmethod | ||
def clean_spec(self, input_spec: dict[str, Any]) -> SpecDict: | ||
""" Convert spec to a DataFrame with columns: type, region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring says it returns a dataframe when it is returning a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think it is cleaning anything. Maybe it should be called normalize_spec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it as a DataFrame to start with, but that became too troublesome to work with.
Fine with the name change.
def set_regions(self, **items): | ||
self._set_regions(**items) | ||
|
||
def _set_regions(self, **items): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant with set_regions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is loosely following the structure of the existing code.
In the Annotator
class, set_regions
will also call self.refresh
. This will give problems when running some of the holoviews code where it will keep on refreshing.
I have made a small comment about it here:
https://github.com/holoviz/holonote/blob/decouple_plot_annotator/holonote/annotate/annotator.py/#L548
if self.connector.primary_key.field_name not in fields: | ||
index_val = self.connector.primary_key(self.connector, | ||
list(self.annotation_table._field_df.index)) | ||
fields[self.connector.primary_key.field_name] = index_val | ||
|
||
if self.region != self._last_region: | ||
# Don't do anything if self.region is an empty dict | ||
if self.region and self.region != self._last_region: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the line below should be removed?
if len(self.annotation_table._annotators)>1:
There can only be one annotator now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I will clean it up in another PR.
This logic is still there to keep changes to a minimum.
def define_annotations(self, data: pd.DataFrame, **kwargs) -> None: | ||
# Will both set regions and add annotations. Can accept multiple inputs | ||
# if index is none a new index will be set. | ||
# if nothing is given it infer the regions and fields from the column header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by 'infer the regions and fields from the column header'?
holonote/annotate/annotator.py
Outdated
@@ -317,15 +408,28 @@ def commit(self, return_commits=False): | |||
return commits | |||
|
|||
|
|||
class AnnotatorPlot(AnnotatorInterface): | |||
class AnnotatorElement(param.Parameterized): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this concept of an AnnotatorElement
and how it is distinct from an Annotator
. Whatever it is, it is not a holoviews element so I think this is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just noticed the kdims
... once rect_min
and rect_max
are gone, isn't that the key feature of this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the key feature of the class is that the Annotator
can have multiple AnnotatorElement
associated with it. A small sketch of how I see the internal (note that some changes are still needed in follow-up PRs). The dashed line should be seen as something a normal user should not interact with.
As for the naming itself, I think it is fine, as I see it as a provider of an element to the annotator. In the same way, AnnotationTable
is a provider of the tables even though it is not directly a table.
Here are some notes from my review:
Other than those comments and the other questions/suggestions above, looks good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define_fields is defined as legacy...why? It seems useful to be to be able to build up an annotator from different dataframes and often real data will be like this (it is unreasonably to always expect all the input data to be from the same dataframe). At any rate, if it is 'legacy' it should not be mentioned in the notebooks (e.g. in the Basics notebook). I'm not saying define_annotations should not also exist...
You can still run the define_annotations
multiple times. The database
and annotator.df
, both follow this structure of having one table, whereas AnnotationTable internally will use two different data frames. In my personal opinion, it is easy to construct a dataframe.
I tried to keep away from changing the text and only the code itself, but tried to clean up define_
references in the, but clearly wasn't thorough enough. I will push a change with this.
There needs to be some explanatory text in MultiPlot though this doesn't have to be added in this PR.
Agree. The example notebooks (mostly text) must be updated with the new API. But didn't do it in this PR. I will open a new issue with it so I don't forget it.
I would like to see single, range and multi renamed to position, range and geometry.
I will make these changes.
import pandas as pd | ||
|
||
from holonote.annotate import AnnotationTable | ||
|
||
|
||
def test_table_region_df(): | ||
def test_table_single_kdim() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly, preparation for when a type checking is run as part of the CI.
holonote/annotate/annotator.py
Outdated
self.annotation_table.register_annotator(self) | ||
self.annotation_table.add_schema_to_conn(self.connector) | ||
|
||
if init: | ||
self.load() | ||
|
||
@classmethod | ||
def clean_spec(self, input_spec: dict[str, Any]) -> SpecDict: | ||
""" Convert spec to a DataFrame with columns: type, region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it as a DataFrame to start with, but that became too troublesome to work with.
Fine with the name change.
def set_regions(self, **items): | ||
self._set_regions(**items) | ||
|
||
def _set_regions(self, **items): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is loosely following the structure of the existing code.
In the Annotator
class, set_regions
will also call self.refresh
. This will give problems when running some of the holoviews code where it will keep on refreshing.
I have made a small comment about it here:
https://github.com/holoviz/holonote/blob/decouple_plot_annotator/holonote/annotate/annotator.py/#L548
if self.connector.primary_key.field_name not in fields: | ||
index_val = self.connector.primary_key(self.connector, | ||
list(self.annotation_table._field_df.index)) | ||
fields[self.connector.primary_key.field_name] = index_val | ||
|
||
if self.region != self._last_region: | ||
# Don't do anything if self.region is an empty dict | ||
if self.region and self.region != self._last_region: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I will clean it up in another PR.
This logic is still there to keep changes to a minimum.
holonote/annotate/annotator.py
Outdated
@@ -317,15 +408,28 @@ def commit(self, return_commits=False): | |||
return commits | |||
|
|||
|
|||
class AnnotatorPlot(AnnotatorInterface): | |||
class AnnotatorElement(param.Parameterized): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the key feature of the class is that the Annotator
can have multiple AnnotatorElement
associated with it. A small sketch of how I see the internal (note that some changes are still needed in follow-up PRs). The dashed line should be seen as something a normal user should not interact with.
As for the naming itself, I think it is fine, as I see it as a provider of an element to the annotator. In the same way, AnnotationTable
is a provider of the tables even though it is not directly a table.
Thanks for the explanation: in that case I would suggest |
Thanks for applying the suggestions! PR now looks good to me. |
resolves #16
This PR adds the API, so it is possible to work with multiple regions across multiple elements with a single
Annotator
. A small example is added inMultiPlot.ipynb
, to show the new API off.In general, I have tried to keep myself limited to only making changes in annotator.py, but naturally, a redesign like this will also affect other parts of the codebase. The other significant change is the format of
region_df
inAnnotatorTable
. See section the Dimension specification for more information.Changed methods
Some methods have been moved from the main
Annotator
class to theAnnotatorElement
where it makes sense. Some logic has also been moved fromAnnotator
intoAnnotatorInterface
. In my mind,AnnotatorInterface
should be able to work by itself, and theAnnotator
should expand the API to work with Holoviews elements.Annotator.set_range
andAnnotator.set_point
are superseded byAnnotator.set_regions
. Both of the former have a limited interface that can only handle up to two dimensions and very tailored logic on whether the number of dimensions is one or two. I have updated them both to use theAnnotator.set_regions
, mainly to not mess around too much with the tests in this PR. I plan to remove them entirely in a follow-up PR.Annotator.define_ranges
,Annotator.define_points
, andAnnotator.define_fields
have all been merged into one API inAnnotator.define_annotations
. The methods have been left untouched (except a small print statement) as the tests mainly check bad inputs, e.g., you need to rundefine_fields
beforedefine_{ranges, points}
. The tests do not work with the new API, and I plan to remove them in a follow-up PR.Dimension specification
Some major redesign has been needed to support the new API. The two main changes have been
Annotator
class.Annotator
.In the previous implementation, the
Annotator
had two parameters for this:kdim_dtypes
andregion_type
.holonote/holonote/annotate/annotator.py
Lines 100 to 107 in a502bdb
This logic has been moved to the specification parameter
spec
. Each dimension now contains information about the type and region for itself. Right now, this is what aspec
looks like and how it is cleaned up in theclean_spec
class method.Based on the number of dimensions passed and their respective region,
AnnotatorElement
should determine* if it should be a span, a line, a point or a polygon:Or only one dimension:
The name of the regions is
single
,range
, andmulti
. The change from havingpoint
->single
is the finite lines contain a single value in one of the dimensions. So I felt it was correct to call itsingle
.Right now, I am considering having the multi-region be a paired dimension, which is why it is defined as a tuple.
Annotator Tables region dataframe
The intermediate values of regions in
AnnotationTable._region_df
have changed from containing two dimensions to only containing one. An example would be this is how it looked beforeAnd this is how it looks now:
The representation in the database schema and the output of
Annotator.df
remains unchanged.Not in this PR:
All of these points will be looked into in future PRs.
AnnotationElement
andAnnotationTable
only know about itself and theAnnotator
that initialized it. For this PR, it has not been a big focus, but it will be in the future.*) This logic has not been implemented in this PR.