
ObservationLevel: Object? #390

Open
ddachs opened this issue Oct 14, 2024 · 8 comments
@ddachs

ddachs commented Oct 14, 2024

I am curious whether we are overlooking the (so far) most detailed observation level, specifically one that pertains to objects within media files. Currently, Camtrap DP supports two levels: media and event. When identifying objects in a media file (e.g., using MegaDetector), the count in the observation table will always be 1. For this reason, we moved from the media level to a more granular object level. I believe this distinction is crucial when generating count values for events, as having data at the media level versus the object level calls for different approaches.

@peterdesmet
Member

peterdesmet commented Oct 14, 2024

Can you clarify your question? Here's an attempt at providing info 😄

  1. It is possible to express object-level observations in Camtrap DP. For that, you use a media-based observation (observationLevel = media, mediaID = not NULL) and use bboxX, bboxY, bboxWidth and bboxHeight to indicate where the object was observed. All remaining properties in the observation then apply to that object. Example:

```
0e98b93e_1,62c200a9,0e98b93e,4dcacd8f,2021-04-05T19:08:33Z,2021-04-05T19:08:33Z,media,animal,,Ardea,1,,,,,,,,0.32325,0.46962,0.2744,0.25189,machine,Western Europe species model Version 1,2021-07-05T11:43:57Z,0.9,,
be8fe0df_1,62c200a9,be8fe0df,4dcacd8f,2021-04-05T19:08:34Z,2021-04-05T19:08:34Z,media,animal,,Ardea,1,,,,,,,,0.39899,0.4774,0.08314,0.27078,machine,Western Europe species model Version 1,2021-07-05T11:43:57Z,0.85,,
13b58ff5_1,62c200a9,13b58ff5,4dcacd8f,2021-04-05T19:08:34Z,2021-04-05T19:08:34Z,media,animal,,Ardea,1,,,,,,,,0.41081,0.47909,0.06936,0.29999,machine,Western Europe species model Version 1,2021-07-05T11:43:57Z,0.86,,
d8427514_1,62c200a9,d8427514,4dcacd8f,2021-04-05T19:08:35Z,2021-04-05T19:08:35Z,media,animal,,Ardea,1,,,,,,,,0.41001,0.48082,0.07038,0.26966,machine,Western Europe species model Version 1,2021-07-05T11:43:57Z,0.83,,
```

This can be used to draw the bounding box: https://camtrap-dp.tdwg.org/example/62c200a9/#7245a2aa
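As a minimal sketch of that drawing step, the snippet below converts the relative bbox values from the example rows into pixel coordinates, assuming bboxX/bboxY mark the top-left corner expressed as fractions of the media dimensions (the helper name and image size are illustrative, not part of Camtrap DP):

```python
def bbox_to_pixels(bbox_x, bbox_y, bbox_width, bbox_height,
                   image_width, image_height):
    """Convert relative bbox values to (left, top, right, bottom) pixels."""
    left = round(bbox_x * image_width)
    top = round(bbox_y * image_height)
    right = round((bbox_x + bbox_width) * image_width)
    bottom = round((bbox_y + bbox_height) * image_height)
    return left, top, right, bottom

# First observation above, on a hypothetical 1920x1080 image:
print(bbox_to_pixels(0.32325, 0.46962, 0.2744, 0.25189, 1920, 1080))
# (621, 507, 1147, 779)
```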

  2. Regarding how individualCount should be summed (without over-summing), we currently state the following in observationLevel:

Level at which the observation was classified. media for media-based observations that are directly associated with a media file (mediaID). These are especially useful for machine learning and don't need to be mutually exclusive (e.g. multiple classifications are allowed). event for event-based observations that consider an event (comprising a collection of media files). These are especially useful for ecological research and should be mutually exclusive, so that their count can be summed.

On point 2, we may be moving towards an approach where observations.csv contains the biologically relevant information (one truth, with clear approach on how to create events). While for machine learning, it is useful to have all classifications (multiple truths), which are provided in an annotations.csv. See a very early draft proposal at #389
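The aggregation rule quoted above can be sketched as follows: only event-based observations are mutually exclusive, so only their counts are summed per event. The dict-based rows and the `count` key are illustrative, not a real Camtrap DP reader:

```python
def sum_counts_per_event(observations):
    """Sum counts per eventID, using event-level observations only."""
    totals = {}
    for obs in observations:
        if obs["observationLevel"] != "event":
            continue  # media-level classifications may overlap; skip them
        if obs.get("count") is None:
            continue
        totals[obs["eventID"]] = totals.get(obs["eventID"], 0) + obs["count"]
    return totals

observations = [
    {"eventID": "a1", "observationLevel": "event", "count": 2},
    {"eventID": "a1", "observationLevel": "media", "count": 1},  # ignored
    {"eventID": "b2", "observationLevel": "event", "count": 1},
]
print(sum_counts_per_event(observations))  # {'a1': 2, 'b2': 1}
```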

@kbubnicki
Contributor

Hey, as I understand it, @ddachs suggests extending the list of possible options for the observationLevel field with object (or something like sub-media) to better guide users when they do data aggregations themselves. Correct @ddachs ?

@ddachs
Author

ddachs commented Oct 14, 2024

@kbubnicki correct!

@peterdesmet
Member

Makes sense to expand it to object, but we should think about terminology:

  • Spatial region of interest: object?
  • Temporal region of interest (indicated with eventStart/eventEnd), useful in video (and maybe later audio) files: fragment?
  • Is it necessary to differentiate between the two in observationLevel? Is submedia enough?
  • Are combinations of the spatial and temporal possible? Theoretically yes, if you use a temporal interval that identifies a frame in a video and then indicate the object. Or is it expected that for such use cases, the video is converted to images?

@ddachs
Author

ddachs commented Oct 14, 2024

  1. object is a spatial region in a picture or video frame.
  2. a temporal region to me is clearly an interval (I got used to lubridate's terminology).
  3. good question regarding submedia: on the one hand, submedia keeps the hierarchical terminology (event > media > submedia), but on the other hand, it lacks information about what exactly it is. I prefer the differentiation.
  4. I hope that the combination is possible. Our observation tables will explode if we split all videos into single frames.

@peterdesmet
Member

peterdesmet commented Oct 15, 2024

  2. Agree, interval is the best term

  4. If combinations are possible, what term to use then? object or interval?

@ddachs
Author

ddachs commented Dec 12, 2024

In this context, I would prioritize the object rather than the interval, as the object is of primary interest. The combination of objects and their temporal presence is particularly relevant in videos, where object tracking is a key technology. Object tracking involves linking the spatial information of objects across individual video frames over time (temporal dimension). Reflecting on this, it becomes clear that we will need to store comprehensive information about where and when an object appears in the video. The exact format will likely depend on the conventions used in Computer Vision for handling such data. Perhaps a vector of bounding box information, with its length corresponding to the number of frames in the video, could be a suitable approach?
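One possible shape for the "vector of bounding box information" idea is an object track stored as one (frame, relative bbox) tuple per frame in which the object is visible. This is only an illustration of the data structure under discussion, not part of Camtrap DP; all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    observation_id: str
    # One (frame, bboxX, bboxY, bboxWidth, bboxHeight) tuple per frame in
    # which the object is visible; gaps mean the tracker lost the object.
    track: list

    def frame_span(self):
        """Return the (first, last) frame in which the object appears."""
        frames = [frame for frame, *_ in self.track]
        return min(frames), max(frames)

heron = TrackedObject(
    observation_id="0e98b93e_1",
    track=[(12, 0.32, 0.47, 0.27, 0.25),
           (13, 0.33, 0.47, 0.27, 0.25),
           (14, 0.34, 0.46, 0.28, 0.25)],
)
print(heron.frame_span())  # (12, 14)
```

The frame span could then be mapped back to eventStart/eventEnd timestamps via the video's frame rate.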

@peterdesmet
Member

@ddachs would an eventStart/eventEnd be sufficient to locate when an object appears in a video?
