Erroneous "fib" in data_description.modality for pure behavior sessions #1056

Open
2 of 4 tasks
hanhou opened this issue Nov 22, 2024 · 9 comments
Collaborator

hanhou commented Nov 22, 2024

While investigating another issue, I noticed that many sessions in docDB have "fib" in data_description.modality but not in rig.modalities. For example:

{
    "_id": "625df1c2-1b38-4875-9a5f-09bbaca0e213",
    "data_description": {
      "name": "behavior_751152_2024-10-13_16-07-35",
      "modality": [
        {
          "name": "Behavior",
          "abbreviation": "behavior"
        },
        {
          "name": "Behavior videos",
          "abbreviation": "behavior-videos"
        },
        {
          "name": "Fiber photometry",
          "abbreviation": "fib"
        }
      ]
    },
    "rig": {
      "modalities": [
        {
          "name": "Behavior",
          "abbreviation": "behavior"
        },
        {
          "name": "Behavior videos",
          "abbreviation": "behavior-videos"
        }
      ]
    }
  },
  {
    "_id": "fbb5bb77-fef3-4f3b-93f2-560a8565488d",
    "data_description": {
      "name": "behavior_764787_2024-10-15_13-07-30",
      "modality": [
        {
          "name": "Behavior",
          "abbreviation": "behavior"
        },
        {
          "name": "Behavior videos",
          "abbreviation": "behavior-videos"
        },
        {
          "name": "Fiber photometry",
          "abbreviation": "fib"
        }
      ]
    },
    "rig": {
      "modalities": [
        {
          "name": "Behavior",
          "abbreviation": "behavior"
        },
        {
          "name": "Behavior videos",
          "abbreviation": "behavior-videos"
        }
      ]
    }
  }

Here is the query showing all sessions with this problem.

We should:

hanhou changed the title from Missing "fib" modality in rig.json to Missing "fib" modality in rig.modalities on Nov 22, 2024
@alexpiet
Collaborator

@hanhou The sessions returned by your query are exclusively from non-FIP rigs. The FIP rigs are 1D, 2D, 3D, 6D, 7D, and 8D, none of which show up in the log.

So the problem is with data_description.json, not rig.json.

Collaborator

XX-Yin commented Nov 22, 2024

@hanhou @alexpiet Previously, the session description was generated locally, and the code would exclude the fib modality if there was no fib data. That functionality has been removed and replaced by online generation during upload, which may be the source of the problem.

Collaborator Author

hanhou commented Nov 22, 2024

> @hanhou The sessions returned by your query are exclusively from non-FIP rigs. The FIP rigs are 1D, 2D, 3D, 6D, 7D, and 8D, none of which show up in the log.
>
> So the problem is with data_description.json, not rig.json.

Interesting! Then it all makes sense to me. Here is what may have happened:

  1. Prior to this PR, we incorrectly added "fib" to data_description.modality of pure behavior sessions without FIP.
  2. In the Streamlit app, to compare with the pure behavior and behavior + FIP sessions in my temporary pipeline, we should have queried "behavior" from data_description.modality, but we actually queried "fib". Interestingly, two wrongs made a right: because of point 1, that query still returned every session that includes behavior. So for a long time, sessions from both pipelines seemed to match in Streamlit.
  3. Recently, after the PR, point 1 was fixed and we started to see discrepancies in Streamlit.
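
The three points above can be illustrated with a toy example. This is only a sketch: the session names, the `has_fip_data` flag, and the `hardcode_fib` switch are made up for illustration and do not come from the actual pipeline or from docDB.

```python
# Toy records; names and flags are hypothetical, not real docDB content.
sessions = [
    {"name": "behavior_only_A", "has_fip_data": False},
    {"name": "behavior_fip_B", "has_fip_data": True},
    {"name": "behavior_only_C", "has_fip_data": False},
]


def modalities(session, hardcode_fib):
    """Return the modality abbreviations recorded for one session.

    hardcode_fib=True mimics point 1 (every session gets "fib");
    hardcode_fib=False mimics the state after the fix.
    """
    mods = {"behavior"}
    if hardcode_fib or session["has_fip_data"]:
        mods.add("fib")
    return mods


def query(sessions, abbreviation, hardcode_fib):
    """Names of sessions whose recorded modalities include `abbreviation`."""
    return [
        s["name"] for s in sessions
        if abbreviation in modalities(s, hardcode_fib)
    ]


# Before the fix, querying "fib" (wrong) and "behavior" (intended) agree.
before_fix_fib = query(sessions, "fib", hardcode_fib=True)
before_fix_behavior = query(sessions, "behavior", hardcode_fib=True)

# After the fix, querying "fib" returns only the real FIP session.
after_fix_fib = query(sessions, "fib", hardcode_fib=False)
```

In this sketch the wrong query only matches the intended one while the modality list is also wrong, which is consistent with the discrepancies appearing once point 1 was fixed.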

Collaborator

XX-Yin commented Nov 22, 2024

I speculate that the uploading process added the fib modality to the data description by default if there is a fib folder.

@alexpiet If there is no fib data, have we already excluded the fib folder in the manifest file?

hanhou changed the title from Missing "fib" modality in rig.modalities to Erroneous "fib" in data_description.modality for pure behavior sessions on Nov 22, 2024
Collaborator Author

hanhou commented Nov 22, 2024

> I speculate that the uploading process added the fib modality to the data description by default if there is a fib folder.
>
> @alexpiet If there is no fib data, have we already excluded the fib folder in the manifest file?

Do you think this PR has fixed the problem?

Collaborator

alexpiet commented Nov 22, 2024

Yes, my understanding is that that PR fixed the problem. The modalities in data_description are the modalities listed in the upload manifest. Before that PR, the modalities in the upload manifest were hard-coded to include fip and video for every session. Now we set them based on the data streams.

Spot-checking actual manifests from this week, they correctly do NOT list FIP for behavior-only sessions.
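
As a rough sketch of the difference described above (not the actual upload code): the function names are hypothetical, and the stream layout (`stream_modalities` entries with an `abbreviation`) is assumed from the query path used elsewhere in this thread.

```python
def modalities_hardcoded(data_streams):
    """Old behavior as described: the same list for every session.

    The exact hard-coded list is an assumption for illustration.
    """
    return ["behavior", "behavior-videos", "fib"]


def modalities_from_streams(data_streams):
    """New behavior as described: collect modalities from the data streams.

    Keeps first-seen order and drops duplicates.
    """
    seen = []
    for stream in data_streams:
        for modality in stream.get("stream_modalities", []):
            abbreviation = modality["abbreviation"]
            if abbreviation not in seen:
                seen.append(abbreviation)
    return seen


# A behavior-only session: no stream carries "fib".
behavior_only_streams = [
    {"stream_modalities": [{"abbreviation": "behavior"}]},
    {"stream_modalities": [{"abbreviation": "behavior-videos"}]},
]

old_manifest = modalities_hardcoded(behavior_only_streams)
new_manifest = modalities_from_streams(behavior_only_streams)
```

With the stream-derived version, a behavior-only session no longer picks up "fib", which matches the spot check of this week's manifests.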

Collaborator Author

hanhou commented Nov 23, 2024

The correct way of querying "FIP" sessions

Since data_description.modality has been unreliable, the correct query at this point might be:

{"session.data_streams.stream_modalities.abbreviation": "fib"}

To validate: every session that has "fib" as a stream should also have "fib" in rig.modalities, if "rig" exists. In other words, this query
https://api.allenneuraldynamics.org/v1/metadata_index/data_assets?filter={"session.data_streams.stream_modalities.abbreviation":"fib","rig":{"$ne":null},"rig.modalities.abbreviation":{"$ne":"fib"}}
should be empty. And it is.

(For comparison, as noted in the body of this issue, sessions that have "fib" in data_description.modality do not always have "fib" in rig.modalities.)

However, passing this check is a necessary but not sufficient condition for the proposed query to be correct.
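
For anyone who wants to repeat the check, here is a minimal sketch of how the validation URL above could be built, plus a local version of the same condition for records that have already been downloaded. The helper names (`build_query_url`, `abbreviations`, `violates`) are hypothetical, and the local check is only an approximation of how the filter matches on array fields, not the API's own implementation.

```python
import json
from urllib.parse import quote

BASE_URL = "https://api.allenneuraldynamics.org/v1/metadata_index/data_assets"

# Same filter as in the URL above: "fib" is a stream modality, rig exists,
# and no entry in rig.modalities has the abbreviation "fib".
VALIDATION_FILTER = {
    "session.data_streams.stream_modalities.abbreviation": "fib",
    "rig": {"$ne": None},
    "rig.modalities.abbreviation": {"$ne": "fib"},
}


def build_query_url(filter_doc, base_url=BASE_URL):
    """Serialize the filter as compact JSON and percent-encode it."""
    encoded = quote(json.dumps(filter_doc, separators=(",", ":")))
    return f"{base_url}?filter={encoded}"


def abbreviations(items):
    """Set of abbreviations in a list of modality dicts (None-safe)."""
    return {m.get("abbreviation") for m in items or []}


def violates(record):
    """True if a record has a "fib" stream but its rig does not list "fib"."""
    streams = (record.get("session") or {}).get("data_streams") or []
    stream_abbrs = set()
    for stream in streams:
        stream_abbrs |= abbreviations(stream.get("stream_modalities"))
    rig = record.get("rig")
    if "fib" not in stream_abbrs or rig is None:
        return False
    return "fib" not in abbreviations(rig.get("modalities"))


url = build_query_url(VALIDATION_FILTER)
```

An empty result from `url`, or no record for which `violates` returns True, only shows that the stream-based query is consistent with rig.modalities. It does not show that every FIP session is found.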

@alexpiet @XX-Yin @rachelstephlee @hagikent any ideas?

Contributor

rachelstephlee commented Nov 26, 2024

Discussed in kanban meeting 11/26

David says the fix forward has been done.
David says he's volunteered for the fixes in the past. I will defer to @dyf for how we can correct the queries.

Collaborator Author

hanhou commented Dec 18, 2024

In this plot, the red circle should be inside the green circle and overlap perfectly with the blue circle.

image
