Fix issue #772, Incorrect zeros included in emsd average when particles are dropped #773

vivarose · 2024-08-18T22:02:19Z

I've tested this code with imsd() and emsd() on test data and it works, and on actual movie data and it is more accurate but it doesn't make as big a difference as I feared it would. So Issue #772 isn't so bad unless there are a lot of gaps in the data.

Pros of accepting my pull-request: Slightly more accurate emsd data! If there are a lot of gaps, then it's MUCH MUCH more accurate.

Cons: It takes slightly more time to run. With timit, I see 30.6 ms ± 1.55 ms per loop before my edits versus 32.1 ms ± 1.43 ms per loop after my edits.

Issue #772 occurs when there are gaps in the data, so the problem shows up in _msd_gaps().

Here are the edits:
I updated _msd_gaps() with "skipna=False" on the sum() function. This way values based on a lack of data will show up as NaN (instead of showing up as zero).

I also added some code to reset the estimated number of datapoints N to be 0 if there is no data.
result['N'] = np.where(result['msd'].isna(), 0, result['N'])

I was hoping this would be enough but it created another problem that I had to take care of: When the emsd is calculated, there's a problem because we are averaging in a NaN (albeit weighted as N=0), and it turns out NaN * 0 = NaN. I wanted it to come out to zero. So I had to reset the NaN to zero again. Awkward, but it works. Then it's 0*0 = 0, which is the weight I wanted, and it calculates the emsd correctly.

# remove np.nan because it would make the rest of the calculation break
msds['msd'] = np.where(msds['msd'].isna(), 0, msds['msd'])

So this update does provide a more accurate emsd() calculation, and it removes the weird DROPS that show up in the imsd data (see Walkthrough output 31).

I plan to use this version going forward because I can't stand inaccuracies!

vivarose · 2024-08-18T23:11:09Z

It looks like the NaN makes the tests unhappy.

I prefer the NaN because an MSD of 0 means that there was a particle that didn't move, but we are really dealing with a situation of a particle that wasn't observed. But if it makes the tests unhappy, then, what can I do?

vivarose · 2024-08-18T23:37:42Z

I see that my attempt to appease the checks (#774) didn't work. That makes me think that my pull-request isn't the problem. Maybe there is something else going on.

I'm re-opening this pull-request. If you are trying to decide whether to incorporate one of these pull-requests, this is the one that leaves NaN in the imsd whereas #774 is the one that uses 0 instead. Both versions fix Issue #772.

Update motion.py

b9a43fc

vivarose closed this Aug 18, 2024

vivarose mentioned this pull request Aug 18, 2024

Fix Issue #772, alternative that doesn't affect imsd() but still correctly calculates weighted average for emsd() #774

Open

vivarose reopened this Aug 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix issue #772, Incorrect zeros included in emsd average when particles are dropped #773

Fix issue #772, Incorrect zeros included in emsd average when particles are dropped #773

vivarose commented Aug 18, 2024

vivarose commented Aug 18, 2024

vivarose commented Aug 18, 2024

Fix issue #772, Incorrect zeros included in emsd average when particles are dropped #773

Are you sure you want to change the base?

Fix issue #772, Incorrect zeros included in emsd average when particles are dropped #773

Conversation

vivarose commented Aug 18, 2024

vivarose commented Aug 18, 2024

vivarose commented Aug 18, 2024