Different results and accuracy down to 10% with PandasParallelLFApplier vs PandasLFApplier in Snorkel 0.9.5 #1587
Comments
Hi @durgeshiitj, apologies for the delayed response here! This is likely due to using an unsorted index with PandasParallelLFApplier.
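A minimal sketch of the kind of workaround that comment points at, assuming an unsorted index is in fact the cause: reset the DataFrame to a plain RangeIndex before the parallel apply. The `apply_parallel_sorted` helper, `df`, and `lfs` names are placeholders, not from the thread.

```python
import pandas as pd
from snorkel.labeling.apply.dask import PandasParallelLFApplier


def apply_parallel_sorted(df: pd.DataFrame, lfs, n_parallel: int = 4):
    """Apply LFs in parallel on a DataFrame with a clean, contiguous index."""
    # A non-contiguous or unsorted index can change how rows line up across
    # partitions; resetting to a RangeIndex removes that ambiguity.
    df = df.reset_index(drop=True)
    applier = PandasParallelLFApplier(lfs=lfs)
    return applier.apply(df, n_parallel=n_parallel)
```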
Hi Henry,
Hi @durgeshiitj, thanks for reporting; we'll look into version compatibility on our side!
I didn't get any update on this issue.
This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
Issue description
I ran Snorkel (v0.9.5) on a dataset using PandasParallelLFApplier and, to my surprise, got 10% accuracy where I was expecting about 90%. I then used PandasLFApplier to cross-verify and got 90% accuracy. When I compared the label matrices, they were not equal.
I was previously on 0.9.3 and never faced this problem. To cross-verify, I ran the same dataset on a different system with version 0.9.3 using both PandasParallelLFApplier and PandasLFApplier, and in 0.9.3 both yield the same label matrix, the same accuracy, and the same LFAnalysis. A sketch of the comparison is shown below.
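For reference, a minimal self-contained sketch of the cross-check described above; the toy DataFrame and labeling functions are hypothetical stand-ins for the actual dataset and LFs, which are not included in the report.

```python
import numpy as np
import pandas as pd
from snorkel.labeling import LFAnalysis, PandasLFApplier, labeling_function
from snorkel.labeling.apply.dask import PandasParallelLFApplier

# Toy stand-ins for the real dataset and labeling functions (hypothetical).
df = pd.DataFrame({"text": ["good movie", "bad movie", "great film", "awful plot"]})

@labeling_function()
def lf_positive(x):
    return 1 if ("good" in x.text or "great" in x.text) else -1

@labeling_function()
def lf_negative(x):
    return 0 if ("bad" in x.text or "awful" in x.text) else -1

lfs = [lf_positive, lf_negative]

# Apply the same LFs sequentially and in parallel, then compare the matrices.
L_seq = PandasLFApplier(lfs=lfs).apply(df)
L_par = PandasParallelLFApplier(lfs=lfs).apply(df, n_parallel=2)

print("Label matrices equal:", np.array_equal(L_seq, L_par))  # matched on 0.9.3 per the report
print(LFAnalysis(L_seq, lfs).lf_summary())
print(LFAnalysis(L_par, lfs).lf_summary())
```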
Expected behavior
Both LF appliers should yield the same label matrix and accuracy.
Screenshots
I'm attaching screenshots for your reference (images not reproduced here): the LFAnalysis output and label-matrix comparison for PandasLFApplier vs. PandasParallelLFApplier on v0.9.5, and the same comparison on v0.9.3.
System info
Additional context
Please look into this asap.