Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch to loomcat #247

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

Schaechtle
Copy link
Collaborator

This patch is needed since the column names I am getting from loom while calling _update_state
(and _retrieve_featureid_to_cgpm, specifically) don't follow the assumptions made in the code. In fact, the column names I am getting from loom are the actual natural language column names which seems correct (thus patching cgpm).

I have a test case for this patch which depends on loom and bql -- I initially planned to include the
test case here (i.e the cgpm repo). Looking at the test suite however, I am not sure anymore if this makes sense (not much loom and BQL around in cgpm/test); I am now planning to include this test in another repo focussed only on exporting models from loom. Open for suggestions here.

Tested using the probcomp dev image as folllows:

 2018-08-13 14:47:24 ⌚  ulli-notebook in ~/cgpm
○ → pytest tests/

Result:

454 passed, 8 skipped, 2 xfailed in 363.28 seconds

@Schaechtle Schaechtle requested a review from fsaad August 13, 2018 14:57
@fsaad
Copy link
Collaborator

fsaad commented Aug 13, 2018

The loomcat conversion is assuming that the loom project on disk was created using loomcat.initialize(state), in which case the loom column names are of the form c%05d. This encoding scheme is needed since cgpm does not have a notion of natural language column names, only column numbers.

When the loom project on disk was created using BayesDB's loom_backend class instead of cgpm's loomcat.initialize method, the loom column names are the natural language column names since these are known to BayesDB.

The assumptions in the code should be further loosened by:

  • Making colname_colno_mapping a required argument of loomcat._retrieve_featureid_to_cgpm and loomcat._update_state.
  • Updating loomcat.transition and loomcat.transition_engine to use the output mapping given by _generate_column_names and pass this mapping into _update_state.

Copy link
Collaborator

@fsaad fsaad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #247 (comment) for requested change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants