
New version of NLP #21

Status: Open · wants to merge 7 commits into master
Conversation

muellerzr
Owner

No description provided.

@marcglobality

Hi @muellerzr , thanks for this repo. What happened with this?

@muellerzr
Owner Author

Time and motivation :) I wound up moving on to other things, so the new NLP version never materialized.

@muellerzr muellerzr closed this Mar 23, 2021
@marcglobality

I understand from your actions that it will not come with this PR, then? :D

@muellerzr
Owner Author

Correct. I may reopen and update these notebooks, as they're good foundational NLP tutorial notebooks for fastai, but nothing beyond that is planned at the moment.

@muellerzr muellerzr reopened this Mar 23, 2021
@marcglobality

Fair enough. I was wondering how to predict on the test set. I know it's a bit of an imposition (you don't need to answer fully, but maybe point me to the right place?)

What I managed to do (simplified for readability):

import pandas as pd
from fastai.text.all import *

text_block = TextBlock.from_df(
    text_cols=INPUT_COLUMNS,
    is_lm=False,
    seq_len=1_000,

    # add xxfld markers between fields
    mark_fields=True,

    # name of the tokenized output column
    tok_text_col="ulmfit_text",
)

data_cls = DataBlock(
    blocks=(text_block, CategoryBlock),
    get_x=ColReader("ulmfit_text"),
    get_y=ColReader("label"),
    # split train/validation on the is_dev flag added below
    splitter=ColSplitter("is_dev"),
)

data_cls = data_cls.dataloaders(
    pd.concat([
        df_train.assign(is_dev=False),
        df_dev.assign(is_dev=True),
    ]),
    shuffle_train=True,
    bs=32,
    verbose=False,
)

learn = text_classifier_learner(
    data_cls,
    AWD_LSTM,
    drop_mult=0.5,
    metrics=[Precision(), Recall()],
)
learn.load_encoder("1epoch_encoder")  # load the previously saved fine-tuned encoder
learn.fit_one_cycle(1, 2e-2)

and it predicts correctly on a single example (again simplified for readability):

text = "xxbos xxfld 1 ......... solicitors xxfld 2 xxunk xxunk xxunk xxfld 3 xxmaj gao xxmaj jia..."
learn.predict(text)
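
As a reference point for the discussion below (a sketch, assuming only the learn object defined above), learn.validate() reports the loss and the two metrics on the validation split that ColSplitter("is_dev") carves out of the concatenated dataframe:

# Sketch: metrics on the dev split used for validation during training.
# Assumes `learn` is the text_classifier_learner fitted above.
valid_loss, precision, recall = learn.validate()
print(f"dev loss={valid_loss:.4f}  precision={precision:.4f}  recall={recall:.4f}")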

My problem now is, given a test_df, how can I predict on it (with the pipeline composing the fields)? Something like:

test_df = pd.concat(
    (
        df_dev[df_dev.label.eq(1)].sample(10, random_state=88),
        df_dev[df_dev.label.eq(0)].sample(10, random_state=88),
    )
)
dl = learn.dls.test_dl(test_df)
preds, _ = learn.get_preds(dl=dl)  # get_preds returns (predictions, targets)
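
For reference, a minimal sketch of how this usually goes in fastai v2 (assuming learn and test_df as defined in the snippets above; not verified against these notebooks): learn.dls.test_dl(test_df) reapplies the training pipeline (field marking with xxfld, tokenization, numericalization) to the new dataframe, and get_preds(dl=..., with_decoded=True) returns the class probabilities, the targets (None for an unlabelled test_dl), and the argmaxed class indices:

# Sketch: run the trained classifier over a new dataframe.
test_dl = learn.dls.test_dl(test_df)        # unlabelled by default
probs, _, decoded = learn.get_preds(dl=test_dl, with_decoded=True)
print(probs.shape)     # (n_rows, n_classes) probabilities
print(decoded[:5])     # predicted class indices
# Mapping indices back to label names depends on where the CategoryBlock
# vocab lives in this setup; check learn.dls.vocab locally rather than
# assuming a particular layout.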

@muellerzr
Owner Author

IIRC you need to do proc_df or something along those lines. Have you searched in the forums? I know this was brought up in there (I think in the v2 text thread).

@marctorsoc

> IIRC you need to do proc_df or something along those lines. Have you searched in the forums? I know this was brought up in there (I think in the v2 text thread).

So I searched in the forums, and could arrive at a better version where at least it predicts something. But it still seems like a hack, and I don't get the same metrics on the dev set when fitting as when using the dev set as a test set. Could you answer me there: https://forums.fast.ai/t/predictions-for-the-test-set-are-they-correct/86994? Thanks
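
On the metric mismatch specifically, one hedged sketch (assuming df_dev still carries the original text columns and the label column): building a labelled test DataLoader with with_labels=True and passing it to learn.validate computes the learner's own metrics (Precision and Recall here) on it, so the numbers should be directly comparable to those printed during fit_one_cycle:

# Sketch: evaluate the fitted learner on df_dev as if it were a test set.
dev_dl = learn.dls.test_dl(df_dev, with_labels=True)   # keep the labels
loss, precision, recall = learn.validate(dl=dev_dl)
print(f"loss={loss:.4f}  precision={precision:.4f}  recall={recall:.4f}")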
