
New version of NLP #21

Status: Open · wants to merge 7 commits into master
Conversation

muellerzr
Owner

No description provided.

@marcglobality

Hi @muellerzr , thanks for this repo. What happened with this?

@muellerzr
Owner Author

Time and motivation :) I wound up moving on to other things, so the new NLP version never materialized.

@muellerzr muellerzr closed this Mar 23, 2021
@marcglobality

I understand from your actions that it will not come with this PR, then? :D

@muellerzr
Owner Author

Correct. I may reopen and update these notebooks, as they're good foundational NLP tutorial notebooks for fastai, but nothing beyond that is planned at the moment.

@muellerzr muellerzr reopened this Mar 23, 2021
@marcglobality

Fair enough. I was wondering how to predict on the test set. I know it's a bit of an imposition (you don't need to answer fully, but maybe point me to the right place?)

What I managed to do (simplified for readability):

import pandas as pd
from fastai.text.all import *

text_block = TextBlock.from_df(
    text_cols=INPUT_COLUMNS,
    is_lm=False,
    seq_len=1_000,

    # add xxfld markers between fields
    mark_fields=True,

    # name of the tokenized output column
    tok_text_col="ulmfit_text",
)

data_cls = DataBlock(
    blocks=(text_block, CategoryBlock),
    get_x=ColReader("ulmfit_text"),
    get_y=ColReader("label"),
    # split train/validation on the is_dev flag added below
    splitter=ColSplitter("is_dev"),
)

data_cls = data_cls.dataloaders(
    pd.concat([
        df_train.assign(is_dev=False),
        df_dev.assign(is_dev=True),
    ]),
    shuffle_train=True,
    bs=32,
    verbose=False,
)

learn = text_classifier_learner(
    data_cls,
    AWD_LSTM,
    drop_mult=0.5,
    metrics=[Precision(), Recall()],
)
learn.load_encoder("1epoch_encoder")  # load the previously saved fine-tuned encoder
learn.fit_one_cycle(1, 2e-2)

and it predicts correctly on a single example (again simplified for readability):

text = "xxbos xxfld 1 ......... solicitors xxfld 2 xxunk xxunk xxunk xxfld 3 xxmaj gao xxmaj jia..."
learn.predict(text)
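
As a reference point for the discussion below (a sketch, assuming only the learn object defined above), learn.validate() reports the loss and the two metrics on the validation split that ColSplitter("is_dev") carves out of the concatenated dataframe:

# Sketch: metrics on the dev split used for validation during training.
# Assumes `learn` is the text_classifier_learner fitted above.
valid_loss, precision, recall = learn.validate()
print(f"dev loss={valid_loss:.4f}  precision={precision:.4f}  recall={recall:.4f}")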

My problem now is, given a test_df, how can I predict on it (with the pipeline composing the fields)? Something like:

test_df = pd.concat(
    (
        df_dev[df_dev.label.eq(1)].sample(10, random_state=88),
        df_dev[df_dev.label.eq(0)].sample(10, random_state=88),
    )
)
dl = learn.dls.test_dl(test_df)
preds, _ = learn.get_preds(dl=dl)  # get_preds returns (predictions, targets)
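
For reference, a minimal sketch of how this usually goes in fastai v2 (assuming learn and test_df as defined in the snippets above; not verified against these notebooks): learn.dls.test_dl(test_df) reapplies the training pipeline (field marking with xxfld, tokenization, numericalization) to the new dataframe, and get_preds(dl=..., with_decoded=True) returns the class probabilities, the targets (None for an unlabelled test_dl), and the argmaxed class indices:

# Sketch: run the trained classifier over a new dataframe.
test_dl = learn.dls.test_dl(test_df)        # unlabelled by default
probs, _, decoded = learn.get_preds(dl=test_dl, with_decoded=True)
print(probs.shape)     # (n_rows, n_classes) probabilities
print(decoded[:5])     # predicted class indices
# Mapping indices back to label names depends on where the CategoryBlock
# vocab lives in this setup; check learn.dls.vocab locally rather than
# assuming a particular layout.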

@muellerzr
Owner Author

IIRC you need to do proc_df or something along those lines. Have you searched in the forums? I know this was brought up in there (I think in the v2 text thread).

@marctorsoc

> IIRC you need to do proc_df or something along those lines. Have you searched in the forums? I know this was brought up in there (I think in the v2 text thread).

So I searched in the forums, and could arrive at a better version where at least it predicts something. But it still seems like a hack, and I don't get the same metrics on the dev set when fitting as when using the dev set as a test set. Could you answer me there: https://forums.fast.ai/t/predictions-for-the-test-set-are-they-correct/86994? Thanks
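
On the metric mismatch specifically, one hedged sketch (assuming df_dev still carries the original text columns and the label column): building a labelled test DataLoader with with_labels=True and passing it to learn.validate computes the learner's own metrics (Precision and Recall here) on it, so the numbers should be directly comparable to those printed during fit_one_cycle:

# Sketch: evaluate the fitted learner on df_dev as if it were a test set.
dev_dl = learn.dls.test_dl(df_dev, with_labels=True)   # keep the labels
loss, precision, recall = learn.validate(dl=dev_dl)
print(f"loss={loss:.4f}  precision={precision:.4f}  recall={recall:.4f}")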
