Divided up the text into summary and content for NLP processing #249

Open
wants to merge 2 commits into
base: master

Conversation


@ertosns ertosns commented Mar 16, 2021

I split the text into two parts, summary and content. This can help with NLP processing, specifically for summarization transformers. For example, the output of wikiextractor can be used to train wiki-summary: https://github.com/ertosns/wiki-summary
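As a rough illustration of the idea, here is a minimal sketch of such a split. It is not the code from this PR: the function name `split_summary_content` is hypothetical, and it assumes that the summary is everything before the first heading and that a heading is a line starting with `=`.

```python
from typing import List, Tuple


def split_summary_content(page: List[str]) -> Tuple[List[str], List[str]]:
    """Split a page (a list of text lines) into (summary, content).

    Assumption: the summary is every line before the first heading,
    and a heading is a line starting with '='. The PR may use a
    different rule.
    """
    for index, line in enumerate(page):
        if line.lstrip().startswith("="):
            return page[:index], page[index:]
    # No heading found: treat the whole page as summary.
    return list(page), []


page = [
    "Anarchism is a political philosophy.",
    "It questions authority.",
    "== History ==",
    "Early ideas appeared in antiquity.",
]
summary, content = split_summary_content(page)
```

Here `summary` holds the two lead lines and `content` holds the heading and everything after it.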

@attardi
Owner

attardi commented Mar 17, 2021

Isn't

summary = [i for i in page]

the same as:

summary = page

@ertosns
Author

ertosns commented Mar 17, 2021

Yes, it is, of course. In the case of NumPy it would be a reference to the array, but I see that here page is just a plain Python list, so you are absolutely right. Perhaps I thought it was NumPy! I will fix it now.
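For reference, a small sketch of how the two forms behave for a plain Python list. The contents of `page` are made up for illustration. The comprehension builds a new list (a shallow copy), while plain assignment binds a second name to the same list object, so the two are interchangeable only as long as neither list is mutated afterwards.

```python
page = ["first line", "second line"]

alias = page                  # second name for the same list object
copied = [i for i in page]    # new list holding the same elements

page.append("third line")     # mutate the original list

print(alias is page)    # True
print(copied is page)   # False
print(len(alias))       # 3
print(len(copied))      # 2
```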

@ertosns
Author

ertosns commented Mar 17, 2021

Also, the output needs to be pruned a bit, for example by adding an option to keep only output that fits certain criteria, since some of it is too long or too short. I will work on that soon.
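One way such pruning might look is a simple length filter, sketched below. The helper `within_length`, its thresholds, and the character-based measure are all assumptions for illustration; the PR does not specify the criteria.

```python
from typing import List


def within_length(summary: List[str], min_chars: int = 50, max_chars: int = 2000) -> bool:
    """Return True if the summary's total character count is within bounds.

    Hypothetical helper: the thresholds and the character-based measure
    are assumptions, not something the PR specifies.
    """
    total = sum(len(line) for line in summary)
    return min_chars <= total <= max_chars


summaries = [
    ["Too short."],
    ["A summary of reasonable length that says enough to be useful for training."],
]
# Keep only summaries whose length falls inside the configured range.
kept = [s for s in summaries if within_length(s)]
```

Exposing `min_chars` and `max_chars` as command-line options would be one way to make the criteria configurable.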
