
Remove text analysis pilot #337

Merged
merged 2 commits into from
Jul 14, 2021

Conversation

noamross
Contributor

@ropensci/editors

For some time we had a pilot text-analysis category in our Aims and Scope. This originated when we had a text specialist, Lincoln Mullen, on our editorial board, and some collaborations with a text analysis working group. There have been very few submissions in this category and most of them have had challenges getting through review.

Given the lack of uptake, the fact that these packages would fall under the scope of our statistical packages peer review, and the fact that we never really established a firm set of criteria for such packages, I think we should remove this from our Aims and Scope.

@maelle
Member

maelle commented Jun 28, 2021

Please also update the changelog if this is approved by others. 🙏 Thank you!

@mpadge
Member

mpadge commented Jun 28, 2021

If I may voice an opinion here: Even though this category has seen very little "action", I do think it is very important as it represents one of the only entry points for the entire domain of "digital humanities", itself representing a large portion of most universities around the world. Interest in, and funding for, this area is increasing enormously, and text analysis software is a very rapidly developing area. It may be that rOpenSci currently has little direct expertise in the area, but outright removal of this category would effectively exclude a very large portion of most academic communities from even considering rOpenSci as potentially relevant. Conversely, if expertise is slowly cultivated in this area, even merely passively allowing it to continue to exist may open up the organisation to a whole new field.

Disclaimer: I am also biased, because I hope one day to submit one of my own packages that only falls within this category and no other, and which is also definitely not statistical.

@karthik
Member

karthik commented Jun 28, 2021

I second Mark's take on this.

@emilyriederer

> most of them have had challenges getting through review

What is the main cause of that? Do we find the packages rarely generalize well beyond a very specific problem/corpus (which might indicate it's not a good category), or is there a lack of editors/reviewers with expertise in this space (which perhaps could be addressed with recruiting)?

@noamross
Contributor Author

noamross commented Jul 2, 2021

Good question @emilyriederer. Looking back, basically all the packages were those submitted by Lincoln Mullen before we consolidated our current scope, and then there was the wordVectors package, whose author never followed up on review feedback.

We've gotten two submissions in the past week in this category, both of which I have trouble admitting. Both implement machine-learning algorithms in NLP:

I find it challenging to think how we would accept these in our usual system and think it makes sense to refer them to statistical peer review. That said, one option is to clarify the scope to specify that under the non-statistical scope, text packages should still be data processing/data lifecycle management packages, rather than ML. Some potential language:

  • text data: We include packages that process and manage text and language data. This is limited to text processing/munging/management (e.g., tokenization, stemming, conversion to and between structured text formats, text metadata and repository access, etc.). Machine-learning packages and packages implementing NLP analysis algorithms should be submitted under statistical software peer review. The scope for this topic is not fully defined; please open a pre-submission inquiry if you are considering submitting a package that falls under this topic. (Example: tokenizers)

@vgherard

vgherard commented Jul 2, 2021

@noamross thanks for pinging me on this. I was actually led to tentatively try for a presubmission, based on what I read in the rOpenSci guidelines. If you believe my package does not fit the scope of reviewed material, I will try to resubmit as soon as the Statistical Software section is consolidated.

Bests,
V.

@noamross
Contributor Author

noamross commented Jul 2, 2021

Thanks for your response @vgherard. We're just doing a little re-arranging for this scope and getting our statistical package submission templates in order. I'll follow up when we've resolved these, I expect next week.

@noamross
Contributor Author

noamross commented Jul 7, 2021

I've added this text back in. Can I get a thumbs up/down from @ropensci/editors?

  • text data: We include packages that process and manage text and language data. This is limited to text processing/munging/management (e.g., tokenization, stemming, conversion to and between structured text formats, text metadata and repository access, etc.). Machine-learning packages and packages implementing NLP analysis algorithms should be submitted under statistical software peer review. The scope for this topic is not fully defined; please open a pre-submission inquiry if you are considering submitting a package that falls under this topic. (Example: tokenizers)

If accepted, I will also update the categories in the software-review templates.

@noamross noamross merged commit a968c6b into dev Jul 14, 2021
@noamross
Contributor Author

@vgherard We've updated this, as well as our submission templates. Could you re-submit your pre-submission inquiry under the new template, which now includes the statistical software project? https://github.com/ropensci/software-review/issues/new?assignees=&labels=&template=B-submit-a-presubmission-inquiry.md

@vgherard

Thanks for informing me @noamross, will do ASAP.

Valerio

@mpadge mpadge deleted the remove-text-analysis-policy branch June 7, 2022 09:14