
Utilité (Usefulness) #1

Open
rcharron opened this issue Feb 17, 2015 · 8 comments

Comments

@rcharron
Contributor

Is this module useful for the PPP?

@Ezibenroc
Member

What does this module do? According to the name of the repository, I guess that you expect it to say whether a request is a math request or not. Am I right?

@rcharron
Contributor Author

Yes, you are right;
see also the description: "A little module to differentiate math from other questions".
I suppose the only interest is for the core module, to avoid useless calls to the other modules.
Anyway, it is only a heuristic.
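
A minimal sketch of that gating idea, purely for illustration: the function names and the "send to ..." strings below are hypothetical stand-ins, not the actual PPP core-module API or this repository's classifier.

```python
# Illustrative only: looks_like_math stands in for this repository's
# classifier; the returned strings stand in for real module calls.

def looks_like_math(question: str) -> bool:
    """Cheap stand-in heuristic: a high share of digit/operator characters."""
    if not question:
        return False
    mathy = sum(c.isdigit() or c in "+-*/^=()" for c in question)
    return mathy / len(question) > 0.3

def route(question: str) -> str:
    """Only bother the CAS module when the question looks like math."""
    if looks_like_math(question):
        return "send to CAS module"
    return "send to other modules"

print(route("2+2"))                    # send to CAS module
print(route("Who is the president?"))  # send to other modules
```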

@robocop

robocop commented Feb 18, 2015

Maybe we could reuse the NLP-ML-standalone module (or the future Java implementation) with your data set, to avoid reimplementing a classifier?

@rcharron
Contributor Author

That's not the question. This classifier is already implemented and trained.


Raphaël Charrondière

ENS de Lyon


@robocop

robocop commented Feb 18, 2015

Okay, but maybe you can improve your feature extraction (converting a string into a vector; this is an important part). I think you should at least try to tokenize the questions and use a lookup table, and this is done by my module.
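
To make the suggestion concrete, here is a rough sketch of word-level tokenization with a lookup table (a bag-of-words vector). It only illustrates the idea; it is not the actual NLP-ML-standalone code.

```python
# Sketch of "tokenize + lookup table": each word gets an index in a
# vocabulary, and a question becomes a vector of word counts.

def build_vocabulary(questions):
    vocab = {}
    for q in questions:
        for token in q.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_vector(question, vocab):
    vec = [0] * len(vocab)
    for token in question.lower().split():
        if token in vocab:          # unknown words are simply ignored here
            vec[vocab[token]] += 1
    return vec

training_questions = ["integrate x^2", "who is the president of France"]
vocab = build_vocabulary(training_questions)
print(to_vector("integrate x^2 over x", vocab))  # [1, 1, 0, 0, 0, 0, 0, 0]
```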

@rcharron
Contributor Author

My tokens are characters; we are speaking of math, not of words, so the entities are
characters, and there is no point in having a lookup table.


Raphaël Charrondière

ENS de Lyon

On 2015-02-18 10:07, Quentin C. wrote:

Yes, but your feature extraction (converting a string into a vector) is not serious (and this is the main part) :p. You should at least tokenize the questions and use a lookup table.

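
To illustrate the character-level view described above: the "vocabulary" is just a fixed alphabet, so each question becomes a vector of character counts and no lookup table has to be learned. This is only a sketch; the actual MathRecognizer feature extraction may differ.

```python
import string

# The feature space is a fixed alphabet of characters rather than a learned
# word vocabulary (sketch only, not the repository's actual code).
ALPHABET = string.ascii_lowercase + string.digits + "+-*/^=()'"

def char_features(question):
    counts = {c: 0 for c in ALPHABET}
    for c in question.lower():
        if c in counts:
            counts[c] += 1
    return [counts[c] for c in ALPHABET]

print(char_features("cos'(x) + 1"))  # one count per character of the alphabet
```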

@marc-chevalier
Member

Let's be clear. The dataset is full of mistakes (cos'(x) is math but sin'(x) is not). Moreover, it considers that a sequence is a math question. But we cannot guess a sequence from its first values, so it is a question for OEIS => database => not the CAS! That does not allow us to differentiate what has to be processed by the CAS from what does not => no optimisation.

And last, but not least, the dataset seems to be built automatically (https://github.com/ProjetPP/MathRecognizer/blob/wtf/networktrainer.py#L45) with some heuristics. In the best case, the ML learns to mimic these heuristics but, surely, it will not work as well. So... using these heuristics directly would be more efficient, wouldn't it?
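
In code, the point reads roughly like this; the rules below are invented for illustration and are not the ones actually used in networktrainer.py.

```python
import re

# If the training labels come from simple rules, the rules themselves already
# form a classifier, with no training step needed.
MATH_HINTS = re.compile(r"[0-9]|[+\-*/^=]|\b(?:cos|sin|tan|log|exp|sqrt)\b")

def is_math(question: str) -> bool:
    return bool(MATH_HINTS.search(question.lower()))

print(is_math("cos'(x)"))                    # True
print(is_math("Who wrote Les Miserables?"))  # False
```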

@rcharron
Contributor Author

with some heuristics => yes and no. Under some conditions it says whether it is math or not; otherwise you have to answer manually.

That does not allow us to differentiate what has to be processed by the CAS from what does not => no optimisation => if it is only a matter of correcting the dataset, it is not that complicated.

The dataset is full of mistakes => probably, but I'm not perfect and didn't want to spend much time on that.
