
Utilité (Usefulness) #1

Open
rcharron opened this issue Feb 17, 2015 · 8 comments

Comments

@rcharron
Contributor

Is this module useful for the PPP?

@Ezibenroc
Member

What does this module do? According to the name of the repository, I guess that you expect it to say whether a request is a math request or not. Am I right?

@rcharron
Contributor Author

Yes, you are right;
see also the description: "A little module to differentiate math from other questions".
I suppose the only interest is for the core module, to avoid useless calls to the other modules.
Anyway, it is only a heuristic.
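
A minimal sketch of that gating idea, purely for illustration: the function names and the "send to ..." strings below are hypothetical stand-ins, not the actual PPP core-module API or this repository's classifier.

```python
# Illustrative only: looks_like_math stands in for this repository's
# classifier; the returned strings stand in for real module calls.

def looks_like_math(question: str) -> bool:
    """Cheap stand-in heuristic: a high share of digit/operator characters."""
    if not question:
        return False
    mathy = sum(c.isdigit() or c in "+-*/^=()" for c in question)
    return mathy / len(question) > 0.3

def route(question: str) -> str:
    """Only bother the CAS module when the question looks like math."""
    if looks_like_math(question):
        return "send to CAS module"
    return "send to other modules"

print(route("2+2"))                    # send to CAS module
print(route("Who is the president?"))  # send to other modules
```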

@robocop

robocop commented Feb 18, 2015

Maybe we could reuse the NLP-ML-standalone module (or the future Java implementation) with your data set, to avoid reimplementing a classifier?

@rcharron
Contributor Author

That's not the question. This classifier is already implemented and trained.


Raphaël Charrondière

ENS de Lyon


@robocop

robocop commented Feb 18, 2015

Okay, but maybe you can improve your feature extraction (converting a string into a vector; this is an important part). I think you should at least try to tokenize the questions and use a lookup table, and this is done by my module.
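
To make the suggestion concrete, here is a rough sketch of word-level tokenization with a lookup table (a bag-of-words vector). It only illustrates the idea; it is not the actual NLP-ML-standalone code.

```python
# Sketch of "tokenize + lookup table": each word gets an index in a
# vocabulary, and a question becomes a vector of word counts.

def build_vocabulary(questions):
    vocab = {}
    for q in questions:
        for token in q.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_vector(question, vocab):
    vec = [0] * len(vocab)
    for token in question.lower().split():
        if token in vocab:          # unknown words are simply ignored here
            vec[vocab[token]] += 1
    return vec

training_questions = ["integrate x^2", "who is the president of France"]
vocab = build_vocabulary(training_questions)
print(to_vector("integrate x^2 over x", vocab))  # [1, 1, 0, 0, 0, 0, 0, 0]
```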

@rcharron
Contributor Author

My tokens are characters; we are speaking of math, not of words, so the entities are
characters, and there is no point in having a lookup table.


Raphaël Charrondière

ENS de Lyon

On 2015-02-18 10:07, Quentin C. wrote:

Yes, but your feature extraction (converting a string into a vector) is not serious (and this is the main part) :p. You should at least tokenize the questions and use a lookup table.

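
To illustrate the character-level view described above: the "vocabulary" is just a fixed alphabet, so each question becomes a vector of character counts and no lookup table has to be learned. This is only a sketch; the actual MathRecognizer feature extraction may differ.

```python
import string

# The feature space is a fixed alphabet of characters rather than a learned
# word vocabulary (sketch only, not the repository's actual code).
ALPHABET = string.ascii_lowercase + string.digits + "+-*/^=()'"

def char_features(question):
    counts = {c: 0 for c in ALPHABET}
    for c in question.lower():
        if c in counts:
            counts[c] += 1
    return [counts[c] for c in ALPHABET]

print(char_features("cos'(x) + 1"))  # one count per character of the alphabet
```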

@marc-chevalier
Member

Let's be clear. The dataset is full of mistakes (cos'(x) is math but sin'(x) is not). Moreover, it considers that a sequence is a math question. But we cannot guess a sequence from its first values, so it is a question for OEIS => database => not the CAS! That does not allow us to differentiate what has to be processed by the CAS from what does not => no optimisation.

And last, but not least, the dataset seems to be built automatically (https://github.com/ProjetPP/MathRecognizer/blob/wtf/networktrainer.py#L45) with some heuristics. In the best case, the ML learns to mimic these heuristics but, surely, it will not work as well. So... using these heuristics directly would be more efficient, wouldn't it?
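
In code, the point reads roughly like this; the rules below are invented for illustration and are not the ones actually used in networktrainer.py.

```python
import re

# If the training labels come from simple rules, the rules themselves already
# form a classifier, with no training step needed.
MATH_HINTS = re.compile(r"[0-9]|[+\-*/^=]|\b(?:cos|sin|tan|log|exp|sqrt)\b")

def is_math(question: str) -> bool:
    return bool(MATH_HINTS.search(question.lower()))

print(is_math("cos'(x)"))                    # True
print(is_math("Who wrote Les Miserables?"))  # False
```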

@rcharron
Contributor Author

with some heuristics => yes and no. Under some conditions it says whether it is math or not; otherwise you have to answer manually.

That does not allow us to differentiate what has to be processed by the CAS from what does not => no optimisation => if it is only a matter of correcting the dataset, it is not that complicated.

The dataset is full of mistakes => probably, but I'm not perfect and didn't want to spend much time on that.
