
Some Neural Network suggestions #14

Open
TimoFriedri opened this issue Oct 14, 2021 · 8 comments
@TimoFriedri

Hi,

awesome project. Unfortunately the esp32-cam is such an unstable device.

I have some suggestions regarding your AI pipeline.

  1. Rely on monochrome images only. There is really no need for 3 channels here. Rather spend the memory freed by dropping to 1 channel on a more complex neural network.

  2. Use non-linearities like relu after each CNN layer. You currently have 2 linear CNN layers, which can only learn linear relations.

  3. It is tempting to use your whole dataset for training, but this is really not a good idea. You always have to remember: a good value during training tells you nothing. There are even papers suggesting much larger test sets than training sets, e.g. for error calculation incl. standard deviation. If I remember correctly, you have ~1500 images, which is quite ok, because augmentation can do a lot for these simple images.
    I suggest using 1000 for training and 500 for testing. If you compare multiple NN models, you even need a third validation set.

  4. Stop your training when the test error starts to increase again. Or validate intermediate model state afterwards against the complete test set and choose then.

  5. The last layer before the dense layer is (4,2,32), which might be a little tight. Maybe use one less maxpool.

  6. Try elu as non-linearity. More complex to calc than relu on the esp32 but might be possible if only monochrome data is handled. Elu does not suffer from vanishing gradient as much as relu does. Might not be important for the small network, but I usually had better performance with elu.

  7. You might google for the best MNIST models to get inspiration for your network, as it is a very similar task. They also use only monochrome images. This might also streamline your code and increase SW quality.

  8. Try the ADAM optimizer. My preference; usually quite robust and fast.

  9. You could think about using the softmax array as probabilities for your digits. Sometimes the second-best guess is the correct one. Maybe you can correct a number (because it was, say, lower than the previous reading) by replacing digits with high second-guess values.
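Several of these points can be combined into one small Keras sketch: a single-channel input (point 1), a non-linearity after each conv layer using elu (points 2 and 6), the ADAM optimizer (point 8), and early stopping (point 4). The input size and filter counts below are made-up placeholders, not the project's actual architecture:

```python
import tensorflow as tf

# Hypothetical ROI crop size; single grayscale channel (point 1).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 20, 1)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="elu"),  # points 2 + 6
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="elu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # point 9: usable as probabilities
])
model.compile(optimizer="adam",                        # point 8
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Point 4: stop training when validation error starts rising again,
# and keep the weights from the best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(..., validation_data=..., callbacks=[early_stop])
```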

Again, awesome project. If I find the time at the end of the year, I will try to train my own variant and give you feedback.

Thanks

@TimoFriedri

Ok, something more,

Does test_image = Image.open(aktfile) return values between 0 and 1? I am not sure. If not, scale your image data!
Usually just divide by 255.
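A minimal sketch of that scaling step, using a small uint8 array as a stand-in for the pixel data that PIL's Image.open would return:

```python
import numpy as np

# Image.open() yields 8-bit pixel values in 0..255; networks usually
# train better on inputs scaled to 0..1.
raw = np.array([[0, 128, 255]], dtype=np.uint8)  # stand-in for np.asarray(Image.open(aktfile))
scaled = raw.astype(np.float32) / 255.0
```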

@jomjol

jomjol commented Oct 14, 2021

Great ideas, thanks a lot! I will keep them in mind and test / implement if I have time or need to change something anyway.

Going to black and white would break compatibility, but this should not be a real issue. At the moment I'm working on hardware improvements (external LEDs to avoid reflections).

The relu for the inner layer somehow got lost during the restructuring of my python files --> will fix this with the next update.

@TimoFriedri

TimoFriedri commented Oct 18, 2021

Hi,

you don't need to break compatibility with your existing pipeline.

Just add this layer https://www.tensorflow.org/api_docs/python/tf/image/rgb_to_grayscale
at the beginning of your network. All your data, input and output stay as they are, while the network concentrates solely on brightness gradients and ignores color, which is really not necessary for digit classification and might even hinder the learning process.

I speak from my own experience.
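A minimal sketch of prepending that conversion, wrapping tf.image.rgb_to_grayscale in a Lambda layer so the model still accepts RGB input (the input size is a made-up placeholder, and whether this op converts cleanly to tflite for the ESP32 would need checking):

```python
import tensorflow as tf

# Hypothetical ROI crop size; pipeline keeps feeding RGB images.
inputs = tf.keras.Input(shape=(32, 20, 3))
# First "layer": collapse 3 channels to 1 inside the network itself.
gray = tf.keras.layers.Lambda(tf.image.rgb_to_grayscale)(inputs)
# ... the rest of the existing network would follow here, now on 1 channel.
model = tf.keras.Model(inputs, gray)
```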


I had a look at your training data guideline.
I believe you're doing your model's real-world classification accuracy no good.

  1. Definitely add the non-unique images to the dataset! Really, do so.
  2. Add more images from your so-called bad images to the dataset. The example 0 and 6 are clearly recognizable!

The thing is, if you leave out these images, your network will never be robust against real-world perturbations like bad lighting, reflections and so on. You need ALL real-world examples to get a good model. I had a look at my esp32-cam meter these days and noticed the classification is quite unreliable, even though the images look identical to me from one shot to the next.

  1. I am not sure, so take this with a grain of salt, whether labeling NAN explicitly is a good idea. It means labeling 0-9 with clear definitions of what each is, and EVERYTHING else that ever occurs as NAN. I am not sure if this is handled well by the training process.
    I'd suggest a different method. Only label 0-9 and leave out NAN. Your last layer is a softmax layer, which basically gives you some sort of probability for each digit. You could define a threshold for the minimum probability required to be sure a 6 is a 6, say something like 80% or whatever. I think this might work out better.
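A minimal sketch of that thresholding idea, with hypothetical probability vectors and an 80% cutoff; anything below the threshold plays the role of "NaN":

```python
import numpy as np

def digit_or_nan(softmax_probs, threshold=0.8):
    """Return the argmax digit, or None ("NaN") if the network is not confident enough."""
    best = int(np.argmax(softmax_probs))
    if softmax_probs[best] < threshold:
        return None
    return best

# Confident prediction: the "6" class clearly dominates.
sure = np.array([0.01, 0.00, 0.01, 0.00, 0.02, 0.01, 0.90, 0.02, 0.02, 0.01])
# Ambiguous prediction: best guess stays below the threshold.
unsure = np.array([0.05, 0.05, 0.40, 0.30, 0.05, 0.05, 0.02, 0.03, 0.03, 0.02])
```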

As I said, training accuracy does not matter at all. By adding all these non-perfect images, the accuracy during training might be lower than you are used to. What counts is real-world accuracy AND robustness!


Sorry for all the comments and suggestions without any tangible contribution. I hope to actually be able to do a lot of that stuff at the end of this year and open some pull requests, because I think this project is really cool. I'd really like to support it in the future.

@jomjol

jomjol commented Oct 18, 2021

Thanks for your further input; I'll take it into consideration at the next major update. Currently my focus is on some other features. Regarding the idea of the conversion via the tensorflow function: this is not working, as it is not a layer but a separate function. It is also not supported in tflite, which I need to use for the ESP32.

@TimoFriedri TimoFriedri changed the title Some Neural NEtwork suggestions Some Neural Network suggestions Oct 18, 2021
@TimoFriedri

TimoFriedri commented Nov 23, 2021

@jomjol
I had some time playing around.

Here is a sneak peek:
https://colab.research.google.com/drive/1fRriOzjPy-6Ektc0nQxaTzaUssmSeJKu?usp=sharing

I changed your dataset:

  • no more NAN
  • also included "weak" images
  • fixed the issue of jpg files actually being bmp images
  • organized everything in a clean folder structure to easily use the dataset helpers

I trained the tflite image reference classifier just out of curiosity. The model is way too big, and its performance was also not so nice.

I trained 2 simple classifiers: an RGB and a grayscale version, both accepting RGB images as input.
The performance is quite nice.
I also visualized the softmax probabilities especially of the misclassified images.
I think you should leverage this information for a more robust digit processing.


I would definitely put more time into this if you are interested in collaborating on it.

@jomjol

jomjol commented Nov 23, 2021

Hello - we can gladly continue in German - it's faster for me :-).

Thanks for your efforts, there are some cool ideas in there. I had already tried your idea of using grayscale instead of RGB. In my network architecture, however, it only yields a marginally smaller network. The only advantage might be on the ESP32 side, since the images would then be only 1/3 the size.

I find the idea of replacing NaN with the softmax probabilities intriguing, effectively filtering out poorly recognized digits. However, I am currently working on another enhancement for the digits: recognizing the decimal place based on the position of the digit within the ROI:

2 4_NaN_Ziffer_NaN_2334 = 2.4; 3 9_Ziffer_4_0001 = 3.9; 8 7_NaN_6_dig6_20210629-112809 = 8.7

That would also solve the "NaN" problem, and you would get an additional decimal place for accuracy or for plausibility checks, comparable to the analog meters. Of course I'm not doing this with 100 classes (0.0, 0.1, ... 9.9), but with 20 classes (10 each for the integer and the decimal digit).

The training data etc. are of course more laborious, but almost done. Maybe we can work on this together?

Unfortunately I also have more ideas than time, and right now I'm writing an article series on the current approach for ct-Make - the first part is coming out at the beginning of December. That ties up my capacity (it's "only a hobby" anyway).

@TimoFriedri

Hi,

honestly, I don't see any benefit in decimal places given the massive effort. The meters practically dictate the accuracy anyway. My electricity meter, for example, has 3 decimal places that I don't need anyway, maybe the first one.

My assessment is that the project needs more robustness. It is simply annoying for the user when NAN constantly shows up, or a digit was supposedly recognized but it was the wrong one, and now the number is higher and the plausibility checks fail, ...

There are simply lots of disturbances: image quality, light, noise, alignment, ...
I think one should keep it as simple as possible and estimate the confidence alongside. Then you can do things like:

OK, I'm not 100% sure, I'd rather just take another picture right away ...


Grayscale
Yes, the net only saves a few parameters in the first layer.
On the other hand, with a single channel I get the same quality with less data in the ESP RAM.


Do you have any systematic approach to collecting data, or do you only get it on request now and then?

@jomjol

jomjol commented Nov 23, 2021

Decimal places:

The decimal places make a massive difference. I already have this running at home. The difference lies in the higher resolution:
e.g. electricity meter: 1 digit for the decimal place --> resolution: 0.1 kWh / 5 minutes = 1200 watts
i.e. I either measure > 1200 W or 0 W, and I would actually like to go somewhat lower with the interval to also detect smaller jumps.
If I now get one more decimal place, the smallest resolution is 120 watts and thus much finer.
The longer the measurement interval or the higher the consumption, the smaller this effect, of course. For my electricity monitoring it was a substantial leap in any case.
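The resolution arithmetic above can be sketched as a small helper (the function name is hypothetical, only for illustration):

```python
def power_resolution_w(energy_step_kwh, interval_min):
    """Smallest detectable average power, in watts, for a given readable
    energy step (kWh) and measurement interval (minutes)."""
    return energy_step_kwh / (interval_min / 60.0) * 1000.0

# 1 decimal place: 0.1 kWh step over 5 minutes -> 1200 W resolution
# 2 decimal places: 0.01 kWh step over 5 minutes -> 120 W resolution
```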


Greyscale:

My first experiments with greyscale showed clearly worse network performance, so I didn't pursue it further for the time being.


Collecting images:

I currently do this on request. But I am well supplied :-)
