This repository has been archived by the owner on Jan 8, 2023. It is now read-only.

Conversion of records to checkpoints #15

Open
jrasor opened this issue Jul 5, 2019 · 21 comments


@jrasor

jrasor commented Jul 5, 2019

The training tutorial https://github.com/google/ftc-object-detection/tree/master/training says, "You can now take the .record files you generated and use them in the same training pipeline you were using earlier in the tutorials. As before, you'll almost certainly want to fine tune an existing model..." I'm not quite sure which earlier tutorials those are. The only tutorial the training page mentions is a Medium one for training on the cloud.

I have a good video of poker chips and thumb drives, good records, and a pre-trained model -- the one you supply for Gold and Silver Minerals. My aim is to convert those poker chip and thumb drive records into checkpoints using your Gold and Silver Mineral model, so my phone can recognize poker chips and thumb drives. How do I do this?

I can make a good model using Tensorflow for Poets, but that model does not work with ftc_app version 4.3 ConceptTensorFlowObjectDetection.

@ftctechnh

@jrasor
Hi jrasor,

The following tutorial/blog post has some helpful information describing how to make a custom inference model (and then how to convert it to .tflite format) that you can use to create your own app to detect things like poker chips and thumb drives:

https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193

That example is written for running the training on a Google Cloud server (with hardware optimized for the TensorFlow calculations); however, you can follow their example and run the training locally on a laptop.

Also, you can use one of the existing pretrained models (the example uses an SSD Mobilenet pretrained model) as the basis for your own custom model.
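For concreteness, basing your model on an existing pretrained checkpoint is wired up in the train_config section of the pipeline.config file the Object Detection API consumes. A minimal sketch (the paths are hypothetical; the field names come from the API's pipeline proto):

```
train_config {
  fine_tune_checkpoint: "pretrained/ssd_mobilenet_v1_quantized/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  ...
}
```

Note that fine_tune_checkpoint points at the checkpoint prefix (the files model.ckpt.index, model.ckpt.meta, etc.), not at any single file.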

Once you have your inference graph exported to .tflite format, you should be able to use the tflite model in an Android app to detect your custom objects. The ftc-object-detection app has a nice example app that you can use as the basis for your own app (you can switch the tflite graph and index/label files with your own files):

https://github.com/google/ftc-object-detection/tree/master/TFObjectDetector

I hope this helps.

@jrasor
Author

jrasor commented Jul 15, 2019 via email

@ftctechnh

Hi John,

I followed the tutorial and trained a model using one of the SSD Mobilenet pretrained models and it worked well at detecting some inanimate objects we used for our model.

For the training, I found it easier to use Linux to run the TensorFlow object detection scripts. I installed a Linux (Ubuntu) subsystem on my Windows 10 laptop and used Google's tools to generate training records for the model.
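As a quick sanity check on generated .record files, you can count the records without TensorFlow by walking the TFRecord framing. A minimal sketch (the framing layout is the standard TFRecord on-disk format; CRC validation is deliberately skipped):

```python
import struct

def count_tfrecords(path):
    """Count the records in a .record (TFRecord) file by walking its
    length-prefixed framing:
        [uint64 length][4-byte length CRC][payload][4-byte payload CRC]
    CRCs are skipped, not verified -- this is only a sanity check that
    the file is non-empty and framed as expected."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # clean end of file
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, 1)  # skip length CRC, payload, payload CRC
            count += 1
    return count
```

Running count_tfrecords on your .record file should report roughly one record per labeled frame; zero suggests the conversion step silently produced nothing.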

Then I followed the tutorial and modified the commands to run locally on my laptop and generated a training model using the SSD mobilenet pretrained model.

I haven't tried creating a custom model using last season's model as the pretrained model, but I imagine that you should be able to do so if you want to.

@jrasor
Author

jrasor commented Jul 18, 2019 via email

@ftctechnh

ftctechnh commented Jul 30, 2019

Hi John,

Yes, I used the Cloud tutorial as a guide, but I simply installed TensorFlow and the TensorFlow Object Detection API on a Linux machine. Specifically, I installed Ubuntu 18.04 LTS and ran it under the Windows Subsystem for Linux. It works well, but running the training takes a long time.

If you start with a pretrained model and you are trying to create a model that recognizes some relatively common objects, you can train it reasonably well by running the job on a workstation for several hours.

However, as we are preparing for the upcoming Skystone season we are finding that building a custom model using a large number of training records takes a LONG time and is better done on a Cloud server that is optimized for the Tensorflow calculations (i.e., a Cloud server that has TPU hardware).

Note that when installing TensorFlow and the TensorFlow Object Detection API, I used the README file in the tensorflow/models/research/object_detection subfolder to guide the installation:

https://github.com/tensorflow/models/blob/master/research/object_detection/README.md

Specifically, I followed these instructions to install the object detection API on my Linux machine:

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md

@jrasor
Author

jrasor commented Aug 5, 2019

Installation done; it passes python object_detection/builders/model_builder_test.py with 16 tests OK and 1 skipped. So we've made some progress.

Now what?

@jrasor
Author

jrasor commented Aug 5, 2019

How does https://github.com/google/ftc-object-detection/tree/master/training make use of the records in https://github.com/google/ftc-object-detection/tree/master/training/train_data? The ftc-object-detection/training tutorial makes no mention of its directory train_data after telling us our records are there. Mine are there.

From that training sub-repo, grep -R train_data ./ gives no hits for anything that consumes the records in that directory. The only text hit mentions the directory itself, in README.md, plus binary matches on the record files themselves. grep -R record ./ | grep -v experimental gets hits only in README.md and in convert_labels_to_records.py.

No scripts, python or any other sort, seem to be aware of those records sitting in there.

@jrasor
Author

jrasor commented Aug 5, 2019

Following https://github.com/google/ftc-object-detection/tree/master/training, I made a new video of the poker chip only, and processed it according to that tutorial. I then copied the records into the ftc-object-detection/training/models/sample_mobilenet_v1_0.5_ssd_quantized directory, which already had checkpoints in it. Then I ran python3 $MODEL_RESEARCH_DIR/object_detection/export_tflite_ssd_graph.py, got brand new training/models/sample_mobilenet_v1_0.5_ssd_quantized/tflite/tflite_graph.pb and tflite_graph.pbtxt, then invoked bazel to turn those into chipNdrives.tflite. In a TeleOp mode, that model detected only Gold Minerals. It seemed to ignore the poker chip derived records.
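For reference, the export step just described can be sketched as a small command builder (paths are hypothetical; the flags are those of the Object Detection API's export_tflite_ssd_graph.py):

```python
def build_export_cmd(pipeline_config, checkpoint_prefix, output_dir):
    """Assemble the export_tflite_ssd_graph.py command line that turns a
    trained checkpoint into tflite_graph.pb / tflite_graph.pbtxt."""
    return [
        "python3", "object_detection/export_tflite_ssd_graph.py",
        "--pipeline_config_path=" + pipeline_config,
        "--trained_checkpoint_prefix=" + checkpoint_prefix,
        "--output_directory=" + output_dir,
        "--add_postprocessing_op=true",
    ]

# Example with hypothetical paths -- substitute your own:
print(" ".join(build_export_cmd(
    "models/my_model/pipeline.config",
    "models/my_model/train/model.ckpt-50000",
    "models/my_model/tflite")))
```

The key detail is that --trained_checkpoint_prefix must name a checkpoint produced by your own training run, not one of the pretrained checkpoints shipped with the repo.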

How can I get that training tutorial to make use of my records?

@ftctechnh

Hi jrasor,

Before you ran export_tflite_ssd_graph.py, did you first run the model training script (model_main.py) to generate a new inference model? I believe you need to run that script and generate the new custom model first, before you export it to a format that you can convert into a .tflite file.

It might be helpful to review the tutorial referenced above. You want to copy your .record files to a data directory, and also create a pipeline.config file that the model_main.py script will use to do the model training.

Note that the tutorial shows you how to run the training on a TPU cluster. The procedure is very similar for running it locally: instead of using Cloud storage and initializing a Cloud (TPU) job, you run the training locally using the model_main.py script.
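As a sketch of what "running it locally" can look like, the snippet below assembles a model_main.py invocation with hypothetical paths (substitute your own pipeline.config and training directory); the flags mirror the arguments the Cloud tutorial passes to its ml-engine job:

```python
import subprocess

# Hypothetical paths -- substitute your own.
PIPELINE_CONFIG = "models/my_model/pipeline.config"
MODEL_DIR = "models/my_model/train"

def build_train_cmd(pipeline_config, model_dir, num_train_steps=50000):
    """Assemble the local-training command line for the Object Detection
    API's model_main.py script."""
    return [
        "python3", "object_detection/model_main.py",
        "--pipeline_config_path=" + pipeline_config,
        "--model_dir=" + model_dir,
        "--num_train_steps=" + str(num_train_steps),
        "--alsologtostderr",
    ]

if __name__ == "__main__":
    # Print the command; uncomment the next line to actually launch training
    # (requires TensorFlow and the Object Detection API to be installed).
    # subprocess.run(build_train_cmd(PIPELINE_CONFIG, MODEL_DIR), check=True)
    print(" ".join(build_train_cmd(PIPELINE_CONFIG, MODEL_DIR)))
```

Checkpoints accumulate in --model_dir as training runs; those are what the export step later consumes.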

Also, you might want to adjust the batch size in the example pipeline.config file provided in the tutorial that I referenced (see the section entitled "Training a quantized model with Cloud TPUs on Cloud ML Engine"). The example pipeline has a large batch size because the job is run on a TPU cluster. If you run the job on your workstation, you might need to decrease the batch size (and increase the number of training steps) so the per-step data isn't too big. (I had some crashes on my laptop when I ran the training for a few hundred steps, which I believe were due to memory issues; I did not experience these crashes when I ran the training on a TPU cluster.)
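For example, the relevant knobs in the train_config section of pipeline.config look like this (the values are illustrative, not recommendations; field names are from the API's train proto):

```
train_config {
  batch_size: 16     # the TPU example uses a much larger value
  num_steps: 20000   # more steps can compensate for the smaller batches
  ...
}
```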

Also, once you start your job locally, look up how to use TensorBoard. You can use that utility to monitor the training's progress, and even view the training images, which is useful so you can see (and verify) that the process is using your new (poker chip) training records.

Definitely check out that tutorial and maybe even work through their example. If you can get their example running on your laptop, then you can get your custom training running on your laptop. Also, once you have it set up for your laptop, it's easy to then convert the solution to run on a TPU cluster.

I hope this helps.

Tom

@jrasor
Author

jrasor commented Aug 8, 2019

Thanks, Tom, lots to work with here.

It will be a few days before I can fully implement your suggestions; school starting up shortly. When I have something definite, I will report here.

@jrasor
Author

jrasor commented Aug 26, 2019

Some progress. Summary of findings since I last commented.
= = = = = =
The show-stopper path element contrib is still in the bazel call of tutorial https://github.com/google/ftc-object-detection/tree/master/training. See issue #14.
The laptop can train on the poker chip in 3 hours with poor accuracy, though better than an untrained wild guess, using the 30-minute tutorial https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193 as a guide.
The 30-minute tutorial has a show-stopper error in a switch for the detection job. With that error fixed, the job runs, but fails after 5 minutes with "Please provide a TPU Name to connect to". The evaluation job runs, but fails after 5 minutes with "Expected string but found: 'input_path'".

@jrasor
Author

jrasor commented Aug 26, 2019

Forgot to mention: Tom suggested invoking model_main.py. That was the missing step that enabled this partial success. It is not mentioned anywhere in this repo's training tutorial. It is the local analog of ml-engine in the 30-minute tutorial https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193. That tutorial has it as a switch in the evaluation job.

@powersurge-luke

powersurge-luke commented Oct 8, 2019

Hi @jrasor, I seem to have the same problem as you; have you had any success yet? I made the records, but can't figure out which tutorial it is referring to. I am using Linux, not Google Cloud.

@jrasor
Author

jrasor commented Oct 8, 2019 via email

@powersurge-luke

powersurge-luke commented Oct 8, 2019

@jrasor Thanks for your reply. How do I change the commands to run locally? Do they need to be run from a command line? I have previously used other training methods from the command line, but those models didn't work in the ftc app, and that is the only experience I have.

@powersurge-luke

@jrasor, @ftctechnh, it's been over a week with no response. Can someone please tell me how to do this?

@jrasor
Author

jrasor commented Oct 16, 2019 via email

@powersurge-luke

powersurge-luke commented Oct 18, 2019 via email

@jrasor
Author

jrasor commented Oct 19, 2019

I was able to train a model to distinguish poker chips from USB memory sticks with somewhat better than wild-guess accuracy. I did it on a laptop. I'm attaching some log files. Some takeaways:

Tom Eng really worked hard to help me on this. See the first attachment.
The training tutorial in this repo has problems. See the second attachment.
I really worked hard to get Tensorflow to train on arbitrary objects. See the third attachment.

Tom Eng Help.docx
Training Tutorial Problems.docx
Training Tensorflow log.docx

@jrasor
Author

jrasor commented Oct 19, 2019

For powersurge-luke: the conversion of Google Cloud commands to local ones is in the log; search for "gotta translate". I'm sure Tom or somebody can do better than I did.

Adjusting the confidence level was no help.

@powersurge-luke

Thanks for the help, I'll take a look at the logs and see what works.
