This repository has been archived by the owner on Jan 8, 2023. It is now read-only.

Conversion of records to checkpoints #15

Open
jrasor opened this issue Jul 5, 2019 · 21 comments


@jrasor

jrasor commented Jul 5, 2019

The training tutorial https://github.com/google/ftc-object-detection/tree/master/training says, "You can now take the .record files you generated and use them in the same training pipeline you were using earlier in the tutorials. As before, you'll almost certainly want to fine tune an existing model..." I'm not quite sure which earlier tutorials those are. The only tutorial the training page mentions is a Medium one for training on the cloud.

I have a good video of poker chips and thumb drives, good records, and a pre-trained model -- the one you supply for Gold and Silver Minerals. My aim is to convert those poker chip and thumb drive records into checkpoints using your Gold and Silver Mineral model, so my phone can recognize poker chips and thumb drives. How do I do this?

I can make a good model using Tensorflow for Poets, but that model does not work with ftc_app version 4.3 ConceptTensorFlowObjectDetection.

@ftctechnh

@jrasor
Hi jrasor,

The following tutorial/blog post has some helpful information describing how to make a custom inference model (and then how to convert it to .tflite format) that you can use to create your own app to detect things like poker chips and thumb drives:

https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193

That example is written for running the training on a Google Cloud server (with hardware optimized for the TensorFlow calculations); however, you can follow their example and run the training locally on a laptop.

Also, you can use one of the existing pretrained models (the example uses an SSD Mobilenet pretrained model) as the basis for your own custom model.
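For concreteness, basing your model on an existing pretrained checkpoint is wired up in the train_config section of the pipeline.config file the Object Detection API consumes. A minimal sketch (the paths are hypothetical; the field names come from the API's pipeline proto):

```
train_config {
  fine_tune_checkpoint: "pretrained/ssd_mobilenet_v1_quantized/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  ...
}
```

Note that fine_tune_checkpoint points at the checkpoint prefix (the files model.ckpt.index, model.ckpt.meta, etc.), not at any single file.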

Once you have your inference graph exported to .tflite format, you should be able to use the tflite model in an Android app to detect your custom objects. The ftc-object-detection app has a nice example app that you can use as the basis for your own app (you can switch the tflite graph and index/label files with your own files):

https://github.com/google/ftc-object-detection/tree/master/TFObjectDetector

I hope this helps.

@jrasor
Author

jrasor commented Jul 15, 2019 via email

@ftctechnh

Hi John,

I followed the tutorial and trained a model using one of the SSD Mobilenet pretrained models and it worked well at detecting some inanimate objects we used for our model.

For the training, I found it easier to use Linux to run the TensorFlow object detection scripts. I installed a Linux (Ubuntu) subsystem on my Windows 10 laptop and used Google's tools to generate training records for the model.
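As a quick sanity check on generated .record files, you can count the records without TensorFlow by walking the TFRecord framing. A minimal sketch (the framing layout is the standard TFRecord on-disk format; CRC validation is deliberately skipped):

```python
import struct

def count_tfrecords(path):
    """Count the records in a .record (TFRecord) file by walking its
    length-prefixed framing:
        [uint64 length][4-byte length CRC][payload][4-byte payload CRC]
    CRCs are skipped, not verified -- this is only a sanity check that
    the file is non-empty and framed as expected."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # clean end of file
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, 1)  # skip length CRC, payload, payload CRC
            count += 1
    return count
```

Running count_tfrecords on your .record file should report roughly one record per labeled frame; zero suggests the conversion step silently produced nothing.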

Then I followed the tutorial and modified the commands to run locally on my laptop and generated a training model using the SSD mobilenet pretrained model.

I haven't tried creating a custom model using last season's model as the pretrained model, but I imagine that you should be able to do so if you want to.

@jrasor
Author

jrasor commented Jul 18, 2019 via email

@ftctechnh

ftctechnh commented Jul 30, 2019

Hi John,

Yes, I used the Cloud tutorial as a guide, but I simply installed TensorFlow and the TensorFlow Object Detection API on a Linux machine. Specifically, I installed Ubuntu 18.04 LTS and ran it under the Windows Subsystem for Linux. It works well, but running the training takes a long time.

If you start with a pretrained model and you are trying to create a model that recognizes some relatively common objects, you can train it reasonably well by running the job on a workstation for several hours.

However, as we are preparing for the upcoming Skystone season we are finding that building a custom model using a large number of training records takes a LONG time and is better done on a Cloud server that is optimized for the Tensorflow calculations (i.e., a Cloud server that has TPU hardware).

Note that when installing TensorFlow and the TensorFlow Object Detection API, I used the README file in the tensorflow/models/research/object_detection subfolder to guide the installation:

https://github.com/tensorflow/models/blob/master/research/object_detection/README.md

Specifically, I followed these instructions to install the object detection API on my Linux machine:

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md

@jrasor
Author

jrasor commented Aug 5, 2019

Installation done; it passes python object_detection/builders/model_builder_test.py with 16 tests OK and 1 skipped. So we've made some progress.

Now what?

@jrasor
Author

jrasor commented Aug 5, 2019

How does https://github.com/google/ftc-object-detection/tree/master/training make use of the records in https://github.com/google/ftc-object-detection/tree/master/training/train_data? The ftc-object-detection/training tutorial makes no mention of its directory train_data after telling us our records are there. Mine are there.

From that training sub-repo, grep -R train_data ./ gives no hits for anything that consumes the records in that directory. The only text hit mentions the directory itself, in README.md, plus binary matches on the record files themselves. grep -R record ./ | grep -v experimental gets hits only in README.md and in convert_labels_to_records.py.

No scripts, python or any other sort, seem to be aware of those records sitting in there.

@jrasor
Author

jrasor commented Aug 5, 2019

Following https://github.com/google/ftc-object-detection/tree/master/training, I made a new video of the poker chip only, and processed it according to that tutorial. I then copied the records into the ftc-object-detection/training/models/sample_mobilenet_v1_0.5_ssd_quantized directory, which already had checkpoints in it. Then I ran python3 $MODEL_RESEARCH_DIR/object_detection/export_tflite_ssd_graph.py, got brand new training/models/sample_mobilenet_v1_0.5_ssd_quantized/tflite/tflite_graph.pb and tflite_graph.pbtxt, then invoked bazel to turn those into chipNdrives.tflite. In a TeleOp mode, that model detected only Gold Minerals. It seemed to ignore the poker chip derived records.
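For reference, the export step just described can be sketched as a small command builder (paths are hypothetical; the flags are those of the Object Detection API's export_tflite_ssd_graph.py):

```python
def build_export_cmd(pipeline_config, checkpoint_prefix, output_dir):
    """Assemble the export_tflite_ssd_graph.py command line that turns a
    trained checkpoint into tflite_graph.pb / tflite_graph.pbtxt."""
    return [
        "python3", "object_detection/export_tflite_ssd_graph.py",
        "--pipeline_config_path=" + pipeline_config,
        "--trained_checkpoint_prefix=" + checkpoint_prefix,
        "--output_directory=" + output_dir,
        "--add_postprocessing_op=true",
    ]

# Example with hypothetical paths -- substitute your own:
print(" ".join(build_export_cmd(
    "models/my_model/pipeline.config",
    "models/my_model/train/model.ckpt-50000",
    "models/my_model/tflite")))
```

The key detail is that --trained_checkpoint_prefix must name a checkpoint produced by your own training run, not one of the pretrained checkpoints shipped with the repo.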

How can I get that training tutorial to make use of my records?

@ftctechnh

Hi jrasor,

Before you ran export_tflite_ssd_graph.py, did you first run the model training script (model_main.py) to generate a new inference model? I believe you need to run that script and generate the new custom model first, before you export it to a format that you can convert into a .tflite file.

It might be helpful to review the tutorial referenced above. You want to copy your .record files to a data directory, and also create a pipeline.config file that the model_main.py script will use to do the model training.

Note that the tutorial shows you how to run the training on a TPU cluster. The procedure is very similar for running it locally: instead of using Cloud storage and initializing a Cloud (TPU) job, you run the training locally using the model_main.py script.
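As a sketch of what "running it locally" can look like, the snippet below assembles a model_main.py invocation with hypothetical paths (substitute your own pipeline.config and training directory); the flags mirror the arguments the Cloud tutorial passes to its ml-engine job:

```python
import subprocess

# Hypothetical paths -- substitute your own.
PIPELINE_CONFIG = "models/my_model/pipeline.config"
MODEL_DIR = "models/my_model/train"

def build_train_cmd(pipeline_config, model_dir, num_train_steps=50000):
    """Assemble the local-training command line for the Object Detection
    API's model_main.py script."""
    return [
        "python3", "object_detection/model_main.py",
        "--pipeline_config_path=" + pipeline_config,
        "--model_dir=" + model_dir,
        "--num_train_steps=" + str(num_train_steps),
        "--alsologtostderr",
    ]

if __name__ == "__main__":
    # Print the command; uncomment the next line to actually launch training
    # (requires TensorFlow and the Object Detection API to be installed).
    # subprocess.run(build_train_cmd(PIPELINE_CONFIG, MODEL_DIR), check=True)
    print(" ".join(build_train_cmd(PIPELINE_CONFIG, MODEL_DIR)))
```

Checkpoints accumulate in --model_dir as training runs; those are what the export step later consumes.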

Also, you might want to adjust the batch size in the example pipeline.config file provided in the tutorial that I referenced (see the section entitled "Training a quantized model with Cloud TPUs on Cloud ML Engine"). The example pipeline has a large batch size because the job is run on a TPU cluster. If you run the job on your workstation, you might need to decrease the batch size (and increase the number of training steps) so the per-step data isn't too big. (I had some crashes on my laptop when I ran the training for a few hundred steps, which I believe were due to memory issues; I did not experience these crashes when I ran the training on a TPU cluster.)
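For example, the relevant knobs in the train_config section of pipeline.config look like this (the values are illustrative, not recommendations; field names are from the API's train proto):

```
train_config {
  batch_size: 16     # the TPU example uses a much larger value
  num_steps: 20000   # more steps can compensate for the smaller batches
  ...
}
```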

Also, once you start your job locally, look up how to use TensorBoard. You can use that utility to monitor the training's progress, and even view the training images, which is useful so you can see (and verify) that the process is using your new (poker chip) training records.

Definitely check out that tutorial and maybe even work through their example. If you can get their example running on your laptop, then you can get your custom training running on your laptop. Also, once you have it set up for your laptop, it's easy to then convert the solution to run on a TPU cluster.

I hope this helps.

Tom

@jrasor
Author

jrasor commented Aug 8, 2019

Thanks, Tom, lots to work with here.

It will be a few days before I can fully implement your suggestions; school starting up shortly. When I have something definite, I will report here.

@jrasor
Author

jrasor commented Aug 26, 2019

Some progress. Summary of findings since I last commented.
= = = = = =
The show-stopper path element contrib is still in the bazel call of tutorial https://github.com/google/ftc-object-detection/tree/master/training. See issue #14.
The laptop can train on the poker chip in 3 hours with poor accuracy, though better than an untrained wild guess, using the 30-minute tutorial https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193 as a guide.
The 30-minute tutorial has a show-stopper error in a switch for the detection job. With that error fixed, the job runs, but fails after 5 minutes with "Please provide a TPU Name to connect to". The evaluation job runs, but fails after 5 minutes with "Expected string but found: 'input_path'".

@jrasor
Author

jrasor commented Aug 26, 2019

Forgot to mention: Tom suggested invoking model_main.py. That was the missing step that enabled this partial success. It is not mentioned anywhere in this repo's training tutorial. It is the local analog of ml-engine in the 30-minute tutorial https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193. That tutorial has it as a switch in the evaluation job.

@powersurge-luke

powersurge-luke commented Oct 8, 2019

Hi @jrasor, I seem to have the same problem as you; have you had any success yet? I made the records, but can't figure out which tutorial it is referring to. I am using Linux, not Google Cloud.

@jrasor
Author

jrasor commented Oct 8, 2019 via email

@powersurge-luke

powersurge-luke commented Oct 8, 2019

@jrasor Thanks for your reply. How do I change the commands to run locally? Do they need to be run from a command line? I have previously used other training methods from the command line, but those models didn't work in the ftc app, and that is the only experience I have.

@powersurge-luke

@jrasor, @ftctechnh, it's been over a week with no response. Can someone please tell me how to do this?

@jrasor
Author

jrasor commented Oct 16, 2019 via email

@powersurge-luke

powersurge-luke commented Oct 18, 2019 via email

@jrasor
Author

jrasor commented Oct 19, 2019

I was able to train a model to distinguish poker chips from USB memory sticks with somewhat better than wild-guess accuracy. I did it on a laptop. I'm attaching some log files. Some takeaways:

Tom Eng really worked hard to help me on this. See the first attachment.
The training tutorial in this repo has problems. See the second attachment.
I really worked hard to get Tensorflow to train on arbitrary objects. See the third attachment.

Tom Eng Help.docx
Training Tutorial Problems.docx
Training Tensorflow log.docx

@jrasor
Author

jrasor commented Oct 19, 2019

For powersurge-luke: the conversion of Google Cloud commands to local ones is in the log; search for "gotta translate". I'm sure Tom or somebody can do better than I did.

Adjusting the confidence level was no help.

@powersurge-luke

Thanks for the help, I'll take a look at the logs and see what works.
