
Bail out of training immediately if no training data batches are provided #187

Open · shuttle1987 opened this issue Aug 19, 2018 · 4 comments

@shuttle1987 (Member)

I propose we bail out immediately if no training data is found. Currently we have this:

    batch_gen = self.corpus_reader.train_batch_gen()

    train_ler_total = 0
    print("\tBatch...", end="")
    for batch_i, batch in enumerate(batch_gen):
        print("%d..." % batch_i, end="")
        sys.stdout.flush()
        batch_x, batch_x_lens, batch_y = batch

        feed_dict = {self.batch_x: batch_x,
                     self.batch_x_lens: batch_x_lens,
                     self.batch_y: batch_y}

        _, ler = sess.run([self.optimizer, self.ler],
                          feed_dict=feed_dict)

        train_ler_total += ler

But if batch_gen yields nothing, the loop body never runs, so we end up in a silent error state and the code breaks at some unspecified later point, far from the actual cause.
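For instance, a minimal sketch of the deferred failure (the names here are illustrative, not the actual persephone training loop; it assumes the epoch summary later divides by the number of batches):

    # Hypothetical illustration: with an empty generator the loop body
    # never runs, so nothing fails until much later.
    def run_epoch(batch_gen):
        train_ler_total = 0
        batch_count = 0
        for batch in batch_gen:
            train_ler_total += 0.5  # stand-in for the per-batch LER
            batch_count += 1
        # With no batches, this is the first place anything breaks,
        # far from where the corpus reader was misconfigured.
        return train_ler_total / batch_count

    run_epoch(iter([]))  # ZeroDivisionError at the end of the epoch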

Instead I'm thinking we can do something like this, using a sentinel to detect an empty generator:

    batch_gen = self.corpus_reader.train_batch_gen()

    train_ler_total = 0
    print("\tBatch...", end="")
    # Sentinel: stays None if the generator yields nothing.
    batch_i = None
    for batch_i, batch in enumerate(batch_gen):
        print("%d..." % batch_i, end="")
        sys.stdout.flush()
        batch_x, batch_x_lens, batch_y = batch

        feed_dict = {self.batch_x: batch_x,
                     self.batch_x_lens: batch_x_lens,
                     self.batch_y: batch_y}

        _, ler = sess.run([self.optimizer, self.ler],
                          feed_dict=feed_dict)

        train_ler_total += ler
    if batch_i is None:
        raise PersephoneError(
            "No training data was provided, check your batch generation")

This would make it really clear that something has gone wrong and would immediately report the failure.
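A small regression test would lock the behaviour in. Here is a sketch with the pattern isolated; PersephoneError is stubbed and consume_batches is a hypothetical stand-in for the training loop, not the real persephone API:

    import pytest

    class PersephoneError(Exception):
        """Stand-in for the project's exception class."""

    def consume_batches(batch_gen):
        # Same sentinel pattern as above, isolated for testing.
        batch_i = None
        for batch_i, batch in enumerate(batch_gen):
            pass  # the training step would go here
        if batch_i is None:
            raise PersephoneError(
                "No training data was provided, check your batch generation")

    def test_empty_generator_raises():
        with pytest.raises(PersephoneError):
            consume_batches(iter([]))

    def test_nonempty_generator_passes():
        consume_batches(iter([("batch_x", "batch_x_lens", "batch_y")]))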

@shuttle1987 (Member, Author)

This is closely related to #184 and #185.

@shuttle1987 shuttle1987 changed the title Bail out early if no data is provided for training Bail out of training immediately if no training data batches are provided Aug 19, 2018
@oadams oadams added this to the 0.4.0 milestone Oct 13, 2018
@oadams oadams self-assigned this Oct 13, 2018
@oadams oadams closed this as completed in 0926356 Oct 13, 2018
@shuttle1987 (Member, Author)

There's actually a regression here: test cases that previously passed now fail, such as this one: https://travis-ci.org/persephone-tools/persephone-web-API/jobs/441199123

@shuttle1987 shuttle1987 reopened this Oct 14, 2018
@alexis-michaud commented Oct 14, 2018

"There's actually a regression here"

Shush! The world is watching. Instead of just 'regression', could you kindly refer to this as, let's say, deep regression, or (maybe even better) vanilla deep regression? In the profuse terminology of deep learning, there are plenty of terms that could come in handy as euphemisms 🙊

@oadams (Collaborator) commented Oct 14, 2018

Sorry about that, I've changed it back. I shouldn't have been working on master directly.
