We will create a very simple linear classifier in this example. The data set of interest will be the Iris data. We will optimize a linear separator over petal length and petal width to classify whether a flower is I. setosa or not. We set up the data this way because the two classes end up being linearly separable.
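As a rough illustration of what such a separator computes, here is a minimal NumPy sketch. The four measurements are illustrative values in the typical petal ranges, not rows loaded from the data set, and the weights `w` and bias `b` are hand-picked assumptions rather than trained parameters. The helper name `predict_setosa` is hypothetical.

```python
import numpy as np

# Illustrative (petal length, petal width) pairs in cm.
# These are NOT loaded from the real Iris data set.
X = np.array([[1.4, 0.2],
              [1.5, 0.3],
              [4.7, 1.4],
              [6.0, 2.5]])
y = np.array([1.0, 1.0, 0.0, 0.0])  # 1 = I. setosa, 0 = any other species


def predict_setosa(X, w, b):
    """Return 1.0 where the linear score X @ w + b is positive, else 0.0."""
    logits = X @ w + b
    return (logits > 0).astype(float)


# Hypothetical, hand-picked parameters; training would learn these instead.
w = np.array([-1.0, -1.0])
b = 3.0

preds = predict_setosa(X, w, b)
accuracy = float(np.mean(preds == y))
```

The line where `X @ w + b` equals zero is the separator: points on one side are labeled I. setosa and points on the other side are not.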
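We use the sigmoid cross entropy loss for two reasons: we are predicting a binary class, so a sigmoid maps the model's single output to the probability of the positive class, and it is a classification problem, for which cross entropy is the usual loss.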
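To make the loss concrete, the following is a NumPy sketch of sigmoid cross entropy computed directly from logits. It is not the call used in this example. The function name `sigmoid_cross_entropy` is an assumption for illustration, and the numerically stable form `max(x, 0) - x * z + log(1 + exp(-|x|))` is the one given in the TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits`.

```python
import numpy as np


def sigmoid_cross_entropy(logits, labels):
    """Element-wise sigmoid cross entropy from raw logits.

    Mathematically equal to
        -labels * log(sigmoid(x)) - (1 - labels) * log(1 - sigmoid(x)),
    rewritten so that exp() is never called on a large positive number.
    """
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return np.maximum(x, 0.0) - x * z + np.log1p(np.exp(-np.abs(x)))


# A logit of 0 means a predicted probability of 0.5, so the loss is log(2)
# for either label.
example_loss = sigmoid_cross_entropy([0.0, 0.0], [1.0, 0.0])
```

Averaging this element-wise loss over a batch gives a single scalar that an optimizer can minimize.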
Viewing the resulting graph and separating line: