Stratified label splits #39

Maikello · 2019-11-19T11:39:11Z

The current train/test split is not stratified, so some labels can end up missing from the train or test set, which causes errors when training the model. Please replace the first code block below (the current random split) with the second one (a stratified split):

newdf1 = np.random.rand(len(rnewdf)) < 0.8

train = rnewdf[newdf1]
test = rnewdf[~newdf1]

trainfeatures = train.iloc[:, :-1]
trainlabel = train.iloc[:, -1:]
testfeatures = test.iloc[:, :-1]
testlabel = test.iloc[:, -1:]

from sklearn.model_selection import StratifiedShuffleSplit

# Features are every column except the last; the label is the last column.
X = rnewdf.iloc[:, :-1]
y = rnewdf.iloc[:, -1:]

def dataSplitting(X, y):
    """Returns training and test set matrices/vectors for X and y"""
    # A single stratified 80/20 split that preserves label proportions.
    sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2)
    train_index, test_index = next(sss.split(X, y))
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    return X_train, X_test, y_train, y_test

trainfeatures, testfeatures, trainlabel, testlabel = dataSplitting(X, y)

With this code, every label appears in the train and test sets in the same proportion as in the full dataset. A purely random selection can leave a label out of one of the sets, in which case one-hot encoding the labels produces fewer than 10 columns and no longer matches the output layer of size 10.
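As an alternative sketch (not the code proposed above), scikit-learn's `train_test_split` accepts a `stratify` argument that produces the same kind of split in one call. The data here is synthetic and only stands in for rnewdf; the array names and the layout of 10 rows per label are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for rnewdf (assumption): 100 rows, 5 feature columns,
# and 10 labels with exactly 10 rows each.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 5))
labels = np.repeat(np.arange(10), 10)

# stratify=labels keeps each label's share identical in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

# Count how many rows of each label landed in each set.
train_counts = np.bincount(y_train, minlength=10)
test_counts = np.bincount(y_test, minlength=10)
print("train rows per label:", train_counts)
print("test rows per label: ", test_counts)
```

Counting rows per label like this is a quick way to confirm that no label is missing from either set before the labels are one-hot encoded.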


thisislohith6 · 2020-12-01T18:51:55Z

Hi,

I first tried to build the model without the replacement code you mentioned and got the error "ValueError: Shapes (None, 4) and (None, 10) are incompatible". I then switched to the code above and rebuilt the model, but when fitting it I now get "ValueError: Shapes (16, 4) and (16, 10) are incompatible".

Could you please suggest what changes I need to make?

Thanks in advance, I appreciate your help!

SyedaFaiqaFIAZ · 2021-03-29T16:20:05Z

Same error here.
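One possible reading of the error reported above, offered as an assumption rather than a confirmed diagnosis: the one-hot encoded labels have 4 columns while the model's output layer has 10 units, which would happen if the dataset itself contains only 4 distinct labels. A stratified split cannot add labels that are absent from the full data. A minimal sketch for checking this follows; the label names, the `check_label_count` helper, and the `expected_units` value are all hypothetical.

```python
from collections import Counter

def check_label_count(labels, expected_units):
    """Return the number of distinct labels and whether it matches
    the number of units expected in the model's output layer."""
    counts = Counter(labels)
    n_classes = len(counts)
    return n_classes, n_classes == expected_units

# Hypothetical label column with only 4 distinct values.
labels = ["angry", "calm", "happy", "sad"] * 4

n_classes, matches = check_label_count(labels, expected_units=10)
print("distinct labels:", n_classes)
print("matches output layer size:", matches)
```

If the count turns out to be 4, two options to consider are sizing the output layer from the number of distinct labels instead of hard-coding 10, or checking why the other labels are missing from the data.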
