
p1ch8/1_convolution.ipynb L2 regularization problem #91

Open
JuanCab opened this issue May 23, 2022 · 0 comments

I've been working my way through the Jupyter Notebook for Chapter 8.

When I run the cell that trains using L2 regularization:

model = Net().to(device=device)
optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)
all_acc_dict["l2 reg"] = validate(model, train_loader, val_loader)

The network will not train, since the loss is nan. I am curious whether there is an error in the definition of training_loop_l2reg in the previous cell:

def training_loop_l2reg(n_epochs, optimizer, model, loss_fn,
                        train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device)
            labels = labels.to(device=device)
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)

            l2_lambda = 0.001
            # Replace pow(2.0) with abs() for L1 regularization
            l2_norm = sum(p.pow(2.0).sum()
                          for p in model.parameters())  
            loss = loss + l2_lambda * l2_norm

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            loss_train += loss.item()

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader)))

By contrast, if I instead train using the weight_decay parameter of SGD:

model = NetWidth(n_chans1=32).to(device=device)
optimizer = optim.SGD(model.parameters(), weight_decay=0.001, lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

all_acc_dict["width"] = validate(model, train_loader, val_loader)

I have no problem with the loss converging.
