[Bug] No termination within reasonable time for Poisson regression in a specific case #504

Open
brtang63 opened this issue May 5, 2023 · 6 comments
Labels
bug Something isn't working

@brtang63
Contributor

brtang63 commented May 5, 2023

I've run into a strange issue: abess() does not terminate in one specific situation. The code below is a reproducible example; it runs for at least 10 minutes without terminating. However, simply setting support.size = 0:13 or support.size = 14 makes it finish almost immediately (within about 1 second). The issue also does not occur with tune.type = "gic", which I find really confusing.

The version of abess is 0.4.7 (installed from CRAN). I've tested the code on two different Linux systems and hit the same issue on both.

library(abess)
seed <- 1
n <- 100
p <- 1000
family <- "poisson"
snr <- Inf
beta <- rep(0, p)
nonzero <- sample(1:p, 10)
beta[nonzero] <- c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
k <- 10

data <- generate.data(n, p, beta = beta, snr = snr, family = family, support.size = k, seed = seed)
x <- data$x
y <- data$y

abess(x, y, tune.type = "cv", family = "poisson", support.size = 0:14)
@Mamba413 Mamba413 self-assigned this May 5, 2023
@Mamba413
Collaborator

Mamba413 commented May 5, 2023

Thanks. I can reproduce this on my laptop. It may be caused by the extremely large value of the deviance when setting support.size = 0:14.

> abess(x, y, tune.type = "gic", family = "poisson", support.size = 0:13)
Call:
abess.default(x = x, y = y, family = "poisson", tune.type = "gic",  support.size = 0:13)

  support.size           dev          GIC
1             0 -7.581848e+14 -1.51637e+15
2             1 -2.298525e+34 -4.59705e+34
3             2 -2.298525e+34 -4.59705e+34
4             3 -2.298525e+34 -4.59705e+34
5             4 -2.298525e+34 -4.59705e+34
6             5 -2.298525e+34 -4.59705e+34
7             6 -2.298525e+34 -4.59705e+34
8             7 -2.298525e+34 -4.59705e+34
9             8 -2.298525e+34 -4.59705e+34
10            9 -2.298525e+34 -4.59705e+34
11           10 -2.298525e+34 -4.59705e+34
12           11 -2.298525e+34 -4.59705e+34
13           12 -2.298525e+34 -4.59705e+34
14           13 -2.298525e+34 -4.59705e+34
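
To get a feel for the magnitudes involved, here is a rough back-of-the-envelope sketch in plain Python. It assumes independent standard normal features (an assumption about the simulated design, not something confirmed from the abess source) and the 10 coefficients of 5 from the reproduction script. Under the log link the Poisson mean is exp(eta), so even a moderate linear predictor gives enormous counts and log-likelihood terms.

```python
import math

# Assumptions (not taken from the abess source): features are independent
# standard normal draws, and the 10 active coefficients all equal 5.
k = 10
coef = 5.0

# Standard deviation of the linear predictor eta = x @ beta under that assumption.
eta_sd = math.sqrt(k * coef ** 2)


def poisson_loglik_term(y, eta):
    """Per-observation Poisson log-likelihood, dropping the log(y!) constant."""
    return y * eta - math.exp(eta)


for z in (1, 2, 3):
    eta = z * eta_sd
    mean = math.exp(eta)  # Poisson mean under the log link
    term = poisson_loglik_term(mean, eta)  # response set to its mean
    print(f"eta = {eta:6.2f}  mean = {mean:.3e}  log-lik term = {term:.3e}")
```

Values this large leave very little floating-point headroom, which would be consistent with the huge deviance figures in the table above.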

@Mamba413 Mamba413 added the bug Something isn't working label May 5, 2023
@Mamba413
Collaborator

Mamba413 commented May 5, 2023

@oooo26, I have uploaded two files, poisson_y.csv and poisson_x.csv, that correspond to y and x, respectively. Can you test whether this issue happens in Python?
poisson_x.csv
poisson_y.csv

@oooo26
Collaborator

oooo26 commented May 16, 2023

Hi, sorry for the late response. I have checked in Python, but the problem does not seem to occur there.

ABESS version: latest, v0.4.6 (PyPI)
Python version: 3.9.12

Here is the test code:

import numpy as np
import pandas as pd
import abess

X = pd.read_csv("poisson_x.csv")
y = pd.read_csv("poisson_y.csv").squeeze()
print(X.shape)
print(y.shape)

model = abess.PoissonRegression(
    support_size=range(15),     # 0:14
    cv=5                        # both CV and IC are working
)
model.fit(X, y)

print(f"Sparsity: {np.count_nonzero(model.coef_)}")
print(f"Non-zero: {np.nonzero(model.coef_)[0]}")
print(f"Train Loss: {model.train_loss_}")
print(f"Test Loss: {model.eval_loss_}")
######
# Sparsity: 4
# Non-zero: [122 352 573 769]
# Train Loss: -2360540438301305.5
# Test Loss: -729389503380903.0
######

@Mamba413 Mamba413 changed the title from "No termination within expected time for Poisson regression in a specific case" to "[Bug] No termination within reasonable time for Poisson regression in a specific case" Nov 23, 2023
@Mamba413
Collaborator

@brtang63, can you check this issue with the latest abess R package? I believe this problem has been addressed.

@brtang63
Contributor Author

brtang63 commented Jun 14, 2024

Sorry for the late reply. I've tested with the latest CRAN version, 0.4.8, and the problem still happens occasionally. Note that the example I posted previously was not a good one, because the seed was set only for generate.data() and not for sample(). The following code is more reproducible: set.seed(1) works fine, but set.seed(2) still triggers the problem.

R version 4.3.1
abess version: 0.4.8

library(abess)

set.seed(2)
n <- 100
p <- 1000
family <- "poisson"
snr <- Inf
beta <- rep(0, p)
nonzero <- sample(1:p, 10)
beta[nonzero] <- c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
k <- 10

data <- generate.data(n, p, beta = beta, snr = snr, family = family, support.size = k)
x <- data$x
y <- data$y

abess(x, y, tune.type = "cv", family = "poisson", support.size = 0:14)
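
The reproducibility point can also be sketched in Python terms, for anyone testing from that side. This is only an analogy: draw_support and make_design are hypothetical helpers standing in for sample() and generate.data(), and they do not reproduce the R random streams.

```python
import random


def draw_support(rng, p=1000, k=10):
    """Pick k distinct active-feature indices out of p (hypothetical helper)."""
    return sorted(rng.sample(range(p), k))


def make_design(seed, n=3, p=4):
    """Seeded data generator, standing in for a generator with its own seed argument."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]


# The data generator is reproducible because it owns its seed ...
same_design = make_design(seed=1) == make_design(seed=1)

# ... but that seed says nothing about how the support was drawn.
# Seeding the source of the support as well pins down the whole example.
support_a = draw_support(random.Random(2))
support_b = draw_support(random.Random(2))

print(same_design, support_a == support_b)
```

The same idea applies to the R script: every random draw that feeds the example needs to come from a seeded source, not only the data generator.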

@Mamba413
Collaborator

@brtang63 I guess this is because the estimated coefficients are unbounded, owing to the nature of the Poisson distribution. In the new version of the abess library, you can use beta.max and beta.min to control the range of the estimated coefficients. You may refer to this link: #510 (comment)
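
As a toy illustration of why bounding helps, here is a minimal sketch in plain Python. It is not the abess implementation: fit_poisson_slope and its beta_min / beta_max arguments are illustrative names only, and the data are contrived so that the Poisson likelihood has no finite maximizer (all responses are zero while every feature value is positive).

```python
import math


def fit_poisson_slope(x, y, beta_min=-math.inf, beta_max=math.inf, n_iter=100):
    """Newton iterations for a one-coefficient Poisson regression (log link,
    no intercept), clipping the coefficient to [beta_min, beta_max].
    beta_min / beta_max are illustrative names here, not the abess API."""
    beta = 0.0
    for _ in range(n_iter):
        mu = [math.exp(xi * beta) for xi in x]
        grad = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        hess = -sum(xi * xi * mi for xi, mi in zip(x, mu))
        beta = beta - grad / hess                  # plain Newton step
        beta = min(max(beta, beta_min), beta_max)  # project back into the box
    return beta


# All-zero responses with positive features: the likelihood keeps improving
# as beta decreases, so there is no finite optimum.
x = [1.0, 2.0, 3.0]
y = [0, 0, 0]

unbounded = fit_poisson_slope(x, y)
bounded = fit_poisson_slope(x, y, beta_min=-10.0, beta_max=10.0)
print(unbounded, bounded)
```

Without bounds the coefficient keeps drifting with every iteration, while the clipped version settles on the boundary of the box. Whether this is exactly what stalls the cross-validated fit in this issue is not confirmed here.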
