[BUG] Results from liblinear and SAGAClassifier do not match #113
I wanted to check the differences between the output of SAGAClassifier and the output of liblinear, via LogisticRegression in scikit-learn. It turns out that the two estimators return very different coefficients when solving what I thought was the same problem. I suspect that I am not setting the regularisation parameters correctly; would you be able to enlighten me?

Comments
I should add that SAGClassifier (for l2 penalty) seems to behave the same way.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import load_iris, make_classification
from sklearn.linear_model import LogisticRegression
from lightning.classification import SAGClassifier, SAGAClassifier

# Binary subset of iris, replicated 10 times, with labels in {-1, 1}.
iris = load_iris()
X, y = iris.data, iris.target
X = np.concatenate([X] * 10)
y = np.concatenate([y] * 10)
X_bin = X[y <= 1]
y_bin = y[y <= 1] * 2 - 1

# A second, sparse problem.
X_sparse, y_sparse = make_classification(n_samples=1000, n_features=50,
                                         random_state=0)
X_sparse = sparse.csr_matrix(X_sparse)

for (X, y) in ((X_bin, y_bin), (X_sparse, y_sparse)):
    for penalty in ['l2']:
        n_samples = X.shape[0]
        for alpha in np.logspace(-3, 3, 5):
            liblinear = LogisticRegression(
                C=1. / (n_samples * alpha),
                solver='liblinear',
                multi_class='ovr',
                max_iter=500,
                fit_intercept=False,
                penalty=penalty, random_state=0, tol=1e-24)
            if penalty == 'l1':
                lalpha = 0
                lbeta = alpha
                lpenalty = 'l1'
            else:
                lalpha = alpha
                lbeta = 0
                lpenalty = None
            lsag = SAGClassifier(loss='log',
                                 beta=lbeta, penalty=lpenalty,
                                 alpha=lalpha,
                                 max_iter=3000,
                                 random_state=0)
            lsaga = SAGAClassifier(loss='log',
                                   beta=lbeta, penalty=lpenalty,
                                   alpha=lalpha,
                                   max_iter=3000,
                                   random_state=0)
            lsaga.fit(X, y)
            lsag.fit(X, y)
            liblinear.fit(X, y)
            print('[penalty=%s, alpha=%s, solver=liblinear]' % (penalty, alpha),
                  liblinear.coef_[0, :4])
            print('[penalty=%s, alpha=%s, solver=lightning saga]' % (penalty, alpha),
                  lsaga.coef_[0, :4])
            print('[penalty=%s, alpha=%s, solver=lightning sag]' % (penalty, alpha),
                  lsag.coef_[0, :4])
            print('-------------------------------')
```
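For reference, a minimal sketch of the correspondence the `C = 1. / (n_samples * alpha)` line above relies on, assuming lightning's SAG/SAGA minimize the mean logistic loss plus `0.5 * alpha * ||w||^2`, while liblinear minimizes `0.5 * ||w||^2` plus `C` times the summed loss; under that assumption the two objectives differ only by the constant factor `C * n`, so they share the same minimizer:

```python
import numpy as np

def liblinear_objective(w, X, y, C):
    # 0.5 * ||w||^2 + C * sum of logistic losses (liblinear's parameterization)
    losses = np.log1p(np.exp(-y * (X @ w)))
    return 0.5 * w @ w + C * losses.sum()

def lightning_objective(w, X, y, alpha):
    # mean logistic loss + 0.5 * alpha * ||w||^2 (assumed lightning parameterization)
    losses = np.log1p(np.exp(-y * (X @ w)))
    return losses.mean() + 0.5 * alpha * w @ w

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = np.sign(rng.randn(200))
w = rng.randn(5)

n, alpha = X.shape[0], 0.1
C = 1. / (n * alpha)

# Rescaling the liblinear objective by 1 / (C * n) recovers the lightning one,
# so both values printed below coincide.
print(liblinear_objective(w, X, y, C) / (C * n))
print(lightning_objective(w, X, y, alpha))
```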
Add tol=-1 to the kwargs of SAG*Classifier :-) Also, note that lightning has a definition of the logistic loss that is very slightly different from that of liblinear. In particular, the one in lightning is an approximation using some heuristics. See https://github.com/scikit-learn-contrib/lightning/blob/master/lightning/impl/sgd_fast.pyx , class Log. This typically only matters if you look beyond 10^-4 precision (e.g. if you plot convergence curves).
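Concretely, a minimal sketch of the suggested change for the l2 case of the snippet above; the only addition is tol=-1, presumably so that the default stopping tolerance never halts the solvers before max_iter:

```python
from lightning.classification import SAGClassifier, SAGAClassifier

alpha = 1e-3  # squared-l2 regularization, as in the l2 branch above

# tol=-1 keeps the solvers running for the full max_iter instead of stopping
# early on the default tolerance, so they can match liblinear's tol=1e-24 runs.
lsag = SAGClassifier(loss='log', alpha=alpha, beta=0, penalty=None,
                     max_iter=3000, tol=-1, random_state=0)
lsaga = SAGAClassifier(loss='log', alpha=alpha, beta=0, penalty=None,
                       max_iter=3000, tol=-1, random_state=0)
```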
Indeed, sorry for the noise, it now works. Do you use this approximation for speed or for underflow control?
Mathieu did this, but I would say speed, since underflow can be controlled in other ways (liblinear does this pretty well).
If I am not mistaken, the log loss trick comes from scikit-learn, which itself comes from Leon Bottou's SGD code. |
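For illustration, a rough sketch of that kind of piecewise trick; the cutoffs below follow scikit-learn's sgd_fast.pyx from memory, and the lightning file linked above is the authoritative version of what SAG/SAGA actually use:

```python
import numpy as np

def log_loss_exact(z):
    # exact logistic loss of the margin z = y * w.x
    return np.log1p(np.exp(-z))

def log_loss_approx(z):
    # piecewise version in the style of Bottou's SGD code: for large |z| the
    # log is skipped, which is cheaper and avoids overflow of exp(-z)
    if z > 18.0:
        return np.exp(-z)      # log(1 + e^-z) ~ e^-z when e^-z is tiny
    if z < -18.0:
        return -z              # log(1 + e^-z) ~ -z when e^-z dominates
    return np.log(1.0 + np.exp(-z))

# The two agree to about 1e-8 near the cutoffs, which is why the difference
# only shows up when comparing solvers at very high precision.
for z in (-30.0, -18.0, 0.0, 18.0, 30.0):
    print(z, log_loss_exact(z), log_loss_approx(z))
```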