Commit

Merge branch 'intel:main' into dev/pytest-cov
icfaust authored Oct 15, 2024
2 parents 4393a77 + 2dd89cd commit 226e361
Showing 10 changed files with 256 additions and 89 deletions.
3 changes: 3 additions & 0 deletions doc/sources/algorithms.rst
Expand Up @@ -159,6 +159,9 @@ Dimensionality Reduction

- ``svd_solver`` not in [`'full'`, `'covariance_eigh'`]
- Sparse data is not supported
* - `IncrementalPCA`
- All parameters are supported
- Sparse data is not supported
* - `TSNE`
- All parameters are supported except:

Expand Down
1 change: 1 addition & 0 deletions doc/sources/conf.py
Expand Up @@ -67,6 +67,7 @@
"notfound.extension",
"sphinx_design",
"sphinx_copybutton",
"sphinx.ext.napoleon",
]

# Add any paths that contain templates here, relative to this directory.
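The `sphinx.ext.napoleon` extension enabled in this hunk lets Sphinx parse NumPy-style docstring sections such as `Parameters`, `Returns` and `Examples`, which is the layout used by the docstrings touched elsewhere in this commit. A minimal sketch of that layout follows; the function `column_sums` is hypothetical and is not part of sklearnex.

```python
def column_sums(rows):
    """Sum each column of a row-major table.

    Parameters
    ----------
    rows : list of list of float
        Input data, one inner list per sample.

    Returns
    -------
    list of float
        Sum of each column over all rows.

    Examples
    --------
    >>> column_sums([[1, 2], [3, 4]])
    [4, 6]
    """
    # zip(*rows) transposes the table so each tuple holds one column.
    return [sum(col) for col in zip(*rows)]
```

With napoleon enabled, section headers underlined with dashes are rendered as structured field lists rather than as plain text.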
Expand Down
1 change: 1 addition & 0 deletions doc/sources/index.rst
Expand Up @@ -105,6 +105,7 @@ Enable Intel(R) GPU optimizations
algorithms.rst
oneAPI and GPU support <oneapi-gpu.rst>
distributed-mode.rst
non-scikit-algorithms.rst
verbose.rst
deprecation.rst

Expand Down
44 changes: 44 additions & 0 deletions doc/sources/non-scikit-algorithms.rst
@@ -0,0 +1,44 @@
.. ******************************************************************************
.. * Copyright 2024 Intel Corporation
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/
Non-Scikit-Learn Algorithms
===========================
Algorithms that are not part of the original scikit-learn are described here. All algorithms are
available on both CPU and GPU (including distributed mode).

BasicStatistics
---------------
.. autoclass:: sklearnex.basic_statistics.BasicStatistics
.. automethod:: sklearnex.basic_statistics.BasicStatistics.fit

IncrementalBasicStatistics
--------------------------
.. autoclass:: sklearnex.basic_statistics.IncrementalBasicStatistics
.. automethod:: sklearnex.basic_statistics.IncrementalBasicStatistics.fit
.. automethod:: sklearnex.basic_statistics.IncrementalBasicStatistics.partial_fit

IncrementalEmpiricalCovariance
------------------------------
.. autoclass:: sklearnex.covariance.IncrementalEmpiricalCovariance
.. automethod:: sklearnex.covariance.IncrementalEmpiricalCovariance.fit
.. automethod:: sklearnex.covariance.IncrementalEmpiricalCovariance.partial_fit

IncrementalLinearRegression
---------------------------
.. autoclass:: sklearnex.linear_model.IncrementalLinearRegression
.. automethod:: sklearnex.linear_model.IncrementalLinearRegression.fit
.. automethod:: sklearnex.linear_model.IncrementalLinearRegression.partial_fit
.. automethod:: sklearnex.linear_model.IncrementalLinearRegression.predict
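
The `partial_fit` methods listed above accept one batch of data at a time. As a rough illustration of why a linear model can be fitted this way, the sketch below accumulates the sufficient statistics of a one-feature least-squares fit across batches. It is a plain-Python illustration only and is not the sklearnex or oneDAL implementation; the class `RunningLineFit` and its method `coefficients` are hypothetical names.

```python
class RunningLineFit:
    """Hypothetical one-feature least-squares fit, updated batch by batch."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def partial_fit(self, xs, ys):
        # Only running sums are stored, so earlier batches can be discarded.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def coefficients(self):
        # Closed-form solution of the normal equations for y = slope * x + intercept.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


# Two batches drawn from y = 2x + 1.
model = RunningLineFit()
model.partial_fit([0.0, 1.0], [1.0, 3.0])
model.partial_fit([2.0, 3.0], [5.0, 7.0])
slope, intercept = model.coefficients()
```

The result is the same as fitting all four points at once, because the accumulated sums do not depend on how the data were split.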
61 changes: 44 additions & 17 deletions sklearnex/basic_statistics/basic_statistics.py
Expand Up @@ -42,36 +42,63 @@ class BasicStatistics(BaseEstimator):
"""
Estimator for basic statistics.
Computes basic statistics for the provided data.
Note, some results can exhibit small variations due to
floating point error accumulation and multithreading.
Parameters
----------
result_options: string or list, default='all'
List of statistics to compute
Selects the statistics to calculate. Possible values are ``'min'``, ``'max'``, ``'sum'``, ``'mean'``, ``'variance'``,
``'variation'``, ``'sum_squares'``, ``'sum_squares_centered'``, ``'standard_deviation'``, ``'second_order_raw_moment'``
or a list containing any of these values. If set to ``'all'`` then all possible statistics will be
calculated.
Attributes (are existing only if corresponding result option exists)
Attributes
----------
min : ndarray of shape (n_features,)
min_ : ndarray of shape (n_features,)
Minimum of each feature over all samples.
max : ndarray of shape (n_features,)
max_ : ndarray of shape (n_features,)
Maximum of each feature over all samples.
sum : ndarray of shape (n_features,)
sum_ : ndarray of shape (n_features,)
Sum of each feature over all samples.
mean : ndarray of shape (n_features,)
mean_ : ndarray of shape (n_features,)
Mean of each feature over all samples.
variance : ndarray of shape (n_features,)
variance_ : ndarray of shape (n_features,)
Variance of each feature over all samples.
variation : ndarray of shape (n_features,)
variation_ : ndarray of shape (n_features,)
Variation of each feature over all samples.
sum_squares : ndarray of shape (n_features,)
sum_squares_ : ndarray of shape (n_features,)
Sum of squares for each feature over all samples.
standard_deviation : ndarray of shape (n_features,)
standard_deviation_ : ndarray of shape (n_features,)
Standard deviation of each feature over all samples.
sum_squares_centered : ndarray of shape (n_features,)
sum_squares_centered_ : ndarray of shape (n_features,)
Centered sum of squares for each feature over all samples.
second_order_raw_moment : ndarray of shape (n_features,)
second_order_raw_moment_ : ndarray of shape (n_features,)
Second order moment of each feature over all samples.
Note
----
An attribute exists only if the corresponding result option has been provided.
Note
----
Attribute names without the trailing underscore are
currently supported but are deprecated in 2025.1 and will be removed in 2026.0.
Note
----
Some results can exhibit small variations due to
floating point error accumulation and multithreading.
Examples
--------
>>> import numpy as np
>>> from sklearnex.basic_statistics import BasicStatistics
>>> bs = BasicStatistics(result_options=['sum', 'min', 'max'])
>>> X = np.array([[1, 2], [3, 4]])
>>> bs.fit(X)
>>> bs.sum_
np.array([4., 6.])
>>> bs.min_
np.array([1., 2.])
"""

def __init__(self, result_options="all"):
Expand Down Expand Up @@ -176,14 +203,14 @@ def fit(self, X, y=None, *, sample_weight=None):
Parameters
----------
X : array-like of shape (n_samples, n_features)
Data for compute, where `n_samples` is the number of samples and
`n_features` is the number of features.
Data for compute, where ``n_samples`` is the number of samples and
``n_features`` is the number of features.
y : Ignored
Not used, present for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
Weights for compute weighted statistics, where `n_samples` is the number of samples.
Weights for compute weighted statistics, where ``n_samples`` is the number of samples.
Returns
-------
Expand Down
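
To make the attribute list in the `BasicStatistics` docstring concrete, the sketch below computes several of the per-feature statistics in plain Python. It assumes conventional textbook definitions, with `second_order_raw_moment` taken as the sum of squares divided by the number of samples; `variance`, `variation` and `standard_deviation` are left out because their normalization is not stated in this diff. The function `basic_stats` is hypothetical and is not the sklearnex implementation.

```python
def basic_stats(rows):
    """Per-feature statistics for a row-major table (textbook definitions)."""
    n = len(rows)
    cols = list(zip(*rows))  # one tuple per feature
    sums = [sum(c) for c in cols]
    means = [s / n for s in sums]
    sum_sq = [sum(v * v for v in c) for c in cols]
    return {
        "min": [min(c) for c in cols],
        "max": [max(c) for c in cols],
        "sum": sums,
        "mean": means,
        "sum_squares": sum_sq,
        # Squared deviations from the per-feature mean.
        "sum_squares_centered": [
            sum((v - m) ** 2 for v in c) for c, m in zip(cols, means)
        ],
        # Assumed definition: sum of squares divided by n.
        "second_order_raw_moment": [s / n for s in sum_sq],
    }


# Same data as the docstring example: X = [[1, 2], [3, 4]].
stats = basic_stats([[1.0, 2.0], [3.0, 4.0]])
```

For this input, `stats["sum"]` and `stats["min"]` agree with the `sum_` and `min_` values shown in the docstring example.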
82 changes: 60 additions & 22 deletions sklearnex/basic_statistics/incremental_basic_statistics.py
Expand Up @@ -43,8 +43,10 @@
@control_n_jobs(decorated_methods=["partial_fit", "_onedal_finalize_fit"])
class IncrementalBasicStatistics(BaseEstimator):
"""
Incremental estimator for basic statistics.
Allows to compute basic statistics if data are splitted into batches.
Calculates basic statistics on the given data and allows for computation when the data are split into
batches. The user can use the ``partial_fit`` method to provide a single batch of data or use the ``fit`` method to provide
the entire dataset.
Parameters
----------
result_options: string or list, default='all'
Expand All @@ -53,40 +55,76 @@ class IncrementalBasicStatistics(BaseEstimator):
batch_size : int, default=None
The number of samples to use for each batch. Only used when calling
``fit``. If ``batch_size`` is ``None``, then ``batch_size``
is inferred from the data and set to ``5 * n_features``, to provide a
balance between approximation accuracy and memory consumption.
is inferred from the data and set to ``5 * n_features``.
Attributes (are existing only if corresponding result option exists)
Attributes
----------
min : ndarray of shape (n_features,)
min_ : ndarray of shape (n_features,)
Minimum of each feature over all samples.
max : ndarray of shape (n_features,)
max_ : ndarray of shape (n_features,)
Maximum of each feature over all samples.
sum : ndarray of shape (n_features,)
sum_ : ndarray of shape (n_features,)
Sum of each feature over all samples.
mean : ndarray of shape (n_features,)
mean_ : ndarray of shape (n_features,)
Mean of each feature over all samples.
variance : ndarray of shape (n_features,)
variance_ : ndarray of shape (n_features,)
Variance of each feature over all samples.
variation : ndarray of shape (n_features,)
variation_ : ndarray of shape (n_features,)
Variation of each feature over all samples.
sum_squares : ndarray of shape (n_features,)
sum_squares_ : ndarray of shape (n_features,)
Sum of squares for each feature over all samples.
standard_deviation : ndarray of shape (n_features,)
standard_deviation_ : ndarray of shape (n_features,)
Standard deviation of each feature over all samples.
sum_squares_centered : ndarray of shape (n_features,)
sum_squares_centered_ : ndarray of shape (n_features,)
Centered sum of squares for each feature over all samples.
second_order_raw_moment : ndarray of shape (n_features,)
second_order_raw_moment_ : ndarray of shape (n_features,)
Second order moment of each feature over all samples.
n_samples_seen_ : int
The number of samples processed by the estimator. Will be reset on
new calls to ``fit``, but increments across ``partial_fit`` calls.
batch_size_ : int
Inferred batch size from ``batch_size``.
n_features_in_ : int
Number of features seen during ``fit`` or ``partial_fit``.
Note
----
An attribute exists only if the corresponding result option has been provided.
Note
----
Attribute names without the trailing underscore are
currently supported but are deprecated in 2025.1 and will be removed in 2026.0.
Examples
--------
>>> import numpy as np
>>> from sklearnex.basic_statistics import IncrementalBasicStatistics
>>> incbs = IncrementalBasicStatistics(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> incbs.partial_fit(X[:1])
>>> incbs.partial_fit(X[1:])
>>> incbs.sum_
np.array([4., 6.])
>>> incbs.min_
np.array([1., 2.])
>>> incbs.fit(X)
>>> incbs.sum_
np.array([4., 6.])
>>> incbs.max_
np.array([3., 4.])
"""

_onedal_incremental_basic_statistics = staticmethod(onedal_IncrementalBasicStatistics)
Expand Down Expand Up @@ -244,17 +282,17 @@ def partial_fit(self, X, sample_weight=None, check_input=True):
Parameters
----------
X : array-like of shape (n_samples, n_features)
Data for compute, where `n_samples` is the number of samples and
`n_features` is the number of features.
Data for compute, where ``n_samples`` is the number of samples and
``n_features`` is the number of features.
y : Ignored
Not used, present for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
Weights for compute weighted statistics, where `n_samples` is the number of samples.
Weights for compute weighted statistics, where ``n_samples`` is the number of samples.
check_input : bool, default=True
Run check_array on X.
Run ``check_array`` on X.
Returns
-------
Expand All @@ -280,14 +318,14 @@ def fit(self, X, y=None, sample_weight=None):
Parameters
----------
X : array-like of shape (n_samples, n_features)
Data for compute, where `n_samples` is the number of samples and
`n_features` is the number of features.
Data for compute, where ``n_samples`` is the number of samples and
``n_features`` is the number of features.
y : Ignored
Not used, present for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
Weights for compute weighted statistics, where `n_samples` is the number of samples.
Weights for compute weighted statistics, where ``n_samples`` is the number of samples.
Returns
-------
Expand Down
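
The relationship between `partial_fit`, `fit` and `batch_size` described in the docstring can be sketched with a small running accumulator. This is a plain-Python illustration under the documented behavior (`fit` resets the state, and `batch_size` defaults to `5 * n_features`); the class `RunningStats` is hypothetical and is not how sklearnex or oneDAL implement the estimator.

```python
class RunningStats:
    """Hypothetical per-feature accumulator updated one batch at a time."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.n_samples_seen = 0
        self.min = self.max = self.sum = None

    def partial_fit(self, batch):
        cols = list(zip(*batch))  # one tuple per feature
        if self.sum is None:
            self.min = [min(c) for c in cols]
            self.max = [max(c) for c in cols]
            self.sum = [sum(c) for c in cols]
        else:
            # Merge the new batch into the running results.
            self.min = [min(m, *c) for m, c in zip(self.min, cols)]
            self.max = [max(m, *c) for m, c in zip(self.max, cols)]
            self.sum = [s + sum(c) for s, c in zip(self.sum, cols)]
        self.n_samples_seen += len(batch)
        return self

    def fit(self, rows, batch_size=None):
        # ``fit`` starts from scratch, mirroring the documented reset.
        self._reset()
        if batch_size is None:
            batch_size = 5 * len(rows[0])  # documented default: 5 * n_features
        for start in range(0, len(rows), batch_size):
            self.partial_fit(rows[start:start + batch_size])
        return self


X = [[1.0, 2.0], [3.0, 4.0]]
by_batches = RunningStats().partial_fit(X[:1]).partial_fit(X[1:])
refit = RunningStats().fit(X, batch_size=1)
```

Both objects end up with the same sums, minima and maxima as the docstring example, regardless of how the rows were batched.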
28 changes: 23 additions & 5 deletions sklearnex/covariance/incremental_covariance.py
Expand Up @@ -49,9 +49,9 @@
@control_n_jobs(decorated_methods=["partial_fit", "fit", "_onedal_finalize_fit"])
class IncrementalEmpiricalCovariance(BaseEstimator):
"""
Incremental estimator for covariance.
Allows to compute empirical covariance estimated by maximum
likelihood method if data are splitted into batches.
Maximum likelihood covariance estimator that supports estimation when the data are split into
batches. The user can use the ``partial_fit`` method to provide a single batch of data or use the ``fit`` method to provide
the entire dataset.
Parameters
----------
Expand Down Expand Up @@ -84,13 +84,31 @@ class IncrementalEmpiricalCovariance(BaseEstimator):
n_samples_seen_ : int
The number of samples processed by the estimator. Will be reset on
new calls to fit, but increments across ``partial_fit`` calls.
new calls to ``fit``, but increments across ``partial_fit`` calls.
batch_size_ : int
Inferred batch size from ``batch_size``.
n_features_in_ : int
Number of features seen during :term:`fit` `partial_fit`.
Number of features seen during ``fit`` or ``partial_fit``.
Examples
--------
>>> import numpy as np
>>> from sklearnex.covariance import IncrementalEmpiricalCovariance
>>> inccov = IncrementalEmpiricalCovariance(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> inccov.partial_fit(X[:1])
>>> inccov.partial_fit(X[1:])
>>> inccov.covariance_
np.array([[1., 1.],[1., 1.]])
>>> inccov.location_
np.array([2., 3.])
>>> inccov.fit(X)
>>> inccov.covariance_
np.array([[1., 1.],[1., 1.]])
>>> inccov.location_
np.array([2., 3.])
"""

_onedal_incremental_covariance = staticmethod(onedal_IncrementalEmpiricalCovariance)
Expand Down
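
The values in the `IncrementalEmpiricalCovariance` docstring example can be reproduced by hand. The sketch below accumulates the sample count, per-feature sums and cross-products across batches, then forms the maximum likelihood covariance as the mean cross-product minus the outer product of the means. It is a plain-Python illustration, not the sklearnex or oneDAL implementation, and the class `RunningCovariance` is hypothetical. This one-pass formula can lose precision on poorly scaled data, so it is meant for explanation only.

```python
class RunningCovariance:
    """Hypothetical batch-wise maximum likelihood covariance accumulator."""

    def __init__(self, n_features):
        self.n = 0
        self.sums = [0.0] * n_features
        self.cross = [[0.0] * n_features for _ in range(n_features)]

    def partial_fit(self, batch):
        for row in batch:
            self.n += 1
            for i, xi in enumerate(row):
                self.sums[i] += xi
                for j, xj in enumerate(row):
                    self.cross[i][j] += xi * xj
        return self

    def location(self):
        # Per-feature mean of all samples seen so far.
        return [s / self.n for s in self.sums]

    def covariance(self):
        # Maximum likelihood estimate: divides by n, not n - 1.
        mean = self.location()
        size = len(mean)
        return [
            [self.cross[i][j] / self.n - mean[i] * mean[j] for j in range(size)]
            for i in range(size)
        ]


X = [[1.0, 2.0], [3.0, 4.0]]
acc = RunningCovariance(n_features=2)
acc.partial_fit(X[:1])
acc.partial_fit(X[1:])
cov = acc.covariance()
loc = acc.location()
```

For this input, `cov` and `loc` agree with the `covariance_` and `location_` values shown in the docstring example.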