Merge pull request #6 from WenjieDu/dev

Refactor the code and update the doc
WenjieDu · Aug 30, 2023 · 5ed3214
2 parents 176a0c9 + 8df63a7
commit 5ed3214
Showing 9 changed files with 252 additions and 180 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,20 @@
+# ignore special files or folders
+*~
+.idea
+.DS_Store
+
+# ignore all building results
+dist
+build
+docs/_build
+*.egg-info
+
+# ignore all testing/running results
+.run
+.coverage
+.pytest_cache
+*__pycache__*
+*testing_results*
+
+# ignore specific kinds of files like all PDFs
+*.pdf
diff --git a/README.md b/README.md
@@ -1,29 +1,34 @@
-<a href='https://github.com/WenjieDu/PyCorruptor'><img src='https://raw.githubusercontent.com/PyPOTS/pypots.github.io/main/static/figs/pypots_logos/PyCorruptor_logo_FFBG.svg?sanitize=true' width='350' align='right' /></a>
+<a href='https://github.com/WenjieDu/PyCorruptor'><img src='https://raw.githubusercontent.com/PyPOTS/pypots.github.io/main/static/figs/pypots_logos/PyCorruptor_logo_FFBG.svg?sanitize=true' width='375' align='right' /></a>
 
 # <p align='center'>Welcome to PyCorruptor</p>
 
 **<p align='center'>A Python Toolbox for Data Corruption</p>**
+
 <p align='center'>
-    <!-- Python version -->
-    <img src='https://img.shields.io/badge/python-v3-yellowgreen'>
-    <!-- PyPI version -->
-    <img alt="PyPI" src="https://img.shields.io/pypi/v/pycorruptor?color=green&label=PyPI">
-    <!-- GitHub Testing -->
-    <a alt='GitHub Testing' href='https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml'>
-        <img src='https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml/badge.svg'>
+    <a href='https://github.com/WenjieDu/PyCorruptor'>
+        <img alt='Python version' src='https://img.shields.io/badge/python-v3-E97040?logo=python&logoColor=white'>
     </a>
-    <!-- Coveralls report -->
-    <a alt='Coveralls report' href='https://coveralls.io/github/WenjieDu/PyCorruptor'>
-        <img src='https://img.shields.io/coverallsCoverage/github/WenjieDu/PyCorruptor?branch=main&logo=coveralls'>
+    <a href="https://github.com/WenjieDu/PyCorruptor/releases">
+        <img alt="the latest release version" src="https://img.shields.io/github/v/release/wenjiedu/PyCorruptor?color=EE781F&include_prereleases&label=Release&logo=github&logoColor=white">
     </a>
-    <a href="https://anaconda.org/conda-forge/pycorruptor">
-        <img alt="Conda downloads" src="https://img.shields.io/conda/dn/conda-forge/pycorruptor?label=Conda%20Downloads&color=AED0ED&logo=anaconda&logoColor=white">
+    <a href="https://github.com/WenjieDu/PyCorruptor/blob/main/LICENSE">
+        <img alt="GPL-v3 license" src="https://img.shields.io/badge/License-GPL--v3-E9BB41?logo=opensourceinitiative&logoColor=white">
     </a>
-    <a href="https://pypi.org/project/pycorruptor">
-        <img alt="PyPI downloads" src="https://static.pepy.tech/personalized-badge/pycorruptor?period=total&units=international_system&left_color=grey&right_color=blue&left_text=PyPI%20Downloads">
+    <a href='https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml'>
+        <img alt='GitHub Testing' src='https://img.shields.io/github/actions/workflow/status/wenjiedu/PyCorruptor/testing_ci.yml?logo=github&color=C8D8E1&label=CI'>
+    </a>
+    <a href="https://codeclimate.com/github/WenjieDu/PyCorruptor">
+        <img alt="Code Climate maintainability" src="https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyCorruptor?color=3C7699&label=Maintainability&logo=codeclimate">
+    </a>
+    <a href='https://coveralls.io/github/WenjieDu/PyCorruptor'>
+        <img alt='Coveralls report' src='https://img.shields.io/coverallsCoverage/github/WenjieDu/PyCorruptor?branch=main&logo=coveralls&color=75C1C4&label=Coverage'>
+    </a>
+    <a href="https://anaconda.org/conda-forge/PyCorruptor">
+        <img alt="Conda downloads" src="https://img.shields.io/conda/dn/conda-forge/PyCorruptor?label=Conda%20Downloads&color=AED0ED&logo=anaconda&logoColor=white">
+    </a>
+    <a href='https://pepy.tech/project/PyCorruptor'>
+        <img alt='PyPI download number' src='https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2FWenjieDu%2FWenjieDu%2Fmain%2Ffigs%2Fprojects%2Fpycorruptor_downloads.json'>
     </a>
-    <!-- Visit number -->
-
 </p>
 
 <a href='https://github.com/WenjieDu/PyPOTS'><img src='https://raw.githubusercontent.com/PyPOTS/pypots.github.io/main/static/figs/pypots_logos/PyPOTS_logo_FFBG.svg?sanitize=true' width='160' align='left' /></a>
@@ -64,5 +69,5 @@ or
 
 <details>
 <summary>🏠 Visits</summary>
-<img align='left' src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits&edge_flat=false'>
+<img align='left' src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits+since+May+2022&edge_flat=false'>
 </details>
diff --git a/docs/index.rst b/docs/index.rst
@@ -11,17 +11,35 @@ Welcome to PyCorruptor's documentation!
    :target: https://github.com/WenjieDu/PyCorruptor
    :alt: PyCorruptor logo
 
-.. centered:: A Python Toolbox for Data Corruption
+**A Python Toolbox for Data Corruption**
 
-.. image:: https://img.shields.io/badge/python-v3-yellowgreen
+.. image:: https://img.shields.io/badge/python-v3-E97040?logo=python&logoColor=white
    :alt: Python version
-.. image:: https://img.shields.io/pypi/v/pycorruptor?color=green&label=PyPI
-   :alt: PyPI version
-   :target: https://pypi.org/project/pycorruptor
-.. image:: https://static.pepy.tech/personalized-badge/pycorruptor?period=total&units=none&left_color=gray&right_color=blue&left_text=Total%20Downloads
-   :alt: PyPI download number
-.. image:: https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits&edge_flat=false
-   :alt: Visit number
+.. image:: https://img.shields.io/github/v/release/wenjiedu/pycorruptor?color=EE781F&include_prereleases&label=Release&logo=github&logoColor=white
+   :alt: the latest release version
+   :target: https://github.com/WenjieDu/PyCorruptor/releases
+.. image:: https://img.shields.io/badge/License-GPL--v3-E9BB41?logo=opensourceinitiative&logoColor=white
+   :alt: License
+   :target: https://github.com/WenjieDu/PyCorruptor/blob/main/LICENSE
+.. image:: https://img.shields.io/github/actions/workflow/status/wenjiedu/pycorruptor/testing_ci.yml?logo=github&color=C8D8E1&label=CI
+   :alt: GitHub Testing
+   :target: https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml
+.. image:: https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyCorruptor?color=3C7699&label=Maintainability&logo=codeclimate
+   :alt: Code Climate maintainability
+   :target: https://codeclimate.com/github/WenjieDu/PyCorruptor
+.. image:: https://img.shields.io/coverallsCoverage/github/WenjieDu/PyCorruptor?branch=main&logo=coveralls&color=75C1C4&label=Coverage
+   :alt: Coveralls report
+   :target: https://coveralls.io/github/WenjieDu/PyCorruptor
+.. image:: https://img.shields.io/conda/dn/conda-forge/pycorruptor?label=Conda%20Downloads&color=AED0ED&logo=anaconda&logoColor=white
+   :alt: Conda downloads
+   :target: https://anaconda.org/conda-forge/pycorruptor
+.. image:: https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2FWenjieDu%2FWenjieDu%2Fmain%2Ffigs%2Fprojects%2Fpycorruptor_downloads.json
+   :alt: PyPI downloads
+   :target: https://pepy.tech/project/pycorruptor
+.. image:: https://img.shields.io/badge/Contributor%20Covenant-v2.1-4baaaa.svg
+   :alt: Code of Conduct
+.. image:: https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits+since+May+2022&edge_flat=false
+   :alt: Visit number
 
 In data analysis and modeling, sometimes we may need to corrupt the original data to achieve our goal, for instance, evaluating models' ability to reconstruct corrupted data or assessing the model's performance on only partially-observed data. PyCorruptor is such a tool to help you corrupt your data, which provides several patterns to create missing values in the given data.
 

diff --git a/docs/pycorruptor.tests.rst b/docs/pycorruptor.tests.rst
diff --git a/pycorruptor/__init__.py b/pycorruptor/__init__.py
@@ -1,5 +1,5 @@
 """
-PyCorruptor package
+PyCorruptor package.
 """
 
 # Created by Wenjie Du <[email protected]>
@@ -24,12 +24,11 @@
 __version__ = "0.0.4"
 
 try:
-    from pycorruptor.corrupt import (
+    from pycorruptor.mcar import mcar
+    from pycorruptor.utils import (
         cal_missing_rate,
         masked_fill,
-        mcar,
     )
-
 except Exception as e:
     print(e)
 

diff --git a/pycorruptor/mar.py b/pycorruptor/mar.py
@@ -0,0 +1,43 @@
+"""
+Corrupt data by adding missing values to it with the MAR (missing at random) pattern.
+"""
+
+# Created by Wenjie Du <[email protected]>
+# License: GPL-v3
+
+
+def mar(X, rate, nan=0):
+    """Create random missing values (MAR case).
+
+    Parameters
+    ----------
+    X : array,
+        Data vector. If X has any missing values, they should be numpy.nan.
+
+    rate : float, in (0,1),
+        Artificially missing rate, i.e. the proportion of the observed values that will be artificially masked as missing.
+
+        Note that
+        `rate` = (number of artificially missing values) / np.sum(~np.isnan(X)),
+        not (number of artificially missing values) / np.prod(X.shape).
+        The given data may already contain missing values, so the latter definition
+        would be confusing: if the original missing rate >= `rate`,
+        the function would do nothing and so would not serve its purpose.
+
+    nan : int/float, optional, default=0
+        Value used to fill NaN values.
+
+    Returns
+    -------
+
+    """
+    # TODO: Create missing values in MAR case
+    raise NotImplementedError("MAR case has not been implemented yet.")
+
+
+def _mar_numpy(X, rate, nan=0):
+    pass
+
+
+def _mar_torch(X, rate, nan=0):
+    pass
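
Review note on the `rate` docstring in `mar()`: the difference between the two denominators is easier to see with numbers. The sketch below is only an illustration in plain Python (no NumPy or PyTorch); `count_to_mask` and its `relative_to_observed` flag are hypothetical names that do not exist in PyCorruptor, and rounding to the nearest integer is an assumption.

```python
import math


def count_to_mask(values, rate, relative_to_observed=True):
    """Return how many values a given `rate` would mask (hypothetical helper).

    With relative_to_observed=True the rate is taken over the observed
    (non-NaN) values, which is the definition the docstring describes.
    With False it is taken over all values, the definition the docstring
    advises against.
    """
    n_total = len(values)
    n_observed = sum(1 for v in values if not math.isnan(v))
    denominator = n_observed if relative_to_observed else n_total
    # Rounding to the nearest integer is an assumption made for this sketch.
    return int(round(rate * denominator))


# 10 values, 2 of them already missing, so 8 are observed.
values = [1.0, 2.0, math.nan, 4.0, 5.0, math.nan, 7.0, 8.0, 9.0, 10.0]

n_observed_based = count_to_mask(values, rate=0.5)
n_total_based = count_to_mask(values, rate=0.5, relative_to_observed=False)
print(n_observed_based, n_total_based)  # prints: 4 5
```

With the observed-based definition, `rate=0.5` always removes half of what is actually available, regardless of how many values were missing to begin with.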
diff --git a/pycorruptor/corrupt.py → pycorruptor/mcar.py b/pycorruptor/corrupt.py → pycorruptor/mcar.py
@@ -1,5 +1,5 @@
 """
-Corrupt data by adding missing values to it with optional missing patterns (MCAR,MAR,MNAR).
+Corrupt data by adding missing values to it with the MCAR (missing completely at random) pattern.
 """
 
 # Created by Wenjie Du <[email protected]>
@@ -13,89 +13,6 @@
     pass
 
 
-def cal_missing_rate(X):
-    """Calculate the originally missing rate of the raw data.
-
-    Parameters
-    ----------
-    X : array-like,
-        Data array that may contain missing values.
-
-    Returns
-    -------
-    originally_missing_rate, float,
-        The originally missing rate of the raw data.
-    """
-    if isinstance(X, list):
-        X = np.asarray(X)
-
-    if isinstance(X, np.ndarray):
-        originally_missing_rate = np.sum(np.isnan(X)) / np.product(X.shape)
-    elif isinstance(X, torch.Tensor):
-        originally_missing_rate = torch.sum(torch.isnan(X)) / np.product(X.shape)
-        originally_missing_rate = originally_missing_rate.item()
-    else:
-        raise TypeError(
-            "X must be type of list/numpy.ndarray/torch.Tensor, " f"but got {type(X)}"
-        )
-
-    return originally_missing_rate
-
-
-def masked_fill(X, mask, val):
-    """Like torch.Tensor.masked_fill(), fill elements in given `X` with `val` where `mask` is True.
-
-    Parameters
-    ----------
-    X : array-like,
-        The data vector.
-
-    mask : array-like,
-        The boolean mask.
-
-    val : float
-        The value to fill in with.
-
-    Returns
-    -------
-    array,
-        mask
-    """
-    assert X.shape == mask.shape, (
-        "Shapes of X and mask must match, "
-        f"but X.shape={X.shape}, mask.shape={mask.shape}"
-    )
-    assert isinstance(X, type(mask)), (
-        "Data types of X and mask must match, " f"but got {type(X)} and {type(mask)}"
-    )
-
-    if isinstance(X, list):
-        X = np.asarray(X)
-        mask = np.asarray(mask)
-
-    if isinstance(X, np.ndarray):
-        mask = mask.astype(bool)
-        X[mask] = val
-    elif isinstance(X, torch.Tensor):
-        mask = mask.type(torch.bool)
-        X[mask] = val
-    else:
-        raise TypeError(
-            "X must be type of list/numpy.ndarray/torch.Tensor, " f"but got {type(X)}"
-        )
-
-    return X
-
-
-def little_mcar_test(X):
-    """Little's MCAR Test.
-
-    Refer to :cite:`little1988TestMCAR`
-    """
-    # TODO: Little's MCAR test
-    raise NotImplementedError("MCAR test has not been implemented yet.")
-
-
 def mcar(X, rate, nan=0):
     """Create completely random missing values (MCAR case).
 
@@ -195,59 +112,10 @@ def _mcar_torch(X, rate, nan=0):
     return X_intact, X, missing_mask, indicating_mask
 
 
-def mar(X, rate, nan=0):
-    """Create random missing values (MAR case).
-
-    Parameters
-    ----------
-    X : array,
-        Data vector. If X has any missing values, they should be numpy.nan.
-
-    rate : float, in (0,1),
-        Artificially missing rate, rate of the observed values which will be artificially masked as missing.
-
-        Note that,
-        `rate` = (number of artificially missing values) / np.sum(~np.isnan(self.data)),
-        not (number of artificially missing values) / np.product(self.data.shape),
-        considering that the given data may already contain missing values,
-        the latter way may be confusing because if the original missing rate >= `rate`,
-        the function will do nothing, i.e. it won't play the role it has to be.
-
-    nan : int/float, optional, default=0
-        Value used to fill NaN values.
-
-    Returns
-    -------
-
-    """
-    # TODO: Create missing values in MAR case
-    raise NotImplementedError("MAR case has not been implemented yet.")
-
-
-def mnar(X, rate, nan=0):
-    """Create not-random missing values (MNAR case).
-
-    Parameters
-    ----------
-    X : array,
-        Data vector. If X has any missing values, they should be numpy.nan.
-
-    rate : float, in (0,1),
-        Artificially missing rate, rate of the observed values which will be artificially masked as missing.
-
-        Note that,
-        `rate` = (number of artificially missing values) / np.sum(~np.isnan(self.data)),
-        not (number of artificially missing values) / np.product(self.data.shape),
-        considering that the given data may already contain missing values,
-        the latter way may be confusing because if the original missing rate >= `rate`,
-        the function will do nothing, i.e. it won't play the role it has to be.
-
-    nan : int/float, optional, default=0
-        Value used to fill NaN values.
-
-    Returns
-    -------
+def little_mcar_test(X):
+    """Little's MCAR Test.
 
+    Refer to :cite:`little1988TestMCAR`
     """
-    # TODO: Create missing values in MNAR case
-    raise NotImplementedError("MNAR case has not been implemented yet.")
+    # TODO: Little's MCAR test
+    raise NotImplementedError("MCAR test has not been implemented yet.")
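
Review note on the MCAR path: the hunks above show only that `_mcar_torch` returns `X_intact, X, missing_mask, indicating_mask`; the function bodies are not part of this diff. The following plain-Python sketch shows one way such a four-tuple could be produced. It is not the project's implementation: `mcar_sketch`, the `seed` parameter, and the mask conventions (`missing_mask` is 1 where a value is still observed, `indicating_mask` is 1 where a value was artificially removed) are assumptions made for illustration.

```python
import math
import random


def mcar_sketch(values, rate, nan=0.0, seed=None):
    """Mask a fraction `rate` of the observed values uniformly at random.

    Hypothetical stand-in for an MCAR routine; works on a flat list of floats.
    """
    rng = random.Random(seed)
    # Only originally observed positions are candidates for masking.
    observed = [i for i, v in enumerate(values) if not math.isnan(v)]
    n_mask = int(round(rate * len(observed)))
    masked = set(rng.sample(observed, n_mask))

    x_intact = list(values)  # untouched copy, original NaNs kept
    x, missing_mask, indicating_mask = [], [], []
    for i, v in enumerate(values):
        still_observed = not math.isnan(v) and i not in masked
        x.append(v if still_observed else nan)  # fill missing slots with `nan`
        missing_mask.append(1 if still_observed else 0)
        indicating_mask.append(1 if i in masked else 0)
    return x_intact, x, missing_mask, indicating_mask


data = [1.0, math.nan, 3.0, 4.0, 5.0]
x_intact, x, missing_mask, indicating_mask = mcar_sketch(data, rate=0.5, seed=0)
print(x, missing_mask, indicating_mask)
```

Because the positions are drawn without looking at the values themselves, the missingness is independent of the data, which is what distinguishes MCAR from the MAR and MNAR cases.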