Merge pull request #6 from WenjieDu/dev
Refactor the code and update the doc
WenjieDu authored Aug 30, 2023
2 parents 176a0c9 + 8df63a7 commit 5ed3214
Showing 9 changed files with 252 additions and 180 deletions.
20 changes: 20 additions & 0 deletions .gitignore
@@ -0,0 +1,20 @@
# ignore special files or folders
*~
.idea
.DS_Store

# ignore all building results
dist
build
docs/_build
*.egg-info

# ignore all testing/running results
.run
.coverage
.pytest_cache
*__pycache__*
*testing_results*

# ignore specific kinds of files like all PDFs
*.pdf
41 changes: 23 additions & 18 deletions README.md
@@ -1,29 +1,34 @@
<a href='https://github.com/WenjieDu/PyCorruptor'><img src='https://raw.githubusercontent.com/PyPOTS/pypots.github.io/main/static/figs/pypots_logos/PyCorruptor_logo_FFBG.svg?sanitize=true' width='350' align='right' /></a>
<a href='https://github.com/WenjieDu/PyCorruptor'><img src='https://raw.githubusercontent.com/PyPOTS/pypots.github.io/main/static/figs/pypots_logos/PyCorruptor_logo_FFBG.svg?sanitize=true' width='375' align='right' /></a>

# <p align='center'>Welcome to PyCorruptor</p>

**<p align='center'>A Python Toolbox for Data Corruption</p>**

<p align='center'>
<!-- Python version -->
<img src='https://img.shields.io/badge/python-v3-yellowgreen'>
<!-- PyPI version -->
<img alt="PyPI" src="https://img.shields.io/pypi/v/pycorruptor?color=green&label=PyPI">
<!-- GitHub Testing -->
<a alt='GitHub Testing' href='https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml'>
<img src='https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml/badge.svg'>
<a href='https://github.com/WenjieDu/PyCorruptor'>
<img alt='Python version' src='https://img.shields.io/badge/python-v3-E97040?logo=python&logoColor=white'>
</a>
<!-- Coveralls report -->
<a alt='Coveralls report' href='https://coveralls.io/github/WenjieDu/PyCorruptor'>
<img src='https://img.shields.io/coverallsCoverage/github/WenjieDu/PyCorruptor?branch=main&logo=coveralls'>
<a href="https://github.com/WenjieDu/PyCorruptor/releases">
<img alt="the latest release version" src="https://img.shields.io/github/v/release/wenjiedu/PyCorruptor?color=EE781F&include_prereleases&label=Release&logo=github&logoColor=white">
</a>
<a href="https://anaconda.org/conda-forge/pycorruptor">
<img alt="Conda downloads" src="https://img.shields.io/conda/dn/conda-forge/pycorruptor?label=Conda%20Downloads&color=AED0ED&logo=anaconda&logoColor=white">
<a href="https://github.com/WenjieDu/PyCorruptor/blob/main/LICENSE">
<img alt="GPL-v3 license" src="https://img.shields.io/badge/License-GPL--v3-E9BB41?logo=opensourceinitiative&logoColor=white">
</a>
<a href="https://pypi.org/project/pycorruptor">
<img alt="PyPI downloads" src="https://static.pepy.tech/personalized-badge/pycorruptor?period=total&units=international_system&left_color=grey&right_color=blue&left_text=PyPI%20Downloads">
<a href='https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml'>
<img alt='GitHub Testing' src='https://img.shields.io/github/actions/workflow/status/wenjiedu/PyCorruptor/testing_ci.yml?logo=github&color=C8D8E1&label=CI'>
</a>
<a href="https://codeclimate.com/github/WenjieDu/PyCorruptor">
<img alt="Code Climate maintainability" src="https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyCorruptor?color=3C7699&label=Maintainability&logo=codeclimate">
</a>
<a href='https://coveralls.io/github/WenjieDu/PyCorruptor'>
<img alt='Coveralls report' src='https://img.shields.io/coverallsCoverage/github/WenjieDu/PyCorruptor?branch=main&logo=coveralls&color=75C1C4&label=Coverage'>
</a>
<a href="https://anaconda.org/conda-forge/PyCorruptor">
<img alt="Conda downloads" src="https://img.shields.io/conda/dn/conda-forge/PyCorruptor?label=Conda%20Downloads&color=AED0ED&logo=anaconda&logoColor=white">
</a>
<a href='https://pepy.tech/project/PyCorruptor'>
<img alt='PyPI download number' src='https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2FWenjieDu%2FWenjieDu%2Fmain%2Ffigs%2Fprojects%2Fpycorruptor_downloads.json'>
</a>
<!-- Visit number -->

</p>

<a href='https://github.com/WenjieDu/PyPOTS'><img src='https://raw.githubusercontent.com/PyPOTS/pypots.github.io/main/static/figs/pypots_logos/PyPOTS_logo_FFBG.svg?sanitize=true' width='160' align='left' /></a>
@@ -64,5 +69,5 @@ or
<details>
<summary>🏠 Visits</summary>
<img align='left' src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits&edge_flat=false'>
<img align='left' src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits+since+May+2022&edge_flat=false'>
</details>
36 changes: 27 additions & 9 deletions docs/index.rst
@@ -11,17 +11,35 @@ Welcome to PyCorruptor's documentation!
:target: https://github.com/WenjieDu/PyCorruptor
:alt: PyCorruptor logo

.. centered:: A Python Toolbox for Data Corruption
**A Python Toolbox for Data Corruption**

.. image:: https://img.shields.io/badge/python-v3-yellowgreen
.. image:: https://img.shields.io/badge/python-v3-E97040?logo=python&logoColor=white
:alt: Python version
.. image:: https://img.shields.io/pypi/v/pycorruptor?color=green&label=PyPI
:alt: PyPI version
:target: https://pypi.org/project/pycorruptor
.. image:: https://static.pepy.tech/personalized-badge/pycorruptor?period=total&units=none&left_color=gray&right_color=blue&left_text=Total%20Downloads
:alt: PyPI download number
.. image:: https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits&edge_flat=false
:alt: Visit number
.. image:: https://img.shields.io/github/v/release/wenjiedu/pycorruptor?color=EE781F&include_prereleases&label=Release&logo=github&logoColor=white
:alt: the latest release version
:target: https://img.shields.io/github/v/release/wenjiedu/pycorruptor?color=EE781F&include_prereleases&label=Release&logo=github&logoColor=white
.. image:: https://img.shields.io/badge/License-GPL--v3-E9BB41?logo=opensourceinitiative&logoColor=white
:alt: License
:target: https://github.com/WenjieDu/PyCorruptor/blob/main/LICENSE
.. image:: https://img.shields.io/github/actions/workflow/status/wenjiedu/pycorruptor/testing_ci.yml?logo=github&color=C8D8E1&label=CI
:alt: GitHub Testing
:target: https://github.com/WenjieDu/PyCorruptor/actions/workflows/testing_ci.yml
.. image:: https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyCorruptor?color=3C7699&label=Maintainability&logo=codeclimate
:alt: Code Climate maintainability
:target: https://codeclimate.com/github/WenjieDu/PyCorruptor
.. image:: https://img.shields.io/coverallsCoverage/github/WenjieDu/PyCorruptor?branch=main&logo=coveralls&color=75C1C4&label=Coverage
:alt: Coveralls report
:target: https://coveralls.io/github/WenjieDu/PyCorruptor
.. image:: https://img.shields.io/conda/dn/conda-forge/pycorruptor?label=Conda%20Downloads&color=AED0ED&logo=anaconda&logoColor=white
:alt: Conda downloads
:target: https://anaconda.org/conda-forge/pycorruptor
.. image:: https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2FWenjieDu%2FWenjieDu%2Fmain%2Ffigs%2Fprojects%2Fpycorruptor_downloads.json
:alt: PyPI downloads
:target: https://pepy.tech/project/pycorruptor
.. image:: https://img.shields.io/badge/Contributor%20Covenant-v2.1-4baaaa.svg
:alt: CODE of CONDUCT
.. image:: https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FWenjieDu%2FPyCorruptor&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits+since+May+2022&edge_flat=false
:alt: Visit num

In data analysis and modeling, we sometimes need to corrupt the original data deliberately, for instance to evaluate a model's ability to reconstruct corrupted data or to assess its performance on partially observed data. PyCorruptor is a tool for this purpose: it provides several patterns for creating missing values in the given data.
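
As an illustration of the kind of corruption described above, here is a minimal NumPy-only sketch of an MCAR-style mask. The helper name `mcar_sketch` and its `seed` parameter are hypothetical and not part of PyCorruptor's API; the sketch only shows the idea of masking a fraction of the observed values uniformly at random.

```python
import numpy as np


def mcar_sketch(X, rate, seed=None):
    """Hypothetical helper (not PyCorruptor's API).

    Returns a copy of X in which a fraction `rate` of the currently
    observed (non-NaN) entries has been replaced with NaN, chosen
    uniformly at random.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()  # never modify the caller's data
    observed = np.flatnonzero(~np.isnan(X))  # flat indices of observed entries
    n_mask = int(round(rate * observed.size))
    chosen = rng.choice(observed, size=n_mask, replace=False)
    X.flat[chosen] = np.nan
    return X


X = np.arange(20, dtype=float).reshape(4, 5)
X_corrupted = mcar_sketch(X, rate=0.25, seed=0)
print(int(np.isnan(X_corrupted).sum()))  # 5 of the 20 entries are masked
```

Working on a copy keeps the intact data available for later comparison, which is what a reconstruction-style evaluation would need.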

11 changes: 0 additions & 11 deletions docs/pycorruptor.tests.rst

This file was deleted.

7 changes: 3 additions & 4 deletions pycorruptor/__init__.py
@@ -1,5 +1,5 @@
"""
PyCorruptor package
PyCorruptor package.
"""

# Created by Wenjie Du <[email protected]>
@@ -24,12 +24,11 @@
__version__ = "0.0.4"

try:
from pycorruptor.corrupt import (
from pycorruptor.mcar import mcar
from pycorruptor.utils import (
cal_missing_rate,
masked_fill,
mcar,
)

except Exception as e:
print(e)

43 changes: 43 additions & 0 deletions pycorruptor/mar.py
@@ -0,0 +1,43 @@
"""
Corrupt data by adding missing values to it with MAR (missing at random) pattern.
"""

# Created by Wenjie Du <[email protected]>
# License: GPL-v3


def mar(X, rate, nan=0):
    """Create random missing values (MAR case).

    Parameters
    ----------
    X : array,
        Data vector. If X has any missing values, they should be numpy.nan.

    rate : float, in (0,1),
        Artificially missing rate, i.e. the fraction of the observed values that will be artificially masked as missing.

        Note that
        `rate` = (number of artificially missing values) / np.sum(~np.isnan(self.data)),
        not (number of artificially missing values) / np.product(self.data.shape).
        The given data may already contain missing values, so the latter definition
        can be confusing: if the original missing rate >= `rate`,
        the function would do nothing, i.e. it would not serve its purpose.

    nan : int/float, optional, default=0
        Value used to fill NaN values.

    Returns
    -------
    """
    # TODO: Create missing values in MAR case
    raise NotImplementedError("MAR case has not been implemented yet.")


def _mar_numpy(X, rate, nan=0):
    pass


def _mar_torch(X, rate, nan=0):
    pass
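
The docstring's note on `rate` can be made concrete with a small NumPy sketch. The helper names below are hypothetical and only illustrate the denominator the docstring describes; they are not part of PyCorruptor.

```python
import numpy as np


def n_to_mask(X, rate):
    """Hypothetical helper: how many observed values get masked when
    `rate` is taken relative to the observed (non-NaN) entries."""
    X = np.asarray(X, dtype=float)
    n_observed = int(np.sum(~np.isnan(X)))
    return int(round(rate * n_observed))


def overall_missing_rate_after(X, rate):
    """Hypothetical helper: total missing rate once the artificial
    mask is added on top of the values that were already missing."""
    X = np.asarray(X, dtype=float)
    n_missing = int(np.sum(np.isnan(X)))
    return (n_missing + n_to_mask(X, rate)) / X.size


# 8 entries, 3 already missing, 5 observed
X = np.array([[1.0, np.nan, 3.0, 4.0],
              [np.nan, np.nan, 7.0, 8.0]])
print(n_to_mask(X, 0.4))                   # 2 of the 5 observed values
print(overall_missing_rate_after(X, 0.4))  # (3 + 2) / 8 = 0.625
```

Here the data is already 37.5% missing, so a `rate` of 0.4 measured against the full array would leave almost nothing to mask. Measured against the observed values, it still masks two entries.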
144 changes: 6 additions & 138 deletions pycorruptor/corrupt.py → pycorruptor/mcar.py
@@ -1,5 +1,5 @@
"""
Corrupt data by adding missing values to it with optional missing patterns (MCAR,MAR,MNAR).
Corrupt data by adding missing values to it with MCAR (missing completely at random) pattern.
"""

# Created by Wenjie Du <[email protected]>
@@ -13,89 +13,6 @@
pass


def cal_missing_rate(X):
    """Calculate the originally missing rate of the raw data.

    Parameters
    ----------
    X : array-like,
        Data array that may contain missing values.

    Returns
    -------
    originally_missing_rate, float,
        The originally missing rate of the raw data.
    """
    if isinstance(X, list):
        X = np.asarray(X)

    if isinstance(X, np.ndarray):
        originally_missing_rate = np.sum(np.isnan(X)) / np.product(X.shape)
    elif isinstance(X, torch.Tensor):
        originally_missing_rate = torch.sum(torch.isnan(X)) / np.product(X.shape)
        originally_missing_rate = originally_missing_rate.item()
    else:
        raise TypeError(
            "X must be type of list/numpy.ndarray/torch.Tensor, " f"but got {type(X)}"
        )

    return originally_missing_rate


def masked_fill(X, mask, val):
    """Like torch.Tensor.masked_fill(), fill elements in given `X` with `val` where `mask` is True.

    Parameters
    ----------
    X : array-like,
        The data vector.
    mask : array-like,
        The boolean mask.
    val : float
        The value to fill in with.

    Returns
    -------
    array,
        mask
    """
    assert X.shape == mask.shape, (
        "Shapes of X and mask must match, "
        f"but X.shape={X.shape}, mask.shape={mask.shape}"
    )
    assert isinstance(X, type(mask)), (
        "Data types of X and mask must match, " f"but got {type(X)} and {type(mask)}"
    )

    if isinstance(X, list):
        X = np.asarray(X)
        mask = np.asarray(mask)

    if isinstance(X, np.ndarray):
        mask = mask.astype(bool)
        X[mask] = val
    elif isinstance(X, torch.Tensor):
        mask = mask.type(torch.bool)
        X[mask] = val
    else:
        raise TypeError(
            "X must be type of list/numpy.ndarray/torch.Tensor, " f"but got {type(X)}"
        )

    return X


def little_mcar_test(X):
    """Little's MCAR Test.

    Refer to :cite:`little1988TestMCAR`
    """
    # TODO: Little's MCAR test
    raise NotImplementedError("MCAR test has not been implemented yet.")


def mcar(X, rate, nan=0):
    """Create completely random missing values (MCAR case).
@@ -195,59 +112,10 @@ def _mcar_torch(X, rate, nan=0):
return X_intact, X, missing_mask, indicating_mask


def mar(X, rate, nan=0):
    """Create random missing values (MAR case).

    Parameters
    ----------
    X : array,
        Data vector. If X has any missing values, they should be numpy.nan.

    rate : float, in (0,1),
        Artificially missing rate, i.e. the fraction of the observed values that will be artificially masked as missing.

        Note that
        `rate` = (number of artificially missing values) / np.sum(~np.isnan(self.data)),
        not (number of artificially missing values) / np.product(self.data.shape).
        The given data may already contain missing values, so the latter definition
        can be confusing: if the original missing rate >= `rate`,
        the function would do nothing, i.e. it would not serve its purpose.

    nan : int/float, optional, default=0
        Value used to fill NaN values.

    Returns
    -------
    """
    # TODO: Create missing values in MAR case
    raise NotImplementedError("MAR case has not been implemented yet.")


def mnar(X, rate, nan=0):
    """Create not-random missing values (MNAR case).

    Parameters
    ----------
    X : array,
        Data vector. If X has any missing values, they should be numpy.nan.

    rate : float, in (0,1),
        Artificially missing rate, i.e. the fraction of the observed values that will be artificially masked as missing.

        Note that
        `rate` = (number of artificially missing values) / np.sum(~np.isnan(self.data)),
        not (number of artificially missing values) / np.product(self.data.shape).
        The given data may already contain missing values, so the latter definition
        can be confusing: if the original missing rate >= `rate`,
        the function would do nothing, i.e. it would not serve its purpose.

    nan : int/float, optional, default=0
        Value used to fill NaN values.

    Returns
    -------
    """
    # TODO: Create missing values in MNAR case
    raise NotImplementedError("MNAR case has not been implemented yet.")


def little_mcar_test(X):
    """Little's MCAR Test.

    Refer to :cite:`little1988TestMCAR`
    """
    # TODO: Little's MCAR test
    raise NotImplementedError("MCAR test has not been implemented yet.")