Skip to content

Latest commit

 

History

History
161 lines (138 loc) · 3.56 KB

README.md

File metadata and controls

161 lines (138 loc) · 3.56 KB

Library modeled after preprocessing module of scikit-learn. This library intends to implement the following transformers

  • StandardScaler
  • MinMaxScaler
  • MaxAbsScaler
  • Binarizer
  • Normalizer
  • OneHotEncoder

TODO:

  • Generalize to handle 1D arrays
julia> using PreProcessing

julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1

julia> clf = fit(StandardScaler, x)
PreProcessing.StandardScaler{Float64}([-1.875, 3.75, 1.5, -1.875], [5.55512, 5.06828, 6.61438, 5.92532], 4, 1)

julia> xnew = transform(clf, x)
8×4 Array{Float64,2}:
  0.877569    0.838548   -0.52915   -1.37123
  0.877569    1.23316    -0.831522   0.822741
 -1.2826      0.0493264   1.13389   -0.527398
  0.337526   -0.147979    1.13389   -1.37123
  1.41761    -2.31834     0.377964  -0.358631
 -0.0225018   0.246632   -0.982708   1.49781
 -1.46261    -0.345285    1.13389    0.822741
 -0.742558    0.443937   -1.43626    0.485206

julia> inverse_transform(clf, xnew)
8×4 Array{Float64,2}:
   3.0   8.0  -2.0  -10.0
   3.0  10.0  -4.0    3.0
  -9.0   4.0   9.0   -5.0
   0.0   3.0   9.0  -10.0
   6.0  -8.0   4.0   -4.0
  -2.0   5.0  -5.0    7.0
 -10.0   2.0   9.0    3.0
  -6.0   6.0  -8.0    1.0
julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1

julia> clf = fit(MinMaxScaler, x, range_min=-4, range_max=4)
PreProcessing.MinMaxScaler{Float64,Int64}([-10.0, -8.0, -8.0, -10.0], [6.0, 10.0, 9.0, 7.0], -4, 4, 4, 1)

julia> xnew = transform(clf, x)
8×4 Array{Float64,2}:
 3.25  3.55556  1.41176   0.0    
 3.25  4.0      0.941176  3.05882
 0.25  2.66667  4.0       1.17647
 2.5   2.44444  4.0       0.0    
 4.0   0.0      2.82353   1.41176
 2.0   2.88889  0.705882  4.0    
 0.0   2.22222  4.0       3.05882
 1.0   3.11111  0.0       2.58824

julia> inverse_transform(clf, xnew)
8×4 Array{Float64,2}:
   3.0   8.0  -2.0  -10.0
   3.0  10.0  -4.0    3.0
  -9.0   4.0   9.0   -5.0
   0.0   3.0   9.0  -10.0
   6.0  -8.0   4.0   -4.0
  -2.0   5.0  -5.0    7.0
 -10.0   2.0   9.0    3.0
  -6.0   6.0  -8.0    1.0

julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1

julia> clf = fit(Binarizer, x)
PreProcessing.Binarizer{Int64}(0, 4, 1)

julia> xnew = transform(clf, x)
8×4 Array{Int64,2}:
 1  1  0  0
 1  1  0  1
 0  1  1  0
 0  1  1  0
 1  0  1  0
 0  1  0  1
 0  1  1  1
 0  1  0  1

julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1
  
julia> clf = fit(MaxAbsScaler, x)
MaxAbsScaler transformer with 4 features

julia> xnew = transform(clf, x)
8×4 Array{Float64,2}:
  0.3   0.8  -0.222222  -1.0
  0.3   1.0  -0.444444   0.3
 -0.9   0.4   1.0       -0.5
  0.0   0.3   1.0       -1.0
  0.6  -0.8   0.444444  -0.4
 -0.2   0.5  -0.555556   0.7
 -1.0   0.2   1.0        0.3
 -0.6   0.6  -0.888889   0.1

julia> inverse_transform(clf, xnew)
8×4 Array{Float64,2}:
   3.0   8.0  -2.0  -10.0
   3.0  10.0  -4.0    3.0
  -9.0   4.0   9.0   -5.0
   0.0   3.0   9.0  -10.0
   6.0  -8.0   4.0   -4.0
  -2.0   5.0  -5.0    7.0
 -10.0   2.0   9.0    3.0
  -6.0   6.0  -8.0    1.0