Title: | Fast Algorithms for Large Scale Generalized Distance Weighted Discrimination |
---|---|
Description: | Solving large scale distance weighted discrimination. The main algorithm is a symmetric Gauss-Seidel based alternating direction method of multipliers (ADMM) method. See Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018) <doi:10.48550/arXiv.1604.05473> for more details. |
Authors: | Xin-Yee Lam [aut, cre], J.S. Marron [aut], Defeng Sun [aut], Kim-Chuan Toh [aut] |
Maintainer: | Xin-Yee Lam <[email protected]> |
License: | GPL-2 |
Version: | 0.2-0 |
Built: | 2024-11-04 21:42:37 UTC |
Source: | https://github.com/cran/DWDLargeR |
Solves large scale distance weighted discrimination (DWD) problems. The main algorithm is a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM).
The package DWDLargeR contains two main functions: penaltyParameter and genDWD.
Xin-Yee Lam, J.S. Marron, Defeng Sun, and Kim-Chuan Toh
Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018)
"Fast algorithms for large scale generalized distance weighted discrimination", Journal of Computational and Graphical Statistics, forthcoming.
https://arxiv.org/abs/1604.05473
Solve the generalized DWD model using a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM).
genDWD(X,y,C,expon, tol = 1e-5, maxIter = 2000, method = 1, printDetails = 0, rmzeroFea = 1, scaleFea = 1)
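X | A d x n matrix of n training samples with d features. |
y | A vector of length n of training labels. Each element of y is either -1 or 1. |
C | A number representing the penalty parameter for the generalized DWD model. |
expon | A positive number representing the exponent $q$ of the residual $r_i$ in the generalized DWD model. |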
tol | The stopping tolerance for the algorithm. (Default = 1e-5) |
maxIter | Maximum number of iterations allowed for the algorithm. (Default = 2000) |
method | Method for solving the generalized DWD model. The default is 1, which selects the highly efficient sGS-ADMM algorithm. Users can also select method = 2 for the directly extended ADMM solver. |
printDetails | Switch for printing details of the algorithm. Default is 0 (not printing). |
rmzeroFea | Switch for removing zero features in the data matrix. Default is 1 (removing zero features). |
scaleFea | Switch for scaling features in the data matrix, so that the features have roughly similar magnitudes. Default is 1 (scaling features). |
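The two preprocessing switches can be pictured with a short base-R sketch. It is only an illustration under assumptions: a zero feature is taken to mean a row of X whose entries are all zero, and scaling divides each remaining row by its largest absolute value. The package may use a different scaling rule, and `preprocess_features` is a hypothetical name that is not part of DWDLargeR.

```r
# Hypothetical illustration of the rmzeroFea / scaleFea idea on a d x n matrix.
preprocess_features <- function(X, rmzeroFea = 1, scaleFea = 1) {
  if (rmzeroFea == 1) {
    keep <- rowSums(abs(X)) > 0            # features with at least one nonzero entry
    X <- X[keep, , drop = FALSE]
  }
  if (scaleFea == 1) {
    scale_by <- apply(abs(X), 1, max)      # assumed rule: largest magnitude per feature
    scale_by[scale_by == 0] <- 1           # leave all-zero rows unchanged
    X <- X / scale_by                      # divides row i by scale_by[i]
  }
  X
}

# 3 features (rows), 3 samples (columns); the second feature is all zero.
X <- matrix(c(1, 0, 100,
              2, 0, -400,
              4, 0, 200), nrow = 3)
X_clean <- preprocess_features(X)
print(X_clean)
```

After this step the two remaining features both have a largest magnitude of 1, even though their raw scales differed by a factor of 100.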
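This is a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM) algorithm for solving the generalized DWD model of the following formulation:
$$\min \; \sum_{i=1}^{n} \theta_q(r_i) + C \, e^T \xi$$
subject to the constraints
$$Z^T w + \beta y + \xi - r = 0, \quad \|w\| \le 1, \quad \xi \ge 0,$$
where $Z = X\,\mathrm{diag}(y)$, $e$ is a given positive vector such that $\|e\|_\infty = 1$, and $\theta_q$ is a function defined by $\theta_q(t) = 1/t^q$ if $t > 0$ and $\theta_q(t) = \infty$ if $t \le 0$.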
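To make the objective of the generalized DWD model concrete, the following base-R sketch evaluates it for given variables, assuming the formulation in Lam et al. (2018): the penalty is 1/t^q for a positive residual t and infinite otherwise, and the residual is r = Z^T w + beta * y + xi with Z = X diag(y). The helper names `theta_q` and `gdwd_objective` are hypothetical and not exported by DWDLargeR; labels are assumed to be coded as -1/+1, and `e` is taken to be the all-ones vector.

```r
# Hypothetical helpers, not part of DWDLargeR.
# theta_q(t) = 1 / t^q for t > 0 and Inf otherwise.
theta_q <- function(t, q) {
  ifelse(t > 0, 1 / t^q, Inf)
}

# Primal objective sum(theta_q(r)) + C * <e, xi>,
# with r = Z^T w + beta * y + xi and Z = X diag(y).
gdwd_objective <- function(X, y, w, beta, xi, C, q, e = rep(1, length(y))) {
  Z <- sweep(X, 2, y, `*`)                 # multiply column i of X by y[i]
  r <- as.vector(crossprod(Z, w)) + beta * y + xi
  sum(theta_q(r, q)) + C * sum(e * xi)
}

# Tiny synthetic check: d = 2 features, n = 4 samples stored as columns.
X <- matrix(c(2, 0, 3, 0, -2, 0, -3, 0), nrow = 2)
y <- c(1, 1, -1, -1)
w <- c(1, 0)                               # unit normal, ||w|| = 1
obj <- gdwd_objective(X, y, w, beta = 0, xi = rep(0, 4), C = 1, q = 1)
print(obj)
```

With these values every residual is positive (2, 3, 2, 3), so the slack term vanishes and the objective is just the sum of the reciprocal residuals.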
A list containing the results of the algorithm.
w | The unit normal of the hyperplane that separates the two classes. |
beta | The distance of the hyperplane to the origin (the scalar $\beta$ in the generalized DWD model). |
xi | A slack variable of length n, with one entry per training sample. |
r | The residual $r = Z^T w + \beta y + \xi$, where $Z = X\,\mathrm{diag}(y)$. |
alpha | Dual variable of the generalized DWD model. |
info | A list containing information from the algorithm. |
runhist | A list containing the run history over the iterations. |
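A separating hyperplane is commonly used to classify samples by the sign of the decision value. The sketch below assumes that convention (decision value `t(X) %*% w + beta`, labels coded as -1/+1). `predict_dwd` is a hypothetical helper, not a function of DWDLargeR, and the list `fit` is a hand-made stand-in with the same `w` and `beta` fields as the returned list rather than actual output of genDWD.

```r
# Hypothetical helper: classify the columns of a d x n matrix X
# by the side of the hyperplane {x : <w, x> + beta = 0} they fall on.
predict_dwd <- function(fit, X) {
  scores <- as.vector(crossprod(X, fit$w)) + fit$beta
  ifelse(scores >= 0, 1, -1)
}

# Stand-in for a fitted model (values chosen by hand for illustration).
fit <- list(w = c(0.6, 0.8), beta = -1)

# Three new samples stored as columns (d = 2).
X_new <- matrix(c(2, 2,
                  0, 0,
                  1, 1), nrow = 2)
pred <- predict_dwd(fit, X_new)
print(pred)
```

Comparing such predictions with the known labels, for example `mean(pred == y)`, gives a quick accuracy check for a fitted model.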
library(DWDLargeR)
# load the data
data("mushrooms")
# calculate the best penalty parameter
C = penaltyParameter(mushrooms$X,mushrooms$y,expon=1)
# solve the generalized DWD model
result = genDWD(mushrooms$X,mushrooms$y,C,expon=1)
This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family, drawn from The Audubon Society Field Guide to North American Mushrooms (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like “leaflets three, let it be” for Poisonous Oak and Ivy.
data(mushrooms)
A list containing a 112 x 8124 matrix X of 8124 training samples with 112 features, and a vector y of 8124 training labels.
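Because samples are stored as columns here, data held in the more common one-row-per-sample layout needs transposing before use. A minimal sketch, assuming binary labels are coded as -1 and +1 and using a made-up data frame `df` with hypothetical column names:

```r
# Made-up data in the usual layout: one row per sample.
df <- data.frame(
  f1    = c(0.2, 1.5, -0.7, 2.1),
  f2    = c(1.0, 0.3,  0.8, -1.2),
  label = c("edible", "poisonous", "edible", "poisonous")
)

# Features become a d x n matrix: one column per sample.
X <- t(as.matrix(df[, c("f1", "f2")]))

# Assumed coding: +1 for one class, -1 for the other.
y <- ifelse(df$label == "edible", 1, -1)

print(dim(X))   # features x samples
print(y)
```

The number of columns of `X` should always match the length of `y`.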
The data can be downloaded from the UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Mushroom
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Find the best penalty parameter for the generalized distance weighted discrimination (DWD) model.
penaltyParameter(X,y,expon,rmzeroFea = 1, scaleFea = 1)
X | A d x n matrix of n training samples with d features. |
y | A vector of length n of training labels. Each element of y is either -1 or 1. |
expon | A positive number representing the exponent $q$ of the residual $r_i$ in the generalized DWD model. |
rmzeroFea | Switch for removing zero features in the data matrix. Default is 1 (removing zero features). |
scaleFea | Switch for scaling features in the data matrix, so that the features have roughly similar magnitudes. Default is 1 (scaling features). |
The best parameter is empirically found to be inversely proportional to the typical distance between different samples raised to the power of (expon + 1). It also depends on the sample size n and the feature dimension d.
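The inverse-power scaling can be illustrated numerically. The sketch below rests on assumptions: the typical distance is taken to be the median Euclidean distance between samples from opposite classes, the power is taken to be expon + 1, and `scale_const` is a hypothetical constant. It does not reproduce the package's actual formula, which also accounts for the sample size and feature dimension; `typical_distance` and `rough_penalty` are made-up names.

```r
# Hypothetical sketch: C proportional to 1 / (typical distance)^(expon + 1).
typical_distance <- function(X, y) {
  pos <- X[, y == 1, drop = FALSE]
  neg <- X[, y == -1, drop = FALSE]
  # Squared distances between every positive and every negative column.
  d2 <- outer(colSums(pos^2), colSums(neg^2), `+`) - 2 * crossprod(pos, neg)
  median(as.vector(sqrt(pmax(d2, 0))))     # clamp tiny negative rounding errors
}

rough_penalty <- function(X, y, expon, scale_const = 100) {
  scale_const / typical_distance(X, y)^(expon + 1)
}

# d = 2, n = 4: the classes sit at (5, 0) and (-5, 0),
# so every cross-class distance is 10.
X <- matrix(c(5, 0, 5, 0, -5, 0, -5, 0), nrow = 2)
y <- c(1, 1, -1, -1)
C_rough <- rough_penalty(X, y, expon = 1)
print(C_rough)
```

Doubling all distances in this toy example divides the rough penalty by 2^(expon + 1), which is the scaling behaviour the description refers to.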
A number which represents the best penalty parameter for the generalized DWD model.
library(DWDLargeR)
# load the data
data("mushrooms")
# calculate the best penalty parameter
C = penaltyParameter(mushrooms$X,mushrooms$y,expon=1)