Title: | Maximum Diversity Weighting |
---|---|
Description: | Dimension-reduction methods that aim to define a score maximizing signal diversity. Three approaches are provided: tree weights, maximum entropy weights, and maximum variance weights. These methods are described in He and Fong (2019) <DOI:10.1002/sim.8212>. |
Authors: | Zonglin He [aut], Youyi Fong [cre] |
Maintainer: | Youyi Fong <[email protected]> |
License: | GPL-2 |
Version: | 2024.8-1 |
Built: | 2025-01-20 05:26:16 UTC |
Source: | https://github.com/cran/mdw |
asym.v.e produces the estimated asymptotic covariance matrix of the first p-1 maximum entropy weights (only p-1 are needed because the p weights sum to 1).
asym.v.e(X, w, h)
X | n by p matrix containing observations of p biomarkers for n subjects. |
w | maximum entropy weights for dataset X, computed with bandwidth h. |
h | bandwidth for kernel density estimation. |
    library(MASS)
    # a three-biomarker dataset generated from independent normal(0,1)
    X = mvrnorm(n = 100, mu=rep(0,3), Sigma=diag(3), tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    h = 1
    w <- entropy.weight(X,h)
    asym.v.e(X,w,h)
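Because the returned matrix covers only the first p-1 weights, the variance of the last weight and the full p by p covariance follow from the sum-to-one constraint. The sketch below illustrates this with a made-up 2 by 2 matrix `V` standing in for the output of asym.v.e with p = 3; the numbers are hypothetical, not package output, and any scaling by the sample size n is not addressed here.

```r
# Hypothetical covariance of the first p-1 = 2 weights (illustrative values only)
V <- matrix(c( 0.04, -0.01,
              -0.01,  0.09), nrow = 2, byrow = TRUE)

# The weights satisfy w_p = 1 - (w_1 + ... + w_{p-1}), so the full weight vector
# is an affine map of the first p-1 weights with Jacobian A
A <- rbind(diag(2), c(-1, -1))

# Covariance of all p weights implied by V
V_full <- A %*% V %*% t(A)

# Variance of the last weight equals the sum of all entries of V
var_last <- sum(V)

# Standard errors of all p weights on the scale of V
se_all <- sqrt(diag(V_full))
```

Each row of `V_full` sums to zero, which reflects the fact that the weights cannot all move in the same direction.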
asym.v.v produces the estimated asymptotic covariance matrix of the first p-1 maximum variance weights (only p-1 are needed because the p weights sum to 1).
asym.v.v(X, w)
X | n by p matrix containing observations of p biomarkers for n subjects. |
w | maximum variance weights for dataset X. |
    library(MASS)
    # a three-biomarker dataset generated from independent normal(0,1)
    X = mvrnorm(n = 100, mu=rep(0,3), Sigma=diag(3), tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    w <- var.weight(X)
    asym.v.v(X,w)
entropy.weight produces a set of weights that maximizes the total weighted entropy of the distribution of different biomarkers within each subject. Biomarker values can be either continuous or categorical.
entropy.weight(X, h)
X | n by p matrix containing observations of p biomarkers for n subjects. |
h | bandwidth for kernel density estimation. If the data are categorical, set h to 'na'. |
    library(MASS)
    # a three-biomarker dataset generated from independent normal(0,1)
    set.seed(1)
    X = mvrnorm(n = 100, mu=rep(0,3), Sigma=diag(3), tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    entropy.weight(X, h=1)

    ###
    # a dataset of three categorical biomarkers
    set.seed(1)
    tmp=mvrnorm(n=10,mu=c(0,0,0),Sigma = diag(3))
    dat=t(apply(tmp, 1, function(x) cut(x,c(-Inf,-0.5,0.5,Inf),labels=1:3)))
    entropy.weight(dat,h='na')
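To make the categorical case concrete, the sketch below evaluates one plausible reading of the "total weighted entropy" for a fixed weight vector: within each subject, the weights of biomarkers sharing a category are pooled, and the Shannon entropy of the pooled masses is summed over subjects. The helper `weighted_entropy` is hypothetical and is not part of the package; the objective that entropy.weight actually optimizes may differ in detail.

```r
# Hypothetical helper: total within-subject weighted entropy for categorical data.
# dat: n by p matrix of category labels; w: nonnegative weights that sum to 1.
weighted_entropy <- function(dat, w) {
  total <- 0
  for (i in seq_len(nrow(dat))) {
    # pool the weight mass on each category observed for subject i
    pk <- tapply(w, as.character(dat[i, ]), sum)
    pk <- pk[pk > 0]                  # drop empty mass to avoid 0 * log(0)
    total <- total - sum(pk * log(pk))
  }
  total
}

# Two subjects, three biomarkers: the first subject has three distinct
# categories, the second has the same category for every biomarker
dat <- matrix(c("1", "2", "3",
                "1", "1", "1"), nrow = 2, byrow = TRUE)
w <- rep(1 / 3, 3)
ent <- weighted_entropy(dat, w)
```

With equal weights the first subject contributes log(3) and the second contributes 0, so subjects whose biomarkers disagree are the ones that drive the objective.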
get.bw applies a specified bandwidth selection method to the dataset subject by subject and returns the median of the n selected bandwidths as the choice of bandwidth for entropy.weight.
get.bw(x, bw = c("nrd", "ucv", "bcv", "SJ"), nb)
x | n by p matrix containing observations of p biomarkers for n subjects. |
bw | bandwidth selector: one of nrd, ucv, bcv, or SJ, corresponding to the R functions bw.nrd, bw.ucv, bw.bcv, and bw.SJ. |
nb | number of bins to use; set to 'na' if bw='nrd'. |
    library(MASS)
    # a ten-biomarker dataset generated from independent normal(0,1)
    x = mvrnorm(n = 100, mu=rep(0,10), Sigma=diag(10), tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    get.bw(x,bw='ucv',nb=100)
    get.bw(x,bw='nrd',nb='na')
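The subject-wise selection described for get.bw can be approximated in base R as follows. This is a sketch under the assumption that each row (subject) is passed to the selector on its own and the median is taken afterwards; it uses only `stats::bw.nrd` and `stats::bw.ucv` and does not call the package, so results need not match get.bw exactly.

```r
set.seed(1)
# 100 subjects, 10 biomarkers, independent normal(0,1)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

# One bandwidth per subject using the normal reference rule, then the median
bw_nrd <- apply(x, 1, stats::bw.nrd)
h_nrd <- median(bw_nrd)

# Same idea with unbiased cross-validation and 100 bins; bw.ucv may warn when
# the minimum sits at the edge of its search range, so warnings are silenced
bw_ucv <- apply(x, 1, function(row) suppressWarnings(stats::bw.ucv(row, nb = 100)))
h_ucv <- median(bw_ucv)
```

Taking the median rather than the mean keeps a few subjects with unusual spread from dominating the bandwidth passed on to entropy.weight.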
pca.weight produces the coefficients of the first principal component.
pca.weight(emp.cor)
emp.cor | empirical correlation matrix of the dataset. |
    library(MASS)
    # a three-biomarker dataset generated from independent normal(0,1)
    X = mvrnorm(n = 100, mu=rep(0,3), Sigma=diag(3), tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    emp.cor <- cor(X)
    pca.weight(emp.cor)
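For reference, the first principal component of a correlation matrix is its leading eigenvector. The base R sketch below computes it with `eigen` and cross-checks it against `prcomp` on standardized data. How pca.weight normalizes or orients the coefficients is not stated here, so the unit-length vector below is an assumption and may differ from the package output by sign or scale.

```r
set.seed(1)
# 100 subjects, 3 biomarkers, independent normal(0,1)
X <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)
emp.cor <- cor(X)

# Leading eigenvector of the correlation matrix (unit length, sign arbitrary)
eig <- eigen(emp.cor, symmetric = TRUE)
v1 <- eig$vectors[, 1]

# Cross-check: PCA on centered and scaled data diagonalizes the same matrix
pc <- prcomp(X, center = TRUE, scale. = TRUE)
agreement <- abs(sum(v1 * pc$rotation[, 1]))  # 1 when the directions coincide
```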
tree.weight
Produces a set of weights for different endpoints based on a correlation matrix, using the GSC tree method.
tree.weight (cor.mat, method="GSC", clustering.method="average", plot=TRUE, orientation=c("vertical","horizontal"), ...)
cor.mat | a matrix; the correlation matrix. |
method | a string. GSC, an implementation of Gerstein et al. (1994), is currently the only method implemented. |
clustering.method | a string specifying how the bottom-up hierarchical clustering tree is built; passed to hclust as the method parameter. |
plot | a Boolean; whether to plot the tree. |
orientation | vertical or horizontal. |
... | additional arguments. |
A vector of weights that sum to 1.
Youyi Fong [email protected]
Gerstein, M., Sonnhammer, E., and Chothia, C. (1994), Volume changes in protein evolution. J Mol Biol, 236, 1067-78.
    cor.mat=diag(rep(1,3))
    cor.mat[1,2]<-cor.mat[2,1]<-0.9
    cor.mat[1,3]<-cor.mat[3,1]<-0.1
    cor.mat[2,3]<-cor.mat[3,2]<-0.1
    tree.weight(cor.mat)
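The clustering step that clustering.method controls can be previewed with `hclust` alone. The sketch below builds an average-linkage tree for the example correlation matrix, using 1 minus the correlation as the distance. That distance is an assumption made for illustration, and the GSC weighting step that turns the tree into weights is not reproduced here.

```r
# Same correlation matrix as in the example above
cor.mat <- diag(rep(1, 3))
cor.mat[1, 2] <- cor.mat[2, 1] <- 0.9
cor.mat[1, 3] <- cor.mat[3, 1] <- 0.1
cor.mat[2, 3] <- cor.mat[3, 2] <- 0.1

# Assumed dissimilarity: highly correlated endpoints are close together
d <- as.dist(1 - cor.mat)

# Bottom-up hierarchical clustering with average linkage
hc <- hclust(d, method = "average")

# Each row of hc$merge is one join; negative entries denote single endpoints
first_join <- hc$merge[1, ]
```

Endpoints 1 and 2 are joined first because they are the most strongly correlated pair, and endpoint 3 attaches to that cluster last. A tree-based weighting scheme would therefore be expected to treat endpoints 1 and 2 as partly redundant.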
var.weight produces a set of weights that maximizes the total weighted variance of the distribution of different biomarkers within each subject.
var.weight(X, method = c("optim", "mosek"))
X | n by p matrix containing observations of p biomarkers for n subjects. |
method | optim (default) uses the constrOptim function from the stats package for optimization; mosek uses the mosek function from the Rmosek package. |
    library(MASS)
    # a three-biomarker dataset generated from independent normal(0,1)
    X = mvrnorm(n = 100, mu=rep(0,3), Sigma=diag(3), tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    # compute maximum variance weights using constrOptim for optimization
    var.weight(X)
    ## Not run:
    # need mosek installed
    # compute maximum variance weights using mosek for optimization
    library(Rmosek)
    var.weight(X,'mosek')
    ## End(Not run)
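As a rough illustration of the optim route, the sketch below maximizes one plausible form of the total within-subject weighted variance over the simplex with `stats::constrOptim`. The objective is an assumption: for subject i it is taken to be the sum over j of w_j (x_ij - m_i)^2, where m_i is the weighted mean of that subject's biomarkers. The function `neg_total_var` and the parameterization by the first p-1 weights are illustrative choices, not the package's implementation.

```r
set.seed(1)
n <- 100
p <- 3
# independent normal(0,1) biomarkers
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Negative total weighted variance as a function of the first p-1 weights;
# the last weight is determined by the sum-to-one constraint
neg_total_var <- function(theta) {
  w <- c(theta, 1 - sum(theta))
  m <- as.vector(X %*% w)           # weighted mean within each subject
  v <- as.vector(X^2 %*% w) - m^2   # weighted variance within each subject
  -sum(v)
}

# Linear constraints ui %*% theta - ci >= 0:
# each free weight is nonnegative and their sum does not exceed 1
ui <- rbind(diag(p - 1), rep(-1, p - 1))
ci <- c(rep(0, p - 1), -1)

# Start from equal weights, which lie strictly inside the feasible region;
# with grad = NULL, constrOptim uses Nelder-Mead
fit <- constrOptim(theta = rep(1 / p, p - 1), f = neg_total_var,
                   grad = NULL, ui = ui, ci = ci)
w_hat <- c(fit$par, 1 - sum(fit$par))
```

Working with the first p-1 weights mirrors the way asym.v.v reports a covariance matrix for only p-1 weights.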