Title: | Maximum Likelihood Principal Component Analysis |
---|---|
Description: | R implementation of Maximum Likelihood Principal Component Analysis. The main idea of this package is to provide an alternative to PCA for subspace modeling that takes measurement errors into account. More details can be found in Peter D. Wentzell (2009) <doi:10.1016/B978-0-444-64165-6.03029-9>. |
Authors: | Renan Santos Barbosa [aut, cre] |
Maintainer: | Renan Santos Barbosa <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2024-11-15 04:06:47 UTC |
Source: | https://github.com/renands/rmlpca |
A random covariance matrix used to simulate data errors. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
cov_d
A matrix with 20 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
A random array of covariance matrices used to simulate data errors. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
cov_e
An array of dimension 20,20,30
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
A dataset generated by the rotation of a bivariate normal density. The method used to generate this dataset is described in Wentzell, P. D., and S. Hou. "Exploratory data analysis with noisy measurements." Journal of Chemometrics 26.6 (2012): 264-281.
data_clean
A matrix with 300 rows and 20 columns
Wentzell, P. D., and S. Hou. "Exploratory data analysis with noisy measurements." Journal of Chemometrics 26.6 (2012): 264-281.
A dataset generated by the rotation of a bivariate normal density. The method used to generate this dataset is described in Wentzell, P. D., and S. Hou. "Exploratory data analysis with noisy measurements." Journal of Chemometrics 26.6 (2012): 264-281.
data_clean_e
A matrix with 30 rows and 20 columns
Wentzell, P. D., and S. Hou. "Exploratory data analysis with noisy measurements." Journal of Chemometrics 26.6 (2012): 264-281.
A dataset where the values are estimated after mlpca_b is applied.
data_cleaned_mlpca_b
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
A dataset where the values are estimated after mlpca_c is applied.
data_cleaned_mlpca_c
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
A dataset where the values are estimated after mlpca_d is applied.
data_cleaned_mlpca_d
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
A dataset where the values are estimated after mlpca_e is applied.
data_cleaned_mlpca_e
A matrix with 30 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
A dataset where each column contains values from a normal density with mean = 0 and a standard deviation between 0.2 and 1; the standard deviation differs from column to column. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
data_error_b
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
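The following is a minimal sketch of how errors of this kind could be simulated in base R. The evenly spaced column standard deviations, the seed, and the names `col_sd`, `sds_sketch`, and `errors_sketch` are assumptions made for illustration; the script that produced the packaged dataset may differ.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility
n_rows <- 300
n_cols <- 20

# Assumption: one standard deviation per column, evenly spaced from 0.2 to 1
col_sd <- seq(0.2, 1, length.out = n_cols)

# Matrix of standard deviations, constant within each column
sds_sketch <- matrix(col_sd, nrow = n_rows, ncol = n_cols, byrow = TRUE)

# Zero-mean normal errors; rnorm() recycles `sd` in column-major order,
# which matches as.vector(sds_sketch)
errors_sketch <- matrix(
  rnorm(n_rows * n_cols, mean = 0, sd = as.vector(sds_sketch)),
  nrow = n_rows, ncol = n_cols
)
```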
A dataset where each column contains values from a normal density with mean = 0 and standard deviations simulated from a lognormal density with meanlog = -4.75 and sdlog = 2.5; all the standard deviations are different. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
data_error_c
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
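As a rough illustration of the description above, heteroscedastic errors with element-specific standard deviations could be simulated as follows. The seed and the names `sds_sketch` and `errors_sketch` are assumptions; this is not the script used to build the packaged dataset.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility
n_rows <- 300
n_cols <- 20

# One standard deviation per matrix element, drawn from a lognormal density
sds_sketch <- matrix(
  rlnorm(n_rows * n_cols, meanlog = -4.75, sdlog = 2.5),
  nrow = n_rows, ncol = n_cols
)

# Zero-mean normal errors, each drawn with its own standard deviation
errors_sketch <- matrix(
  rnorm(n_rows * n_cols, mean = 0, sd = as.vector(sds_sketch)),
  nrow = n_rows, ncol = n_cols
)
```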
A dataset where the values come from a 20-dimensional multivariate normal density where all the means are 0 and the covariance matrix is cov_d. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
data_error_d
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
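Correlated errors with a common row covariance can be drawn in base R through a Cholesky factor. The sketch below uses a hypothetical covariance matrix `cov_sketch` in place of cov_d; the seed and all object names are assumptions for illustration only.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility
n_rows <- 300
n_cols <- 20

# Hypothetical symmetric positive definite covariance matrix (stand-in for cov_d)
A <- matrix(rnorm(2 * n_cols * n_cols), nrow = 2 * n_cols, ncol = n_cols)
cov_sketch <- crossprod(A) / (2 * n_cols)

# chol() returns an upper triangular R with t(R) %*% R equal to cov_sketch
R <- chol(cov_sketch)

# Rows of Z are standard normal; rows of Z %*% R have covariance cov_sketch
Z <- matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
errors_sketch <- Z %*% R
```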
A dataset where the values come from a 20-dimensional multivariate normal density where all the means are 0 and the covariance matrix of each row is taken from cov_e. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
data_error_e
A matrix with 30 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
Performs maximum likelihood principal components analysis for mode B error conditions (independent errors, homoscedastic within a column). Equivalent to performing PCA on data scaled by the error SD, but results are rescaled to the original space.
mlpca_b(X, Xsd, p)
X | MxN matrix of measurements. |
Xsd | MxN matrix of measurement error standard deviations. |
p | Rank of the model's subspace; p must be less than the minimum of M and N. |
The returned parameters, U, S and V, are analogs to the truncated SVD solution, but have somewhat different properties since they represent the MLPCA solution. In particular, the solutions for different values of p are not necessarily nested (the rank 1 solution may not be in the space of the rank 2 solution) and the eigenvectors do not necessarily account for decreasing amounts of variance, since MLPCA is a subspace modeling technique and not a variance modeling technique.
The parameters returned are the results of SVD on the estimated subspace. The quantity Ssq represents the sum of squares of weighted residuals. All the results are nested in a list format.
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
library(RMLPCA)
data(data_clean)
data(data_error_b)
data(sds_b)

# data that you will usually have on hand
data_noisy <- data_clean + data_error_b

# run mlpca_b with rank p = 2
results <- RMLPCA::mlpca_b(
  X = data_noisy,
  Xsd = sds_b,
  p = 2
)

# estimated clean dataset
data_cleaned_mlpca <- results$U %*% results$S %*% t(results$V)
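The scale, decompose, rescale idea stated in the description of mode B can be sketched in base R without the package. The function `mode_b_sketch()`, its `col_sd` argument (one standard deviation per column), and the synthetic data are hypothetical and only illustrate the principle; they are not the implementation behind mlpca_b.

```r
# Hypothetical helper, not part of RMLPCA
mode_b_sketch <- function(X, col_sd, p) {
  # Scale each column by its error standard deviation
  X_scaled <- sweep(X, 2, col_sd, "/")

  # Rank-p truncated SVD in the scaled space
  dec <- svd(X_scaled, nu = p, nv = p)
  X_hat_scaled <- dec$u %*% diag(dec$d[1:p], nrow = p) %*% t(dec$v)

  # Rescale the estimate back to the original space
  X_hat <- sweep(X_hat_scaled, 2, col_sd, "*")

  # Sum of squares of the weighted residuals
  ssq <- sum(sweep(X - X_hat, 2, col_sd, "/")^2)

  list(X_hat = X_hat, Ssq = ssq)
}

# Synthetic rank-2 data with column-wise homoscedastic noise
set.seed(1)
scores <- matrix(rnorm(50 * 2), 50, 2)
loadings <- matrix(rnorm(2 * 8), 2, 8)
col_sd <- seq(0.1, 0.8, length.out = 8)
noise <- matrix(rnorm(50 * 8, sd = rep(col_sd, each = 50)), 50, 8)
X_noisy <- scores %*% loadings + noise

fit <- mode_b_sketch(X_noisy, col_sd, p = 2)
```

In this sketch the weighted residual sum of squares equals the sum of the squared singular values discarded in the scaled space, which is a convenient sanity check.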
Performs maximum likelihood principal components analysis for mode C error conditions (independent errors, general heteroscedastic case). Employs an alternating least squares (ALS) algorithm.
mlpca_c(X, Xsd, p, MaxIter = 20000)
X | MxN matrix of measurements |
Xsd | MxN matrix of measurement error standard deviations |
p | Rank of the model's subspace; p must be less than the minimum of M and N |
MaxIter | Maximum number of iterations |
The returned parameters, U, S and V, are analogs to the truncated SVD solution, but have somewhat different properties since they represent the MLPCA solution. In particular, the solutions for different values of p are not necessarily nested (the rank 1 solution may not be in the space of the rank 2 solution) and the eigenvectors do not necessarily account for decreasing amounts of variance, since MLPCA is a subspace modeling technique and not a variance modeling technique.
The parameters returned are the results of SVD on the estimated subspace. The quantity Ssq represents the sum of squares of weighted residuals. ErrFlag indicates the convergence condition, with 0 indicating normal termination and 1 indicating that the maximum number of iterations has been exceeded.
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
library(RMLPCA)
data(data_clean)
data(data_error_c)
data(sds_c)

# data that you will usually have on hand
data_noisy <- data_clean + data_error_c

# run mlpca_c with rank p = 2
results <- RMLPCA::mlpca_c(
  X = data_noisy,
  Xsd = sds_c,
  p = 2
)

# estimated clean dataset
data_cleaned_mlpca <- results$U %*% results$S %*% t(results$V)
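To give a feel for what an alternating least squares scheme for the heteroscedastic case can look like, here is a simplified base R sketch. It alternates weighted least squares projections of the rows and of the columns onto the current subspace. The function `als_mode_c_sketch()`, its stopping rule, and the synthetic data are assumptions for illustration; the package's mlpca_c may differ in initialization, convergence test, and returned values.

```r
# Hypothetical helper, not part of RMLPCA
als_mode_c_sketch <- function(X, Xsd, p, max_iter = 500, tol = 1e-10) {
  W <- 1 / Xsd^2                       # inverse error variances (weights)
  V <- svd(X, nu = 0, nv = p)$v        # initial subspace from ordinary SVD
  X_hat <- X
  ssq_old <- Inf
  converged <- FALSE

  for (iter in seq_len(max_iter)) {
    # Row step: weighted projection of each row onto span(V)
    for (i in seq_len(nrow(X))) {
      VtW <- t(V * W[i, ])             # t(V) %*% diag(W[i, ])
      X_hat[i, ] <- V %*% solve(VtW %*% V, VtW %*% X[i, ])
    }
    U <- svd(X_hat, nu = p, nv = 0)$u

    # Column step: weighted projection of each column onto span(U)
    for (j in seq_len(ncol(X))) {
      UtW <- t(U * W[, j])             # t(U) %*% diag(W[, j])
      X_hat[, j] <- U %*% solve(UtW %*% U, UtW %*% X[, j])
    }
    V <- svd(X_hat, nu = 0, nv = p)$v

    # Objective: sum of squares of the weighted residuals
    ssq <- sum((X - X_hat)^2 * W)
    if (abs(ssq_old - ssq) <= tol * ssq) {
      converged <- TRUE
      break
    }
    ssq_old <- ssq
  }

  list(X_hat = X_hat, Ssq = ssq, iterations = iter, converged = converged)
}

# Synthetic rank-2 data with element-wise heteroscedastic noise
set.seed(2)
truth <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 6), 2, 6)
Xsd <- matrix(runif(40 * 6, 0.05, 0.5), 40, 6)
X_noisy <- truth + matrix(rnorm(40 * 6, sd = as.vector(Xsd)), 40, 6)

fit <- als_mode_c_sketch(X_noisy, Xsd, p = 2)
```

Because each step is a weighted least squares fit over a subspace that contains the previous estimate, the objective in this sketch cannot increase from one step to the next, and it never exceeds the weighted residual of the ordinary rank-p SVD estimate used as the starting point.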
Performs maximum likelihood principal components analysis for mode D error conditions (common row covariance matrices). Employs rotation and scaling of the original data.
mlpca_d(X, Cov, p)
X | IxJ matrix of measurements |
Cov | JxJ matrix of measurement error covariance, which is common to all rows |
p | Rank of the model's subspace |
The returned parameters, U, S and V, are analogs to the truncated SVD solution, but have somewhat different properties since they represent the MLPCA solution. In particular, the solutions for different values of p are not necessarily nested (the rank 1 solution may not be in the space of the rank 2 solution) and the eigenvectors do not necessarily account for decreasing amounts of variance, since MLPCA is a subspace modeling technique and not a variance modeling technique.
The parameters returned are the results of SVD on the estimated subspace. The quantity Ssq represents the sum of squares of weighted residuals.
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
library(RMLPCA)
data(data_clean)
data(data_error_d)
# covariance matrix
data(cov_d)
data(data_cleaned_mlpca_d)

# data that you will usually have on hand
data_noisy <- data_clean + data_error_d

# run mlpca_d with rank p = 2
results <- RMLPCA::mlpca_d(
  X = data_noisy,
  Cov = cov_d,
  p = 2
)

# estimated clean dataset
data_cleaned_mlpca <- results$U %*% results$S %*% t(results$V)
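The rotation and scaling mentioned for mode D can be illustrated in base R: rotate the data into the eigenvector basis of the error covariance, scale so that the errors become uncorrelated with unit variance, take a truncated SVD, and transform back. The function `mode_d_sketch()` is hypothetical and assumes a nonsingular covariance matrix; it is a sketch of the principle, not the code behind mlpca_d.

```r
# Hypothetical helper, not part of RMLPCA; assumes Cov is positive definite
mode_d_sketch <- function(X, Cov, p) {
  eig <- eigen(Cov, symmetric = TRUE)
  stopifnot(all(eig$values > 0))
  n <- length(eig$values)

  # Rotation (eigenvectors) followed by scaling (inverse square-root eigenvalues)
  whiten <- eig$vectors %*% diag(1 / sqrt(eig$values), nrow = n)
  unwhiten <- diag(sqrt(eig$values), nrow = n) %*% t(eig$vectors)

  # In the transformed space the errors are uncorrelated with unit variance
  Z <- X %*% whiten
  dec <- svd(Z, nu = p, nv = p)
  Z_hat <- dec$u %*% diag(dec$d[1:p], nrow = p) %*% t(dec$v)

  # Transform the estimate back to the original space
  X_hat <- Z_hat %*% unwhiten

  list(X_hat = X_hat, Ssq = sum((Z - Z_hat)^2))
}

# Synthetic rank-2 data with correlated errors sharing one covariance matrix
set.seed(3)
A <- matrix(rnorm(12 * 6), 12, 6)
cov_sketch <- crossprod(A) / 12
truth <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 6), 2, 6)
X_noisy <- truth + matrix(rnorm(40 * 6), 40, 6) %*% chol(cov_sketch)

fit <- mode_d_sketch(X_noisy, cov_sketch, p = 2)
```

With an identity covariance matrix this sketch reduces to an ordinary truncated SVD, which is one way to check it.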
Performs maximum likelihood principal components analysis for mode E error conditions (correlated errors, with a different covariance matrix for each row, but no error correlation between the rows). Employs an ALS algorithm.
mlpca_e(X, Cov, p, MaxIter = 20000)
X | IxJ matrix of measurements |
Cov | JxJxI array of measurement error covariance matrices, one JxJ matrix per row of X |
p | Rank of the model's subspace; p must be less than the minimum of I and J |
MaxIter | Maximum number of iterations |
The returned parameters, U, S and V, are analogs to the truncated SVD solution, but have somewhat different properties since they represent the MLPCA solution. In particular, the solutions for different values of p are not necessarily nested (the rank 1 solution may not be in the space of the rank 2 solution) and the eigenvectors do not necessarily account for decreasing amounts of variance, since MLPCA is a subspace modeling technique and not a variance modeling technique.
The parameters returned are the results of SVD on the estimated subspace. The quantity Ssq represents the sum of squares of weighted residuals. ErrFlag indicates the convergence condition, with 0 indicating normal termination and 1 indicating that the maximum number of iterations has been exceeded.
Renan Santos Barbosa
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
library(RMLPCA)
data(data_clean_e)
data(data_error_e)
# covariance matrix
data(cov_e)
data(data_cleaned_mlpca_e)

# data that you will usually have on hand
data_noisy <- data_clean_e + data_error_e

# run mlpca_e with rank p = 1
results <- RMLPCA::mlpca_e(
  X = data_noisy,
  Cov = cov_e,
  p = 1
)

# estimated clean dataset
data_cleaned_mlpca <- results$U %*% results$S %*% t(results$V)
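When every row has its own error covariance matrix, a natural building block is the maximum likelihood projection of a single row onto a given subspace, which is a generalized least squares fit. The base R sketch below shows that single step only. The function `ml_project_row()` and the synthetic inputs are hypothetical, and whether mlpca_e organizes its ALS iterations exactly this way is an assumption.

```r
# Hypothetical helper, not part of RMLPCA.
# x:     measurement vector of length J (one row of the data matrix)
# Sigma: J x J error covariance matrix for that row (assumed nonsingular)
# V:     J x p matrix whose columns span the current subspace
ml_project_row <- function(x, Sigma, V) {
  Sigma_inv_V <- solve(Sigma, V)                 # Sigma^{-1} V
  scores <- solve(t(V) %*% Sigma_inv_V,          # (V' Sigma^{-1} V)^{-1}
                  t(Sigma_inv_V) %*% x)          #  V' Sigma^{-1} x
  as.vector(V %*% scores)
}

# Synthetic example: a random orthonormal basis and a random covariance matrix
set.seed(4)
J <- 5
p <- 2
V <- qr.Q(qr(matrix(rnorm(J * p), J, p)))
Sigma <- crossprod(matrix(rnorm(2 * J * J), 2 * J, J)) / (2 * J)
x <- rnorm(J)

x_hat <- ml_project_row(x, Sigma, V)
```

With an identity covariance matrix the function reduces to the ordinary orthogonal projection onto the subspace, and a vector that already lies in the subspace is returned unchanged.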
The RMLPCA package provides four algorithms that deal with measurement errors.
A dataset where each column contains the standard deviations, ranging from 0.2 to 1, that are required to run mlpca_b. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
sds_b
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
A dataset where each value comes from a lognormal density with meanlog = -4.75 and sdlog = 2.5. The main idea is described in Figure 3 of Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.
sds_c
A matrix with 300 rows and 20 columns
Wentzell, P. D. "Other topics in soft-modeling: maximum likelihood-based soft-modeling methods." (2009): 507-558.