The following R software is freely available.
pgmm: Parsimonious Gaussian Mixture Models
Performs model-based clustering and classification using parsimonious Gaussian mixture models. The mixture of factor analyzers and mixture of probabilistic principal components analyzers models are special cases.
Available for download on CRAN.
Relevant reports and papers:
- ◆ McNicholas, P.D., ElSherbiny, A., McDaid, A.F. and Murphy, T.B. (2019), pgmm: Parsimonious Gaussian mixture models. R package version 1.2.4.
- ◆ McNicholas, P.D. and Murphy, T.B. (2010), 'Model-based clustering of microarray expression data via latent Gaussian mixture models', Bioinformatics 26(21), 2705-2712. [doi]
- ◆ McNicholas, P.D. (2010), 'Model-based classification using latent Gaussian mixture models', Journal of Statistical Planning and Inference 140(5), 1175-1181. [doi]
- ◆ McNicholas, P. D., Murphy, T. B., McDaid, A. F. and Frost, D. (2010), 'Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models', Computational Statistics and Data Analysis 54(3), 711-723. [doi]
- ◆ McNicholas, P.D. and Murphy, T.B. (2008), 'Parsimonious Gaussian mixture models', Statistics and Computing 18(3), 285-296. [doi]
oclust: Gaussian Model-Based Clustering with Outliers
Performs model-based clustering while accounting for outliers without requiring pre-specification of the number of outliers.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Clark, K.M. and McNicholas, P.D. (2019), Gaussian model-based clustering with outliers. R package version 0.1.0.
- ◆ Clark, K.M. and McNicholas, P.D. (2019), 'Using subset log-likelihoods to trim outliers in Gaussian mixture models'. arXiv preprint arXiv:1907.01136v2.
ClickClustCont: Mixtures of Continuous Time Markov Models
Performs model-based clustering for clickstream data.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Gallaugher, M.P.B. and McNicholas, P.D. (2019), ClickClustCont: Mixtures of continuous time Markov models. R package version 0.1.7.
- ◆ Gallaugher, M.P.B. and McNicholas, P.D. (2018), 'Clustering and semi-supervised classification for clickstream data via mixture models'. arXiv preprint arXiv:1802.04849.
mixSPE: Mixtures of Power Exponential and Skew Power Exponential Distributions for Use in Model-Based Clustering and Classification
Performs model-based clustering and classification via mixtures of multivariate skew power exponential and power exponential distributions.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Dang, U.J., Browne, R.P. and McNicholas, P.D. (2019), mixSPE: Mixtures of power exponential and skew power exponential distributions for use in model-based clustering and classification. R package version 0.1.1.
- ◆ Dang, U.J., Gallaugher, M.P.B., Browne, R.P. and McNicholas, P.D. (2019), 'Model-based clustering and classification using mixtures of multivariate skewed power exponential distributions'. arXiv preprint arXiv:1907.01938.
- ◆ Dang, U.J., Browne, R.P. and McNicholas, P.D. (2015), 'Mixtures of multivariate power exponential distributions', Biometrics 71(4), 1081-1089. [doi]
longclust: Model-Based Clustering and Classification for Longitudinal Data
Performs model-based clustering and classification for longitudinal data. A modified Cholesky decomposition is used and there is the option to use a linear model for the mean. The default model is a mixture of multivariate t-distributions, but a mixture of Gaussian distributions is also available.
Available for download on CRAN.
Relevant reports and papers:
- ◆ McNicholas, P.D., Jampani, K.R. and Subedi, S. (2019). longclust: Model-based clustering and classification for longitudinal data. R package version 1.2.3.
- ◆ McNicholas, P.D. and Subedi, S. (2012), 'Clustering gene expression time course data using mixtures of multivariate t-distributions', Journal of Statistical Planning and Inference 142(5), 1114-1127. [doi]
- ◆ McNicholas, P.D. and Murphy, T.B. (2010), 'Model-based clustering of longitudinal data', The Canadian Journal of Statistics 38(1), 153-168. [doi]
MixGHD: Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions
Performs model-based clustering, classification, and discriminant analysis using approaches based on generalized hyperbolic mixture models. Several approaches are implemented, including a mixture of generalized hyperbolic factor analyzers, a mixture of multiple scaled hyperbolic distributions, and a mixture of coalesced generalized hyperbolic distributions.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Tortora, C., Browne, R.P., Franczak, B.C. and McNicholas, P.D. (2019), MixGHD: Model based clustering, classification and discriminant analysis using the mixture of generalized hyperbolic distributions. R package version 2.3.2.
- ◆ Tortora, C., Franczak, B.C., Browne, R.P. and McNicholas, P.D. (2017), 'A mixture of coalesced generalized hyperbolic distributions'. arXiv preprint arXiv:1403.2332.
- ◆ Tortora, C., McNicholas, P.D. and Browne, R.P. (2016), 'A mixture of generalized hyperbolic factor analyzers', Advances in Data Analysis and Classification 10(4), 423-440. [doi]
- ◆ Browne, R.P. and McNicholas, P.D. (2015), 'A mixture of generalized hyperbolic distributions', The Canadian Journal of Statistics 43(2), 176-198. [doi]
teigen: Model-Based Clustering and Classification with the Multivariate t-Distribution
Performs model-based clustering and classification using the teigen family of mixtures of multivariate t-distributions. An eigen-decomposed component covariance structure is used.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Andrews, J.L., Wickins, J.R., Boers, N.M. and McNicholas, P.D. (2018). teigen: Model-based clustering and classification with the multivariate t-distribution. R package version 2.2.2.
- ◆ Andrews, J.L., Wickins, J.R., Boers, N.M. and McNicholas, P.D. (2018), 'teigen: An R package for model-based clustering and classification via the multivariate t distribution', Journal of Statistical Software 83(7). [doi] (open access)
- ◆ Andrews, J.L. and McNicholas, P.D. (2012), 'Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions', Statistics and Computing 22(5), 1021-1029. [doi]
mixture: Mixture Models for Clustering and Classification
Performs model-based clustering and classification using Gaussian parsimonious mixture models. All 14 mixture models used by Celeux and Govaert (1995) are implemented.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Browne, R.P., ElSherbiny, A. and McNicholas, P.D. (2018). mixture: Mixture models for clustering and classification. R package version 1.5.
- ◆ Browne, R.P. and McNicholas, P.D. (2014), 'Estimating common principal components in high dimensions', Advances in Data Analysis and Classification 8(2), 217-226. [doi]
- ◆ Celeux, G. and Govaert, G. (1995), 'Gaussian parsimonious clustering models', Pattern Recognition 28(5), 781-793. [doi]
ContaminatedMixt: Model-Based Clustering and Classification with the Multivariate Contaminated Normal Distribution
Performs model-based clustering and classification with the multivariate contaminated normal distribution.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Punzo, A., Mazza, A. and McNicholas, P.D. (2018). ContaminatedMixt: Model-based clustering and classification with the multivariate contaminated normal distribution. R package version 1.3.2.
- ◆ Punzo, A. and McNicholas, P.D. (2016), 'Parsimonious mixtures of multivariate contaminated normal distributions', Biometrical Journal 58(6), 1506-1537. [doi]
FPDclustering: PD-Clustering and Factor PD-Clustering
Performs probabilistic distance (PD) clustering and factor PD-clustering.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Tortora, C. and McNicholas, P.D. (2017). FPDC: PD-clustering and factor PD-clustering. R package version 1.2.
sensory: Simultaneous Model-Based Clustering and Imputation via a Progressive Expectation-Maximization Algorithm
Implements the CUU parsimonious Gaussian mixture model (PGMM) for data with missing values, performing clustering and imputation simultaneously.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Franczak, B.C., Browne, R.P. and McNicholas, P.D. (2016). sensory: Simultaneous model-based clustering and imputation via a progressive expectation-maximization algorithm. R package version 1.1.
pmcgd: Parsimonious Mixtures of Contaminated Gaussian Distributions
Performs robust model-based clustering via mixtures of contaminated Gaussian distributions.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Punzo, A. and McNicholas, P.D. (2013). pmcgd: Parsimonious Mixtures of Contaminated Gaussian Distributions. R package version 1.1.
vscc: Variable Selection for Clustering and Classification
Performs variable selection for model-based clustering and classification.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Andrews, J.L. and McNicholas, P.D. (2013), vscc: Variable selection for clustering and classification. R package version 0.2.
- ◆ Andrews, J.L. and McNicholas, P.D. (2014), 'Variable selection for clustering and classification', Journal of Classification 31(2), 136-153. [doi]
VLF: Frequency Matrix Approach for Assessing Very Low Frequency Variants in Sequence Records
Assesses very low frequency variants in amino acid and nucleotide sequences using frequency matrices.
Available for download on CRAN.
Relevant reports and papers:
- ◆ Athey, T.B.T. and McNicholas, P.D. (2013). VLF: Frequency matrix approach for assessing very low frequency variants in sequence records. R package version 1.0.