IJPAM: Volume 86, No. 1 (2013)

CLUSTERING CONSTRAINED BY DEPENDENCIES

David R. Easterling¹, Naren Ramakrishnan², Layne T. Watson³
¹,²,³Department of Computer Science
Virginia Polytechnic Institute and State University
Blacksburg, VA, 24061, USA
³Department of Mathematics
Virginia Polytechnic Institute and State University
Blacksburg, VA, 24061, USA


Abstract. Clustering is an unsupervised method of grouping data samples into a partition of a given dataset. Such grouping is typically based on the assumption that clusters are homogeneous over an attribute space, so the precise definition of the similarity metric affects the clusters inferred. In recent years, new formulations of clustering have emerged that impose indirect constraints on clustering, typically in terms of preserving dependencies between data samples and auxiliary variables. These formulations find applications in bioinformatics, web mining, social network analysis, and many other domains. The purpose of this survey is to provide a gentle introduction to these formulations, their mathematical assumptions, and the contexts in which they are applicable.
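The abstract frames the constraint as preserving dependencies with auxiliary variables. One standard information-theoretic way to quantify such a dependency is mutual information. The Python sketch below illustrates this idea and is not a method taken from the survey. It computes the empirical mutual information I(C; Y), in bits, between a candidate cluster assignment C and an auxiliary label Y. The function name `mutual_information` and the toy data are hypothetical, and measuring the dependency between clusters (rather than raw samples) and the auxiliary variable is an assumption made for illustration.

```python
import math
from collections import Counter


def mutual_information(clusters, aux):
    """Empirical mutual information I(C; Y) in bits between two label sequences.

    Illustrative sketch only: `clusters` is a hypothetical cluster assignment,
    `aux` a hypothetical auxiliary variable observed for the same samples.
    """
    if len(clusters) != len(aux):
        raise ValueError("sequences must have equal length")
    n = len(clusters)
    joint = Counter(zip(clusters, aux))  # counts n(c, y)
    count_c = Counter(clusters)          # marginal counts n(c)
    count_y = Counter(aux)               # marginal counts n(y)
    mi = 0.0
    for (c, y), n_cy in joint.items():
        # p(c,y) * log2(p(c,y) / (p(c) p(y))) written with raw counts
        mi += (n_cy / n) * math.log2(n_cy * n / (count_c[c] * count_y[y]))
    return mi


if __name__ == "__main__":
    aux = ["a", "a", "b", "b"]
    print(mutual_information([0, 0, 1, 1], aux))  # partition aligned with aux
    print(mutual_information([0, 1, 0, 1], aux))  # partition ignoring aux
```

With these toy labels, the partition aligned with the auxiliary variable scores 1.0 bit, while the one that cuts across it scores 0.0. A dependency-preserving formulation would favor partitions like the first, in addition to (or instead of) attribute-space similarity.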

Received: April 9, 2013

AMS Subject Classification: 65K05, 90C30, 94A15

Key Words and Phrases: auxiliary information, data compression, data mining, information theory, optimization, unsupervised learning

DOI: 10.12732/ijpam.v86i1.11
Source:
International Journal of Pure and Applied Mathematics
ISSN printed version: 1311-8080
ISSN on-line version: 1314-3395
Year: 2013
Volume: 86
Issue: 1