Take a walk and cluster genes: a TSP-based approach to optimal rearrangement clustering

Sharlee Climer, Weixiong Zhang

Research output: Contribution to journalArticlepeer-review

Abstract

Cluster analysis is a fundamental problem and technique in many areas related to machine learning. In this paper, we consider rearrangement clustering, which is the problem of finding sets of objects that share common or similar features by arranging the rows (objects) of a matrix (specifying object features) in such a way that adjacent objects are similar to each other (based on a similarity measure of the features) so as to maximize the overall similarity. Based on formulating this problem as the Traveling Salesman Problem (TSP), we develop a new TSP-based optimal clustering algorithm called TSPCluster. We overcome a flaw that is inherent in previous approaches by relaxing restrictions on dissimilarities between clusters. Our new algorithm has three important features: finding the optimal  k  clusters for a given  k , automatically detecting cluster borders, and ascertaining a set of most viable clustering results that make good balances among maximizing the overall similarity within clusters and dissimilarity between clusters. We apply TSPCluster to cluster and display ~500 genes of flowering plant  Arabidopsis  which are regulated under various abiotic stress conditions. We compare TSPCluster to the bond energy algorithm and two existing clustering algorithms. Our TSPCluster code is available at (Climer & Zhang, 2004).
Original languageAmerican English
JournalInternational Conference on Machine Learning
DOIs
StatePublished - Jul 4 2004
Externally publishedYes

Disciplines

  • Mathematics
  • Artificial Intelligence and Robotics

Cite this