Rearrangement Clustering: Pitfalls, Remedies, and Applications

Sharlee Climer, Weixiong Zhang

Research output: Contribution to journalArticlepeer-review

Abstract

<p> Given a matrix of values in which the rows correspond to objects and the columns correspond to features of the objects, rearrangementclustering is the problem of rearranging the rows of the matrix such that the sum of the similarities between adjacent rows is maximized. Referred to by various names and reinvented several times, this clustering technique has been extensively used in many &filig;elds over the last three decades. In this paper, we point out two critical pitfalls that have been previously overlooked. The &filig;rst pitfall is deleterious when rearrangement clustering is applied to objects that form natural clusters. The second concerns a similarity metric that is commonly used. We present an algorithm that overcomes these pitfalls. This algorithm is based on a variation of the Traveling Salesman Problem. It o&fflig;ers an extra bene&filig;t as it automatically determines cluster boundaries. Using this algorithm, we optimally solve four benchmark problems and a 2,467-gene expression data clustering problem. As expected, our new algorithm identi&filig;es better clusters than those found by previous approaches in all &filig;ve cases. Overall, our results demonstrate the bene&filig;ts of rectifying the pitfalls and exemplify the usefulness of this clustering technique. Our code is available at our websites.</p>
Original languageAmerican English
JournalDefault journal
DOIs
StatePublished - Jan 4 2005
Externally publishedYes

Keywords

  • Bond Energy Algorithm
  • Clustering
  • Restricted Partitioning
  • Traveling Salesman Problem
  • Visualization of Patterns in Data

Cite this