TY - JOUR
T1 - A custom correlation coefficient (CCC) approach for fast identification of multi-SNP association patterns in genome-wide SNPs data.
AU - Climer, Sharlee
AU - Yang, Wei
AU - de las Fuentes, Lisa
AU - Dávila-Román, Victor G.
AU - Gu, C. Charles
N1 - Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of Custom Correlation Coefficient (CCC) between single nucleotide polymorphisms (SNPs) that address genetic heterogeneity by measuring subset correlations autonomously.
PY - 2014/1/11
Y1 - 2014/1/11
N2 - Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of Custom Correlation Coefficient (CCC) between single nucleotide polymorphisms (SNPs) that address genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3-step process to identify candidate multi-SNP patterns: (1) pairwise (SNP-SNP) correlations are computed using CCC; (2) clusters of so-correlated SNPs identified; and (3) frequencies of these clusters in disease cases and controls compared to identify disease-associated multi-SNP patterns. This method identified 42 candidate multi-SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (6 genes) included 13 in SLC8A1 (aka NCX1 , an essential component of cardiac excitation-contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1 . While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles were found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1 , modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease-associated multi-SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets.
AB - Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of Custom Correlation Coefficient (CCC) between single nucleotide polymorphisms (SNPs) that address genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3-step process to identify candidate multi-SNP patterns: (1) pairwise (SNP-SNP) correlations are computed using CCC; (2) clusters of so-correlated SNPs identified; and (3) frequencies of these clusters in disease cases and controls compared to identify disease-associated multi-SNP patterns. This method identified 42 candidate multi-SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (6 genes) included 13 in SLC8A1 (aka NCX1 , an essential component of cardiac excitation-contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1 . While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles were found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1 , modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease-associated multi-SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets.
KW - custom correlation coefficient
KW - gene-gene interaction
KW - genome-wide interactions study (GWIS)
KW - multi-SNP association
KW - network analysis
UR - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4190009/
U2 - 10.1002/gepi.21833
DO - 10.1002/gepi.21833
M3 - Article
VL - 38
JO - Genetic Epidemiology
JF - Genetic Epidemiology
ER -