The growing number of high-throughput biological datasets is stimulating the information engineering and machine learning research communities to design and apply novel, specialised methods that tackle the problems specific to such datasets. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method tackles the problem of scrutinising multiple gene expression microarray datasets to identify the subsets of genes that are consistently co-expressed across them. It produces clustering results that better reflect two biological facts: most of the genes in any cell are expected to be irrelevant to the specific context at hand, and many genes may participate in multiple processes. This is achieved by clustering the given set of genes while allowing each gene one of three eventualities: to be exclusively assigned to a single cluster, to be simultaneously assigned to multiple clusters, or not to be assigned to any cluster. In this study, we expand the scope of application of the Bi-CoPaM method by applying it, for the first time, to bacterial datasets, namely a set of five Escherichia coli datasets generated under different biological conditions, in order to identify the subsets of genes that are consistently co-expressed, i.e. well correlated with each other. We identify two clusters with such consistent co-expression; interestingly, the two clusters are themselves consistently negatively correlated with each other. The first cluster is enriched with genes participating in protein synthesis and DNA repair, while the second is enriched with genes involved in transport. Consequently, we draw biological hypotheses that relate some of the genes whose biological processes are currently unknown to their potential processes. These hypotheses can serve as pilots for focused future gene discovery studies.
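The three eventualities described above can be illustrated with a toy sketch. It is not the published Bi-CoPaM procedure: it assumes that a consensus step has already produced, for each gene, a fuzzy membership value per cluster, and it applies a single hypothetical threshold rule to turn those values into binary assignments. The function name `binarise_memberships`, the gene labels, the membership values, and the threshold of 0.4 are all invented for illustration.

```python
from typing import Dict, List


def binarise_memberships(
    memberships: Dict[str, List[float]], threshold: float
) -> Dict[str, List[int]]:
    """Assign each gene to every cluster whose membership reaches the threshold.

    A gene may end up in one cluster, in several clusters, or in none.
    This is a simplified, hypothetical rule, not the paper's exact method.
    """
    return {
        gene: [k for k, value in enumerate(values) if value >= threshold]
        for gene, values in memberships.items()
    }


# Hypothetical consensus memberships over three clusters (illustrative values).
consensus = {
    "gene_a": [0.90, 0.05, 0.05],  # clearly dominated by cluster 0
    "gene_b": [0.48, 0.47, 0.05],  # shared between clusters 0 and 1
    "gene_c": [0.34, 0.33, 0.33],  # no cluster stands out
}

assignments = binarise_memberships(consensus, threshold=0.4)

for gene, clusters in assignments.items():
    if not clusters:
        status = "unassigned"
    elif len(clusters) == 1:
        status = "exclusive"
    else:
        status = "multiple"
    print(gene, clusters, status)
```

Under these assumed values, `gene_a` is assigned exclusively to one cluster, `gene_b` to two clusters at once, and `gene_c` to none. Raising or lowering the threshold changes how strict the assignment is.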
Bibliographical note
Funding Information:
This article summarises independent research funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. A.K. Nandi would like to thank TEKES for their award of the Finland Distinguished Professorship.
© 2014, Springer Science+Business Media New York.
- Consistent co-expression
- Escherichia coli
- Genome-wide analysis
- Multiple datasets analysis