Data-driven analysis of collections of big datasets by the Bi-CoPaM method yields field-specific novel insights

Basel Abu-Jamous, Chao Liu, David J. Roberts, Elvira Brattico, Asoke K. Nandi*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Massive amounts of data have recently been, and increasingly are being, generated in various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions and were analysed accordingly. However, the information contained in these datasets can usually answer much broader questions than those originally intended. Moreover, many existing big datasets are related to each other but differ in their detailed specifications, and the mutual information that can be extracted from them collectively has not commonly been considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims to extract field-specific novel findings that can be revealed from the data without being driven by specific questions or hypotheses. To realise this paradigm, we introduced the binarisation of consensus partition matrices (Bi-CoPaM) method, which can analyse collections of heterogeneous big datasets to identify clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression datasets identified a novel cluster of genes and yielded some new biological hypotheses regarding their function and regulation. In the second application, the analysis of 1,856 big fMRI datasets identified three functionally connected neural networks related to the visual, reward and auditory systems during affective processing. These experiments reveal the broad applicability of this paradigm to various fields, and thus encourage exploring, with a similar approach, the large amounts of partially exploited existing datasets, preferably as collections of related datasets.
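The abstract describes the Bi-CoPaM only at a high level. As a rough illustration of the general idea behind its name, the minimal sketch below averages several clusterings of the same objects into a consensus partition matrix (fuzzy memberships) and then binarises it with a tunable threshold. It is not the authors' implementation: it assumes the clusterings have already been relabelled so that cluster i means the same thing in every clustering (that alignment step is omitted), and the value-threshold rule, the parameter name `alpha` and all function names are illustrative assumptions.

```python
from typing import List, Sequence


def consensus_partition_matrix(label_vectors: Sequence[Sequence[int]], k: int) -> List[List[float]]:
    """Return a k x n matrix whose entry [i][j] is the fraction of clusterings
    that put object j in cluster i.

    Assumption: all clusterings already share the cluster numbering 0..k-1
    (the relabelling/alignment step is not shown here).
    """
    m = len(label_vectors)
    n = len(label_vectors[0])
    counts = [[0] * n for _ in range(k)]
    for labels in label_vectors:
        for j, c in enumerate(labels):
            counts[c][j] += 1
    # Divide integer counts once to get fuzzy memberships in [0, 1].
    return [[cnt / m for cnt in row] for row in counts]


def value_threshold_binarise(copam: List[List[float]], alpha: float) -> List[List[int]]:
    """Hypothetical tunable binarisation: object j joins cluster i only if its
    consensus membership is at least alpha. Objects that reach alpha in no
    cluster are left unassigned; alpha > 0.5 allows at most one cluster per object."""
    return [[1 if u >= alpha else 0 for u in row] for row in copam]


def cluster_members(binary: List[List[int]]) -> List[List[int]]:
    """List the object indices assigned to each cluster."""
    return [[j for j, b in enumerate(row) if b] for row in binary]


# Three already-aligned clusterings of five objects into k = 3 clusters.
runs = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 2, 2],
    [0, 1, 1, 1, 2],
]
copam = consensus_partition_matrix(runs, k=3)
strict = cluster_members(value_threshold_binarise(copam, alpha=1.0))
relaxed = cluster_members(value_threshold_binarise(copam, alpha=0.6))
print(strict)   # [[0], [2], [4]]
print(relaxed)  # [[0, 1], [2, 3], [4]]
```

Raising `alpha` yields tighter clusters and leaves the ambiguously clustered objects (1 and 3 here) unassigned, instead of forcing every object into some cluster; lowering it admits objects with weaker consensus.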

Original language: English
Title of host publication: Frontiers in Electronic Technologies - Trends and Challenges
Editors: S.R.S. Prabaharan, V.S. Kanchana Bhaaskaran, Nadia Magnenat Thalmann
Publisher: Springer Verlag
Pages: 25-53
Number of pages: 29
ISBN (Print): 9789811042348
Publication status: Published - 2017
Externally published: Yes
Event: International Conference on NextGen Electronic Technologies: Silicon to Software, ICNETS2 2017 - Chennai, India
Duration: 23 Mar 2017 to 25 Mar 2017

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 433
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: International Conference on NextGen Electronic Technologies: Silicon to Software, ICNETS2 2017
Country/Territory: India
City: Chennai
Period: 23/03/17 to 25/03/17

Bibliographical note

Publisher Copyright:
© Springer Nature Singapore Pte Ltd. 2017.

Keywords

  • Bi-CoPaM
  • Data-driven analysis
  • Gene expression
  • Heterogeneous datasets
  • Tunable consensus clustering
  • fMRI

