Data Mining for Predictive Proteomics

Graham Ball*, Ali Al-Shahib

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Chapter, peer-reviewed


Mass spectrometry-based proteomic methods generate large quantities of complex, high-dimensional data, particularly when used to analyse intact biological samples. Analysing these data requires approaches that can cope with this complexity and dimensionality. The main goal of data mining approaches is to develop a numerical classification of samples that addresses a particular biological question using a parsimonious set of features from the data. To achieve this, a pipeline is followed from raw spectral data to a classification model based on a shortlist of features. In this chapter, we follow this pipeline, outline the methodologies involved and provide examples of their application in proteomics.
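The pipeline described in the abstract (raw spectra, pre-processing, a shortlist of features, a classification model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the spectra are synthetic, the function names are hypothetical, normalization is by total ion current, features are ranked with a simple univariate score rather than SVM-RFE, and a nearest-centroid rule stands in for a more capable classifier such as an ANN.

```python
import random
import statistics


def normalize_tic(spectrum):
    """Scale intensities so they sum to 1 (total ion current normalization)."""
    total = sum(spectrum)
    return [x / total for x in spectrum] if total > 0 else list(spectrum)


def rank_features(X, y):
    """Rank bins by |difference of class means| / (sum of class std devs).

    A simple univariate filter, used here as a stand-in for SVM-RFE.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)]


def fit_centroids(X, y, features):
    """Compute one mean vector per class over the shortlisted features."""
    centroids = {}
    for label in set(y):
        rows = [row for row, l in zip(X, y) if l == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(spectrum, centroids, features):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    point = [spectrum[j] for j in features]

    def dist(centroid):
        return sum((p - q) ** 2 for p, q in zip(point, centroid))

    return min(centroids, key=lambda label: dist(centroids[label]))


def make_spectrum(rng, label, n_bins=50):
    """Synthetic spectrum: flat noisy background, two extra peaks for class 1."""
    spectrum = [rng.uniform(0.5, 1.5) for _ in range(n_bins)]
    if label == 1:
        spectrum[10] += 5.0  # hypothetical discriminating peak
        spectrum[30] += 5.0  # hypothetical discriminating peak
    return spectrum


rng = random.Random(0)
labels = [0, 1] * 30
spectra = [normalize_tic(make_spectrum(rng, label)) for label in labels]

# Hold out the last 20 samples to estimate classification accuracy.
X_train, y_train = spectra[:40], labels[:40]
X_test, y_test = spectra[40:], labels[40:]

selected = rank_features(X_train, y_train)[:2]  # parsimonious shortlist
centroids = fit_centroids(X_train, y_train, selected)
predictions = [predict(s, centroids, selected) for s in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

print("selected bins:", sorted(selected))
print("held-out accuracy:", accuracy)
```

Ranking features on the training split only, and scoring on held-out spectra, keeps the accuracy estimate from being inflated by the feature selection step; real spectra would also need noise reduction and baseline removal before normalization.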

Original language: English
Title of host publication: Mass Spectrometry for Microbial Proteomics
Publisher: John Wiley and Sons
Number of pages: 14
ISBN (Print): 9780470681992
Publication status: Published - 15 Jun 2010


  • Artificial Neural Networks (ANNs): algorithms that determine solutions to problems by iteratively fitting a complex function (model) to data
  • Biological sample analysis using mass spectrometry (MS) based methods: a number of challenges
  • Data mining for predictive proteomics
  • Feature selection method: Support Vector Machines Recursive Feature Elimination (SVMRFE)
  • Noise reduction, baseline removal and normalization
  • Peptide spectrum match (PSM): collection of statistics
  • Pre-processing MS data
  • Proteomics data mining workflow
  • Statistical analysis of 2D gels and mass spectral data analysis
  • Vectors explaining variation in data: using principal component analysis

