Abstract
Mass spectrometry-based proteomic methods generate large quantities of complex, high-dimensional data, particularly when used to analyse intact biological samples. Analysing these data requires approaches that can cope with this complexity and dimensionality. The main goal of data mining approaches is to develop a numerical classification of samples that addresses a particular biological question using a parsimonious set of features from the data. To achieve this, a pipeline is followed that runs from raw spectral data to a classification model based on a shortlist of features. In this chapter, we follow this pipeline, outline the methodologies involved and provide examples of their application in proteomics.
Original language | English |
---|---|
Title of host publication | Mass Spectrometry for Microbial Proteomics |
Publisher | John Wiley and Sons |
Pages | 409-422 |
Number of pages | 14 |
ISBN (Print) | 9780470681992 |
Publication status | Published - 15 Jun 2010 |
Keywords
- Artificial Neural Networks (ANNs): algorithms that determine solutions to problems by iteratively fitting a complex function (model) to data
- Biological sample analysis using mass spectrometry (MS)-based methods: a number of challenges
- Data mining for predictive proteomics
- Feature selection method: Support Vector Machines Recursive Feature Elimination (SVMRFE)
- Noise reduction, baseline removal and normalization
- Peptide spectrum match (PSM): collection of statistics
- Pre-processing MS data
- Proteomics data mining workflow
- Statistical analysis of 2D gels and mass spectral data analysis
- Vectors explaining variation in data: using principal component analysis
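
The workflow named in the abstract and keywords (pre-processing, feature selection, classification from a shortlist of features) can be illustrated with a minimal sketch. It is not the chapter's method: it assumes two classes labelled 0 and 1, and it substitutes deliberately simple stand-ins (minimum subtraction for baseline removal, total-intensity normalization, a t-like score for feature ranking, and a nearest-centroid classifier) for techniques such as SVM-RFE or ANNs. All function names and the toy intensity values are hypothetical.

```python
from statistics import mean, stdev


def preprocess(spectrum):
    """Crude baseline removal (subtract the minimum), then scale to unit total intensity."""
    baseline = min(spectrum)
    shifted = [x - baseline for x in spectrum]
    total = sum(shifted)
    return [x / total for x in shifted] if total else shifted


def rank_features(X, y):
    """Return feature indices ordered by a two-class t-like score, best first."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = stdev(a) + stdev(b) + 1e-12  # small constant avoids division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_centroids(X, y, features):
    """Compute the per-class mean of each shortlisted feature."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, features, spectrum):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    point = [spectrum[j] for j in features]

    def dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))

    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical toy spectra: five intensity bins, class 1 has an extra peak in bin 2.
raw = [
    [10, 12, 11, 30, 10],
    [11, 13, 12, 32, 11],
    [10, 11, 12, 29, 10],
    [10, 12, 40, 30, 10],
    [11, 12, 42, 31, 11],
    [10, 13, 39, 29, 10],
]
labels = [0, 0, 0, 1, 1, 1]

X = [preprocess(s) for s in raw]
selected = rank_features(X, labels)[:2]  # parsimonious shortlist of two features
centroids = fit_centroids(X, labels, selected)
```

A new spectrum would be classified with `predict(centroids, selected, preprocess(new_spectrum))`. Keeping pre-processing, ranking and classification as separate functions mirrors the staged pipeline the abstract describes, so any stage could be swapped for a more capable method without touching the others.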