Cluster analysis for identifying sub-groups and selecting potential discriminatory variables in human encephalitis

Jemila S. Hamid, Christopher Meaney, Natasha S. Crowcroft, Julia Granerod, Joseph Beyene*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)


Background: Encephalitis is an acute clinical syndrome of the central nervous system (CNS), often associated with fatal outcome or permanent damage, including cognitive and behavioural impairment, affective disorders and epileptic seizures. Infection of the central nervous system is considered to be a major cause of encephalitis, and more than 100 different pathogens have been recognized as causative agents. However, a large proportion of cases have unknown disease etiology.

Methods: We perform hierarchical cluster analysis on a multicenter England encephalitis data set with the aim of identifying sub-groups in human encephalitis. We use the simple matching similarity measure, which is appropriate for binary data sets, and perform variable selection using cluster heatmaps. We also use heatmaps to visually assess underlying patterns in the data, identify the main clinical and laboratory features, and identify potential risk factors associated with encephalitis.

Results: Our results identified fever, personality and behavioural change, headache and lethargy as the main characteristics of encephalitis. Diagnostic variables such as brain scan and measurements from cerebrospinal fluid were also identified as main indicators of encephalitis. Our analysis revealed six major clusters in the England encephalitis data set. However, marked within-cluster heterogeneity was observed in some of the larger clusters, indicating possible sub-groups. Overall, the results show that patients are clustered according to symptom and diagnostic variables rather than causal agents. Exposure variables such as recent infection, sick person contact and animal contact were identified as potential risk factors.

Conclusions: It is common practice to group encephalitis cases according to disease etiology. However, our results indicate that patients are clustered mainly with respect to symptom and diagnostic variables rather than causal agents. These similarities and differences in symptom and diagnostic measurements might be attributed to host factors. The idea that characteristics of the host may be more important than the pathogen is also consistent with the observation that for some causes, such as herpes simplex virus (HSV), encephalitis is a rare outcome of a common infection.
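The clustering approach named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the toy patient profiles, the variable names in the comments, the function names and the choice of average linkage are all assumptions made for illustration, since the abstract specifies only the simple matching similarity measure and hierarchical clustering. The simple matching coefficient is the proportion of binary variables on which two patients agree, and one minus that value serves as the dissimilarity.

```python
from itertools import combinations


def simple_matching(a, b):
    """Simple matching coefficient: share of variables on which two
    binary profiles agree (both 1 or both 0)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_linkage(patients, n_clusters):
    """Naive agglomerative clustering on the dissimilarity 1 - SMC.

    Average linkage is an assumption; the abstract does not name the
    linkage rule used in the study.
    """
    clusters = [[i] for i in range(len(patients))]

    def dist(c1, c2):
        # Mean pairwise dissimilarity between members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(1 - simple_matching(patients[i], patients[j])
                   for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average dissimilarity.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Made-up toy data: rows are patients, columns are hypothetical binary
# variables (fever, headache, lethargy, abnormal brain scan).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
groups = average_linkage(toy, n_clusters=2)
```

In this toy example, `groups` holds lists of patient row indices, one list per cluster. Because the simple matching coefficient counts shared absences as well as shared presences, two patients who both lack a symptom are treated as similar on that variable, unlike with measures such as the Jaccard coefficient.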

Original language: English
Article number: 364
Journal: BMC Infectious Diseases
Publication status: Published - 31 Dec 2010


