Presented by Zheqi Lin
Master's Candidate in Biostatistics
Plan B Adviser: Dr. Erika Helgeson
Cluster analysis is widely used to identify patient subgroups in chronic obstructive pulmonary disease (COPD), yet most studies rely on Euclidean distance-based algorithms that treat binary comorbidity indicators as continuous variables. Whether methods adapted for mixed continuous–binary data yield different cluster structures remains unclear. This study compared six clustering methods applied to 235 COPD patients from the HiFlo trial: three traditional Euclidean-based methods (k-means, hierarchical Ward, and self-organizing maps [SOM]) and their three mixed-type counterparts (K-prototypes, Gower+Ward, and Supersom with Tanimoto distance). Clustering used 5 continuous clinical variables and 12 binary comorbidity indicators. Traditional methods produced clusters separated primarily along continuous severity gradients (FEV1%, BMI, six-minute walk distance [6MWD]), whereas adapted methods produced clusters defined by comorbidity combinations. Adapted methods identified clinically relevant subgroups not found by traditional methods, including a Psychological Distress cluster (anxiety 82–90%, depression 85–96%) and a Cardiometabolic Multimorbid cluster (diabetes 84%, hypertension 84%, heart failure 47%). These findings suggest that adapted distance metrics should be considered alongside traditional approaches when clustering COPD patients on combined clinical and comorbidity features.
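To illustrate how a mixed-type distance differs from treating binary indicators as continuous, the following is a minimal sketch of a Gower-style distance in Python. It is not the study's implementation, which is not described here. The function name gower_distance, the choice of simple matching for the binary indicators, and the three toy patients are all assumptions made for illustration; the values are invented and are not HiFlo trial data.

```python
def gower_distance(cont, binary):
    """Pairwise Gower-style distance for mixed continuous/binary data.

    Assumptions (illustrative, not taken from the study):
    - continuous variables contribute |x_i - x_j| / range of that variable
    - binary variables contribute 0 if equal, 1 if different (simple matching)
    - all variables are weighted equally
    """
    n = len(cont)
    n_cont = len(cont[0])
    n_bin = len(binary[0])

    # Range of each continuous variable; guard against zero range.
    ranges = []
    for k in range(n_cont):
        col = [row[k] for row in cont]
        r = max(col) - min(col)
        ranges.append(r if r > 0 else 1.0)

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = sum(
                abs(cont[i][k] - cont[j][k]) / ranges[k] for k in range(n_cont)
            )
            total += sum(1 for k in range(n_bin) if binary[i][k] != binary[j][k])
            dist[i][j] = dist[j][i] = total / (n_cont + n_bin)
    return dist


# Toy data (invented): columns are [FEV1%, BMI] and [anxiety, depression].
cont = [[30.0, 22.0], [32.0, 23.0], [70.0, 30.0]]
binary = [[1, 1], [0, 0], [1, 1]]

D = gower_distance(cont, binary)
for row in D:
    print([round(x, 3) for x in row])
```

In this toy example, patients 0 and 1 have nearly identical continuous values but opposite comorbidity profiles, while patients 0 and 2 sit at opposite ends of both continuous ranges but share the same comorbidities. Under this distance, patient 0 is slightly closer to patient 2 than to patient 1, which shows how a mixed-type metric can let comorbidity agreement offset differences in continuous severity measures.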


