Results of COVID-19 subgroups by symptoms and comorbidities using the open nCov-2019 dataset. The nCov-2019 dataset comprises a collection of publicly available information on worldwide cases confirmed during the ongoing nCoV-2019 outbreak.

Materials: We analyzed the raw nCov-2019 dataset release of 2020-05-11. We included the cases for which at least one symptom and an outcome were available. We then resolved duplicates and homogenized the values of outcomes, comorbidities and symptoms, and mapped the symptoms to ICD-10 terms. The final sample included 170 cases.
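
The inclusion and cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record fields (`id`, `symptoms`, `outcome`), the synonym table and the rule that duplicates share an `id` are all assumptions made for the example.

```python
# Minimal sketch of the inclusion and cleaning steps.
# Field names, the synonym table and the duplicate rule are assumptions.
OUTCOME_SYNONYMS = {
    "died": "death",
    "dead": "death",
    "discharged": "recovered",
    "discharge": "recovered",
}


def normalize(value, synonyms=None):
    """Trim and lower-case a free-text value, then map known synonyms."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    if not cleaned:
        return None
    return (synonyms or {}).get(cleaned, cleaned)


def select_cases(records):
    """Keep cases with at least one symptom and an outcome, once per id."""
    seen_ids = set()
    kept = []
    for rec in records:
        symptoms = sorted({s for s in map(normalize, rec.get("symptoms", [])) if s})
        outcome = normalize(rec.get("outcome"), OUTCOME_SYNONYMS)
        if not symptoms or outcome is None:
            continue  # inclusion criterion not met
        if rec["id"] in seen_ids:
            continue  # assumed duplicate of an earlier record
        seen_ids.add(rec["id"])
        kept.append({"id": rec["id"], "symptoms": symptoms, "outcome": outcome})
    return kept
```

Calling `select_cases` on a list of raw records returns only the cases that satisfy the inclusion criterion, with homogenized symptom and outcome values.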

Methods: We applied a 3-dimensional Multiple Correspondence Analysis embedding of symptoms and outcomes, followed by hierarchical clustering. The number of clusters for both the age-independent and the age-group analyses was selected by supervised inspection of group consistency.
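
As an illustration of this kind of pipeline, the sketch below computes a 3-dimensional embedding from binary variables and clusters it hierarchically. It is written under stated assumptions: Multiple Correspondence Analysis is implemented as a correspondence analysis of the complete disjunctive table, the Ward linkage and the choice of two clusters are arbitrary, and the eight toy patients are invented. It needs NumPy and SciPy and is not the code from the repository.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def disjunctive_table(binary):
    """Expand each 0/1 variable into two columns: present and absent."""
    binary = np.asarray(binary, dtype=float)
    return np.hstack([binary, 1.0 - binary])


def mca_embedding(binary, n_dims=3):
    """Principal row coordinates from a correspondence analysis of the
    complete disjunctive table (one common formulation of MCA)."""
    P = disjunctive_table(binary)
    P = P / P.sum()
    row_mass = P.sum(axis=1)
    col_mass = P.sum(axis=0)  # must be > 0: every variable needs both values
    expected = np.outer(row_mass, col_mass)
    residuals = (P - expected) / np.sqrt(expected)
    U, singular_values, _ = np.linalg.svd(residuals, full_matrices=False)
    return U[:, :n_dims] * singular_values[:n_dims] / np.sqrt(row_mass)[:, None]


# Toy data: 8 hypothetical patients x 4 binary variables.
patients = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
coords = mca_embedding(patients, n_dims=3)

# Ward linkage is an assumption; the text only says "hierarchical clustering".
tree = linkage(coords, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
```

Patients with identical profiles receive identical coordinates and therefore always end up in the same cluster.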

Results: We found clinically meaningful patient subgroups based on symptoms and comorbidities for specific age groups and age-independent analyses. However, cases from the two most prevalent source countries fell into separate subgroups with different manifestations of severity.

For further details read our publication:

Carlos Sáez, Nekane Romero, J Alberto Conejero, Juan M García-Gómez. Potential limitations in COVID-19 machine learning due to data source variability: a case study in the nCov2019 dataset. Journal of the American Medical Informatics Association. 28 (2): 360-364. February 2021. doi: 10.1093/jamia/ocaa258

Code available for replication in our COVID-19 Subgroup Discovery tool GitHub repository.



Results of COVID-19 subgroups stratified based on gender and age groups, using clinical phenotypes and demographic features on the open Mexico Covid-19 dataset. The dataset comprises a collection of publicly available information on Mexican nationwide cases tested during the ongoing COVID-19 pandemic.

Materials: We analyzed the dataset release of 2020-11-02. We included patients confirmed positive for SARS-CoV-2. We excluded cases with missing information in at least one chronic disease, and those showing inconsistent or implausible records for some combinations of variables, particularly dates. We also excluded patients whose symptoms began after September 30, 2020, to allow complete 30-day survival metrics. The final sample included 778 692 cases.
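
The selection criteria above can be expressed as a simple filter. The sketch below is illustrative only: the field names (`positive`, `onset`, `death`), the three chronic-disease columns, the single plausibility rule (death before symptom onset) and the definition of 30-day mortality from symptom onset are assumptions, not the authors' exact rules.

```python
from datetime import date, timedelta

RELEASE = date(2020, 11, 2)        # dataset release, from the text
ONSET_CUTOFF = date(2020, 9, 30)   # latest symptom onset kept, from the text
CHRONIC = ("diabetes", "hypertension", "obesity")  # hypothetical column names


def is_eligible(case):
    """Apply the inclusion and exclusion criteria to one case (a dict)."""
    if not case.get("positive"):
        return False
    if any(case.get(name) is None for name in CHRONIC):
        return False  # missing chronic-disease information
    onset, death = case.get("onset"), case.get("death")
    if onset is None or onset > ONSET_CUTOFF:
        return False
    if death is not None and death < onset:
        return False  # example of an implausible date combination
    return True


def died_within_30_days(case):
    """30-day mortality counted from symptom onset (an assumed definition)."""
    death = case.get("death")
    return death is not None and death - case["onset"] <= timedelta(days=30)


# Every retained case has at least 30 days of follow-up before the release.
min_follow_up = (RELEASE - ONSET_CUTOFF).days
```

The cut-off leaves a follow-up window between the last accepted symptom onset and the dataset release that is longer than 30 days, which is what makes the 30-day survival metric complete for every retained case.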

Methods: We applied a 3-dimensional Multiple Correspondence Analysis embedding to the clinical phenotypes and demographic features, followed by hierarchical clustering. The UI allows navigating through different numbers of clusters. The number of clusters shown by default was selected under expert supervision for group consistency and clinical validity.
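
Browsing several numbers of clusters does not require re-running the clustering: a hierarchical tree can be built once and cut at each candidate value. The sketch below shows this with SciPy on synthetic 3-D points that stand in for an embedding. The Ward linkage, the candidate values and the data are assumptions for illustration, not the tool's implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage

# Synthetic 3-D points standing in for an embedding; not real data.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]], dtype=float)
points = np.vstack([c + rng.normal(scale=0.5, size=(10, 3)) for c in centers])

tree = linkage(points, method="ward")          # built once
candidate_k = [2, 3, 4, 5]
cuts = cut_tree(tree, n_clusters=candidate_k)  # shape: (n_points, len(candidate_k))

# One label vector and one list of cluster sizes per candidate k.
labels_by_k = {k: cuts[:, i] for i, k in enumerate(candidate_k)}
sizes_by_k = {k: np.bincount(lab).tolist() for k, lab in labels_by_k.items()}
```

Because all cuts come from the same tree, the partitions are nested: moving from k to k + 1 clusters splits one existing cluster and leaves the others unchanged, which makes the choice of k easier to review.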

Results: We found clinically meaningful patient subgroups for specific age groups and age-independent analyses. A total of 56 age-gender clusters were further grouped into 11 clinically distinguishable meta-clusters with specific severity outcomes.

For further details read our publication:

Lexin Zhou, Nekane Romero, Juan Martínez-Miranda, J Alberto Conejero, Juan M García-Gómez, Carlos Sáez. Heterogeneity in COVID-19 severity patterns among age-gender groups: an analysis of 778 692 Mexican patients through a meta-clustering technique. Pending publication in medRxiv.

Code available for replication in our COVID-19 Meta-Clustering GitHub repository.



Results of COVID-19 subgroups in patients over 64 years admitted to the Intensive Care Unit (ICU), based on their early blood test results at hospital admission.

Developed in the context of the project Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19), funded by Fondo Supera COVID-19 by CRUE (Conferencia de Rectores de las Universidades Españolas) - Santander Universidades (Santander Bank), from Oct 2020 to Dec 2021.

Materials: We analyzed the dataset release of 2021-03-16, which includes 193 ICU patients from the Hospital 12 de Octubre, Madrid, and the Hospital Clinico Universitario de Valencia.

Methods: We applied a t-SNE embedding to the full blood count lab data, followed by hierarchical clustering. The UI allows navigating through different numbers of clusters.
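
A t-SNE embedding followed by hierarchical clustering can be sketched with scikit-learn as follows. Everything beyond those two steps is an assumption made for the example: the feature standardization, the 2-D embedding, the perplexity, the Ward linkage, the choice of three clusters and the synthetic matrix used in place of real blood count data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patients x blood-count-features matrix; not real data.
rng = np.random.default_rng(42)
blood_counts = np.vstack([
    rng.normal(loc=shift, scale=1.0, size=(20, 6)) for shift in (0.0, 4.0, 8.0)
])

# Put all features on a comparable scale before computing distances.
scaled = StandardScaler().fit_transform(blood_counts)

# Perplexity must be smaller than the number of samples (60 here).
embedding = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(scaled)

# Agglomerative (hierarchical) clustering on the embedded points.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(embedding)
```

Fixing `random_state` keeps the embedding reproducible between runs, which matters because t-SNE layouts otherwise change with the random initialization.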

Publication in process of submission.

