Oral Presentation 31st Annual Lorne Proteomics Symposium 2026

Bridging clinical ontologies and proteomics to overcome statistical limitations in rare disease research (133011)

Corazon Ericka Mae Itang (1,2), Vincent Albrecht (1), Alicia-Sophie Schebesta (1,2), Marvin Thielert (1), Anna-Lisa Lanz (3), Katharina Danhauser (3), Jessica Jin (3), Tobias Prell (3), Sophie Strobel (3), Christoph Klein (2,3), Matthias Mann (1), Susanne Pangratz-Fuehrer (2,3), Johannes Bruno Mueller (1,2)
  1. Max Planck Institute of Biochemistry, Planegg, Bavaria, Germany
  2. German Center for Child and Adolescent Health, partner site Munich, Munich, Bavaria, Germany
  3. Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig Maximilian University, Munich, Bavaria, Germany

Proteomic analyses of highly fragmented, heterogeneous cohorts are often hampered by low statistical power due to small per-group sample sizes, making it difficult to draw meaningful conclusions about disease mechanisms. We address this limitation with an analytical framework that aggregates diagnostic labels by biological similarity, as implied by their clinical relationships in the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT).

We applied this framework to a heterogeneous cohort of 1,140 children and adolescents (aged 3-17 years) from the Dr. von Hauner Children’s Hospital in Munich, comprising 394 distinct pediatric conditions. The cohort includes 131 healthy individuals and 1,009 patients, all sex- and age-matched. Both urine and plasma samples were analyzed. Urine samples were processed using an on-bead protein aggregation capture (PAC) workflow and quantified by multiplexed DIA (mDIA) with pooled urine as a reference. Plasma samples were analyzed using two complementary strategies: label-free analysis of undepleted plasma and analysis after perchloric acid depletion. All workflows were automated in a 384-well format and run on timsTOF and Orbitrap Astral platforms. Protein identification and quantification were performed using DIA-NN and directLFQ.

After stringent filtering and batch correction, we quantified over 5,000 proteins in urine, 900 in undepleted plasma, and 1,900 in perchloric acid-depleted plasma. Principal component analysis (PCA) revealed no discrete clustering by age but showed a continuous gradient along PC1, correlating with increasing age. PC1 values rose sharply after ages 10-12, reflecting hormonal and developmental changes associated with puberty. These findings align with prior work highlighting the influence of age and sex on pediatric proteomes.
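The PC1-versus-age check described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the matrix `X`, the `loadings` vector, the effect size 0.2, and the helper name `pca_scores` are all assumptions introduced for the example.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD on a samples x proteins matrix.

    Returns the sample scores and the explained-variance ratio
    of the first `n_components` components.
    """
    Xc = X - X.mean(axis=0)                      # center each protein
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Synthetic stand-in data (hypothetical, NOT the cohort measurements):
# each protein shifts linearly with age by a random amount, plus noise.
rng = np.random.default_rng(0)
n_samples, n_proteins = 200, 50
age = rng.uniform(3, 17, size=n_samples)
loadings = rng.normal(size=n_proteins)           # assumed age sensitivity
X = 0.2 * np.outer(age - age.mean(), loadings)
X += rng.normal(size=(n_samples, n_proteins))

scores, explained = pca_scores(X)
# The sign of a principal component is arbitrary, so report |r|.
r_pc1_age = np.corrcoef(scores[:, 0], age)[0, 1]
print(f"PC1: {explained[0]:.0%} of variance, |r(PC1, age)| = {abs(r_pc1_age):.2f}")
```

A continuous gradient of this kind shows up as a strong correlation between PC1 scores and age without separate clusters in the score plot.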

To account for these confounding factors, we applied analysis of covariance (ANCOVA), with age and sex as covariates, to identify proteins that were significantly differentially abundant across the ten most prevalent diseases. For rarer conditions (fewer than 5 patients each), we leveraged SNOMED CT to group diagnostic labels into biologically meaningful clusters. A local network of SNOMED CT terms was embedded into a 128-dimensional latent space using node2vec, followed by unsupervised clustering optimized by silhouette score. This reduced the number of underpowered disease categories from 344 to 8, enabling statistical comparison across aggregated groups.
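The clustering step can be illustrated with a short sketch that picks the number of clusters by mean silhouette score. Several things here are assumptions rather than details from the abstract: the clustering algorithm is not named in the text, so plain k-means with deterministic farthest-point seeding is used as a stand-in; the node2vec step is skipped and replaced by synthetic 128-dimensional vectors (`emb`); and all function names are hypothetical.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means; returns one cluster label per row of X."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette_score(X, labels):
    """Mean silhouette coefficient (Euclidean distance)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                              # singleton: s = 0
        a = D[i, own].sum() / (own.sum() - 1)     # mean intra-cluster distance
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Synthetic stand-in for node2vec embeddings of diagnosis terms
# (hypothetical: 4 well-separated groups of 30 terms in 128 dimensions).
rng = np.random.default_rng(1)
group_centers = rng.normal(scale=3.0, size=(4, 128))
emb = np.repeat(group_centers, 30, axis=0) + rng.normal(scale=0.5, size=(120, 128))

sil = {k: silhouette_score(emb, kmeans(emb, k)) for k in range(2, 9)}
best_k = max(sil, key=sil.get)
best_labels = kmeans(emb, best_k)
print(f"selected k = {best_k}, mean silhouette = {sil[best_k]:.2f}")
```

Each rare diagnosis would then inherit the cluster label of its embedded term, so that patients with related diagnoses can be pooled into one comparison group.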

This ontology-guided framework reveals molecular signatures across developmental stages and disease clusters while controlling for age- and sex-related variation. It provides a generalizable solution for analyzing heterogeneous patient populations where traditional case-control designs are impractical, bridging clinical classification with molecular profiling in pediatric and rare disease research.
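The age and sex adjustment referred to above can be illustrated for a single protein as a nested-model F-test, which is one common way to compute an ANCOVA group effect. This is a sketch under stated assumptions, not the authors' implementation: the two-group design, the simulated effect sizes, and the helper name `ancova_f` are all hypothetical.

```python
import numpy as np

def ancova_f(y, group, age, sex):
    """F statistic for a group effect on y, adjusted for age and sex.

    Compares a covariates-only linear model with one that adds
    a 0/1 group indicator.
    """
    n = len(y)
    reduced = np.column_stack([np.ones(n), age, sex])   # intercept + covariates
    full = np.column_stack([reduced, group])            # + group indicator

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    rss_reduced, rss_full = rss(reduced), rss(full)
    df_num = full.shape[1] - reduced.shape[1]
    df_den = n - full.shape[1]
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Synthetic example (hypothetical values): both proteins drift with age,
# but only the first differs between the disease cluster and controls.
rng = np.random.default_rng(2)
n = 120
age = rng.uniform(3, 17, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
group = (np.arange(n) % 2).astype(float)                # 0 = control, 1 = cluster

y_effect = 0.1 * age + 0.3 * sex + 1.0 * group + rng.normal(scale=0.5, size=n)
y_null = 0.1 * age + 0.3 * sex + rng.normal(scale=0.5, size=n)

f_effect, df_num, df_den = ancova_f(y_effect, group, age, sex)
f_null, _, _ = ancova_f(y_null, group, age, sex)
print(f"F({df_num}, {df_den}): with group effect = {f_effect:.1f}, without = {f_null:.1f}")
```

A p-value follows from the upper tail of the F distribution with `df_num` and `df_den` degrees of freedom (for example via `scipy.stats.f.sf`). Across thousands of proteins, a multiple-testing correction would also be needed.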

  1. Itang, E.C.M., Albrecht, V., Schebesta, A.-S., Thielert, M., Lanz, A.-L., Danhauser, K., Jin, J., Prell, T., Strobel, S., Klein, C., et al. (2025). Ontology-guided clustering enables proteomic analysis of rare pediatric disorders. EMBO Mol. Med. 17, 1842–1867. https://doi.org/10.1038/s44321-025-00253-z.
  2. Govender, I.S., Mokoena, R., Stoychev, S., and Naicker, P. (2023). Urine-HILIC: Automated Sample Preparation for Bottom-Up Urinary Proteome Profiling in Clinical Proteomics. Proteomes 11, 29. https://doi.org/10.3390/proteomes11040029.
  3. Thielert, M., Itang, E.C., Ammar, C., Rosenberger, F.A., Bludau, I., Schweizer, L., Nordmann, T.M., Skowronek, P., Wahle, M., Zeng, W., et al. (2023). Robust dimethyl‐based multiplex‐DIA doubles single‐cell proteome depth via a reference channel. Mol. Syst. Biol. 19, e11503. https://doi.org/10.15252/msb.202211503.
  4. Albrecht, V., Müller-Reif, J.B., Brennsteiner, V., and Mann, M. (2025). A simplified perchloric acid workflow with neutralization (PCA-N) for democratizing deep plasma proteomics at population scale. Preprint at bioRxiv, https://doi.org/10.1101/2025.03.24.645089.
  5. Demichev, V., Messner, C.B., Vernardis, S.I., Lilley, K.S., and Ralser, M. (2020). DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44. https://doi.org/10.1038/s41592-019-0638-x.
  6. Ammar, C., Schessner, J.P., Willems, S., Michaelis, A.C., and Mann, M. (2023). Accurate Label-Free Quantification by directLFQ to Compare Unlimited Numbers of Proteomes. Mol. Cell. Proteomics 22, 100581. https://doi.org/10.1016/j.mcpro.2023.100581.