Poster Presentation 31st Annual Lorne Proteomics Symposium 2026

Motif-based clustering of phosphosites reveals co-regulatory relationships in Saccharomyces cerevisiae (#135)

Heather McDonald-Haynes 1,2, Marc Wilkins 1,2
  1. School of Biotechnology & Biomolecular Sciences, University of New South Wales, Kensington, NSW, Australia
  2. ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems, Kensington, NSW, Australia

Protein phosphorylation is a critical intracellular process that can modify protein activity and drive intracellular signalling networks. During phosphorylation, a kinase catalyses the transfer of a phosphate group onto a target amino acid, forming a phosphosite. While thousands of phosphosites are known, kinase-phosphosite mapping remains challenging, and only a very small number of phosphosites have been mapped to a kinase. As a result, the majority of signalling pathways and networks remain undefined. An underexplored way of building phosphorylation networks is to closely investigate the sequence motifs surrounding all known phosphosites in a proteome. Since there are 127 protein kinases in yeast, a motif-based analysis should theoretically define 127 kinase recognition motifs. This would reveal phosphosites likely to be targeted by the same kinase and allow a range of regulatory relationships between substrate proteins to be inferred.

A phosphosite dataset for Saccharomyces cerevisiae was collated and filtered to yield a final set of 11,221 sites. Sequence motifs shared by at least two phosphosites (site ±4 amino acids) were identified, allowing up to four mismatches. The sequence similarity of phosphosites sharing a motif was scored using the EDSSMat50 substitution matrix for disordered protein regions and by scoring matching residue types at each position. Top-scoring motif-based phosphosite pairs were used as seed clusters to build a motif-based phosphosite network. These clusters were then expanded using a novel greedy, seed-and-grow clustering method that added lower-scoring phosphosite pairs overlapping an existing cluster, provided the addition did not lower the overall cluster score. Average-linkage hierarchical clustering consolidated the motif-based phosphosite clusters into 124 groups. On average, each group contained 10 clusters and 34 phosphosites, with the largest group comprising 146 clusters and 521 phosphosites. Consensus sequences derived from these motif-based groups recapitulated known kinase recognition motifs, such as S/T-P motifs and basophilic or acidophilic motifs. Novel kinase recognition motif classes and subclasses were also identified, including R/K-X-X-S/T-X-D/E and G-X-G-S-F. Using this motif-based phosphosite network, known kinase-substrate relationships can now be propagated to infer the kinase(s) most likely targeting phosphosites within the same cluster.
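
To illustrate the windowing, pairing and seed-and-grow steps described above, the following is a minimal Python sketch and not the authors' implementation. Several parts are assumptions made purely for illustration: the pair score is a toy +1/-1 identity score standing in for EDSSMat50-based scoring, the "overall cluster score" is taken to be the sum of pairwise scores among cluster members, growth is done in a single pass, and all function names, the `seed_threshold` parameter and the four example windows are hypothetical.

```python
from itertools import combinations

FLANK = 4  # residues either side of the phosphosite (site ±4)

def window(seq, pos, flank=FLANK):
    """Site ±flank window around 0-based index pos, padded with '-' at termini."""
    padded = "-" * flank + seq + "-" * flank
    return padded[pos:pos + 2 * flank + 1]

def mismatches(a, b):
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))

def pair_score(a, b):
    """Toy score (assumption): +1 per identical position, -1 per mismatch."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))

def scored_pairs(windows, max_mismatch=4):
    """All site pairs within the mismatch limit, highest score first."""
    pairs = [
        (pair_score(a, b), i, j)
        for (i, a), (j, b) in combinations(sorted(windows.items()), 2)
        if mismatches(a, b) <= max_mismatch
    ]
    return sorted(pairs, reverse=True)

def seed_and_grow(windows, seed_threshold, max_mismatch=4):
    """Greedy single-pass clustering: top pairs seed clusters, lower pairs grow them."""
    clusters = []
    for score, i, j in scored_pairs(windows, max_mismatch):
        hits = [c for c in clusters if i in c or j in c]
        if not hits:
            # A pair touching no cluster only starts one if it scores highly enough.
            if score >= seed_threshold:
                clusters.append({i, j})
            continue
        for c in hits:
            for new in {i, j} - c:
                # Change in the summed pairwise cluster score if `new` joins c.
                gain = sum(pair_score(windows[new], windows[m]) for m in c)
                if gain >= 0:  # accept only if the cluster score is not lowered
                    c.add(new)
    return clusters

# Hypothetical windows with the phosphosite at the centre (index 4).
windows = {
    "a": window("MKRRRASPVDEL", 6),  # RRRASPVDE
    "b": "RRRASPVDE",
    "c": "RRKASPVDE",  # one mismatch to a and b
    "d": "GGKASPVGG",  # four mismatches to c, five to a and b
}
clusters = seed_and_grow(windows, seed_threshold=9)
print([sorted(c) for c in clusters])  # [['a', 'b', 'c']]
```

In this toy example, site d pairs with c within the mismatch limit but is not added, because its scores against a and b would lower the summed cluster score. A real analysis would replace `pair_score` with substitution-matrix and residue-type scoring, and would follow the growth step with average-linkage hierarchical clustering of the resulting clusters.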