An organism's proteomic profile is a functional readout of its physiological state, so there is considerable scientific interest in what determines and regulates protein abundances. The model organism Saccharomyces cerevisiae has been studied extensively: databases such as the Saccharomyces Genome Database and well-curated genome-scale metabolic models such as Yeast8 hold a wealth of structured information on yeast systems biology. These information-rich datasets, built up over decades of experiments, are organized according to semantically meaningful ontologies.
The authors captured this knowledge in an expressive Datalog database and used relational learning to construct data descriptors from it. Paired with supervised machine learning, these descriptors allowed them to predict protein abundances in an interpretable way. They found that knowledge of protein functions and phenotypes, such as α-amino acid accumulation and variation in chronological lifespan, is predictive of protein abundance. They further illustrate the methodology on the proteins His4 and Ilv2, linking qualitative biological concepts to quantitative abundances.
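To make the idea of relational descriptors concrete, here is a minimal Python sketch, not the authors' pipeline. It assumes a toy set of Datalog-style facts, turns a few hand-written conjunctive queries into binary descriptors (a simple form of propositionalization), and then scores each descriptor by the difference in mean abundance between proteins that match it and those that do not. That mean-difference score is a deliberately simple stand-in for the supervised model. All protein names, predicates, annotations, and abundance values are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical toy facts (NOT taken from the paper): (predicate, protein, value)
FACTS = {
    ("has_function", "protA", "amino_acid_biosynthesis"),
    ("has_function", "protB", "amino_acid_biosynthesis"),
    ("has_function", "protC", "dna_repair"),
    ("has_function", "protD", "dna_repair"),
    ("has_phenotype", "protA", "altered_lifespan"),
    ("has_phenotype", "protC", "altered_lifespan"),
}

# Hypothetical abundances in arbitrary units
ABUNDANCE = {"protA": 9.0, "protB": 7.0, "protC": 3.0, "protD": 1.0}

# Each descriptor is a conjunctive query: a protein matches when every
# (predicate, value) pair in the list is present as a fact for it.
DESCRIPTORS = {
    "f_aa": [("has_function", "amino_acid_biosynthesis")],
    "p_life": [("has_phenotype", "altered_lifespan")],
    "f_aa_and_p_life": [
        ("has_function", "amino_acid_biosynthesis"),
        ("has_phenotype", "altered_lifespan"),
    ],
}


def holds(protein, query):
    """Return True if the protein satisfies every literal in the query."""
    return all((pred, protein, val) in FACTS for pred, val in query)


def propositionalize(proteins):
    """Build a protein -> {descriptor: 0/1} table from the relational facts."""
    return {
        p: {name: int(holds(p, q)) for name, q in DESCRIPTORS.items()}
        for p in proteins
    }


def mean_difference(table, name):
    """Mean abundance of matching proteins minus that of non-matching ones."""
    matched = [ABUNDANCE[p] for p, row in table.items() if row[name]]
    unmatched = [ABUNDANCE[p] for p, row in table.items() if not row[name]]
    return mean(matched) - mean(unmatched)


table = propositionalize(sorted(ABUNDANCE))
effects = {name: mean_difference(table, name) for name in DESCRIPTORS}
```

Because each descriptor is a readable query over annotated knowledge, a large score can be traced back to a biological statement such as "has this function and this phenotype", which is the sense in which such predictions are interpretable.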
All of the data and processing scripts are available in the following GitHub repository: https://github.com/DanielBrunnsaker/ProtPredict
Reference:
Brunnsaker D. et al. (2024) Interpreting protein abundance in Saccharomyces cerevisiae through relational learning. Bioinformatics 40(2): btae050