Understanding the inner workings of transformer models for protein function prediction

The authors investigated how explainable artificial intelligence (XAI) can help illuminate the inner workings of neural networks for protein function prediction. To this end, they extended the popular XAI technique of integrated gradients to the latent representations inside transformer models fine-tuned for Gene Ontology (GO) term and Enzyme Commission (EC) number prediction.
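To make the idea concrete, here is a minimal sketch of integrated gradients applied to a latent representation, not the authors' implementation: the `forward_fn` interface, tensor shapes, and the zero baseline are assumptions for illustration.

```python
import torch

def integrated_gradients(forward_fn, latent, baseline=None, steps=50):
    """Approximate integrated gradients for a latent representation.

    forward_fn: hypothetical function mapping a latent tensor of shape
                (seq_len, dim) to a scalar score, e.g. the logit of one
                GO term or EC number class.
    latent:     latent activations to attribute, shape (seq_len, dim).
    baseline:   reference point on the attribution path; a zero tensor
                is assumed here if none is given.
    """
    if baseline is None:
        baseline = torch.zeros_like(latent)
    # Sample points on the straight-line path from baseline to latent.
    alphas = torch.linspace(0.0, 1.0, steps)
    grads = []
    for alpha in alphas:
        point = (baseline + alpha * (latent - baseline)).detach().requires_grad_(True)
        score = forward_fn(point)
        grad, = torch.autograd.grad(score, point)
        grads.append(grad)
    # Riemann approximation of the path integral of the gradients.
    avg_grad = torch.stack(grads).mean(dim=0)
    attributions = (latent - baseline) * avg_grad
    # Sum over the hidden dimension to get one score per residue position.
    return attributions.sum(dim=-1)
```

Summing the attributions over the hidden dimension yields one relevance score per sequence position, which is what allows the method to point to individual amino acids.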

This method allowed the authors to pinpoint the amino acids in the sequences on which the transformers focus. They further showed that these relevant sequence regions reflect expectations from biology and chemistry, both in the embedding layer and inside the model, where they identified transformer heads whose attribution maps correspond, with statistical significance, to ground-truth sequence annotations (such as transmembrane regions and active sites) across a wide range of proteins.
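One simple way to test such a correspondence, offered here as an illustrative sketch rather than the statistical procedure used in the paper, is to compare per-residue attribution scores inside versus outside an annotated region with a rank-based test; the function name and inputs below are assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def annotation_correspondence(attributions, mask, alternative="greater"):
    """Test whether per-residue attributions are higher inside an
    annotated region (e.g., a transmembrane segment or active site).

    attributions: 1-D array of per-position attribution scores.
    mask:         boolean array of the same length, True inside the
                  ground-truth annotation.
    Returns the U statistic and p-value of a Mann-Whitney U test.
    """
    scores = np.asarray(attributions)
    mask = np.asarray(mask, dtype=bool)
    inside = scores[mask]
    outside = scores[~mask]
    return mannwhitneyu(inside, outside, alternative=alternative)
```

Running such a test per transformer head and per annotation type, with appropriate multiple-testing correction, is one way a head-level significance analysis like the one described above could be organized.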

The source code is available at https://github.com/markuswenzel/xai-proteins.

Reference:
Wenzel, M., et al. (2024) Insights into the inner workings of transformer models for protein function prediction. Bioinformatics, 40(3), btae031.
