Solenoid proteins, a subset of tandem repeat proteins, have diverse, modular, and elongated geometries that set them apart from globular proteins. They play crucial roles in biological processes including protein binding, enzymatic catalysis, ice binding, and nucleic acid interactions. Despite their biological significance and growing commercial applications, such as engineered variants for therapeutic use like DARPins and customized PPR proteins, accurately identifying and annotating solenoid structures remains difficult. Because solenoid structures are more conserved than their sequences, recent advances in protein structure prediction suggest that structure-based solenoid detection methods are superior to sequence-based ones.
The authors introduce SOLeNNoID, a deep-learning-based pipeline for predicting solenoid residues in protein structures. Their method employs a convolutional neural network architecture to analyse protein distance matrices, enabling accurate identification of solenoid-containing regions. SOLeNNoID covers all three solenoid subclasses: α-, α/β-, and β-solenoids. Comparative evaluation against existing structure-based methods demonstrates the superior performance of their approach. Applying SOLeNNoID to the entire Protein Data Bank led to a 71% increase in detected solenoid-containing entries compared to the gold-standard RepeatsDB database, significantly expanding the known solenoid protein repertoire.
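To make the input representation concrete, here is a minimal sketch of how a protein distance matrix can be built from residue coordinates. It is an illustration only, not the authors' code: the choice of C-alpha atoms, the function name distance_matrix, and the example coordinates are all assumptions, and the preprocessing SOLeNNoID actually applies is not described in this summary.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: a list of (x, y, z) tuples, one per residue.
    Assumption: one representative atom (e.g. C-alpha) per residue.
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            # Distances are symmetric, so fill both halves at once
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

# Hypothetical C-alpha coordinates (in angstroms) for four residues
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]
dm = distance_matrix(ca_coords)
```

In a matrix like this, repeating structural units tend to show up as recurring patterns parallel to the diagonal, which is the kind of two-dimensional regularity a convolutional network is well suited to pick up.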
SOLeNNoID is written in Python and is available on GitHub at https://github.com/gnik2018/SOLeNNoID. The source code and pre-trained models are released under a free software license. The training data are available on Zenodo at https://zenodo.org/records/14927497.
Reference:
Georgi I Nikov et al. (2025) SOLeNNoID: a deep learning pipeline for solenoid residue detection in protein structures. Bioinformatics 41(8): btaf415
