The function of RNA depends on its secondary structure. Over the past few years, several articles have applied machine learning to improve de novo RNA secondary structure prediction. These articles often report spectacular results for intra-family prediction, but the trickier (and more useful) inter-family problem is rarely discussed.
To evaluate learning-based models, the authors propose a more rigorous method for inter-family cross-validation, in which a model is tested on RNA families that it did not see during training. Using this method, the authors show that, contrary to a common assumption in the field, intra-family performance is insufficient proof of generalisation, and they offer compelling evidence that many existing learning-based models do not generalise across families.
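The idea behind inter-family cross-validation can be illustrated with a leave-one-family-out split. The following is a minimal sketch only, not the authors' exact protocol: the function name `family_wise_folds` and the toy family labels are hypothetical, and real data sets would carry sequences and structures alongside each label.

```python
from collections import defaultdict


def family_wise_folds(families):
    """Yield (held_out_family, train_idx, test_idx), holding out each family once.

    `families` is a list with one family label per sequence.
    No family ever appears in both the training and the test indices.
    """
    by_family = defaultdict(list)
    for idx, fam in enumerate(families):
        by_family[fam].append(idx)

    for held_out in sorted(by_family):
        test_idx = by_family[held_out]
        train_idx = sorted(
            i for fam, idxs in by_family.items() if fam != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy labels, one per sequence in the data set.
families = ["tRNA", "tRNA", "5S_rRNA", "SRP", "5S_rRNA", "SRP"]
folds = list(family_wise_folds(families))

for held_out, train_idx, test_idx in folds:
    print(f"held out: {held_out}  train: {train_idx}  test: {test_idx}")
```

By contrast, a random split of the same six sequences would almost always place members of one family on both sides, which measures intra-family performance rather than generalisation to unseen families.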
The source code is available at https://github.com/marcellszi/dl-rna
Reference
Szikszai Marcell et al. (2022) Deep learning models for RNA secondary structure prediction (probably) do not generalize across families. Bioinformatics 38(16):3892-3899.