Searching a sequence for probabilistic motifs is a common way to annotate putative transcription factor binding sites or other RNA/DNA binding sites. Useful motif representations include position weight matrices (PWMs), dinucleotide PWMs (di-PWMs), and hidden Markov models (HMMs). Di-PWMs keep the simplicity of PWMs (a matrix form and a cumulative scoring scheme) while also incorporating the dependency between neighbouring positions in the motif, which PWMs disregard. At present, two programs, SPRy-SARUS and MOODS, can search for occurrences of di-PWMs in sequences.
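To make the cumulative scoring scheme concrete, here is a minimal sketch of how a di-PWM can score a word: one matrix row per pair of adjacent motif positions, one column per dinucleotide, and the word's score is the sum of the selected entries. The toy matrix, its values, and the names make_toy_dipwm and dipwm_score are assumptions made for illustration; this is not the dipwmsearch API.

```python
from itertools import product

ALPHABET = "ACGT"
DINUCLEOTIDES = ["".join(pair) for pair in product(ALPHABET, repeat=2)]


def make_toy_dipwm(consensus, match=2.0, mismatch=-1.0):
    """Build a toy di-PWM (hypothetical values) that favours `consensus`.

    A motif of length m has m - 1 rows, one per pair of adjacent positions.
    Each row maps the 16 dinucleotides to a score.
    """
    return [
        {d: (match if d == consensus[i:i + 2] else mismatch) for d in DINUCLEOTIDES}
        for i in range(len(consensus) - 1)
    ]


def dipwm_score(dipwm, word):
    """Sum, over adjacent position pairs, the score of the observed dinucleotide."""
    if len(word) != len(dipwm) + 1:
        raise ValueError("word length must equal number of rows + 1")
    return sum(row[word[i:i + 2]] for i, row in enumerate(dipwm))


toy = make_toy_dipwm("ACGT")
for candidate in ("ACGT", "ACGA", "AAGT"):
    print(candidate, dipwm_score(toy, candidate))
```

Because each row is indexed by a dinucleotide rather than a single nucleotide, the contribution of a position depends on its neighbour, which is the dependency a plain PWM cannot express.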
The authors propose dipwmsearch, a Python package that provides an original and efficient algorithm for this task: it first enumerates the words that match the di-PWM and then searches for all of them at once in the sequence, even if the sequence contains IUPAC codes. Easy installation through PyPI or conda, thorough documentation, and executable scripts make di-PWMs simpler to use.
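The two-step idea (enumerate the matching words, then look for all of them in one pass) can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the toy matrix, the score threshold, and the names enumerate_words and search_words are hypothetical, a set lookup over sliding windows stands in for a real multi-pattern search, and IUPAC codes are not handled.

```python
from itertools import product

ALPHABET = "ACGT"


def make_toy_dipwm(consensus, match=2.0, mismatch=-1.0):
    """Toy di-PWM (hypothetical values): one row per adjacent position pair."""
    dinucleotides = ["".join(pair) for pair in product(ALPHABET, repeat=2)]
    return [
        {d: (match if d == consensus[i:i + 2] else mismatch) for d in dinucleotides}
        for i in range(len(consensus) - 1)
    ]


def enumerate_words(dipwm, threshold):
    """Return every word whose di-PWM score is >= threshold.

    Depth-first extension with pruning: a prefix is abandoned as soon as its
    score plus the best score still obtainable from the remaining rows
    cannot reach the threshold.
    """
    n_rows = len(dipwm)
    # best_rest[i] = upper bound on the score contributed by rows i..end
    best_rest = [0.0] * (n_rows + 1)
    for i in range(n_rows - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(dipwm[i].values())

    words = []

    def extend(prefix, score):
        row_index = len(prefix) - 1  # next row to apply
        if row_index == n_rows:      # word is complete and passed all checks
            words.append(prefix)
            return
        for base in ALPHABET:
            new_score = score + dipwm[row_index][prefix[-1] + base]
            if new_score + best_rest[row_index + 1] >= threshold:
                extend(prefix + base, new_score)

    if best_rest[0] >= threshold:
        for base in ALPHABET:
            extend(base, 0.0)
    return words


def search_words(sequence, words, length):
    """Report (position, word) for every window of `sequence` found in `words`."""
    word_set = set(words)
    return [
        (pos, sequence[pos:pos + length])
        for pos in range(len(sequence) - length + 1)
        if sequence[pos:pos + length] in word_set
    ]


toy = make_toy_dipwm("ACGT")
matching = enumerate_words(toy, threshold=3.0)
hits = search_words("TTACGTAGCGTA", matching, length=len(toy) + 1)
print(matching)
print(hits)
```

Enumerating first means the sequence is scanned only once regardless of how many words pass the threshold; the trade-off is that the number of matching words can grow quickly when the threshold is low.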
dipwmsearch is available at https://pypi.org/project/dipwmsearch/ and https://gite.lirmm.fr/rivals/dipwmsearch/ under the CeCILL license.
Reference.
Mille M. et al. (2023) dipwmsearch: a Python package for searching di-PWM motifs. Bioinformatics 39(4):btad141.