BPLA kernels for structural RNA analysis

The base-pairing profile local alignment (BPLA) kernel is a powerful method that evaluates the similarity between a pair of RNAs while taking secondary structure information into account. Based on the accurate similarity calculated by the BPLA kernel, you can perform a broad range of structural RNA analyses, including family prediction, hierarchical clustering, and remote homology search. The method is applicable not only to classical genomic screens, but also to unannotated noncoding transcripts produced by next-generation sequencing technologies (RNA-seq).

Software

<URL:bpla_kernel-1.0.tar.gz>

Genomic screens

In [Morita et al., Nucleic Acids Res, 2009], we developed a family prediction method for noncoding RNAs that uses the BPLA kernel as the kernel function of SVM classifiers. We demonstrated the effectiveness of our method through a genome-wide search for snoRNAs in C. elegans. qRT-PCR experiments showed that 14 out of 48 predicted candidates were significantly expressed compared to unannotated intronic or intergenic regions.

Supplementary materials for [Morita et al., Nucleic Acids Res, 2009]. These files contain the annotation data for WormBase.

Fast parameter optimization

In [Sato et al., Genome Inform, 2009], we developed a gradient-based optimization method for the parameters of the BPLA kernel. In most existing tools for structural RNA analysis, parameters are either fixed to empirical values, which can degrade performance, or determined by brute-force grid search, which requires massive computation. In contrast, our method iteratively improves the parameters for a given dataset by evaluating the gradient of performance measures such as ROC scores. Our experiments showed that this procedure can find a nearly optimal set of parameters much faster than grid search.
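The contrast between grid search and gradient-based updates can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_score` is a hypothetical smooth function standing in for a performance measure, the two parameters are unnamed placeholders rather than actual BPLA kernel parameters, and the gradient is approximated by finite differences rather than computed as in the paper.

```python
from itertools import product

def toy_score(params):
    # Hypothetical smooth stand-in for a performance measure; maximum at (0.3, -1.2).
    a, b = params
    return 1.0 - (a - 0.3) ** 2 - 0.5 * (b + 1.2) ** 2

def numerical_gradient(f, params, eps=1e-6):
    # Central finite differences; an assumption of this sketch, not the paper's gradient.
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def gradient_ascent(f, start, step=0.2, iters=100):
    # Repeatedly move the parameters a small step along the gradient.
    params = list(start)
    for _ in range(iters):
        grad = numerical_gradient(f, params)
        params = [p + step * g for p, g in zip(params, grad)]
    return params

def grid_search(f, axes):
    # Evaluate every combination of candidate values and keep the best one.
    return max(product(*axes), key=f)

axis = [i / 10 for i in range(-20, 21)]  # 41 candidate values per parameter
grid_best = grid_search(toy_score, [axis, axis])
ascent_best = gradient_ascent(toy_score, start=[2.0, 2.0])
print("grid search:", grid_best, toy_score(grid_best))
print("gradient ascent:", ascent_best, toy_score(ascent_best))
```

In this toy setting the grid evaluates 41 x 41 = 1681 parameter combinations, while the ascent uses 100 iterations of four evaluations each. The grid cost grows exponentially with the number of parameters, whereas the cost of one gradient step grows only linearly.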

Profile-profile search

In [Saito et al., BMC Bioinformatics, 2010], we developed an extension of the BPLA kernel, called the Profile BPLA kernel, which predicts noncoding RNA families from alignment data rather than from single sequences. By utilizing the profile information contained in alignment data, the proposed method can achieve better accuracy than the original BPLA kernel. We showed that the Profile BPLA kernel outperforms existing prediction methods that also utilize profile information. Furthermore, our systematic evaluation demonstrated that the Profile BPLA kernel maintains its performance in practical situations where the quality of input alignments is not necessarily high. The Profile BPLA kernel is integrated into the current distribution of the BPLA kernel package, which accepts input RNAs as either single sequences or alignment data.

Hierarchical clustering

In [Saito et al., BMC Bioinformatics, 2011], we developed a hierarchical clustering method based on the accurate similarity calculated by the BPLA kernel. For more details, visit here.
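As a rough illustration of how a pairwise similarity matrix can drive hierarchical clustering, the following sketch performs agglomerative clustering with average linkage. It is not the published method: the linkage rule is an assumption chosen for simplicity, the function name `average_linkage` is hypothetical, and the similarity values and RNA labels are invented toy data rather than BPLA kernel output.

```python
def average_linkage(sim, labels):
    # Agglomerative clustering on a symmetric similarity matrix.
    # Returns the merge tree as nested tuples of labels.
    clusters = {i: [i] for i in range(len(labels))}
    trees = {i: labels[i] for i in range(len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Average similarity over all cross-cluster member pairs.
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                score = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        # Merge the most similar pair into a new cluster.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return trees[next_id - 1]

# Invented toy similarities between four hypothetical RNAs.
labels = ["tRNA_1", "tRNA_2", "snoRNA_1", "snoRNA_2"]
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
tree = average_linkage(sim, labels)
print(tree)
```

With these toy values the two tRNA entries merge first, then the two snoRNA entries, and the two resulting clusters are joined last.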

Index-based search

In [Saito et al., submitted, 2011], we developed an index-based method that greatly reduces the computation time of the BPLA kernel for database search at a small loss of sensitivity. For more details, visit here.

References

Contact

Kengo SATO, Yutaka SAITO, and Yasubumi SAKAKIBARA