Welcome to the EMDLP server:

EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction


Recent research suggests that epitranscriptome regulation through post-transcriptional RNA modifications is essential for all types of RNA. Accurate identification of RNA modifications is vital for understanding their functions and regulatory mechanisms. However, traditional experimental methods for identifying RNA modification sites are relatively complicated, time-consuming, and laborious.

Machine learning approaches have been applied to RNA sequence feature extraction and classification, offering a computational complement to experimental approaches. Recently, convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have achieved notable results in modification site prediction owing to their strength in representation learning. However, a CNN can learn local responses from spatial data but cannot learn sequential correlations. An LSTM, by contrast, is specialized for sequential modeling and can access contextual representations, but it is weaker than a CNN at extracting spatial features. These complementary limitations motivate a prediction framework that combines natural language processing (NLP) and deep learning (DL).

This study presents an ensemble multiscale deep learning predictor (EMDLP) that identifies RNA methylation sites using NLP and DL techniques. It combines dilated convolution with a bidirectional LSTM (BiLSTM), which helps it make better use of both local and global information for site prediction.
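To illustrate why dilated convolution widens the context seen around a site, the following is a minimal pure-Python sketch, not the EMDLP implementation. The kernel size and dilation rates are assumptions chosen for illustration, and the function names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation form, no padding)."""
    # Number of input positions covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` positions apart.
        out.append(sum(w * signal[start + j * dilation]
                       for j, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=1))  # [6, 9, 12]
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2))  # [9]
print(receptive_field(3, [1, 2, 4]))                           # 15
```

With the same three-tap kernel, a dilation of 2 covers five positions instead of three, so stacked layers with growing dilation rates see a wide window without adding parameters. A BiLSTM placed after such layers can then model the order of the extracted features in both directions.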

The first step of EMDLP is to represent RNA sequences in an NLP fashion. Three encodings are adopted to characterize sites from both local and global perspectives: RNA word embedding, one-hot encoding, and RGloVe, an improved word-vector representation learning method based on GloVe. Next, a dilated convolutional bidirectional LSTM network (DCB) model is constructed, in which a dilated convolutional neural network (DCNN) is followed by a BiLSTM, to extract features that contribute to methylation site prediction. Finally, the predictions obtained with the different encodings are integrated by voting to predict methylation sites. Experimental results on m1A and m6A show that EMDLP outperforms state-of-the-art models.
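As a rough illustration of the encoding step, the sketch below shows one-hot encoding and a split of a sequence into overlapping k-mer "words", which is the kind of token stream that word-embedding methods such as GloVe consume. It is a sketch under assumptions rather than the EMDLP code: the word length k = 3, the stride of 1, the mapping of T to U, and the function names `one_hot` and `kmer_words` are all hypothetical choices.

```python
ALPHABET = "ACGU"  # assumed base order for the one-hot columns


def one_hot(seq):
    """Encode each base as a 4-dimensional indicator vector.

    Bases outside ALPHABET (for example N) become all-zero vectors.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq]


def kmer_words(seq, k=3):
    """Split a sequence into overlapping k-mer words (stride 1)."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(one_hot("ACGU"))      # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(kmer_words("GGACU"))  # ['GGA', 'GAC', 'ACU']
```

One-hot vectors describe each position in isolation, whereas embeddings learned over k-mer words carry co-occurrence statistics from the whole corpus, which is one way to read the local and global perspectives mentioned above.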

Fig. 1 Computational framework based on RGloVe, DCNN, and BiLSTM neural networks for predicting RNA methylation sites.

Fig. 2 Structure of the EMDLP predictor. The diagram depicts the architecture of our method. Three different deep learning classifiers make predictions for the RNA sequences, and an ensemble vote determines the final result.
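The ensemble vote over the three classifiers can be sketched as follows. The text above does not state whether EMDLP uses soft voting (averaging probabilities) or hard voting (majority of labels), so both variants are shown; the 0.5 threshold and the function names `soft_vote` and `hard_vote` are assumptions for illustration only.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the per-classifier probabilities, then threshold the mean."""
    mean = sum(probabilities) / len(probabilities)
    return mean, int(mean >= threshold)


def hard_vote(probabilities, threshold=0.5):
    """Threshold each classifier separately, then take the strict majority."""
    votes = sum(p >= threshold for p in probabilities)
    return int(votes * 2 > len(probabilities))


# Hypothetical scores from three classifiers for one candidate site.
scores = [0.9, 0.4, 0.3]
print(soft_vote(scores)[1])  # 1: the mean score is above 0.5
print(hard_vote(scores))     # 0: only one of three classifiers votes positive
```

The two rules can disagree, as in this example, when a single confident classifier outweighs two weakly negative ones.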


Dataset:

The Dataset Used in Our Paper

m6A:

Training and validation sets for building the model:
train_balance_group(fasta)

Independent test set for assessing the predictor:
test_balance_group(fasta)

m1A:

Training and validation sets for building the model:
train_unbalance_group(txt)

Independent test set for assessing the predictor:
test_unbalance_group(txt)

See the text of the paper for more detailed information.
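For readers who want to load the FASTA-formatted m6A files, a minimal reader might look like the sketch below. It assumes standard FASTA layout (a header line starting with ">" followed by one or more sequence lines). The header contents and any label convention in the actual files are not described here, so the example uses made-up records, and the function name `read_fasta` is hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records standing in for a real file.
example = io.StringIO(">s1\nGGAC\nU\n\n>s2\nAACU\n")
print(list(read_fasta(example)))  # [('s1', 'GGACU'), ('s2', 'AACU')]
```

To read a downloaded file, pass an open file object instead of the `io.StringIO` stand-in. The m1A sets are distributed as txt files whose layout is not specified here, so this reader may not apply to them.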

Contact Us:
Lin Zhang, Professor
Institute of Bioinformatics, China University of Mining and Technology
Address: No.1, Daxue Road, Xuzhou, Jiangsu, 221116, P. R. China
E-mail: lin.zhang@cumt.edu.cn
Hui Liu, Associate Professor
Institute of Bioinformatics, China University of Mining and Technology
Address: No.1, Daxue Road, Xuzhou, Jiangsu, 221116, P. R. China
E-mail: hui.liu@cumt.edu.cn
Honglei Wang, PhD
Institute of Bioinformatics, China University of Mining and Technology
Address: No.1, Daxue Road, Xuzhou, Jiangsu, 221116, P. R. China
E-mail: wanghonglei@cumt.edu.cn
We recommend using the current version of Chrome with a screen resolution greater than 1280 x 720 for the best browsing experience. In other browsers, some functions and features might not work properly.