Recent transcriptome studies have revealed that a large number of
transcripts in mammals and other organisms do not encode proteins
but function as noncoding RNAs (ncRNAs) instead. As millions of
transcripts are generated by large-scale cDNA and EST sequencing
projects every year, there is a need for automatic methods to
distinguish protein-coding RNAs from noncoding RNAs accurately
and quickly. We developed a Support Vector Machine-based classifier,
named Coding Potential Calculator (CPC), to assess the protein-coding
potential of a transcript based on six biologically meaningful
sequence features. Ten-fold cross-validation on the training dataset
and independent testing on three large standalone datasets showed
that CPC can discriminate coding from noncoding transcripts with high
accuracy. Furthermore, CPC runs an order of magnitude faster
than a previous state-of-the-art tool and has higher accuracy.
We developed a user-friendly web-based interface for CPC at
http://cpc.cbi.pku.edu.cn. In addition to predicting the coding
potential of the input transcripts, the CPC web server also
graphically displays detailed sequence features and additional
annotations of the transcript that may facilitate users'
further investigation.
The Coding Potential Calculator reads input in FASTA format.
"A sequence in FASTA format begins with a single-line description, followed by lines
of sequence data. The description line is distinguished from the sequence data by
a greater-than (">") symbol in the first column." (NCBI)
If the format is still unclear, please refer to the FASTA entry
in the CPC Glossary.
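As an illustration of the FASTA layout quoted above, the following Python sketch splits FASTA text into description/sequence pairs. It is not part of CPC: the function name `parse_fasta` and the two example records are hypothetical and serve only to show how a description line (starting with ">" in the first column) is followed by one or more lines of sequence data.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each description line (without '>') to its joined sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.rstrip()          # drop trailing newline/whitespace only
        if not line:
            continue                 # skip blank lines
        if line.startswith(">"):     # '>' in the first column starts a record
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:     # sequence data belongs to the last header
            records[header] += line
    return records


# Hypothetical input: two short nucleotide records, made up for illustration.
example = """>transcript_1 hypothetical example
ATGGCGTACGTT
GACCTGAAATAG
>transcript_2 hypothetical example
GGCTTAACCGGA
"""

records = parse_fasta(example.splitlines())
for description, sequence in records.items():
    print(description, len(sequence))
```

A sequence that spans several lines is concatenated into a single string, so line wrapping in the input does not change the record.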
To start a calculation task, click HERE. A step-by-step guide
is also available to show users how to use CPC online.
After you submit your sequences and run the calculation, the calculator
assigns you a unique Task ID. You can use it to access your results
at our Data Retrieval Page.