Protein DisOrder prediction System
Recent progress in structural genomics has revealed that many proteins contain regions whose structures are highly flexible and unstable, even in their native states. Such proteins or regions are called natively disordered or unstructured. Disordered regions often cause difficulties in protein purification and crystallization, and have become a bottleneck in high-throughput structure determination. It is therefore important to identify the disordered regions of target proteins from their amino acid sequences.
The prediction of disordered regions is also important for the functional annotation of proteins. From the viewpoint of the classical "lock-and-key" theory, it is hard to imagine that natively disordered regions have biological meaning. However, disordered regions are reportedly involved in many biological processes, such as regulation, signaling and cell cycle control. The primary role of natively disordered regions seems to be the molecular recognition of proteins or DNA. Disorder-to-order transitions are frequently observed upon ligand binding, and the flexibility of disordered regions may be necessary for interacting with multiple partners with high specificity and low affinity. In addition, recent research has indicated that phosphorylation sites are frequently found in disordered regions, so highly accurate prediction of disordered regions is expected to improve the prediction of phosphorylation sites.
Various prediction methods have been reported. Although their details and mathematical techniques differ, they rely on similar information: disordered regions tend to have a higher frequency of hydrophilic and charged residues, and lower sequence complexity.
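As a rough illustration of these two sequence signals, the sketch below computes the fraction of charged residues and the compositional (Shannon) entropy of a sequence segment. It is not taken from any particular predictor: the function names are hypothetical, treating D, E, K and R as the charged residues is an assumption, and Shannon entropy is only one of several possible measures of sequence complexity.

```python
import math
from collections import Counter

# Assumption: D, E, K and R count as charged; histidine is left out here.
CHARGED = set("DEKR")


def charged_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are charged."""
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in CHARGED) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Return the Shannon entropy (in bits) of the residue composition.

    A lower value means fewer distinct residue types dominate the
    segment, i.e. lower sequence complexity.
    """
    if not seq:
        return 0.0
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

In practice such values would be computed over short windows along the sequence, so that a segment with a high charged fraction and low entropy stands out from its neighbors.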
PrDOS is composed of two predictors: one based on the local amino acid sequence, and one based on template proteins (homologous proteins for which structural information is available). The first predictor is implemented using a support vector machine (SVM) algorithm applied to the position-specific score matrix (or profile) of the input sequence. More precisely, a sliding window is used to map individual residues into a feature space. A similar idea has already been used in secondary structure prediction, as in PSIPRED. The second predictor assumes the conservation of intrinsic disorder in protein families, and is implemented simply using PSI-BLAST and our own measure of disorder, as described later. The final prediction is a combination of the results of the two predictors.
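The sliding-window mapping can be sketched as follows. This is a minimal illustration rather than the PrDOS implementation: the function name, the window half-width, and the zero-padding at the sequence termini are all assumptions, since the text does not specify them. Each residue is represented by the concatenated profile rows of itself and its neighbors, and the resulting vectors are the kind of input an SVM classifier would take.

```python
from typing import List


def window_features(pssm: List[List[float]], half_width: int) -> List[List[float]]:
    """Build one flattened feature vector per residue from a profile.

    pssm[i] is the profile row for residue i (for example, 20 scores).
    The window covers residues i - half_width .. i + half_width.
    Positions beyond either terminus are zero-padded (an assumption).
    """
    if not pssm:
        return []
    n_cols = len(pssm[0])
    pad = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vec: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row = pssm[j] if 0 <= j < len(pssm) else pad
            vec.extend(row)
        features.append(vec)
    return features
```

With a profile of 20 columns and a half-width of w, every residue is mapped to a vector of 20 × (2w + 1) values, regardless of its position in the sequence.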
The performance of prediction methods has been evaluated by the structural biology community in the CASP benchmark (Critical Assessment of techniques for protein Structure Prediction). The seventh round of CASP was held in 2006, and the CASP assessors also evaluated our method. Our method achieved high performance (estimated accuracy (Q2) above 90%, with a sensitivity of 0.56), especially for short disordered regions. See the details at the CASP7 meeting web page (our group number is 443; the team name is fais).
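For readers unfamiliar with these two measures, the sketch below shows how they are commonly computed from per-residue labels. It assumes the usual definitions: Q2 is the fraction of residues assigned to the correct state (disordered or ordered), and sensitivity is the fraction of truly disordered residues that are predicted as disordered. The function name is hypothetical, and the exact definitions used by the CASP assessors may differ in detail.

```python
from typing import Sequence, Tuple


def q2_and_sensitivity(
    observed: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (Q2, sensitivity) for per-residue disorder labels.

    True marks a disordered residue, False an ordered one.
    Sensitivity is NaN when no residue is observed as disordered.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # disordered, predicted disordered
    tn = sum(1 for o, p in pairs if not o and not p)  # ordered, predicted ordered
    fn = sum(1 for o, p in pairs if o and not p)      # disordered, predicted ordered
    q2 = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return q2, sensitivity
```

Because disordered residues are usually a minority of all residues, Q2 can be high even when sensitivity is moderate, which is why the two figures are reported together.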