import React from 'react';

class About extends React.Component {
  render() {
    return (
      <div className="about">
        <h1>About D2Odb</h1>
        <p>D2Odb is a database of predicted structural transitions in proteins.</p>
        <p>
          Many proteins, most notably those involved in cellular regulation and signaling,
          contain significant regions of disorder and belong to the class of Intrinsically
          Disordered Proteins (IDPs). IDPs can undergo disorder-order transitions, typically upon
          binding another protein or DNA (coupled folding and binding). The energy landscapes of
          IDPs are typically rugged, featuring a continuum of conformational states that enables
          interaction with other molecules via both conformational selection and induced fit.
        </p>
        <p>
          To establish whether point mutations within sequences encoding disordered regions may
          result in microstructuralization and generate nascent microstructural elements that may
          form substrates for evolution or result in adaptive alterations to protein function, we
          surveyed the human mutation database. Specifically, we performed
          a bioinformatic analysis to identify mutations predicted to generate localized regions of
          microstructure in previously disordered regions of target proteins. We interrogated the
          human polymorphisms and disease mutations dataset, and compiled a dataset of 68,383 unique
          human mutations, comprising 28,662 human disease mutations, and 39,721 polymorphisms. We
          then applied standard algorithms for disorder prediction to every mutation and
          polymorphism in the dataset. A predictor voting strategy was employed to determine the
          prediction outcome for each mutation. Because there are multiple predictors for protein
          disorder, a residue was deemed to be located within a disordered region if the number of
          predictors assigning it to a disordered region was equal to or larger than the number of
          predictors assigning it to an ordered region. Four types of structural
          transitions were defined: Disorder-to-Order (D-O), Order-to-Disorder (O-D),
          Disorder-to-Disorder (D-D) and Order-to-Order (O-O).
        </p>
        <p>
          Brief methods can be found below. For full methods, see the publication describing this
          work:
        </p>
        <p>
          Li C, Clark LVT, Zhang R, Porebski BT, McCoey JM, Borg NA, Webb GI, Kass I, Buckle M, Song
          J, Woolfson A, Buckle AM. Structural Capacitance in Protein Evolution and Human Diseases.
          J Mol Biol. 2018 Sep 14;430(18 Pt B):3200-3217. (link:{' '}
          <a
            href="https://www.sciencedirect.com/science/article/pii/S0022283618307228"
            target="_blank"
            rel="noopener noreferrer"
          >
            https://www.sciencedirect.com/science/article/pii/S0022283618307228
          </a>
          )
        </p>
        <h3>Databases/predictors for disordered region prediction</h3>
        <p>
          For both wild-type and mutated proteins, disorder was predicted using four predictors:
          VSL2B [2], IUPred (short and long versions) [3, 4] and DynaMine [5].
        </p>
        <p>
          <b>D2P2 database:</b> D2P2 [6] is an online knowledgebase of protein disorder predictions
          generated by nine tools: PONDR VLXT [7], PONDR VSL2B [2], IUPred (short and long
          versions) [3, 4], Espritz-D [8], Espritz-X [8], Espritz-N [8], PrDOS [9] and PV2 [10]. In
          addition, the updated version of D2P2 includes MoRF regions (predicted by ANCHOR [11,
          12]) and post-translational modification site annotations for investigating protein
          binding and function within disordered regions.
        </p>
        <p>
          <b>DisProt:</b> DisProt (
          <a href="http://www.disprot.org/index.php" target="_blank" rel="noopener noreferrer">
            http://www.disprot.org/index.php
          </a>{' '}
          [1]; Version: 7 v0.3) harbors experimentally verified intrinsically disordered proteins
          and disordered regions. DisProt provides detailed function classification, function
          description and experimental evidence for each entry. The advantage of this database is
          that its disordered regions have been experimentally verified. We used DisProt to
          identify mutations located in experimentally validated regions of disorder. We then
          applied four predictors (VSL2B, IUPred-L, IUPred-S and DynaMine) to predict
          disorder-order transitions.
        </p>
        <p>
          <b>IUPred:</b> IUPred (
          <a href="http://iupred.enzim.hu/" target="_blank" rel="noopener noreferrer">
            http://iupred.enzim.hu/
          </a>{' '}
          [3, 4]) provides two versions, IUPred-S and IUPred-L. Here, ‘S’ and ‘L’ refer to short
          disordered regions (SDRs) and long disordered regions (LDRs), respectively. For the ‘S’
          option, the model was
          trained using a dataset corresponding to missing residues in the protein structures. These
          residues are absent from the protein structures due to missing electron density in the
          corresponding X-ray crystal structures. These disordered regions are usually short.
          Conversely for the ‘L’ option, the dataset used to train models corresponds to long
          disordered regions that are validated by various experimental techniques. In our study,
          residues with predicted scores equal to or above 0.5 were considered to be located in
          disordered regions.
        </p>
        <p>
          <b>PONDR-VSL2B:</b> VSL2B [2] is a widely used sequence-based predictor of intrinsically
          disordered regions that uses a support vector machine (SVM). Residues with predicted
          scores equal to or above 0.5 are considered to be disordered.
        </p>
        <p>
          <b>DynaMine:</b> DynaMine [5], which is trained on a curated nuclear magnetic resonance
          (NMR) dataset, was used to predict protein disordered regions from sequence information
          alone. Residues with predicted scores less than or equal to 0.69 are considered to be
          located in disordered regions, while those with scores greater than or equal to 0.8 are
          predicted to be in structured regions.
        </p>
        <h3>Amino acid hydrophobicity indices</h3>
        <p>
          Three indices were chosen in our study: Hopp-Woods hydrophilicity index [13],
          Kyte-Doolittle hydropathy index [14] and Eisenberg hydrophobicity index [15].
        </p>
        <h3>Predictor for transmembrane helices</h3>
        <p>
          TMHMM [16] employs a hidden Markov model for membrane protein topology prediction.
          Because protein transmembrane domains are structurally stable and ordered, TMHMM was used
          to further validate the predicted disordered regions. Mutations predicted to be in
          transmembrane regions were discarded.
        </p>
        <h3>Protein structure BLAST</h3>
        <p>
          To ensure that wild-type proteins with predicted disordered regions lack experimentally
          determined structures or homologous structures, we performed a BLAST search
          against the PDB database (
          <a
            href="http://www.rcsb.org/pdb/software/rest.do"
            target="_blank"
            rel="noopener noreferrer"
          >
            http://www.rcsb.org/pdb/software/rest.do
          </a>
          ) [17] using the protein sequences (e-value cutoff = 0.01). Any proteins with predicted
          disordered regions and BLAST hits against the PDB database were removed.
        </p>
        <h3 id="references">References:</h3>
        <ul className="list">
          <li id="1">
            [1] Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, et al. DisProt:
            the Database of Disordered Proteins. Nucleic Acids Res. 2007;35:D786-93.
          </li>
          <li id="2">
            [2] Peng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z. Length-dependent prediction
            of protein intrinsic disorder. BMC Bioinformatics. 2006;7:208.
          </li>
          <li id="3">
            [3] Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of
            intrinsically unstructured regions of proteins based on estimated energy content.
            Bioinformatics. 2005;21:3433-4.
          </li>
          <li id="4">
            [4] Dosztanyi Z, Csizmok V, Tompa P, Simon I. The pairwise energy content estimated from
            amino acid composition discriminates between folded and intrinsically unstructured
            proteins. Journal of molecular biology. 2005;347:827-39.
          </li>
          <li id="5">
            [5] Cilia E, Pancsa R, Tompa P, Lenaerts T, Vranken WF. From protein sequence to
            dynamics and disorder with DynaMine. Nature communications. 2013;4:2741.
          </li>
          <li id="6">
            [6] Oates ME, Romero P, Ishida T, Ghalwash M, Mizianty MJ, Xue B, et al. D2P2: database
            of disordered protein predictions. Nucleic Acids Res. 2013;41:D508-D16.
          </li>
          <li id="7">
            [7] Romero P, Obradovic Z, Li XH, Garner EC, Brown CJ, Dunker AK. Sequence complexity of
            disordered protein. Proteins-Structure Function and Genetics. 2001;42:38-48.
          </li>
          <li id="8">
            [8] Walsh I, Martin AJM, Di Domenico T, Tosatto SCE. ESpritz: accurate and fast
            prediction of protein disorder. Bioinformatics. 2012;28:503-9.
          </li>
          <li id="9">
            [9] Ishida T, Kinoshita K. PrDOS: prediction of disordered protein regions from amino
            acid sequence. Nucleic Acids Res. 2007;35:W460-W4.
          </li>
          <li id="10">
            [10] Ghalwash MF, Dunker AK, Obradovic Z. Uncertainty analysis in protein disorder
            prediction. Mol Biosyst. 2012;8:381-91.
          </li>
          <li id="11">
            [11] Meszaros B, Simon I, Dosztanyi Z. Prediction of Protein Binding Regions in
            Disordered Proteins. PLoS Comp Biol. 2009;5.
          </li>
          <li id="12">
            [12] Dosztanyi Z, Meszaros B, Simon I. ANCHOR: web server for predicting protein binding
            regions in disordered proteins. Bioinformatics. 2009;25:2745-6.
          </li>
          <li id="13">
            [13] Hopp TP, Woods KR. Prediction of protein antigenic determinants from amino acid
            sequences. Proc Natl Acad Sci U S A. 1981;78:3824-8.
          </li>
          <li id="14">
            [14] Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a
            protein. J Mol Biol. 1982;157:105-32.
          </li>
          <li id="15">
            [15] Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface
            protein sequences with the hydrophobic moment plot. J Mol Biol. 1984;179:125-42.
          </li>
          <li id="16">
            [16] Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein
            topology with a hidden Markov model: application to complete genomes. J Mol Biol.
            2001;305:567-80.
          </li>
          <li id="17">
            [17] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein
            Data Bank. Nucleic Acids Res. 2000;28:235-42.
          </li>
        </ul>
        <p>
          Given the variability in the predictions among the four predictors tested, which
          necessitated a majority voting approach, we looked for experimental evidence that the
          predicted regions were disordered. Accordingly, we cross-referenced our dataset of human
          disease mutations and polymorphisms against DisProt [1], a database of experimentally
          verified disordered regions of proteins. For the resulting matches, we applied the four
          disorder predictors, combined by majority voting, to predict the structural changes
          following mutation. Majority voting predicts that 108 mutations in long disordered
          regions result in a D-O structural transition <a href='/table-1.pdf' download>(Table 1)</a>.
        </p>
      </div>
    );
  }
}
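
/*
The predictor voting rule and the four transition types described in the About text
can be sketched roughly as follows. This is an illustrative sketch only, not the
published D2Odb pipeline: the function names and the boolean-vote input format are
assumptions made for this example. Each vote is true when one predictor calls the
residue disordered, and ties count as disordered, following the "equal to or larger
than" wording above.

```javascript
// Hypothetical helper: true when at least as many predictors vote
// "disordered" as vote "ordered" (so a tie counts as disordered).
function isDisorderedByVote(votes) {
  const disordered = votes.filter(Boolean).length;
  const ordered = votes.length - disordered;
  return disordered >= ordered;
}

// Hypothetical helper: label the structural transition caused by a mutation
// as one of 'D-O', 'O-D', 'D-D' or 'O-O' from wild-type and mutant votes.
function classifyTransition(wildTypeVotes, mutantVotes) {
  const before = isDisorderedByVote(wildTypeVotes) ? 'D' : 'O';
  const after = isDisorderedByVote(mutantVotes) ? 'D' : 'O';
  return before + '-' + after;
}
```

For example, a residue with a 2-2 split in the wild type and a 1-3 split in the
mutant would be labeled as a D-O transition under these assumptions.
*/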

export default About;
