Prediction of the folds of Synaptotagmin and the Beta subunit of Urease
The aim of our work is to identify the likely fold of a protein given a family of related sequences. This is achieved in two steps: secondary structure prediction (including prediction of buried residues) and Secondary Structure Mapping (identification of the fold). A secondary structure prediction is obtained from an accurate multiple sequence alignment of the family by identifying the positions of insertions and deletions, conserved residues and patterns of hydrophobic conservation, and by combining this analysis with the results of a variety of conventional single-sequence secondary structure prediction algorithms. This strategy has previously been used to generate accurate secondary structure predictions for the annexins, SH2 domains and protein-tyrosine phosphatases before their structures were determined experimentally [Barton et al., 1991; Russell et al., 1992; Livingstone & Barton, 1994]. Specifically, helices and strands were predicted by the methods of Lim, of Chou and Fasman, and of Robson, as implemented in the "Leeds" prediction suite [Lim, 1974a; Lim, 1974b; Chou & Fasman, 1978; Garnier et al., 1978; Eliopoulos, 1989]. Turns were predicted using the algorithms of Rose and of Wilmot and Thornton [Rose, 1978; Wilmot & Thornton, 1988].
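One way to look for the hydrophobic conservation patterns mentioned above is to score each alignment column by how consistently hydrophobic it is, then ask whether a segment's profile repeats with helical periodicity (about 100° per residue, 3.6 residues per turn) or strand periodicity (180°, alternate residues facing the core). The sketch below uses a hydrophobic-moment style calculation; it is a swapped-in illustration, not the authors' AMAS-assisted procedure, and the residue set HYDROPHOBIC and all function names are hypothetical.

```python
import math

# Hypothetical hydrophobic residue set, chosen for illustration only.
HYDROPHOBIC = set("AVILMFWC")


def column_hydrophobicity(alignment):
    """Fraction of non-gap residues in each alignment column that are hydrophobic."""
    scores = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment if seq[j] != "-"]
        if not column:
            scores.append(0.0)
        else:
            scores.append(sum(res in HYDROPHOBIC for res in column) / len(column))
    return scores


def periodic_moment(values, angle_deg):
    """Magnitude of the periodic moment of a profile for a given residue-to-residue angle."""
    angle = math.radians(angle_deg)
    # Centre the profile so that a uniformly hydrophobic segment scores near zero.
    mean = sum(values) / len(values)
    s = sum((v - mean) * math.sin(i * angle) for i, v in enumerate(values))
    c = sum((v - mean) * math.cos(i * angle) for i, v in enumerate(values))
    return math.hypot(s, c) / len(values)


def suggest_state(values):
    """'H' if the helical periodicity dominates, otherwise 'E'.

    Ties (e.g. a flat profile) fall to 'E' here; a real procedure would need a threshold.
    """
    helix = periodic_moment(values, 100.0)   # ~3.6 residues per turn
    strand = periodic_moment(values, 180.0)  # alternating sides of a strand
    return "H" if helix > strand else "E"
```

For example, a profile alternating 1, 0, 1, 0 scores highest at 180° and is suggested as strand, whereas a heptad-like profile (hydrophobic at i, i+3, i+4, i+7) is suggested as helix.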
Regions of the multiple alignment that contain insertions or deletions, or that vary in composition across the aligned family, were assigned to coil (non-core secondary structure). The intervening regions were assigned to either helix or strand by reference both to the results of the classical prediction algorithms and to patterns of residue conservation identified with the aid of the AMAS program [Livingstone & Barton, 1993]. The combined procedure yields a final three-state (helix, strand, coil) prediction. Where the choice between helix and strand could not otherwise be resolved, predictions by the method of Zvelebil et al. [1987] were used as a "casting vote".
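The assignment rules above can be sketched per alignment column as follows. This is a simplification under stated assumptions: the classical predictions and the Zvelebil et al. casting vote are taken to be H/E/C strings already aligned to the columns, and a column counts as "varied in composition" when its most common residue falls below a hypothetical 50% cut-off. In the original work the assignments were made over whole regions with the help of AMAS and human judgement, not by this function, whose name and parameters are hypothetical.

```python
from collections import Counter


def combine_predictions(alignment, predictions, casting_vote, min_identity=0.5):
    """Three-state (H/E/C) consensus for each alignment column.

    alignment:    equal-length aligned sequences, '-' marking a gap
    predictions:  H/E/C strings from the classical methods, one per method
    casting_vote: H/E/C string used only when the H and E votes are tied
    min_identity: hypothetical cut-off for calling a column variable
    """
    consensus = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        # Insertions/deletions: any gap in the column forces coil.
        if "-" in column:
            consensus.append("C")
            continue
        # Variable composition: a weakly conserved column forces coil.
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) < min_identity:
            consensus.append("C")
            continue
        # Otherwise choose helix or strand by majority vote of the classical methods.
        votes = Counter(p[j] for p in predictions if p[j] in "HE")
        h, e = votes["H"], votes["E"]
        if h > e:
            consensus.append("H")
        elif e > h:
            consensus.append("E")
        elif casting_vote[j] in "HE":
            # Tie between helix and strand: use the casting vote.
            consensus.append(casting_vote[j])
        else:
            consensus.append("C")
    return "".join(consensus)
```

For instance, with the alignment ["AVLKG", "AVLRG", "AIL-G"], predictions ["HHEEE", "HEEEC"] and casting vote "EEHHH", the fourth column contains a gap and becomes coil, while the tied second column takes the casting vote, giving "HEECE".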
For each prediction, a non-redundant set of sequences was extracted from the current versions of the PIR [Sidman et al., 1988] and Entrez [Ostell, 1992] databases. Multiple sequence alignments were built with the AMPS package [Barton & Sternberg, 1987; Barton, 1990], and poorly conserved regions of the resulting alignments were adjusted by eye in the vicinity of gaps.
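The text does not say how redundancy was defined, so the following is only a minimal sketch of one common approach: a greedy filter that keeps a sequence unless it is at least a hypothetical 90% identical to one already kept, with identity measured over a plain Needleman-Wunsch global alignment. The scoring values, the cut-off and the function names are assumptions, and this is not the AMPS alignment method.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity (over the shorter sequence) from a simple global alignment."""
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            identical += int(a[i - 1] == b[j - 1])
            i -= 1
            j -= 1
        elif s == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 100.0 * identical / min(n, m)


def non_redundant(sequences, max_identity=90.0):
    """Greedily keep sequences that are below max_identity to every sequence kept so far."""
    kept = []
    for seq in sequences:
        if all(global_identity(seq, other) < max_identity for other in kept):
            kept.append(seq)
    return kept
```

Because the filter is greedy, the order of the input matters: earlier sequences are preferred as family representatives.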
Given the secondary structure prediction, the prediction of buried residues and any additional knowledge, such as the location of catalytic residues, we searched a non-redundant set of protein domains for folds consistent with the predicted secondary structure and other constraining data. This was done with a newly developed technique, Secondary Structure Mapping (SSM) [Russell et al., 1994]. SSM finds topologies that are consistent both with the secondary structure assignments and with any experimental restraints (e.g. disulphide bonds or active-site residues) available for a protein or protein family. First, all possible alignments of secondary structure elements are generated between a query (i.e. the prediction) and a database structure (i.e. a PDB domain). These alignments (or maps) are then filtered to remove those that 1) are not compact; 2) place the ends of secondary structure elements too far apart to be bridged by the predicted loop lengths; 3) have poor beta-sheet hydrogen bonding; 4) lack essential secondary structure elements (e.g. those carrying active-site residues); 5) do not satisfy topological constraints (e.g. sequential beta strands are most often antiparallel); or 6) cannot satisfy any experimentally derived distance restraints. The program is fast (a representative set of PDB domains can be searched in about five minutes) and reduces the large number of initial maps (i.e. prior to filtering) to a few structurally sensible ones.
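A heavily simplified sketch of map enumeration plus two of the six filters (loop length and essential elements) may make the procedure more concrete. It assumes each element is reduced to its type, length and end-point C-alpha coordinates, that a loop of n residues can span at most (n + 1) × 3.8 Å, and that a map must match at least two elements. These thresholds and all names (QueryElement, StructElement, search) are hypothetical, and the compactness, hydrogen-bonding, topology and distance-restraint filters of SSM are not modelled.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

CA_SPAN = 3.8  # approximate C-alpha to C-alpha step, in angstroms (assumed)
Point = Tuple[float, float, float]


@dataclass
class QueryElement:          # one predicted secondary structure element
    kind: str                # 'H' (helix) or 'E' (strand)
    length: int              # residues in the element
    loop_after: int          # predicted loop residues before the next element
    essential: bool = False  # e.g. carries a known active-site residue


@dataclass
class StructElement:         # one element of a database domain
    kind: str
    start: Point             # C-alpha coordinates of the first residue
    end: Point               # C-alpha coordinates of the last residue


def enumerate_maps(query, struct, qi=0, sj=0, current=None) -> Iterator[List[Optional[int]]]:
    """Yield every order-preserving, type-matched map; None marks an unmatched query element."""
    current = [] if current is None else current
    if qi == len(query):
        yield list(current)
        return
    current.append(None)  # leave query element qi unmatched
    yield from enumerate_maps(query, struct, qi + 1, sj, current)
    current.pop()
    for j in range(sj, len(struct)):  # or match it to a later element of the same kind
        if struct[j].kind == query[qi].kind:
            current.append(j)
            yield from enumerate_maps(query, struct, qi + 1, j + 1, current)
            current.pop()


def residues_between(query, i1, i2):
    """Predicted residues between the end of query element i1 and the start of i2."""
    loops = sum(query[k].loop_after for k in range(i1, i2))
    skipped = sum(query[k].length for k in range(i1 + 1, i2))  # unmatched elements in between
    return loops + skipped


def passes_filters(query, struct, mapping, min_matched=2):
    matched = [(i, j) for i, j in enumerate(mapping) if j is not None]
    if len(matched) < min_matched:
        return False
    # Essential elements must be mapped.
    if any(q.essential and j is None for q, j in zip(query, mapping)):
        return False
    # Loop-length filter: n intervening residues give at most n + 1 C-alpha steps.
    for (i1, j1), (i2, j2) in zip(matched, matched[1:]):
        gap = math.dist(struct[j1].end, struct[j2].start)
        if gap > (residues_between(query, i1, i2) + 1) * CA_SPAN:
            return False
    return True


def search(query, struct, min_matched=2):
    return [m for m in enumerate_maps(query, struct) if passes_filters(query, struct, m, min_matched)]


# Toy example: helix-strand-helix prediction against a four-element domain.
query = [QueryElement("H", 10, 3, essential=True), QueryElement("E", 5, 2), QueryElement("H", 10, 0)]
struct = [
    StructElement("H", (0.0, 0.0, 0.0), (15.0, 0.0, 0.0)),
    StructElement("E", (20.0, 0.0, 0.0), (35.0, 0.0, 0.0)),
    StructElement("H", (40.0, 0.0, 0.0), (55.0, 0.0, 0.0)),
    StructElement("H", (100.0, 0.0, 0.0), (115.0, 0.0, 0.0)),
]
maps = search(query, struct)
```

In the toy example the map [0, 1, 2] survives, while [0, 1, 3] is rejected because a two-residue loop cannot bridge 65 Å, and any map leaving the essential first helix unmatched is discarded.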
Proteins predicted: Synaptotagmin C2 domain, Urease Beta subunit, the mystery protein (!)
Barton, G. J. (1990). Methods Enzymol. 183, 403-428.
Barton, G. J., Newman, R. H., Freemont, P. F. & Crumpton, M. J. (1991). Eur. J. Biochem. 198, 749-760.
Barton, G. J. & Sternberg, M. J. E. (1987). J. Mol. Biol. 198, 327-337.
Chou, P. Y. & Fasman, G. D. (1978). Adv. Enzymol. Relat. Areas Mol. Biol.
Eliopoulos, E. (1989). The Leeds protein secondary structure suite.
Garnier, J., Osguthorpe, D. J. & Robson, B. (1978). J. Mol. Biol.
Lim, V. (1974a). J. Mol. Biol. 88, 857-872.
Lim, V. (1974b). J. Mol. Biol. 88, 873-894.
Livingstone, C. D. & Barton, G. J. (1993). Comput. Appl. Biosci. 9, 745-756.
Livingstone, C. D. & Barton, G. J. (1994). Int. J. Pept. Protein Res.
Ostell, J. (1992). Entrez sequences graphical user interface.
Rose, G. D. (1978). Nature 272, 586-591.
Russell, R. B., Breed, J. & Barton, G. J. (1992). FEBS Lett. 304, 15-20.
Russell, R., Copley, R. & Barton, G. (1994). A method for fold prediction. The method is still under development.
Sidman, K., George, D., Barker, W. & Hunt, L. (1988). Nucleic Acids Res.
Wilmot, A. C. M. & Thornton, J. M. (1988). J. Mol. Biol. 203, 221-232.
Zvelebil, M. J. J. M., Barton, G. J., Taylor, W. R. & Sternberg, M. J. E. (1987). J. Mol. Biol. 195, 957-961.