Topology fingerprint approach to inverse folding problem

Available protein structures are divided into sequence families by clustering them according to their level of sequence similarity. The best-quality structure from each cluster is then included in a database of representative protein structures. Each protein in this database is represented as a topology fingerprint: a contact map with additional information about the local secondary structure and side-chain solvent exposure.
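
As an illustration of the representation described above, the following is a minimal sketch of how a topology fingerprint might be stored and built. The names `Fingerprint` and `build_fingerprint`, the use of one coordinate per residue, the 8.0 distance cutoff, and the minimum sequence separation are all assumptions made for this example; the text does not specify how contacts are defined.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Fingerprint:
    """Hypothetical container for a topology fingerprint."""
    contacts: set    # pairs (i, j), i < j, of residue positions in contact
    secondary: str   # one character per residue, e.g. 'H', 'E' or 'C'
    buried: list     # one bool per residue: True if the side chain is buried


def build_fingerprint(coords, secondary, buried, cutoff=8.0, min_sep=2):
    """Build a fingerprint from one representative point per residue.

    cutoff and min_sep are illustrative values, not taken from the text.
    """
    n = len(coords)
    if not (len(secondary) == len(buried) == n):
        raise ValueError("per-residue annotations must match the residue count")
    # Contact map: all residue pairs, at least min_sep apart in sequence,
    # whose representative points lie within the distance cutoff.
    contacts = {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if dist(coords[i], coords[j]) <= cutoff
    }
    return Fingerprint(contacts, secondary, list(buried))


# Toy four-residue structure (coordinates are made up for the example).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0)]
fp = build_fingerprint(coords, "CCCC", [False, True, False, True])
print(sorted(fp.contacts))
```

Storing the contact map as a set of index pairs keeps the sketch short; a dense or sparse matrix would serve equally well for real structures.
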
A topology fingerprint can be used to calculate a score, hereafter
called energy, of an arbitrary sequence "forced" to adopt this
particular structure. Energy parameters are derived from contact
statistics in an independent database of highly refined protein
structures and include burial, two-body, and three-body
contributions. As shown in several examples, the energy calculated in
this way can be used as an indicator of protein structure quality.
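
The scoring step just described can be sketched as follows. This is only an illustration under stated assumptions: the function name `threading_energy`, the two-letter alphabet, and every numerical parameter are invented for the example and are not the statistically derived parameters the text refers to. The three-body contribution is omitted for brevity.

```python
def threading_energy(sequence, contacts, buried, burial_e, pair_e):
    """Energy of a sequence forced onto a structure, without gaps.

    contacts : set of (i, j) position pairs taken from the contact map
    buried   : one bool per structural position
    burial_e : dict mapping (residue, is_buried) to a one-body energy
    pair_e   : dict mapping a sorted residue pair to a two-body energy
    """
    if len(sequence) != len(buried):
        raise ValueError("sequence and structure must have the same length")
    energy = 0.0
    # One-body (burial) term: each residue in its assigned environment.
    for residue, is_buried in zip(sequence, buried):
        energy += burial_e[(residue, is_buried)]
    # Two-body term: one contribution per contact in the fingerprint.
    for i, j in contacts:
        key = tuple(sorted((sequence[i], sequence[j])))
        energy += pair_e.get(key, 0.0)
    return energy


# Hypothetical toy parameters for a two-letter alphabet (L, K).
burial_e = {("L", True): -1.0, ("L", False): 0.5,
            ("K", True): 1.0, ("K", False): -0.5}
pair_e = {("L", "L"): -1.0, ("K", "L"): 0.2, ("K", "K"): 0.5}

contacts = {(0, 2), (1, 3)}
buried = [True, False, True, False]

compatible = threading_energy("LKLK", contacts, buried, burial_e, pair_e)
incompatible = threading_energy("KLKL", contacts, buried, burial_e, pair_e)
print(compatible, incompatible)
```

With these toy values, the sequence that places L at buried positions scores lower than the one that places K there, which is the sense in which a low energy indicates that a sequence fits the structure.
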
A series of additional simplifying assumptions is necessary to allow the
introduction of gaps into the sequence being "forced" into a given
structure. In this variant of the method, even weak homologies can be
recognized, and in some cases structural similarities may be predicted.
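
The text does not say which simplifying assumptions are made. One simplification often used when gaps are allowed, assumed here purely for illustration, is to score each structural position independently (burial term only), so that the optimal gapped alignment can be found by standard dynamic programming. The function name `align_to_structure`, the linear gap penalty, and the toy parameters below are hypothetical.

```python
def align_to_structure(sequence, buried, burial_e, gap=1.0):
    """Lowest energy of a global gapped alignment of sequence to structure.

    Only the one-body burial term is scored, so each cell of the table
    depends on its three neighbours alone. Pairwise contact terms would
    break this independence, which is why a simplification is needed.
    """
    n, m = len(sequence), len(buried)
    # best[i][j]: lowest energy aligning the first i residues of the
    # sequence to the first j positions of the structure.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + burial_e[(sequence[i - 1], buried[j - 1])]
            skip_residue = best[i - 1][j] + gap    # residue left unaligned
            skip_position = best[i][j - 1] + gap   # structural position left empty
            best[i][j] = min(match, skip_residue, skip_position)
    return best[n][m]


# Hypothetical toy parameters for a two-letter alphabet (L, K).
burial_e = {("L", True): -1.0, ("L", False): 0.5,
            ("K", True): 1.0, ("K", False): -0.5}

# Two-residue sequence aligned to a three-position structure.
energy = align_to_structure("LL", [True, False, True], burial_e, gap=1.0)
print(energy)
```

In this toy case the lowest-energy alignment places both L residues at the buried positions and leaves the exposed middle position empty, at the cost of one gap penalty.
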