Many methods have been proposed in the literature to classify structures automatically. Each suggests a different approach to the problem but, as far as we know, they all exploit sequence similarity to discover analogous structures and to assign domains to the correct families. However, in some cases the score associated with a single sequence alignment is not a good distance function for clustering protein domains with respect to their structures. For this reason, methods that consider similarity with respect to entire sets of sequences, rather than to their single components, would be preferable.
The notion of fingerprint summarizes the relations between the sequences in a set and provides a visual representation of the general character of the set itself. This is achieved by decomposing the best-scoring alignment between two sequences into a pair of information-theoretic measures: the mutual information and the target frequency divergence. This pair is the BLOSpectrum of the two sequences.
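As a rough illustration of how such a pair could be computed, the sketch below assumes that the mutual information is the empirical mutual information of the aligned residue pairs and that the target frequency divergence is the Kullback-Leibler divergence of the observed pair frequencies from the target frequencies of the scoring matrix. The function name `blospectrum`, the gap-free input, the dictionary format of `target_freq`, and the use of base-2 logarithms are all assumptions made for this example; the exact definitions are given in the publications.

```python
import math
from collections import Counter

def blospectrum(seq_x, seq_y, target_freq):
    """Return (target frequency divergence, mutual information) in bits.

    seq_x, seq_y: aligned sequences of equal length, gaps already removed
                  (assumption of this sketch).
    target_freq:  dict mapping residue pairs (a, b) to the target frequencies
                  of the scoring matrix (hypothetical input format).
    """
    if len(seq_x) != len(seq_y) or not seq_x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_x)
    joint = Counter(zip(seq_x, seq_y))  # observed counts of aligned pairs
    count_x = Counter(seq_x)            # residue counts in the first sequence
    count_y = Counter(seq_y)            # residue counts in the second sequence

    mutual_info = 0.0
    divergence = 0.0
    for (a, b), count in joint.items():
        f_ab = count / n
        # f_ab / (f_a * f_b) simplifies to count * n / (count_a * count_b)
        mutual_info += f_ab * math.log2(count * n / (count_x[a] * count_y[b]))
        # Raises KeyError or ZeroDivisionError if an observed pair has no
        # positive target frequency; a real tool would need to handle that.
        divergence += f_ab * math.log2(f_ab / target_freq[(a, b)])
    return divergence, mutual_info
```

The pair is returned in the order (divergence, mutual information) so that it can be plotted directly as an (x, y) point.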
The SCOP family fingerprint is the set of BLOSpectra associated with the all-vs-all sequence alignments among the elements of the family. It models the canonical way in which a sequence of the family is related to the others.
A graphical representation of the SCOP family fingerprint of family g.8.1.1: the x-axis and y-axis correspond to target frequency divergence and mutual information, respectively.
The relative fingerprint of a sequence with respect to a SCOP family is the set of BLOSpectra associated with the alignments of the sequence against all sequences in that family; it depicts the relation between the sequence and the family as a whole. If a sequence is likely to belong to a given set, then its relative fingerprint with respect to that set should be “similar” to the family fingerprint of the set itself.
The relative fingerprint of a sequence in the SCOP family a.1.1.2 is spread over the corresponding family fingerprint. In this case, the investigated sequence is likely to belong to the family.
The relative fingerprint of a sequence that is not included in the SCOP family a.1.1.2 lies at the lower-right corner of the family fingerprint, which indicates that the query sequence is not “compatible” with the family.
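The two kinds of fingerprint defined above can be sketched as simple collections of points. In the example below, `spectrum` is a hypothetical callable that aligns two sequences and returns their BLOSpectrum as a (divergence, mutual information) pair; it stands in for the actual alignment and decomposition step. The sketch also assumes that the BLOSpectrum is symmetric in its two arguments, so each unordered pair of family members is aligned only once.

```python
from itertools import combinations

def family_fingerprint(family, spectrum):
    """BLOSpectra of all pairs of distinct sequences in the family.

    Assumes spectrum(x, y) == spectrum(y, x), so each unordered pair
    is visited once.
    """
    return [spectrum(x, y) for x, y in combinations(family, 2)]

def relative_fingerprint(query, family, spectrum):
    """BLOSpectra of the query sequence against every family member."""
    return [spectrum(query, member) for member in family]
```

Both functions return lists of points in the same plane, so a relative fingerprint can be overlaid on the corresponding family fingerprint, as in the figures.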
Given the ASTRAL compendium and a query sequence, the BLOSpectrum tools can build all the SCOP family fingerprints and the corresponding relative fingerprints, compare them, and provide a likelihood ranking for the query sequence. For more details, see the publications.
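The comparison step is not specified here, so the following is only a placeholder showing the overall shape of a ranking procedure, not the method used by the BLOSpectrum tools. It assumes that fingerprints are non-empty lists of (divergence, mutual information) points and scores each family by the mean Euclidean distance from every point of the relative fingerprint to its nearest point in the family fingerprint. The function names and the dissimilarity measure are hypothetical choices for this example.

```python
import math

def mean_nearest_distance(relative_fp, family_fp):
    """Average distance from each relative point to its closest family point.

    Placeholder dissimilarity (an assumption of this sketch); both
    fingerprints must be non-empty.
    """
    total = sum(min(math.dist(p, q) for q in family_fp) for p in relative_fp)
    return total / len(relative_fp)

def rank_families(fingerprints):
    """Order family identifiers from most to least compatible.

    fingerprints: dict mapping a family identifier to a pair
                  (family_fingerprint, relative_fingerprint).
    """
    scores = {
        family_id: mean_nearest_distance(relative_fp, family_fp)
        for family_id, (family_fp, relative_fp) in fingerprints.items()
    }
    return sorted(scores, key=scores.get)
```

With made-up toy points, a family whose fingerprint covers the query's relative fingerprint is ranked ahead of one whose fingerprint lies far from it.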