14th International Workshop on Database and Expert Systems Applications (DEXA'03)
ConsDiff: Identification of Conserved Differences between Sets of Amino Acid Sequences
Prague, Czech Republic
September 1-5, 2003
ISBN: 0-7695-1993-8
Proteins are classified into families based on measures of similarity, such as sequence or structural similarity. However, function can differ significantly even within a family. For example, only a subset of the matrix metalloproteinase family is capable of cleaving collagen. Typically, a scientist scans a multiple sequence alignment by eye to find amino acids that might be responsible for differences in function. This approach draws on expert knowledge but is subjective and does not scale. We propose an algorithm that automates this process, highlighting key residues that are conserved within each group but differ between the groups. The algorithm applies a set of parametric rules to a multiple sequence alignment, using log-odds scores from amino acid substitution matrices. ConsDiff is a web server implementation of this approach that uses ClustalW to generate a multiple sequence alignment and then highlights conserved differences between two sets of sequences. ConsDiff offers adjustable detection thresholds and a choice of several PAM/BLOSUM matrices, or a user-specified matrix or alignment. This allows the automated discovery of candidate residues that may be responsible for critical differences in function, which may then be verified experimentally.
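The idea of residues "conserved within each group but different between the groups" can be illustrated with a minimal sketch. This is not the ConsDiff implementation: the abstract does not give the actual rules, so the averaging scheme, the function names, the threshold names (`within_min`, `between_max`), and the toy scoring function standing in for a PAM/BLOSUM log-odds matrix are all assumptions made for illustration. The input is assumed to be two groups of already aligned, equal-length sequences that use only the 20 standard residues plus `-` for gaps.

```python
from itertools import combinations, product

# Toy stand-in for a PAM/BLOSUM log-odds matrix (assumption: these are NOT
# real matrix values, only a rough physico-chemical grouping).
CLASSES = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
CLASS_OF = {aa: i for i, group in enumerate(CLASSES) for aa in group}

def toy_score(a, b):
    """Return 4 for identical residues, 1 for same class, -2 otherwise."""
    if a == b:
        return 4
    if CLASS_OF[a] == CLASS_OF[b]:
        return 1
    return -2

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def within_score(column, score):
    """Average pairwise score among the residues of one group at one column."""
    if len(column) < 2:
        return float(score(column[0], column[0]))
    return mean(score(a, b) for a, b in combinations(column, 2))

def between_score(col_a, col_b, score):
    """Average score over all cross-group residue pairs at one column."""
    return mean(score(a, b) for a, b in product(col_a, col_b))

def conserved_differences(group_a, group_b, score=toy_score,
                          within_min=1.0, between_max=0.0):
    """Return (position, column_a, column_b) for columns that are conserved
    inside each group but dissimilar across groups. Positions are 0-based.
    Columns containing a gap are skipped (a simplifying assumption)."""
    hits = []
    for pos in range(len(group_a[0])):
        col_a = [seq[pos] for seq in group_a]
        col_b = [seq[pos] for seq in group_b]
        if "-" in col_a or "-" in col_b:
            continue
        if (within_score(col_a, score) >= within_min
                and within_score(col_b, score) >= within_min
                and between_score(col_a, col_b, score) <= between_max):
            hits.append((pos, "".join(col_a), "".join(col_b)))
    return hits

# Hypothetical aligned sequences, invented for this example.
group_a = ["MKDLA", "MKELA", "MKDLA"]
group_b = ["MKKLG", "MKRLG", "MKKLG"]
print(conserved_differences(group_a, group_b))
# → [(2, 'DED', 'KRK'), (4, 'AAA', 'GGG')]
```

In this toy input, column 2 is acidic (D/E) in the first group and basic (K/R) in the second, and column 4 is A versus G, so both are reported; columns identical across the two groups are not. Raising `within_min` demands stricter conservation inside each group, and lowering `between_max` demands a sharper contrast between groups, which loosely mirrors the adjustable thresholds described above.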
Citation:
Saumil Mehta, Deendayal Dinakarpandian, "ConsDiff: Identification of Conserved Differences between Sets of Amino Acid Sequences," 14th International Workshop on Database and Expert Systems Applications (DEXA'03), p. 21, 2003