Pairwise Sequence Comparison Evaluation (Brenner Computational Biology Research Group)

Databases, Results, and Tools

This page accompanies the manuscript

Green RE, Brenner SE. 2002. Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison. Proceedings of the IEEE 90:1834-1847. [pdf]

which extends the work in

Brenner SE, Chothia C, Hubbard TJP. 1998. Assessing sequence comparison methods with reliable structurally-identified distant evolutionary relationships. Proceedings of the National Academy of Sciences of the United States of America 95:6073-6078. [pdf]

Databases

The databases used are listed below. The ASTRAL-derived databases are all from ASTRAL version 1.57. More recent versions of ASTRAL can be found at the ASTRAL website.

    astral-scopdom-seqres-gd-sel-gs-bib-40-1.57.fa.gz [ASTRAL 1.57 genetic domains, filtered at 40% sequence identity]
    astral-1.57-40.s.fa.gz [same as above, but filtered of low complexity regions with seg]
    astral-1.57-40.s.fa.A.gz [training database, roughly 1/2 of astral-1.5]
    astral-1.57-40.s.fa.B.gz [test database]
    snr.020528.3ma.fa.gz [SNR database]

Results

Superfamily size distribution within the training and test databases (data plotted in Figure 9a).
Coverage versus superfamily size (data plotted in Figure 9b).
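A superfamily size distribution like the one in Figure 9a can be tallied directly from the FASTA headers. The sketch below is a rough illustration, not the group's own code: it assumes ASTRAL-style headers in which the second whitespace-separated field is the SCOP sccs (for example `a.1.1.2`, i.e. class.fold.superfamily.family), and the function names and toy records are hypothetical.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator


def superfamily_of(header: str) -> str:
    """Return class.fold.superfamily from an ASTRAL-style FASTA header.

    Assumes headers like '>d1xxxa_ a.1.1.2 (A:) description', where the
    second whitespace-separated field is the SCOP sccs.
    """
    sccs = header[1:].split()[1]
    return ".".join(sccs.split(".")[:3])


def superfamily_sizes(lines: Iterable[str]) -> Counter:
    """Count domains (FASTA records) per superfamily."""
    return Counter(superfamily_of(line) for line in lines if line.startswith(">"))


def size_distribution(sizes: Counter) -> Dict[int, int]:
    """Map superfamily size -> number of superfamilies of that size."""
    return dict(sorted(Counter(sizes.values()).items()))


def read_fasta_gz(path: str) -> Iterator[str]:
    """Yield lines from a gzip-compressed FASTA file such as those listed above."""
    with gzip.open(path, "rt") as handle:
        yield from handle


# Toy records with made-up domain identifiers, for illustration only.
toy = [
    ">dtest1_ a.1.1.1 (A:) hypothetical domain",
    "MKVLAAGG",
    ">dtest2_ a.1.1.2 (A:) hypothetical domain",
    "MKVIAEGG",
    ">dtest3_ a.1.1.2 (B:) hypothetical domain",
    "MRVLSAGG",
    ">dtest4_ b.1.1.1 (A:) hypothetical domain",
    "SVTLQESG",
]
sizes = superfamily_sizes(toy)
print(sizes)                     # counts per superfamily
print(size_distribution(sizes))  # {1: 1, 3: 1}
```

For a real file, `superfamily_sizes(read_fasta_gz("astral-1.57-40.s.fa.A.gz"))` would tally the training set, provided its headers follow the assumed layout.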

Table of SCOP and PDB growth (data plotted in Figure 13).

Training-phase table: coverage at 0.01 errors per query (EPQ) under each normalization scheme, for every algorithm, substitution matrix, and gap-parameter combination tested.
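Coverage at a fixed EPQ threshold is the headline measure in these evaluations. The sketch below is a minimal, unnormalized reading of it in the spirit of Brenner et al. (1998): hits are walked from most to least significant, a pair in the same SCOP superfamily counts as a true positive, a pair in different folds counts as an error, same-fold/different-superfamily pairs are ignored, and coverage is reported just before errors per query would exceed the cutoff. Counting each unordered pair once, ignoring score ties, and omitting the 2002 normalization schemes are assumptions of this sketch; the names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query id, target id, E-value); lower is better


def _level(sccs: str, depth: int) -> str:
    """Truncate an sccs such as 'a.1.1.2' to depth 2 (fold) or 3 (superfamily)."""
    return ".".join(sccs.split(".")[:depth])


def coverage_at_epq(hits: Iterable[Hit], sccs_by_id: Dict[str, str],
                    epq_cutoff: float = 0.01) -> float:
    """Fraction of same-superfamily pairs found before EPQ exceeds epq_cutoff.

    Assumptions: every sequence is used once as a query, pairs are unordered
    and counted once at their best E-value, ties are not treated specially,
    and no normalization by superfamily size is applied.
    """
    n_queries = len(sccs_by_id)
    sf_sizes = Counter(_level(s, 3) for s in sccs_by_id.values())
    total_true = sum(n * (n - 1) // 2 for n in sf_sizes.values())
    if total_true == 0:
        return 0.0

    seen = set()
    tp = fp = 0
    for query, target, _evalue in sorted(hits, key=lambda h: h[2]):
        pair = frozenset((query, target))
        if query == target or pair in seen:
            continue  # skip self-hits and the reverse direction of a pair
        seen.add(pair)
        a, b = sccs_by_id[query], sccs_by_id[target]
        if _level(a, 3) == _level(b, 3):
            tp += 1  # same superfamily: counted as a true homolog
        elif _level(a, 2) != _level(b, 2):
            if (fp + 1) / n_queries > epq_cutoff:
                break  # this error would push EPQ past the cutoff
            fp += 1  # different fold: counted as an error
        # same fold, different superfamily: uncertain, so ignored
    return tp / total_true


# Toy data with hypothetical identifiers.
sccs = {"q1": "a.1.1.1", "q2": "a.1.1.2", "q3": "a.1.1.1",
        "q4": "b.1.1.1", "q5": "b.1.1.1"}
hits = [("q1", "q2", 1e-10), ("q2", "q1", 1e-9), ("q1", "q4", 1e-5),
        ("q4", "q5", 1e-4), ("q2", "q5", 1e-3), ("q1", "q3", 1e-2)]
print(coverage_at_epq(hits, sccs, epq_cutoff=0.25))  # 0.5
```

With only five toy queries a single error already gives 0.2 EPQ, so the demo uses a loose cutoff; on a full all-versus-all ASTRAL comparison the 0.01 default corresponds to one error per hundred queries.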

Tools

All the tools used in this analysis are available below. This Perl code has been tested only on Red Hat Linux systems, although it will probably run on other platforms.

27 March 2005 - Update

Pairwise Sequence Comparison / Evaluation (PSCE-1.0.1) tools
An updated set of tools with many improvements, including a Bayesian bootstrap, a complete example set, and a manual [36 MB].
Just the manual (pdf)
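The 2005 release adds a Bayesian bootstrap. For readers unfamiliar with the technique, here is a generic sketch of Rubin's Bayesian bootstrap applied to per-query values: each replicate reweights the queries with a flat Dirichlet draw (normalized exponential variates) and recomputes the statistic. This is not the PSCE implementation; the choice of statistic (a mean), the per-query values, and the function names are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def dirichlet_weights(n: int, rng: random.Random) -> List[float]:
    """One draw from a flat Dirichlet(1, ..., 1): normalized Exp(1) variates."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bayesian_bootstrap_mean(values: Sequence[float], replicates: int = 1000,
                            seed: int = 0) -> List[float]:
    """Posterior draws of the mean of `values` under Rubin's Bayesian bootstrap."""
    rng = random.Random(seed)
    draws = []
    for _ in range(replicates):
        weights = dirichlet_weights(len(values), rng)
        draws.append(sum(w * v for w, v in zip(weights, values)))
    return draws


def percentile_interval(draws: Sequence[float],
                        level: float = 0.95) -> Tuple[float, float]:
    """Simple percentile interval taken from the sorted bootstrap draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Hypothetical per-query scores, for illustration only.
per_query = [0.0, 0.2, 0.5, 0.1, 0.4, 0.3, 0.0, 0.6]
draws = bayesian_bootstrap_mean(per_query, replicates=2000, seed=1)
print(percentile_interval(draws))
```

Unlike the classical bootstrap, which resamples queries with replacement and so gives each one an integer weight, the Bayesian bootstrap gives every query a continuous positive weight; seeding `random.Random` keeps the draws reproducible.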

Older versions of tools:

    README file from SeqCompEvalTools
    SeqCompEvalToolsEx-1.10.tar.gz [all tools with entire example set, 24 MB]
    SeqCompEvalTools-1.10.tar.gz [all tools, no example set, 27 kB]
    SeqCompEvalToolsEx-1.02 [all tools with entire example set, 20 MB]
    SeqCompEvalTools-1.02 [just the tools, 26 kB]

Author

This page was written by Ed Green. It describes work done by Ed Green, Gavin Price, Gavin Crooks, and Steven Brenner at UC Berkeley.