Readings
Lecture 1: January 18
no readings
Lecture 2: January 20
Pearson W (2000). Protein sequence comparison and protein evolution (This is the ISMB tutorial.)
Fitch WM (2000). Homology: a personal view on some of the problems. Trends in Genetics 16:227-31.
Winter WP, Walsh KA and Neurath H (1968). Homology as applied to proteins. Science 162:1433.
Fitch WM (1970). Distinguishing homologous from analogous proteins. Systematic Zoology 19:99-113. (For now, just read pages 99-102, 112-113.)
Lecture 3: January 25
Start reading chapter 2 of the Durbin, Eddy, Krogh & Mitchison (DEKM) book, Biological Sequence Analysis.
Finish reading pages 103-111 of the Fitch article. Focus on understanding the principles rather than the details.
Continue reading the Pearson ISMB tutorial.
Lecture 4: January 27
Finish reading DEKM sections 2.1, 2.2, 2.3, 2.4
Lecture 5: February 1
Henikoff S and Henikoff JG (1992). Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 89:10915-19.
Dayhoff MO, Schwartz RM and Orcutt BC (1978). A model of evolutionary change in proteins.
Yu YK, Wootton JC, Altschul SF (2003). The compositional adjustment of amino acid substitution matrices. Proc Natl Acad Sci U S A. 100:15688-93.
NEW: Altschul SF (1998). Generalized affine gap costs for protein sequence alignment. Proteins 32:88-96.
NEW: Altschul SF (1991). Amino acid substitution matrices from an information theoretic perspective. J Mol Biol 219:555-565.
Lecture 6: February 3
DEKM section 2.5
Gallison F (2000). The Fasta and Blast programs
Lecture 7: February 8
DEKM section 2.7
Pagni M and Jongeneel CV (2001). Making sense of score statistics for sequence alignments. Briefings in Bioinformatics 2:51-67.
OPTIONAL READINGS:
- NCBI tutorial: The statistics of sequence similarity scores
- Pearson WR and Wood TC (2001). Statistical significance in biological sequence comparison. In Handbook of Statistical Genetics, DJ Balding, M Bishop and C Cannings, eds. (London, UK: Wiley), pp. 39-65.
- Gish WR (2004). Introduction to Alignment Scoring Statistics
- Extreme value distributions, from the Engineering Statistics Handbook
- Green RE and Brenner SE (2002). Bootstrapping and Normalization for Enhanced Evaluations of Pairwise Sequence Comparison. Proceedings of the IEEE 90:1834-1847.
- Brenner SE, Chothia C, Hubbard TJP (1998). Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Natl Acad Sci U S A 95:6073-6078.
- Pearson WR (1995). Comparison of methods for searching protein sequence databases. Protein Science 4:1145-1160.
Lecture 8: February 10
Brenner SE (1999). Errors in genome annotation. Trends in Genetics 15:132-3.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25:25-29.
Eisen JA (1998). Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research 8:163-7.
Lecture 9: February 15
DEKM Chapter 6
Gonnet GH, Korostensky C, Benner S (2000). Evaluation measures of multiple sequence alignments. Journal of Computational Biology 7:261-276.
Lecture 10: February 17
Thompson JD, Higgins DG and Gibson TJ (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22:4673-4680.
Notredame C, Higgins DG and Heringa J (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302:205-217.
Lecture 11: February 22
Katoh K, Misawa K, Kuma K and Miyata T (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30:3059-3066.
NEW: Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32:1792-7.
NEW: Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research 15:330-40.
Lecture 12: February 24
Schaffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, et al. (2001). Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Research 29:2994-3005.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389-3402.
Lecture 13: March 1
Delcher AL, Phillippy A, Carlton J, Salzberg SL (2002). Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Research 30(11):2478-83.
Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S; NISC Comparative Sequencing Program (2003). LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Research 13(4):721-31.
Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S (2003). Glocal alignment: finding rearrangements during alignment. Bioinformatics 19 Suppl 1:i54-62.
Lecture 14: March 3
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W (2000). PipMaker--a web server for aligning two genomic DNA sequences. Genome Research 10(4):577-86.
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I (2000). VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16(11):1046-7.
Wheelan SJ, Church DM, Ostell JM (2001). Spidey: a tool for mRNA-to-genomic alignments. Genome Research 11(11):1952-7.
Kent WJ (2002). BLAT--the BLAST-like alignment tool. Genome Research 12(4):656-64.
Lecture 17: March 15
Stormo GD (2000). DNA binding sites: representation and discovery. Bioinformatics 16:16-23.
Bailey TL and Elkan C (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2:28-36.
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF and Wootton JC (1993). Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208-14.
Lecture 18: March 17
DEKM Chapter 3
Lecture 19: March 29
DEKM Chapter 5
Sonnhammer EL, Eddy SR, Durbin R (1997). Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28(3):405-20.
Lecture 21: April 5
Burge CB, Karlin S (1998). Finding the genes in genomic DNA. Curr Opin Struct Biol 8(3):346-54.
Burge C, Karlin S (1997). Prediction of complete gene structures in human genomic DNA. J Mol Biol 268(1):78-94.
Reese MG, Kulp D, Tammana H, Haussler D (2000). Genie--gene finding in Drosophila melanogaster. Genome Research 10(4):529-38.
Kulp D, Haussler D, Reese MG, Eeckman FH (1996). A generalized hidden Markov model for the recognition of human genes in DNA. Proc Int Conf Intell Syst Mol Biol 4:134-42.
Alexandersson M, Cawley S, Pachter L (2003). SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Research 13(3):496-502.