Topics in Computational Biology and Genomics

BioE/MCB/PMB c146/c246, Spring 2005

Readings

Lecture 1: January 18
No readings.

Lecture 2: January 20
Pearson W (2000). Protein sequence comparison and protein evolution (This is the ISMB tutorial.)
Fitch WM (2000). Homology: a personal view on some of the problems. Trends in Genetics 16:227-31.
Winter WP, Walsh KA and Neurath H (1968). Homology as applied to proteins. Science 162:1433.
Fitch WM (1970). Distinguishing homologous from analogous proteins. Systematic Zoology 19:99-113. (For now, just read pages 99-102, 112-113.)

Lecture 3: January 25

Start reading chapter 2 of the Durbin, Eddy, Krogh & Mitchison (DEKM) book.

Finish the Fitch (1970) article by reading pages 103-111. Focus on understanding the principles rather than the details.

Continue reading the Pearson ISMB tutorial.

Lecture 4: January 27
Finish reading DEKM sections 2.1, 2.2, 2.3, 2.4

Lecture 5: February 1
Henikoff S and Henikoff JG (1992). Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 89:10915-19.
Dayhoff MO, Schwartz RM and Orcutt BC (1978). A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure, vol. 5, suppl. 3, pp. 345-352.
Yu YK, Wootton JC and Altschul SF (2003). The compositional adjustment of amino acid substitution matrices. Proc Natl Acad Sci U S A. 100:15688-93.
NEW: Altschul SF (1998). Generalized affine gap costs for protein sequence alignment. Proteins 32:88-96.
NEW: Altschul SF (1991). Amino acid substitution matrices from an information theoretic perspective. J Mol Biol 219:555-565.

Lecture 6: February 3
DEKM section 2.5
Gallison F (2000). The Fasta and Blast programs

Lecture 7: February 8
DEKM section 2.7
Pagni M and Jongeneel CV (2001). Making sense of score statistics for sequence alignments. Briefings in Bioinformatics 2:51-67

OPTIONAL READINGS

Lecture 8: February 10
Brenner SE (1999). Errors in genome annotation. Trends in Genetics 15:132-3.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25:25-29.
Eisen JA (1998). Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research 8:163-7.

Lecture 9: February 15
DEKM Chapter 6
Gonnet GH, Korostensky C, Benner S (2000). Evaluation measures of multiple sequence alignments. Journal of Computational Biology 7:261-276.

Lecture 10: February 17
Thompson JD, Higgins DG, and Gibson TJ (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22:4673-4680.
Notredame C, Higgins DG and Heringa J (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302:205-217.

Lecture 11: February 22
Katoh K, Misawa K, Kuma K and Miyata T (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30:3059-3066.
NEW: Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32:1792-7.
NEW: Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research 15:330-40.

Lecture 12: February 24
Schaffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, et al. (2001). Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Research 29:2994-3005.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389-3402.

Lecture 13: March 1
Delcher AL, Phillippy A, Carlton J, Salzberg SL (2002). Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Research 30(11):2478-83.
Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S; NISC Comparative Sequencing Program (2003). LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Research 13(4):721-31.
Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S (2003). Glocal alignment: finding rearrangements during alignment. Bioinformatics 19 Suppl 1:i54-62.

Lecture 14: March 3
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W (2000). PipMaker--a web server for aligning two genomic DNA sequences. Genome Research 10(4):577-86.
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I (2000). VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16(11):1046-7.
Wheelan SJ, Church DM, Ostell JM (2001). Spidey: a tool for mRNA-to-genomic alignments. Genome Research 11(11):1952-7.
Kent WJ (2002). BLAT--the BLAST-like alignment tool. Genome Research 12(4):656-64.

Lecture 17: March 15
Stormo GD (2000). DNA binding sites: representation and discovery. Bioinformatics 16:16-23.
Bailey TL and Elkan C (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2:28-36.
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF and Wootton JC (1993). Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208-14.

Lecture 18: March 17
DEKM Chapter 3

Lecture 19: March 29
DEKM Chapter 5
Sonnhammer EL, Eddy SR, Durbin R (1997). Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28(3):405-20.

Lecture 21: April 5
Burge CB, Karlin S (1998). Finding the genes in genomic DNA. Curr Opin Struct Biol 8(3):346-54.
Burge C, Karlin S (1997). Prediction of complete gene structures in human genomic DNA. J Mol Biol 268(1):78-94.
Reese MG, Kulp D, Tammana H, Haussler D (2000). Genie--gene finding in Drosophila melanogaster. Genome Res 10(4):529-38.
Kulp D, Haussler D, Reese MG, Eeckman FH (1996). A generalized hidden Markov model for the recognition of human genes in DNA. Proc Int Conf Intell Syst Mol Biol 4:134-42.
Alexandersson M, Cawley S, Pachter L (2003). SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res 13(3):496-502.