================================================================================
README file for isoforms.txt

isoforms.txt contains descriptions of canonical and alternative isoforms of
RefSeq-coding genes, using an overloaded FASTA header line.

The basic format of the FASTA descriptor is as follows:

> (ISOFORM ID) exons{...} spl{...} aspl{...} ptc{...}

ISOFORM ID:

  a few examples:

    NM_000120-CDS.canonical
    NM_000029-CDS.alt-isoform-1
    NM_000098-CDS.alt-isoform-1.NMD-CANDIDATE
    NM_000066-CDS.alt-fragment-2
    NM_000036-CDS.alt-fragment-1.NMD-CANDIDATE

  The isoform ID consists of up to three indicator fields:

  1) RefSeq identifier followed by '-CDS' to indicate that only the coding
     region is described.
       ex: NM_000120-CDS

  2) canonical-seq - indicates that the corresponding isoform is spliced
     identically to the canonical RefSeq entry.
       ex: canonical-seq
     or
     alt-isoform-# - indicates that the entry is the #th alternative isoform
     of the RefSeq-coding gene of interest.
       ex: alt-isoform-1
     or
     alt-fragment-# - indicates that the entry is the #th alternative isoform
     of the RefSeq-coding gene of interest _and_ that the reading frame for
     this isoform could not be determined unambiguously (i.e. the most 5'
     exon of this isoform does not overlap with a coding exon of the RefSeq
     gene).
       ex: alt-fragment-2

  3) The first two fields are followed by '.NMD-CANDIDATE' if the isoform was
     found to contain a premature termination codon.

exons field:

  contains the number of exons in the isoform followed by a colon and the
  exon boundaries in parentheses, where the range of an exon is indicated by
  a '-' between coordinates and the range of an intron is indicated by a '.'
  between coordinates.
    ex: exons{19:(1-100.1066-1194.2043-2171.3413-3552.3851-3928.4012-4090.4628-4753.5126-5210.5323-5424.6024-6108.6383-6556.6986-7055.7218-7328.7706-7919.9147-9280.10943-11031.11183-11266.11930-12016.12377-12424)}

spl field:

  contains the number of splice sites in the isoform followed by a colon and
  a two-field descriptor of each splice site (splice descriptor and
  coordinates, separated by '|').

    ex: spl{3:(REF-E|152.3776,PS1.C|3856.13072,REF-E|14376.16321)} aspl{1:(PS1.C|3856.13072:1)}

  splice descriptors and their meanings:

    REF     canonical splice in canonical RefSeq isoform

    REF-E   canonical splice in alternative isoform

    REF-S   splice that aligned within 7 nt of the actual position of the
            canonical splice, most likely as a result of poor alignment
            around the splice boundary. These are not thought to be
            indicative of alternative splicing.

    PS(#)   a perfect skip alternative splice
            # indicates the number of exons of the canonical isoform that
            are skipped

    ALT-(3|5|R)D(3|5|R)A
            alternative splice with a single alternative splice site and a
            canonical site
            3 and 5 indicate that the alternative donor or acceptor is
            either 3' or 5' of the corresponding canonical splice donor or
            acceptor
            R indicates that the donor or acceptor matches the canonical
            donor or acceptor

    CON-(3|5|R)D(3|5|R)A
            alternative splice with two alternative splice sites

    EINC-(3|5|R)D(3|5|R)A
            exon inclusion (a complete new exon is added from intron
            sequence in the canonical isoform)

    A trailing ".C" or ".N" indicates:
      .C : the splice site following the alternative splice is covered by
           ESTs
      or
      .N : the splice site following the alternative splice is not covered
           by ESTs

  The coordinates of splice donor and acceptor sites are listed as
  "donor"."acceptor"

    ex: 389.13072

aspl field:

  this field is identical to the spl field except that only alternative (and
  not canonical) splices are described. In addition, the number of ESTs
  covering each alternative splice site is listed after the coordinates,
  separated by a colon.
    ex: aspl{1:(ALT-5DRA.C|66.2644:1)}

ptc field:

  this field has four pieces of information separated by commas:

  1) coordinate of the premature termination codon in the translated amino
     acid sequence
  2) the number of splice pairs beyond the premature termination codon that
     are covered by ESTs
  3) splice descriptor corresponding to the alternative splice that
     introduces the premature termination codon
  4) number of ESTs covering the splice that introduces the premature
     termination codon

    ex: ptc{210,5,ALT-RD3A.C,1}

* note that isoforms that do not contain premature termination codons will
  have an empty ptc field, and canonical isoforms will have an empty aspl
  field
================================================================================
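
Parsing sketch (illustrative only):

  The header layout described above can be read with a few string splits. The
  Python sketch below is a minimal example under stated assumptions, not a
  reference parser: all function names are hypothetical, the patterns are
  inferred only from the examples in this README, and an empty field is assumed
  to be written as name{} (e.g. ptc{}). The sample header is stitched together
  from the separate examples above and is not a real record from isoforms.txt.

```python
import re

# Matches name{body} for the four fields described in this README.
# Assumption: field bodies never contain a '}' character.
FIELD_RE = re.compile(r"(exons|spl|aspl|ptc)\{([^}]*)\}")
NMD_SUFFIX = ".NMD-CANDIDATE"


def parse_isoform_id(isoform_id):
    """Split an isoform ID into RefSeq accession, isoform type, and NMD flag."""
    nmd = isoform_id.endswith(NMD_SUFFIX)
    core = isoform_id[: -len(NMD_SUFFIX)] if nmd else isoform_id
    refseq, _, kind = core.partition("-CDS.")
    return {"refseq": refseq, "kind": kind, "nmd_candidate": nmd}


def _split_count(body):
    """Split 'N:(...)' into (N, inner text); an empty body gives (0, '')."""
    if not body.strip():
        return 0, ""
    count, _, rest = body.partition(":")
    return int(count), rest.strip().strip("()")


def parse_exons(body):
    """Return exon ranges as (start, end) tuples; '.' separates exons."""
    _, inner = _split_count(body)
    exons = []
    for span in inner.split(".") if inner else []:
        start, end = span.split("-")
        exons.append((int(start), int(end)))
    return exons


def parse_splices(body):
    """Parse an spl or aspl body; 'ests' is None when no EST count is given."""
    _, inner = _split_count(body)
    splices = []
    for item in inner.split(",") if inner else []:
        descriptor, coords = item.split("|")
        coords, _, est_count = coords.partition(":")
        donor, acceptor = coords.split(".")
        splices.append({
            "descriptor": descriptor,
            "donor": int(donor),
            "acceptor": int(acceptor),
            "ests": int(est_count) if est_count else None,
        })
    return splices


def parse_ptc(body):
    """Parse the four comma-separated ptc values; None for an empty field."""
    if not body.strip():
        return None
    position, pairs, descriptor, ests = body.split(",")
    return {
        "position": int(position),
        "covered_splice_pairs": int(pairs),
        "descriptor": descriptor,
        "ests": int(ests),
    }


def parse_header(line):
    """Parse one FASTA header line into a dictionary."""
    text = line.lstrip("> \t").strip()
    fields = dict(FIELD_RE.findall(text))
    record = parse_isoform_id(text.split()[0])
    record["exons"] = parse_exons(fields.get("exons", ""))
    record["spl"] = parse_splices(fields.get("spl", ""))
    record["aspl"] = parse_splices(fields.get("aspl", ""))
    record["ptc"] = parse_ptc(fields.get("ptc", ""))
    return record


# Synthetic header assembled from the README examples (not real data).
EXAMPLE = (
    ">NM_000098-CDS.alt-isoform-1.NMD-CANDIDATE "
    "exons{2:(1-100.1066-1194)} "
    "spl{3:(REF-E|152.3776,PS1.C|3856.13072,REF-E|14376.16321)} "
    "aspl{1:(PS1.C|3856.13072:1)} "
    "ptc{210,5,ALT-RD3A.C,1}"
)

record = parse_header(EXAMPLE)
print(record["refseq"], record["kind"], record["nmd_candidate"])
```

  The splice descriptor is split off at '|' before the coordinates are split at
  '.', because descriptors such as PS1.C themselves contain a '.'.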