Last modified on 29 August 2008.
The alignments provided at the CRW Site use certain special characters to indicate structural annotation. These characters are defined here.
We also address the issues of DIVIDER lines and other rare anomalies in the alignments.
- Character Case
- Structural Notation Characters
- The IUPAC-IUB Ambiguity Codes for Nucleotides
- Numbers in Sequences?
- DIVIDER Sequences
- Everything Else
Character Case
|lowercase||Lesser confidence in alignment of that sequence (DEFAULT for newly aligned sequences).|
|UPPERCASE||Greater confidence in alignment of that sequence.|
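As a minimal sketch of how the case convention might be read programmatically, the hypothetical helper below classifies one aligned row by the case of its letters. The function name and the "mixed" and "unknown" labels are our assumptions for illustration; the table itself only distinguishes lowercase from UPPERCASE sequences.

```python
def alignment_confidence(row: str) -> str:
    """Classify an aligned row by the case of its letters.

    Returns "greater" if every letter is uppercase, "lesser" if every
    letter is lowercase, "mixed" if both occur (an assumed extra label),
    and "unknown" if the row contains no letters at all.
    """
    # Ignore dashes, periods, and structural notation; only letters count.
    letters = [ch for ch in row if ch.isalpha()]
    if not letters:
        return "unknown"
    if all(ch.isupper() for ch in letters):
        return "greater"
    if all(ch.islower() for ch in letters):
        return "lesser"
    return "mixed"
```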
Structural Notation Characters
|-||DASH||Deletion (relative to at least one other sequence in the alignment).|
|. ~||PERIOD, TILDE||Region not sequenced; thus, presence or absence of any nucleotide is uncertain. Typically appear at the ends of sequences.||Not equivalent to dash.|
||||PIPE||Discontinuity in helix. Used to indicate both irregularities within helices and boundaries between helices.||Treat as a dash for analysis.|
|( )||PARENTHESES||Enclose hairpin loops. Some "hairpin loops" may contain complex structure elements, such as pseudoknots.||Treat as a dash for analysis.|
|< > [ ]||ANGLE BRACKETS, SQUARE BRACKETS||Enclose variable regions, with respect to the reference sequence for the molecule. Generally, the regions aligned in square brackets are more conserved and better aligned than those in angle brackets.||Treat as a dash for analysis.|
|//||DOUBLE SLASH||Marks a break in the backbone (i.e., the two nucleotides separated by the "//" are not covalently connected).||Treat as a dash for analysis.|
|+||PLUS SIGN||Marks the insertion position of an intron sequence.||Treat as a dash for analysis.|
|=||EQUALS SIGN||Internal use only (a workaround for an AE2 feature).||Treat as a dash for analysis.|
|*||ASTERISK||Deprecated internal use character.||Treat as a dash for analysis.|
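The "Treat as a dash for analysis" rule can be sketched as a single character translation. This is a minimal illustration, not a CRW tool: the names `DASH_EQUIVALENT`, `UNSEQUENCED`, and `normalize_for_analysis` are hypothetical. Periods and tildes are deliberately left alone, since they are not equivalent to a dash.

```python
# Structural characters the table says to treat as a dash for analysis.
# Each "/" of a "//" pair becomes its own dash, so row length is preserved.
DASH_EQUIVALENT = "|()<>[]/+=*"

# "Not sequenced" markers; these are NOT equivalent to a dash and are kept.
UNSEQUENCED = ".~"

_TO_DASH = str.maketrans(DASH_EQUIVALENT, "-" * len(DASH_EQUIVALENT))


def normalize_for_analysis(row: str) -> str:
    """Replace dash-equivalent structural characters with dashes."""
    return row.translate(_TO_DASH)
```

Because every character maps to exactly one character, the columns of the alignment stay in register after normalization.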
The IUPAC-IUB Ambiguity Codes for Nucleotides
|Symbol||G||A||C||U||T||R||Y||M||K||S||W||H||B||D||V||N|
|Meaning||G||A||C||U or T||T or U||G or A||T, U or C||A or C||G, T or U||G or C||A, T or U||A, C, T or U||G, T, U or C||G, A, T or U||G, C or A||G, A, T, U or C|
|Complement||[H]||[B]||[D]||[V]||[V]||[Y]||[R]||[K]||[M]||[W]||[S]||[G]||[A]||[C]||[U or T]||n/a|
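The ambiguity codes can be held in a small lookup table. The sketch below uses the standard one-letter IUPAC symbols and treats T and U as the same base, as the table does. Judging from its entries (for example, G pairs with "A, C, T or U"), the Complement row appears to give the code for every nucleotide a symbol does not cover, rather than the base-pairing complement; that reading is our inference, and the names `IUPAC` and `set_complement` are hypothetical.

```python
# Each symbol maps to the set of bases it can stand for.
# "T" stands in for both T and U, which the table treats together.
IUPAC = {
    "G": frozenset("G"), "A": frozenset("A"), "C": frozenset("C"),
    "T": frozenset("T"), "U": frozenset("T"),
    "R": frozenset("GA"), "Y": frozenset("TC"),
    "M": frozenset("AC"), "K": frozenset("GT"),
    "S": frozenset("GC"), "W": frozenset("AT"),
    "H": frozenset("ACT"), "B": frozenset("GTC"),
    "D": frozenset("GAT"), "V": frozenset("GCA"),
    "N": frozenset("GATC"),
}
ALL_BASES = frozenset("GATC")


def set_complement(symbol: str):
    """Return the symbol covering every base NOT covered by `symbol`.

    Returns None for N, which covers all bases (the table's "n/a").
    """
    remaining = ALL_BASES - IUPAC[symbol.upper()]
    if not remaining:
        return None
    for code, bases in IUPAC.items():
        # Report "T" rather than "U" for the shared T/U entry.
        if bases == remaining and code != "U":
            return code
    return None
```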
Numbers in Sequences?
The AE2 alignment editor (used by the CRW Project) disallows nucleotide edits by default but does not prevent numbers from being inserted into a sequence. Treat any number characters as dashes when they appear, or discard the affected sequence if converting them to dashes is not possible.
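A minimal sketch of this rule, assuming each aligned row is available as a plain string; the helper names `has_digits` and `digits_to_dashes` are hypothetical:

```python
import re

# Match ASCII digits only, so other characters are never touched.
_DIGIT = re.compile(r"[0-9]")


def has_digits(row: str) -> bool:
    """Report whether a row contains any stray number characters."""
    return _DIGIT.search(row) is not None


def digits_to_dashes(row: str) -> str:
    """Replace each number character with a dash, preserving row length."""
    return _DIGIT.sub("-", row)
```

A caller that cannot convert the digits for some reason can use `has_digits` to discard the affected sequence instead.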
DIVIDER Sequences
Sequences marked as DIVIDERs are intended to assist with visual inspection of alignments by grouping related sets of sequences together. For rRNA alignments, the DIVIDERs typically delimit taxonomic groups (using the NCBI Taxonomy). For intron alignments, we subdivide by both subgroup and intron position. Do not use DIVIDER sequences for analysis.
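Dropping DIVIDER entries before analysis might look like the sketch below. It assumes that an alignment has already been read into (name, sequence) pairs and that a DIVIDER can be recognized by the word "DIVIDER" in its name; both are assumptions for illustration, so check how DIVIDERs appear in the file format you are using. The function name `drop_dividers` and its `marker` parameter are hypothetical.

```python
def drop_dividers(records, marker="DIVIDER"):
    """Return the (name, sequence) pairs whose name lacks the marker.

    ASSUMPTION: DIVIDER entries carry the marker text in their name.
    The comparison ignores case.
    """
    marker = marker.upper()
    return [(name, seq) for name, seq in records if marker not in name.upper()]
```

For example, given a list holding `("DIVIDER Bacteria", "----")` and `("E.coli", "GGAU")`, only the second pair is kept.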
Everything Else
Despite our best efforts, we do not catch every odd entry. If you encounter any other anomalies in a sequence from these alignments, please contact us, using a Subject: line of "Alignment Issues." Your feedback is greatly appreciated and helps us keep the alignments and this page useful to the scientific community.