Last modified on 29 August 2008.

The alignments provided at the CRW Site use certain special characters to indicate structural annotation. These characters are defined here.

We also address the issues of DIVIDER lines and other rare anomalies in the alignments.


Character Case:

Case        Description
lowercase   Lesser confidence in the alignment of that sequence (DEFAULT for newly aligned sequences).
UPPERCASE   Greater confidence in the alignment of that sequence.
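Because case carries confidence information for a whole sequence, a script can read it back out of an aligned row. The sketch below is a minimal illustration, not a CRW tool; the function name `case_confidence` is hypothetical, and the "mixed" and "unknown" results are assumptions for rows this page does not describe.

```python
def case_confidence(row):
    """Classify an aligned sequence row by the case of its letters.

    All lowercase -> "lesser" confidence, all UPPERCASE -> "greater".
    Assumptions not covered by the page: rows with no letters return
    "unknown", and rows mixing both cases return "mixed".
    """
    # Only letters carry case; dashes, periods, brackets, etc. are ignored.
    letters = [ch for ch in row if ch.isalpha()]
    if not letters:
        return "unknown"
    if all(ch.isupper() for ch in letters):
        return "greater"
    if all(ch.islower() for ch in letters):
        return "lesser"
    return "mixed"
```

For example, `case_confidence("acg--u")` returns `"lesser"`, the default for a newly aligned sequence.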

Structural Notation Characters:

Char(s)   Char Name(s)                      Analysis          Description
-         DASH                              n/a               Deletion (relative to at least one other sequence in the alignment).
. ~       PERIOD, TILDE                     Not a dash        Region not sequenced; thus, the presence or absence of any nucleotide is uncertain. These characters typically appear at the ends of sequences.
|         PIPE                              Treat as a dash   Discontinuity in a helix. Used to indicate both irregularities within helices and boundaries between helices.
( )       PARENTHESES                       Treat as a dash   Enclose hairpin loops. Some "hairpin loops" may contain complex structural elements, such as pseudoknots.
< > [ ]   ANGLE BRACKETS, SQUARE BRACKETS   Treat as a dash   Enclose variable regions, relative to the reference sequence for the molecule. Generally, regions aligned within square brackets are more conserved and better aligned than those within angle brackets.
//        DOUBLE SLASH                      Treat as a dash   Marks a break in the backbone (i.e., the two nucleotides separated by the "//" are not covalently connected).
+         PLUS SIGN                         Treat as a dash   Marks the insertion position of an intron sequence.
=         EQUALS SIGN                       Treat as a dash   Internal use only (a workaround for an AE2 feature).
*         ASTERISK                          Treat as a dash   Deprecated internal-use character.
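The Analysis column amounts to a character-by-character translation of each aligned row. The following is a minimal sketch of that rule, assuming each notation character occupies one alignment column (so "//" becomes two dashes). The function name `to_analysis_row` is hypothetical, and collapsing "." and "~" into a single placeholder is a convenience choice, not something the table requires; the only requirement is that they stay distinct from dashes.

```python
# Structural notation characters the table says to "treat as a dash".
# "//" is handled one character at a time, so each "/" becomes "-".
DASH_LIKE = frozenset("|()<>[]/+=*")
# Not-sequenced markers; the table says these are NOT equivalent to a dash.
NOT_SEQUENCED = frozenset(".~")


def to_analysis_row(row, missing="."):
    """Rewrite one aligned row for analysis.

    Dash-like notation becomes "-", both "." and "~" become `missing`,
    and every other character (nucleotide letters and their case) is kept.
    """
    out = []
    for ch in row:
        if ch in DASH_LIKE:
            out.append("-")
        elif ch in NOT_SEQUENCED:
            out.append(missing)
        else:
            out.append(ch)
    return "".join(out)


print(to_analysis_row("..ACG(UU)GC|A//G~~"))  # prints ..ACG-UU-GC-A--G..
```

Keeping the row length unchanged preserves column positions, so the output can still be compared column by column with the other sequences in the alignment.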

The IUPAC-IUB Ambiguity Codes for Nucleotides:

Reference: Cornish-Bowden A. (1985). Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Research 13:3021-3030.
Symbol  Meaning              Ratio  Complement (code for the bases not included)
G       G                    1:1    H
A       A                    1:1    B
C       C                    1:1    D
U       U or T               1:1    V
T       T or U               1:1    V
R       G or A               1:2    Y
Y       T, U or C            1:2    R
M       A or C               1:2    K
K       G, T or U            1:2    M
S       G or C               1:2    W
W       A, T or U            1:2    S
H       A, C, T or U         1:3    G
B       G, T, U or C         1:3    A
D       G, A, T or U         1:3    C
V       G, C or A            1:3    U or T
N       G, A, T, U or C      1:4    n/a
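Each code stands for a set of bases, and the Complement entry is the code for the set of bases it does not include. For example, G has complement H (A, C, T or U). The sketch below encodes the table as sets, treating T and U as the same base as the table does. The names `IUPAC`, `bases`, `not_code`, and `matches` are hypothetical and illustrative, not part of any CRW software.

```python
# IUPAC-IUB codes from the table; T is written as U because the table
# treats them as the same base.
IUPAC = {
    "G": "G", "A": "A", "C": "C", "U": "U", "T": "U",
    "R": "GA", "Y": "UC", "M": "AC", "K": "GU", "S": "GC", "W": "AU",
    "H": "ACU", "B": "GUC", "D": "GAU", "V": "GCA",
    "N": "GAUC",
}
ALL_BASES = frozenset("ACGU")


def bases(code):
    """Return the set of bases (T written as U) that a code stands for."""
    return frozenset(IUPAC[code.upper()])


def not_code(code):
    """Return the table's 'Complement': the code for every base NOT
    covered by `code`, or None for N, which covers all bases."""
    rest = ALL_BASES - bases(code)
    if not rest:
        return None
    # Reverse lookup; dict order means "U" is found before "T" for {U}.
    for symbol, members in IUPAC.items():
        if frozenset(members) == rest:
            return symbol
    return None


def matches(code, base):
    """True if the single base (A, C, G, T or U) is allowed by `code`."""
    b = base.upper()
    return ("U" if b == "T" else b) in bases(code)
```

For instance, `not_code("S")` returns `"W"` and `matches("Y", "t")` returns `True`.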

Numbers in Sequences?

By default, the AE2 alignment editor (used by the CRW Project) disallows edits to nucleotides, but it does not prevent numbers from being inserted into a sequence. Treat any digit that appears in a sequence as a dash; if translating it to a dash is not possible, discard the affected sequence.
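A minimal sketch of the digit-to-dash rule, using only the standard `re` module. The function name `digits_to_dashes` is hypothetical. It also returns the number of replacements so a caller can apply its own policy for discarding a sequence; that policy is left open here, as it is on this page.

```python
import re

# Any single ASCII digit.
_DIGIT = re.compile(r"[0-9]")


def digits_to_dashes(row):
    """Replace every digit in an aligned row with "-".

    Returns (cleaned_row, number_of_digits_replaced).
    """
    return _DIGIT.subn("-", row)
```

For example, `digits_to_dashes("AC1G-23U")` returns `("AC-G---U", 3)`. Because each digit is replaced by exactly one dash, the row length and column positions are unchanged.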

DIVIDER Sequences:

Sequences that are marked as DIVIDERs are intended to assist with visual inspection of alignments by grouping related sets of sequences together. For rRNA alignments, the DIVIDERs typically delimit taxonomic groups (using the NCBI Taxonomy). For intron alignments, we subdivide by both subgroup and intron position. These DIVIDER sequences should not be used for analysis.
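Excluding DIVIDERs before analysis is a simple filter. This page does not say how a DIVIDER is marked in the exported files, so the test below is entirely an assumption: the hypothetical `is_divider_by_name` checks whether the sequence name contains the word DIVIDER. Pass a different predicate to `drop_dividers` (also hypothetical) if your copy of the alignment marks them another way.

```python
def is_divider_by_name(name, seq):
    """HYPOTHETICAL marker test: treat a record as a DIVIDER if its name
    contains the word DIVIDER (case-insensitive). Adjust to your data."""
    return "DIVIDER" in name.upper()


def drop_dividers(records, is_divider=is_divider_by_name):
    """Return only the (name, sequence) records that are not DIVIDERs."""
    return [(name, seq) for name, seq in records if not is_divider(name, seq)]
```

Taking the predicate as a parameter keeps the filtering separate from the (assumed) marking convention.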

Everything Else:

Despite our best efforts, we do not catch every unusual entry. If you encounter any other anomalies in a sequence from these alignments, please contact us using the Subject: line "Alignment Issues". Your feedback helps us keep the alignments and this page useful to the scientific community, and it is greatly appreciated.