Substituent Notation

Monosaccharides often carry substituents, which add a further level of complexity to monosaccharide notation. Different notations treat substituents in different ways. In GlycoCT, for instance, all substituents are handled as separate residues, while other notations include most substituents in the monosaccharide name. Sometimes, especially in BCSDB residue names, a substituent is split: part of it is included in the monosaccharide name, while another part of the same substituent is regarded as a separate residue. For example, the amino part of an "N-acetyl" substituent is represented by an "N" in the BCSDB monosaccharide name, while the acetyl part is added as a separate "Ac" residue.

In some notations, more than one name is used for the same substituent. In some cases this is necessary to distinguish between different linkage types (see below), but often it introduces ambiguity. To be able to read the various names and at the same time generate unique names, MonosaccharideDB contains manually curated alias lists for substituent names. Each list contains exactly one primary alias per notation scheme and linkage type, which the encoder routines use to generate residue names. In addition, any number of secondary aliases can be present; the importer routines accept these when parsing a residue name into the internal representation.
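The alias-list idea can be sketched in a few lines of Python. This is only an illustration, not MonosaccharideDB's actual code: the class names `Alias` and `AliasList`, the scheme key "carbbank", and the choice of "Me" rather than "OMe" as the primary alias are all assumptions made for the example. The three methyl names themselves are the CarbBank names discussed under Linkage Type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    scheme: str          # notation scheme (hypothetical key, e.g. "carbbank")
    linkage_type: str    # e.g. "H_AT_OH"
    name: str            # substituent name as written in that scheme
    primary: bool = False


class AliasList:
    """Alias list for one substituent (hypothetical structure)."""

    def __init__(self, aliases):
        self._aliases = list(aliases)
        self._primary = {}
        for alias in self._aliases:
            if not alias.primary:
                continue
            key = (alias.scheme, alias.linkage_type)
            # Only one primary alias is allowed per scheme and linkage type.
            if key in self._primary:
                raise ValueError(f"more than one primary alias for {key}")
            self._primary[key] = alias.name

    def encode(self, scheme, linkage_type):
        """Return the unique name used when writing a residue name."""
        return self._primary[(scheme, linkage_type)]

    def normalize(self, scheme, name):
        """Accept any alias (primary or secondary) and return the primary one."""
        for alias in self._aliases:
            if alias.scheme == scheme and alias.name == name:
                return self.encode(scheme, alias.linkage_type)
        raise KeyError(f"unknown alias {name!r} in scheme {scheme!r}")


METHYL = AliasList([
    Alias("carbbank", "H_AT_OH", "Me", primary=True),  # assumed to be primary
    Alias("carbbank", "H_AT_OH", "OMe"),               # secondary alias
    Alias("carbbank", "H_LOSE", "CMe", primary=True),
])
```

With this sketch, `METHYL.normalize("carbbank", "OMe")` reads a secondary alias and returns the primary name for the same linkage type, which mirrors the split between importer and encoder routines described above.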

Linkage Type

Substituents can be linked to the monosaccharide basetype by one of several linkage types, which are listed in the following table:

Linkage type   Description
H_AT_OH        A standard O-linked substituent, i.e. the substituent replaces the hydrogen of an OH group.
DEOXY          The substituent is linked directly to the basetype backbone by replacing the OH group.
H_LOSE         The substituent is linked directly to the basetype backbone by replacing the hydrogen atom.
R_CONFIG       The substituent is linked directly to the basetype backbone by replacing a hydrogen atom at a terminal position, which would be non-chiral without the substituent, resulting in an R-configuration of the carbon.
S_CONFIG       Same as R_CONFIG, but resulting in an S-configuration of the carbon.
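
For readers who want to work with these values programmatically, the five linkage types can be modelled as an enumeration. The sketch below is an assumption about how one might do this in Python; the class name `LinkageType` and the two helper properties are hypothetical and are not taken from MonosaccharideDB. Only the member names and their descriptions come from the table.

```python
from enum import Enum


class LinkageType(Enum):
    """Linkage types from the table above (hypothetical Python model)."""

    H_AT_OH = "substituent replaces the hydrogen of an OH group"
    DEOXY = "substituent replaces the OH group on the backbone"
    H_LOSE = "substituent replaces a hydrogen atom on the backbone"
    R_CONFIG = "substituent replaces a terminal hydrogen, giving an R-configuration"
    S_CONFIG = "substituent replaces a terminal hydrogen, giving an S-configuration"

    @property
    def is_o_linked(self):
        # Only H_AT_OH is described as a standard O-linked substituent.
        return self is LinkageType.H_AT_OH

    @property
    def sets_configuration(self):
        # R_CONFIG and S_CONFIG make an otherwise non-chiral terminal carbon chiral.
        return self in (LinkageType.R_CONFIG, LinkageType.S_CONFIG)
```

Looking a member up by name, for example `LinkageType["DEOXY"]`, fails with a `KeyError` for any string that is not one of the five linkage types.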

Only GlycoCT and the MonosaccharideDB internal notation state the linkage type explicitly. In all other notations, it is implied by the substituent's name. In CarbBank notation, for example, a methyl residue is called "Me" or "OMe" if it is linked with an H_AT_OH linkage, while "CMe" denotes a methyl that is linked via an H_LOSE linkage. For this reason, the synonyms list on the notation tab of the MonosaccharideDB substituent entry pages shows the linkage type that is implied by each alias name.
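
The CarbBank methyl example amounts to a lookup from name to implied linkage type. A minimal sketch in Python follows; the table contains only the three names mentioned above, and the names `CARBBANK_METHYL_NAMES` and `implied_linkage_type` are hypothetical.

```python
# CarbBank methyl names and the linkage type each one implies.
# Only the three names mentioned in the text are included.
CARBBANK_METHYL_NAMES = {
    "Me": "H_AT_OH",
    "OMe": "H_AT_OH",
    "CMe": "H_LOSE",
}


def implied_linkage_type(name):
    """Return the linkage type implied by a CarbBank methyl name."""
    try:
        return CARBBANK_METHYL_NAMES[name]
    except KeyError:
        raise ValueError(f"not a known CarbBank methyl name: {name!r}") from None
```

A converter that translates such a name into a notation with explicit linkage types would need this kind of lookup to recover the information that the name carries only implicitly.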