Basetype Notation
The basetype of a monosaccharide describes the residue size, the stereochemistry (including the absolute and the anomeric configuration), and the ring closure. In addition, it may contain a number of core modifications.
Absolute Configuration
For the use of the configurational symbols and prefixes, see the IUPAC definition 2-Carb-4.
Anomeric
For the definition of the anomeric configuration, see the IUPAC definition 2-Carb-6.
Core Modifications
The monosaccharide basetype can feature a number of core modifications. Several of them result in achiral positions and thus influence stereochemistry.
The following table summarizes the core modifications that are used in MonosaccharideDB.
Name | Description | Valence | Comment |
DEOXY | Deoxygenation of a position: The OH group is removed and replaced by a hydrogen atom. | 1 | |
KETO | A carbonyl group in the open chain version of a monosaccharide. This modification is omitted if it is only present at position 1 (standard aldose). | 1 | |
ALDI | Alditol: Reduction of the aldehyde group to CH2OH. | 1 | |
ACID | Carboxyl (COOH) group. | 1 | |
EN | Double bond in the basetype backbone. This modification implies that - unless explicitly stated with a deoxy modification - hydroxyl groups are preserved. | 2 | |
ENX | Double bond in the basetype backbone with unknown deoxygenation pattern. | 2 | |
YN | Triple bond in the basetype backbone. | 2 | |
ANHYDRO | Intramolecular anhydride. | 2 | |
SP | Triple bond to a substituent. | 1 | |
SP2 | Double bond to a substituent. | 1 | |
GEMINAL | Loss of stereochemistry due to identical substituents with DEOXY and H_LOSE linkage types at a single position. | 1 | |
Stereochemistry
Many monosaccharides only differ in the stereochemistry of the basetype backbone carbons. In most notations, this stereochemistry is denoted using the IUPAC stem type ("parent") names.
In addition to this indirect description of the stereochemistry based on parent names, MonosaccharideDB features a Stereocode field, which contains a direct description of the stereochemistry.
The stereocode is a String that contains one character for each carbon of the basetype backbone.
"1" indicates that the corresponding carbon is in L-Configuration (OH-group pointing left in Fischer projection), "2" marks a D-Configuration (OH-group pointing right in Fischer projection), and "0" is used to describe achiral positions.
D-Glucose in open chain form, for example, has the stereocode "021220".
When a ring is formed from this, the anomeric center (position 1 in this example) becomes a chiral atom and thus the stereocode of that position is adjusted depending on the anomer. For example, the stereocode of β-D-Glcp is "121220", that of α-D-Glcp is "221220".
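The adjustment of the anomeric position can be sketched in Python as follows. This is a minimal illustration, not MonosaccharideDB code: the function name `set_anomeric_symbol` and its parameters are hypothetical, and the symbols used for the two anomers are taken only from the D-Glcp example above.

```python
def set_anomeric_symbol(stereocode: str, position: int, symbol: str) -> str:
    """Return a copy of `stereocode` in which the character at the 1-based
    backbone `position` is replaced by `symbol`.

    Hypothetical helper for illustration; `symbol` must be "1" (L) or "2" (D).
    """
    if symbol not in ("1", "2"):
        raise ValueError("a chiral position must be '1' (L) or '2' (D)")
    if not 1 <= position <= len(stereocode):
        raise ValueError("position lies outside the basetype backbone")
    index = position - 1  # stereocode strings are indexed from 0 in Python
    return stereocode[:index] + symbol + stereocode[index + 1:]


# Open chain D-glucose, as given in the text.
open_chain_d_glc = "021220"

# Ring formation makes position 1 chiral; the symbols follow the D-Glcp example.
beta_d_glcp = set_anomeric_symbol(open_chain_d_glc, 1, "1")
alpha_d_glcp = set_anomeric_symbol(open_chain_d_glc, 1, "2")
```

With the open chain stereocode "021220" as input, the two calls reproduce the ring stereocodes quoted above for β-D-Glcp and α-D-Glcp.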
If MonosaccharideDB is queried with a residue name in which the absolute configuration is not given (e.g. "a-Fucp" in CarbBank notation), the stereocode is derived from the D-configuration, and "1" and "2" are replaced by "3" and "4", respectively. Thus, the stereocode "443340" is assigned to the CarbBank residue "a-Fucp".
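The replacement rule for residues without a stated absolute configuration can be sketched as a simple character translation. The function name `mask_absolute_configuration` is hypothetical, and the input "221120" is not quoted in the text; it is obtained by reversing the replacement on "443340" and is assumed here to be the D-configured stereocode used for "a-Fucp".

```python
# Translation table for the rule in the text: "1" -> "3" and "2" -> "4".
# Achiral positions ("0") are left untouched.
_UNKNOWN_ABSOLUTE = str.maketrans("12", "34")


def mask_absolute_configuration(d_stereocode: str) -> str:
    """Hypothetical helper: convert a stereocode written for the
    D-configuration into the form used when the absolute configuration
    of the residue is not given."""
    return d_stereocode.translate(_UNKNOWN_ABSOLUTE)


# Assumed D-configured stereocode for "a-Fucp" (derived from "443340").
masked_fucp = mask_absolute_configuration("221120")
```

Because only "1" and "2" are translated, the terminal "0" of the fucose stereocode is carried over unchanged.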
Extended Stereocode
The stereocode described above only distinguishes between D-configuration, L-configuration, and achiral positions. Achiral positions, however, can be caused by various core modifications or simply by a terminal position. The "Extended Stereocode" takes this into account. While the symbols for chiral positions remain the same as in the standard stereocode, various symbols are used instead of "0" for achiral positions, depending on the cause of the achirality:
Symbol | Description |
h | "head or tail group", CH2OH group at a terminal position |
d | DEOXY core modification at non-terminal position |
m | DEOXY core modification at terminal position ("methyl" group) |
a | ACID core modification |
o | aldehyde group |
k | KETO core modification at non-terminal position |
e | EN + deoxy core modifications |
n | EN core modification without DEOXY core modification |
E | EN core modification with unknown deoxygenation status |
y | YN core modification at non-terminal position |
s | SP2 core modification |
t | SP core modification (always at terminal position) |
1 | "L-Configuration" carbon atom |
2 | "D-Configuration" carbon atom |
x | unknown configuration (D or L) carbon atom |
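Since the extended stereocode only refines the "0" symbol, it can be collapsed back to the standard stereocode by mapping every achiral symbol from the table to "0". The sketch below is an illustration under stated assumptions: the names `ACHIRAL_SYMBOLS` and `to_standard_stereocode` are hypothetical, the example string "o2122h" for open chain D-glucose is constructed from the table rather than taken from MonosaccharideDB, and "x" is passed through unchanged because the standard stereocode description does not define a counterpart for it.

```python
# Achiral symbols of the extended stereocode, as listed in the table above.
ACHIRAL_SYMBOLS = frozenset("hdmaokenEyst")


def to_standard_stereocode(extended: str) -> str:
    """Hypothetical helper: replace each achiral symbol of an extended
    stereocode by "0"; chiral symbols ("1", "2") and "x" are kept as-is."""
    return "".join("0" if ch in ACHIRAL_SYMBOLS else ch for ch in extended)


# Assumed extended stereocode for open chain D-glucose:
# aldehyde ("o") at position 1, CH2OH group ("h") at position 6.
standard_d_glc = to_standard_stereocode("o2122h")
```

The reverse direction is not a plain character mapping, because choosing the right symbol for a "0" requires knowing which core modification, if any, is present at that position.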